Comments - danforth - Webmaster profile - Internet marketing forum

How to filter the data?

June 13, 2020, 15:30

timo-71:
Maybe the task should be approached with the first sentence of the thread in mind:

Well, sort of, yes, but then again it is unclear why the thread is tagged data science :)

timo-71:
For XML, PHP even has tools that do not load the entire file into memory. Logs and CSV are read line by line. And in general, yes, I agree: if there are tools for streaming reads of JSON, then fine. By "questionable" I mean that I would not store such data in JSON files.

Well, usually nobody stores anything important in JSON anyway. Usually there is a file to which a stream of events is written, and those events have to be read and parsed. Reading the entire file into memory and parsing it into an array is, for obvious reasons, not always possible. My point is that if the task seems too easy to someone, they can apply a bit of skill and read the data a little differently :) After all, you do not always need all of the data at once, as in this example. We do not need to find duplicates, etc.

Let's wait for Sly32 and see how he solves the problem.
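
To illustrate the "file that a stream of events is written to" case from the post above, here is a minimal sketch of reading such a file lazily. It assumes one JSON object per line, which the thread does not state; the `event` field and the function names `iter_events` and `filter_events` are hypothetical.

```python
import io
import json
from typing import Callable, Iterator, TextIO


def iter_events(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed event per line without loading the whole file."""
    for line in stream:  # text streams are iterated lazily, line by line
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON (e.g. a half-written record)
        if isinstance(event, dict):
            yield event


def filter_events(stream: TextIO, keep: Callable[[dict], bool]) -> Iterator[dict]:
    """Lazily keep only the events for which keep(event) is true."""
    return (event for event in iter_events(stream) if keep(event))


# Usage with an in-memory stand-in for a log file (assumed format).
sample = io.StringIO(
    '{"event": "view", "id": 1}\n'
    "\n"
    '{"event": "click", "id": 2}\n'
    "not json\n"
    '{"event": "click", "id": 3}\n'
)
clicks = list(filter_events(sample, lambda e: e.get("event") == "click"))
```

Only one line is held in memory at a time, so the same loop should work on a real file opened with `open(path, encoding="utf-8")`.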

How to filter the data?

June 13, 2020, 14:43

timo-71:
Having JSON like this is a questionable idea in itself. If it is a file, then yes, your solution is probably the only one here that will somehow solve the task.

Then where is the difficulty in the task? If we have 20 lines, we unpack them into an array, filter them, and... that's it. The same array_filter, or filter/reduce in JS. What is the point? Where is the streaming read? Sly32 tagged the thread data science, but there is not a whiff of it here. So far these are junior tasks.

And by the way, the idea is not questionable. Files like this are usually left over from Nginx logs, which have to be parsed on the fly to build analytics on them. Or, say, you are sent an export for synchronizing stock levels, where XML files of a couple of gigabytes are not unusual. Or you need to parse a site's sitemap. There are plenty of examples.
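
For the sitemap and large XML cases mentioned above, one possible approach is incremental parsing with the standard library's `xml.etree.ElementTree.iterparse`. This is only a sketch: it assumes a standard sitemap using the sitemaps.org namespace, and the function name `iter_sitemap_urls` is made up for the example.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator

# Namespace used by standard sitemap files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def iter_sitemap_urls(source: BinaryIO) -> Iterator[str]:
    """Yield every <loc> of a sitemap without keeping the whole tree in memory."""
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == SITEMAP_NS + "url":
            loc = elem.findtext(SITEMAP_NS + "loc")
            if loc:
                yield loc.strip()
            root.clear()  # detach already processed <url> elements from the root


# Usage with a tiny in-memory sitemap (example.com URLs are placeholders).
sample = io.BytesIO(
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<url><loc>https://example.com/</loc></url>"
    b"<url><loc>https://example.com/about</loc></url>"
    b"</urlset>"
)
urls = list(iter_sitemap_urls(sample))
```

The parser reads the input in chunks, and clearing the root after each record keeps the in-memory tree small, which is the point when the file runs to a couple of gigabytes.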

How to filter the data?

June 13, 2020, 14:15

Sly32, and how would you solve your own task? Say you have a JSON file with your content, for example a 10 GB file with a couple of tens of millions of lines:


[
  {
    "id": 0,
    "media_category": "clip"
  },
  {
    "id": 1,
    "media_category": "promo"
  },
  {
    "id": 2,
    "media_category": "promo"
  },
  {
    "id": 3,
    "media_category": "start"
  },
  {
    "id": 3,
    "media_category": "video"
  },
  {
    "id": 3,
    "media_category": "anime"
  },
  {
    "id": 3,
    "media_category": "promo"
  },
  {
    "id": 4,
    "media_category": "clip"
  },
  {
    "id": 4,
    "meda_category": "promo"
  },
  {
    "id": 6,
    "media_category": "xxx"
  }
]

The conditions are the same. Show how you would open the file and filter the rows.
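
One way to approach the challenge above without loading 10 GB into memory is to read the top-level array in chunks and decode one item at a time with `json.JSONDecoder.raw_decode`. The sketch below uses only the standard library; a dedicated streaming parser such as ijson would be the more usual choice. The thread's actual filter conditions are not shown here, so the `promo` predicate is a hypothetical stand-in, and `iter_json_array` is a name invented for this example.

```python
import io
import json
from typing import Iterator, TextIO


def iter_json_array(stream: TextIO, chunk_size: int = 65536) -> Iterator[object]:
    """Yield the items of one top-level JSON array, reading the stream in chunks."""
    decoder = json.JSONDecoder()
    buf = ""
    state = "start"  # start -> item_or_end -> sep_or_end <-> item
    eof = False
    while True:
        chunk = stream.read(chunk_size)
        if chunk:
            buf += chunk
        else:
            eof = True
        while True:  # consume everything that is complete in the buffer
            buf = buf.lstrip()
            if not buf:
                break
            if state == "start":
                if buf[0] != "[":
                    raise ValueError("expected a top-level JSON array")
                buf, state = buf[1:], "item_or_end"
            elif state in ("item_or_end", "sep_or_end") and buf[0] == "]":
                return  # closing bracket: the array is finished
            elif state == "sep_or_end":
                if buf[0] != ",":
                    raise ValueError("expected ',' or ']' after an item")
                buf, state = buf[1:], "item"
            else:  # state is "item" or "item_or_end": try to decode one value
                try:
                    item, end = decoder.raw_decode(buf)
                except json.JSONDecodeError:
                    if eof:
                        raise  # no more data is coming, so the input is broken
                    break  # probably cut off at a chunk boundary: read more
                rest = buf[end:].lstrip()
                if not eof and rest[:1] not in (",", "]"):
                    break  # the value may continue in the next chunk (e.g. a number)
                yield item
                buf, state = rest, "sep_or_end"
        if eof:
            raise ValueError("input ended before the array was closed")


# Usage: a few records from the post, read in deliberately tiny chunks.
sample = io.StringIO(
    '[{"id": 0, "media_category": "clip"},'
    ' {"id": 1, "media_category": "promo"},'
    ' {"id": 4, "meda_category": "promo"},'
    ' {"id": 6, "media_category": "xxx"}]'
)
promo_ids = [
    row["id"]
    for row in iter_json_array(sample, chunk_size=16)
    if row.get("media_category") == "promo"  # hypothetical filter condition
]
```

Using `row.get(...)` rather than `row[...]` means the record with the misspelled `meda_category` key is skipped instead of raising `KeyError`. Note that on malformed input this sketch may buffer the rest of the file before reporting the error.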

How to filter the data?

June 13, 2020, 12:21

Sly32, re-read what I wrote above: what you have is not JSON, neither in the original task nor in the example of the expected result.

How to filter the data?

June 13, 2020, 12:15

Sly32:
how do you plan to process source data in the form of a list of JSONs with this line?

To begin with, you do not have a list of JSONs. What you have is not valid JSON. And the output is not valid JSON either.

How to filter the data?

June 13, 2020, 12:14

Sly32:
do I need the data format to stay the same?

Are you asking me? :) The answer can be shaped however you like.
