Re: Abnormal JSON query performance

From: Pavel Stehule
Subject: Re: Abnormal JSON query performance
Date:
Msg-id: CAFj8pRD0ghtsnB_7=LwZyTDdMz=EbiOR+eruhRduXmpyv4gQyQ@mail.gmail.com
In reply to: Re: Abnormal JSON query performance  (007reader@gmail.com)
Responses: Re: Abnormal JSON query performance  (007reader@gmail.com)
List: pgsql-bugs


2018-05-13 19:16 GMT+02:00 <007reader@gmail.com>:
You are right. I tried similar queries for the array data type, retrieving several selected array elements, and the query time increases as the number of requested elements increases.
This is a very disappointing fact. The documentation and multiple articles on the internet promote the ability to extract specific JSON keys like “select field1->field2->>key1, field2->>key2, ...”, but this turns out to be very inefficient. The design seems to favor retrieving the entire JSON document and processing it on the application side.

PostgreSQL is not a JSON or XML database that breaks every document into a graph and works on that graph. Unfortunately, I don't know of any database that does this. Searching a very complex graph is only available in in-memory databases.

If I remember correctly, PostgreSQL 11 allows an index-only scan over a GIN index. Maybe that can help you.
 

Is it possible to add a note to the documentation articulating this important detail? I’ll ask someone to write a blog post about this deficiency ...

"JSON data is subject to the same concurrency-control considerations as any other data type when stored in a table. Although storing large documents is practicable, keep in mind that any update acquires a row-level lock on the whole row. Consider limiting JSON documents to a manageable size in order to decrease lock contention among updating transactions. Ideally, JSON documents should each represent an atomic datum that business rules dictate cannot reasonably be further subdivided into smaller datums that could be modified independently."

This is part of the documentation, and it should be emphasized: a JSON or JSONB document is read and written as one value every time. So if you use long documents and usually read only a few fields, a seq scan will be very inefficient, and an index scan over many rows will be expensive too.
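One quick way to see how large the stored documents actually are is pg_column_size; a minimal sketch, assuming the test table and data column described later in this thread:

select avg(pg_column_size(data))::int as avg_bytes,  -- average stored size per document
       max(pg_column_size(data)) as max_bytes        -- largest stored document
from test;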

Regards

Pavel


 

-Michael

On May 12, 2018, at 12:36 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:



2018-05-12 9:31 GMT+02:00 <007reader@gmail.com>:
Sorry again, autocorrect kills me. I use JSONB, not JSON. Serving the entire document for jsonb doesn’t make much sense. If jsonb is pre-parsed, selecting and assembling a few keys must be faster than assembling all of them.
 
jsonb is pre-parsed, but that is all. PostgreSQL can't store one value in separately readable pieces. It can't do that for any type; json(b) is not an exception.
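One thing that can sometimes reduce the per-key overhead is extracting several keys with a single function call instead of one ->> expression per key. A sketch, not from the original thread; the key names are illustrative and whether it helps depends on document size and PostgreSQL version:

-- Extract several keys from the json column in one pass per row.
select r.key1, r.key2
from test,
     json_to_record(test.data) as r(key1 text, key2 text)
limit 1000;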

Regards

Pavel




On May 11, 2018, at 10:03 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Hi

2018-05-12 6:32 GMT+02:00 <007reader@gmail.com>:
I observed a performance anomaly with JSON in version 10 that I believe is incorrect behavior, although I didn’t test earlier versions.
Here is a test table:
create table test(
  id integer primary key,
  data json
);

A simple query works fine on a table of 92,000 records with ~60KB average record size.
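For reproduction, data of roughly that shape could be generated along these lines (a sketch; the row count, key names, and filler size are illustrative, not the original data):

-- Populate the test table with ~92,000 rows of large json documents.
insert into test
select i,
       json_build_object('key1', 'value ' || i,
                         'key2', 'value ' || i,
                         'filler', repeat('x', 60000))
from generate_series(1, 92000) as i;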

select data->>'key1' from test limit 1000;

The query above takes about 155 ms. If I add an additional field to the select list, execution time increases linearly: retrieving 2 fields takes ~310 ms, 3 fields ~465 ms, and so on. The explain statement for 5 fields looks normal:

Limit  (cost=0.00..34.10 rows=1000 width=160) (actual time=1.783..813.643 rows=1000 loops=1)
   ->  Seq Scan on loans  (cost=0.00..3165.85 rows=92834 width=160) (actual time=1.753..805.629 rows=1000 loops=1)
 Planning time: 8.546 ms
 Execution time: 817.874 ms

It looks like every record gets parsed separately for each key... Unless I’m doing something incorrectly, this makes retrieval of individual keys completely useless for practical cases. It’d leave only the unattractive alternative of getting the entire JSON document. In my case, instead of retrieving a few hundred bytes, it would force getting 60KB for each query...

It is not a bug. This behavior is expected.

JSON values are stored as plain text in PostgreSQL and are parsed every time it is necessary. If you need fast access to one specific field, you can use a functional index:

CREATE INDEX ON test ((data->>'key1'));
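For the index to be used, the query has to reference the same expression; an illustrative example (the comparison value is made up):

SELECT data->>'key1' FROM test WHERE data->>'key1' = 'some value';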

You can try the JSONB type: it is like JSON, but pre-parsed, so access to individual fields should be faster. Again, a functional index can be used, or the special JSONB indexes for faster searching.
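A sketch of such a JSONB index, assuming the data column were declared jsonb rather than json (GIN indexes work only on jsonb); the key and value in the containment search are illustrative:

CREATE INDEX ON test USING gin (data);
-- A containment query that this index can speed up:
SELECT id FROM test WHERE data @> '{"key1": "some value"}';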

Don't forget that PostgreSQL is a relational database: sequential access to data is fastest when the data is stored in relational form, with every attribute in its own column. Non-atomic data types are more dynamic, but access is significantly slower; it doesn't matter for hundreds of values, but it becomes clearly visible when you read more than thousands of values.
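As an illustration of that advice, a frequently read key could be moved into an ordinary column (a sketch; the column name key1 is hypothetical):

ALTER TABLE test ADD COLUMN key1 text;
UPDATE test SET key1 = data->>'key1';   -- copy the hot key out of the document
CREATE INDEX ON test (key1);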

P.S. An index is your friend.

Regards

Pavel


 

-Bob Jones


