[PERFORM] Speeding up JSON + TSQUERY + GIN

Поиск

Список

Период

Сортировка

От	Sven R. Kunze
Тема	[PERFORM] Speeding up JSON + TSQUERY + GIN
Дата	26 февраля 2017 г. 16:28:10
Msg-id	2260e4e6-89ae-90dc-278c-81243b3c3510@mail.de обсуждение
Ответы	Re: [PERFORM] Speeding up JSON + TSQUERY + GIN Re: [PERFORM] Speeding up JSON + TSQUERY + GIN
Список	pgsql-performance

Дерево обсуждения

Hello everyone,I am currently evaluating the possibility of using PostgreSQL for storing and querying jsonb+tsvector queries. Let's consider this setup: create table docs (id serial primary key, meta jsonb); # generate 10M entries, cf. appendix create index docs_meta_idx ON docs using gin (meta jsonb_path_ops);create index docs_name_idx ON docs using gin (to_tsvector('english', meta->>'name')); create index docs_address_idx ON docs using gin (to_tsvector('english', meta->>'address')); Testing around with some smaller datasets, functionality-wise it's great. However increasing to 10M, things tend to slow down (using PostgreSQL 9.5): explain analyze select id from docs where meta @> '{"age": 20}'; Planning time: 0.121 ms Execution time: 4873.507 ms explain analyze select id from docs where meta @> '{"age": 20}'; Planning time: 0.122 ms Execution time: 206.289 ms explain analyze select id from docs where meta @> '{"age": 30}'; Planning time: 0.109 ms Execution time: 7496.886 ms explain analyze select id from docs where meta @> '{"age": 30}'; Planning time: 0.114 ms Execution time: 1169.649 ms explain analyze select id from docs where to_tsvector('english', meta->>'name') @@ to_tsquery('english', 'john'); Planning time: 0.179 ms Execution time: 10109.375 msexplain analyze select id from docs where to_tsvector('english', meta->>'name') @@ to_tsquery('english', 'john'); Planning time: 0.188 ms Execution time: 238.854 ms Using "select pg_prewarm('docs');" and on any of the indexes doesn't help either. After a "systemctl stop postgresql.service && sync && echo 3 > /proc/sys/vm/drop_caches && systemctl start postgresql.service" the age=20, 30 or name=john queries are slow again. Is there a way to speed up or to warm up things permanently?Regards,SvenAppendix I: example json: {"age": 20, "name": "Michelle Hernandez", "birth": "1991-08-16", "address": "94753 Tina Bridge Suite 318\\nEmilyport, MT 75302"} Appendix II: The Python script to generate fake json data. Needs "pip install faker". >>> python fake_json.py > test.json # generates 2M entries; takes some time >>> cat test.json | psql -c 'copy docs (meta) from stdin'>>> cat test.json | psql -c 'copy docs (meta) from stdin'
>>> cat test.json | psql -c 'copy docs (meta) from stdin'>>> cat test.json | psql -c 'copy docs (meta) from stdin'>>> cat test.json | psql -c 'copy docs (meta) from stdin' -- fake_json.py -- import faker, json; fake = faker.Faker(); for i in range(2*10**6): print(json.dumps({"name": fake.name(), "birth": fake.date(), "address": fake.address(), "age": fake.random_int(0,100)}).replace('\\n', '\\\\n'))

В списке pgsql-performance по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

[PERFORM] Speeding up JSON + TSQUERY + GIN