[GENERAL] Perfomance of IN-clause with many elements and possible solutions

Поиск

Список

Период

Сортировка

От	dilaz03 .
Тема	[GENERAL] Perfomance of IN-clause with many elements and possible solutions
Дата	23 июля 2017 г. 17:35:24
Msg-id	CALUfWrYp0YFHJ4Rijtb0z_ktYyy51N5N_jsqz0or8LmCWFu=HA@mail.gmail.com обсуждение исходный текст
Ответы	Re: [GENERAL] Perfomance of IN-clause with many elements andpossible solutions Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions
Список	pgsql-general

Дерево обсуждения

Hello.

I have database with events with type from different souces identified by id. I have query which filters events by IN-clause with many ids (1-500 ids). I see poor perfomance of IN-clause and try to investigate this problem.

SELECT version();

version

-------------------------------------------------------------------------------------------------------------------

PostgreSQL 10beta2 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609, 64-bit

-- Full table can fit in memory

show shared_buffers;

shared_buffers

----------------

2GB

show work_mem;

work_mem

----------

16MB

SET max_parallel_workers_per_gather TO 0;

SET max_parallel_workers TO 0;

-- Create table with 10 000 000 rows with 500 bigints

CREATE TABLE ids AS SELECT trunc(random() * 500)::bigint as id from generate_series(1, 10000000);

-- IN (...)

SELECT ('(' || string_agg(id::text, ',') || ')') AS in_clause

FROM (SELECT id FROM ids GROUP BY id ORDER BY id LIMIT 200) AS s \gset

EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM ids WHERE id IN :in_clause;

Aggregate (cost=2654265.02..2654265.03 rows=1 width=8) (actual time=17268.831..17268.831 rows=1 loops=1)

Buffers: shared hit=44248

-> Seq Scan on ids (cost=0.00..2644260.48 rows=4001815 width=0) (actual time=0.066..16722.072 rows=3998646 loops=1)

Filter: (id = ANY ('{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199}'::bigint[]))

Rows Removed by Filter: 6001354

Buffers: shared hit=44248

Planning time: 3.324 ms

Execution time: 17268.907 ms

-- IN (VALUES ...)

SELECT ('(VALUES ' || string_agg('(' || id::text || ')', ',') || ')') AS values_clause

FROM (SELECT id FROM ids GROUP BY id ORDER BY id LIMIT 200) AS s \gset

EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM ids WHERE id IN :values_clause;

Aggregate (cost=245006.46..245006.47 rows=1 width=8) (actual time=4086.188..4086.188 rows=1 loops=1)

Buffers: shared hit=44248

-> Hash Join (cost=7.50..235006.42 rows=4000019 width=0) (actual time=0.978..3557.037 rows=3998646 loops=1)

Hash Cond: (ids.id = "*VALUES*".column1)

Buffers: shared hit=44248

-> Seq Scan on ids (cost=0.00..144248.48 rows=10000048 width=8) (actual time=0.031..1138.542 rows=10000000 loops=1)

Buffers: shared hit=44248

-> Hash (cost=5.00..5.00 rows=200 width=4) (actual time=0.923..0.923 rows=200 loops=1)

Buckets: 1024 Batches: 1 Memory Usage: 16kB

-> HashAggregate (cost=3.00..5.00 rows=200 width=4) (actual time=0.606..0.759 rows=200 loops=1)

Group Key: "*VALUES*".column1

-> Values Scan on "*VALUES*" (cost=0.00..2.50 rows=200 width=4) (actual time=0.003..0.330 rows=200 loops=1)

Planning time: 1.094 ms

Execution time: 4086.333 ms

-- '...'::hstore ? id

SELECT ('''' || string_agg(id::text || '=>NULL', ',') || '''::hstore') AS hstore_clause

FROM (SELECT id FROM ids GROUP BY id ORDER BY id LIMIT 200) AS s \gset

EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM ids WHERE :hstore_clause ? id::text;

Planning time: 0.206 ms

Execution time: 5032.794 ms

-- '...'::jsonb ? id

SELECT ('''{' || string_agg('"' || id::text || '": null', ',') || '}''::jsonb') AS jsonb_clause

FROM (SELECT id FROM ids GROUP BY id ORDER BY id LIMIT 200) AS s \gset

EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM ids WHERE :jsonb_clause ? id::text;

Planning time: 0.114 ms

Execution time: 9277.307 ms

IN-VALUES clause has the bestest perfomance. So I have some questions:

- May be exist better solution?

- Does PostgreSQL have support of hashset structure? Extension (I don't found)?

- IN-VALUES clause adds new node to plan. Has additional node big overhead? How about filter by two or more IN-VALUES clause?

Thanks.

В списке pgsql-general по дате отправления:

Предыдущее

От: Greg Atkins
Дата: 23 июля 2017 г., 12:59:16
Сообщение: Re: [GENERAL] pg_dump not dropping event trigger

Следующее

От: Tom Lane
Дата: 23 июля 2017 г., 21:48:49
Сообщение: Re: [GENERAL] pg_dump not dropping event trigger

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

[GENERAL] Perfomance of IN-clause with many elements and possible solutions

Предыдущее

Следующее