Re: Have I b0rked something? Slow comparisons on "where x in (...)"
От | Listmail |
---|---|
Тема | Re: Have I b0rked something? Slow comparisons on "where x in (...)" |
Дата | |
Msg-id | op.trrgen13zcizji@apollo13 обсуждение исходный текст |
Ответ на | Re: Have I b0rked something? Slow comparisons on "where x in (...)" (Richard Huxton <dev@archonet.com>) |
Список | pgsql-general |
> Try creating a temporary table, populating with the list and joining > against it. That's probably your best bet for a long list of target > values. Check : forum_bench=> CREATE TABLE test (value INTEGER NOT NULL); CREATE TABLE forum_bench=> INSERT INTO test SELECT * FROM generate_series( 1, 1000000 ); INSERT 0 1000000 forum_bench=> ANALYZE test; forum_bench=> EXPLAIN ANALYZE SELECT * FROM test; QUERY PLAN --------------------------------------------------------------------------------------------------------------- Seq Scan on test (cost=0.00..14405.24 rows=999924 width=4) (actual time=0.030..349.699 rows=1000000 loops=1) Total runtime: 542.914 ms (2 lignes) OK : 542 ms to grab the data. IN() : EXPLAIN ANALYZE SELECT * FROM test WHERE value IN ( 1000 values from 0 to 999000 in steps of 1000 ): Seq Scan on test (cost=0.00..1264310.24 rows=1000 width=4) (actual time=17.649..17977.085 rows=999 loops=1) Filter: (value = ANY ('{0,1000..........99000}'::integer[])) Total runtime: 17978.061 ms Ouch. forum_bench=> EXPLAIN ANALYZE SELECT * FROM test WHERE value IN (VALUES (0),(1000),(2000),....................(998000),(999000)); QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=19.50..18176.45 rows=200 width=4) (actual time=2.823..736.960 rows=999 loops=1) Hash Cond: (test.value = "*VALUES*".column1) -> Seq Scan on test (cost=0.00..14405.24 rows=999924 width=4) (actual time=0.032..335.680 rows=1000000 loops=1) -> Hash (cost=17.00..17.00 rows=200 width=4) (actual time=2.108..2.108 rows=1000 loops=1) -> HashAggregate (cost=15.00..17.00 rows=200 width=4) (actual time=1.165..1.542 rows=1000 loops=1) -> Values Scan on "*VALUES*" (cost=0.00..12.50 rows=1000 width=4) (actual time=0.004..0.478 rows=1000 loops=1) Total runtime: 737.362 ms Removing the 542 ms to read the table, we see checking if the values are in the hash is really rally fast. So, obvious truth : hash is faster than dumb compare. Much faster. Now, postgres should do this on its own, I think. PS : if the 1000 values are all the same (1000 times 1), IN() doesn't detect it, so the runtime does not change. Hash join doesn't care, so the runtime doesn't change either.
В списке pgsql-general по дате отправления: