Re: Have I b0rked something? Slow comparisons on "where x in (...)"

Поиск
Список
Период
Сортировка
От Listmail
Тема Re: Have I b0rked something? Slow comparisons on "where x in (...)"
Дата
Msg-id op.trrgen13zcizji@apollo13
обсуждение исходный текст
Ответ на Re: Have I b0rked something? Slow comparisons on "where x in (...)"  (Richard Huxton <dev@archonet.com>)
Список pgsql-general
> Try creating a temporary table, populating with the list and joining
> against it. That's probably your best bet for a long list of target
> values.

    Check :

forum_bench=> CREATE TABLE test (value INTEGER NOT NULL);
CREATE TABLE
forum_bench=> INSERT INTO test SELECT * FROM generate_series( 1, 1000000 );
INSERT 0 1000000
forum_bench=> ANALYZE test;
forum_bench=> EXPLAIN ANALYZE SELECT * FROM test;
                                                   QUERY PLAN
---------------------------------------------------------------------------------------------------------------
  Seq Scan on test  (cost=0.00..14405.24 rows=999924 width=4) (actual
time=0.030..349.699 rows=1000000 loops=1)
  Total runtime: 542.914 ms
(2 lignes)


    OK : 542 ms to grab the data.
    IN() :

EXPLAIN ANALYZE SELECT * FROM test WHERE value IN ( 1000 values from 0 to
999000 in steps of 1000 ):
  Seq Scan on test  (cost=0.00..1264310.24 rows=1000 width=4) (actual
time=17.649..17977.085 rows=999 loops=1)
    Filter: (value = ANY ('{0,1000..........99000}'::integer[]))
  Total runtime: 17978.061 ms

    Ouch.

forum_bench=> EXPLAIN ANALYZE SELECT * FROM test WHERE value IN (VALUES
(0),(1000),(2000),....................(998000),(999000));
                                                            QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=19.50..18176.45 rows=200 width=4) (actual
time=2.823..736.960 rows=999 loops=1)
    Hash Cond: (test.value = "*VALUES*".column1)
    ->  Seq Scan on test  (cost=0.00..14405.24 rows=999924 width=4) (actual
time=0.032..335.680 rows=1000000 loops=1)
    ->  Hash  (cost=17.00..17.00 rows=200 width=4) (actual
time=2.108..2.108 rows=1000 loops=1)
          ->  HashAggregate  (cost=15.00..17.00 rows=200 width=4) (actual
time=1.165..1.542 rows=1000 loops=1)
                ->  Values Scan on "*VALUES*"  (cost=0.00..12.50 rows=1000
width=4) (actual time=0.004..0.478 rows=1000 loops=1)
  Total runtime: 737.362 ms

    Removing the 542 ms to read the table, we see checking if the values are
in the hash is really rally fast.

    So, obvious truth : hash is faster than dumb compare. Much faster.
    Now, postgres should do this on its own, I think.

    PS : if the 1000 values are all the same (1000 times 1), IN() doesn't
detect it, so the runtime does not change. Hash join doesn't care, so the
runtime doesn't change either.

В списке pgsql-general по дате отправления:

Предыдущее
От: "Islam Hegazy"
Дата:
Сообщение: Re: C functions under windows
Следующее
От: Listmail
Дата:
Сообщение: Re: Have I b0rked something? Slow comparisons on "where x in (...)"