Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

From: Dmitry Lazurkin
Subject: Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions
Msg-id: 3059a9df-dcf7-8ea0-a452-9d4db4783b67@gmail.com
In reply to: Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions ("David G. Johnston" <david.g.johnston@gmail.com>)
List: pgsql-general

On 25.07.2017 01:25, David G. Johnston wrote:
> On Mon, Jul 24, 2017 at 3:22 PM, Dmitry Lazurkin <dilaz03@gmail.com> wrote:
>> ALTER TABLE ids ALTER COLUMN id SET NOT NULL;
>> EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM ids WHERE id IN :values_clause;
>>
>>  Aggregate  (cost=245006.46..245006.47 rows=1 width=8) (actual time=3824.095..3824.095 rows=1 loops=1)
>>    Buffers: shared hit=44248
>>    ->  Hash Join  (cost=7.50..235006.42 rows=4000019 width=0) (actual time=1.108..3327.112 rows=3998646 loops=1)
>>    ...
>
> You haven't constrained the outer relation (i.e., :values_clause) to be non-null which is what I believe is required for the semi-join algorithm to be considered.
>
> David J.

CREATE TABLE second_ids (i bigint);
INSERT INTO second_ids :values_clause;

EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM ids WHERE id IN (select i from second_ids);

 Aggregate  (cost=225004.36..225004.37 rows=1 width=8) (actual time=3826.641..3826.641 rows=1 loops=1)
   Buffers: shared hit=44249
   ->  Hash Semi Join  (cost=5.50..215004.32 rows=4000019 width=0) (actual time=0.352..3338.601 rows=3998646 loops=1)
         Hash Cond: (ids.id = second_ids.i)
         Buffers: shared hit=44249
         ->  Seq Scan on ids  (cost=0.00..144248.48 rows=10000048 width=8) (actual time=0.040..1069.006 rows=10000000 loops=1)
               Buffers: shared hit=44248
         ->  Hash  (cost=3.00..3.00 rows=200 width=8) (actual time=0.288..0.288 rows=200 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 16kB
               Buffers: shared hit=1
               ->  Seq Scan on second_ids  (cost=0.00..3.00 rows=200 width=8) (actual time=0.024..0.115 rows=200 loops=1)
                     Buffers: shared hit=1
 Planning time: 0.413 ms
 Execution time: 3826.752 ms

The planner chooses a Hash Semi Join here even though second_ids.i has no NOT NULL constraint.
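
One possible reading of this result, offered as a sketch and not something stated in the thread: under SQL's three-valued logic, a NULL on the right-hand side of IN can only turn a non-match from false into NULL, and WHERE discards both, so the set of rows returned is the same with or without a NOT NULL constraint. NOT IN is the case where a NULL changes which rows come back. The literals below are hypothetical and are not the thread's data.

```sql
-- Illustrative literals only; any PostgreSQL session can run these.

-- A match exists, so the NULL on the right-hand side does not matter.
SELECT 1 IN (VALUES (1), (NULL::bigint));

-- No match, but a NULL is present: the result is NULL instead of false.
-- In a WHERE clause the row is rejected either way.
SELECT 2 IN (VALUES (1), (NULL::bigint));

-- NOT IN with a NULL present: the result is NULL, so the row is rejected,
-- whereas without the NULL it would be accepted.
SELECT 2 NOT IN (VALUES (1), (NULL::bigint));

-- Same idea as a row filter: only id = 1 survives the IN test.
SELECT count(*)
FROM (VALUES (1::bigint), (2::bigint)) AS t(id)
WHERE id IN (VALUES (1), (NULL::bigint));
```

To check the constraint's effect on the plan directly, one could also run ALTER TABLE second_ids ALTER COLUMN i SET NOT NULL and repeat the EXPLAIN above; no result is claimed here.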
