Re: Performance Anomaly with "col in (A,B)" vs. "col = A OR col = B" ver. 9.0.3

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Performance Anomaly with "col in (A,B)" vs. "col = A OR col = B" ver. 9.0.3
Дата
Msg-id 9594.1317056871@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Performance Anomaly with "col in (A,B)" vs. "col = A OR col = B" ver. 9.0.3  (Timothy Garnett <tgarnett@panjiva.com>)
Ответы Re: Performance Anomaly with "col in (A,B)" vs. "col = A OR col = B" ver. 9.0.3  (Craig James <craig_james@emolecules.com>)
Re: Performance Anomaly with "col in (A,B)" vs. "col = A OR col = B" ver. 9.0.3  (Timothy Garnett <tgarnett@panjiva.com>)
Список pgsql-performance
Timothy Garnett <tgarnett@panjiva.com> writes:
> -- Problematic Query
> dev=> explain analyze SELECT "exp_detls".id FROM "exp_detls" WHERE
> ("exp_detls"."hts_code_id" IN (12,654)) LIMIT 1;
>  Limit  (cost=0.00..158.18 rows=1 width=4) (actual time=9661.363..9661.363
> rows=0 loops=1)
>    ->  Seq Scan on exp_detls  (cost=0.00..1336181.90 rows=8447 width=4)
> (actual time=9661.360..9661.360 rows=0 loops=1)
>          Filter: (hts_code_id = ANY ('{12,654}'::integer[]))
>  Total runtime: 9661.398 ms
> (4 rows)

> -- Using OR =, much faster, though more complicated plan then below
> dev=> explain analyze SELECT "exp_detls".id FROM "exp_detls" WHERE
> ("exp_detls"."hts_code_id" = 12 OR "exp_detls"."hts_code_id" = 654) LIMIT 1;
>  Limit  (cost=162.59..166.29 rows=1 width=4) (actual time=0.029..0.029
> rows=0 loops=1)
>    ->  Bitmap Heap Scan on exp_detls  (cost=162.59..31188.14 rows=8370
> width=4) (actual time=0.028..0.028 rows=0 loops=1)
>          Recheck Cond: ((hts_code_id = 12) OR (hts_code_id = 654))
>          ->  BitmapOr  (cost=162.59..162.59 rows=8370 width=0) (actual
> time=0.027..0.027 rows=0 loops=1)
>                ->  Bitmap Index Scan on
> index_exp_detls_on_hts_code_id_and_data_month  (cost=0.00..79.20 rows=4185
> width=0) (actual time=0.017..0.017 rows=0 loops=1)
>                      Index Cond: (hts_code_id = 12)
>                ->  Bitmap Index Scan on
> index_exp_detls_on_hts_code_id_and_data_month  (cost=0.00..79.20 rows=4185
> width=0) (actual time=0.007..0.007 rows=0 loops=1)
>                      Index Cond: (hts_code_id = 654)
>  Total runtime: 0.051 ms
> (9 rows)

Well, the reason it likes the first plan is that that has a smaller
estimated cost ;-).  Basically this is a startup-time-versus-total-time
issue: the cost of the seqscan+limit is estimated to be about 1/8447'th
of the time to read the whole table, since it's estimating 8447
candidate matches and assuming that those are uniformly distributed in
the table.  Meanwhile, the bitmap scan has a significant startup cost
because the entire indexscan is completed before we start to do any
fetching from the heap.  The overestimate of the number of matching
rows contributes directly to overestimating the cost of the indexscan,
too.  It ends up being a near thing --- 158 vs 166 cost units --- but
on the basis of these estimates the planner did the right thing.

So, what you need to do to make this better is to get it to have a
better idea of how many rows match the query condition; the overestimate
is both making the expensive plan look cheap, and making the cheaper
plan look expensive.  Cranking up the statistics target for the
hts_code_id column (and re-ANALYZEing) ought to fix it.  If all your
tables are this large you might want to just increase
default_statistics_target across the board.

            regards, tom lane

В списке pgsql-performance по дате отправления:

Предыдущее
От: Biswa
Дата:
Сообщение: Re: How to find record having % as part of data.
Следующее
От: Craig James
Дата:
Сообщение: Re: Performance Anomaly with "col in (A,B)" vs. "col = A OR col = B" ver. 9.0.3