An "obvious" index not being used

Поиск
Список
Период
Сортировка
От Daniele Varrazzo
Тема An "obvious" index not being used
Дата
Msg-id 4859B13F.3070103@develer.com
обсуждение исходный текст
Ответы Re: An "obvious" index not being used  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: An "obvious" index not being used  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Список pgsql-performance
Hello,

I am experiencing a query for which an useful index is not being used by
PostgreSQL. The query is in the form:

     select count(*) from foo
     where foo.account_id in (
         select id from accounts where system = 'abc');

and the size of the tables it works on is:

   - 270 records in "accounts" 22 of which match the condition 'abc';
   - 5.3M records in "foo", 92K of which match the query condition.

There is an index in the field "foo.account_id" but is not used. The resulting
query plan is:

  Aggregate  (cost=300940.70..300940.71 rows=1 width=0) (actual
time=13412.088..13412.089 rows=1 loops=1)
    ->  Hash IN Join  (cost=11.97..299858.32 rows=432953 width=0) (actual
time=0.678..13307.074 rows=92790 loops=1)
          Hash Cond: (foo.account_id = accounts.id)
          ->  Seq Scan on foo  (cost=0.00..275591.14 rows=5313514 width=4)
(actual time=0.014..7163.538 rows=5313514 loops=1)
          ->  Hash  (cost=11.70..11.70 rows=22 width=4) (actual
time=0.199..0.199 rows=22 loops=1)
                ->  Bitmap Heap Scan on accounts  (cost=1.42..11.70 rows=22
width=4) (actual time=0.092..0.160 rows=22 loops=1)
                      Recheck Cond: (("system")::text = 'abc'::text)
                      ->  Bitmap Index Scan on iaccounts_x1
(cost=0.00..1.42 rows=22 width=0) (actual time=0.077..0.077 rows=22
loops=1)
                            Index Cond: (("system")::text = 'abc'::text)
  Total runtime: 13412.226 ms


There is a seqscan on the large table. If seqscans are disabled, the plan
becomes the more acceptable:

  Aggregate  (cost=2471979.99..2471980.00 rows=1 width=0) (actual
time=630.977..630.978 rows=1 loops=1)
    ->  Nested Loop  (cost=1258.12..2470897.61 rows=432953 width=0) (actual
time=0.164..526.174 rows=92790 loops=1)
          ->  HashAggregate  (cost=12.75..12.97 rows=22 width=4) (actual
time=0.131..0.169 rows=22 loops=1)
                ->  Bitmap Heap Scan on accounts  (cost=2.42..12.70 rows=22
width=4) (actual time=0.047..0.091 rows=22 loops=1)
                      Recheck Cond: (("system")::text = 'abc'::text)
                      ->  Bitmap Index Scan on iaccounts_x1
(cost=0.00..2.42 rows=22 width=0) (actual time=0.036..0.036 rows=22
loops=1)
                            Index Cond: (("system")::text = 'abc'::text)
          ->  Bitmap Heap Scan on foo  (cost=1245.37..111275.14 rows=83024
width=4) (actual time=3.086..14.391 rows=4218 loops=22)
                Recheck Cond: (foo.account_id = accounts.id)
                ->  Bitmap Index Scan on ifoo_x1  (cost=0.00..1224.61
rows=83024 width=0) (actual time=2.962..2.962 rows=4218 loops=22)
                      Index Cond: (foo.account_id = accounts.id)
  Total runtime: 631.121 ms

where the index "ifoo_x1" is used.


A similar query plan can be also obtained performing first the internal query
and hardcoding the result in a new query:

     explain analyze select count(*) from foo
     where account_id in
(70,33,190,21,191,223,203,202,148,246,85,281,280,319,234,67,245,310,318,279,320,9);


I have tried to:

   - rewrite the query with a JOIN instead of an IN (no change in the plan),
   - rewrite the query using EXISTS (it gets worse),
   - raise the statistics for the foo.account_id field to 100 and to 1000,
   - decrease the random_page_cost down to 1,
   - vacuum-analyze the tables at each change,

none of which has changed the situation.

The system is an Ubuntu Hardy 64 bits running PG 8.3. The issue has been
confirmed on Mac OS 1.5/PG 8.3. Although I made fewer tests on a PG 8.2 we
recently switched from, I think the issue presents on that version too.

This is the first time I see the query planner failing a plan rather obvious:
is there any other setting to tweak to force it to do good? (but a sensible
tweaking: the random_page_cost to 1 was just a try to have the index used,
nothing to be really put in production)

If you want to try the issue, an anonimized dataset is available on
http://piro.develer.com/test.sql.bz2 . The file size is 46MB (1.5GB
uncompressed). Chris Mair, who tested it on Mac OS, also noticed that PG
behaved correctly with the freshly imported data: as soon as he VACUUMed the
database he started experiencing the described issue.

Thank you very much.

--
Daniele Varrazzo - Develer S.r.l.
http://www.develer.com

В списке pgsql-performance по дате отправления:

Предыдущее
От: Antonio Perez
Дата:
Сообщение: WAL DUDAS
Следующее
От: Tom Lane
Дата:
Сообщение: Re: An "obvious" index not being used