Re: Why is PostgreSQL not using my index?

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Why is PostgreSQL not using my index?
Дата
Msg-id 14049.1422304305@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Why is PostgreSQL not using my index?  ("Christian Roche" <Christian.Roche@workshare.com>)
Список pgsql-performance
"Christian Roche" <Christian.Roche@workshare.com> writes:
> Now when I select a subset of the possible event IDs in the big table, PG uses the appropriate index:

> select *
>   from mixpanel_events_201409
>  where event_id in (3, 4, 5, 6, 7, 8, 9, 10, 11, 373, 375, 376, 318);

> Bitmap Heap Scan on mixpanel_events_201409  (cost=7663.36..1102862.70 rows=410022 width=949)
>   Recheck Cond: (event_id = ANY ('{3,4,5,6,7,8,9,10,11,373,375,376,318}'::integer[]))
>   ->  Bitmap Index Scan on mixpanel_idx_event_201409  (cost=0.00..7560.85 rows=410022 width=0)
>         Index Cond: (event_id = ANY ('{3,4,5,6,7,8,9,10,11,373,375,376,318}'::integer[]))

> But when I try to join the lookup table and select from it, the index is dismissed for a full table scan with a
catastrophiceffect on performance: 

> select *
> from mixpanel_events_201409 mp
>   inner join mixpanel_event_list ev on ( ev.id = mp.event_id )
> where ev.id in (3, 4, 5, 6, 7, 8, 9, 10, 11, 373, 375, 376, 318);

> Hash Join  (cost=20.73..2892183.32 rows=487288 width=1000)
>   Hash Cond: (mp.event_id = ev.id)
>   ->  Seq Scan on mixpanel_events_201409 mp  (cost=0.00..2809276.70 rows=20803470 width=949)
>   ->  Hash  (cost=20.57..20.57 rows=13 width=51)
>         ->  Seq Scan on mixpanel_event_list ev  (cost=0.00..20.57 rows=13 width=51)
>               Filter: (id = ANY ('{3,4,5,6,7,8,9,10,11,373,375,376,318}'::integer[]))

Given the estimated costs and rowcounts here, I'm far from convinced that
the planner made the wrong decision.  You seem to be expecting that it
will go for a nestloop plan that would require 13 separate indexscans of
the large table.  Those are unlikely to be only 1/13th the cost of the
unified bitmap scan with =ANY; there's going to be overhead from repeated
work.  If there's say a factor of 2 penalty for the repeated scans, that'd
be plenty enough to push the cost of that plan to be more than the
hashjoin.

If, indeed, the hashjoin is slower, that may suggest that you need to dial
down random_page_cost to better represent your environment.  But you
should be wary of making such an adjustment on the basis of a single
example; you might find that it makes other plan choices worse.

            regards, tom lane


В списке pgsql-performance по дате отправления:

Предыдущее
От: Jeff Janes
Дата:
Сообщение: Re: How to tell ANALYZE to collect statistics from the whole table?
Следующее
От: Tomas Vondra
Дата:
Сообщение: Re: Why is PostgreSQL not using my index?