Why is PostgreSQL not using my index?

Поиск
Список
Период
Сортировка
От Christian Roche
Тема Why is PostgreSQL not using my index?
Дата
Msg-id dd25f397d7424a50b4c548a435d6847d@LN1-EX2013-01.workshare.com
обсуждение исходный текст
Ответы Re: Why is PostgreSQL not using my index?  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Why is PostgreSQL not using my index?  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Re: Why is PostgreSQL not using my index?  (Merlin Moncure <mmoncure@gmail.com>)
Список pgsql-performance

Hi guys,

 

Can I take a jab at the celebrated “why is Postgres not using my index” riddle?

 

I’m using PostgreSQL 9.3.3 on an Amazon RDS “db.r3.xlarge” 64-bit instance. I have two tables, one with about 30M rows and two indexes (in fact a monthly partition):

 

CREATE TABLE staging.mixpanel_events_201409 (

  date_day          date NOT NULL,

  event_id          int NOT NULL REFERENCES mixpanel_event_list,

  prop              hstore

);

 

CREATE INDEX mixpanel_idx_date_201409

  ON mixpanel_events_201409

  USING btree

  (date_day);

 

CREATE INDEX mixpanel_idx_event_201409

  ON mixpanel_events_201409

  USING btree

  (event_id);

 

 

And a lookup table with about 600 rows:

 

CREATE TABLE staging.mixpanel_event_list (

  id                serial PRIMARY KEY,

  name              text UNIQUE,

  source            event_source NULL

);

 

 

Now when I select a subset of the possible event IDs in the big table, PG uses the appropriate index:

 

select *

  from mixpanel_events_201409

 where event_id in (3, 4, 5, 6, 7, 8, 9, 10, 11, 373, 375, 376, 318);

 

 

Bitmap Heap Scan on mixpanel_events_201409  (cost=7663.36..1102862.70 rows=410022 width=949)

  Recheck Cond: (event_id = ANY ('{3,4,5,6,7,8,9,10,11,373,375,376,318}'::integer[]))

  ->  Bitmap Index Scan on mixpanel_idx_event_201409  (cost=0.00..7560.85 rows=410022 width=0)

        Index Cond: (event_id = ANY ('{3,4,5,6,7,8,9,10,11,373,375,376,318}'::integer[]))

 

 

But when I try to join the lookup table and select from it, the index is dismissed for a full table scan with a catastrophic effect on performance:

 

select *

from mixpanel_events_201409 mp

  inner join mixpanel_event_list ev on ( ev.id = mp.event_id )

where ev.id in (3, 4, 5, 6, 7, 8, 9, 10, 11, 373, 375, 376, 318);

 

Hash Join  (cost=20.73..2892183.32 rows=487288 width=1000)

  Hash Cond: (mp.event_id = ev.id)

  ->  Seq Scan on mixpanel_events_201409 mp  (cost=0.00..2809276.70 rows=20803470 width=949)

  ->  Hash  (cost=20.57..20.57 rows=13 width=51)

        ->  Seq Scan on mixpanel_event_list ev  (cost=0.00..20.57 rows=13 width=51)

              Filter: (id = ANY ('{3,4,5,6,7,8,9,10,11,373,375,376,318}'::integer[]))

 

 

Both tables have been vacuum analyzed.

 

What gives?

 

Thanks a lot for your help,

Chris

 

This email is from Workshare Limited. The information contained in and accompanying this communication may be confidential, subject to legal privilege, or otherwise protected from disclosure, and is intended solely for the use of the intended recipient(s). If you are not the intended recipient of this communication, please delete and destroy all copies in your possession and note that any review or dissemination of, or the taking of any action in reliance on, this communication is expressly prohibited. Please contact the sender if you believe you have received this email in error. Workshare Limited is a limited liability company registered in England and Wales (registered number 3559880), its registered office is at 20 Fashion Street, London, E1 6PX for further information, please refer to http://www.workshare.com.

В списке pgsql-performance по дате отправления:

Предыдущее
От: Marc Mamin
Дата:
Сообщение: Re: Query performance
Следующее
От: Jeff Janes
Дата:
Сообщение: Re: How to tell ANALYZE to collect statistics from the whole table?