Seq scan on big table, episode 2

Поиск

Список

Период

Сортировка

От	Vincenzo Melandri
Тема	Seq scan on big table, episode 2
Дата	31 октября 2012 г. 13:56:17
Msg-id	CAHSd9GfH6sJ4U0632QbRj6T6TaO59OXceaLeMADt6oFPhdr-aw@mail.gmail.com обсуждение исходный текст
Ответы	Re: Seq scan on big table, episode 2 (Vincenzo Melandri <vmelandri@imolinfo.it>) Re: Seq scan on big table, episode 2 (Shaun Thomas <sthomas@optionshouse.com>)
Список	pgsql-performance

Дерево обсуждения

Hi all :)

I'm here again.

This time I'll provide more details (explain analyze, data-type, and indexes), hope it will be enough :)

The query that is performing a plan that i do not understand is the following:

--------------------

select [some fields from all 3 tables]

from

DATA_SEQUENCES

join SUBSCRIPTION on

SUBSCRIPTION.key1 = DATA_SEQUENCES.key1 AND

SUBSCRIPTION.key2 = DATA_SEQUENCES.key2

join people on

people.key1 = SUBSCRIPTION.people_key1 AND

people.key2 = SUBSCRIPTION.people_key2

WHERE DATA_SEQUENCES.import_id = 1351674661

--------------------

This is the explain analyze:

--------------------

Merge Join (cost=2902927.01..2973307.79 rows=790371 width=240) (actual time=40525.439..40525.439 rows=0 loops=1)

Merge Cond: ((people.key1 = subscription.people_key1) AND (people.key2 = subscription.people_key2))

-> Sort (cost=2885618.73..2904468.49 rows=7539905 width=240) (actual time=40525.268..40525.268 rows=1 loops=1)

Sort Key: people.key1, people.key2

Sort Method: external merge Disk: 466528kB

-> Seq Scan on people (cost=0.00..323429.05 rows=7539905 width=240) (actual time=0.029..5193.057 rows=7539469 loops=1)

-> Sort (cost=17308.28..17318.76 rows=4193 width=16) (actual time=0.167..0.167 rows=0 loops=1)

Sort Key: subscription.people_key1, subscription.people_key2

Sort Method: quicksort Memory: 25kB

-> Nested Loop (cost=0.00..17055.99 rows=4193 width=16) (actual time=0.154..0.154 rows=0 loops=1)

-> Seq Scan on data_sequences (cost=0.00..150.15 rows=39 width=16) (actual time=0.154..0.154 rows=0 loops=1)

Filter: (import_id = 1351674661)

-> Index Scan using xpksubscription on subscription (cost=0.00..431.86 rows=108 width=16) (never executed)

Index Cond: ((subscription.key1 = data_sequences.key1) AND (subscription.key2 = data_sequences.key2))

Total runtime: 40600.815 ms

--------------------

All the key, key2, and relative foreign keys are int4. Import_id is a bigint.

I'm not reporting the full create table script 'cause people and subscription both have lots of fields. I know this can be wrong (lots of field on big table), but this is an environment born something like 20 years ago and not intended from the start for such a big data volume.

I have the following indexes:

on People:

CREATE UNIQUE INDEX people_pkey ON people USING btree (key1, key2)

CREATE INDEX people_pkey_hash_loc ON people USING hash (key1);

CREATE INDEX people_pkey_hash_id ON people USING hash (key2);

on Subscription:

CREATE UNIQUE INDEX subscription_pkey ON subscription USING btree (key1, key2)

CREATE INDEX subscription_fk_people ON subscription USING btree (people_key1, people_key2)

on Data_sequences:

create index data_sequences_key on data_sequences USING btree (key1, key2);

create index data_sequences_id on data_sequences USING btree (import_id);

What i don't understand is WHY the seq scan on people, and how can I cast the import_id to make it use the index on data_sequences (another useless seq scan).

Mind that when I run this explain analyze there were no records on data_sequences. So all the time (40 seconds!) is for the useless seq scan on people. Both people and subscription have lots of records (10-20.000.000 range).

I'm running 8.4 (haven't tested it on 9.2.1 yet, but we planned to upgrade ASAP cause we have other queries which will benefit from the index-only-scan new feature).

Thank you in advance,

--
Vincenzo.

В списке pgsql-performance по дате отправления:

Предыдущее

От: Samuel Gendler
Дата: 31 октября 2012 г., 13:34:34
Сообщение: Re: Invalid memory alloc request size

Следующее

От: AndyG
Дата: 31 октября 2012 г., 14:28:17
Сообщение: Re: Slow query, where am I going wrong?

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Seq scan on big table, episode 2

Предыдущее

Следующее