demystifying nested loop vs. merge join query plan choice

Поиск

Список

Период

Сортировка

От	Sandeep Gupta
Тема	demystifying nested loop vs. merge join query plan choice
Дата	31 июля 2013 г. 22:09:18
Msg-id	CAAywg7s0YhS+VmTvTv4QgmRv_wubbLRdYLNAEUk-QO=7m2jMvA@mail.gmail.com обсуждение исходный текст
Ответы	Re: demystifying nested loop vs. merge join query plan choice (Pavel Stehule <pavel.stehule@gmail.com>)
Список	pgsql-general

Дерево обсуждения

I have two postgres instances each with a database of same schema. The dataset in both is ''same'' but for randomness i.e. both contain two tables pc(did) and tc(pid, did) that have almost

same number of rows and have been generate from same distribution.

However the query plan for the join turns out to be completely different: on one join takes 2.3 secs while on the other it takes 7 secs.

Here are the statistics:

for database 1:

size of tc table: 49987585

size of pc table: 499616

join plan:

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=1534125.08..1534125.09 rows=1 width=0) (actual time=8473.296..8473.296 rows=1 loops=1)

-> Merge Join (cost=2.48..1514765.90 rows=7743672 width=0) (actual time=0.084..8409.065 rows=998038 loops=1)

Merge Cond: (pc.did = tc.did)

-> Index Only Scan using pc_did_idx on pc (cost=0.00..12987.04 rows=499616 width=4) (actual time=0.016..58.202 rows=499616 loops=1)

Heap Fetches: 0

-> Index Only Scan using tc_did_idx on tc (cost=0.00..1298125.32 rows=49987616 width=4) (actual time=0.014..5141.809 rows=49997291 loops=1)

Heap Fetches: 0

Total runtime: 8473.337 ms

Query Running time: 5135

for database 2:

size of tc table: 50012415

size of pc table: 500384

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=35279895.52..35279895.53 rows=1 width=0) (actual time=2501.970..2501.970 rows=1 loops=1)

-> Nested Loop (cost=0.00..35276697.82 rows=1279080 width=0) (actual time=0.038..2418.766 rows=1000834 loops=1)

-> Index Only Scan using pc_did_idx on pc (cost=0.00..15224.56 rows=500384 width=4) (actual time=0.017..109.080 rows=500384 loops=1)

Heap Fetches: 500384

-> Index Only Scan using tc_did_idx on tc (cost=0.00..70.44 rows=3 width=4) (actual time=0.004..0.004 rows=2 loops=500384)

Index Cond: (did = pc.did)

Heap Fetches: 1000834

Total runtime: 2502.017 ms

Query running time: 2090.388 ms

My question is why is the query plan so different for two datasets that are really exactly the same. And how can i force the plan to be nested index scan on

database 1 .

-Sandeep

В списке pgsql-general по дате отправления:

Предыдущее

От: KimmoHintikka
Дата: 31 июля 2013 г., 21:38:20
Сообщение: Re: Roadmap for Postgres on AIX

Следующее

От: Pavel Stehule
Дата: 31 июля 2013 г., 22:14:49
Сообщение: Re: demystifying nested loop vs. merge join query plan choice

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

demystifying nested loop vs. merge join query plan choice

Предыдущее

Следующее