Re: demystifying nested loop vs. merge join query plan choice

Поиск

Список

Период

Сортировка

От	Sandeep Gupta
Тема	Re: demystifying nested loop vs. merge join query plan choice
Дата	31 июля 2013 г. 22:34:23
Msg-id	CAAywg7v_57K5UYxcrsTSpTWscQA7O1R3Sp2znzHg_nVsMd7i=g@mail.gmail.com обсуждение исходный текст
Ответ на	Re: demystifying nested loop vs. merge join query plan choice (Sandeep Gupta <gupta.sandeep@gmail.com>)
Ответы	Re: demystifying nested loop vs. merge join query plan choice (Tom Lane <tgl@sss.pgh.pa.us>)
Список	pgsql-general

Дерево обсуждения

details regarding buffer usage:

for database 1:

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=1534125.08..1534125.09 rows=1 width=0) (actual time=9149.366..9149.366 rows=1 loops=1)

Buffers: shared hit=137991

-> Merge Join (cost=2.48..1514765.90 rows=7743672 width=0) (actual time=0.091..9075.008 rows=998038 loops=1)

Merge Cond: (pc.did = tc.did)

Buffers: shared hit=137991

-> Index Only Scan using pc_did_idx on pc (cost=0.00..12987.04 rows=499616 width=4) (actual time=0.017..58.237 rows=499616 loops=1)

Heap Fetches: 0

Buffers: shared hit=1369

-> Index Only Scan using tc_did_idx on tc (cost=0.00..1298125.32 rows=49987616 width=4) (actual time=0.015..5301.727 rows=49997291 loops=1)

Heap Fetches: 0

Buffers: shared hit=136622

Total runtime: 9149.414 ms

(12 rows)

for database 2:

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=35279895.52..35279895.53 rows=1 width=0) (actual time=2386.865..2386.865 rows=1 loops=1)

Buffers: shared hit=2235978 read=208446

-> Nested Loop (cost=0.00..35276697.82 rows=1279080 width=0) (actual time=0.049..2292.338 rows=1000834 loops=1)

Buffers: shared hit=2235978 read=208446

-> Index Only Scan using pc_did_idx on pc (cost=0.00..15224.56 rows=500384 width=4) (actual time=0.016..108.407 rows=500384 loops=1)

Heap Fetches: 500384

Buffers: shared hit=6 read=3579

-> Index Only Scan using tc_did_idx on tc (cost=0.00..70.44 rows=3 width=4) (actual time=0.003..0.004 rows=2 loops=500384)

Index Cond: (did = pc.did)

Heap Fetches: 1000834

Buffers: shared hit=2235972 read=204867

Total runtime: 2386.914 ms

(12 rows)

On Wed, Jul 31, 2013 at 3:16 PM, Sandeep Gupta <gupta.sandeep@gmail.com> wrote:

Hi Pavel,

Yes. The postgresql.conf is exactly the same. The have the same index and clustering and are on the same compute node as well but running on different ports.

-Sandeep

On Wed, Jul 31, 2013 at 3:14 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hello

do you have same configuration?

Regards

Pavel

2013/7/31 Sandeep Gupta <gupta.sandeep@gmail.com>:
> I have two postgres instances each with a database of same schema. The
> dataset in both is ''same'' but for randomness i.e. both contain two tables
> pc(did) and tc(pid, did) that have almost
> same number of rows and have been generate from same distribution.
>
> However the query plan for the join turns out to be completely different: on
> one join takes 2.3 secs while on the other it takes 7 secs.
>
>
> Here are the statistics:
>
> for database 1:
> size of tc table: 49987585
> size of pc table: 499616
>
> join plan:
>
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------------------
> Aggregate (cost=1534125.08..1534125.09 rows=1 width=0) (actual
> time=8473.296..8473.296 rows=1 loops=1)
> -> Merge Join (cost=2.48..1514765.90 rows=7743672 width=0) (actual
> time=0.084..8409.065 rows=998038 loops=1)
> Merge Cond: (pc.did = tc.did)
> -> Index Only Scan using pc_did_idx on pc (cost=0.00..12987.04
> rows=499616 width=4) (actual time=0.016..58.202 rows=499616 loops=1)
> Heap Fetches: 0
> -> Index Only Scan using tc_did_idx on tc (cost=0.00..1298125.32
> rows=49987616 width=4) (actual time=0.014..5141.809 rows=49997291 loops=1)
> Heap Fetches: 0
> Total runtime: 8473.337 ms
> '
>
> Query Running time: 5135
>
>
> for database 2:
> size of tc table: 50012415
> size of pc table: 500384
>
> QUERY
> PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------------
> Aggregate (cost=35279895.52..35279895.53 rows=1 width=0) (actual
> time=2501.970..2501.970 rows=1 loops=1)
> -> Nested Loop (cost=0.00..35276697.82 rows=1279080 width=0) (actual
> time=0.038..2418.766 rows=1000834 loops=1)
> -> Index Only Scan using pc_did_idx on pc (cost=0.00..15224.56
> rows=500384 width=4) (actual time=0.017..109.080 rows=500384 loops=1)
> Heap Fetches: 500384
> -> Index Only Scan using tc_did_idx on tc (cost=0.00..70.44
> rows=3 width=4) (actual time=0.004..0.004 rows=2 loops=500384)
> Index Cond: (did = pc.did)
> Heap Fetches: 1000834
> Total runtime: 2502.017 ms
>
> Query running time: 2090.388 ms
>
> My question is why is the query plan so different for two datasets that are
> really exactly the same. And how can i force the plan to be nested index
> scan on
> database 1 .
>
>
> -Sandeep
>

В списке pgsql-general по дате отправления:

Предыдущее

От: Sandeep Gupta
Дата: 31 июля 2013 г., 22:16:25
Сообщение: Re: demystifying nested loop vs. merge join query plan choice

Следующее

От: Tom Lane
Дата: 31 июля 2013 г., 22:55:51
Сообщение: Re: demystifying nested loop vs. merge join query plan choice

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: demystifying nested loop vs. merge join query plan choice

Предыдущее

Следующее