Sort push down through a nested loop, for 9.7

Поиск

Список

Период

Сортировка

От	Jeff Janes
Тема	Sort push down through a nested loop, for 9.7
Дата	9 апреля 2016 г. 23:01:40
Msg-id	CAMkU=1zg3nFP8EJ4PRi5K+NyzCD-QCthXNHoru8U+0zTK9nQbg@mail.gmail.com обсуждение исходный текст
Список	pgsql-hackers

Дерево обсуждения

I was testing to see if the newer changes in 9.6 fixed some planning
issues I've seen in prior versions.  It does not, for the ones of
interest to me, but while looking into it I see what seems to be a
missing optimization (in all versions).

If the first child of a nested loop produces naturally ordered output
(from an index), the planner is smart enough to realize that this
ordering passes up and applies to the entire nested loop node:

explain select * from pgbench_accounts a join pgbench_accounts b using
(bid) order by a.aid;                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------Nested
Loop (cost=0.29..150007137.29 rows=10000000000 width=194)  Join Filter: (a.bid = b.bid)  ->  Index Scan using
pgbench_accounts_pkeyon pgbench_accounts a
 
(cost=0.29..4247.29 rows=100000 width=97)  ->  Materialize  (cost=0.00..3140.00 rows=100000 width=97)        ->  Seq
Scanon pgbench_accounts b  (cost=0.00..2640.00
 
rows=100000 width=97)

But if a naturally ordering index is not available, it is not smart
enough to push the sort down through the nested loop to the first
child:

set enable_hashjoin TO off;

explain select * from pgbench_accounts a join pgbench_accounts b using
(bid) order by a.filler;                                        QUERY PLAN
---------------------------------------------------------------------------------------------Sort
(cost=5707453952.44..5732453952.44rows=10000000000 width=275)  Sort Key: a.filler  ->  Nested Loop
(cost=0.00..150005530.00rows=10000000000 width=275)        Join Filter: (a.bid = b.bid)        ->  Seq Scan on
pgbench_accountsa  (cost=0.00..2640.00
 
rows=100000 width=97)        ->  Materialize  (cost=0.00..3140.00 rows=100000 width=97)              ->  Seq Scan on
pgbench_accountsb  (cost=0.00..2640.00
 
rows=100000 width=97)


Sorting 100000 rows would be a lot faster than sorting 10000000000 rows.

Is this one of the cases where the CPU cycles needed to detect the
opportunity during planning are just not worthwhile, considering how
rarely the opportunity arises?

Cheers,

Jeff

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Andres Freund
Дата: 09 апреля 2016 г., 22:49:37
Сообщение: Re: Move PinBuffer and UnpinBuffer to atomics

Следующее

От: Pavel Stehule
Дата: 09 апреля 2016 г., 23:38:29
Сообщение: Re: proposal: PL/Pythonu - function ereport

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Sort push down through a nested loop, for 9.7

Предыдущее

Следующее