Sort push down through a nested loop, for 9.7

Поиск
Список
Период
Сортировка
От Jeff Janes
Тема Sort push down through a nested loop, for 9.7
Дата
Msg-id CAMkU=1zg3nFP8EJ4PRi5K+NyzCD-QCthXNHoru8U+0zTK9nQbg@mail.gmail.com
обсуждение исходный текст
Список pgsql-hackers
I was testing to see if the newer changes in 9.6 fixed some planning
issues I've seen in prior versions.  It does not, for the ones of
interest to me, but while looking into it I see what seems to be a
missing optimization (in all versions).

If the first child of a nested loop produces naturally ordered output
(from an index), the planner is smart enough to realize that this
ordering passes up and applies to the entire nested loop node:

explain select * from pgbench_accounts a join pgbench_accounts b using
(bid) order by a.aid;                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------Nested
Loop (cost=0.29..150007137.29 rows=10000000000 width=194)  Join Filter: (a.bid = b.bid)  ->  Index Scan using
pgbench_accounts_pkeyon pgbench_accounts a
 
(cost=0.29..4247.29 rows=100000 width=97)  ->  Materialize  (cost=0.00..3140.00 rows=100000 width=97)        ->  Seq
Scanon pgbench_accounts b  (cost=0.00..2640.00
 
rows=100000 width=97)

But if a naturally ordering index is not available, it is not smart
enough to push the sort down through the nested loop to the first
child:

set enable_hashjoin TO off;

explain select * from pgbench_accounts a join pgbench_accounts b using
(bid) order by a.filler;                                        QUERY PLAN
---------------------------------------------------------------------------------------------Sort
(cost=5707453952.44..5732453952.44rows=10000000000 width=275)  Sort Key: a.filler  ->  Nested Loop
(cost=0.00..150005530.00rows=10000000000 width=275)        Join Filter: (a.bid = b.bid)        ->  Seq Scan on
pgbench_accountsa  (cost=0.00..2640.00
 
rows=100000 width=97)        ->  Materialize  (cost=0.00..3140.00 rows=100000 width=97)              ->  Seq Scan on
pgbench_accountsb  (cost=0.00..2640.00
 
rows=100000 width=97)


Sorting 100000 rows would be a lot faster than sorting 10000000000 rows.

Is this one of the cases where the CPU cycles needed to detect the
opportunity during planning are just not worthwhile, considering how
rarely the opportunity arises?

Cheers,

Jeff



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: Move PinBuffer and UnpinBuffer to atomics
Следующее
От: Pavel Stehule
Дата:
Сообщение: Re: proposal: PL/Pythonu - function ereport