Re: Consider the number of columns in the sort cost model

Поиск
Список
Период
Сортировка
От Andrei Lepikhov
Тема Re: Consider the number of columns in the sort cost model
Дата
Msg-id ae713843-0bb7-4726-82c9-0adae203357e@gmail.com
обсуждение исходный текст
Ответ на Re: Consider the number of columns in the sort cost model  (David Rowley <dgrowleyml@gmail.com>)
Список pgsql-hackers
On 10/15/24 12:15, David Rowley wrote:
> As for your patch, I'm suspicious that the general idea you're
> proposing is an actual improvement.
I didn't intend to treat it as an 'improvement' but as an intermediate 
patch. The main purpose here is to debate the way & introduce 
considering of number of columns. Conservative development approach has 
been preferred before.
> 
> it seems you're just charging cpu_operator_cost * <number of columns
> to sort by>. It seems like it won't be very hard to fool that into
> doing the wrong thing when the first column to sort by is distinct or
> almost distinct. There's going to be far fewer or no tiebreaker
> comparisons for that case.
As I've written above, it is for starters. It allow to analyse how the 
balance between different types of orderings can be changed in extreme 
cases. I can join this patch with the following, implementing 
differentiation by distincts, but the effect on regression tests will be 
smoothed out.
The primary idea is 1) to elaborate GROUP-BY optimisation and 2) give 
the optimiser a tool to choose a more optimal sort, comparing 
MergeAppend/Sort/IncrementalSort costs.
The whole idea is implemented in the branch [1] and described in the 
post [2]. Of course, we differentiate sortings by distinct of the first 
column (only one trustworthy statistic). It is not so difficult (but I 
doubt the real necessity) to use extended statistics and reflect in the 
cost values of different combinations of columns.

> 
> As mentioned by Kirill, I also don't understand the cost_sort
> signature change. Why would you do that over just doing
> list_length(pathkeys) within cost_sort()? Your throwing away a
> parameter that might  be the most useful one of the bunch for allowing
> better sort cost estimates.
Ok, may be it is too much for an intermediate patch. We can change the 
interface later, if necessary.
> Perhaps that could be done within the EquivalenceClass.
Thanks for the idea!

[1] https://github.com/postgrespro/postgres/tree/sort-columnsnum
[2] https://danolivo.substack.com/p/elaboration-of-the-postgresql-sort

-- 
regards, Andrei Lepikhov




В списке pgsql-hackers по дате отправления: