Re: Consider the number of columns in the sort cost model
От | Andrei Lepikhov |
---|---|
Тема | Re: Consider the number of columns in the sort cost model |
Дата | |
Msg-id | ae713843-0bb7-4726-82c9-0adae203357e@gmail.com обсуждение исходный текст |
Ответ на | Re: Consider the number of columns in the sort cost model (David Rowley <dgrowleyml@gmail.com>) |
Список | pgsql-hackers |
On 10/15/24 12:15, David Rowley wrote: > As for your patch, I'm suspicious that the general idea you're > proposing is an actual improvement. I didn't intend to treat it as an 'improvement' but as an intermediate patch. The main purpose here is to debate the way & introduce considering of number of columns. Conservative development approach has been preferred before. > > it seems you're just charging cpu_operator_cost * <number of columns > to sort by>. It seems like it won't be very hard to fool that into > doing the wrong thing when the first column to sort by is distinct or > almost distinct. There's going to be far fewer or no tiebreaker > comparisons for that case. As I've written above, it is for starters. It allow to analyse how the balance between different types of orderings can be changed in extreme cases. I can join this patch with the following, implementing differentiation by distincts, but the effect on regression tests will be smoothed out. The primary idea is 1) to elaborate GROUP-BY optimisation and 2) give the optimiser a tool to choose a more optimal sort, comparing MergeAppend/Sort/IncrementalSort costs. The whole idea is implemented in the branch [1] and described in the post [2]. Of course, we differentiate sortings by distinct of the first column (only one trustworthy statistic). It is not so difficult (but I doubt the real necessity) to use extended statistics and reflect in the cost values of different combinations of columns. > > As mentioned by Kirill, I also don't understand the cost_sort > signature change. Why would you do that over just doing > list_length(pathkeys) within cost_sort()? Your throwing away a > parameter that might be the most useful one of the bunch for allowing > better sort cost estimates. Ok, may be it is too much for an intermediate patch. We can change the interface later, if necessary. > Perhaps that could be done within the EquivalenceClass. Thanks for the idea! [1] https://github.com/postgrespro/postgres/tree/sort-columnsnum [2] https://danolivo.substack.com/p/elaboration-of-the-postgresql-sort -- regards, Andrei Lepikhov
В списке pgsql-hackers по дате отправления: