Re: Remove useless GROUP BY columns considering unique index

From Andrei Lepikhov
Subject Re: Remove useless GROUP BY columns considering unique index
Date
Msg-id f358f934-44d6-4c17-83fe-d61c5c89e191@gmail.com
In response to Re: Remove useless GROUP BY columns considering unique index  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
On 12/12/24 10:09, David Rowley wrote:
> On Mon, 2 Dec 2024 at 17:18, Andrei Lepikhov <lepihov@gmail.com> wrote:
>> Patch 0002 looks helpful and performant. I propose to check 'relid > 0'
>> to avoid diving into 'foreach(lc, parse->rtable)' at all if nothing has
>> been found.
> 
> I did end up adding another fast path there, but I felt like checking
> relid > 0 wasn't as good as it could be as that would have only
> short-circuited when we don't see any Vars of level 0 in the GROUP BY.
> It seemed cheap enough to short-circuit when none of the relations
> mentioned in the GROUP BY have multiple columns mentioned.
Your solution seems much better than my proposal. Thanks for applying it!
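
To make the difference between the two fast paths concrete, here is a
minimal sketch in Python. It is a toy model, not the patch's C code: the
function name and the (relid, attno, levelsup) tuples are hypothetical
stand-ins for the Vars found in the GROUP BY clause. The 'relid > 0' check
only bails out when the result below would have been built from no Vars
at all, while the check described above bails out whenever the returned
set is empty.

```python
from collections import defaultdict


def relations_worth_checking(group_by_vars):
    """Return the relids that have more than one column in the GROUP BY.

    group_by_vars: iterable of (relid, attno, levelsup) tuples, a
    hypothetical simplification of the Vars in the GROUP BY clause.
    """
    cols_per_rel = defaultdict(set)
    for relid, attno, levelsup in group_by_vars:
        # Only Vars of the current query level (level 0) are of interest.
        if levelsup != 0:
            continue
        cols_per_rel[relid].add(attno)

    # A relation with a single GROUP BY column has nothing to remove,
    # so only relations with two or more columns are worth a closer look.
    return {relid for relid, cols in cols_per_rel.items() if len(cols) > 1}


# Fast path: an empty result means the rtable scan can be skipped.
# One column each from relations 1 and 2 gives set().
print(relations_worth_checking([(1, 1, 0), (2, 1, 0)]))
```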

> when how do you decide if the GROUP BY should become t1.a,t1.b or
> t2.x,t2.y? It's not clear to me that using t1's columns is always
> better than using t2's. I imagine using a mix is never better, but I'm
> unsure how you'd decide which ones to use.
It depends on how 'better' is calculated. Right now, GROUP BY employs
two strategies to reduce path cost: 1) match the ORDER BY clause (avoids
the final sort); 2) match the pathkeys of the incoming subtree (avoids
presorting for the grouping).
My idea is close to [1], where the cost depends on the estimated
number of groups in the first grouping column, because cost_sort predicts
the number of comparison operator calls based on statistics. In this
case, the choice between (x,y) and (a,b) will depend on the ndistinct of
'x' and 'a'.
In general, this was an idea for debate, aimed more at further development
than at the patch in this thread.
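
To illustrate the kind of choice I mean, here is a rough sketch in Python.
It is not the cost formula from [1] and not PostgreSQL code: the function
names, the uniform-distribution assumption, and the N * log2(N) comparison
count are all simplifications made up for this example. The only point it
shows is that a leading column with a higher ndistinct produces fewer
ties, so the comparator falls through to the following columns less often.

```python
import math


def estimated_comparator_calls(ntuples, first_col_ndistinct, ncols=2):
    """Toy estimate of per-column comparator calls for a multi-column sort."""
    if ntuples < 2:
        return 0.0
    # Assumption: values are uniformly distributed, so two tuples tie on
    # the first column with probability 1/ndistinct.
    tie_prob = 1.0 / max(first_col_ndistinct, 1.0)
    # Pessimistic assumption: a tie on the first column forces a
    # comparison of every remaining column.
    per_comparison = 1.0 + (ncols - 1) * tie_prob
    return ntuples * math.log2(ntuples) * per_comparison


def pick_group_by(ntuples, candidates):
    """Pick the cheapest column list.

    candidates: dict mapping a tuple of column names to the ndistinct
    estimate of its first column.
    """
    return min(
        candidates,
        key=lambda cols: estimated_comparator_calls(
            ntuples, candidates[cols], len(cols)
        ),
    )


# With ndistinct(a) = 10 and ndistinct(x) = 500, the model prefers (x,y).
print(pick_group_by(1000, {("t1.a", "t1.b"): 10, ("t2.x", "t2.y"): 500}))
```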

[1] Consider the number of columns in the sort cost model
https://www.postgresql.org/message-id/flat/8742aaa8-9519-4a1f-91bd-364aec65f5cf%40gmail.com

-- 
regards, Andrei Lepikhov


