Re: POC: GROUP BY optimization

Поиск
Список
Период
Сортировка
От Andrey Lepikhov
Тема Re: POC: GROUP BY optimization
Дата
Msg-id 3d30ae10-5c35-63c4-577c-d2e2479e15fd@postgrespro.ru
обсуждение исходный текст
Ответ на Re: POC: GROUP BY optimization  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Ответы Re: POC: GROUP BY optimization
Список pgsql-hackers
On 20/7/2023 18:46, Tomas Vondra wrote:
> On 7/20/23 08:37, Andrey Lepikhov wrote:
>> On 3/10/2022 21:56, Tom Lane wrote:
>>> Revert "Optimize order of GROUP BY keys".
>>>
>>> This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and
>>> several follow-on fixes.
>>> ...
>>> Since we're hard up against the release deadline for v15, let's
>>> revert these changes for now.  We can always try again later.
>>
>> It may be time to restart the project. As a first step, I rebased the
>> patch on the current master. It wasn't trivial because of some latest
>> optimizations (a29eab, 1349d27 and 8d83a5d).
>> Now, Let's repeat the review and rewrite the current path according to
>> the reasons uttered in the revert commit.
> 
> I think the fundamental task is to make the costing more reliable, and
> the commit message 443df6e2db points out a couple challenges in this
> area. Not sure how feasible it is to address enough of them ...
> 
> 1) procost = 1.0 - I guess we could make this more realistic by doing
> some microbenchmarks and tuning the costs for the most expensive cases.
> 
> 2) estimating quicksort comparisons - This relies on ndistinct
> estimates, and I'm not sure how much more reliable we can make those.
> Probably not much :-( Not sure what to do about this, the only thing I
> can think of is to track "reliability" of the estimates and only do the
> reordering if we have high confidence in the estimates. That means we'll
> miss some optimization opportunities, but it should limit the risk.
I read up on the history of this thread.
As I see, all the problems mentioned above can be beaten by excluding 
the new cost model at all. We can sort GROUP BY columns according to the 
'ndistinct' value.
I see the reason for introducing the cost model in [1]. The main 
supporting point here is that with this patch, people couldn't optimize 
the query by themselves, organizing the order of the columns in a more 
optimal way. But now we have at least the GUC to switch off the 
behaviour introduced here. Also, some extensions, like the well-known 
pg_hint_plan, can help with automation.
So, how about committing of the undoubted part of the feature and 
working on the cost model in a new thread?

[1] 
https://www.postgresql.org/message-id/6d1e0cdb-dde3-f62a-43e2-e90bbd9b0f42%402ndquadrant.com

-- 
regards,
Andrey Lepikhov
Postgres Professional




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Masahiko Sawada
Дата:
Сообщение: Re: pg16b2: REINDEX segv on null pointer in RemoveFromWaitQueue
Следующее
От: Bharath Rupireddy
Дата:
Сообщение: Re: Synchronizing slots from primary to standby