Re: POC: GROUP BY optimization

From: Andrey V. Lepikhov
Subject: Re: POC: GROUP BY optimization
Date:
Msg-id: dc933c66-41b9-8907-327e-f6361a690654@postgrespro.ru
In reply to: Re: POC: GROUP BY optimization  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses: Re: POC: GROUP BY optimization  (Andrey Lepikhov <a.lepikhov@postgrespro.ru>)
Re: POC: GROUP BY optimization  ("Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>)
List: pgsql-hackers
I keep working on this patch. Here are intermediate results.

On 7/22/21 3:58 AM, Tomas Vondra wrote:
> in the first loop. Which seems pretty bogus - why would there be just
> two groups? When processing the first expression, it's as if there was
> one big "prev group" with all the tuples, so why not to just use nGroups
> as it is?

I think the heapsort costing code looks very strange. Look at the fallback 
case: it is based on the output_tuples value. Maybe we should use an 
nGroups value here, but derived from the number of output_tuples?

> 1) I looked at the resources mentioned as sources the formulas came
> from, but I've been unable to really match the algorithm to them. The
> quicksort paper is particularly "dense", the notation seems to be very
> different, and none of the theorems seem like an obvious fit. Would be
> good to make the relationship clearer in comments etc.

Fixed (See attachment).

> 3) I'm getting a bit skeptical about the various magic coefficients that
> are meant to model higher costs with non-uniform distribution. But
> consider that we do this, for example:
> 
>     tuplesPerPrevGroup = ceil(1.5 * tuplesPerPrevGroup / nGroups);
> 
> but then in the next loop we call estimate_num_groups_incremental and
> pass this "tweaked" tuplesPerPrevGroup value to it. I'm pretty sure this
> may have various strange consequences - we'll calculate the nGroups
> based on the inflated value, and we'll calculate tuplesPerPrevGroup from
> that again - that seems susceptible to "amplification".
> 
> We inflate tuplesPerPrevGroup by 50%, which means we'll get a higher
> nGroups estimate in the next loop - but not linearly. An then we'll
> calculate the inflated tuplesPerPrevGroup and estimated nGroup ...

The weighting coefficient '1.5' expresses our desire to minimize the 
number of comparison operations on each subsequent attribute of a 
pathkeys list. By increasing this coefficient we increase the chance 
that the planner will order pathkeys by decreasing uniqueness.
I think it's OK.

-- 
regards,
Andrey Lepikhov
Postgres Professional

Attachments
