Re: POC: GROUP BY optimization

From: Dmitry Dolgov
Subject: Re: POC: GROUP BY optimization
Msg-id: CA+q6zcVRrd-z4YZ4M43ccst7aGL9==w5r1fionRWhP9ot6mybQ@mail.gmail.com
In reply to: Re: POC: GROUP BY optimization (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses: Re: POC: GROUP BY optimization (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List: pgsql-hackers
> On Tue, Apr 9, 2019 at 5:21 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> So I personally would suggest to treat those patches as independent until
> the very last moment, develop the costing improvements needed by each
> of them, and then decide which of them are committable / in what order.

I had the same idea, but judging from the questions raised in this thread, it's
quite hard to go with reordering based only on the frequency of values. I hoped
that the cost_sort improvement patch would be simple enough to incorporate here,
but of course it wasn't. Assuming that the amount of work required for sorting
depends only on the number of distinct groups and on how costly it is to
compare values of this data type, I ended up extracting the get_width_multiplier
and get_func_cost parts from the cost_sort patch and including them in
0003-Reorder-by-values-distribution. This makes it possible to take into account
situations where we compare, e.g., long strings or a custom data type with a
high procost for comparison (although for now I've used these values directly,
without any adjusting coefficients).
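A rough sketch of that costing idea, under loudly stated assumptions: columns are uniformly distributed and independent, a key is only compared when every earlier key ties (probability 1/ndistinct per earlier key), and one comparison costs `cmp_cost` times a width multiplier. `GroupKey`, `width_multiplier`, `per_row_cost` and `best_order` are hypothetical names for illustration, not the functions in the attached patch, and the `avg_width / 8` multiplier is invented.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Sequence, Tuple


@dataclass(frozen=True)
class GroupKey:
    """One GROUP BY column with the statistics the heuristic needs (hypothetical)."""
    name: str
    ndistinct: float  # estimated number of distinct values
    cmp_cost: float   # cost of one comparison call (think: procost of the comparison function)
    avg_width: float  # average value width in bytes


def width_multiplier(avg_width: float) -> float:
    # Hypothetical stand-in: wider values make each comparison proportionally dearer.
    return max(1.0, avg_width / 8.0)


def per_row_cost(order: Sequence[GroupKey]) -> float:
    """Expected cost of comparing two rows on the keys in the given order.

    Key i is only consulted when the two rows are equal on all earlier keys;
    assuming uniform, independent columns, that happens with probability
    prod(1 / ndistinct_j) over the earlier keys j.
    """
    cost = 0.0
    reach = 1.0  # probability that the comparison gets as far as this key
    for key in order:
        cost += reach * key.cmp_cost * width_multiplier(key.avg_width)
        reach /= max(key.ndistinct, 1.0)
    return cost


def best_order(keys: Sequence[GroupKey]) -> Tuple[GroupKey, ...]:
    # Exhaustive search is fine for the handful of keys a GROUP BY usually has.
    return min(permutations(keys), key=per_row_cost)


keys = [
    GroupKey("flag", ndistinct=2, cmp_cost=1.0, avg_width=1),
    GroupKey("id", ndistinct=100_000, cmp_cost=1.0, avg_width=4),
    GroupKey("note", ndistinct=1_000_000, cmp_cost=1.0, avg_width=256),
]
by_ndistinct = sorted(keys, key=lambda k: k.ndistinct, reverse=True)
print([k.name for k in best_order(keys)])  # ['id', 'flag', 'note']
```

With these made-up statistics, ordering purely by ndistinct would put `note` first, while the cost-weighted search picks `id, flag, note`, because under the assumed multiplier every comparison on the wide `note` column is 32 times dearer. Non-uniform distributions (the per-column MCV lists mentioned in the thread) would replace the 1/ndistinct factor, which this sketch does not attempt.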

> On Wed, Jun 13, 2018 at 6:41 PM Teodor Sigaev <teodor@sigaev.ru> wrote:
>
> > So that's a nice improvement, although I think we should also consider
> > non-uniform distributions (using the per-column MCV lists).
>
> Could you clarify how to do that?

Since I'm not familiar with this topic, I would like to ask the same question:
how would we do that, and what are the advantages?

> On Sat, Jun 16, 2018 at 5:59 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> I still think we need to be careful when introducing new optimizations
> in this area - reordering the grouping keys by ndistinct, ORDER BY or
> whatever. In particular I don't think we should commit these patches
> that may quite easily cause regressions, and then hope some hypothetical
> future patch fixes the costing.

I'm a bit concerned about this part of the discussion. An idea running through
the whole thread is to avoid the situation where a user knows which order is
better and we generate a different one by mistake. From what I see right now,
even if all the objections were addressed, there is a chance that some
coefficients will not be good enough (e.g. the width multiplier is based on an
average width, or it may turn out that all the compared strings differ at the
very beginning) and the chosen order will not be optimal. Does this mean that
any implementation of such an optimization should provide a way to override it?
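To make the average-width concern concrete: a multiplier derived from average width cannot distinguish strings that differ at the first character from strings that share a long prefix. The toy model below uses a hypothetical `chars_examined` helper that counts the character positions a naive left-to-right comparison inspects (real collation-aware comparison in PostgreSQL is more involved); both pairs have the same 64-character width, yet the work differs by a factor of 64.

```python
def chars_examined(a: str, b: str) -> int:
    """Count positions a naive left-to-right comparison inspects (toy model)."""
    n = 0
    for x, y in zip(a, b):
        n += 1
        if x != y:
            return n  # first difference found, comparison stops here
    return n  # equal up to the length of the shorter string


# Same average width (64), very different comparison effort.
early = ("a" + "x" * 63, "b" + "x" * 63)  # differ at the first character
late = ("x" * 63 + "a", "x" * 63 + "b")   # differ at the last character

print(chars_examined(*early), chars_examined(*late))  # 1 64
```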

Attachments
