Re: POC: GROUP BY optimization

Поиск

Список

Период

Сортировка

От	Claudio Freire
Тема	Re: POC: GROUP BY optimization
Дата	7 июня 2018 г. 05:29:55
Msg-id	CAGTBQpaLKt+3V5167GteO_y7e8qgw3+Pi7vn6HoG-T4RMNC=oQ@mail.gmail.com обсуждение исходный текст
Ответ на	Re: POC: GROUP BY optimization (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Список	pgsql-hackers

Дерево обсуждения

On Wed, Jun 6, 2018 at 8:06 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> >>> Comparison cost can be approximated probabilistically as:
> >>>
> >>> cost_comp = sum(cost_op_n * (1.0 / ndistinct(col_1_to_n)))
> >>>
> >>> Where cost_op_n is the cost of the comparison function for column N,
> >>> and ndistinct(col_1_to_n) is an estimation of the number of distinct
> >>> values for columns prior to N in the sort order.
> >>>
> >>> You can approximate ndistinct as the product of ndistinct of previous
> >>> columns, or use extended statistics when available.
> >>>
> >>
> >> Sure. The trouble is this also assumes uniform distribution, which is
> >> not always the case.
> >
> > Well, (1.0 / ndistinct) = p(left == right).
> >
> > If you can compute a better p(left == right) with an MCV, you can get
> > a better estimate. If you have an MCV. But that'd be a bit expensive,
> > and I'm not sure it's all that relevant.
> >
> > To what degree does improving this produce better plans? As long as
> > average expected cost is relatively well estimated (as in, one sort
> > order relative to another sort order), it won't affect the decision.
> >
>
> I'd bet I can construct corner-case examples with vastly different
> behavior depending on column data distributions, so it's not entirely
> insignificant. How important is it in practice I don't know.

I guess that can only be answered by building that solution and
testing it against the corner cases.

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Tom Lane
Дата: 07 июня 2018 г., 05:16:52
Сообщение: Re: Why is fncollation in FunctionCallInfoData rather than fmgr_info?

Следующее

От: David Rowley
Дата: 07 июня 2018 г., 06:00:44
Сообщение: Re: Needless additional partition check in INSERT?

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: POC: GROUP BY optimization

Предыдущее

Следующее