Re: Expand applicability of aggregate's sortop optimization

Поиск
Список
Период
Сортировка
От Matthias van de Meent
Тема Re: Expand applicability of aggregate's sortop optimization
Дата
Msg-id CAEze2Wj44i8eiLFCpv2wW+u_3af-wGcWnA1iuPaMVy9LOmSKFw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Expand applicability of aggregate's sortop optimization  (Andrei Lepikhov <lepihov@gmail.com>)
Ответы Re: Expand applicability of aggregate's sortop optimization
Re: Expand applicability of aggregate's sortop optimization
Список pgsql-hackers
On Wed, 17 Jul 2024 at 05:29, Andrei Lepikhov <lepihov@gmail.com> wrote:
>
> On 5/8/24 17:13, Matthias van de Meent wrote:
> > As you may know, aggregates like SELECT MIN(unique1) FROM tenk1; are
> > rewritten as SELECT unique1 FROM tenk1 ORDER BY unique1 USING < LIMIT
> > 1; by using the optional sortop field in the aggregator.
> > However, this optimization is disabled for clauses that in itself have
> > an ORDER BY clause such as `MIN(unique1 ORDER BY <anything>), because
> > <anything> can cause reordering of distinguisable values like 1.0 and
> > 1.00, which then causes measurable differences in the output. In the
> > general case, that's a good reason to not apply this optimization, but
> > in some cases, we could still apply the index optimization.
>
> Thanks for the job! I guess you could be more brave and push down also
> FILTER statement.

While probably not impossible, I wasn't planning on changing this code
with new optimizations; just expanding the applicability of the
current optimizations.

Note that the "aggfilter" clause was not new, but moved up in the code
to make sure we use this local information to bail out (if applicable)
before trying to use the catalogs for bail-out information.

> >
> > One of those cases is fixed in the attached patch: if we order by the
> > same column that we're aggregating, using the same order class as the
> > aggregate's sort operator (i.e. the aggregate's sortop is in the same
> > btree opclass as the ORDER BY's sort operator), then we can still use
> > the index operation: The sort behaviour isn't changed, thus we can
> > apply the optimization.
> >
> > PFA the small patch that implements this.
> >
> > Note that we can't blindly accept just any ordering by the same
> > column: If we had an opclass that sorted numeric values by the length
> > of the significant/mantissa, then that'd provide a different (and
> > distinct) ordering from that which is expected by the normal
> > min()/max() aggregates for numeric, which could cause us to return
> > arguably incorrect results for the aggregate expression.
>
> As I see, the code:
> aggsortop = fetch_agg_sort_op(aggref->aggfnoid);
> if (!OidIsValid(aggsortop))
>    return false;                /* not a MIN/MAX aggregate */
>
> used twice and can be evaluated earlier to avoid duplicated code.

The code is structured like this to make sure we only start accessing
catalogs once we know that all other reasons to bail out from this
optimization indicate we can apply the opimization. You'll notice that
I've tried to put the cheapest checks that only use caller-supplied
information first, and catalog accesses only after that.

If the fetch_agg_sort_op clause would be deduplicated, it would either
increase code complexity to handle both aggref->aggorder paths, or it
would increase the cost of planning MAX(a ORDER BY b) because of the
newly added catalog access.

> Also, I'm unsure about the necessity of looking through the btree
> classes. Maybe just to check the commutator to the sortop, like in the
> diff attached? Or could you provide an example to support your approach?

I think it could work, but I'd be hesitant to rely on that, as
commutator registration is optional (useful, but never required for
btree operator classes' operators). Looking at the btree operator
class, which is the definition of sortability in PostgreSQL, seems
more suitable and correct.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Anthonin Bonnefoy
Дата:
Сообщение: Re: query_id, pg_stat_activity, extended query protocol
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: speed up a logical replica setup