Обсуждение: [RFC][PATCH] Order qual clauses by combined cost and selectivity

Поиск
Список
Период
Сортировка

[RFC][PATCH] Order qual clauses by combined cost and selectivity

От
Staroverov Ilja
Дата:
Hi,

I'd like to propose a change to qual clause ordering in the executor.

Currently, order_qual_clauses() ranks predicates mainly by estimated
per-tuple cost.  For conjunctive predicates under short-circuit
evaluation, this can be suboptimal: a slightly more expensive clause
that rejects many rows may be better to evaluate earlier, because it
can avoid many later evaluations.

The attached patch changes the ranking heuristic to use

    cost / (1 - selectivity)

where selectivity is the fraction of rows that pass the clause.

For ANDed predicates evaluated left-to-right with short-circuit
semantics, this is a natural heuristic: (1 - selectivity) is the
probability that a clause rejects a row, so lower cost per rejected
row should generally be preferred.

The patch preserves the existing constraints:
- security_level ordering is still honored;
- leakproof-related behavior is unchanged;
- original clause order is preserved when ranks are equal.

It also adds a regression test covering:
1. reordering of equal-cost clauses by selectivity,
2. a case where better selectivity outweighs slightly higher cost,
3. a case where high cost still dominates despite better selectivity,
4. preservation of security_level ordering with row-level security,
5. stability when computed ranks are identical.

I also updated regression outputs whose EXPLAIN Filter or Join Filter
clause order changed due to the new ordering.

I also attached a small synthetic SQL benchmark script that I used to
illustrate one case where the new ordering reduces executor work for
user-defined operators.  I'm including it only as an illustration, not
as a claim of universal improvement.

I searched the archives but could not find prior discussion of this
specific approach.  Has incorporating selectivity into qual ordering
been considered before?  If so, I would appreciate pointers to any
earlier discussion or reasons it was not pursued.

Patch attached.

Thanks,
Ilya Staroverov

Вложения

Re: [RFC][PATCH] Order qual clauses by combined cost and selectivity

От
Tom Lane
Дата:
Staroverov Ilja <i.staroverov@ftdata.ru> writes:
> The attached patch changes the ranking heuristic to use
>     cost / (1 - selectivity)
> where selectivity is the fraction of rows that pass the clause.

This (or some close relative) has been proposed before, but we
have been hesitant to do it because our cost metrics for qual
clauses are pretty nearly completely bogus: practically all
the built-in functions are assigned cost 1, even though in
reality they have a wide range of runtimes.  Selectivity isn't
enormously reliable either.  We could easily be taking a qual
order that the user has chosen carefully and stirring it around
more or less at random.

I'm suspicious of the particular form of this expression, too,
because selectivities close to 1 will produce very substantial
effects on the estimate even though there may not be that much
difference in practice, and the selectivity difference may be
mostly sampling error in the first place.  I think you need
a formula that's not very sensitive to small differences, but
this will fail that test.

We had a similar discussion about two years ago concerning a
patch that (IIRC) tried to order sort columns according to
the estimated cost of the comparison functions.  That got
reverted for a few reasons, but one of the big ones was that
the cost comparisons were largely garbage-in-garbage-out.

I think that a prerequisite for any work in this area is to
try to assign more realistic procost estimates to at least
a substantial fraction of the built-in pg_proc entries.
That's going to be tedious and probably contentious, but
it's hard to believe we can make much progress without
better cost data.

            regards, tom lane