Re: POC, WIP: OR-clause support for indexes

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: POC, WIP: OR-clause support for indexes
Дата
Msg-id CAH2-WzkNaPbBJ_yWDXzk5ESJZH62vi=+brHHQ2SrDrPXka1xYA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: POC, WIP: OR-clause support for indexes  (Alena Rybakina <lena.ribackina@yandex.ru>)
Ответы Re: POC, WIP: OR-clause support for indexes  ("Finnerty, Jim" <jfinnert@amazon.com>)
Re: POC, WIP: OR-clause support for indexes  (Alena Rybakina <lena.ribackina@yandex.ru>)
Список pgsql-hackers
On Mon, Jul 31, 2023 at 9:38 AM Alena Rybakina <lena.ribackina@yandex.ru> wrote:
> I noticed only one thing there: when we have unsorted array values in
> SOAP, the query takes longer than
> when it has a sorted array. I'll double-check it just in case and write
> about the results later.

I would expect the B-Tree preprocessing by _bt_preprocess_array_keys()
to be very slightly faster when the query is written with presorted,
duplicate-free constants. Sorting is faster when you don't really have
to sort. However, I would not expect the effect to be significant
enough to matter, except perhaps in very extreme cases.
Although...some of the cases you care about are very extreme cases.

> I am also testing some experience with multi-column indexes using SAOPs.

Have you thought about a similar transformation for when the row
constructor syntax happens to have been used?

Consider a query like the following, against a table with a composite
index on (a, b):

select * from multi_test where ( a, b ) in (( 1, 1 ), ( 2, 1 ));

This query will get a BitmapOr based plan that's similar to the plans
that OR-based queries affected by your transformation patch get today,
on HEAD.  However, this equivalent spelling has the potential to be
significantly faster:

select * from multi_test where a = any('{1,2}') and b = 1;

(Of course, this is more likely to be true with my nbtree SAOP patch in place.)

Note that we currently won't use RowCompareExpr in many simple cases
where the row constructor syntax has been used. For example, a query
like this:

select * from multi_test where ( a, b ) = (( 2, 1 ));

This case already involves a transformation that is roughly comparable
to the one you're working on now. We'll remove the RowCompareExpr
during parsing. It'll be as if my example row constructor equality
query was written this way instead:

select * from multi_test where a = 2 and b = 1;

This can be surprisingly important, when combined with other things,
in more realistic examples.

The nbtree code has special knowledge of RowCompareExpr that makes the
rules for comparing index tuples different to those from other kinds
of index scans. However, due to the RowCompareExpr transformation
process I just described, we don't need to rely on that specialized
nbtree code when the row constructor syntax is used with a simple
equality clause -- which is what makes the normalization process have
real value. If the nbtree code cannot see RowCompareExpr index quals
then it cannot have this problem in the first place. In general it is
useful to "normalize to conjunctive normal form" when it might allow
scan key preprocessing in the nbtree code to come up with a much
faster approach to executing the scan.

It's easier to understand what I mean by showing a simple example. The
nbtree preprocessing code is smart enough to recognize that the
following query doesn't really need to do any work, due to having
quals that it recognizes as contradictory (it can set so->qual_okay to
false for unsatisfiable quals):

select * from multi_test where ( a, b ) = (( 2, 1 )) and a = -1;

However, it is not smart enough to perform the same trick if we change
one small detail with the query:

select * from multi_test where ( a, b ) >= (( 2, 1 )) and a = -1;

Ideally, the optimizer would canonicalize/normalize everything in a
way that made all of the nbtree preprocessing optimizations work just
as well, without introducing any new special cases. Obviously, there
is no reason why we can't perform the same trick with the second
variant. (Note also that the nbtree preprocessing code can be smart
about redundant quals, not just contradictory quals, so it matters
more than it may appear from this simple, unrealistic example of
mine.)

While these similar RowCompareExpr transformations are at least
somewhat important, that's not really why I bring them up now. I am
pointing them out now because I think that it might help you to
develop a more complete mental model of these transformations.
Ideally, your initial approach will generalize to other situations
later on. So it's worth considering the relationship between this
existing RowCompareExpr transformation, and the one that you're
working on currently. Plus other, future transformations.

This example might also give you some appreciation of why my SAOP
patch is confused about when we need to do normalization/safety
checks. Some things seem necessary when generating index paths in the
optimizer. Other things seem necessary during preprocessing, in the
nbtree code, at the start of the index scan. Unfortunately, it's not
obvious to me where the right place is to deal with each aspect of
setting up multi-column SAOP index quals. My mental model is very
incomplete.

--
Peter Geoghegan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Dean Rasheed
Дата:
Сообщение: Re: Performance degradation on concurrent COPY into a single relation in PG16.
Следующее
От: Andrew Dunstan
Дата:
Сообщение: Re: How to build a new grammer for pg?