Re: POC, WIP: OR-clause support for indexes

Поиск
Список
Период
Сортировка
От Alena Rybakina
Тема Re: POC, WIP: OR-clause support for indexes
Дата
Msg-id e58a1ad1-1da1-4228-ab86-aa588c03756d@postgrespro.ru
обсуждение исходный текст
Ответ на Re: POC, WIP: OR-clause support for indexes  (Peter Geoghegan <pg@bowt.ie>)
Ответы Re: POC, WIP: OR-clause support for indexes
Список pgsql-hackers
On 24.06.2024 18:28, Robert Haas wrote:
On Fri, Jun 21, 2024 at 6:52 PM Alena Rybakina
<a.rybakina@postgrespro.ru> wrote:
It's hard to tell, but I think it might be one of the good places to apply transformation. Let me describe a brief conclusion on the two approaches.
This explanation is somewhat difficult for me to follow. For example:

In the first approach, we definitely did not process the extra "OR" expressions in the first approach, since they were packaged as an Array. It could lead to the fact that less planning time would be spent on the optimizer.
I don't know what the "first approach" refers to, or what processing
the extra "OR" expressions means, or what it would mean to package OR
expressions as an array. If you made them into an SAOP then you'd have
an array *instead of* OR expressions, not OR expressions "packaged as
an array" but even then, they'd still be processed somewhere, unless
the patch was just wrong.

I think you should try writing this summary again and see if you can
make it a lot clearer and more precise.

I'm suspicious based that we should actually be postponing the
transformation even further. If, for example, the transformation is
advantageous for index scans and disadvantageous for bitmap scans, or
the other way around, then this approach can't help much: it either
does the transformation and all scan types are affected, or it doesn't
do it and no scan types are affected. But if you decided for each scan
whether to transform the quals, then you could handle that. Against
that, there might be increased planning cost. But, perhaps that could
be avoided somehow.

Sorry, you are right and I'll try to explain more precisely. The first approach is the first part of the patch, where we made "Or" expressions into an SAOP at an early stage of plan generation [0], the second one was the one proposed by A.Korotkov [1].

So, when we made "OR" expressions into an SAOP at the post-parsing stage of the plan generation [0], we definitely did not process the redundant expressions "OR" expressions there (for example, duplicates), since they were transformed to SAOP expression. Furthermore, the list of OR expressions can be significantly reduced, since constants belonging to the same predicate will already be converted into an SAOP expression. I assume this may reduce planning time, as I know several places in the optimizer where these lists of "OR" expressions are scanned several times.
Also the selectivity for SAOP expressions is estimated better, which could lead to the generation of a more optimal plan, but, to be honest, this is just an observation from changes in regression tests and, in general, how the process of calculating the selectivity of a complex expression works. And I think it needs further consideration. SELECT * FROM check_estimated_rows('SELECT * FROM functional_dependencies WHERE (a = 1 OR a = 51) AND b = ''1'''); estimated | actual -----------+-------- - 99 | 100 + 100 | 100 (1 row) SELECT * FROM check_estimated_rows('SELECT * FROM functional_dependencies WHERE (a = 1 OR a = 51) AND (b = ''1'' OR b = ''2'')'); estimated | actual -----------+-------- - 99 | 100 + 100 | 100 (1 row) SELECT * FROM check_estimated_rows('SELECT * FROM functional_dependencies WHERE (a = 1 OR a = 2 OR a = 51 OR a = 52) AND (b = ''1'' OR b = ''2'')'); estimated | actual -----------+-------- - 197 | 200 + 200 | 200 Speaking of the main disadvantages, we do not give the optimizer the opportunity to generate a plan using BitmapScan, which can lead to the generation of a suboptimal plan, but in the current approach the same thing happens [2]. And you mentioned about it before:

On 24.06.2024 18:28, Robert Haas wrote:
I'm suspicious based that we should actually be postponing the
transformation even further. If, for example, the transformation is
advantageous for index scans and disadvantageous for bitmap scans, or
the other way around, then this approach can't help much: it either
does the transformation and all scan types are affected, or it doesn't
do it and no scan types are affected. But if you decided for each scan
whether to transform the quals, then you could handle that. Against
that, there might be increased planning cost. But, perhaps that could
be avoided somehow.

Andrei mentioned the problem, which might be caused by the transformation on the later stage of optimization is updating references to expressions in RestrictInfo [3] lists, because they can be used in different parts during the formation of the query plan. As the practice of Self-join removal [4] has shown, this can be expensive, but feasible. By applying the transformation at the analysis stage [0], because no links were created, so we did not encounter such problems, so this approach was more suitable than the others.

If some things were not clear enough, let me know.

[0] https://www.postgresql.org/message-id/attachment/156971/v21-0001-Transform-OR-clauses-to-ANY-expression.patch [1] https://www.postgresql.org/message-id/CAPpHfduah1PLzajBJFDmp7%2BMZuaWYpie2p%2BGsV0r03fcGghQ-g%40mail.gmail.com [2] https://www.postgresql.org/message-id/7d5aed92-d4cc-4b76-8ae0-051d182c9eec%40postgrespro.ru [3] https://www.postgresql.org/message-id/6850c306-4e9d-40b7-8096-1f3c7d29cd9e%40postgrespro.ru [4] https://commitfest.postgresql.org/48/5043/

On 24.06.2024 18:28, Robert Haas wrote:
The alternative that should be considered is not combining things if
the types don't match. If we're going to combine such things, we need
to be absolutely certain that type conversion cannot fail.

Peter, Robert, thanks for the detailed discussion, I realized that here you need to look carefully at the patch. In general, it comes out, I need to pay attention and highlight the cases where POLA violation occurs

On 24.06.2024 18:28, Robert Haas wrote:
One reason is that it is extra work to convert things to a name and
then back to an OID. It's got to be slower than using the OID you
already have.

The other reason is that it's error-prone. If somehow the second
lookup doesn't produce the same OID as the first lookup, bad things
will happen, possibly including security vulnerabilities. I see you've
taken steps to avoid that, like nailing down the schema, and that's
good, but it's not a good enough reason to do it like this. If we
don't have a function that can construct the node we need with the OID
rather than the name as an argument, we should invent one, not do this
sort of thing.

I understood. I'll try to fix it.


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Should we document how column DEFAULT expressions work?
Следующее
От: Robert Haas
Дата:
Сообщение: Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin