Re: POC, WIP: OR-clause support for indexes

Поиск
Список
Период
Сортировка
От Alena Rybakina
Тема Re: POC, WIP: OR-clause support for indexes
Дата
Msg-id bf504287-c3ac-e492-c6ea-31be60b2c92f@yandex.ru
обсуждение исходный текст
Ответ на Re: POC, WIP: OR-clause support for indexes  (Peter Geoghegan <pg@bowt.ie>)
Ответы Re: POC, WIP: OR-clause support for indexes  (Peter Geoghegan <pg@bowt.ie>)
Список pgsql-hackers
On 02.08.2023 21:58, Peter Geoghegan wrote:
> On Wed, Aug 2, 2023 at 8:58 AM Alena Rybakina <lena.ribackina@yandex.ru> wrote:
>> No, I haven't thought about it yet. I studied the example and it would
>> really be nice to add optimization here. I didn't notice any problems
>> with its implementation. I also have an obvious example with the "or"
>> operator, for example
>> , select * from multi_test, where (a, b ) = ( 1, 1 ) or (a, b ) = ( 2, 1
>> ) ...;
>>
>> Although I think such a case will be used less often.
> Right. As I said, I don't particularly care about the row constructor
> syntax -- it's not essential.
>
> In my experience patches like this one that ultimately don't succeed
> usually *don't* have specific problems that cannot be fixed. The real
> problem tends to be ambiguity about the general high level design. So
> more than anything else, ambiguity is the thing that you need to
> minimize to be successful here. This is the #1 practical problem, by
> far. This may be the only thing about your patch that I feel 100% sure
> of.
>
> In my experience it can actually be easier to expand the scope of a
> project, and to come up with a more general solution:
>
> https://en.wikipedia.org/wiki/Inventor%27s_paradox
>
> I'm not trying to make your work more difficult by expanding its
> scope. I'm actually trying to make your work *easier* by expanding its
> scope. I don't claim to know what the specific scope of your patch
> should be at all. Just that it might be possible to get a much clearer
> picture of what the ideal scope really is by *trying* to generalize it
> further -- that understanding is what we lack right now. Even if this
> exercise fails in some way, it won't really have been a failure. The
> reasons why it fails will still be interesting and practically
> relevant.
>
> As I explained to Jim, I am trying to put things in this area on a
> more rigorous footing. For example, I have said that the way that the
> nbtree code executes SAOP quals is equivalent to DNF. That is
> basically true, but it's also my own slightly optimistic
> interpretation of history and of the design. That's a good start, but
> it's not enough on its own.
>
> My interpretation might still be wrong in some subtle way, that I have
> yet to discover. That's really what I'm concerned about with your
> patch, too. I'm currently trying to solve a problem that I don't yet
> fully understand, so for me "getting a properly working flow of
> information" seems like a good practical exercise. I'm trying to
> generalize the design of my own patch as far as I can, to see what
> breaks, and why it breaks. My intuition is that this will help me with
> my own patch by forcing me to gain a truly rigorous understanding of
> the problem.
>
> My suggestion about generalizing your approach to cover RowCompareExpr
> cases is what I would do, if I were you, and this was my patch. That's
> almost exactly what I'm doing with my own patch already, in fact.
It's all right. I understand your position)

I also agree to try to find other optimization cases and generalize them.

I read the wiki article, and as I understand it, in such a situation we 
see a difficult problem with finding expressions that need to be 
converted into a logically correct expression and simplify execution for 
the executor. For example, this is a ROW type. It can have a simpler 
expression with AND and OR operations, besides we can exclude 
duplicates. But some of these transformations may be incorrect or they 
will have a more complex representation. We can try to find the 
problematic expressions and try to combine them into groups and finally 
find a solutions for each groups or, conversely, discover that the 
existing transformation is uncorrected. If I understand correctly, we 
should first start searching for "ROW" expressions (define a group for 
them) and think about a solution for the group.
>> It seems to me the most difficult thing is to notice problematic cases
>> where the transformations are incorrect, but I think it can be implemented.
> Right. To be clear, I am sure that it won't be practical to come up
> with a 100% theoretically pure approach. If for no other reason than
> this: normalizing to CNF in all cases will run into problems with very
> complex predicates. It might even be computationally intractable
> (could just be very slow). So there is a clear need to keep
> theoretical purity in balance with practical considerations. There is
> a need for some kind of negotiation between those two things. Probably
> some set of heuristics will ultimately be required to keep costs and
> benefits in balance.
>
> I don't expect you to determine what set of heuristics will ultimately
> be required to determine when and how to perform CNF conversions, in
> the general case. But having at least some vague idea seems like it
> might increase confidence in your design.
I agree, but I think this will be the second step after solutions are 
found.
> Do you think that this problem is just an accidental side-effect? It
> isn't necessarily the responsibility of your patch to fix such things.
> If it's even possible for selectivity estimates to change, then it's
> already certain that sometimes they'll be worse than before -- if only
> because of chance interactions. The optimizer is often right for the
> wrong reasons, and wrong for the right reasons -- we cannot really
> expect any patch to completely avoid problems like that.
To be honest, I tried to fix it many times by calling the function to 
calculate selectivity, and each time the result of the estimate did not 
change. I didn't have any problems in this part after moving the 
transformation to the parsing stage. I even tried to perform this 
transformation at the planning stage (to the preprocess_qual_conditions 
function), but I ran into the same problem there as well.

To tell the truth, I think I'm ready to investigate this problem again 
(maybe I'll be able to see it differently or really find that I missed 
something in previous times).

-- 
Regards,
Alena Rybakina
Postgres Professional




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tatsuo Ishii
Дата:
Сообщение: Re: Using defines for protocol characters
Следующее
От: Daniel Gustafsson
Дата:
Сообщение: Re: [PoC] Implementation of distinct in Window Aggregates