Re: POC, WIP: OR-clause support for indexes
От | Alena Rybakina |
---|---|
Тема | Re: POC, WIP: OR-clause support for indexes |
Дата | |
Msg-id | bf504287-c3ac-e492-c6ea-31be60b2c92f@yandex.ru обсуждение исходный текст |
Ответ на | Re: POC, WIP: OR-clause support for indexes (Peter Geoghegan <pg@bowt.ie>) |
Ответы |
Re: POC, WIP: OR-clause support for indexes
(Peter Geoghegan <pg@bowt.ie>)
|
Список | pgsql-hackers |
On 02.08.2023 21:58, Peter Geoghegan wrote: > On Wed, Aug 2, 2023 at 8:58 AM Alena Rybakina <lena.ribackina@yandex.ru> wrote: >> No, I haven't thought about it yet. I studied the example and it would >> really be nice to add optimization here. I didn't notice any problems >> with its implementation. I also have an obvious example with the "or" >> operator, for example >> , select * from multi_test, where (a, b ) = ( 1, 1 ) or (a, b ) = ( 2, 1 >> ) ...; >> >> Although I think such a case will be used less often. > Right. As I said, I don't particularly care about the row constructor > syntax -- it's not essential. > > In my experience patches like this one that ultimately don't succeed > usually *don't* have specific problems that cannot be fixed. The real > problem tends to be ambiguity about the general high level design. So > more than anything else, ambiguity is the thing that you need to > minimize to be successful here. This is the #1 practical problem, by > far. This may be the only thing about your patch that I feel 100% sure > of. > > In my experience it can actually be easier to expand the scope of a > project, and to come up with a more general solution: > > https://en.wikipedia.org/wiki/Inventor%27s_paradox > > I'm not trying to make your work more difficult by expanding its > scope. I'm actually trying to make your work *easier* by expanding its > scope. I don't claim to know what the specific scope of your patch > should be at all. Just that it might be possible to get a much clearer > picture of what the ideal scope really is by *trying* to generalize it > further -- that understanding is what we lack right now. Even if this > exercise fails in some way, it won't really have been a failure. The > reasons why it fails will still be interesting and practically > relevant. > > As I explained to Jim, I am trying to put things in this area on a > more rigorous footing. For example, I have said that the way that the > nbtree code executes SAOP quals is equivalent to DNF. That is > basically true, but it's also my own slightly optimistic > interpretation of history and of the design. That's a good start, but > it's not enough on its own. > > My interpretation might still be wrong in some subtle way, that I have > yet to discover. That's really what I'm concerned about with your > patch, too. I'm currently trying to solve a problem that I don't yet > fully understand, so for me "getting a properly working flow of > information" seems like a good practical exercise. I'm trying to > generalize the design of my own patch as far as I can, to see what > breaks, and why it breaks. My intuition is that this will help me with > my own patch by forcing me to gain a truly rigorous understanding of > the problem. > > My suggestion about generalizing your approach to cover RowCompareExpr > cases is what I would do, if I were you, and this was my patch. That's > almost exactly what I'm doing with my own patch already, in fact. It's all right. I understand your position) I also agree to try to find other optimization cases and generalize them. I read the wiki article, and as I understand it, in such a situation we see a difficult problem with finding expressions that need to be converted into a logically correct expression and simplify execution for the executor. For example, this is a ROW type. It can have a simpler expression with AND and OR operations, besides we can exclude duplicates. But some of these transformations may be incorrect or they will have a more complex representation. We can try to find the problematic expressions and try to combine them into groups and finally find a solutions for each groups or, conversely, discover that the existing transformation is uncorrected. If I understand correctly, we should first start searching for "ROW" expressions (define a group for them) and think about a solution for the group. >> It seems to me the most difficult thing is to notice problematic cases >> where the transformations are incorrect, but I think it can be implemented. > Right. To be clear, I am sure that it won't be practical to come up > with a 100% theoretically pure approach. If for no other reason than > this: normalizing to CNF in all cases will run into problems with very > complex predicates. It might even be computationally intractable > (could just be very slow). So there is a clear need to keep > theoretical purity in balance with practical considerations. There is > a need for some kind of negotiation between those two things. Probably > some set of heuristics will ultimately be required to keep costs and > benefits in balance. > > I don't expect you to determine what set of heuristics will ultimately > be required to determine when and how to perform CNF conversions, in > the general case. But having at least some vague idea seems like it > might increase confidence in your design. I agree, but I think this will be the second step after solutions are found. > Do you think that this problem is just an accidental side-effect? It > isn't necessarily the responsibility of your patch to fix such things. > If it's even possible for selectivity estimates to change, then it's > already certain that sometimes they'll be worse than before -- if only > because of chance interactions. The optimizer is often right for the > wrong reasons, and wrong for the right reasons -- we cannot really > expect any patch to completely avoid problems like that. To be honest, I tried to fix it many times by calling the function to calculate selectivity, and each time the result of the estimate did not change. I didn't have any problems in this part after moving the transformation to the parsing stage. I even tried to perform this transformation at the planning stage (to the preprocess_qual_conditions function), but I ran into the same problem there as well. To tell the truth, I think I'm ready to investigate this problem again (maybe I'll be able to see it differently or really find that I missed something in previous times). -- Regards, Alena Rybakina Postgres Professional
В списке pgsql-hackers по дате отправления:
Следующее
От: Daniel GustafssonДата:
Сообщение: Re: [PoC] Implementation of distinct in Window Aggregates