Re: POC, WIP: OR-clause support for indexes
От | Peter Geoghegan |
---|---|
Тема | Re: POC, WIP: OR-clause support for indexes |
Дата | |
Msg-id | CAH2-Wz=9N_4+EyhtyFqYQRx4OgVbP+1aoYU2JQPVogCir61ZEQ@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: POC, WIP: OR-clause support for indexes ("Finnerty, Jim" <jfinnert@amazon.com>) |
Список | pgsql-hackers |
Jim, On Tue, Aug 1, 2023 at 1:11 PM Finnerty, Jim <jfinnert@amazon.com> wrote: > Peter, I'm very glad to hear that you're researching this! Glad to hear it! > Will this include skip-scan optimizations for OR or IN predicates, or when the number of distinct values in a leading non-constantindex column(s) is sufficiently small? Yes -- though perhaps not in the first iteration. As I go into on the thread associated with my own patch [1], my initial goal is to support efficient execution of multiple IN() lists for multiple columns from the same index, all while preserving index sort order on output, and avoiding a separate duplicate elimination step. Some of the most compelling cases for these MDAM techniques involve GroupAggregates, ORDER BY ... LIMIT, and DISTINCT -- I understand the importance of making the index scan appear to be a conventional index scan to the optimizer. > If you have an ORDER BY clause and a lower and upper bound on the first column of the ORDER BY list, you have a potentialto reduce search effort versus a full index scan, even when that upper and lower bound needs to be derived froma complex predicate. It sounds like your example is an attempt to ascertain whether or not my design considers the need to convert complicated predicates into disjuncts that can be executed as if by one single index scan, via CNF -> DNF transformations/preprocessing. That is certainly the plan, at least medium term -- I fully expect to be able to combine all of these techniques together, in ways that continue to work even with very complicated predicates. Like the really hairy example from the MDAM paper, or like your example. There are already some nbtree scan key preprocessing steps a little like the ones considered by the MDAM paper. These steps eliminate redundant and contradictory quals -- but they weren't specifically written with the very general MDAM DNF design requirements in mind. Plus there are already at least some transformations like the one that Alena is working on in the patch discussed on this thread -- these were also not written with MDAM stuff in mind. A major goal of mine for this project in the short term is to come up with a very general design. I must reconcile all this stuff, somehow or other, so that these very complicated cases will work just as well as simpler and more obvious cases. I really hate special cases. > Of course, if you have an IN list you can either skip to the distinct values listed or scan the entire index, dependingon estimated cost. Actually, I think that it should be possible to decide on how to skip dynamically, without needing an up-front decision around skipping from the optimizer. In other words, the scans can skip using an adaptive strategy. This is feasible provided I can make the overhead of a dynamic/adaptive approach negligible. When it turns out that a full index scan is appropriate, we'll just end up doing it that way at runtime. Nothing stops a given scan from needing to do skip a great deal in the first half of an index, while scanning everything in the second half of the index. Obviously, a static choice won't do well there, since it works at the level of the whole scan/index, which seems like the wrong framing to me. (Of course we'll still need to model skipping stuff in the planner -- just not so that we can decide between two index paths that are essentially identical, that should just be one index path.) [1] https://postgr.es/m/CAH2-Wz=ksvN_sjcnD1+Bt-WtifRA5ok48aDYnq3pkKhxgMQpcw@mail.gmail.com -- Peter Geoghegan
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Daniel GustafssonДата:
Сообщение: Re: bug: ANALYZE progress report with inheritance tables