Re: POC, WIP: OR-clause support for indexes
От | Peter Geoghegan |
---|---|
Тема | Re: POC, WIP: OR-clause support for indexes |
Дата | |
Msg-id | CAH2-WzkzzDK7FPEv7M5GL_jo0E81DDG41t7wHU9SOAwnqwX=eQ@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: POC, WIP: OR-clause support for indexes (Alena Rybakina <lena.ribackina@yandex.ru>) |
Ответы |
Re: POC, WIP: OR-clause support for indexes
|
Список | pgsql-hackers |
On Tue, Jun 27, 2023 at 6:19 AM Alena Rybakina <lena.ribackina@yandex.ru> wrote: > I learned something new from your letter, thank you very much for that! Cool. The MDAM paper is also worth a read: https://vldb.org/conf/1995/P710.PDF Some of the techniques it describes are already in Postgres. With varying degrees of maturity. The paper actually mentions OR optimization at one point, under "Duplicate Elimination". The general idea is that ScalarArrayOpExpr execution can "eliminate duplicates before the data is read". The important underlying principle is that it can be really useful to give the B-Tree code the context it requires to be clever about stuff like that. We can do this by (say) using one ScalarArrayOpExpr, rather than using two or more index scans that the B-Tree code will treat as independent things. So a lot of the value in your patch comes from the way that it can enable other optimizations (the immediate benefits are also nice). In the past, OR optimizations have been prototyped that were later withdrawn/rejected because the duplicate elimination aspect was...too scary [1]. It's very easy to see that ScalarArrayOpExpr index scans don't really have the same problem. "Giving the B-Tree code the required context" helps here too. > I analyzed the buffer consumption when I ran control regression tests using my patch. diff shows me that there is no differencebetween the number of buffer block scans without and using my patch, as far as I have seen. (regression.diffs) To be clear, I wasn't expecting that there'd be any regressions from your patch. Intuitively, it seems like this optimization should make the query plan do almost the same thing at execution time -- just slightly more efficiently on average, and much more efficiently in some individual cases. It would probably be very hard for the optimizer to model/predict how much work it can save by using a ScalarArrayOpExpr instead of an "equivalent" set of bitmap index scans, OR'd together. But it doesn't necessarily matter -- the only truly critical detail is understanding the worst case for the transformation optimization. It cannot be too bad (maybe it's ~zero added runtime overhead relative to not doing the transformation, even?). At the same time, nbtree can be clever about ScalarArrayOpExpr execution at runtime (once that's implemented), without ever needing to make any kind of up-front commitment to navigating through the index in any particular way. It's all dynamic, and can be driven by the actual observed characteristics of the index structure. In other words, we don't really need to gamble (in the planner, or at execution time). We're just keeping our options open in more cases. (My thinking on these topics was influenced by Goetz Graefe -- "choice is confusion" [2]). [1] https://www.postgresql.org/message-id/flat/1397.1486598083%40sss.pgh.pa.us#310f974a8dc84478d6d3c70f336807bb [2] https://sigmodrecord.org/publications/sigmodRecord/2009/pdfs/05_Profiles_Graefe.pdf -- Peter Geoghegan
В списке pgsql-hackers по дате отправления: