Re: POC, WIP: OR-clause support for indexes

Поиск
Список
Период
Сортировка
От Alena Rybakina
Тема Re: POC, WIP: OR-clause support for indexes
Дата
Msg-id 08f5ff34-1497-2123-8701-461a299035c9@yandex.ru
обсуждение исходный текст
Ответ на Re: POC, WIP: OR-clause support for indexes  (Peter Geoghegan <pg@bowt.ie>)
Ответы Re: POC, WIP: OR-clause support for indexes  (Peter Geoghegan <pg@bowt.ie>)
Список pgsql-hackers
Hi!

On 26.07.2023 02:47, Peter Geoghegan wrote:
> On Thu, Jun 29, 2023 at 2:32 AM Alena Rybakina <lena.ribackina@yandex.ru> wrote:
>> Hi! I'm sorry I didn't answer you right away, I was too busy with work.
> Same for me, this time. I was busy working on my patch, which I
> finally posted yesterday.
I'm glad to hear it, I've seen your thread ("Optimizing nbtree 
ScalarArrayOp execution, allowing multi-column ordered scans, skip 
scan"), but, unfortunately, I didn't have enough time to read it. I'll 
review it soon!
>> To be honest, I didn't think about the fact that my optimization can help eliminate duplicates before reading the
databefore.
 
> I'm not surprised that you didn't specifically think of that, because
> it's very subtle.
>
>> I am still only in the process of familiarizing myself with the thread [1] (reference from your letter), but I have
alreadyseen that there are problems related, for example, to when "or" expressions refer to the parent element.
 
> I didn't intend to imply that you might have the same problem here. I
> just meant that OR optimizations can have problems with duplicate
> elimination, in general. I suspect that your patch doesn't have that
> problem, because you are performing a transformation of one kind of OR
> into another kind of OR.

Yes, you are right, but I studied this topic and two other sources to 
accumulate my knowledge. It was an exciting experience for me)

I was especially inspired after studying the interview with Goetz Graf 
[2], his life experience is the most inspiring, and from this article I 
was able to get a broad understanding of the field of databases:
current problems, future development, how it works... Thank you for the 
recommendation.

I discovered for myself that the idea described in the article [1] is 
similar to the idea of representing grouped data in OLAP cubes, and 
also, if I saw correctly, an algorithm like depth-first search is used 
there, but for indexes.

I think it really helps to speed up the search with similar deep 
filtering compared to cluster indexes, but do we have cases where we 
don't use this algorithm because it takes longer than an usual index?
I thought about the situation with wide indexes (with a lot of multiple 
columns) and having a lot of filtering predicates for them.
But I'm not sure about this, so it seems to me that this is a problem of 
improper use of indexes rather.
>> I think, I would face the similar problems if I complicate the current code, for example, so that not only or
expressionsstanding on the same level are written in any, but also on different ones without violating the logic of the
priorityof executing operators.
 
> I can't say that I am particularly experienced in this general area --
> I have never tried to formally reason about how two different
> statements are equivalent. It just hasn't been something that I've
> needed to have a totally rigorous understanding of up until now. But
> my recent patch changes that. Now I need to study this area to make
> sure that I have a truly rigorous understanding.
>
> Jeff Davis suggested that I read "Logic and Databases", by C.J. Date.
> So now I have some homework to do.
I'll read this book too. Maybe I can finish work with the knowledge I 
got from there. Thank you for sharing!
>> Unfortunately, when I tried to make a transformation at the stage of index formation, I encountered too incorrect an
assessmentof the selectivity of relation, which affected the incorrect calculation of the cost and cardinality.
 
> It's not surprising that a weird shift in the plan chosen by the
> optimizer is seen with some random test case, as a result of this
> added transformation. Even changes that are 100% strictly better (e.g.
> changes in a selectivity estimation function that is somehow
> guaranteed to be more accurate in all cases) might do that. Here is a
> recent example of that with another patch, involving a bitmap OR:
>
> https://postgr.es/m/CAH2-WznCDK9n2tZ6j_-iLN563_ePuC3NzP6VSVTL6jHzs6nRuQ@mail.gmail.com
At first, this surprised me very much. It took time to find a suitable 
place to implement the transformation.

I have looked through this thread many times, I will study it in more 
detail .
> This example was *not* a regression, if you go by conventional
> measures. It was merely a less robust plan than the bitmap OR plan,
> because it didn't pass down both columns as index quals.
>
> BTW, there are various restrictions on the sort order of SAOPs that
> you might want to try to familiarize yourself with. I describe them
> (perhaps not very clearly) here:
>
> https://postgr.es/m/CAH2-Wz=ksvN_sjcnD1+Bt-WtifRA5ok48aDYnq3pkKhxgMQpcw@mail.gmail.com
Thank you! Yes, I'll study it too)
> Currently, the optimizer doesn't recognize multi-column indexes with
> SAOPs on every column as having a valid sort order, except on the
> first column. It seems possible that that has consequences for your
> patch. (I'm really only guessing, though; don't trust anything that I
> say about the optimizer too much.)
>
Honestly, I couldn't understand your concerns very well, could you 
describe it in more detail?

1. https://vldb.org/conf/1995/P710.PDF

2. 
https://sigmodrecord.org/publications/sigmodRecord/2009/pdfs/05_Profiles_Graefe.pdf

-- 
Regards,
Alena Rybakina
Postgres Professional




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Nathan Bossart
Дата:
Сообщение: Re: psql: Could we get "-- " prefixing on the **** QUERY **** outputs? (ECHO_HIDDEN)
Следующее
От: Gurjeet Singh
Дата:
Сообщение: Re: There should be a way to use the force flag when restoring databases