Re: POC, WIP: OR-clause support for indexes

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: POC, WIP: OR-clause support for indexes
Дата
Msg-id CAPpHfdsAWGF2VvcszNofc32Sa2pefBpgY1nw2aYV=FTG7Q3wyA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: POC, WIP: OR-clause support for indexes  (Andrei Lepikhov <a.lepikhov@postgrespro.ru>)
Ответы Re: POC, WIP: OR-clause support for indexes  (Andrei Lepikhov <a.lepikhov@postgrespro.ru>)
Список pgsql-hackers
Hi Andrei,

Thank you for your response.

On Mon, Mar 11, 2024 at 7:13 AM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
> On 7/3/2024 21:51, Alexander Korotkov wrote:
> > Hi!
> >
> > On Tue, Mar 5, 2024 at 9:59 AM Andrei Lepikhov
> > <a.lepikhov@postgrespro.ru <mailto:a.lepikhov@postgrespro.ru>> wrote:
> >  > On 5/3/2024 12:30, Andrei Lepikhov wrote:
> >  > > On 4/3/2024 09:26, jian he wrote:
> >  > ... and the new version of the patchset is attached.
> >
> > I made some revisions for the patchset.
> Great!
> > 1) Use hash_combine() to combine hash values.
> Looks better
> > 2) Upper limit the number of array elements by MAX_SAOP_ARRAY_SIZE.
>
> I'm not convinced about this limit. The initial reason was to combine
> long lists of ORs into the array because such a transformation made at
> an early stage increases efficiency.
> I understand the necessity of this limit in the array decomposition
> routine but not in the creation one.

The comment near MAX_SAOP_ARRAY_SIZE says that this limit is because
N^2 algorithms could be applied to arrays.  Are you sure that's not
true for our case?

> > 3) Better save the original order of clauses by putting hash entries and
> > untransformable clauses to the same list.  A lot of differences in
> > regression tests output have gone.
> I agree that reducing the number of changes in regression tests looks
> better. But to achieve this, you introduced a hack that increases the
> complexity of the code. Is it worth it? Maybe it would be better to make
> one-time changes in tests instead of getting this burden on board. Or
> have you meant something more introducing the node type?

For me the reason is not just a regression test.  The current code
keeps the original order of quals as much as possible.  The OR
transformation code reorders quals even in cases when it doesn't
eventually apply any optimization.  I don't think that's acceptable.
However, less hackery ways for this is welcome for sure.

> > We don't make array values unique.  That might make query execution
> > performance somewhat worse, and also makes selectivity estimation
> > worse.  I suggest Andrei and/or Alena should implement making array
> > values unique.
> The fix Alena has made looks correct. But I urge you to think twice:
> The optimizer doesn't care about duplicates, so why do we do it?
> What's more, this optimization is intended to speed up queries with long
> OR lists. Using the list_append_unique() comparator on such lists could
> impact performance. I suggest sticking to the common rule and leaving
> the responsibility on the user's shoulders.

I don't see why the optimizer doesn't care about duplicates for OR
lists.  As I showed before in an example, it successfully removes the
duplicate.  So, currently OR transformation clearly introduces a
regression in terms of selectivity estimation.  I think we should
evade that.

> At least, we should do this optimization later, in one pass, with
> sorting elements before building the array. But what if we don't have a
> sort operator for the type?

It was probably discussed before, but can we do our work later?  There
is a canonicalize_qual() which calls find_duplicate_ors().  This is
the place where currently duplicate OR clauses are removed.  Could our
OR-to-ANY transformation be just another call from
canonicalize_qual()?

------
Regards,
Alexander Korotkov



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Dagfinn Ilmari Mannsåker
Дата:
Сообщение: Re: Using the %m printf format more
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: Have pg_basebackup write "dbname" in "primary_conninfo"?