Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan

Поиск
Список
Период
Сортировка
От Matthias van de Meent
Тема Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan
Дата
Msg-id CAEze2WhRRnqeCjTZ3peSSdYVH5vc=o1V_hwRFqbjjNggpgDx4Q@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan  (Peter Geoghegan <pg@bowt.ie>)
Ответы Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan  (Peter Geoghegan <pg@bowt.ie>)
Список pgsql-hackers
On Wed, 26 Jul 2023 at 15:42, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Wed, Jul 26, 2023 at 5:29 AM Matthias van de Meent
> > I'm not sure I understand. MDAM seems to work on an index level to
> > return full ranges of values, while "skip scan" seems to try to allow
> > systems to signal to the index to skip to some other index condition
> > based on arbitrary cutoffs. This would usually be those of which the
> > information is not stored in the index, such as "SELECT user_id FROM
> > orders GROUP BY user_id HAVING COUNT(*) > 10", where the scan would go
> > though the user_id index and skip to the next user_id value when it
> > gets enough rows of a matching result (where "enough" is determined
> > above the index AM's plan node, or otherwise is impossible to
> > determine with only the scan key info in the index AM). I'm not sure
> > how this could work without specifically adding skip scan-related
> > index AM functionality, and I don't see how it fits in with this
> > MDAM/SAOP system.
>
> I think of that as being quite a different thing.
>
> Basically, the patch that added that feature had to revise the index
> AM API, in order to support a mode of operation where scans return
> groupings rather than tuples. Whereas this patch requires none of
> that. It makes affected index scans as similar as possible to
> conventional index scans.

Hmm, yes. I see now where my confusion started. You called it out in
your first paragraph of the original mail, too, but that didn't help
me then:

The wiki does not distinguish "Index Skip Scans" and "Loose Index
Scans", but these are not the same.

In the one page on "Loose indexscan", it refers to MySQL's "loose
index scan" documentation, which does handle groupings, and this was
targeted with the previous, mislabeled, "Index skipscan" patchset.
However, crucially, it also refers to other databases' Index Skip Scan
documentation, which document and implement this approach of 'skipping
to the next potential key range to get efficient non-prefix qual
results', giving me a false impression that those two features are one
and the same when they are not.

It seems like I'll have to wait a bit longer for the functionality of
Loose Index Scans.

> > > [...]
> > >
> > > Thoughts?
> >
> > MDAM seems to require exponential storage for "scan key operations"
> > for conditions on N columns (to be precise, the product of the number
> > of distinct conditions on each column); e.g. an index on mytable
> > (a,b,c,d,e,f,g,h) with conditions "a IN (1, 2) AND b IN (1, 2) AND ...
> > AND h IN (1, 2)" would require 2^8 entries.
>
> Note that I haven't actually changed anything about the way that the
> state machine generates new sets of single value predicates -- it's
> still just cycling through each distinct set of array keys in the
> patch.
>
> What you describe is a problem in theory, but I doubt that it's a
> problem in practice. You don't actually have to materialize the
> predicates up-front, or at all.

Yes, that's why I asked: The MDAM paper's examples seem to materialize
the full predicate up-front, which would require a product of all
indexed columns' quals in size, so that materialization has a good
chance to get really, really large. But if we're not doing that
materialization upfront, then there is no issue with resource
consumption (except CPU time, which can likely be improved with other
methods)

Kind regards,

Matthias van de Meent
Neon (https://neon.tech/)



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Kuwamura Masaki
Дата:
Сообщение: Re: Issue in postgres_fdw causing unnecessary wait for cancel request reply
Следующее
От: David Geier
Дата:
Сообщение: Re: Let's make PostgreSQL multi-threaded