RE: MDAM techniques and Index Skip Scan patch

Поиск
Список
Период
Сортировка
От Floris Van Nee
Тема RE: MDAM techniques and Index Skip Scan patch
Дата
Msg-id 2f26e1e475364ab7b23f466e4a4f4392@opammb0562.comp.optiver.com
обсуждение исходный текст
Ответ на MDAM techniques and Index Skip Scan patch  (Peter Geoghegan <pg@bowt.ie>)
Ответы Re: MDAM techniques and Index Skip Scan patch  (Julien Rouhaud <rjuju123@gmail.com>)
Список pgsql-hackers
Great to see some interest in the skip scan patch series again!

> Like many difficult patches, the skip scan patch is not so much troubled by
> problems with the implementation as it is troubled by
> *ambiguity* about the design. Particularly concerning how skip scan meshes
> with existing designs, as well as future designs -- particularly designs for
> other MDAM techniques. I've started this thread to have a big picture
> conversation about how to think about these things. Many other MDAM
> techniques also seem highly appealing.

I think it is good to have this discussion. In my opinion, Postgres could make really good use of some of the described
MDAMtechniques.
 

> Much of the MDAM stuff is for data warehousing use-cases, while skip
> scan/loose index scan is seen as more of an OLTP thing. But they are still
> related, clearly.

FWIW I think skip scan is very much data warehousing use-case related - hence why the TimescaleDb people in your [2]
referenceimplemented a simple form of it already for their extension. Skip scan is a really useful feature for large
datasets. However, I agree it is only one part of the bigger MDAM picture.
 

> 
> My general concern is that the skip scan patch may currently be structured in
> a way that paints us into a corner, MDAM-wise.
> 

One of the concerns I raised before was that the patch may be thinking too simplistically on some things, that would
makeit difficult to adopt more complex optimizations in the future. One concrete example can be illustrated by a
differentquery on the sales table of the paper's example:
 

SELECT DISTINCT dept, date WHERE item_class = 100

This should skip with prefix of (dept, date). Suppose we're at (dept, date) = (1, 2021-01-01) and it's skipping to the
nextprefix, the patch just implements what the MDAM paper describes as the 'probing' step. It finds the beginning of
thenext prefix. This could be for example (dept, date, item_class) = (1, 2021-01-02, 1). From there onwards, it would
justscan the index until it finds item_class=100. What it should do however, is to first 'probe' for the next prefix
valueand then skip directly to (1, 2021-01-02, 100) (skipping item_class 1-99 altogether). The problem if it doesn't
supportthis is that skip scan could have a quite unpredictable performance, because sometimes it'll end up going
throughmost of the index where it should be skipping.
 

A while ago, I spent quite some time trying to come up with an implementation that would work in this more general
case.The nice thing is that with such a more generic implementation, you get almost all the features from the MDAM
paperalmost for free. I focused on the executor code, not on the planner code - the planner code is for the DISTINCT
skippart very similar to the original patch and I hacked in a way to make it choose a 'skip scan' also for non-DISTINCT
queriesfor testing purposes. For this discussion about MDAM, the planner part is less relevant though. There's still a
lotof discussion and work on the planner-side too, but I think that is completely independent from each other.
 

The more generic patch I originally posted in [1], together with some technical considerations. That was quite a while
agoso it obviously doesn't apply anymore on master. Therefore, I've attached a rebased version. Unfortunately, it's
stillbased on an older version of the UniqueKeys patch - but since that patch is all planner machinery as well, it
doesn'tmatter so much for the discussion about the executor code either.
 

I believe if we want something that fits better with future MDAM use cases, we should take a closer look at the
executorcode of this patch to drive this discussion. The logic is definitely more complex than the original patch,
howeverI believe it is also more flexible and future-proof.
 

> 
> Another more concrete concern about the patch series comes from the
> backwards scan stuff. This is added by a later patch in the patch series, "v39-
> 0004-Extend-amskip-implementation-for-Btree.patch". It strikes me as a bad
> thing that we cannot just do leaf-page-at-a-time processing, without usually
> needing to hold a pin on the leaf page.
> After all, ordinary backwards scans manage to avoid that today, albeit by
> using trickery inside _bt_walk_left(). MDAM-style "Maintenance of Index
> Order" (as described on page 8) seems like a good goal for us here. I don't
> like the idea of doing ad-hoc duplicate TID elimination inside nbtree, across
> calls made from the executor (whether it's during backwards skip scans, or at
> any other time). Not because it seems to go against the approach taken by
> the MDAM paper (though it does); just because it seems kludgy. (I think that
> Tom felt the same way about the TID deduplication stuff in his own patch
> back in 2018,
> too.)

It's good to mention that the patch I attached does proper 'leaf-page-at-a-time' processing, so it avoids the problem
youdescribe with v39. It is implemented instead in the same way as a "regular" index scan - we process the full leaf
pageand store the matched tuples in the local state. If a DISTINCT scan wants to do a skip, we check our local state
firstif that skipping would be possible with the matched tuples from the current page. Then we avoid double work and
alsothe need to look at the same page again.
 

> Open question: What does all of this MDAM business mean for
> ScalarArrayOpExpr, if anything?
> 

This is a really interesting combination actually. I think, ideally, you'd probably get rid of it and provide full
supportfor that with the 'skip' based approach (essentially the ScalarArrayOpExpr seems to do some form of skipping
already- it transforms x IN (1,2,3) into 3 separate index scans for x).
 
However, even without doing any work on it, it actually interacts nicely with the skip based approach.

As an example, here's some queries based on the 'sales' table of the paper with some data in it (18M rows total, see
sales_query.sqlattachment for the full example):
 

-- terminology from paper: "intervening range predicate"
select date, sum(total_sales)
from sales
where dept between 2 and 3 and date between '2021-01-05' and '2021-01-10' and item_class=20 and store=250
group by dept, date
;
Patch: Execution Time: 0.368 ms
Master: Execution Time: 416.792 ms

-- terminology from paper: "missing key predicate"
select date, sum(total_sales)
from sales
where date between '2021-01-05' and '2021-01-10' and item_class=20 and store=250
group by dept, date
;
Patch: Execution Time: 0.667 ms
Master: Execution Time: 654.684 ms

-- terminology from paper: "IN lists" 
-- this is similar to the query from your example with an IN list
-- in the current patch, this query is done with a skip scan with skip prefix (dept, date) and then the ScalarOpArray
foritem_class=(20,30,50)
 
select date, sum(total_sales)
from sales
where date between '2021-01-05' and '2021-01-10' and item_class in (20, 30, 50) and store=250
group by dept, date
;
Patch: Execution Time: 1.767 ms
Master: Execution Time: 629.792 ms

The other mentioned MDAM optimizations in the paper (NOT =, general OR) are not implemented but don't seem to be
conflictingat all with the current implementation - they seem to be just a matter of transforming the expressions into
theright form.
 

From the patch series above, v9-0001/v9-0002 is the UniqueKeys patch series, which focuses on the planner. v2-0001 is
Dmitry'spatch that extends it to make it possible to use UniqueKeys for the skip scan. Both of these are unfortunately
stillolder patches, but because they are planner machinery they are less relevant to the discussion about the executor
here.
Patch v2-0002 contains most of my work and introduces all the executor logic for the skip scan and hooks up the planner
forDISTINCT queries to use the skip scan.
 
Patch v2-0003 is a planner hack that makes the planner pick a skip scan on virtually every possibility. This also
enablesthe skip scan on the queries above that don't have a DISTINCT but where it could be useful.
 


-Floris

[1] https://www.postgresql.org/message-id/c5c5c974714a47f1b226c857699e8680@opammb0561.comp.optiver.com




Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Mikhail
Дата:
Сообщение: Re: [PATCH] Make ENOSPC not fatal in semaphore creation
Следующее
От: Jeff Davis
Дата:
Сообщение: Allow pg_signal_backend members to use pg_log_backend_memory_stats().