Re: Support tid range scan in parallel?

Поиск

Список

Период

Сортировка

От	David Rowley
Тема	Re: Support tid range scan in parallel?
Дата	29 июля 08:11:19
Msg-id	CAApHDvoyxYkH5eg_RVxfz+mVt+eWgHvB25xy-z=Xs3nLhU+RLQ@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Support tid range scan in parallel? (Cary Huang <cary.huang@highgo.ca>)
Список	pgsql-hackers

Дерево обсуждения

On Fri, 25 Jul 2025 at 06:46, Cary Huang <cary.huang@highgo.ca> wrote:
> Thank you so much for the review! I have addressed the comments in the
> attached v7 patch.

I've now spent quite a bit of time going over this patch and testing
it.  One issue I found was in heap_set_tidrange() where you were not
correctly setting the scan limits for the "if
(ItemPointerCompare(&highestItem, &lowestItem) < 0)" case. Through a
bit of manually overwriting the planner's choice using the debugger, I
could get the executor to read an entire table despite the tid range
being completely empty. I likely could have got this to misbehave
without the debugger if I'd used PREPAREd statements and made the
ctids parameters to that. It's just the planner didn't choose a
parallel plan with an empty TID range due to the costs being too low.

For the record, here's the unpatched output below:

# explain (analyze) select count(*) from parallel_tidrangescan where
ctid >= '(10,0)' and ctid <= '(9,10)';
 Aggregate  (cost=2.00..2.01 rows=1 width=8) (actual
time=18440.132..18440.174 rows=1.00 loops=1)
   Buffers: shared read=40
   ->  Gather  (cost=0.01..2.00 rows=1 width=0) (actual
time=18440.126..18440.166 rows=0.00 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared read=40
         ->  Parallel Tid Range Scan on parallel_tidrangescan
(cost=0.01..2.00 rows=1 width=0) (actual time=2414.495..2414.495
rows=0.00 loops=3)
               TID Cond: ((ctid >= '(10,0)'::tid) AND (ctid <= '(9,10)'::tid))
               Buffers: shared read=40

Note the "Buffers: shared read=40", which is this entire table. After
moving the code which sets the ParallelBlockTableScanDesc's limits
into heap_setscanlimits(), I get:

# explain (analyze) select count(*) from parallel_tidrangescan where
ctid >= '(10,0)' and ctid <= '(9,10)';
 Aggregate  (cost=2.00..2.01 rows=1 width=8) (actual
time=17.787..19.531 rows=1.00 loops=1)
   ->  Gather  (cost=0.01..2.00 rows=1 width=0) (actual
time=17.783..19.527 rows=0.00 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Parallel Tid Range Scan on parallel_tidrangescan
(cost=0.01..2.00 rows=1 width=0) (actual time=0.003..0.004 rows=0.00
loops=3)
               TID Cond: ((ctid >= '(10,0)'::tid) AND (ctid <= '(9,10)'::tid))

I'm now trying to convince myself that it's safe to adjust the
ParallelBlockTableScanDesc fields in heap_setscanlimits(). These
fields are being adjusted during the call to TidRangeNext() via
table_rescan_tidrange(), which is *during* execution, so there could
be any number of parallel workers doing this concurrently. I'm unsure
at this stage if all those workers want to be using the same scan
limits, either.

Currently, I think the above is a problem and it doesn't quite feel
like committer duty to fix this part. There's a chance I may get more
time, but if I don't, I've attached your v7 patch plus the adjustments
I've made to it so far.

David

Вложения

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Support tid range scan in parallel?

Вложения