Re: Support tid range scan in parallel?
От | David Rowley |
---|---|
Тема | Re: Support tid range scan in parallel? |
Дата | |
Msg-id | CAApHDvoyxYkH5eg_RVxfz+mVt+eWgHvB25xy-z=Xs3nLhU+RLQ@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Support tid range scan in parallel? (Cary Huang <cary.huang@highgo.ca>) |
Список | pgsql-hackers |
On Fri, 25 Jul 2025 at 06:46, Cary Huang <cary.huang@highgo.ca> wrote: > Thank you so much for the review! I have addressed the comments in the > attached v7 patch. I've now spent quite a bit of time going over this patch and testing it. One issue I found was in heap_set_tidrange() where you were not correctly setting the scan limits for the "if (ItemPointerCompare(&highestItem, &lowestItem) < 0)" case. Through a bit of manually overwriting the planner's choice using the debugger, I could get the executor to read an entire table despite the tid range being completely empty. I likely could have got this to misbehave without the debugger if I'd used PREPAREd statements and made the ctids parameters to that. It's just the planner didn't choose a parallel plan with an empty TID range due to the costs being too low. For the record, here's the unpatched output below: # explain (analyze) select count(*) from parallel_tidrangescan where ctid >= '(10,0)' and ctid <= '(9,10)'; Aggregate (cost=2.00..2.01 rows=1 width=8) (actual time=18440.132..18440.174 rows=1.00 loops=1) Buffers: shared read=40 -> Gather (cost=0.01..2.00 rows=1 width=0) (actual time=18440.126..18440.166 rows=0.00 loops=1) Workers Planned: 2 Workers Launched: 2 Buffers: shared read=40 -> Parallel Tid Range Scan on parallel_tidrangescan (cost=0.01..2.00 rows=1 width=0) (actual time=2414.495..2414.495 rows=0.00 loops=3) TID Cond: ((ctid >= '(10,0)'::tid) AND (ctid <= '(9,10)'::tid)) Buffers: shared read=40 Note the "Buffers: shared read=40", which is this entire table. After moving the code which sets the ParallelBlockTableScanDesc's limits into heap_setscanlimits(), I get: # explain (analyze) select count(*) from parallel_tidrangescan where ctid >= '(10,0)' and ctid <= '(9,10)'; Aggregate (cost=2.00..2.01 rows=1 width=8) (actual time=17.787..19.531 rows=1.00 loops=1) -> Gather (cost=0.01..2.00 rows=1 width=0) (actual time=17.783..19.527 rows=0.00 loops=1) Workers Planned: 2 Workers Launched: 2 -> Parallel Tid Range Scan on parallel_tidrangescan (cost=0.01..2.00 rows=1 width=0) (actual time=0.003..0.004 rows=0.00 loops=3) TID Cond: ((ctid >= '(10,0)'::tid) AND (ctid <= '(9,10)'::tid)) I'm now trying to convince myself that it's safe to adjust the ParallelBlockTableScanDesc fields in heap_setscanlimits(). These fields are being adjusted during the call to TidRangeNext() via table_rescan_tidrange(), which is *during* execution, so there could be any number of parallel workers doing this concurrently. I'm unsure at this stage if all those workers want to be using the same scan limits, either. Currently, I think the above is a problem and it doesn't quite feel like committer duty to fix this part. There's a chance I may get more time, but if I don't, I've attached your v7 patch plus the adjustments I've made to it so far. David
Вложения
В списке pgsql-hackers по дате отправления: