On Fri, 8 Sept 2023 at 19:14, Richard Guo <guofenglinux@gmail.com> wrote:
> explain select * from partsupp join lineitem on l_partkey > ps_partkey;
> QUERY PLAN
> --------------------------------------------------------------------------------------
> Gather (cost=0.00..1807085.44 rows=160466667 width=301)
> Workers Planned: 4
> -> Nested Loop (cost=0.00..1807085.44 rows=40116667 width=301)
> Join Filter: (lineitem.l_partkey > partsupp.ps_partkey)
> -> Parallel Seq Scan on lineitem (cost=0.00..1518.44 rows=15044 width=144)
> -> Materialize (cost=0.00..307.00 rows=8000 width=157)
> -> Seq Scan on partsupp (cost=0.00..267.00 rows=8000 width=157)
> (7 rows)
>
> The execution time (ms) are (avg of 3 runs):
>
> unpatched: 71769.21
> patched: 65510.04
This gap would be wider if the partsupp Seq Scan were filtering off
some rows and wider still if you added more rows to lineitem.
However, a clauseless seqscan is not the most compelling use case
below a material node. The inner side of the nested loop could be some
subquery that takes 6 days to complete. Running the 6 day query ~15044
times seems like something that would be good to avoid.
It seems worth considering Material paths to me. I think that the
above example could be tuned any way you like to make it look better
or worse.
David