Обсуждение: Teaching planner to short-circuit empty UNION/EXCEPT/INTERSECT inputs
In [1] there was a report that set operations didn't correctly detect when inputs were provably empty sets. While this is not the bug that the report claimed it to be, as it's just a missing optimisation, I did decide to look at it to check if there was much performance to gain from doing this. The short of it is, Yes, there are cases when this can help query performance. Primarily, this seems to come from when the code detects that an EXCEPT ALL has an empty right-hand input. In this case, we can scan the left-hand input and forego the SetOp node completely. With EXCEPT (without ALL), deduplication is still required, however that can be done with an Aggregate node on the left input rather than using the slightly less efficient SetOp node. If I create two tables with a single int column containing 1 million rows each, ANALYZE them and run some queries with and without the patch, I see: (work_mem=256MB, pgbench -M simple -T 10) master: EXCEPT ALL left dummy : tps = 8466.587802 EXCEPT ALL right dummy : tps = 3.160083 EXCEPT left dummy : tps = 8433.607519 EXCEPT right dummy : tps = 3.178104 INTERSECT (all types) : tps = 8392.695606 UNION left dummy : tps = 3.406355 patched: EXCEPT ALL left dummy : tps = 8973.958896 (+5.99%) EXCEPT ALL right dummy : tps = 53.583312 (+1595.63%) EXCEPT left dummy : tps = 8736.716176 (+3.59%) EXCEPT right dummy : tps = 3.385520 (+6.53%) INTERSECT (all types) : tps = 8759.123942 (+4.37%) UNION left dummy : tps = 3.590264 (+5.40%) You can see EXCEPT ALL with the empty right-hand input became ~15x faster, and all the others became ~5% faster. There are some additional benefits aside from the performance as it's possible to provide better row estimates in certain cases. For example, if a UNION query removes all apart from 1 input, we can do estimate_num_groups() on that input. Otherwise, we're left to the assumption that all rows are unique, which certainly could cause some trouble later in planning for queries consuming the results of set operations in subqueries. EXCEPT with an empty right-hand input also benefits from improved row estimates for the same reason. For me, this seems worth doing. Set operations have been drawn out of the dark ages with the last few releases, and I feel this makes them more aligned to the set of optimisations we've come to expect in other parts of the planner. I'm happy to hear other opinions. Patch attached. David [1] https://postgr.es/m/18904-c5fea7892f4d26ed@postgresql.org
Вложения
David Rowley <dgrowleyml@gmail.com> writes:
> In [1] there was a report that set operations didn't correctly detect
> when inputs were provably empty sets. While this is not the bug that
> the report claimed it to be, as it's just a missing optimisation, I
> did decide to look at it to check if there was much performance to
> gain from doing this.
I'm kind of resistant to the amount of code this patch adds in
comparison to the likely benefit. Sure, a badly written query can
profit, but is it worth debugging and maintaining a couple hundred
lines of code for that?
The first few hunks of changes seem fine by this light, but I think
you're expending too much effort on the EXCEPT-with-dummy-inputs
cases. I'm wondering if it could be shortened a great deal by
handling left-input-dummy and EXCEPT-ALL-with-right-input-dummy
but leaving the EXCEPT-with-right-input-dummy case unimproved.
regards, tom lane
On Thu, 2 Oct 2025 at 16:21, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I'm wondering if it could be shortened a great deal by > handling left-input-dummy and EXCEPT-ALL-with-right-input-dummy > but leaving the EXCEPT-with-right-input-dummy case unimproved. Good idea. Less code and still get to keep the one that did well in the benchmark. See attached. I ended up splitting the patch in two. 0001 for UNION only, then 0002 for the INTERSECT and EXCEPT. David
Вложения
Hi,
David Rowley <dgrowleyml@gmail.com>于2025年10月2日 周四20:09写道:
On Thu, 2 Oct 2025 at 16:21, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm wondering if it could be shortened a great deal by
> handling left-input-dummy and EXCEPT-ALL-with-right-input-dummy
> but leaving the EXCEPT-with-right-input-dummy case unimproved.
Good idea. Less code and still get to keep the one that did well in
the benchmark. See attached.
I ended up splitting the patch in two. 0001 for UNION only, then 0002
for the INTERSECT and EXCEPT.
David
It seems that the optimization for `UNION ALL` is already implemented in the patch: it removes empty sub-paths and preserves the remaining ones.
Should we add a test case to formally validate this behavior like Union cases?
David Rowley <dgrowleyml@gmail.com> writes:
> Good idea. Less code and still get to keep the one that did well in
> the benchmark. See attached.
0001's change in is_dummy_rel() seems like a complete hack, especially
since you didn't bother to change the adjacent comment. Why is that
necessary?
v2 LGTM otherwise.
regards, tom lane
On Fri, 3 Oct 2025 at 04:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 0001's change in is_dummy_rel() seems like a complete hack, especially
> since you didn't bother to change the adjacent comment. Why is that
> necessary?
build_setop_child_paths() wraps the child inputs in SubqueryScanPaths,
so we need to see through those.
An alternative way would be to propagate those during build_setop_child_paths()
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -523,6 +523,13 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
bool is_sorted;
int presorted_keys;
+ /* If the input rel is dummy, propagate that to this query level */
+ if (is_dummy_rel(final_rel))
+ {
+ mark_dummy_rel(rel);
+ continue;
+ }
+
As attached.
> v2 LGTM otherwise.
Thanks
David
Вложения
David Rowley <dgrowleyml@gmail.com> writes:
> On Fri, 3 Oct 2025 at 04:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> 0001's change in is_dummy_rel() seems like a complete hack, especially
>> since you didn't bother to change the adjacent comment. Why is that
>> necessary?
> build_setop_child_paths() wraps the child inputs in SubqueryScanPaths,
> so we need to see through those.
Ah.
> An alternative way would be to propagate those during build_setop_child_paths()
That answer works for me. I was expecting you to just document the
need for the extra check in is_dummy_rel ;-) ... but this way is
perhaps better.
regards, tom lane
On Fri, 3 Oct 2025 at 02:55, Mingli Zhang <zmlpostgres@gmail.com> wrote: > It seems that the optimization for `UNION ALL` is already implemented in the patch: it removes empty sub-paths and preservesthe remaining ones. > Should we add a test case to formally validate this behavior like Union cases? If I were to do that, I'd have to come up with something that's flatten_simple_union_all() proof. Maybe something like varying types in the targetlist. I think it's probably not really worthwhile since it's not testing any new code that is not already being tested by the tests that I did add. David
David Rowley <dgrowleyml@gmail.com> writes:
> I'm just considering the best fix and can think of two options:
> 1) Move away from using varno==0 in generate_append_tlist(). Use varno==1, or;
> 2) Add handling in setrefs.c for T_Result to adjust varno==0 Vars to
> use varno==1 vars.
> The attached v4-0001 does #2, but wondering if #1 should be explored first.
I don't recall the details ATM, but if you poke around you will find
multiple comments complaining about how that varno-zero convention is
problematic or requires code to do something unusual. So I'd be in
favor of trying to get rid of it, but I'm not entirely sure what to do
instead, and the ramifications might be wider than you realize.
In particular it's not clear to me why varno==1 is better? As best
I can recall without diving into code, the fundamental mismatch is
that varno zero doesn't correspond to any RTE. It would be better
if the Vars matched the subquery RTEs that are at the base of the
set-operation nest, so that there were a useful referent as to where
a Var came from. Arbitrarily setting varno=1 sounds like the worst
case: we could neither identify a Var with a source subquery
accurately, nor realize that its varno is phony.
regards, tom lane
On Sun, 5 Oct 2025 at 04:43, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> In particular it's not clear to me why varno==1 is better? As best
> I can recall without diving into code, the fundamental mismatch is
> that varno zero doesn't correspond to any RTE. It would be better
> if the Vars matched the subquery RTEs that are at the base of the
> set-operation nest, so that there were a useful referent as to where
> a Var came from. Arbitrarily setting varno=1 sounds like the worst
> case: we could neither identify a Var with a source subquery
> accurately, nor realize that its varno is phony.
I've not tried it yet, but the idea with varno==1 is that is that it
does index the first UNION child's RTE. For the case at hand, all the
children are dummies. I thought doing this was ok because Appends and
MergeAppends show the target entries for the first child, and cases
like the following do show Vars from RTEs that don't get scanned in
the plan:
# explain verbose select * from t where 1=2 order by 1;
Sort (cost=0.01..0.02 rows=0 width=0)
Output: a, b
Sort Key: t.a
-> Result (cost=0.00..0.00 rows=0 width=0)
Output: a, b
Replaces: Scan on t
One-Time Filter: false
Another alternative that I'm thinking about which might be better is
to double-down on the varno==0 and invent a special varno and define
SETOP_VAR. I'd feel better about doing that as I didn't feel good
about coding the magic number for the if(var->varno == 0) check in
set_plan_refs() to make the T_Result work in EXPLAIN VERBOSE.
David
On Sun, 5 Oct 2025 at 13:56, David Rowley <dgrowleyml@gmail.com> wrote: > Another alternative that I'm thinking about which might be better is > to double-down on the varno==0 and invent a special varno and define > SETOP_VAR. I'd feel better about doing that as I didn't feel good > about coding the magic number for the if(var->varno == 0) check in > set_plan_refs() to make the T_Result work in EXPLAIN VERBOSE. I decided not to do it that way and instead just added code to create a varno==1 Var in setrefs.c. This basically amounts to following on with the varno==0 hack used in prepunion.c. The reason I didn't go down the route of SETOP_VAR was that it's still a hack, it's just making it look a bit more official. I suppose the correct way to fix all this and get rid of the varno==0 stuff forever is to have a proper top-level RTE for the top-level set operation and make it so each child is an OTHER_MEMBER rel at that query level. It felt like going a bit too far to do something like that to fix this bug, so I didn't explore that further. David
David Rowley <dgrowleyml@gmail.com> writes:
> The reason I didn't go down the route of SETOP_VAR was that it's still
> a hack, it's just making it look a bit more official. I suppose the
> correct way to fix all this and get rid of the varno==0 stuff forever
> is to have a proper top-level RTE for the top-level set operation and
> make it so each child is an OTHER_MEMBER rel at that query level. It
> felt like going a bit too far to do something like that to fix this
> bug, so I didn't explore that further.
Yeah, I think "more RTEs" is the ultimate solution here, but it's
never risen to the top of my to-do list either. I was kind of
thinking about an RTE per set-op child, though. Not sure if one
for the top-level op, or one for an intermediate op, would help;
though it's certainly possible it would.
regards, tom lane
Hello David, 04.10.2025 06:55, David Rowley wrote: > On Fri, 3 Oct 2025 at 10:02, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> David Rowley <dgrowleyml@gmail.com> writes: >>> An alternative way would be to propagate those during build_setop_child_paths() >> That answer works for me. I was expecting you to just document the >> need for the extra check in is_dummy_rel ;-) ... but this way is >> perhaps better. > So, I pushed the UNION portion earlier, but on hacking more on the > EXCEPT/INTERSECT patch, I noticed that I don't have the target lists > correct when marking the top-level set op as dummy. ... Please look at a new anomaly, introduced with 03d40e4b5: CREATE TABLE t(i integer); CREATE TABLE pt(i integer) PARTITION BY LIST(i); SET enable_seqscan = 'off'; SELECT * FROM t UNION SELECT * FROM t UNION ALL SELECT * FROM pt; produces: ERROR: XX000: unrecognized node type: 0 LOCATION: create_plan_recurse, createplan.c:538 Best regards. Alexander
On Tue, Nov 4, 2025 at 5:00 PM Alexander Lakhin <exclusion@gmail.com> wrote: > Please look at a new anomaly, introduced with 03d40e4b5: > CREATE TABLE t(i integer); > CREATE TABLE pt(i integer) PARTITION BY LIST(i); > > SET enable_seqscan = 'off'; > SELECT * FROM t UNION SELECT * FROM t > UNION ALL > SELECT * FROM pt; > produces: > ERROR: XX000: unrecognized node type: 0 > LOCATION: create_plan_recurse, createplan.c:538 I looked into this. The child relation with relid 3 (the scan on the partitioned table) is a dummy, so it is skipped in generate_union_paths(). As a result, the final setop relation ends up the same as the child relation with relid set to (1, 2). Then, generate_union_paths() creates an Append path using this relation's cheapest path as its subpath. Somehow, add_path() determines that this new Append path dominates the original cheapest path, causing the original cheapest path to be freed. This leads to the final Append path referencing a subpath that has already been freed. - Richard
On Tue, 4 Nov 2025 at 21:00, Alexander Lakhin <exclusion@gmail.com> wrote: > Please look at a new anomaly, introduced with 03d40e4b5: > CREATE TABLE t(i integer); > CREATE TABLE pt(i integer) PARTITION BY LIST(i); > > SET enable_seqscan = 'off'; > SELECT * FROM t UNION SELECT * FROM t > UNION ALL > SELECT * FROM pt; > produces: > ERROR: XX000: unrecognized node type: 0 Thanks for the report. This seems to be due to the fact that the same UPPERREL_SETOP RelOptInfo is being used for the UNION and UNION ALL. When doing the add_path() for the UNION ALL at prepunion.c:1030, the sole child path of the Append being added is an AggPath for the previous UNION operation. When we add_path() the AppendPath, the previous AggPath is already in the pathlist of the result_rel. add_path() ends up freeing the old Path due to the new AppendPath being better and the end result is the Append's subpath is free'd. The reason we end up with the same result_rel is that we're not passing all the relids in fetch_upper_rel(root, UPPERREL_SETOP, relids) due to having removed dummy rels. I guess the fix might be something like record the relids even when skipping dummy relations. I'll go and explore that as an option. David
On Tue, 4 Nov 2025 at 22:54, David Rowley <dgrowleyml@gmail.com> wrote: > The reason we end up with the same result_rel is that we're not > passing all the relids in fetch_upper_rel(root, UPPERREL_SETOP, > relids) due to having removed dummy rels. I guess the fix might be > something like record the relids even when skipping dummy relations. > I'll go and explore that as an option. This seems to fix it. I'll study it more in the morning (it's late in my time zone). David
Вложения
On Tue, Nov 4, 2025 at 6:19 PM Richard Guo <guofenglinux@gmail.com> wrote: > On Tue, Nov 4, 2025 at 5:00 PM Alexander Lakhin <exclusion@gmail.com> wrote: > > Please look at a new anomaly, introduced with 03d40e4b5: > > CREATE TABLE t(i integer); > > CREATE TABLE pt(i integer) PARTITION BY LIST(i); > > > > SET enable_seqscan = 'off'; > > SELECT * FROM t UNION SELECT * FROM t > > UNION ALL > > SELECT * FROM pt; > > produces: > > ERROR: XX000: unrecognized node type: 0 > > LOCATION: create_plan_recurse, createplan.c:538 > > I looked into this. The child relation with relid 3 (the scan on the > partitioned table) is a dummy, so it is skipped in > generate_union_paths(). As a result, the final setop relation ends up > the same as the child relation with relid set to (1, 2). Then, > generate_union_paths() creates an Append path using this relation's > cheapest path as its subpath. Somehow, add_path() determines that > this new Append path dominates the original cheapest path, causing the > original cheapest path to be freed. This leads to the final Append > path referencing a subpath that has already been freed. I think a quick fix is to always include the involved child relids when building the relid set for the setop relation, even if the child is dummy. A dummy child does not yield any rows, but theoretically it is still a relation that we should account for. - Richard
Вложения
On Tue, 4 Nov 2025 at 23:00, David Rowley <dgrowleyml@gmail.com> wrote: > > On Tue, 4 Nov 2025 at 22:54, David Rowley <dgrowleyml@gmail.com> wrote: > > The reason we end up with the same result_rel is that we're not > > passing all the relids in fetch_upper_rel(root, UPPERREL_SETOP, > > relids) due to having removed dummy rels. I guess the fix might be > > something like record the relids even when skipping dummy relations. > > I'll go and explore that as an option. > > This seems to fix it. I'll study it more in the morning (it's late in > my time zone). I went over this again this morning. I considered adding the following test: -- Try a more complex case with multiple chained UNIONs EXPLAIN (COSTS OFF) SELECT two FROM tenk1 UNION SELECT four FROM tenk1 UNION ALL SELECT ten FROM tenk1 WHERE 1=2; but it seems that enable_seqscan does also need to be disabled as otherwise add_path() just finds the new and old path to cost the same and rejects the new path. With enable_seqscan = off, compare_path_costs_fuzzily() will find the old path to have disabled_node = 2 and the new one to have disabled_nodes = 0, so accepts the new and pfree's the old. I finally decided that it was a bit too obscure a scenario to test to verify that the same silly mistake didn't reappear. Thanks again for the report and the simple recreation steps. David
Hello David!
05.11.2025 00:55, David Rowley wrote:
05.11.2025 00:55, David Rowley wrote:
I finally decided that it was a bit too obscure a scenario to test to verify that the same silly mistake didn't reappear. Thanks again for the report and the simple recreation steps.
Thank you for the fix!
But while playing around, I've just discovered another interesting error:
SET enable_seqscan = 'off';
SELECT * FROM t EXCEPT SELECT * FROM t
UNION SELECT * FROM pt;
leads to:
ERROR: XX000: no relation entry for relid 0
LOCATION: find_base_rel, relnode.c:541
Best regards,
Alexander
On Wed, 5 Nov 2025 at 17:00, Alexander Lakhin <exclusion@gmail.com> wrote: > But while playing around, I've just discovered another interesting error: > > SET enable_seqscan = 'off'; > SELECT * FROM t EXCEPT SELECT * FROM t > UNION SELECT * FROM pt; > > leads to: > ERROR: XX000: no relation entry for relid 0 > LOCATION: find_base_rel, relnode.c:541 :-( This happens because of the code I added to try to give better estimates to the number of groups when only a single child remains in the UNION. Because the only surviving Path is for the EXCEPT set-op, there are Vars with varno==0, which estimate_num_groups() does not like. I'm considering restricting that code Path to where the path->parent->reloptkind != RELOPT_UPPER_REL. It doesn't feel great adding the special case, but likely having better row estimates is worthwhile, when it's possible. I'll sit on that for a bit and see if I can think of a better way. (Yet another reason we should probably fix the varno==0 issue.) Thanks for the report and reproducer, again! David
On Wed, 5 Nov 2025 at 18:26, David Rowley <dgrowleyml@gmail.com> wrote: > I'm considering restricting that code Path to where the > path->parent->reloptkind != RELOPT_UPPER_REL. I couldn't think of anything better, so here's a patch to that effect. David
Вложения
On Wed, 5 Nov 2025 at 22:44, David Rowley <dgrowleyml@gmail.com> wrote: > > On Wed, 5 Nov 2025 at 18:26, David Rowley <dgrowleyml@gmail.com> wrote: > > I'm considering restricting that code Path to where the > > path->parent->reloptkind != RELOPT_UPPER_REL. > > I couldn't think of anything better, so here's a patch to that effect. I've pushed this. David