Thread: Removing Functionally Dependent GROUP BY Columns
We already allow a SELECT's target list to contain non-aggregated columns in a GROUP BY query in cases where the non-aggregated column is functionally dependent on the GROUP BY clause.
For example, a query such as:
SELECT p.product_id,p.description, SUM(s.quantity)
FROM product p
INNER JOIN sale s ON p.product_id = s.product_id
GROUP BY p.product_id;
is perfectly fine in PostgreSQL, as p.description is functionally dependent on p.product_id (assuming product_id is the PRIMARY KEY of product).
It seems that there's no shortage of relational databases in existence today which don't support this. These databases would require the GROUP BY clause to include the p.description column too.
It seems rather unfortunate that people who migrate applications to PostgreSQL may not be aware that we support this: currently, if the p.description column is needlessly included, PostgreSQL naively includes it while grouping. Such people could well be incurring a performance penalty due to our planner not removing the useless items from the list. If the primary key is present in the GROUP BY clause, then including any other columns cannot split the groups any further, so all other columns from the *same relation* can simply be removed from the GROUP BY clause.
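To make the proposed transformation concrete, here is a small Python sketch of the pruning rule (illustrative only: the names are invented, and the real logic lives in the planner's C code):

```python
def prune_group_by(group_by, pkeys):
    """Drop GROUP BY columns that are functionally dependent on a
    primary key which is itself wholly present in the GROUP BY list.

    group_by: ordered list of (relation, column) pairs
    pkeys:    dict mapping relation -> frozenset of its PK columns
    """
    # Collect which columns of each relation appear in the GROUP BY.
    grouped = {}
    for rel, col in group_by:
        grouped.setdefault(rel, set()).add(col)

    pruned = []
    for rel, col in group_by:
        pk = pkeys.get(rel)
        # If the whole primary key of this relation is being grouped
        # on, no other column of the same relation can split the
        # groups any further, so that column is surplus.
        if pk and pk <= grouped[rel] and col not in pk:
            continue
        pruned.append((rel, col))
    return pruned

# A needlessly-wide GROUP BY is reduced to just the primary key column.
print(prune_group_by([("p", "product_id"), ("p", "description")],
                     {"p": frozenset({"product_id"})}))
# → [('p', 'product_id')]
```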
There are in fact also two queries in TPC-H (Q10 and Q18) which are written to include all of the non-aggregated columns in the GROUP BY list. During a recent test I witnessed a 50% gain in performance in Q10 by removing the unneeded columns from the GROUP BY clause.
I've attached a patch which implements this in PostgreSQL.
The patch may need a little more work in order to adjust the targetlist's tleSortGroupRefs to remove invalid ones and perhaps also remove the gaps.
I'm posting this now so that I can gauge the community interest in this.
Is it something that we'd like to have in PostgreSQL?
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments
On 2015-12-01 05:00, David Rowley wrote:
> We already allow a SELECT's target list to contain non-aggregated columns
> in a GROUP BY query in cases where the non-aggregated column is
> functionally dependent on the GROUP BY clause.
>
> For example a query such as;
>
> SELECT p.product_id,p.description, SUM(s.quantity)
> FROM product p
> INNER JOIN sale s ON p.product_id = s.product_id
> GROUP BY p.product_id;
>
> is perfectly fine in PostgreSQL, as p.description is functionally dependent
> on p.product_id (assuming product_id is the PRIMARY KEY of product).

This has come up before (on other forums, at least), and my main concern has been that unlike the case where we go from throwing an error to allowing a query, this has a chance to make the planning of currently legal queries slower. Have you tried to measure the impact of this on queries where there's no runtime gains to be had?

.m
On 1 December 2015 at 17:09, Marko Tiikkaja <marko@joh.to> wrote:
> On 2015-12-01 05:00, David Rowley wrote:
>> We already allow a SELECT's target list to contain non-aggregated columns
>> in a GROUP BY query in cases where the non-aggregated column is
>> functionally dependent on the GROUP BY clause.
>>
>> For example a query such as;
>>
>> SELECT p.product_id,p.description, SUM(s.quantity)
>> FROM product p
>> INNER JOIN sale s ON p.product_id = s.product_id
>> GROUP BY p.product_id;
>>
>> is perfectly fine in PostgreSQL, as p.description is functionally dependent
>> on p.product_id (assuming product_id is the PRIMARY KEY of product).
>
> This has come up before (on other forums, at least), and my main concern
> has been that unlike the case where we go from throwing an error to
> allowing a query, this has a chance to make the planning of currently
> legal queries slower. Have you tried to measure the impact of this on
> queries where there's no runtime gains to be had?
I've performed a series of benchmarks on the following queries:
Test1: explain select id1,id2 from t1 group by id1,id2;
Test2: explain select id from t2 group by id;
Test3: explain select t1.id1,t1.id2 from t2 inner join t1 on t1.id1=t2.id group by t1.id1,t1.id2;
I ran each of these with pgbench for 60 seconds, 3 runs per query. In each case below I've converted the TPS into seconds using the average TPS over the 3 runs.
In summary:
Test1 is the worst-case test. It's a very simple query, so the planning overhead of join searching is non-existent. The fact that there are 2 columns in the GROUP BY means that the fast path cannot be used. I added this test because, if there's only 1 column in the GROUP BY, there's no point in searching for something to remove.
            Average (sec)
Master      0.0001043117
Patched     0.0001118961
Performance: 93.22%
Planning overhead: 7.5844326722 microseconds
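As a quick cross-check of how those Test1 figures relate (per-query time is 1/TPS, so the overhead is just the difference of the averaged times; a sketch, not part of the benchmark itself):

```python
# Averaged per-query times for Test1, in seconds (from the table above).
master_sec = 0.0001043117
patched_sec = 0.0001118961

# "Performance" is the master time as a fraction of the patched time.
performance = master_sec / patched_sec

# The planning overhead added by the patch, in microseconds.
overhead_us = (patched_sec - master_sec) * 1e6

print(f"{performance:.2%}")  # → 93.22%
print(f"{overhead_us:.2f}")  # → 7.58
```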
Test2 is a simple query with a GROUP BY which can take the fast path, due to there being only 1 GROUP BY column.
            Average (sec)
Master      0.000099374448
Patched     0.000099670124
Performance: 99.70%
Planning overhead: 0.2956763193 microseconds
Test3:

            Average (sec)
Master      0.0001797165
Patched     0.0001798406
Performance: 99.93%
Planning overhead: 0.1240776236 microseconds
Test3's results seem a bit strange; I would have expected more of a slowdown. I ran the test again to make sure, and it came back with the same results the second time.
I've attached the spreadsheet that I used to collect the results, and also the raw pgbench output.
It seems that the worst case test adds about 7.6 microseconds onto planning time. To get this worst-case result I had to add two GROUP BY columns, as having only 1 triggers a fast path: the code knows it can't remove any columns when there's only 1. A similar fast path also exists which will only look up the PRIMARY KEY details if there's more than 1 column per relation in the GROUP BY, so for example GROUP BY rel1.col1, rel2.col1 won't look up any PRIMARY KEY constraint.
Given that the extra code really only does anything if the GROUP BY has 2 or more expressions, are you worried that this will negatively affect too many short, fast-to-execute queries?
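For clarity, the two fast paths described above amount to the following decision logic (a Python paraphrase with invented names, not the patch's actual C code):

```python
def pk_lookup_candidates(group_by):
    """Return the relations whose PRIMARY KEY details are worth looking
    up, given GROUP BY entries as (relation, column) pairs."""
    # Fast path 1: with fewer than two GROUP BY columns in total,
    # nothing could possibly be removed, so look nothing up.
    if len(group_by) < 2:
        return set()

    counts = {}
    for rel, _col in group_by:
        counts[rel] = counts.get(rel, 0) + 1

    # Fast path 2: only relations contributing two or more columns are
    # candidates; e.g. GROUP BY rel1.col1, rel2.col1 looks up nothing.
    return {rel for rel, n in counts.items() if n >= 2}

print(pk_lookup_candidates([("t2", "id")]))                        # → set()
print(pk_lookup_candidates([("rel1", "col1"), ("rel2", "col1")]))  # → set()
print(pk_lookup_candidates([("t1", "id1"), ("t1", "id2")]))        # → {'t1'}
```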
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments
On Mon, Nov 30, 2015 at 11:00 PM, David Rowley <david.rowley@2ndquadrant.com> wrote:
> There are in fact also two queries in TPC-H (Q10 and Q18) which are written
> to include all of the non-aggregated column in the GROUP BY list. During a
> recent test I witnessed a 50% gain in performance in Q10 by removing the
> unneeded columns from the GROUP BY clause.

Wow, that's pretty impressive. +1 for trying to do something about this. (Full disclosure: I have not read your patch.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 11/30/15 11:00 PM, David Rowley wrote:
> It seems that there's no shortage of relational databases in existence
> today which don't support this. These databases would require the GROUP
> BY clause to include the p.description column too.

Well, actually, we implemented this because other databases had it and users kept complaining to us.
On 3 December 2015 at 14:28, Peter Eisentraut <peter_e@gmx.net> wrote:
> On 11/30/15 11:00 PM, David Rowley wrote:
>> It seems that there's no shortage of relational databases in existence
>> today which don't support this. These databases would require the GROUP
>> BY clause to include the p.description column too.
>
> Well, actually, we implemented this because other databases had it and
> users kept complaining to us.
Most likely you mean MySQL? I believe there was a bit of an influx of people migrating away from that database in the not-too-distant past. It's not too surprising that we picked up some of those people, and some of the features that they were used to at the same time... IF NOT EXISTS is perhaps a good example to back that up.
However, I think your comment makes it quite clear that we're not talking about the same database management systems.
For me, I tested the two most popular commercial ones and found that neither supported columns in the SELECT list which were functionally dependent on the GROUP BY clause, unless the columns themselves were in the GROUP BY clause. I admit my "no shortage" comment was based on these two only. I also admit that I don't possess any statistics showing which RDBMSs users are migrating from; I merely thought that testing the two most popular commercial ones was enough justification for what I wrote.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 01/12/2015 12:07, David Rowley wrote:
> On 1 December 2015 at 17:09, Marko Tiikkaja <marko@joh.to> wrote:
>> This has come up before (on other forums, at least), and my main
>> concern has been that unlike the case where we go from throwing an
>> error to allowing a query, this has a chance to make the planning of
>> currently legal queries slower. Have you tried to measure the
>> impact of this on queries where there's no runtime gains to be had?
>
> I've performed a series of benchmarks on the following queries:
>
> Test1: explain select id1,id2 from t1 group by id1,id2;
> Test2: explain select id from t2 group by id;
> Test3: explain select t1.id1,t1.id2 from t2 inner join t1 on
> t1.id1=t2.id group by t1.id1,t1.id2;
>
> [...]
>
> It seems that the worst case test adds about 7.6 microseconds onto
> planning time. To get this worse case result I had to add two GROUP BY
> columns, as having only 1 triggers a fast path as the code knows it
> can't remove any columns, since there's only 1. A similar fast path also
> exists which will only lookup the PRIMARY KEY details if there's more
> than 1 column per relation in the GROUP BY, so for example GROUP BY
> rel1.col1, rel2.col1 won't lookup any PRIMARY KEY constraint.
>
> Given that the extra code really only does anything if the GROUP BY has
> 2 or more expressions, are you worried that this will affect too many
> short and fast to execute queries negatively?

In identify_key_vars()

+		/* Only PK constraints are of interest for now, see comment above */
+		if (con->contype != CONSTRAINT_PRIMARY)
+			continue;

I think the comment should better refer to remove_useless_groupby_columns() (don't optimize further than what check_functional_grouping() does).

+		/*
+		 * Exit if the constraint is deferrable, there's no point in looking
+		 * for another PK constriant, as there can only be one.
+		 */

small typo on "constriant"

+			{
+				varlistmatches = bms_add_member(varlistmatches, varlistidx);
+				found_col = true;
+				/* don't break, there might be dupicates */
+			}

small typo on "dupicates". Also, I thought transformGroupClauseExpr() called in the parse analysis make sure that there isn't any duplicate in the GROUP BY clause. Do I miss something?

in remove_useless_groupby_columns()

+	/*
+	 * Skip if there were no Vars found in the GROUP BY clause which belong
+	 * to this relation. We can also skip if there's only 1 Var, as there
+	 * needs to be more than one Var for there to be any columns which are
+	 * functionally dependent on another column.
+	 */

This comment doesn't sound very clear. I'm not a native english speaker, so maybe it's just me.

+	/*
+	 * If we found any surplus Vars in the GROUP BY clause, then we'll build
+	 * a new GROUP BY clause without these surplus Vars.
+	 */
+	if (anysurplus)
+	{
+		List *new_groupby = NIL;
+
+		foreach (lc, root->parse->groupClause)
+		{
+			SortGroupClause *sgc = (SortGroupClause *) lfirst(lc);
+			TargetEntry *tle;
+			Var *var;
+
+			tle = get_sortgroupclause_tle(sgc, root->parse->targetList);
+			var = (Var *) tle->expr;
+
+			if (!IsA(var, Var))
+				continue;
+	[...]

if var isn't a Var, it needs to be added in new_groupby.

+	/* XXX do we to alter tleSortGroupRef to remove gaps? */

no idea on that :/

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org
Many thanks for the thorough review!
On 12 January 2016 at 23:37, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote:
In identify_key_vars()
+ /* Only PK constraints are of interest for now, see comment above */
+ if (con->contype != CONSTRAINT_PRIMARY)
+ continue;
I think the comment should better refer to
remove_useless_groupby_columns() (don't optimize further than what
check_functional_grouping() does).
I've improved this by explaining it more clearly.
+ /*
+ * Exit if the constraint is deferrable, there's no point in looking
+ * for another PK constriant, as there can only be one.
+ */
small typo on "constriant"
Fixed.
+			{
+				varlistmatches = bms_add_member(varlistmatches, varlistidx);
+				found_col = true;
+				/* don't break, there might be dupicates */
+			}
small typo on "dupicates". Also, I thought transformGroupClauseExpr()
called in the parse analysis make sure that there isn't any duplicate in
the GROUP BY clause. Do I miss something?
No I missed something :) You are right. I've put a break; here and a comment to explain why.
in remove_useless_groupby_columns()

+	/*
+	 * Skip if there were no Vars found in the GROUP BY clause which belong
+	 * to this relation. We can also skip if there's only 1 Var, as there
+	 * needs to be more than one Var for there to be any columns which are
+	 * functionally dependent on another column.
+	 */
This comment doesn't sound very clear. I'm not a native english speaker,
so maybe it's just me.
Yes, it was badly worded. I've fixed it in the attached.
+ /*
+ * If we found any surplus Vars in the GROUP BY clause, then we'll build
+ * a new GROUP BY clause without these surplus Vars.
+ */
+ if (anysurplus)
+ {
+ List *new_groupby = NIL;
+
+ foreach (lc, root->parse->groupClause)
+ {
+ SortGroupClause *sgc = (SortGroupClause *) lfirst(lc);
+ TargetEntry *tle;
+ Var *var;
+
+ tle = get_sortgroupclause_tle(sgc, root->parse->targetList);
+ var = (Var *) tle->expr;
+
+ if (!IsA(var, Var))
+ continue;
+ [...]
if var isn't a Var, it needs to be added in new_groupby.
Yes you're right, well, at least I've written enough code to ensure that it's not needed.
Technically we could look inside non-Vars and check if the Expr is just made up of Vars from a single relation, and then we could also mark that Expr as surplus if we find other Vars which make up the table's primary key. I didn't make these changes as I thought it was a less likely scenario. It wouldn't be too much extra code to add, however. I've added an XXX comment to explain that there might be optimisation opportunities around this in the future.
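In rough Python terms, the idea behind that XXX comment could look something like this (a hypothetical sketch with invented names; the patch itself does not implement it):

```python
def expr_is_surplus(expr_vars, pkeys, grouped):
    """Decide whether a non-Var GROUP BY expression is surplus.

    expr_vars: set of (relation, column) Vars appearing in the expression
    pkeys:     dict mapping relation -> frozenset of its PK columns
    grouped:   dict mapping relation -> set of columns grouped on directly

    The expression is surplus when all of its Vars come from a single
    relation whose entire primary key is already in the GROUP BY clause.
    """
    rels = {rel for rel, _col in expr_vars}
    if len(rels) != 1:
        return False  # Vars from several relations: leave it alone
    (rel,) = rels
    pk = pkeys.get(rel)
    return pk is not None and pk <= grouped.get(rel, set())
```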
I've attached an updated patch.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments
On 14/01/2016 01:30, David Rowley wrote:
> Many thanks for the thorough review!

And thanks for the patch and fast answer!

> On 12 January 2016 at 23:37, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote:
>> I think the comment should better refer to
>> remove_useless_groupby_columns() (don't optimize further than what
>> check_functional_grouping() does).
>
> I've improved this by explaining it more clearly.

Very nice. Small typo though:

+	 * Technically we could look at UNIQUE indexes too, however we'd also
+	 * have to ensure that each column of the unique index had a NOT NULL

s/had/has/

+	 * constraint, however since NOT NULL constraints currently are don't

s/are //

> [...]
>> small typo on "dupicates". Also, I thought transformGroupClauseExpr()
>> called in the parse analysis make sure that there isn't any duplicate in
>> the GROUP BY clause. Do I miss something?
>
> No I missed something :) You are right. I've put a break; here and a
> comment to explain why.

ok :)

> [...]
>> if var isn't a Var, it needs to be added in new_groupby.
>
> Yes you're right, well, at least I've written enough code to ensure that
> it's not needed.

Oh yes, I realize that now.

> Technically we could look inside non-Vars and check if the Expr is just
> made up of Vars from a single relation, and then we could also make that
> surplus if we find other Vars which make up the table's primary key. I
> didn't make these changes as I thought it was a less likely scenario.
> [...]

Agreed.

> I've attached an updated patch.

+		/* don't try anything unless there's two Vars */
+		if (varlist == NULL || list_length(varlist) < 2)
+			continue;

To be perfectly correct, the comment should say "at least two Vars".

Except the small remaining typos, this patch looks very fine to me. I flag it as ready for committer.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org
On 14 January 2016 at 11:19, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote:
> +		/* don't try anything unless there's two Vars */
> +		if (varlist == NULL || list_length(varlist) < 2)
> +			continue;
>
> To be perfectly correct, the comment should say "at least two Vars".

Apologies for butting in and I appreciate I don't have any ownership over this codebase or right to suggest any changes, but this just caught my eye before I could hit "delete".

My mantra tends to be "why, not what" for inline comments; in this case you can get the same information from the next line of code as you get from the comment. Perhaps something like

/* it's clearly impossible to remove duplicates if there are fewer than two GROUPBY columns */

might be more helpful?

(also sorry if I've misunderstood what it _actually_ does, I just made an assumption based on reading this thread!)

Geoff
On 14/01/2016 14:04, Geoff Winkless wrote:
> On 14 January 2016 at 11:19, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote:
>> To be perfectly correct, the comment should say "at least two Vars".
>
> Apologies for butting in and I appreciate I don't have any ownership
> over this codebase or right to suggest any changes, but this just
> caught my eye before I could hit "delete".
>
> My mantra tends to be "why, not what" for inline comments; in this
> case you can get the same information from the next line of code as
> you get from the comment.

You're absolutely right, but in this case the comment is more like a reminder of a bigger comment a few lines before that wasn't quoted in my mail:

+ *[...] If there are no Vars then
+ * nothing need be done for this rel. If there's less than two Vars then
+ * there's no room to remove any, as the PRIMARY KEY must have at least one
+ * Var, so we can safely also do nothing if there's less than two Vars.

so I assume it's ok to keep it this way.

> Perhaps something like
>
> /* it's clearly impossible to remove duplicates if there are fewer
> than two GROUPBY columns */
>
> might be more helpful?
>
> (also sorry if I've misunderstood what it _actually_ does, I just made
> an assumption based on reading this thread!)

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org
On 14 January 2016 at 13:16, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote:
> You're absolutely right, but in this case the comment is more like a
> reminder of a bigger comment a few lines before that wasn't quoted in my mail

Fair enough, although I have two niggles with that:

a) the second comment could become physically separated from the first by later additions of extra code, or by refactoring;
b) if you don't need the comment because the explanation for it is local anyway and the comment tells you nothing that the code doesn't, why have it at all?

> so I assume it's ok to keep it this way.

Of course it's ok to do whatever you decide is best: as I said previously, I fully appreciate that I have no ownership over any of the code.

Geoff
On 14/01/2016 14:29, Geoff Winkless wrote:
> Fair enough, although I have two niggles with that:
>
> a) the second comment could become physically separated from the first
> by later additions of extra code, or by refactoring;
> b) if you don't need the comment because the explanation for it is
> local anyway and the comment tells you nothing that the code doesn't,
> why have it at all?

I agree. If I had to choose, I'd vote for removing it.

> Of course it's ok to do whatever you decide is best: as I said
> previously, I fully appreciate that I have no ownership over any of
> the code.

Neither do I, I'm just a reviewer here, same as you :)

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org
On 15 January 2016 at 00:19, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote:
+ * Technically we could look at UNIQUE indexes too, however we'd also
+ * have to ensure that each column of the unique index had a NOT NULL
s/had/has/
+ * constraint, however since NOT NULL constraints currently are don't
s/are //
Both fixed. Thanks.
> + /*
> + * If we found any surplus Vars in the GROUP BY clause, then
> we'll build
> + * a new GROUP BY clause without these surplus Vars.
> + */
> + if (anysurplus)
> + {
> + List *new_groupby = NIL;
> +
> + foreach (lc, root->parse->groupClause)
> + {
> + SortGroupClause *sgc = (SortGroupClause *) lfirst(lc);
> + TargetEntry *tle;
> + Var *var;
> +
> + tle = get_sortgroupclause_tle(sgc, root->parse->targetList);
> + var = (Var *) tle->expr;
> +
> + if (!IsA(var, Var))
> + continue;
> + [...]
>
> if var isn't a Var, it needs to be added in new_groupby.
>
>
> Yes you're right, well, at least I've written enough code to ensure that
> it's not needed.
Oh yes, I realize that now.
I meant to say "I've not written enough code" ...
> Technically we could look inside non-Vars and check if the Expr is just
> made up of Vars from a single relation, and then we could also make that
> surplus if we find other Vars which make up the table's primary key. I
> didn't make these changes as I thought it was a less likely scenario. It
> wouldn't be too much extra code to add however. I've went and added an
> XXX comment to explain that there might be future optimisation
> opportunities in the future around this.
>
Agreed.
> I've attached an updated patch.
>
+ /* don't try anything unless there's two Vars */
+ if (varlist == NULL || list_length(varlist) < 2)
+ continue;
To be perfectly correct, the comment should say "at least two Vars".
Changed per the discussion between you and Geoff.
Except the small remaining typos, this patch looks very fine to me. I
flag it as ready for committer.
Many thanks for your followup review.
I've attached an updated patch to address what you highlighted above.
Attachments
On 14/01/2016 23:05, David Rowley wrote:
> [...]
>> Oh yes, I realize that now.
>
> I meant to say "I've not written enough code" ...

Yes, that makes sense with the explanation you wrote just after :)

> [...]
>
> Many thanks for your followup review.
>
> I've attached an updated patch to address what you highlighted above.

Thanks!

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org
David Rowley <david.rowley@2ndquadrant.com> writes:
> [ prune_group_by_clause_59f15cf_2016-01-15.patch ]

I spent some time looking through this but decided that it needs some work to be committable.

The main thing I didn't like was that identify_key_vars seems to have a rather poorly chosen API: it mixes up identifying a rel's pkey with sorting through the GROUP BY list. I think it would be better to create a function that, given a relid, hands back the OID of the pkey constraint and a Bitmapset of the column numbers in the pkey (or 0/NULL if there's no pkey or it's deferrable). The column numbers should be offset by FirstLowInvalidHeapAttributeNumber so that a pkey on OID can be represented correctly --- compare RelationGetIndexAttrBitmap().

The reason this seems like a more attractive solution is that the output of the pg_constraint lookup becomes independent of the particular query and thus is potentially cacheable (in the relcache, say). I do not think we need to cache it right now but I'd like to have the option to do so. (I wonder BTW whether check_functional_grouping couldn't be refactored to use the output of such a function, too.)

Some lesser points:

* I did not like where you plugged in the call to remove_useless_groupby_columns; there are weird interactions with grouping sets as to whether it will get called or not, and also that whole chunk of code is due for refactoring. I'd be inclined to call it from subquery_planner instead, maybe near where the HAVING clause preprocessing happens.

* You've got it iterating over every RTE, even those that aren't RTE_RELATION RTEs. It's pure luck that identify_key_vars doesn't fail outright when passed a bogus relid, and it's certainly wasteful.

* Both of the loops iterating over the groupClause neglect to check varlevelsup, thus leading to assert failures or worse if an outer-level Var is present in the GROUP BY list. (I'm pretty sure outer Vars can still be present at this point, though I might be wrong.)

* I'd be inclined to see if the relvarlists couldn't be converted into bitmapsets too. Then the test to see if the pkey is a subset of the grouping columns becomes a simple and cheap bms_is_subset test. You don't really need surplusvars either, because you can instead test Vars for membership in the pkey bitmapsets that pg_constraint.c will be handing back.

* What you did to join.sql/join.out seems a bit bizarre. The existing test case that you modified was meant to test something else, and conflating this behavior with the pre-existing one (and not documenting it) is confusing as can be. A bit more work on regression test cases seems indicated.

I'm going to set this back to "Waiting on Author". I think the basic idea is good but it needs a rewrite.

			regards, tom lane
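Tom's suggested representation can be modelled in Python like so (a sketch only: Bitmapsets are modelled as plain sets, and FirstLowInvalidHeapAttributeNumber is assumed here to be -8, its value around that era, so that system columns such as OID, attno -2, map to valid non-negative bit positions):

```python
FIRST_LOW_INVALID_HEAP_ATTRIBUTE_NUMBER = -8  # assumed value, see lead-in

def attno_to_bit(attno):
    # Offsetting by FirstLowInvalidHeapAttributeNumber lets negative
    # system-column attnos (e.g. OID = -2) be stored in a bitmapset,
    # whose members must be non-negative.
    return attno - FIRST_LOW_INVALID_HEAP_ATTRIBUTE_NUMBER

def pkey_is_grouped(pkey_attnos, grouped_attnos):
    # Once both sides are bitmapsets, the check becomes the cheap
    # subset test Tom mentions (bms_is_subset in PostgreSQL).
    pk_bits = {attno_to_bit(a) for a in pkey_attnos}
    grouped_bits = {attno_to_bit(a) for a in grouped_attnos}
    return pk_bits <= grouped_bits

print(attno_to_bit(-2))              # → 6
print(pkey_is_grouped({1}, {1, 3}))  # → True: attno 3 would be surplus
```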
On 23 January 2016 at 12:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I spent some time looking through this but decided that it needs some work
> to be committable.
>
> The main thing I didn't like was that identify_key_vars seems to have a
> rather poorly chosen API: [...] The column numbers should be offset by
> FirstLowInvalidHeapAttributeNumber so that a pkey on OID can be
> represented correctly --- compare RelationGetIndexAttrBitmap().

That seems like a much better design to me too. I've made changes and attached the updated patch.

> The reason this seems like a more attractive solution is that the output
> of the pg_constraint lookup becomes independent of the particular query
> and thus is potentially cacheable (in the relcache, say). I do not think
> we need to cache it right now but I'd like to have the option to do so.

I've not touched that yet, but good idea.

> (I wonder BTW whether check_functional_grouping couldn't be refactored
> to use the output of such a function, too.)

I've attached a separate patch for that too. It applies after the prunegroupby patch.

> * I did not like where you plugged in the call to
> remove_useless_groupby_columns; [...] I'd be inclined to call it from
> subquery_planner instead, maybe near where the HAVING clause preprocessing
> happens.

I've moved the call to subquery_planner().

> * You've got it iterating over every RTE, even those that aren't
> RTE_RELATION RTEs. It's pure luck that identify_key_vars doesn't fail
> outright when passed a bogus relid, and it's certainly wasteful.

:-( I've fixed that now.

> * Both of the loops iterating over the groupClause neglect to check
> varlevelsup, thus leading to assert failures or worse if an outer-level
> Var is present in the GROUP BY list.

Fixed in the first loop, and the way I've rewritten the code to use bms_difference, there's no need to check again in the 2nd loop.

> * I'd be inclined to see if the relvarlists couldn't be converted into
> bitmapsets too. Then the test to see if the pkey is a subset of the
> grouping columns becomes a simple and cheap bms_is_subset test. You don't
> really need surplusvars either, because you can instead test Vars for
> membership in the pkey bitmapsets that pg_constraint.c will be handing
> back.

I've changed it to use a bitmapset now, but I've kept surplusvars: having this allows a single pass over the GROUP BY clause to remove the surplus Vars, rather than doing it again and again for each relation. I've also found a better way so that the array is only allocated if some surplus Vars are found. I understand what you mean, and yes, it is possible to get rid of it, but I'd still need to add something else to mark that this rel's extra Vars are okay to be removed. I could do that by adding another bitmapset and marking the relid in that, but I quite like how I've changed it now, as it saves having to check varlevelsup again on the Vars in the GROUP BY clause: I just use bms_difference() on the originally found Vars, subtracting off the PK Vars, and assume all of those are surplus.

> * What you did to join.sql/join.out seems a bit bizarre. The existing
> test case that you modified was meant to test something else, [...] A bit
> more work on regression test cases seems indicated.

The history behind that is that at one point during development of the patch that test had started failing, due to the GROUP BY item being removed and therefore allowing the join removal conditions to be met. On testing again with the old test query I see this no longer happens, so I've removed the change, although the expected output still differs due to the GROUP BY item being removed.

> I'm going to set this back to "Waiting on Author". I think the basic
> idea is good but it needs a rewrite.

Thanks for the review and the good ideas.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments
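Tom's suggested representation can be sketched in miniature. This is illustrative Python, not the actual C code, and the constant's value is a stand-in: the point is only that offsetting attribute numbers by FirstLowInvalidHeapAttributeNumber makes negative system-column attnos (such as a pkey on OID) representable in a set that admits only non-negative members:

```python
# Stand-in value for illustration; not the real constant from the headers.
FIRST_LOW_INVALID_ATTNO = -8

def attnos_to_bitmapset(attnos):
    """Shift attribute numbers so that system columns (negative attnos)
    become non-negative and thus representable as bitmapset members."""
    return {a - FIRST_LOW_INVALID_ATTNO for a in attnos}

# A pkey on a system column (attno -2, say) is now representable, and
# the subset test against the grouping columns still works:
pkattnos = attnos_to_bitmapset([-2])
groupattnos = attnos_to_bitmapset([-2, 1, 3])
assert pkattnos <= groupattnos    # bms_is_subset analogue
```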
On 23/01/2016 10:44, David Rowley wrote: > On 23 January 2016 at 12:44, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I spent some time looking through this but decided that it needs some work >> to be committable. >> >> The main thing I didn't like was that identify_key_vars seems to have a >> rather poorly chosen API: it mixes up identifying a rel's pkey with >> sorting through the GROUP BY list. I think it would be better to create >> a function that, given a relid, hands back the OID of the pkey constraint >> and a Bitmapset of the column numbers in the pkey (or 0/NULL if there's >> no pkey or it's deferrable). The column numbers should be offset by >> FirstLowInvalidHeapAttributeNumber so that a pkey on OID can be >> represented correctly --- compare RelationGetIndexAttrBitmap(). > > That seems like a much better design to me too. I've made changes and > attached the updated patch. > >> The reason this seems like a more attractive solution is that the output >> of the pg_constraint lookup becomes independent of the particular query >> and thus is potentially cacheable (in the relcache, say). I do not think >> we need to cache it right now but I'd like to have the option to do so. > > I've not touched that yet, but good idea. > >> (I wonder BTW whether check_functional_grouping couldn't be refactored >> to use the output of such a function, too.) > > I've attached a separate patch for that too. It applies after the > prunegroupby patch. > This refactoring makes the code much better; +1 from me. I also reviewed it, and it looks fine. However, the following removed comment could be kept for clarity: - /* The PK is a subset of grouping_columns, so we win */ >> Some lesser points: >> >> * I did not like where you plugged in the call to >> remove_useless_groupby_columns; there are weird interactions with grouping >> sets as to whether it will get called or not, and also that whole chunk >> of code is due for refactoring.
I'd be inclined to call it from >> subquery_planner instead, maybe near where the HAVING clause preprocessing >> happens. > > I've moved the call to subquery_planner() > >> * You've got it iterating over every RTE, even those that aren't >> RTE_RELATION RTEs. It's pure luck that identify_key_vars doesn't fail >> outright when passed a bogus relid, and it's certainly wasteful. > > :-( I've fixed that now. > >> * Both of the loops iterating over the groupClause neglect to check >> varlevelsup, thus leading to assert failures or worse if an outer-level >> Var is present in the GROUP BY list. (I'm pretty sure outer Vars can >> still be present at this point, though I might be wrong.) > > Fixed in the first loop, and the way I've rewritten the code to use > bms_difference, there's no need to check again in the 2nd loop. > >> * I'd be inclined to see if the relvarlists couldn't be converted into >> bitmapsets too. Then the test to see if the pkey is a subset of the >> grouping columns becomes a simple and cheap bms_is_subset test. You don't >> really need surplusvars either, because you can instead test Vars for >> membership in the pkey bitmapsets that pg_constraint.c will be handing >> back. > > I've changed to use a bitmapset now, but I've kept surplusvars, having > this allows a single pass over the group by clause to remove the > surplus Vars, rather than doing it again and again for each relation. > I've also found a better way so that array is only allocated if some > surplus Vars are found. I understand what you mean, and yes, it is > possible to get rid of it, but I'd need to still add something else to > mark that this rel's extra Vars are okay to be removed. 
I could do > that by adding another bitmapset and marking the relid in that, but I > quite like how I've changed it now as it saves having to check > varlevelsup again on the Vars in the GROUP BY clause, as I just use > bms_difference() on the originally found Vars subtracting off the PK > vars, and assume all of those are surplus. > I wonder if in remove_useless_groupby_columns(), in the foreach loop you could change the + if (bms_subset_compare(pkattnos, relattnos) == BMS_SUBSET1) + { by something like + if (bms_num_members(relattnos) <= bms_num_members(pkattnos)) + continue; + + if (bms_is_subset(pkattnos, relattnos)) + { which may be cheaper. Otherwise, I couldn't find any issue with this new version of the patch. >> * What you did to join.sql/join.out seems a bit bizarre. The existing >> test case that you modified was meant to test something else, and >> conflating this behavior with the pre-existing one (and not documenting >> it) is confusing as can be. A bit more work on regression test cases >> seems indicated. > > The history behind that is that at one point during developing the > patch that test had started failing due to the group by item being > removed therefore allowing the join removal conditions to be met. On > testing again with the old test query I see this no longer happens, so > I've removed the change, although the expected output still differs > due to the group by item being removed. > >> I'm going to set this back to "Waiting on Author". I think the basic >> idea is good but it needs a rewrite. > > Thanks for the review and the good ideas. > -- Julien Rouhaud http://dalibo.com - http://dalibo.org
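The overall shape of the pruning being discussed might be sketched like this (hypothetical Python with invented names, not the patch itself; Vars are reduced to (relid, attno) pairs, and outer-level Vars are assumed already filtered out): collect each relation's grouped-on columns, mark the non-pkey remainder surplus when the pkey is a proper subset, then drop the surplus entries in one final pass over the GROUP BY clause:

```python
def prune_groupby(groupby, pkeys):
    """groupby: list of (relid, attno) pairs for Vars at varlevelsup 0.
    pkeys: maps relid -> frozenset of pkey attnos; absent if no usable pkey."""
    # pass 1: collect the grouped-on attnos per relation (the relattnos set)
    relattnos = {}
    for relid, attno in groupby:
        relattnos.setdefault(relid, set()).add(attno)

    # pass 2: a relation's extra columns are surplus when its pkey is a
    # PROPER subset of the grouped columns (BMS_SUBSET1 in the discussion)
    surplusvars = {}
    for relid, attnos in relattnos.items():
        pk = pkeys.get(relid)
        if pk is not None and pk < attnos:      # proper subset: something removable
            surplusvars[relid] = attnos - pk    # bms_difference analogue

    # pass 3: a single walk over the GROUP BY clause drops the surplus entries
    return [(relid, attno) for relid, attno in groupby
            if attno not in surplusvars.get(relid, ())]
```

On the example from the head of the thread, grouping product by (product_id, description) with product_id as the pkey prunes to product_id alone.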
On 24 January 2016 at 00:56, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote: > I wonder if in remove_useless_groupby_columns(), in the foreach loop you > could change the > > + if (bms_subset_compare(pkattnos, relattnos) == BMS_SUBSET1) > + { > > by something like > > > + if (bms_num_members(relattnos) <= bms_num_members(pkattnos)) > + continue; > + > + if (bms_is_subset(pkattnos, relattnos)) > + { > > > which may be cheaper. Thanks for looking over this again. I actually had that part written quite a few different ways before I finally decided to use bms_subset_compare. I didn't benchmark, but I thought 1 function call was better than 2, as I had it as bms_is_subset(pkattnos, relattnos) && !bms_is_subset(relattnos, pkattnos), and again with !bms_equal() instead of the 2nd subset test. I figured 1 function call was better than 2, so finally settled on bms_subset_compare(). I'm thinking that 3 function calls likely won't make things better. I can't imagine it's going to cost much either way, so I doubt it's worth trying to check what's faster. Although the thing about bms_num_members() is that it's going to loop over each word in the bitmap no matter what, whereas a subset check can abort early in some cases. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 23/01/2016 14:51, David Rowley wrote: > On 24 January 2016 at 00:56, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote: >> I wonder if in remove_useless_groupby_columns(), in the foreach loop you >> could change the >> >> + if (bms_subset_compare(pkattnos, relattnos) == BMS_SUBSET1) >> + { >> >> by something like >> >> >> + if (bms_num_members(relattnos) <= bms_num_members(pkattnos)) >> + continue; >> + >> + if (bms_is_subset(pkattnos, relattnos)) >> + { >> >> >> which may be cheaper. > > Thanks for looking over this again. > > I actually had that part written quite a few different ways before I > finally decided to use bms_subset_compare. I didn't benchmark, but I > thought 1 function call was better than 2, as I had it as > bms_is_subset(pkattnos, relattnos) && !bms_is_subset(relattnos, > pkattnos), and again with !bms_equal() instead of the 2nd subset test. > I figured 1 function call was better than 2, so finally settled on > bms_subset_compare(). I'm thinking that 3 function calls likely won't > make things better. I can't imagine it's going to cost much either > way, so I doubt it's worth trying to check what's faster. Although the > thing about bms_num_members() is that it's going to loop over each > word in the bitmap no matter what, whereas a subset check can abort > early in some cases. > > FWIW, this code was a simplified example. bms_num_members(relattnos) is already called a few lines before, so it'd be 1 function call against 2 function calls (and a var assignment). -- Julien Rouhaud http://dalibo.com - http://dalibo.org
Julien Rouhaud <julien.rouhaud@dalibo.com> writes: > I wonder if in remove_useless_groupby_columns(), in the foreach loop you > could change the > + if (bms_subset_compare(pkattnos, relattnos) == BMS_SUBSET1) > + { > by something like > + if (bms_num_members(relattnos) <= bms_num_members(pkattnos)) > + continue; > + > + if (bms_is_subset(pkattnos, relattnos)) > + { > which may be cheaper. FWIW, I really doubt that would be cheaper. The various flavors of subset comparison are word-at-a-time bitmasking operations, but bms_num_members has to grovel over individual bits; it's certain to be more expensive than a subset test. regards, tom lane
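Tom's cost argument can be illustrated with a toy model of the bitmapset routines (illustrative Python, not the actual C implementation): the subset test can return on the first word that has offending bits, while counting members must always visit every set bit in every word:

```python
def bms_is_subset(a, b):
    """a, b: lists of bitmask words. True iff every bit set in a is set in b.
    Mirrors the word-at-a-time shape of the C routine and can abort early."""
    for wa, wb in zip(a, b):
        if wa & ~wb:            # a has a bit that b lacks
            return False        # early exit: later words need not be examined
    return all(w == 0 for w in a[len(b):])   # any extra words of a must be empty

def bms_num_members(a):
    """Counting members has no early exit: every word must be scanned."""
    return sum(bin(w).count("1") for w in a)
```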
David Rowley <david.rowley@2ndquadrant.com> writes: > On 23 January 2016 at 12:44, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> * What you did to join.sql/join.out seems a bit bizarre. The existing >> test case that you modified was meant to test something else, and >> conflating this behavior with the pre-existing one (and not documenting >> it) is confusing as can be. A bit more work on regression test cases >> seems indicated. > The history behind that is that at one point during developing the > patch that test had started failing due to the group by item being > removed therefore allowing the join removal conditions to be met. On > testing again with the old test query I see this no longer happens, so > I've removed the change, although the expected output still differs > due to the group by item being removed. Hmm ... but ... it seems to me that the test as it stands *should* fail after this patch, because once the non-pkey grouping column is removed the join removal optimization should apply. I think we should look a bit more closely at what's happening there. (IOW, I wasn't so much unhappy with the change to that test case as that it was being used as the only test case for this new behavior. I see you added some new, separate test cases, so that's good; but there's something fishy if the existing case doesn't change behavior.) regards, tom lane
On 24 January 2016 at 08:20, Tom Lane <tgl@sss.pgh.pa.us> wrote: > David Rowley <david.rowley@2ndquadrant.com> writes: >> On 23 January 2016 at 12:44, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> * What you did to join.sql/join.out seems a bit bizarre. The existing >>> test case that you modified was meant to test something else, and >>> conflating this behavior with the pre-existing one (and not documenting >>> it) is confusing as can be. A bit more work on regression test cases >>> seems indicated. > >> The history behind that is that at one point during developing the >> patch that test had started failing due to the group by item being >> removed therefore allowing the join removal conditions to be met. On >> testing again with the old test query I see this no longer happens, so >> I've removed the change, although the expected output still differs >> due to the group by item being removed. > > Hmm ... but ... it seems to me that the test as it stands *should* fail > after this patch, because once the non-pkey grouping column is removed > the join removal optimization should apply. I think we should look a bit > more closely at what's happening there. > > (IOW, I wasn't so much unhappy with the change to that test case as > that it was being used as the only test case for this new behavior. > I see you added some new, separate test cases, so that's good; but > there's something fishy if the existing case doesn't change behavior.) Thanks for looking at this again. I've looked into why the join is not removed: since the redundant GROUP BY columns are removed during planning, and since the outer query is planned before the subquery, when the join removal code checks whether the subquery can be removed, the subquery is yet to be planned, so it still contains the 2 GROUP BY items. Perhaps the useless columns can be removed a bit earlier, perhaps in parse analysis. I will look into that now.
-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
David Rowley <david.rowley@2ndquadrant.com> writes: > I've looked into why the join is not removed: since the redundant > GROUP BY columns are removed during planning, and since the outer > query is planned before the subquery, when the join removal code > checks whether the subquery can be removed, the subquery is yet to be > planned, so it still contains the 2 GROUP BY items. Hmm ... but why did it get removed in the earlier patch version, then? > Perhaps the useless columns can be removed a bit earlier, perhaps in > parse analysis. I will look into that now. No; doing this in parse analysis will be sufficient reason to reject the patch. That would mean adding a not-semantically-necessary dependency on the pkey to a query when it is stored as a view. It has to be done at planning time and no sooner. It's possible that you could integrate it into some earlier phase of planning, like prepjointree, but I think that would be messy and likely not worth it. I don't see any existing query-tree traversal this could piggyback on, and I doubt we want to add a new pass just for this. regards, tom lane
On 25 January 2016 at 10:17, Tom Lane <tgl@sss.pgh.pa.us> wrote: > David Rowley <david.rowley@2ndquadrant.com> writes: >> I've looked into why the join is not removed: since the redundant >> GROUP BY columns are removed during planning, and since the outer >> query is planned before the subquery, when the join removal code >> checks whether the subquery can be removed, the subquery is yet to be >> planned, so it still contains the 2 GROUP BY items. > > Hmm ... but why did it get removed in the earlier patch version, then? I'm not sure now, it was months ago. Perhaps I misremembered and only altered the test because I mistakenly anticipated it would break. >> Perhaps the useless columns can be removed a bit earlier, perhaps in >> parse analysis. I will look into that now. > > No; doing this in parse analysis will be sufficient reason to reject the > patch. That would mean adding a not-semantically-necessary dependency on > the pkey to a query when it is stored as a view. It has to be done at > planning time and no sooner. > > It's possible that you could integrate it into some earlier phase of > planning, like prepjointree, but I think that would be messy and likely > not worth it. I don't see any existing query-tree traversal this could > piggyback on, and I doubt we want to add a new pass just for this. It seems like a bit of a corner case anyway. Maybe it's fine as is. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
This patch got a good share of review, so it's fair to move it to the next commitfest. I marked it as ready-for-committer there, which seems to be the right status. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
David Rowley <david.rowley@2ndquadrant.com> writes: > [ prune_group_by_clause_ab4f491_2016-01-23.patch ] > [ check_functional_grouping_refactor.patch ] I've committed this with mostly-cosmetic revisions (principally, rewriting a lot of the comments, which seemed on the sloppy side). >> * Both of the loops iterating over the groupClause neglect to check >> varlevelsup, thus leading to assert failures or worse if an outer-level >> Var is present in the GROUP BY list. (I'm pretty sure outer Vars can >> still be present at this point, though I might be wrong.) > Fixed in the first loop, and the way I've rewritten the code to use > bms_difference, there's no need to check again in the 2nd loop. Um, AFAICS, you *do* need to check again in the second loop, else you'll be accessing a surplusvars[] entry that might not exist at all, and in any case might falsely tell you that you can exclude the outer var from the new GROUP BY list. That was the only actual bug I found, though I changed some other stuff. regards, tom lane
On 12/02/2016 12:01 am, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > David Rowley <david.rowley@2ndquadrant.com> writes: > > [ prune_group_by_clause_ab4f491_2016-01-23.patch ] > > [ check_functional_grouping_refactor.patch ] > > I've committed this with mostly-cosmetic revisions (principally, rewriting > a lot of the comments, which seemed on the sloppy side). Many thanks for committing this and improving the comments. > >> * Both of the loops iterating over the groupClause neglect to check > >> varlevelsup, thus leading to assert failures or worse if an outer-level > >> Var is present in the GROUP BY list. (I'm pretty sure outer Vars can > >> still be present at this point, though I might be wrong.) > > > Fixed in the first loop, and the way I've rewritten the code to use > > bms_difference, there's no need to check again in the 2nd loop. > > Um, AFAICS, you *do* need to check again in the second loop, else you'll > be accessing a surplusvars[] entry that might not exist at all, and in > any case might falsely tell you that you can exclude the outer var from > the new GROUP BY list. I can't quite understand what you're seeing here. As far as I can tell from reading the code again (on my phone) the varlevelsup check in the 2nd loop is not required. Any Var with varlevelsup above zero won't make it into the surplus bitmap for the rel, as the bms_difference call just removes the PK vars from the varlevelsup = 0 vars. So the bms_is_member check should be fine. I can't see what I'm missing. In either case the test does no harm. > That was the only actual bug I found, though I changed some other stuff.
David Rowley <david.rowley@2ndquadrant.com> writes: > On 12/02/2016 12:01 am, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: >> Um, AFAICS, you *do* need to check again in the second loop, else you'll >> be accessing a surplusvars[] entry that might not exist at all, and in >> any case might falsely tell you that you can exclude the outer var from >> the new GROUP BY list. > I can't quite understand what you're seeing here. The second loop is iterating through the original GROUP BY list, so it will see again any outer Vars that were excluded by the first loop. It needs to exclude them exactly the same, because they are outside the scope of its data structures. Consider something like (admittedly pretty silly, but legal SQL) create table up (u1 int, u2 int, u3 int); create table down (f1 int primary key, f2 int); select * from othertable, up where u1 in (select f2 from down group by f1, f2, up.u3); up.u3 would have varlevelsup = 1, varno = 2, varattno = 3. If you don't skip it then the surplusvars[var->varno] access will be trying to fetch off the end of the surplusvars[] array, because there is only one RTE in the subquery's rangetable though there are two in the outer query's rangetable. When I trace through this example, it manages not to crash because in point of fact the outer Var has already been replaced by a Param. But I don't think this code should be assuming that; it's an artifact of the detailed order of processing in subquery_planner(), and could easily change in the future, for example if we tried to move the whole affair to the jointree preprocessing stage as we discussed earlier. regards, tom lane
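The hazard Tom describes can be rendered in a toy model (hypothetical Python, not the C code): surplusvars[] is allocated with one slot per RTE in the subquery's rangetable, so indexing it with an outer Var's varno can run past the end unless varlevelsup is checked first:

```python
# One slot per RTE in the SUBQUERY's rangetable ("down" is the only RTE).
surplusvars = [set()]

def keep_in_new_groupby(varno, varattno, varlevelsup):
    # The guard: an outer-level Var is outside this level's data structures,
    # so it must always be kept, without touching surplusvars[] at all.
    if varlevelsup != 0:
        return True
    return varattno not in surplusvars[varno - 1]

# up.u3 from Tom's example has varno = 2, which points past the end of
# surplusvars; the varlevelsup check returns before that index is evaluated.
assert keep_in_new_groupby(2, 3, 1)
```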
On 14/02/2016 5:11 pm, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > David Rowley <david.rowley@2ndquadrant.com> writes: > > On 12/02/2016 12:01 am, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > > I can't quite understand what you're seeing here. > The second loop is iterating through the original GROUP BY list, so it > will see again any outer Vars that were excluded by the first loop. > It needs to exclude them exactly the same, because they are outside > the scope of its data structures. Consider something like (admittedly > pretty silly, but legal SQL) > > create table up (u1 int, u2 int, u3 int); > create table down (f1 int primary key, f2 int); > > select * from othertable, up > where u1 in (select f2 from down group by f1, f2, up.u3); > > up.u3 would have varlevelsup = 1, varno = 2, varattno = 3. > If you don't skip it then the surplusvars[var->varno] access > will be trying to fetch off the end of the surplusvars[] array, > because there is only one RTE in the subquery's rangetable > though there are two in the outer query's rangetable. Thanks for explaining this. Clearly I missed the case of the varno pointing off the end of the array. Thanks for noticing and fixing.