Re: [BUG] Remove self joins causes 'variable not found in subplan target lists' error
От | Richard Guo |
---|---|
Тема | Re: [BUG] Remove self joins causes 'variable not found in subplan target lists' error |
Дата | |
Msg-id | CAMbWs4_14JzUQaKkAhxyv40aWOQsGtZ1hWuYeaSg3OjT_w7fQA@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: [BUG] Remove self joins causes 'variable not found in subplan target lists' error (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-hackers |
On Wed, Aug 27, 2025 at 2:26 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Richard Guo <guofenglinux@gmail.com> writes: > > Instead of repeatedly calling make_pathkeys_for_sortclauses to detect > > redundant expressions, I'm wondering if we could introduce a function > > that detects whether a given expression is known equal to a constant > > by the EquivalenceClass machinery. This function should not be too > > costly, as we'd only need to examine ECs that contain pseudoconstants. > I did consider that, but rejected it. I kind of like the fact that > the v3 patch gets rid of duplicated columns-to-be-made-unique. > Yes, repeatedly doing make_pathkeys_for_sortclauses is a bit grotty, > because it's O(N^2) in the number of columns-to-be-made-unique ... > but realistically, how often is that ever more than a couple? > In the common case where the semijoin has only one join column, > the patch adds basically zero work, which is nice too. Fair point. I'm now inclined to agree that the v3 patch is the better approach, as it removes duplicate grouping expressions. I looked into the first loose end you mentioned earlier and tried to write a query that would produce duplicate hash keys for HashAggregate but failed. After checking all the callers of create_agg_path(), it seems that we now eliminate duplicate grouping keys in all cases. That commit mentioned the possibility that duplicate columns might have different hash functions assigned, so we may still need the changes in the executor. But we need to update the test case with a new query that produces duplicate hash keys, or remove it if we cannot find one. On the other hand, we may want to add a similar query in join.sql to show that duplicate grouping expressions can be removed during unique-ification. > > We could then use this function to remove expressions that are known > > constant from semi_rhs_exprs. And if we find that all expressions > > in semi_rhs_exprs are known constant (the second loose end you > > mentioned), we can give up building unique paths and fall back to a > > traditional JOIN_SEMI. > Yeah, I was thinking we could just use the paths of the existing > rel, but really this case means that we'd need to de-dup down > to a single row. We could maybe do something involving plastering > LIMIT 1 on top of each input path, but it's such a weird and > probably-useless-in-the-real-world case that I don't think it's > worth expending effort on. Failing outright seems fine. Yeah. When computing semi_rhs_exprs in compute_semijoin_info(), we punt if we find that there are no columns to unique-ify. /* Punt if we didn't find at least one column to unique-ify */ if (semi_rhs_exprs == NIL) return; I think the case we're discussing here -- where all semi_rhs_exprs are known constants -- is essentially the same in that it leaves us no columns to unique-ify. So it should be fine to just punt in this case as well. So I think we may need to add the following changes: + /* If there are no columns left to unique-ify, return NULL. */ + if (sortPathkeys == NIL && groupClause == NIL) + { + MemoryContextSwitchTo(oldcontext); + return NULL; + } + /* build unique paths based on input rel's pathlist */ > > Another question we need to consider is how to fix this error in v18, > > where it seems we'll need to remove redundant expressions during > > createplan.c. > Ugh ... I forgot you just refactored all that code. > > I wonder if the right answer in v18 is to back out a3179ab69. > Not fixing that performance regression for another year would > be mildly annoying; but AFAIR we've still had only the one > complaint, so it seems like it's not hurting many people. > And we are getting mighty late in the release calendar to be > inventing new stuff in v18. Yeah, inventing new stuff in v18 at this point seems risky. I think backing out a3179ab69 should be fine for v18. Thanks Richard
В списке pgsql-hackers по дате отправления: