Обсуждение: pgsql: Clean up all relid fields of RestrictInfos during join removal.
Clean up all relid fields of RestrictInfos during join removal. The original implementation of remove_rel_from_restrictinfo() thought it could skate by with removing no-longer-valid relid bits from only the clause_relids and required_relids fields. This is quite bogus, although somehow we had not run across a counterexample before now. At minimum, the left_relids and right_relids fields need to be fixed because they will be examined later by clause_sides_match_join(). But it seems pretty foolish not to fix all the relid fields, so do that. This needs to be back-patched as far as v16, because the bug report shows a planner failure that does not occur before v16. I'm a little nervous about back-patching, because this could cause unexpected plan changes due to opening up join possibilities that were rejected before. But it's hard to argue that this isn't a regression. Also, the fact that this changes no existing regression test results suggests that the scope of changes may be fairly narrow. I'll refrain from back-patching further though, since no adverse effects have been demonstrated in older branches. Bug: #19460 Reported-by: François Jehl <francois.jehl@pigment.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Richard Guo <guofenglinux@gmail.com> Discussion: https://postgr.es/m/19460-5625143cef66012f@postgresql.org Backpatch-through: 16 Branch ------ master Details ------- https://git.postgresql.org/pg/commitdiff/cfcd5711160a42249def8f781bae197829cf44c7 Modified Files -------------- src/backend/optimizer/plan/analyzejoins.c | 18 +++++++++++++- src/test/regress/expected/join.out | 39 +++++++++++++++++++++++++++++++ src/test/regress/sql/join.sql | 23 ++++++++++++++++++ 3 files changed, 79 insertions(+), 1 deletion(-)
Hi Tom, On Mon, Apr 20, 2026 at 06:48:35PM +0000, Tom Lane wrote: > Clean up all relid fields of RestrictInfos during join removal. > > The original implementation of remove_rel_from_restrictinfo() > thought it could skate by with removing no-longer-valid relid > bits from only the clause_relids and required_relids fields. > This is quite bogus, although somehow we had not run across a > counterexample before now. At minimum, the left_relids and > right_relids fields need to be fixed because they will be > examined later by clause_sides_match_join(). But it seems > pretty foolish not to fix all the relid fields, so do that. prion looks unhappy on this one: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2026-04-20%2022%3A50%3A01 This uses -DRELCACHE_FORCE_RELEASE -DCATCACHE_FORCE_RELEASE. -- Michael
Вложения
On Tue, Apr 21, 2026 at 8:03 AM Michael Paquier <michael@paquier.xyz> wrote: > > Hi Tom, > > On Mon, Apr 20, 2026 at 06:48:35PM +0000, Tom Lane wrote: > > Clean up all relid fields of RestrictInfos during join removal. > > > > The original implementation of remove_rel_from_restrictinfo() > > thought it could skate by with removing no-longer-valid relid > > bits from only the clause_relids and required_relids fields. > > This is quite bogus, although somehow we had not run across a > > counterexample before now. At minimum, the left_relids and > > right_relids fields need to be fixed because they will be > > examined later by clause_sides_match_join(). But it seems > > pretty foolish not to fix all the relid fields, so do that. > > prion looks unhappy on this one: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2026-04-20%2022%3A50%3A01 > > This uses -DRELCACHE_FORCE_RELEASE -DCATCACHE_FORCE_RELEASE. This seems the same problem as in skink. There is a WIP patch for the fix at: https://postgr.es/m/458729.1776724816@sss.pgh.pa.us - Richard
Richard Guo <guofenglinux@gmail.com> writes: > On Tue, Apr 21, 2026 at 8:03 AM Michael Paquier <michael@paquier.xyz> wrote: >> prion looks unhappy on this one: >> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2026-04-20%2022%3A50%3A01 >> This uses -DRELCACHE_FORCE_RELEASE -DCATCACHE_FORCE_RELEASE. > This seems the same problem as in skink. Yeah, likely. As best I can tell, the reason skink is falling over is that USE_VALGRIND enables list.c's DEBUG_LIST_MEMORY_USAGE, making list.c much more memory-hungry and thus able to recycle freed bitmapset storage before it would have been recycled in a regular debug build. Probably prion's options have a similar effect. Wanting to get the buildfarm green again, I didn't stop to dig into this interesting question: how the heck did cfcd57111 manage to pass regression testing in our standard test rig with CLOBBER_FREED_MEMORY enabled? In the problematic cases, remove_rel_from_eclass reduces cur_em->em_relids to empty causing it to be pfree'd, which means that there is now some associated RestrictInfo whose left_relids or right_relids is pointing at freed memory. That should result in the later bms_del_member calls in remove_rel_from_restrictinfo blowing up instantly on Assert(bms_is_valid_set(a)). The only way that doesn't happen, AFAICS, is if we repopulate that freed chunk with another Bitmapset in between. It's certainly possible given that the loop in remove_rel_from_restrictinfo will do some bms_copy's, but you would not think it'd happen that way reliably enough to get through check-world. regards, tom lane