Обсуждение: BUG #17355: Server crashes on ExecReScanForeignScan in postgres_fdw when accessing foreign partition
BUG #17355: Server crashes on ExecReScanForeignScan in postgres_fdw when accessing foreign partition
От
PG Bug reporting form
Дата:
The following bug has been logged on the website: Bug reference: 17355 Logged by: Alexander Lakhin Email address: exclusion@gmail.com PostgreSQL version: 14.1 Operating system: Ubuntu 20.04 Description: When running the following script (with 20 concurrent insert/update/delete queries): cat << 'EOF' | psql postgres CREATE EXTENSION postgres_fdw; CREATE SERVER testserver1 FOREIGN DATA WRAPPER postgres_fdw; DO $d$ BEGIN EXECUTE $$CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw OPTIONS (dbname '$$||current_database()||$$', port '$$||current_setting('port')||$$' )$$; END; $d$; CREATE USER MAPPING FOR CURRENT_USER SERVER loopback; CREATE TABLE async_pt (a int, b int) PARTITION BY RANGE (a); CREATE TABLE base_tbl2 (a int, b int); CREATE FOREIGN TABLE async_p2 PARTITION OF async_pt FOR VALUES FROM (2000) TO (3000) SERVER loopback OPTIONS (table_name 'base_tbl2'); CREATE TABLE async_p3 PARTITION OF async_pt FOR VALUES FROM (3000) TO (4000); EOF for i in `seq 20`; do cat << 'EOF' | psql postgres >psql-$i.log & INSERT INTO async_pt SELECT i FROM generate_series(3000, 3100) i; DELETE FROM async_pt; UPDATE async_pt SET a = a; EOF done wait I get crashes with the following stacktrace: Core was generated by `postgres: law postgres [local] DELETE '. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x00007f0f438e6c42 in postgresReScanForeignScan (node=0x5580b57f23b8) at postgres_fdw.c:1651 1651 if (!fsstate->cursor_exists) (gdb) bt #0 0x00007f0f438e6c42 in postgresReScanForeignScan (node=0x5580b57f23b8) at postgres_fdw.c:1651 #1 0x00005580b4987bf9 in ExecReScanForeignScan (node=0x5580b57f23b8) at nodeForeignscan.c:326 #2 0x00005580b4942c0a in ExecReScan (node=0x5580b57f23b8) at execAmi.c:235 #3 0x00005580b49815bb in ExecProcNode (node=0x5580b57f23b8) at ../../../src/include/executor/executor.h:255 #4 0x00005580b4981e3b in ExecAppend (pstate=0x5580b57f1f48) at nodeAppend.c:360 #5 0x00005580b4967c88 in ExecProcNodeFirst (node=0x5580b57f1f48) at execProcnode.c:463 #6 0x00005580b495b774 in ExecProcNode (node=0x5580b57f1f48) at ../../../src/include/executor/executor.h:257 #7 0x00005580b4960373 in EvalPlanQualNext (epqstate=0x5580b57cf360) at execMain.c:2612 #8 0x00005580b495fc99 in EvalPlanQual (epqstate=0x5580b57cf360, relation=0x7f0f37b3d598, rti=3, inputslot=0x5580b57d0540) at execMain.c:2383 #9 0x00005580b49a70ed in ExecDelete (mtstate=0x5580b57cf278, resultRelInfo=0x5580b57cf5c8, tupleid=0x7ffdf2fa6002, oldtuple=0x0, planSlot=0x5580b57dc9a8, epqstate=0x5580b57cf360, estate=0x5580b57ceff0, processReturning=true, canSetTag=true, changingPart=false, tupleDeleted=0x0, epqreturnslot=0x0) at nodeModifyTable.c:1243 #10 0x00005580b49a95b3 in ExecModifyTable (pstate=0x5580b57cf278) at nodeModifyTable.c:2599 #11 0x00005580b4967c88 in ExecProcNodeFirst (node=0x5580b57cf278) at execProcnode.c:463 #12 0x00005580b495b774 in ExecProcNode (node=0x5580b57cf278) at ../../../src/include/executor/executor.h:257 #13 0x00005580b495e392 in ExecutePlan (estate=0x5580b57ceff0, planstate=0x5580b57cf278, use_parallel_mode=false, operation=CMD_DELETE, sendTuples=false, numberTuples=0, direction=ForwardScanDirection, dest=0x5580b57dad98, execute_once=true) at execMain.c:1551 #14 0x00005580b495beab in standard_ExecutorRun (queryDesc=0x5580b5711f80, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:361 #15 0x00005580b495bc96 in ExecutorRun (queryDesc=0x5580b5711f80, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:305 #16 0x00005580b4bd7cf8 in ProcessQuery (plan=0x5580b57d9ac0, sourceText=0x5580b56e9440 "DELETE FROM async_pt;", params=0x0, queryEnv=0x0, dest=0x5580b57dad98, qc=0x7ffdf2fa64b0) at pquery.c:160 #17 0x00005580b4bd98b7 in PortalRunMulti (portal=0x5580b574d090, isTopLevel=true, setHoldSnapshot=false, dest=0x5580b57dad98, altdest=0x5580b57dad98, qc=0x7ffdf2fa64b0) at pquery.c:1274 #18 0x00005580b4bd8d9d in PortalRun (portal=0x5580b574d090, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x5580b57dad98, altdest=0x5580b57dad98, qc=0x7ffdf2fa64b0) at pquery.c:788 #19 0x00005580b4bd1bd9 in exec_simple_query (query_string=0x5580b56e9440 "DELETE FROM async_pt;") at postgres.c:1214 #20 0x00005580b4bd6ac8 in PostgresMain (argc=1, argv=0x7ffdf2fa66d0, dbname=0x5580b5714f78 "postgres", username=0x5580b5714f58 "law") at postgres.c:4486 #21 0x00005580b4afb357 in BackendRun (port=0x5580b570dab0) at postmaster.c:4530 #22 0x00005580b4afabb2 in BackendStartup (port=0x5580b570dab0) at postmaster.c:4252 #23 0x00005580b4af69a7 in ServerLoop () at postmaster.c:1745 #24 0x00005580b4af6104 in PostmasterMain (argc=3, argv=0x5580b56e34b0) at postmaster.c:1417 #25 0x00005580b49e566a in main (argc=3, argv=0x5580b56e34b0) at main.c:209 postmaster.log contains: 2022-01-06 10:46:20.145 MSK [403022] ERROR: cannot re-evaluate a Foreign Update or Delete during EvalPlanQual 2022-01-06 10:46:20.145 MSK [403022] STATEMENT: UPDATE async_pt SET a = a; 2022-01-06 10:46:20.320 MSK [403059] LOG: unexpected EOF on client connection with an open transaction 2022-01-06 10:46:20.320 MSK [402977] LOG: server process (PID 403043) was terminated by signal 11: Segmentation fault 2022-01-06 10:46:20.320 MSK [402977] DETAIL: Failed process was running: DELETE FROM async_pt;
On Thu, Jan 6, 2022 at 5:25 PM PG Bug reporting form <noreply@postgresql.org> wrote: > When running the following script (with 20 concurrent insert/update/delete > queries): > cat << 'EOF' | psql postgres > CREATE EXTENSION postgres_fdw; > > CREATE SERVER testserver1 FOREIGN DATA WRAPPER postgres_fdw; > DO $d$ > BEGIN > EXECUTE $$CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw > OPTIONS (dbname '$$||current_database()||$$', > port '$$||current_setting('port')||$$' > )$$; > END; > $d$; > > CREATE USER MAPPING FOR CURRENT_USER SERVER loopback; > CREATE TABLE async_pt (a int, b int) PARTITION BY RANGE (a); > CREATE TABLE base_tbl2 (a int, b int); > CREATE FOREIGN TABLE async_p2 PARTITION OF async_pt FOR VALUES FROM (2000) > TO (3000) > SERVER loopback OPTIONS (table_name 'base_tbl2'); > CREATE TABLE async_p3 PARTITION OF async_pt FOR VALUES FROM (3000) TO > (4000); > EOF > > for i in `seq 20`; do > cat << 'EOF' | psql postgres >psql-$i.log & > INSERT INTO async_pt SELECT i FROM generate_series(3000, 3100) i; > DELETE FROM async_pt; > UPDATE async_pt SET a = a; > EOF > done > wait > > I get crashes with the following stacktrace: I haven't tried to reproduce this yet, but I have one question: did you set the async_capable option to true? Thanks for the report! Best regards, Etsuro Fujita
Re: BUG #17355: Server crashes on ExecReScanForeignScan in postgres_fdw when accessing foreign partition
От
Alexander Lakhin
Дата:
06.01.2022 12:56, Etsuro Fujita wrote: > I haven't tried to reproduce this yet, but I have one question: did > you set the async_capable option to true? The async_capable setting doesn't affect this, so I've filed it as a separate bug report. All the other parameters have default values in my setup. Best regards, Alexander
On Thu, Jan 6, 2022 at 9:00 PM Alexander Lakhin <exclusion@gmail.com> wrote: > 06.01.2022 12:56, Etsuro Fujita wrote: > > I haven't tried to reproduce this yet, but I have one question: did > > you set the async_capable option to true? > The async_capable setting doesn't affect this, so I've filed it as a > separate bug report. > All the other parameters have default values in my setup. Ok, thanks! Best regards, Etsuro Fujita
On Fri, Jan 7, 2022 at 3:19 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > On Thu, Jan 6, 2022 at 9:00 PM Alexander Lakhin <exclusion@gmail.com> wrote: > > 06.01.2022 12:56, Etsuro Fujita wrote: > > > I haven't tried to reproduce this yet I think a simple reproducer for this issue is: 1) Define the partitioned table async_pt as shown by Alexander. 2) Run concurrent transactions as follows. Session A: insert into async_pt values (3000); Session A: begin; Session A: update async_pt set a = a; Session B: delete from async_pt; Session A: commit; The commit in Session A would cause a server crash in Session B due to the segmentation fault. Also, I think a simple reproducer for the “cannot re-evaluate a Foreign Update or Delete during EvalPlanQual” error is: 1) Define the partitioned table async_pt as shown by Alexander. 2) Run concurrent transactions as follows. Session A: insert into async_pt values (3000); Session A: begin; Session A: update async_pt set a = a; Session B: update async_pt set a = a; Session A: commit; The commit in Session A would cause the transaction in Session B to abort with that error. I think the root cause of these issues is that because of the rework for inherited UPDATE/DELETE in commit 86dc90056, ForeignScan nodes doing direct modifications are re-evaluated as part of the EvalPlanQual subtree when doing an EvalPlanQual check, which breaks the assumption that those ForeignScan nodes should never be re-evaluated by EvalPlanQual, leading to these issues. To fix, I’d like to propose to ignore those ForeignScan nodes in ExecForeignScan/ExecReScanForeignScan when doing that recheck, like the attached. Best regards, Etsuro Fujita
Вложения
Re: BUG #17355: Server crashes on ExecReScanForeignScan in postgres_fdw when accessing foreign partition
От
Alexander Lakhin
Дата:
Hello Etsura-san, 18.01.2022 11:01, Etsuro Fujita wrote: > On Fri, Jan 7, 2022 at 3:19 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: >> On Thu, Jan 6, 2022 at 9:00 PM Alexander Lakhin <exclusion@gmail.com> wrote: >>> 06.01.2022 12:56, Etsuro Fujita wrote: >>>> I haven't tried to reproduce this yet > I think a simple reproducer for this issue is: > > 1) Define the partitioned table async_pt as shown by Alexander. > 2) Run concurrent transactions as follows. > > Session A: insert into async_pt values (3000); > Session A: begin; > Session A: update async_pt set a = a; > Session B: delete from async_pt; > Session A: commit; > > The commit in Session A would cause a server crash in Session B due to > the segmentation fault. > > Also, I think a simple reproducer for the “cannot re-evaluate a > Foreign Update or Delete during EvalPlanQual” error is: > > 1) Define the partitioned table async_pt as shown by Alexander. > 2) Run concurrent transactions as follows. > > Session A: insert into async_pt values (3000); > Session A: begin; > Session A: update async_pt set a = a; > Session B: update async_pt set a = a; > Session A: commit; > > The commit in Session A would cause the transaction in Session B to > abort with that error. Thanks for the explanation and the fix! I've made a simple isolation test (see attachment) to confirm this. And it crashes without the fix as you said. With your fix it works as expected (I was wondering how "plan->operation != CMD_SELECT" with RETURNING will work and I haven't seen any issues yet.) (Besides that I've observed an infinite waiting for ShareLock with step "s1i" { INSERT INTO pt VALUES (2000); } This doesn't happen with a regular (not foreign) table.) Best regards, Alexander
Вложения
Hi Alexander, On Tue, Jan 25, 2022 at 3:00 AM Alexander Lakhin <exclusion@gmail.com> wrote: > I've made a simple isolation test (see attachment) to confirm this. And > it crashes without the fix as you said. > With your fix it works as expected Thanks for testing! > (I was wondering how "plan->operation > != CMD_SELECT" with RETURNING will work and I haven't seen any issues yet.) EvalPlanQual rechecking is irrelevant for direct modifications even when the UPDATE/DELETE query has a RETURNING clause; in which case we compute the RETURNING expressions locally, but that is done *after* updating/deleting foreign rows on the remote side. > (Besides that I've observed an infinite waiting for ShareLock with > step "s1i" { INSERT INTO pt VALUES (2000); } > This doesn't happen with a regular (not foreign) table.) You mean the lock wait occurs on the remote side, not on the local side? If so, I think that that is expected behavior because a write conflict occurs on the remote side in that case. Maybe I don’t fully understand your words, so could you elaborate a bit more on your observation? Best regards, Etsuro Fujita
Re: BUG #17355: Server crashes on ExecReScanForeignScan in postgres_fdw when accessing foreign partition
От
Alexander Lakhin
Дата:
Hello Etsuro-san, 30.01.2022 13:59, Etsuro Fujita wrote: >> (Besides that I've observed an infinite waiting for ShareLock with >> step "s1i" { INSERT INTO pt VALUES (2000); } >> This doesn't happen with a regular (not foreign) table.) > You mean the lock wait occurs on the remote side, not on the local > side? If so, I think that that is expected behavior because a write > conflict occurs on the remote side in that case. Maybe I don’t fully > understand your words, so could you elaborate a bit more on your > observation? Yes, you are right, that was expected behavior. I didn't realize that the isolationtester itself resolves blocking when the target table is local. The isolationtester controls the step execution using pg_isolation_test_session_is_blocked(), but when the target table is foreign, it can not determine correctly whether one step blocking other (cause it checks local, not remote session pids) and just hangs. Sorry for the noise. Best regards, Alexander
Hi Alexander, On Tue, Feb 1, 2022 at 1:00 PM Alexander Lakhin <exclusion@gmail.com> wrote: > 30.01.2022 13:59, Etsuro Fujita wrote: > >> (Besides that I've observed an infinite waiting for ShareLock with > >> step "s1i" { INSERT INTO pt VALUES (2000); } > >> This doesn't happen with a regular (not foreign) table.) > > You mean the lock wait occurs on the remote side, not on the local > > side? If so, I think that that is expected behavior because a write > > conflict occurs on the remote side in that case. Maybe I don’t fully > > understand your words, so could you elaborate a bit more on your > > observation? > Yes, you are right, that was expected behavior. I didn't realize that > the isolationtester itself resolves blocking when the target table is local. > The isolationtester controls the step execution using > pg_isolation_test_session_is_blocked(), but when the target table is > foreign, it can not determine correctly whether one step blocking other > (cause it checks local, not remote session pids) and just hangs. Ok, thanks for the explanation! Attached is an updated patch. I tweaked a comment a little bit and added the commit message. I didn’t include your test because in my understanding we don’t add such a test into the postgres_fdw regression test. I’ll commit the patch if there are no objections from you (or anyone else). Best regards, Etsuro Fujita
Вложения
Re: BUG #17355: Server crashes on ExecReScanForeignScan in postgres_fdw when accessing foreign partition
От
Alexander Lakhin
Дата:
01.02.2022 11:59, Etsuro Fujita wrote: > Attached is an updated patch. I tweaked a comment a little bit and > added the commit message. I didn’t include your test because in my > understanding we don’t add such a test into the postgres_fdw > regression test. I’ll commit the patch if there are no objections > from you (or anyone else). Thank you! I have no objections. That test was not intended for committing, I just wanted to show how I've tested the fix. Best regards, Alexander
On Tue, Feb 1, 2022 at 5:58 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > Hi Alexander, > > On Tue, Feb 1, 2022 at 1:00 PM Alexander Lakhin <exclusion@gmail.com> wrote: > > 30.01.2022 13:59, Etsuro Fujita wrote: > > >> (Besides that I've observed an infinite waiting for ShareLock with > > >> step "s1i" { INSERT INTO pt VALUES (2000); } > > >> This doesn't happen with a regular (not foreign) table.) > > > You mean the lock wait occurs on the remote side, not on the local > > > side? If so, I think that that is expected behavior because a write > > > conflict occurs on the remote side in that case. Maybe I don’t fully > > > understand your words, so could you elaborate a bit more on your > > > observation? > > Yes, you are right, that was expected behavior. I didn't realize that > > the isolationtester itself resolves blocking when the target table is local. > > The isolationtester controls the step execution using > > pg_isolation_test_session_is_blocked(), but when the target table is > > foreign, it can not determine correctly whether one step blocking other > > (cause it checks local, not remote session pids) and just hangs. > > Ok, thanks for the explanation! > > Attached is an updated patch. I tweaked a comment a little bit and > added the commit message. I didn’t include your test because in my > understanding we don’t add such a test into the postgres_fdw > regression test. I’ll commit the patch if there are no objections > from you (or anyone else). Thank you for working on this and sorry about failing to reply to this sooner. Your proposed fix makes sense. I was trying to understand in what way commit 86dc90056's is related to this, because you mention it here in the thread and also in the commit message. I see that c3928b467, also mentioned in your commit message, fixed a related bug [1], which in turn refers to 1375422c as the cause [2]. c3928b467 fixed things so that BeginDirectModify() won't get called if EPQ is active, so fdw_state won't get set. Given that, it also fixed ForeignNext() to not enter the FDW to perform a non-SELECT operation in the EPQ-active case, which made sure its caller Exec[Foreign]Scan() won't enter the FDW in that case. Though it didn't also fix ExecReScan[ForeignScan](), so this bug report. Your patch fixes both ExecForeignScan() and ExecReScanForeignScan() to not enter the FDW during EvalPlanQual() in non-SELECT cases. Before commit 86dc90056, EPQState.plan would be set to the 1st child subplan, which if it happened to be a direct-modify ForeignScan node, this bug would manifest. After that commit though, a direct-modify child subplan would *always* be present in EPQState.plan by way of its parent Append node or some parent node thereof being set as EPQState.plan, so the bug would now always manifest. So my understanding is that the commit 86dc90056 only changed *when* the bug manifests, which is now "always" if direct-modify children are present. Is that understanding correct? BTW, isn't the following code in ForeignNext() added by c3928b467 non-reachable after your patch: /* * direct modifications cannot be re-evaluated, so shouldn't get here * during EvalPlanQual processing */ if (estate->es_epq_active != NULL) elog(ERROR, "cannot re-evaluate a Foreign Update or Delete during EvalPlanQual"); Should that be converted to an Assert(estate->es_epq_active == NULL)? -- Amit Langote EDB: http://www.enterprisedb.com [1] EvalPlanQual() attempting to initialize a direct-modify ForeignScan node without the necessary ResultRelInfo having been set up. [2] As of 1375422c, ResultRelInfos are initialized in ExecInitModifyTable() which won't be invoked during EvalPlanQualStart(), so EvalPlanQual() can no longer rely on accessing a direct-modify ForeignScan node's ResultRelInfo.
Hi Alexander, On Wed, Feb 2, 2022 at 3:00 AM Alexander Lakhin <exclusion@gmail.com> wrote: > 01.02.2022 11:59, Etsuro Fujita wrote: > > Attached is an updated patch. I tweaked a comment a little bit and > > added the commit message. I didn’t include your test because in my > > understanding we don’t add such a test into the postgres_fdw > > regression test. I’ll commit the patch if there are no objections > > from you (or anyone else). > Thank you! I have no objections. > That test was not intended for committing, I just wanted to show how > I've tested the fix. OK Thanks! Best regards, Etsuro Fujita
Hi Amit-san, On Wed, Feb 2, 2022 at 12:59 PM Amit Langote <amitlangote09@gmail.com> wrote: > Your proposed fix makes sense. Cool! > I was trying to understand in what way commit 86dc90056's is related > to this, because you mention it here in the thread and also in the > commit message. I see that c3928b467, also mentioned in your commit > message, fixed a related bug [1], which in turn refers to 1375422c as > the cause [2]. I think the diagnosis in commit c3928b467 isn't correct; I think the cause is commit 86dc90056, because before that commit, direct-modify ForeignScan nodes didn't get called during EvalPlanQual processing, even in the case of an inherited UPDATE/DELETE. (In direct modification, all the work for it except the RETURNING computation is done on the remote side, so we don't need to invoke EvalPlanQual!) BUT as you know, commit 86dc90056 changed things so that such ForeignScan nodes can get called during EvalPlanQual processing, as they are contained as part of the EvalPlanQual subtree in the case of an inherited UPDATE/DELETE, which I think led to the issues in commit c3928b467 as well as this bug report. > Before commit 86dc90056, EPQState.plan would be set to the 1st child > subplan, which if it happened to be a direct-modify ForeignScan node, > this bug would manifest. After that commit though, a direct-modify > child subplan would *always* be present in EPQState.plan by way of its > parent Append node or some parent node thereof being set as > EPQState.plan, so the bug would now always manifest. So my > understanding is that the commit 86dc90056 only changed *when* the bug > manifests, which is now "always" if direct-modify children are > present. Is that understanding correct? As I mentioned above, I think we didn't have this bug before commit 86dc90056. As for when this bug happens, I think it would depend on the partitioned-table definition; eg, for a partitioned table like the one discussed in the thread for commit c3928b467, it wouldn't happen. > BTW, isn't the following code in ForeignNext() added by c3928b467 > non-reachable after your patch: > > /* > * direct modifications cannot be re-evaluated, so shouldn't get here > * during EvalPlanQual processing > */ > if (estate->es_epq_active != NULL) > elog(ERROR, "cannot re-evaluate a Foreign Update or Delete > during EvalPlanQual"); > > Should that be converted to an Assert(estate->es_epq_active == NULL)? +1 I updated the patch as such. Attached is a new version. I also tweaked a comment a litttle bit further. Thanks! Best regards, Etsuro Fujita
Вложения
Fujita-san, On Wed, Feb 2, 2022 at 7:55 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > On Wed, Feb 2, 2022 at 12:59 PM Amit Langote <amitlangote09@gmail.com> wrote: > > I was trying to understand in what way commit 86dc90056's is related > > to this, because you mention it here in the thread and also in the > > commit message. I see that c3928b467, also mentioned in your commit > > message, fixed a related bug [1], which in turn refers to 1375422c as > > the cause [2]. > > I think the diagnosis in commit c3928b467 isn't correct; I think the > cause is commit 86dc90056, because before that commit, direct-modify > ForeignScan nodes didn't get called during EvalPlanQual processing, > even in the case of an inherited UPDATE/DELETE. (In direct > modification, all the work for it except the RETURNING computation is > done on the remote side, so we don't need to invoke EvalPlanQual!) > BUT as you know, commit 86dc90056 changed things so that such > ForeignScan nodes can get called during EvalPlanQual processing, as > they are contained as part of the EvalPlanQual subtree in the case of > an inherited UPDATE/DELETE, which I think led to the issues in commit > c3928b467 as well as this bug report. You're right, thanks for the explanation. What I embarrassingly forgot is that, before 86dc90056, each child subplan would be set to be passed to EvalPlanQ ual() when in its turn came in the ExecModifyTable()'s main loo, using the following code that 86dc90056 removed: if (TupIsNull(planSlot)) - { - /* advance to next subplan if any */ - node->mt_whichplan++; - if (node->mt_whichplan < node->mt_nplans) - { - resultRelInfo++; - subplanstate = node->mt_plans[node->mt_whichplan]; - junkfilter = resultRelInfo->ri_junkFilter; - EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan, - node->mt_arowmarks[node->mt_whichplan]); - continue; - } - else - break; - } Direct-modify subplans, even if they would get set to be passed to EvalPlanQual(), would never actually be processed for the reasons you mention. Now that there's only one subplan and all the child subplans just its subnodes, any direct-modify nodes *would* get processed because they're in the plan that gets passed to EvalPlanQual(). > > Before commit 86dc90056, EPQState.plan would be set to the 1st child > > subplan, which if it happened to be a direct-modify ForeignScan node, > > this bug would manifest. After that commit though, a direct-modify > > child subplan would *always* be present in EPQState.plan by way of its > > parent Append node or some parent node thereof being set as > > EPQState.plan, so the bug would now always manifest. So my > > understanding is that the commit 86dc90056 only changed *when* the bug > > manifests, which is now "always" if direct-modify children are > > present. Is that understanding correct? > > As I mentioned above, I think we didn't have this bug before commit > 86dc90056. As for when this bug happens, I think it would depend on > the partitioned-table definition; eg, for a partitioned table like the > one discussed in the thread for commit c3928b467, it wouldn't happen. Yeah, got it. > > BTW, isn't the following code in ForeignNext() added by c3928b467 > > non-reachable after your patch: > > > > /* > > * direct modifications cannot be re-evaluated, so shouldn't get here > > * during EvalPlanQual processing > > */ > > if (estate->es_epq_active != NULL) > > elog(ERROR, "cannot re-evaluate a Foreign Update or Delete > > during EvalPlanQual"); > > > > Should that be converted to an Assert(estate->es_epq_active == NULL)? > > +1 I updated the patch as such. Attached is a new version. I also > tweaked a comment a litttle bit further. > > Thanks! Thanks, looks good to me. -- Amit Langote EDB: http://www.enterprisedb.com
On 2022-Feb-01, Etsuro Fujita wrote: > I didn’t include your test because in my > understanding we don’t add such a test into the postgres_fdw > regression test. I’ll commit the patch if there are no objections > from you (or anyone else). I think it would be good to reconsider this. We currently don't have any isolation test files for postgres_fdw, but that doesn't mean we can't add some (we now have several subdirs where both types of tests coexist). And if we don't have these tests, it will be easier to break this again in the future without noticing if we decide to rework executor code again, which I'm sure we'll do. Maybe you don't need to add the test in back-branches, which is more work; but if you only target branch master, it should only require an ISOLATION line in the Makefile listing the new test. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Every machine is a smoke machine if you operate it wrong enough." https://twitter.com/libseybieda/status/1541673325781196801
Hi Amit-san, On Wed, Feb 2, 2022 at 11:06 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Wed, Feb 2, 2022 at 7:55 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > > On Wed, Feb 2, 2022 at 12:59 PM Amit Langote <amitlangote09@gmail.com> wrote: > > > BTW, isn't the following code in ForeignNext() added by c3928b467 > > > non-reachable after your patch: > > > > > > /* > > > * direct modifications cannot be re-evaluated, so shouldn't get here > > > * during EvalPlanQual processing > > > */ > > > if (estate->es_epq_active != NULL) > > > elog(ERROR, "cannot re-evaluate a Foreign Update or Delete > > > during EvalPlanQual"); > > > > > > Should that be converted to an Assert(estate->es_epq_active == NULL)? > > > > +1 I updated the patch as such. Attached is a new version. I also > > tweaked a comment a litttle bit further. > Thanks, looks good to me. Ok, I have committed the patch. Thanks again! Best regards, Etsuro Fujita
Hi Alvaro, On Wed, Feb 2, 2022 at 11:42 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > On 2022-Feb-01, Etsuro Fujita wrote: > > I didn’t include your test because in my > > understanding we don’t add such a test into the postgres_fdw > > regression test. I’ll commit the patch if there are no objections > > from you (or anyone else). > > I think it would be good to reconsider this. We currently don't have > any isolation test files for postgres_fdw, but that doesn't mean we > can't add some (we now have several subdirs where both types of tests > coexist). And if we don't have these tests, it will be easier to break > this again in the future without noticing if we decide to rework > executor code again, which I'm sure we'll do. Agreed. > Maybe you don't need to add the test in back-branches, which is more > work; but if you only target branch master, it should only require an > ISOLATION line in the Makefile listing the new test. I think it would be better to create a separate patch so that we test not only this but other EvalPlanQual stuff such as ForeignRecheck, so I applied the proposed patch to HEAD as well. Thanks! Best regards, Etsuro Fujita
Re: BUG #17355: Server crashes on ExecReScanForeignScan in postgres_fdw when accessing foreign partition
От
Marina Polyakova
Дата:
On 2022-02-03 09:38, Etsuro Fujita wrote: > I think it would be better to create a separate patch so that we test > not only this but other EvalPlanQual stuff such as ForeignRecheck, so > I applied the proposed patch to HEAD as well. > > Thanks! > > Best regards, > Etsuro Fujita Hello everyone in this thread! Looks like gcc 11.2.0 is now reporting a warning in the branches master/REL_14_STABLE, e.g. see CI https://github.com/postgres/postgres/runs/5047720839: [06:30:18.201] nodeForeignscan.c: In function ‘ForeignNext’: [06:30:18.201] nodeForeignscan.c:47:13: error: unused variable ‘estate’ [-Werror=unused-variable] [06:30:18.201] 47 | EState *estate = node->ss.ps.state; [06:30:18.201] | ^~~~~~ Please could you fix it? E.g. like in the attached patch - the following commands work for me with it: $ ./configure && COPT=-Werror make world-bin -- Marina Polyakova Postgres Professional: http://www.postgrespro.com
Вложения
On 2022-Feb-03, Etsuro Fujita wrote: > > Maybe you don't need to add the test in back-branches, which is more > > work; but if you only target branch master, it should only require an > > ISOLATION line in the Makefile listing the new test. > > I think it would be better to create a separate patch so that we test > not only this but other EvalPlanQual stuff such as ForeignRecheck, so > I applied the proposed patch to HEAD as well. Sure, that sounds good. Are you or is anybody else working on that? -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ Maybe there's lots of data loss but the records of data loss are also lost. (Lincoln Yeoh)
On Mon, Feb 7, 2022 at 12:00 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > On 2022-Feb-03, Etsuro Fujita wrote: > > > Maybe you don't need to add the test in back-branches, which is more > > > work; but if you only target branch master, it should only require an > > > ISOLATION line in the Makefile listing the new test. > > > > I think it would be better to create a separate patch so that we test > > not only this but other EvalPlanQual stuff such as ForeignRecheck, so > > I applied the proposed patch to HEAD as well. > > Sure, that sounds good. Are you or is anybody else working on that? I don't have time on this for v15. I'm planning to work on it for v16, but if anybody else feels like doing so for v15 or later, feel free. Best regards, Etsuro Fujita