Обсуждение: pgsql: postgres_fdw: Fix handling of pending asynchronous requests.
postgres_fdw: Fix handling of pending asynchronous requests. A pending asynchronous request is handled by process_pending_request(), which previously not only processed an in-progress remote query but performed ExecForeignScan() to produce a tuple to return to the local server asynchronously from the result of the remote query. But that led to a server crash when executing a query or led to an "InstrStartNode called twice in a row" or "InstrEndLoop called on running node" failure when doing EXPLAIN ANALYZE of it, in cases where the plan tree for it contained multiple async-capable nodes accessing the same initplan/subplan that contained multiple async-capable nodes scanning the same foreign tables as for the parent async-capable nodes, as reported by Andrey Lepikhov. The reason is that the second step in process_pending_request() invoked when executing the initplan/subplan for one of the parent async-capable nodes caused recursive execution of the initplan/subplan for another of the parent async-capable nodes. To fix, split process_pending_request() into the two steps and postpone the second step until ForeignAsyncConfigureWait() is called for each of the pending asynchronous requests. Also, in ExecAppendAsyncEventWait() we assumed that FDWs would register at least one wait event in a WaitEventSet created there when they were called from ForeignAsyncConfigureWait() in that function, but allow FDWs to register zero wait events in the WaitEventSet; modify ExecAppendAsyncEventWait() to just return in that case. Oversight in commit 27e1f1456. Back-patch to v14 where that commit went in. Andrey Lepikhov and Etsuro Fujita Discussion: https://postgr.es/m/fe5eaa19-1704-e4a4-76ee-3b9d37ade399@postgrespro.ru Branch ------ master Details ------- https://git.postgresql.org/pg/commitdiff/1ec7fca8592178281cd5cdada0f27a340fb813fc Modified Files -------------- contrib/postgres_fdw/expected/postgres_fdw.out | 55 +++++++++++++++++++++- contrib/postgres_fdw/postgres_fdw.c | 65 ++++++++++++++++++++------ contrib/postgres_fdw/sql/postgres_fdw.sql | 13 ++++-- src/backend/executor/nodeAppend.c | 11 +++++ 4 files changed, 126 insertions(+), 18 deletions(-)
Hi Fujita-san, On Fri, Jul 30, 2021 at 08:08:07AM +0000, Etsuro Fujita wrote: > postgres_fdw: Fix handling of pending asynchronous requests. You have angered many members of the buildfarm here, like: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mule&dt=2021-07-30%2011%3A30%3A04 This is crashing. -- Michael
Вложения
Hi Michael-san, On Fri, Jul 30, 2021 at 9:45 PM Michael Paquier <michael@paquier.xyz> wrote: > On Fri, Jul 30, 2021 at 08:08:07AM +0000, Etsuro Fujita wrote: > > postgres_fdw: Fix handling of pending asynchronous requests. > > You have angered many members of the buildfarm here, like: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mule&dt=2021-07-30%2011%3A30%3A04 > > This is crashing. I’ll look into this.Thanks! Best regards, Etsuro Fujita
On Fri, Jul 30, 2021 at 10:00 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > On Fri, Jul 30, 2021 at 9:45 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, Jul 30, 2021 at 08:08:07AM +0000, Etsuro Fujita wrote: > > > postgres_fdw: Fix handling of pending asynchronous requests. > > > > You have angered many members of the buildfarm here, like: > I’ll look into this. Buildfarm members are causing this assertion failure: TRAP: FailedAssertion("fsstate->conn_state->pendingAreq == areq", File: "/home/pgbf/buildroot/HEAD/pgsql.build/../pgsql/contrib/postgres_fdw/postgres_fdw.c", Line: 6900, PID: 744110) I couldn’t reproduce this in my environment, but I noticed this, which didn’t happen in my environment: the case of delivering notifications to both of the parent async-capable nodes in ExecAppendAsyncEventWait when processing the added test case query. In that case, doing postgresForeignAsyncNotify for the first parent async-capable node would invoke process_pending_request on the second one when executing its initplan, which would lead to the assertion failure when doing postgresForeignAsyncNotify for the second one. :-( I think postgresForeignAsyncNotify would need the same treatment as for postgresForeignAsyncConfigureWait, like the attached. Best regards, Etsuro Fujita
Вложения
Etsuro Fujita <etsuro.fujita@gmail.com> writes: > I couldn’t reproduce this in my environment, but I noticed this, which > didn’t happen in my environment: the case of delivering notifications > to both of the parent async-capable nodes in ExecAppendAsyncEventWait > when processing the added test case query. In that case, doing > postgresForeignAsyncNotify for the first parent async-capable node > would invoke process_pending_request on the second one when executing > its initplan, which would lead to the assertion failure when doing > postgresForeignAsyncNotify for the second one. :-( I think > postgresForeignAsyncNotify would need the same treatment as for > postgresForeignAsyncConfigureWait, like the attached. I tried this patch on a spare machine that seems prone to showing the failure: on HEAD, postgres_fdw's installcheck failed 3 times in 4 attempts. With the patch, it's gotten through 23 consecutive tries without a failure. So this is at least way better ... regards, tom lane
On Sun, Aug 1, 2021 at 1:41 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I tried this patch on a spare machine that seems prone to showing > the failure: on HEAD, postgres_fdw's installcheck failed 3 times > in 4 attempts. With the patch, it's gotten through 23 consecutive > tries without a failure. So this is at least way better ... I retried to reproduce the failure in my environment but I couldn’t. But I plan on applying the patch tomorrow. Thanks for the testing! Best regards, Etsuro Fujita
On Sun, Aug 1, 2021 at 4:59 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > But I plan on applying the patch tomorrow. Done, in hopes of making the buildfarm back to green. Best regards, Etsuro Fujita