Discussion: DISCARD ALL failing to acquire locks on pg_listener
Hi everyone,

I've recently been testing PostgreSQL 8.3.4 (an upgrade to 8.3.6 is scheduled) with a large number of connections from separate boxes, each using a locally installed pgpool-II set in connection pooling mode, for up to 2500 concurrent open connections.

Given I was using 8.3, it seemed quite right to set the reset statement to "ABORT; DISCARD ALL". Everything works fine until a load spike happens and the pgpool-II reset queries start to lag behind, with DISCARD ALL failing to acquire an exclusive lock on the pg_listener system table, even though the application isn't using LISTEN/NOTIFY at all. The reason was not obvious to me, but a look at the man page explained a lot: DISCARD ALL also performs an "UNLISTEN *". Since then I've crafted the reset query to reset only what the application actually uses, and things are going much better.

I vaguely remember that a full LISTEN/NOTIFY overhaul is on the to-do list with low priority, but my point is that DISCARD can be a bottleneck precisely in the scenario it is designed for, i.e. high-concurrency access from connection poolers.

I've been looking at the source code, and I understand that async operations acquire an exclusive lock at the end of the transaction; I have a proof-of-concept patch that avoids the lock when there are no pending listens or notifies and the backend is not already listening.

I haven't had time to test it yet, but I can devote a bit more time to it if it makes sense to you.

Cheers

-- Matteo Beccati

OpenX - http://www.openx.org
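[For context: the DISCARD reference page describes DISCARD ALL as roughly equivalent to a fixed sequence of statements, so a narrower pooler reset query simply drops the UNLISTEN * step. The sequence below is a sketch only: which statements to keep depends on what the application actually uses, and it would go into pgpool-II's reset query setting rather than be run by hand.]

```sql
-- Per the 8.3 docs, DISCARD ALL is roughly equivalent to:
--   SET SESSION AUTHORIZATION DEFAULT;
--   RESET ALL;
--   DEALLOCATE ALL;
--   CLOSE ALL;
--   UNLISTEN *;
--   DISCARD PLANS;
--   DISCARD TEMP;

-- A narrower reset for clients that never LISTEN (sketch; trim to what
-- the application actually needs):
ABORT;
SET SESSION AUTHORIZATION DEFAULT;
RESET ALL;
DEALLOCATE ALL;
CLOSE ALL;
DISCARD TEMP;
```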
Thanks for the valuable info! I'm going to add a caution to pgpool-II's docs: "DISCARD ALL will cause serious performance degradation". -- Tatsuo Ishii SRA OSS, Inc. Japan > Hi everyone, > > I've been recently testing PostgreSQL 8.3.4 (upgrade to 8.3.6 is > scheduled) with a large number of connections from separate boxes each > using a locally installed pgpool-II set in connection pooling mode, for > up to 2500 concurrent open connections. > > Given I was using 8.3, it seemed quite right to set the reset statement > to "ABORT; DISCARD ALL". Everything works fine, until a load spike > happens and pgpool-II reset queries start to lag behind, with DISCARD > ALL failing to acquire an exclusive lock on the pg_listener system table, > although the application isn't using any LISTEN/NOTIFY. The reason was > not obvious to me, but looking at the man page explained a lot: DISCARD > ALL also performs an "UNLISTEN *". Since then I've crafted the reset > query to only reset what is actually used by the application, and things > are going much better. > > I vaguely remember that a full LISTEN/NOTIFY overhaul is in the to-do > list with low priority, but my point is that DISCARD can be a bottleneck > when used in the scenario it is designed for, i.e. high concurrency > access from connection poolers. > > I've been looking to the source code and I understand that async > operations are performed acquiring an exclusive lock at the end of the > transaction, but I have a proof of concept patch that avoids it in case > there are no pending listens or notifies and the backend is not already > listening. > > I didn't have time to test it yet, but I can devote a little bit more > time to it in case it makes sense to you. > > > Cheers > > -- > Matteo Beccati > > OpenX - http://www.openx.org > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers
Matteo Beccati <php@beccati.com> writes: > Given I was using 8.3, it seemed quite right to set the reset statement > to "ABORT; DISCARD ALL". Everything works fine, until a load spike > happens and pgpool-II reset queries start to lag behind, with DISCARD > ALL failing to acquire an exclusive lock on the pg_listener system table, > although the application isn't using any LISTEN/NOTIFY. The reason was > not obvious to me, but looking at the man page explained a lot: DISCARD > ALL also performs an "UNLISTEN *". Seems like we could/should fix UNLISTEN * to not do anything if it is known that the current backend never did any LISTENs. regards, tom lane
Hi Tom, >> Given I was using 8.3, it seemed quite right to set the reset statement >> to "ABORT; DISCARD ALL". Everything works fine, until a load spike >> happens and pgpool-II reset queries start to lag behind, with DISCARD >> ALL failing to acquire an exclusive lock on the pg_listener system table, >> although the application isn't using any LISTEN/NOTIFY. The reason was >> not obvious to me, but looking at the man page explained a lot: DISCARD >> ALL also performs an "UNLISTEN *". > > Seems like we could/should fix UNLISTEN * to not do anything if it is > known that the current backend never did any LISTENs. Ok, I'll take some time tonight to give my patch a try and eventually submit it. Cheers -- Matteo Beccati OpenX - http://www.openx.org
Hi Tom, >> Given I was using 8.3, it seemed quite right to set the reset statement >> to "ABORT; DISCARD ALL". Everything works fine, until a load spike >> happens and pgpool-II reset queries start to lag behind, with DISCARD >> ALL failing to acquire an exclusive lock on the pg_listener system table, >> although the application isn't using any LISTEN/NOTIFY. The reason was >> not obvious to me, but looking at the man page explained a lot: DISCARD >> ALL also performs an "UNLISTEN *". > > Seems like we could/should fix UNLISTEN * to not do anything if it is > known that the current backend never did any LISTENs. Here's my proposed patch, both for HEAD and 8.3: http://www.beccati.com/misc/pgsql/async_unlisten_skip_HEAD.patch http://www.beccati.com/misc/pgsql/async_unlisten_skip_REL8_3_STABLE.patch I tried to write a regression test, but couldn't find a suitable way to get the regression framework to cope with trace_notify printing the backend pid. I even tried to add a @backend_pid@ variable to pg_regress, but soon realised that the pid is not available to psql when variable substitution happens. So, here's the output of some tests I made: http://www.beccati.com/misc/pgsql/async_unlisten_skip.out Note: DISCARD doesn't produce any debug output, because the GUC variables are being reset before Async_UnlistenAll is called. Cheers -- Matteo Beccati OpenX - http://www.openx.org
Matteo Beccati <php@beccati.com> writes: >> Seems like we could/should fix UNLISTEN * to not do anything if it is >> known that the current backend never did any LISTENs. > Here's my proposed patch, both for HEAD and 8.3: I'll take a look. regards, tom lane
Matteo Beccati <php@beccati.com> writes: >> Seems like we could/should fix UNLISTEN * to not do anything if it is >> known that the current backend never did any LISTENs. > Here's my proposed patch, both for HEAD and 8.3: This seems a bit overcomplicated. I had in mind something like this... Index: src/backend/commands/async.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/commands/async.c,v retrieving revision 1.145 diff -c -r1.145 async.c *** src/backend/commands/async.c 1 Jan 2009 17:23:37 -0000 1.145 --- src/backend/commands/async.c 12 Feb 2009 18:28:43 -0000 *************** *** 277,282 **** --- 277,286 ---- if (Trace_notify) elog(DEBUG1, "Async_Unlisten(%s,%d)", relname, MyProcPid); + /* If we couldn't possibly be listening, no need to queue anything */ + if (pendingActions == NIL && !unlistenExitRegistered) + return; + queue_listen(LISTEN_UNLISTEN, relname); } *************** *** 291,296 **** --- 295,304 ---- if (Trace_notify) elog(DEBUG1, "Async_UnlistenAll(%d)", MyProcPid); + /* If we couldn't possibly be listening, no need to queue anything */ + if (pendingActions == NIL && !unlistenExitRegistered) + return; + queue_listen(LISTEN_UNLISTEN_ALL, ""); } regards, tom lane
Tom Lane wrote: > Matteo Beccati <php@beccati.com> writes: >>> Seems like we could/should fix UNLISTEN * to not do anything if it is >>> known that the current backend never did any LISTENs. > >> Here's my proposed patch, both for HEAD and 8.3: > > This seems a bit overcomplicated. I had in mind something like this... Much easier indeed... I didn't notice the unlistenExitRegistered variable. Cheers -- Matteo Beccati OpenX - http://www.openx.org
Matteo Beccati <php@beccati.com> writes: > Tom Lane ha scritto: >> This seems a bit overcomplicated. I had in mind something like this... > Much easier indeed... I didn't notice the unlistenExitRegistered variable. Just for completeness, I attach another form of the patch that I thought about for a bit. This adds the ability for UNLISTEN ALL to revert the backend to the state where subsequent UNLISTENs don't cost anything. This could be of value in a scenario where you have pooled connections and just a small fraction of the client threads are using LISTEN. That seemed like kind of an unlikely use-case though. The problem is that this patch adds some cycles to transaction commit/abort for everyone, whether they ever use LISTEN/UNLISTEN/DISCARD or not. It's not a lot of cycles, but even so I'm thinking it's not a win overall. Comments? regards, tom lane Index: src/backend/access/transam/xact.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/access/transam/xact.c,v retrieving revision 1.272 diff -c -r1.272 xact.c *** src/backend/access/transam/xact.c 20 Jan 2009 18:59:37 -0000 1.272 --- src/backend/access/transam/xact.c 12 Feb 2009 18:24:12 -0000 *************** *** 1703,1708 **** --- 1703,1709 ---- AtEOXact_SPI(true); AtEOXact_xml(); AtEOXact_on_commit_actions(true); + AtEOXact_Notify(true); AtEOXact_Namespace(true); /* smgrcommit already done */ AtEOXact_Files(); *************** *** 1939,1944 **** --- 1940,1946 ---- AtEOXact_SPI(true); AtEOXact_xml(); AtEOXact_on_commit_actions(true); + AtEOXact_Notify(true); AtEOXact_Namespace(true); /* smgrcommit already done */ AtEOXact_Files(); *************** *** 2084,2089 **** --- 2086,2092 ---- AtEOXact_SPI(false); AtEOXact_xml(); AtEOXact_on_commit_actions(false); + AtEOXact_Notify(false); AtEOXact_Namespace(false); AtEOXact_Files(); AtEOXact_ComboCid(); Index: src/backend/commands/async.c =================================================================== RCS file: 
/cvsroot/pgsql/src/backend/commands/async.c,v retrieving revision 1.145 diff -c -r1.145 async.c *** src/backend/commands/async.c 1 Jan 2009 17:23:37 -0000 1.145 --- src/backend/commands/async.c 12 Feb 2009 18:24:13 -0000 *************** *** 167,172 **** --- 167,178 ---- /* True if we've registered an on_shmem_exit cleanup */ static bool unlistenExitRegistered = false; + /* True if this backend has (or might have) an active LISTEN entry */ + static bool haveActiveListen = false; + + /* True if current transaction is trying to commit an UNLISTEN ALL */ + static bool committingUnlistenAll = false; + bool Trace_notify = false; *************** *** 277,282 **** --- 283,292 ---- if (Trace_notify) elog(DEBUG1, "Async_Unlisten(%s,%d)", relname, MyProcPid); + /* If we couldn't possibly be listening, no need to queue anything */ + if (pendingActions == NIL && !haveActiveListen) + return; + queue_listen(LISTEN_UNLISTEN, relname); } *************** *** 291,296 **** --- 301,310 ---- if (Trace_notify) elog(DEBUG1, "Async_UnlistenAll(%d)", MyProcPid); + /* If we couldn't possibly be listening, no need to queue anything */ + if (pendingActions == NIL && !haveActiveListen) + return; + queue_listen(LISTEN_UNLISTEN_ALL, ""); } *************** *** 493,499 **** heap_freetuple(tuple); /* ! * now that we are listening, make sure we will unlisten before dying. */ if (!unlistenExitRegistered) { --- 507,526 ---- heap_freetuple(tuple); /* ! * Remember that this backend has at least one active LISTEN. Also, ! * this LISTEN negates the effect of any earlier UNLISTEN ALL in the ! * same transaction. ! * ! * Note: it's still possible for the current transaction to fail before ! * we reach commit. In that case haveActiveListen might be uselessly ! * left true; but that's OK, if not optimal, so we don't expend extra ! * effort to cover that corner case. ! */ ! haveActiveListen = true; ! committingUnlistenAll = false; ! ! /* ! * Now that we are listening, make sure we will unlisten before dying. 
*/ if (!unlistenExitRegistered) { *************** *** 569,574 **** --- 596,608 ---- simple_heap_delete(lRel, &lTuple->t_self); heap_endscan(scan); + + /* + * Remember that we're trying to commit UNLISTEN ALL. Since we might + * still fail before reaching commit, we can't reset haveActiveListen + * immediately. + */ + committingUnlistenAll = true; } /* *************** *** 675,680 **** --- 709,730 ---- } /* + * AtEOXact_Notify + * + * Clean up post-commit or post-abort. This is not called until + * we know that we successfully committed (or didn't). + */ + void + AtEOXact_Notify(bool isCommit) + { + /* If we committed UNLISTEN ALL, we can reset haveActiveListen */ + if (isCommit && committingUnlistenAll) + haveActiveListen = false; + /* In any case, the next xact starts with clean UNLISTEN ALL slate */ + committingUnlistenAll = false; + } + + /* * AtSubStart_Notify() --- Take care of subtransaction start. * * Push empty state for the new subtransaction. Index: src/include/commands/async.h =================================================================== RCS file: /cvsroot/pgsql/src/include/commands/async.h,v retrieving revision 1.37 diff -c -r1.37 async.h *** src/include/commands/async.h 1 Jan 2009 17:23:58 -0000 1.37 --- src/include/commands/async.h 12 Feb 2009 18:24:13 -0000 *************** *** 28,33 **** --- 28,34 ---- extern void AtSubCommit_Notify(void); extern void AtSubAbort_Notify(void); extern void AtPrepare_Notify(void); + extern void AtEOXact_Notify(bool isCommit); /* signal handler for inbound notifies (SIGUSR2) */ extern void NotifyInterruptHandler(SIGNAL_ARGS);
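[To make the control flow of this second version easier to follow, here is a toy, self-contained model of the two flags. This is plain C for illustration, not backend code: the names merely mirror the patch, and the pendingActions check is omitted for brevity. A LISTEN cancels a same-transaction UNLISTEN ALL, an aborted transaction leaves haveActiveListen conservatively set, and only a committed UNLISTEN ALL returns the backend to the cheap no-op state.]

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the backend-local flags in the patch above */
bool have_active_listen = false;      /* backend may own a pg_listener row */
bool committing_unlisten_all = false; /* current xact ran UNLISTEN ALL */

void exec_listen(void)
{
    /* LISTEN negates an earlier UNLISTEN ALL in the same transaction */
    have_active_listen = true;
    committing_unlisten_all = false;
}

void exec_unlisten_all(void)
{
    /* Can't clear have_active_listen yet: the transaction may still abort */
    committing_unlisten_all = true;
}

void at_eoxact_notify(bool is_commit)
{
    /* Only a committed UNLISTEN ALL proves we hold no pg_listener rows */
    if (is_commit && committing_unlisten_all)
        have_active_listen = false;
    /* In any case, the next xact starts with a clean UNLISTEN ALL slate */
    committing_unlisten_all = false;
}

/* When true, UNLISTEN / UNLISTEN * (and hence DISCARD ALL) can skip the queue */
bool unlisten_is_noop(void)
{
    return !have_active_listen;
}
```

Note how an abort deliberately leaves the flag set, matching the comment in the patch: wrongly staying true costs a few wasted cycles, while wrongly going false would leak a pg_listener entry.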
On Thu, Feb 12, 2009 at 2:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Just for completeness, I attach another form of the patch that I thought > about for a bit. This adds the ability for UNLISTEN ALL to revert the > backend to the state where subsequent UNLISTENs don't cost anything. > This could be of value in a scenario where you have pooled connections > and just a small fraction of the client threads are using LISTEN. That > seemed like kind of an unlikely use-case though. The problem is that > this patch adds some cycles to transaction commit/abort for everyone, > whether they ever use LISTEN/UNLISTEN/DISCARD or not. It's not a lot of > cycles, but even so I'm thinking it's not a win overall. Comments? This is so lightweight I'd be inclined to go for it, even if the use case is pretty narrow. Do you think you can actually construct a benchmark where the difference is measurable? ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Feb 12, 2009 at 2:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Just for completeness, I attach another form of the patch that I thought >> about for a bit. This adds the ability for UNLISTEN ALL to revert the >> backend to the state where subsequent UNLISTENs don't cost anything. >> This could be of value in a scenario where you have pooled connections >> and just a small fraction of the client threads are using LISTEN. That >> seemed like kind of an unlikely use-case though. The problem is that >> this patch adds some cycles to transaction commit/abort for everyone, >> whether they ever use LISTEN/UNLISTEN/DISCARD or not. It's not a lot of >> cycles, but even so I'm thinking it's not a win overall. Comments? > This is so lightweight I'd be inclined to go for it, even if the use > case is pretty narrow. Do you think you can actually construct a > benchmark where the difference is measurable? Almost certainly not, but "a cycle saved is a cycle earned" ... The real problem I'm having with it is that I don't believe the use-case. The normal scenario for a listener is that you LISTEN and then you sit there waiting for events. In the above scenario, a client thread would only be able to receive events when it actively had control of its pool connection; so it seems like it would be at risk of missing things when it didn't. It seems much more likely that you'd design the application so that listening clients aren't pooled but are listening continuously. The guys sending NOTIFY events might well be pooled, but they're not the issue. If someone can show me a plausible use-case that gets a benefit from this form of the patch, I don't have a problem with making other people pay a few cycles for that. I'm just fearing that nobody would get a win at all, and then neither the cycles nor the extra complexity would give us any benefit. (The extra hooks into xact.c are actually bothering me as much as the cycles. 
Given that we're intending to throw all this code away and reimplement LISTEN/NOTIFY completely pretty soon, I'd just as soon keep down the number of contact points with the rest of the system.) regards, tom lane
Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Thu, Feb 12, 2009 at 2:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Just for completeness, I attach another form of the patch that I thought >>> about for a bit. This adds the ability for UNLISTEN ALL to revert the >>> backend to the state where subsequent UNLISTENs don't cost anything. >>> This could be of value in a scenario where you have pooled connections >>> and just a small fraction of the client threads are using LISTEN. That >>> seemed like kind of an unlikely use-case though. The problem is that >>> this patch adds some cycles to transaction commit/abort for everyone, >>> whether they ever use LISTEN/UNLISTEN/DISCARD or not. It's not a lot of >>> cycles, but even so I'm thinking it's not a win overall. Comments? > >> This is so lightweight I'd be inclined to go for it, even if the use >> case is pretty narrow. Do you think you can actually construct a >> benchmark where the difference is measurable? > > Almost certainly not, but "a cycle saved is a cycle earned" ... > > The real problem I'm having with it is that I don't believe the > use-case. The normal scenario for a listener is that you LISTEN and > then you sit there waiting for events. In the above scenario, a client > thread would only be able to receive events when it actively had control > of its pool connection; so it seems like it would be at risk of missing > things when it didn't. It seems much more likely that you'd design the > application so that listening clients aren't pooled but are listening > continuously. The guys sending NOTIFY events might well be pooled, but > they're not the issue. > > If someone can show me a plausible use-case that gets a benefit from > this form of the patch, I don't have a problem with making other people > pay a few cycles for that. I'm just fearing that nobody would get a win > at all, and then neither the cycles nor the extra complexity would give > us any benefit. 
> (The extra hooks into xact.c are actually bothering me as much as the cycles. Given that we're intending to throw all this code away and reimplement LISTEN/NOTIFY completely pretty soon, I'd just as soon keep down the number of contact points with the rest of the system.)

Imagine a web application interacting with a daemon using LISTEN/NOTIFY. I happened to build one in the past, so I guess it could be a fairly common scenario, which you already described. Now, if both the front end and the daemon use the same pooler to have a common failover process, previously listening connections could be reused by the web app if the daemon is restarted and the pooler is not. Does it look plausible?

That said, I don't mind if we go with the previous two-liner fix :)

Cheers -- Matteo Beccati OpenX - http://www.openx.org
Tom Lane wrote: > The real problem I'm having with it is that I don't believe the > use-case. The normal scenario for a listener is that you LISTEN and > then you sit there waiting for events. In the above scenario, a client > thread would only be able to receive events when it actively had control > of its pool connection; so it seems like it would be at risk of missing > things when it didn't. It seems much more likely that you'd design the > application so that listening clients aren't pooled but are listening > continuously. I have an application that is actually able to install callbacks on one or more of its pooled connections to wait for listens. However, the application is not using this on the pooled connections, because that would require keeping track of which connection the listen is registered on and ensuring that that connection is never closed. Instead of tracking all that, I'd presume it'd be easier to simply group all listens on a single connection outside the pool. So your patch would not benefit any practical use case I can think of. -- Sincerely, Stephen R. van den Berg.