Thread: [GENERAL] startup process stuck in recovery
Hi,

We're dealing with a 9.5.5 database with the symptom that, after a certain amount of time after restart, the startup process reaches a certain WAL segment, and stops. The startup process runs at 100% CPU, with no output from strace. There are no queries running on the secondary, so it's not a delay for queries. The WAL receiver continues to operate normally, and we can see sent_location / write_location / flush_location continue to move ahead in parallel, with replay_location stuck in that WAL segment.

Suggestions on further diagnosis?

--
-- Christophe Pettus
   xof@thebuild.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Christophe Pettus <xof@thebuild.com> writes:
> We're dealing with a 9.5.5 database with the symptom that, after a
> certain amount of time after restart, the startup process reaches a
> certain WAL segment, and stops. The startup process runs at 100% CPU,
> with no output from strace. There are no queries running on the
> secondary, so it's not a delay for queries. The WAL receiver continues
> to operate normally, and we can see sent_location / write_location /
> flush_location continue to move ahead in parallel, with replay_location
> stuck in that WAL segment.
> Suggestions on further diagnosis?

Attach to startup process with gdb, and get a stack trace?

			regards, tom lane
> On Oct 9, 2017, at 12:10, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Attach to startup process with gdb, and get a stack trace?

#0  0x0000558812f4f1da in ?? ()
#1  0x0000558812f4f8cb in StandbyReleaseLockTree ()
#2  0x0000558812d718ee in ?? ()
#3  0x0000558812d75520 in xact_redo ()
#4  0x0000558812d7f713 in StartupXLOG ()
#5  0x0000558812f0e262 in StartupProcessMain ()
#6  0x0000558812d8d4ea in AuxiliaryProcessMain ()
#7  0x0000558812f0b2e9 in ?? ()
#8  0x0000558812f0dae7 in PostmasterMain ()
#9  0x0000558812d0c402 in main ()

--
-- Christophe Pettus
   xof@thebuild.com
On Oct 9, 2017, at 12:18, Christophe Pettus <xof@thebuild.com> wrote:
>
> #0  0x0000558812f4f1da in ?? ()
> #1  0x0000558812f4f8cb in StandbyReleaseLockTree ()
> #2  0x0000558812d718ee in ?? ()
> #3  0x0000558812d75520 in xact_redo ()
> #4  0x0000558812d7f713 in StartupXLOG ()
> #5  0x0000558812f0e262 in StartupProcessMain ()
> #6  0x0000558812d8d4ea in AuxiliaryProcessMain ()
> #7  0x0000558812f0b2e9 in ?? ()
> #8  0x0000558812f0dae7 in PostmasterMain ()
> #9  0x0000558812d0c402 in main ()

The other observation is that the startup process is holding a *lot* of locks (pid 5882 is the startup process):

# select mode, count(*) from pg_locks where pid=5882 group by mode;
        mode         | count
---------------------+-------
 ExclusiveLock       |     1
 AccessExclusiveLock |  7133
(2 rows)

--
-- Christophe Pettus
   xof@thebuild.com
Christophe Pettus <xof@thebuild.com> writes:
> The other observation is that the startup process is holding a *lot* of locks:

Hmm. Is it possible that the process is replaying the abort of a transaction with a lot of subtransactions? It seems like maybe you could be getting into an O(N^2) situation, though for N in the low thousands it's hard to see that loop taking so long that you'd think things were stuck.

> # select mode, count(*) from pg_locks where pid=5882 group by mode;
> AccessExclusiveLock | 7133

Is that number changing at all?

			regards, tom lane
> On Oct 9, 2017, at 13:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hmm. Is it possible that the process is replaying the abort of a
> transaction with a lot of subtransactions?

That's possible, although we're now talking about an hours-long delay at this point.

> Is that number changing at all?

Increasing:

# select mode, count(*) from pg_locks where pid=5882 group by mode;
        mode         | count
---------------------+-------
 ExclusiveLock       |     1
 AccessExclusiveLock |  8810
(2 rows)

--
-- Christophe Pettus
   xof@thebuild.com
Christophe Pettus <xof@thebuild.com> writes:
>> On Oct 9, 2017, at 13:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Is that number changing at all?

> Increasing:
> AccessExclusiveLock | 8810

Oh, that's really interesting. So it's not *just* releasing locks but also acquiring them, which says that it is making progress of some sort.

My bet is that the source server did something that's provoking O(N^2) behavior in the standby server's lock management. It's hard to say exactly what, but I'm wondering about something like a plpgsql function taking an AccessExclusiveLock inside a loop that repeatedly traps an exception. Can you correlate where the standby is stuck with what was happening on the source?

			regards, tom lane
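The O(N^2) suspicion can be sketched with a toy model (plain Python, not the actual PostgreSQL source; all names here are illustrative, not real backend symbols): if the standby keeps every replayed AccessExclusiveLock in one flat list and each release pass rescans the whole list, the total work grows quadratically with the number of subtransactions.

```python
# Toy model of quadratic standby lock release -- illustrative only;
# these are NOT real PostgreSQL data structures or function names.

def release_locks_by_scan(lock_list, xid):
    """Drop all locks owned by xid by scanning the entire list (O(len))."""
    comparisons = 0
    kept = []
    for owner, relid in lock_list:
        comparisons += 1
        if owner != xid:
            kept.append((owner, relid))
    return kept, comparisons

def replay_aborts(n_subxacts):
    """Replay the aborts of n subtransactions, each holding one AE lock."""
    locks = [(xid, 10000 + xid) for xid in range(n_subxacts)]
    total = 0
    for xid in range(n_subxacts):
        locks, c = release_locks_by_scan(locks, xid)
        total += c
    return total

# Total comparisons come to n + (n-1) + ... + 1 = n*(n+1)/2, i.e. O(N^2).
assert replay_aborts(100) == 100 * 101 // 2
assert replay_aborts(1000) == 1000 * 1001 // 2
```

Even at N in the low thousands this is only millions of cheap operations, which is consistent with the puzzlement in the thread that O(N^2) alone shouldn't look like an hours-long hang.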
> On Oct 9, 2017, at 13:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Oh, that's really interesting. So it's not *just* releasing locks but
> also acquiring them, which says that it is making progress of some sort.

It seems to have leveled out now, and is still grinding away.

> Can you correlate where the standby is stuck with what
> was happening on the source?

There are definitely some mysterious functions being called, and I'll dig into those.

--
-- Christophe Pettus
   xof@thebuild.com
> On Oct 9, 2017, at 13:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> My bet is that the source server did something that's provoking O(N^2)
> behavior in the standby server's lock management. It's hard to say
> exactly what, but I'm wondering about something like a plpgsql function
> taking an AccessExclusiveLock inside a loop that repeatedly traps an
> exception. Can you correlate where the standby is stuck with what
> was happening on the source?

Interestingly, the OIDs for the relations on which the locks on the secondary are held aren't present in pg_class, and they're clustered together. Could a large number of temporary table creations that are being undone by an abort cause this?

--
-- Christophe Pettus
   xof@thebuild.com
Christophe Pettus <xof@thebuild.com> writes:
>> On Oct 9, 2017, at 13:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> My bet is that the source server did something that's provoking O(N^2)
>> behavior in the standby server's lock management. It's hard to say
>> exactly what, but I'm wondering about something like a plpgsql function
>> taking an AccessExclusiveLock inside a loop that repeatedly traps an
>> exception. Can you correlate where the standby is stuck with what
>> was happening on the source?

> Interestingly, the OIDs for the relations on which the locks on the
> secondary are held aren't present in pg_class, and they're clustered
> together. Could a large number of temporary table creations that are
> being undone by an abort cause this?

Hmm. Creating or dropping a temp table does take AccessExclusiveLock, just as it does for a non-temp table. In principle we'd not have to transmit those locks to standbys, but I doubt that the WAL code has enough knowledge to filter them out. So a lot of temp tables and a lot of separate subtransactions could be a nasty combination.

			regards, tom lane
> On Oct 9, 2017, at 14:29, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hmm. Creating or dropping a temp table does take AccessExclusiveLock,
> just as it does for a non-temp table. In principle we'd not have to
> transmit those locks to standbys, but I doubt that the WAL code has
> enough knowledge to filter them out. So a lot of temp tables and
> a lot of separate subtransactions could be a nasty combination.

The problem indeed appears to be a very large number of subtransactions, each one creating a temp table, inside a single transaction. It's made worse by one of those transactions finally getting replayed on the secondary, only to have another one come in right behind it...

--
-- Christophe Pettus
   xof@thebuild.com
On Mon, Oct 9, 2017 at 6:12 PM, Christophe Pettus <xof@thebuild.com> wrote:
>
>> On Oct 9, 2017, at 14:29, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Hmm. Creating or dropping a temp table does take AccessExclusiveLock,
>> just as it does for a non-temp table. In principle we'd not have to
>> transmit those locks to standbys, but I doubt that the WAL code has
>> enough knowledge to filter them out. So a lot of temp tables and
>> a lot of separate subtransactions could be a nasty combination.
>
> The problem indeed appears to be a very large number of subtransactions,
> each one creating a temp table, inside a single transaction. It's made
> worse by one of those transactions finally getting replayed on the
> secondary, only to have another one come in right behind it...

FYI, this is pathological coding; it can cause other nastiness such as exhausting shared memory. Consider refactoring the code to use a permanent table (if access is concurrent, you can do tricks like using 'pid' in the key to subdivide the data).

merlin
Christophe Pettus <xof@thebuild.com> writes:
> The problem indeed appears to be a very large number of subtransactions,
> each one creating a temp table, inside a single transaction. It's made
> worse by one of those transactions finally getting replayed on the
> secondary, only to have another one come in right behind it...

Hmm, I tried to reproduce this and could not. I experimented with various permutations of this:

create or replace function goslow(n int) returns void language plpgsql as
$$
begin
  for i in 1..n loop
    begin
      execute 'create temp table myfoo' || i || '(f1 int)';
      perform i/0;
    exception when division_by_zero then null;
    end;
  end loop;
  perform n/0;
end;
$$;

and did not see any untoward behavior, at least not till I got to enough temp tables to overrun the master's shared lock table, and even then it cleaned up fine. At no point was the standby process consuming anywhere near as much CPU as the master's backend.

What am I missing to reproduce the problem?

			regards, tom lane
On Mon, Oct 9, 2017 at 5:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> and did not see any untoward behavior, at least not till I got to enough
> temp tables to overrun the master's shared lock table, and even then it
> cleaned up fine. At no point was the standby process consuming anywhere
> near as much CPU as the master's backend.
>
> What am I missing to reproduce the problem?

Just a guess, but do you disable autovacuum on your dev machine? (I know I do.) It's possible that this is relevant:

https://postgr.es/m/CAB-EU3RawZx8-OzMfvswFf6z+Y7GOZf03TZ=bez+PbQX+A4M7Q@mail.gmail.com

--
Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes:
> Just a guess, but do you disable autovacuum on your dev machine? (I know I do.)

Nope.

			regards, tom lane
On Mon, Oct 9, 2017 at 12:08 PM, Christophe Pettus <xof@thebuild.com> wrote:
> Suggestions on further diagnosis?

What's the hot_standby_feedback setting? How about max_standby_archive_delay / max_standby_streaming_delay?

--
Peter Geoghegan
> On Oct 9, 2017, at 18:21, Peter Geoghegan <pg@bowt.ie> wrote:
> What's the hot_standby_feedback setting? How about
> max_standby_archive_delay/max_standby_streaming_delay?

On, 5m, 5m.

--
-- Christophe Pettus
   xof@thebuild.com
> On Oct 9, 2017, at 17:30, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> What am I missing to reproduce the problem?

Not sure. The actual client behavior here is a bit cryptic (not our code, incomplete logs). They might be creating a savepoint before each temp table creation, without a release, but I would expect that to exhaust something else first.

--
-- Christophe Pettus
   xof@thebuild.com
Tom Lane wrote:
> Christophe Pettus <xof@thebuild.com> writes:
> > The problem indeed appears to be a very large number of subtransactions,
> > each one creating a temp table, inside a single transaction. It's made
> > worse by one of those transactions finally getting replayed on the
> > secondary, only to have another one come in right behind it...
>
> Hmm, I tried to reproduce this and could not. I experimented with
> various permutations of this:

This problem is probably related to commit 9b013dc238c, which AFAICS is only in pg10, not 9.5.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Tom Lane wrote:
>> Hmm, I tried to reproduce this and could not. I experimented with
>> various permutations of this:

> This problem is probably related to commit 9b013dc238c, which AFAICS is
> only in pg10, not 9.5.

You're right, I was testing on HEAD, so that patch might've obscured the problem. But the code looks like it could still be O(N^2) in some cases. Will look again later.

			regards, tom lane
> On Oct 10, 2017, at 08:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> You're right, I was testing on HEAD, so that patch might've obscured
> the problem. But the code looks like it could still be O(N^2) in
> some cases. Will look again later.

I was able to reproduce this on 9.5.9 with the following:

DO $$
DECLARE
    i int := 1;
BEGIN
    FOR i IN 1..12000 LOOP
        BEGIN
            PERFORM f();
            i := i / 0;
        EXCEPTION WHEN division_by_zero THEN
            NULL;
        END;
    END LOOP;
END;
$$ language plpgsql;

where f() is:

CREATE OR REPLACE FUNCTION f() RETURNS VOID AS $$
BEGIN
    CREATE TEMPORARY TABLE test_table ON COMMIT DROP AS
        SELECT i FROM generate_series(1, 100) i;
END;
$$ language plpgsql;

A couple of observations:

-- In this version, I couldn't do a select * from pg_locks on the secondary without getting an out-of-shared-memory error.
-- If I increased max_locks_per_transaction to 15000, the problem didn't occur, even if I bumped up the number of iterations in the first to 20000.

--
-- Christophe Pettus
   xof@thebuild.com
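The out-of-shared-memory error here lines up with the documented sizing of the shared lock table: roughly max_locks_per_transaction * (max_connections + max_prepared_transactions) lockable objects. A quick arithmetic sketch (assuming stock defaults of 64 and 100; the server's actual allocation includes some extra slop):

```python
# Approximate shared lock table capacity per the documented formula.
# Assumed defaults: max_locks_per_transaction = 64, max_connections = 100,
# max_prepared_transactions = 0. The real allocation adds some headroom.

def lock_table_slots(max_locks_per_transaction=64, max_connections=100,
                     max_prepared_transactions=0):
    return max_locks_per_transaction * (max_connections +
                                        max_prepared_transactions)

default_slots = lock_table_slots()       # 64 * 100 = 6400
raised_slots = lock_table_slots(15000)   # after raising the setting

# 12000 AccessExclusiveLocks held at once overruns the default-sized table,
# but fits easily once max_locks_per_transaction is raised to 15000:
assert default_slots < 12000 < raised_slots
```

That is consistent with both observations: with defaults there are only a few thousand slots to hold the 12000 concurrent AccessExclusiveLocks the replay takes, and raising max_locks_per_transaction moves the ceiling far above the workload.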
Christophe Pettus <xof@thebuild.com> writes:
> I was able to reproduce this on 9.5.9 with the following:

Hmm ... so I still can't reproduce the specific symptoms Christophe reports.

What I see is that, given this particular test case, the backend process on the master never holds more than a few locks at a time. Each time we abort a subtransaction, the AE lock it was holding on the temp table it created gets dropped. However ... on the standby server, pre v10, the replay process attempts to take all 12000 of those AE locks at once. This is not a great plan.

On 9.5, for me, as soon as we're out of shared memory ResolveRecoveryConflictWithLock will go into an infinite loop. And AFAICS it *is* infinite; it doesn't look to me like it's making any progress. This is pretty easy to diagnose though because it spews "out of shared memory" WARNING messages to the postmaster log at an astonishing rate.

9.6 hits the OOM condition as well, but manages to get out of it somehow. I'm not very clear how, and the log trace doesn't look like it's real clean: after a bunch of these

WARNING:  out of shared memory
CONTEXT:  xlog redo at 0/C1098AC0 for Standby/LOCK: xid 134024 db 423347 rel 524106
WARNING:  out of shared memory
CONTEXT:  xlog redo at 0/C10A97E0 for Standby/LOCK: xid 134024 db 423347 rel 524151
WARNING:  out of shared memory
CONTEXT:  xlog redo at 0/C10B36B0 for Standby/LOCK: xid 134024 db 423347 rel 524181
WARNING:  out of shared memory
CONTEXT:  xlog redo at 0/C10BD780 for Standby/LOCK: xid 134024 db 423347 rel 524211

you get a bunch of these

WARNING:  you don't own a lock of type AccessExclusiveLock
CONTEXT:  xlog redo at 0/C13A79B0 for Transaction/COMMIT: 2017-10-10 15:05:56.615721-04
LOG:  RecoveryLockList contains entry for lock no longer recorded by lock manager: xid 134024 database 423347 relation 526185
CONTEXT:  xlog redo at 0/C13A79B0 for Transaction/COMMIT: 2017-10-10 15:05:56.615721-04
WARNING:  you don't own a lock of type AccessExclusiveLock
CONTEXT:  xlog redo at 0/C13A79B0 for Transaction/COMMIT: 2017-10-10 15:05:56.615721-04
LOG:  RecoveryLockList contains entry for lock no longer recorded by lock manager: xid 134024 database 423347 relation 526188
CONTEXT:  xlog redo at 0/C13A79B0 for Transaction/COMMIT: 2017-10-10 15:05:56.615721-04
WARNING:  you don't own a lock of type AccessExclusiveLock
CONTEXT:  xlog redo at 0/C13A79B0 for Transaction/COMMIT: 2017-10-10 15:05:56.615721-04
LOG:  RecoveryLockList contains entry for lock no longer recorded by lock manager: xid 134024 database 423347 relation 526191
CONTEXT:  xlog redo at 0/C13A79B0 for Transaction/COMMIT: 2017-10-10 15:05:56.615721-04

The important point though is that "a bunch" is a finite number, whereas 9.5 seems to be just stuck. I'm not sure how Christophe's server managed to continue to make progress. It looks like the 9.6-era patch 37c54863c must have been responsible for that behavioral change. There's no indication in the commit message or the comments that anyone had specifically considered the OOM scenario, so I think it's just accidental that it's better.

v10 and HEAD avoid the problem because the standby server doesn't take locks (any at all, AFAICS). I suppose this must be a consequence of commit 9b013dc238c, though I'm not sure exactly how.

Anyway, it's pretty scary that it's so easy to run the replay process out of shared memory pre-v10. I wonder if we should consider backpatching that fix. Any situation where the replay process takes more locks concurrently than were ever held on the master is surely very bad news.

A marginally lesser concern is that the replay process does need to have robust behavior in the face of locktable OOM. AFAICS whatever it is doing now is just accidental, and I'm not sure it's correct. "Doesn't get into an infinite loop" is not a sufficiently high bar.

And I'm still wondering exactly what Christophe actually saw ...

			regards, tom lane
On 10 October 2017 at 21:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> What I see is that, given this particular test case, the backend
> process on the master never holds more than a few locks at a time.
> Each time we abort a subtransaction, the AE lock it was holding
> on the temp table it created gets dropped. However ... on the
> standby server, pre v10, the replay process attempts to take all
> 12000 of those AE locks at once. This is not a great plan.

Standby doesn't take locks "at once"; they are added just as they arrive. The locks are held by topxid, so not released at subxid abort, by design, so they are held concurrently.

> v10 and HEAD avoid the problem because the standby server doesn't
> take locks (any at all, AFAICS). I suppose this must be a
> consequence of commit 9b013dc238c, though I'm not sure exactly how.

Locks are still taken, but in 9b013dc238c we just avoid trying to release locks when transactions don't have any.

> Anyway, it's pretty scary that it's so easy to run the replay process
> out of shared memory pre-v10. I wonder if we should consider
> backpatching that fix. Any situation where the replay process takes
> more locks concurrently than were ever held on the master is surely
> very bad news.

v10 improves on this specific point because we perform lock release at subxid abort. Various cases have been reported over time and this has been improving steadily in each release.

It isn't "easy" to run the replay process out of memory because clearly that doesn't happen much, but yes there are some pessimal use cases that don't work well. The use case described seems incredibly unreal and certainly amenable to being rewritten.

Backpatching some of those fixes is quite risky, IMHO.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> On Oct 10, 2017, at 23:54, Simon Riggs <simon@2ndQuadrant.com> wrote:
>
> The use case described seems incredibly
> unreal and certainly amenable to being rewritten.

While it's certainly true that this was an extreme case, it was a real-life production situation. The concern here is that in the actual production situation, the only symptom was that the startup process just stopped. There were no log messages or any other indication of what was going wrong.

--
-- Christophe Pettus
   xof@thebuild.com
On 11 October 2017 at 08:09, Christophe Pettus <xof@thebuild.com> wrote:
>
>> On Oct 10, 2017, at 23:54, Simon Riggs <simon@2ndQuadrant.com> wrote:
>>
>> The use case described seems incredibly
>> unreal and certainly amenable to being rewritten.
>
> While it's certainly true that this was an extreme case, it was a
> real-life production situation. The concern here is that in the actual
> production situation, the only symptom was that the startup process just
> stopped. There were no log messages or any other indication of what was
> going wrong.

Which indicates it was making progress, just slowly.

Tom says "This is pretty easy to diagnose though because it spews "out of shared memory" WARNING messages to the postmaster log at an astonishing rate".

These don't seem to match.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Simon Riggs <simon@2ndquadrant.com> writes:
> On 11 October 2017 at 08:09, Christophe Pettus <xof@thebuild.com> wrote:
>> While it's certainly true that this was an extreme case, it was a
>> real-life production situation. The concern here is that in the actual
>> production situation, the only symptom was that the startup process just
>> stopped. There were no log messages or any other indication of what was
>> going wrong.

> Which indicates it was making progress, just slowly.
> Tom says "This is pretty easy to diagnose though
> because it spews "out of shared memory" WARNING messages to the
> postmaster log at an astonishing rate"
> These don't seem to match.

Yeah. I'm still suspicious that Christophe saw some other misbehavior than the one I found. We know his server was dealing with < 10K locks, which doesn't seem like enough to cause any obvious problem from a mere O(N^2) behavior.

			regards, tom lane