Обсуждение: Cluster "stuck" in "not accepting commands to avoid wraparound data loss"

Поиск
Список
Период
Сортировка

Cluster "stuck" in "not accepting commands to avoid wraparound data loss"

От
Andres Freund
Дата:
Hi,

I recently started a pgbench benchmark (to evaluate a piece of hardware,
not postgres) with master. Unfortunately, by accident, I started
postgres in a shell, not screen like pgbench.

Just logged back in and saw:
client 71 aborted in state 8: ERROR:  database is not accepting commands to avoid wraparound data loss in database
"postgres"
HINT:  Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions.
transaction type: TPC-B (sort of)
scaling factor: 300
query mode: prepared
number of clients: 97
number of threads: 97
duration: 300000 s
number of transactions actually processed: 2566862424
latency average: 3.214 ms
latency stddev: 7.336 ms
tps = 30169.374133 (including connections establishing)
tps = 30169.378406 (excluding connections establishing)

Hm. Bad news. We apparently didn't keep up vacuuming. But worse news is
that even now, days later, autovacuum hasn't progressed:
postgres=# select txid_current();
ERROR:  database is not accepting commands to avoid wraparound data loss in database "postgres"
HINT:  Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions.

Looking at datfrozenxid:
postgres=# select datname, datfrozenxid, age(datfrozenxid) FROM pg_database ; datname  | datfrozenxid |    age    
-----------+--------------+-----------template1 |   3357685367 |         0template0 |   3357685367 |         0postgres
|  3159867733 | 197817634
 
(3 rows)
reveals that the launcher doesn't do squat because it doesn't think it
needs to do anything.

(gdb) p *ShmemVariableCache
$3 = {nextOid = 24576, oidCount = 0, nextXid = 3357685367, oldestXid = 1211201715, xidVacLimit = 1411201715,
xidWarnLimit= 3347685362,  xidStopLimit = 3357685362, xidWrapLimit = 3358685362, oldestXidDB = 12380, oldestCommitTs =
0,newestCommitTs = 0,  latestCompletedXid = 3357685366}
 

'oldestXid' shows the problem: We're indeed pretty short before a
wraparound.


The question is, how did we get here? My current working theory, not
having any logs available, is that two autovacuum workers ran at the
same time. Both concurrently entered vac_update_datfrozenxid(). As both
haven't committed at that time, they can't see each other's updates to
datfrozenxid. And thus vac_truncate_clog(), called by both, won't see a
changed horizon.

Does that make sense?

If so, what can we do about it? After chatting a bit with Alvaro  I can
see two avenues:
1) Hold a self-conflicting lock on pg_database in vac_truncate_clog(),  and don't release the lock until the
transactionend. As the  pg_database scan uses a fresh snapshot, that ought to guarantee  progress.
 
2) Do something like vac_truncate_clog() in the autovacuum launcher,  once every idle cycle or so. That'd then unwedge
us.

Neither of these sound particularly pretty.


Additionally something else has to be going on here - why on earth
wasn't a autovacuum started earlier? The above kinda looks like the
vacuums on template* ran at a very similar time, and only pretty
recently.


I left the cluster hanging in it's stuck state for now, so we have a
chance to continue investigating.

Greetings,

Andres Freund



Re: Cluster "stuck" in "not accepting commands to avoid wraparound data loss"

От
Robert Haas
Дата:
On Thu, Dec 10, 2015 at 4:55 PM, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> I recently started a pgbench benchmark (to evaluate a piece of hardware,
> not postgres) with master. Unfortunately, by accident, I started
> postgres in a shell, not screen like pgbench.
>
> Just logged back in and saw:
> client 71 aborted in state 8: ERROR:  database is not accepting commands to avoid wraparound data loss in database
"postgres"
> HINT:  Stop the postmaster and vacuum that database in single-user mode.
> You might also need to commit or roll back old prepared transactions.
> transaction type: TPC-B (sort of)
> scaling factor: 300
> query mode: prepared
> number of clients: 97
> number of threads: 97
> duration: 300000 s
> number of transactions actually processed: 2566862424
> latency average: 3.214 ms
> latency stddev: 7.336 ms
> tps = 30169.374133 (including connections establishing)
> tps = 30169.378406 (excluding connections establishing)
>
> Hm. Bad news. We apparently didn't keep up vacuuming. But worse news is
> that even now, days later, autovacuum hasn't progressed:
> postgres=# select txid_current();
> ERROR:  database is not accepting commands to avoid wraparound data loss in database "postgres"
> HINT:  Stop the postmaster and vacuum that database in single-user mode.
> You might also need to commit or roll back old prepared transactions.
>
> Looking at datfrozenxid:
> postgres=# select datname, datfrozenxid, age(datfrozenxid) FROM pg_database ;
>   datname  | datfrozenxid |    age
> -----------+--------------+-----------
>  template1 |   3357685367 |         0
>  template0 |   3357685367 |         0
>  postgres  |   3159867733 | 197817634
> (3 rows)
> reveals that the launcher doesn't do squat because it doesn't think it
> needs to do anything.
>
> (gdb) p *ShmemVariableCache
> $3 = {nextOid = 24576, oidCount = 0, nextXid = 3357685367, oldestXid = 1211201715, xidVacLimit = 1411201715,
xidWarnLimit= 3347685362,
 
>   xidStopLimit = 3357685362, xidWrapLimit = 3358685362, oldestXidDB = 12380, oldestCommitTs = 0, newestCommitTs = 0,
>   latestCompletedXid = 3357685366}
>
> 'oldestXid' shows the problem: We're indeed pretty short before a
> wraparound.

Is the postmaster in a "stopped" state (T)?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Cluster "stuck" in "not accepting commands to avoid wraparound data loss"

От
Andres Freund
Дата:
On 2015-12-10 17:13:12 -0500, Robert Haas wrote:
> Is the postmaster in a "stopped" state (T)?

No, running normally. New connections are possible without problems.



Re: Cluster "stuck" in "not accepting commands to avoid wraparound data loss"

От
Jeff Janes
Дата:
On Thu, Dec 10, 2015 at 1:55 PM, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> I recently started a pgbench benchmark (to evaluate a piece of hardware,
> not postgres) with master. Unfortunately, by accident, I started
> postgres in a shell, not screen like pgbench.
>
> Just logged back in and saw:
> client 71 aborted in state 8: ERROR:  database is not accepting commands to avoid wraparound data loss in database
"postgres"
> HINT:  Stop the postmaster and vacuum that database in single-user mode.
> You might also need to commit or roll back old prepared transactions.
> transaction type: TPC-B (sort of)
> scaling factor: 300
> query mode: prepared
> number of clients: 97
> number of threads: 97
> duration: 300000 s
> number of transactions actually processed: 2566862424
> latency average: 3.214 ms
> latency stddev: 7.336 ms
> tps = 30169.374133 (including connections establishing)
> tps = 30169.378406 (excluding connections establishing)
>
> Hm. Bad news. We apparently didn't keep up vacuuming. But worse news is
> that even now, days later, autovacuum hasn't progressed:
> postgres=# select txid_current();
> ERROR:  database is not accepting commands to avoid wraparound data loss in database "postgres"
> HINT:  Stop the postmaster and vacuum that database in single-user mode.
> You might also need to commit or roll back old prepared transactions.

This is still in regular mode, correct?

I don't think this has ever worked.  Vacuum needs to start a
transaction in order to record its update of datfrozenxid and
relfrozenxid to the catalogs (or at least, starts one for something).
Once you are within 1,000,000 of wraparound, you have to do the vacuum
in single-user mode, you can no longer just wait for autovacuum to do
its thing.  Otherwise the vacuum will do all the work of the vacuum,
but then fail to clear the error condition.



>
> Looking at datfrozenxid:
> postgres=# select datname, datfrozenxid, age(datfrozenxid) FROM pg_database ;
>   datname  | datfrozenxid |    age
> -----------+--------------+-----------
>  template1 |   3357685367 |         0
>  template0 |   3357685367 |         0
>  postgres  |   3159867733 | 197817634
> (3 rows)
> reveals that the launcher doesn't do squat because it doesn't think it
> needs to do anything.
>
> (gdb) p *ShmemVariableCache
> $3 = {nextOid = 24576, oidCount = 0, nextXid = 3357685367, oldestXid = 1211201715, xidVacLimit = 1411201715,
xidWarnLimit= 3347685362,
 
>   xidStopLimit = 3357685362, xidWrapLimit = 3358685362, oldestXidDB = 12380, oldestCommitTs = 0, newestCommitTs = 0,
>   latestCompletedXid = 3357685366}
>
> 'oldestXid' shows the problem: We're indeed pretty short before a
> wraparound.
>
>
> The question is, how did we get here?

Could the database have undergone a crash and recovery cycle?

Since changes to datfrozenxid are WAL logged at the time they occur,
but the supposedly-synchronous change to ShmemVariableCache is not WAL
logged until the next checkpoint, a well timed crash can leave you in
the state where the system is in a tizzy about wraparound but each
database says "Nope, not me".

Since with default settings each database/table gets frozen 10 times
per real wrap-around, this is usually not going to be a problem as
having 10 consecutive well timed crashes is very unlikely.

But if you increase autovacuum_freeze_max_age a lot, or if the freeze
scan takes so long that there is only time to complete one and a
fraction of them during a single real wrap-around interval, then just
a single crash can you leave you destined for trouble.

Cheers,

Jeff



Re: Cluster "stuck" in "not accepting commands to avoid wraparound data loss"

От
Robert Haas
Дата:
On Fri, Dec 11, 2015 at 1:08 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> Since changes to datfrozenxid are WAL logged at the time they occur,
> but the supposedly-synchronous change to ShmemVariableCache is not WAL
> logged until the next checkpoint, a well timed crash can leave you in
> the state where the system is in a tizzy about wraparound but each
> database says "Nope, not me".

ShmemVariableCache is an in-memory data structure, so it's going to
get blown away and rebuilt on a crash.  But I guess it gets rebuild
from the contents of the most recent checkpoint record, so that
doesn't actually help.  However, I wonder if it would be safe to for
the autovacuum launcher to calculate an updated value and call
SetTransactionIdLimit() to update ShmemVariableCache.

But I'm somewhat confused what this has to do with Andres's report.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Cluster "stuck" in "not accepting commands to avoid wraparound data loss"

От
Andres Freund
Дата:
On 2015-12-11 10:08:32 -0800, Jeff Janes wrote:
> This is still in regular mode, correct?

Yes.

> I don't think this has ever worked.  Vacuum needs to start a
> transaction in order to record its update of datfrozenxid and
> relfrozenxid to the catalogs (or at least, starts one for something).
> Once you are within 1,000,000 of wraparound, you have to do the vacuum
> in single-user mode, you can no longer just wait for autovacuum to do
> its thing.  Otherwise the vacuum will do all the work of the vacuum,
> but then fail to clear the error condition.

But since we don't have, afaik, any code to handle that we should still
be starting workers to do the truncation. Which isn't happening here.

But more importantly, it *should* be impossible to have that combination
of oldestXid values with the datfrozenxid values.

> Could the database have undergone a crash and recovery cycle?

Could, but I don't think it has before the problem occurred. The first
pgbench failure was about not being able to assign xids anymore.

Greetings,

Andres Freund



Fwd: Cluster "stuck" in "not accepting commands to avoid wraparound data loss"

От
Jeff Janes
Дата:
Sorry, accidentally failed to include the list originally, here it is
for the list:

On Dec 16, 2015 9:52 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:
>
> On Fri, Dec 11, 2015 at 1:08 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> > Since changes to datfrozenxid are WAL logged at the time they occur,
> > but the supposedly-synchronous change to ShmemVariableCache is not WAL
> > logged until the next checkpoint, a well timed crash can leave you in
> > the state where the system is in a tizzy about wraparound but each
> > database says "Nope, not me".
>
> ShmemVariableCache is an in-memory data structure, so it's going to
> get blown away and rebuilt on a crash.  But I guess it gets rebuild
> from the contents of the most recent checkpoint record, so that
> doesn't actually help.  However, I wonder if it would be safe to for
> the autovacuum launcher to calculate an updated value and call
> SetTransactionIdLimit() to update ShmemVariableCache.

I was wondering if that should happen either at the end of crash
recovery (but I suppose you can't poll pg_database yet at that
point?), or immediately before throwing the "database is not accepting
commands to avoid wraparound data loss" error.

At which point would it make sense for the launcher do it?  I guess
just after it was started up under PMSIGNAL_START_AUTOVAC_LAUNCHER
conditions?

> But I'm somewhat confused what this has to do with Andres's report.

Doesn't it explain the exact situation he is in, where the oldest
database is 200 million, but the cluster as a whole is 2 billion?

Cheers,

Jeff



Re: Fwd: Cluster "stuck" in "not accepting commands to avoid wraparound data loss"

От
Andres Freund
Дата:
On 2015-12-17 09:04:25 -0800, Jeff Janes wrote:
> > But I'm somewhat confused what this has to do with Andres's report.
> 
> Doesn't it explain the exact situation he is in, where the oldest
> database is 200 million, but the cluster as a whole is 2 billion?

There were no crashes, so no, I don't think so.



Re: Cluster "stuck" in "not accepting commands to avoid wraparound data loss"

От
Jeff Janes
Дата:
On Thu, Dec 10, 2015 at 1:55 PM, Andres Freund <andres@anarazel.de> wrote:

>
> Looking at datfrozenxid:
> postgres=# select datname, datfrozenxid, age(datfrozenxid) FROM pg_database ;
>   datname  | datfrozenxid |    age
> -----------+--------------+-----------
>  template1 |   3357685367 |         0
>  template0 |   3357685367 |         0
>  postgres  |   3159867733 | 197817634
> (3 rows)
> reveals that the launcher doesn't do squat because it doesn't think it
> needs to do anything.
>
> (gdb) p *ShmemVariableCache
> $3 = {nextOid = 24576, oidCount = 0, nextXid = 3357685367, oldestXid = 1211201715, xidVacLimit = 1411201715,
xidWarnLimit= 3347685362,
 
>   xidStopLimit = 3357685362, xidWrapLimit = 3358685362, oldestXidDB = 12380, oldestCommitTs = 0, newestCommitTs = 0,
>   latestCompletedXid = 3357685366}

Do we know how template0 and template1 get frozen with xid which were
5 past the xidStopLimit?  Is that part of the mystery here, or is that
normal?

Cheers,

Jeff



Re: Fwd: Cluster "stuck" in "not accepting commands to avoid wraparound data loss"

От
Robert Haas
Дата:
On Thu, Dec 17, 2015 at 12:14 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2015-12-17 09:04:25 -0800, Jeff Janes wrote:
>> > But I'm somewhat confused what this has to do with Andres's report.
>>
>> Doesn't it explain the exact situation he is in, where the oldest
>> database is 200 million, but the cluster as a whole is 2 billion?
>
> There were no crashes, so no, I don't think so.

Backing up a step, do we think that the fact that this was running in
a shell rather than a screen is relevant somehow?  Or did something
happen to this particular cluster totally unrelated to that?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Fwd: Cluster "stuck" in "not accepting commands to avoid wraparound data loss"

От
Andres Freund
Дата:
On 2015-12-17 13:08:15 -0500, Robert Haas wrote:
> On Thu, Dec 17, 2015 at 12:14 PM, Andres Freund <andres@anarazel.de> wrote:
> > On 2015-12-17 09:04:25 -0800, Jeff Janes wrote:
> >> > But I'm somewhat confused what this has to do with Andres's report.
> >>
> >> Doesn't it explain the exact situation he is in, where the oldest
> >> database is 200 million, but the cluster as a whole is 2 billion?
> >
> > There were no crashes, so no, I don't think so.
> 
> Backing up a step, do we think that the fact that this was running in
> a shell rather than a screen is relevant somehow?  Or did something
> happen to this particular cluster totally unrelated to that?

I reran the whole thing on a separate, but very similar, VM. Just
checked. Same thing happened. This time I have log files and
everything. No time to investigate right now, but it's reproducible if
you accept running tests for a week or so.



Re: Fwd: Cluster "stuck" in "not accepting commands to avoid wraparound data loss"

От
Robert Haas
Дата:
On Thu, Dec 17, 2015 at 1:20 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2015-12-17 13:08:15 -0500, Robert Haas wrote:
>> On Thu, Dec 17, 2015 at 12:14 PM, Andres Freund <andres@anarazel.de> wrote:
>> > On 2015-12-17 09:04:25 -0800, Jeff Janes wrote:
>> >> > But I'm somewhat confused what this has to do with Andres's report.
>> >>
>> >> Doesn't it explain the exact situation he is in, where the oldest
>> >> database is 200 million, but the cluster as a whole is 2 billion?
>> >
>> > There were no crashes, so no, I don't think so.
>>
>> Backing up a step, do we think that the fact that this was running in
>> a shell rather than a screen is relevant somehow?  Or did something
>> happen to this particular cluster totally unrelated to that?
>
> I reran the whole thing on a separate, but very similar, VM. Just
> checked. Same thing happened. This time I have log files and
> everything. No time to investigate right now, but it's reproducible if
> you accept running tests for a week or so.

I don't think I'm going to speculate further until you have time to
investigate more.  It seems clear that autovacuum is going wrong
somehow, but it's extremely unclear why.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company