Обсуждение: BUG #4565: nextval not updated during wal replication, leading to pk violations

Поиск
Список
Период
Сортировка

BUG #4565: nextval not updated during wal replication, leading to pk violations

От
"Marc Schablewski"
Дата:
The following bug has been logged online:

Bug reference:      4565
Logged by:          Marc Schablewski
Email address:      ms@clickware.de
PostgreSQL version: 8.3.5
Operating system:   Debian, 2.6.24-19 kernel
Description:        nextval not updated during wal replication, leading to
pk violations
Details:

Hi!

We are using "Continuous Archiving" of WAL to keep a warm standby database
server. We recently switch over to that backup database for testing purpose.
We first took an online backup of the master database, let the WAL shipping
run for some days and finally started the backup for normal use.

A job tried to insert some records into a few tables and complained about a
PK violation. PK are of type bigint. The column is populated by a sequence
and a "default nextval()" on the PK column. We found that the sequence's
currval was lower than the maximum value in the table and that this value
was already present. Further investigation showed us that the last records
in the table were inserted on the former master server while taking the
initial online backup for the replication. It seems that the records got
replicated but not the currval/nextval of the sequence. When running "select
nextval()" on the backup database it returned the PK value of the first
record inserted during that last run on the former master server.

I couldn't reproduce the problem with two different servers and a simple
database containing one table. Right now, I don't have any idea how to
narrow down the problem or what to check next. So any hint would be
helpful.

Marc

Re: BUG #4565: nextval not updated during wal replication, leading to pk violations

От
Marc Schablewski
Дата:
FYI: We found another anomaly in that database (duplicated record with a
unique index on a column (non PK)) and so we did some more research. As
before, the record got replicated, but this time, the index entry was
not. Later, a job tried to check if that record was already there by
querying the column (thus, using the index) and couldn't find it so it
was inserted again.

We first thought it could be a WAL that wasn't restored. If I understand
pg_backup_start() and _stop() right, the first WAL that should have been
requested after the online backup had finished should have been the one
mentioned in the "START WAL LOCATION" line of the backup history file,
but the first file that actually gets requested is that in the "STOP WAL
LOCATION" line.

000000020000004B00000027.00396450.backup:

START WAL LOCATION: 4B/27396450 (file 000000020000004B00000027)
STOP WAL LOCATION: 4B/28DDF968 (file 000000020000004B00000028)
CHECKPOINT LOCATION: 4B/27396450
START TIME: 2008-11-28 16:29:17 CET
LABEL: backup-20081128-162916
STOP TIME: 2008-11-28 17:49:09 CET

Both WAL files were copied to the backup system. The log switch occurred
before 17:06, because at that time the file was copied to the backup
server by the archive command. But sadly, the records were all part of a
single transaction which ran between 17:32 and 17:33, so obviously this
is not a part of the problem. Though it seems a bit strange to me that
the ...27 WAL didn't get recovered, but maybe I just don't fully
understand that part of the docs (Chapter 24.3.2., fifth paragraph after
the ordered list).

Right now, I run a fresh online backup to check  if we can reproduce the
problem. I made some minor corrections on my scripts which prepare the
backup database for replication, which might have caused the problem. I
now omit the pg_xlog directory when taking the initial backup, after
rereading chapter 24.3.2. saying there is a "risk of mistakes" if one
copies those files, whatever that means. Maybe someone could clarify
what the risks and mistakes could be?

Speaking of chapter 24.3.2.: The last sentence in second paragraph after
the ordered list ("You must ensure that these steps are carried out in
sequence without any possible overlap, or you will invalidate the
backup.") seems a bit misplaced. As far as I understand, it makes a
reference to the ordered list ("steps"), but I had to read it several
times to get it. I think, a better place would be right after the list.
...But then, maybe it's just me, not sleeping well last night. ;)

Marc


Marc Schablewski wrote:
> The following bug has been logged online:
>
> Bug reference:      4565
> Logged by:          Marc Schablewski
> Email address:      ms@clickware.de
> PostgreSQL version: 8.3.5
> Operating system:   Debian, 2.6.24-19 kernel
> Description:        nextval not updated during wal replication, leading to
> pk violations
> Details:
>
> Hi!
>
> We are using "Continuous Archiving" of WAL to keep a warm standby database
> server. We recently switch over to that backup database for testing purpose.
> We first took an online backup of the master database, let the WAL shipping
> run for some days and finally started the backup for normal use.
>
> A job tried to insert some records into a few tables and complained about a
> PK violation. PK are of type bigint. The column is populated by a sequence
> and a "default nextval()" on the PK column. We found that the sequence's
> currval was lower than the maximum value in the table and that this value
> was already present. Further investigation showed us that the last records
> in the table were inserted on the former master server while taking the
> initial online backup for the replication. It seems that the records got
> replicated but not the currval/nextval of the sequence. When running "select
> nextval()" on the backup database it returned the PK value of the first
> record inserted during that last run on the former master server.
>
> I couldn't reproduce the problem with two different servers and a simple
> database containing one table. Right now, I don't have any idea how to
> narrow down the problem or what to check next. So any hint would be
> helpful.
>
> Marc
>
>

Re: BUG #4565: nextval not updated during wal replication, leading to pk violations

От
Tom Lane
Дата:
"Marc Schablewski" <ms@clickware.de> writes:
> A job tried to insert some records into a few tables and complained about a
> PK violation. PK are of type bigint. The column is populated by a sequence
> and a "default nextval()" on the PK column. We found that the sequence's
> currval was lower than the maximum value in the table and that this value
> was already present. Further investigation showed us that the last records
> in the table were inserted on the former master server while taking the
> initial online backup for the replication. It seems that the records got
> replicated but not the currval/nextval of the sequence. When running "select
> nextval()" on the backup database it returned the PK value of the first
> record inserted during that last run on the former master server.

How many sequence values "overlapped" here, exactly?  If more than one,
are you by any chance running that sequence with a CACHE setting larger
than one?  (See the sequence's cache_value column if you're unsure.)

I can see what might be a race condition between nextval()'s WAL logging
and checkpoint start, but it doesn't seem like it could account for more
than the CACHE setting worth of overlap.

            regards, tom lane

Re: BUG #4565: nextval not updated during wal replication, leading to pk violations

От
Marc Schablewski
Дата:
Tom Lane wrote:
> "Marc Schablewski" <ms@clickware.de> writes:
>
>> A job tried to insert some records into a few tables and complained about a
>> PK violation. PK are of type bigint. The column is populated by a sequence
>> and a "default nextval()" on the PK column. We found that the sequence's
>> currval was lower than the maximum value in the table and that this value
>> was already present. Further investigation showed us that the last records
>> in the table were inserted on the former master server while taking the
>> initial online backup for the replication. It seems that the records got
>> replicated but not the currval/nextval of the sequence. When running "select
>> nextval()" on the backup database it returned the PK value of the first
>> record inserted during that last run on the former master server.
>>
>
> How many sequence values "overlapped" here, exactly?  If more than one,
> are you by any chance running that sequence with a CACHE setting larger
> than one?  (See the sequence's cache_value column if you're unsure.)
>
> I can see what might be a race condition between nextval()'s WAL logging
> and checkpoint start, but it doesn't seem like it could account for more
> than the CACHE setting worth of overlap.
>
>             regards, tom lane
>
>
Thanks for your reply.

cache_value is set to one for all sequences. As far as I can tell, they
were all created by a plain CREATE SEQUENCE seq_name and no other
settings changed. And as we found out later this "loss of information"
hit some indexes as well.

We took a fresh backup yesterday and again we saw that only the last WAL
generated during the backup run was restored on our backup system. I'm
sure that this is the real problem.

START WAL LOCATION: 4E/36F2C280 (file 000000020000004E00000036)
STOP WAL LOCATION: 4E/3944C988 (file 000000020000004E00000039)
CHECKPOINT LOCATION: 4E/36F2C280
START TIME: 2008-12-08 17:17:34 CET
LABEL: backup-20081208-171733
STOP TIME: 2008-12-08 18:22:25 CET

File 000000020000004E00000039 and all WAL that were generated afterwards
were restored. Files ...36 to ...38 weren't even requested. After
starting up the database, we had a table with records that were not
present it's PK index. But the were in the index on the former master
system. I ran an explicit pg_switch_xlog() on the master database before
switching to the backup to make sure all changes got transferred.

I'll check our scripts that do the backup/archiving/recovery again,
there still might be some kind of bug. But beside that, I'd be thankful
for any hints what else I could check.

Marc

Re: BUG #4565: nextval not updated during wal replication, leading to pk violations

От
Tom Lane
Дата:
Marc Schablewski <ms@clickware.de> writes:
> cache_value is set to one for all sequences. As far as I can tell, they
> were all created by a plain CREATE SEQUENCE seq_name and no other
> settings changed. And as we found out later this "loss of information"
> hit some indexes as well.

> We took a fresh backup yesterday and again we saw that only the last WAL
> generated during the backup run was restored on our backup system. I'm
> sure that this is the real problem.

This is beginning to sound like an error in your backup/restore
procedures.  Please describe exactly what you're doing.

            regards, tom lane

Re: BUG #4565: nextval not updated during wal replication, leading to pk violations

От
Marc Schablewski
Дата:
Tom Lane wrote:
> Marc Schablewski <ms@clickware.de> writes:
>
>> cache_value is set to one for all sequences. As far as I can tell, they
>> were all created by a plain CREATE SEQUENCE seq_name and no other
>> settings changed. And as we found out later this "loss of information"
>> hit some indexes as well.
>>
>
>
>> We took a fresh backup yesterday and again we saw that only the last WAL
>> generated during the backup run was restored on our backup system. I'm
>> sure that this is the real problem.
>>
>
> This is beginning to sound like an error in your backup/restore
> procedures.  Please describe exactly what you're doing.
>
>             regards, tom lane
>
>
I'm sorry. It was all my fault. If been a bit to picky about the files I
copy, and I didn't copy the files directly under the cluster directory.
I thought, there are just the config files, so what's the point. But
this way of course, I didn't copy the backup_label either. It was never
there when I looked, because it was either before or after running a
backup. Reading and especially understanding the manual sometimes helps.
doh!

Now everything seems to work fine, but we are still testing. At least
what we see in the logs is more reasonable. Now the .backup file is
requested first, then the WALs. There is one strange thing left, though.
The server first requests the second WAL, then the first one, then the
second again and then it processes them in order (second, third, fourth,
...). Is this normal?

I hope I didn't distract you from your regular work too much. Thanks again.

Marc

Re: BUG #4565: nextval not updated during wal replication, leading to pk violations

От
Tom Lane
Дата:
Marc Schablewski <ms@clickware.de> writes:
> Now everything seems to work fine, but we are still testing. At least
> what we see in the logs is more reasonable. Now the .backup file is
> requested first, then the WALs. There is one strange thing left, though.
> The server first requests the second WAL, then the first one, then the
> second again and then it processes them in order (second, third, fourth,
> ...). Is this normal?

It doesn't surprise me if something like that happens during PITR
startup.

            regards, tom lane