Discussion: Hot Backup with rsync fails at pg_clog if under load

Hot Backup with rsync fails at pg_clog if under load

From
Linas Virbalas
Date:
Hello,

* Context *

I'm observing problems with provisioning a standby from the master by
following the basic and documented "Making a Base Backup" [1] procedure with
rsync if, in the meantime, heavy load is applied on the master.

After searching the archives, the most similar issue I found was hit by
Daniel Farina in the thread "hot backups: am I doing it wrong, or do we have
a problem with pg_clog?" [2], but, it seems, the issue was discarded because
of a non-standard backup procedure Daniel used. However, I'm observing the
same error with a simple procedure, hence this message.

* Details *

Procedure:

1. Start load generator on the master (WAL archiving enabled).
2. Prepare a Streaming Replication standby (accepting WAL files too):
2.1. pg_switch_xlog() on the master;
2.2. pg_start_backup('backup_under_load') on the master (this will take a
while as the master is loaded up);
2.3. rsync data/global/pg_control to the standby;
2.4. rsync all other data/ (without pg_xlog) to the standby;
2.5. pg_stop_backup() on the master;
2.6. Wait to receive all WAL files, generated during the backup, on the
standby;
2.7. Start the standby PG instance.
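
For concreteness, a minimal shell sketch of the above procedure (hostnames,
paths and connection details are placeholders, not the exact commands I use;
recovery.conf setup on the standby is omitted, and the same data directory
path is assumed on both hosts):

MASTER=master.example.com
PGDATA=/opt/PostgreSQL/9.1/data

psql -h $MASTER -c "SELECT pg_switch_xlog();"                      # 2.1
psql -h $MASTER -c "SELECT pg_start_backup('backup_under_load');"  # 2.2 (waits for the spread checkpoint)
rsync -azv $MASTER:$PGDATA/global/pg_control $PGDATA/global/       # 2.3
rsync -azv --exclude=pg_xlog $MASTER:$PGDATA/ $PGDATA/             # 2.4
psql -h $MASTER -c "SELECT pg_stop_backup();"                      # 2.5
# 2.6: wait until all WAL files generated during the backup have arrived, then:
pg_ctl -D $PGDATA start                                            # 2.7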

The last step will, usually, fail with a similar error:

2011-09-21 13:41:05 CEST LOG:  database system was interrupted; last known
up at 2011-09-21 13:40:50 CEST
Restoring 00000014.history
mv: cannot stat `/opt/PostgreSQL/9.1/archive/00000014.history': No such file
or directory
Restoring 00000013.history
2011-09-21 13:41:05 CEST LOG:  restored log file "00000013.history" from
archive
2011-09-21 13:41:05 CEST LOG:  entering standby mode
Restoring 0000001300000006000000DC
2011-09-21 13:41:05 CEST LOG:  restored log file "0000001300000006000000DC"
from archive
Restoring 0000001300000006000000DB
2011-09-21 13:41:05 CEST LOG:  restored log file "0000001300000006000000DB"
from archive
2011-09-21 13:41:05 CEST FATAL:  could not access status of transaction
1188673
2011-09-21 13:41:05 CEST DETAIL:  Could not read from file "pg_clog/0001" at
offset 32768: Success.
2011-09-21 13:41:05 CEST LOG:  startup process (PID 13819) exited with exit
code 1
2011-09-21 13:41:05 CEST LOG:  aborting startup due to startup process
failure

The procedure works very reliably if there is little or no load on the
master, but fails very often with the pg_clog error when the load generator
(a few thousand SELECTs, ~60 INSERTs, ~60 DELETEs and ~60 UPDATEs per second)
is started up.

I assumed that a file system backup taken between pg_start_backup and
pg_stop_backup is guaranteed to be consistent and that missing pieces will
be taken from the WAL files generated & shipped during the backup, but is
it really?

Is this procedure missing some steps? Or maybe this is a known issue?

Thank you,
Linas

[1] http://www.postgresql.org/docs/current/static/continuous-archiving.html
[2] http://archives.postgresql.org/pgsql-hackers/2011-04/msg01132.php



Re: Hot Backup with rsync fails at pg_clog if under load

From
Euler Taveira de Oliveira
Date:
On 21-09-2011 11:44, Linas Virbalas wrote:

[This question doesn't belong to -hackers. Please post it in -general or -admin]

> Procedure:
>
> 1. Start load generator on the master (WAL archiving enabled).
> 2. Prepare a Streaming Replication standby (accepting WAL files too):
> 2.1. pg_switch_xlog() on the master;
You don't need this.

> 2.2. pg_start_backup('backup_under_load') on the master (this will take a
> while as master is loaded up);
No. If you use pg_start_backup('foo', true) it will be fast. Check the manual.
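
(For reference, a minimal example of the two-argument form; the label is arbitrary:)

psql -c "SELECT pg_start_backup('foo', true);"   # true requests an immediate checkpoint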

> 2.3. rsync data/global/pg_control to the standby;
Why are you doing this? If ...

> 2.4. rsync all other data/ (without pg_xlog) to the standby;
you will copy it again or not? I don't understand your point.

> 2.5. pg_stop_backup() on the master;
> 2.6. Wait to receive all WAL files, generated during the backup, on the
> standby;
> 2.6. Start the standby PG instance.
>
> The last step will, usually, fail with a similar error:
>
The problem could be that the minimum recovery point (step 2.3) is different 
from the end of rsync if you are under load.


--
Euler Taveira de Oliveira - Timbira - http://www.timbira.com.br/
PostgreSQL: Consulting, Development, 24x7 Support and Training


Re: Hot Backup with rsync fails at pg_clog if under load

From
Florian Pflug
Date:
On Sep21, 2011, at 16:44 , Linas Virbalas wrote:
> After searching the archives, the most similar issue I found was hit by
> Daniel Farina in the thread "hot backups: am I doing it wrong, or do we have
> a problem with pg_clog?" [2], but, it seems, the issue was discarded because
> of a non-standard backup procedure Daniel used.

That's not the way I read that thread. In fact, Robert Haas confirmed that
Daniel's backup procedure was sound in theory. The open question was whether the
error occurred because of a bug in Daniel's backup code or in PostgreSQL's restore
code. The thread then petered out without that question being answered.

> Procedure:
>
> 1. Start load generator on the master (WAL archiving enabled).
> 2. Prepare a Streaming Replication standby (accepting WAL files too):
> 2.1. pg_switch_xlog() on the master;
> 2.2. pg_start_backup('backup_under_load') on the master (this will take a
> while as master is loaded up);
> 2.3. rsync data/global/pg_control to the standby;
> 2.4. rsync all other data/ (without pg_xlog) to the standby;
> 2.5. pg_stop_backup() on the master;
> 2.6. Wait to receive all WAL files, generated during the backup, on the
> standby;
> 2.6. Start the standby PG instance.

Looks good. (2.1) and (2.3) seem redundant (as Euler already noticed),
but shouldn't cause any errors.

Could you provide us with the exact rsync version and parameters you use?

> The last step will, usually, fail with a similar error:
>
> 2011-09-21 13:41:05 CEST LOG:  database system was interrupted; last known
> up at 2011-09-21 13:40:50 CEST
> Restoring 00000014.history
> mv: cannot stat `/opt/PostgreSQL/9.1/archive/00000014.history': No such file
> or directory
> Restoring 00000013.history
> 2011-09-21 13:41:05 CEST LOG:  restored log file "00000013.history" from
> archive
> 2011-09-21 13:41:05 CEST LOG:  entering standby mode
> Restoring 0000001300000006000000DC
> 2011-09-21 13:41:05 CEST LOG:  restored log file "0000001300000006000000DC"
> from archive
> Restoring 0000001300000006000000DB
> 2011-09-21 13:41:05 CEST LOG:  restored log file "0000001300000006000000DB"
> from archive
> 2011-09-21 13:41:05 CEST FATAL:  could not access status of transaction
> 1188673
> 2011-09-21 13:41:05 CEST DETAIL:  Could not read from file "pg_clog/0001" at
> offset 32768: Success.

What's the size of the file (pg_clog/0001) on both the master and the slave?

best regards,
Florian Pflug



Re: Hot Backup with rsync fails at pg_clog if under load

From
Robert Haas
Date:
On Wed, Sep 21, 2011 at 12:22 PM, Euler Taveira de Oliveira
<euler@timbira.com> wrote:
> [This question doesn't belong to -hackers. Please post it in -general or
> -admin]

-hackers or -bugs seems appropriate to me; I think this is a bug.

>> 2.2. pg_start_backup('backup_under_load') on the master (this will take a
>> while as master is loaded up);
>
> No. if you use pg_start_backup('foo', true) it will be fast. Check the
> manual.

If the server is sufficiently heavily loaded that a checkpoint takes a
nontrivial amount of time, the OP is correct that this will not be
fast, regardless of whether you choose to force an immediate
checkpoint.

>> 2.3. rsync data/global/pg_control to the standby;
>
> Why are you doing this? If ...
>
>> 2.4. rsync all other data/ (without pg_xlog) to the standby;
>
> you will copy it again or no? Don't understand your point.

His point is that exercising the bug depends on doing the copying in a
certain order.  Any order of copying the data theoretically ought to
be OK, as long as it's all between starting the backup and stopping
the backup, but apparently it isn't.

> The problem could be that the minimum recovery point (step 2.3) is different
> from the end of rsync if you are under load.

It seems pretty clear that some relevant chunk of WAL isn't getting
replayed, but it's not at all clear to me why not.  It seems like it
would be useful to compare the LSN returned by pg_start_backup() with
the location at which replay begins when you fire up the clone.
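
A rough sketch of how that comparison could be captured (label and paths are
illustrative):

# on the master, note the LSN that pg_start_backup() returns
psql -At -c "SELECT pg_start_backup('lsn_compare');"    # prints e.g. 13/DB000020
# ... take the copy and run pg_stop_backup() as usual ...
# on the clone, after trying to start it, see where replay began
grep "redo starts at" /opt/PostgreSQL/9.1/data/pg_log/*.log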

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Hot Backup with rsync fails at pg_clog if under load

From
Linas Virbalas
Date:
>>> 2.2. pg_start_backup('backup_under_load') on the master (this will take a
>>> while as master is loaded up);
>>
>> No. if you use pg_start_backup('foo', true) it will be fast. Check the
>> manual.
>
> If the server is sufficiently heavily loaded that a checkpoint takes a
> nontrivial amount of time, the OP is correct that this will be not
> fast, regardless of whether you choose to force an immediate
> checkpoint.

In order to check more cases, I have changed the procedure to force an
immediate checkpoint, i.e. pg_start_backup('backup_under_load', true). With
the same load generator running, pg_start_backup returned almost
instantaneously compared to how long it took previously.

Most importantly, after doing this change, I cannot reproduce the pg_clog
error message anymore. In other words, with immediate checkpoint hot backup
succeeds under this load!

>>> 2.3. rsync data/global/pg_control to the standby;
>>
>> Why are you doing this? If ...
>>
>>> 2.4. rsync all other data/ (without pg_xlog) to the standby;
>>
>> you will copy it again or no? Don't understand your point.
>
> His point is that exercising the bug depends on doing the copying in a
> certain order.  Any order of copying the data theoretically ought to
> be OK, as long as it's all between starting the backup and stopping
> the backup, but apparently it isn't.

Please note that in the past I was able to reproduce the same pg_clog error
even when taking a single rsync of the whole data/ folder (i.e. without
splitting it into two steps).

>> The problem could be that the minimum recovery point (step 2.3) is different
>> from the end of rsync if you are under load.

Do you have any ideas why the hot backup operation with
pg_start_backup('backup_under_load', true) succeeds while
pg_start_backup('backup_under_load') fails under the same load?

Originally, I was using pg_start_backup('backup_under_load') in order not to
clog the master server during the I/O required for the checkpoint. Of
course, now, it seems, this should be sacrificed for the sake of a
successful backup under load.

> It seems pretty clear that some relevant chunk of WAL isn't getting
> replayed, but it's not at all clear to me why not.  It seems like it
> would be useful to compare the LSN returned by pg_start_backup() with

If needed, I could do that, if I had the exact procedure... Currently,
during the start of the backup I take the following information:

pg_xlogfile_name(pg_start_backup(...))

> the location at which replay begins when you fire up the clone.

As you have seen in my original message, in the pg_log I get only the
restored WAL file names after starting up the standby. Can I tune the
postgresql.conf to include the location at which replay begins in the log?

> Could you provide us with the exact rsync version and parameters you use?

rsync -azv
version 2.6.8  protocol version 29

--
Sincerely,
Linas Virbalas
http://flyingclusters.blogspot.com/



Re: Hot Backup with rsync fails at pg_clog if under load

From
Euler Taveira de Oliveira
Date:
On 22-09-2011 11:24, Linas Virbalas wrote:
> In order to check more cases, I have changed the procedure to force an
> immediate checkpoint, i.e. pg_start_backup('backup_under_load', true). With
> the same load generator running, pg_start_backup returned almost
> instantaneously compared to how long it took previously.
>
> Most importantly, after doing this change, I cannot reproduce the pg_clog
> error message anymore. In other words, with immediate checkpoint hot backup
> succeeds under this load!
>
Interesting. I remember someone reporting this same problem, but it was not
reproducible by some of us.

> Do you have ideas why does the Hot Backup operation with
> pg_start_backup('backup_under_load', true) succeed while
> pg_start_backup('backup_under_load') fails under the same load?
>
I don't, but if you show us the output of the steps above...

> If needed, I could do that, if I had the exact procedure... Currently,
> during the start of the backup I take the following information:
>
Just show us the output of pg_start_backup and part of the standby log with 
the following message 'redo starts at' and the subsequent messages up to the 
failure.


--
Euler Taveira de Oliveira - Timbira - http://www.timbira.com.br/
PostgreSQL: Consulting, Development, 24x7 Support and Training


Re: Hot Backup with rsync fails at pg_clog if under load

From
Robert Haas
Date:
2011/9/22 Euler Taveira de Oliveira <euler@timbira.com>:
> On 22-09-2011 11:24, Linas Virbalas wrote:
>>
>> In order to check more cases, I have changed the procedure to force an
>> immediate checkpoint, i.e. pg_start_backup('backup_under_load', true).
>> With
>> the same load generator running, pg_start_backup returned almost
>> instantaneously compared to how long it took previously.
>>
>> Most importantly, after doing this change, I cannot reproduce the pg_clog
>> error message anymore. In other words, with immediate checkpoint hot
>> backup
>> succeeds under this load!
>>
> Interesting. I remembered someone reporting this same problem but it was not
> reproducible by some of us.

So maybe there's some action that has to happen between the time the
redo pointer is set and the time the checkpoint is WAL-logged to
tickle the bug.  Like... CLOG extension, maybe?

*grep grep grep*

OK, so ExtendCLOG() just zeroes the page in memory, writes the WAL
record, and calls it good.  All the interesting stuff is done while
holding CLogControlLock.  So, at checkpoint time, we'd better make
sure to flush those pages out to disk before writing the checkpoint
record.  Otherwise, the redo pointer might advance past the
CLOG-extension record before the corresponding page hits the disk.
That's the job of CheckPointCLOG(), which is called from
CheckPointGuts(), which is called just from CreateCheckPoint() just
after setting the redo pointer.  Now, there is some funny business
with the locking here as we're writing the dirty pages
(CheckPointCLOG() calls SimpleLruFlush()).  We release and reacquire
the control lock many times.  But I don't see how that can cause a
problem, because it's all being done after the redo pointer has
already been set.  We could end up having buffers get dirtied again
after they are flushed, but that shouldn't matter either as long as
each buffer is written out at least once.  And if the write fails we
throw an error.  So I don't see any holes there.

Anybody else have an idea?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Hot Backup with rsync fails at pg_clog if under load

From
Linas Virbalas
Date:
On 9/22/11 6:59 PM, "Euler Taveira de Oliveira" <euler@timbira.com> wrote:

>> If needed, I could do that, if I had the exact procedure... Currently,
>> during the start of the backup I take the following information:
>>
> Just show us the output of pg_start_backup and part of the standby log with
> the following message 'redo starts at' and the subsequent messages up to the
> failure.

Unfortunately, it's impossible, because the error message "Could not read
from file "pg_clog/0001" at offset 32768: Success" is shown (and startup
aborted) before the turn for "redo starts at" message arrives.

For comparison purposes, here is the log from the original message:

2011-09-21 13:41:05 CEST LOG:  database system was interrupted; last known
up at 2011-09-21 13:40:50 CEST
Restoring 00000014.history
mv: cannot stat `/opt/PostgreSQL/9.1/archive/00000014.history': No such file
or directory
Restoring 00000013.history
2011-09-21 13:41:05 CEST LOG:  restored log file "00000013.history" from
archive
! 2011-09-21 13:41:05 CEST LOG:  entering standby mode
Restoring 0000001300000006000000DC
2011-09-21 13:41:05 CEST LOG:  restored log file "0000001300000006000000DC"
from archive
Restoring 0000001300000006000000DB
2011-09-21 13:41:05 CEST LOG:  restored log file "0000001300000006000000DB"
from archive
2011-09-21 13:41:05 CEST FATAL:  could not access status of transaction
1188673
2011-09-21 13:41:05 CEST DETAIL:  Could not read from file "pg_clog/0001" at
offset 32768: Success.
2011-09-21 13:41:05 CEST LOG:  startup process (PID 13819) exited with exit
code 1
2011-09-21 13:41:05 CEST LOG:  aborting startup due to startup process
failure

As you can see, there is no "redo starts at" message. If we compare this to
a healthy standby startup, we can see that the "pg_clog" error appears after
the "entering standby mode", but before the "redo starts at", hence the
latter is not reached. Healthy log example:

2011-09-23 09:52:31 CEST LOG:  database system was interrupted; last known
up at 2011-09-23 09:52:15 CEST
Restoring 00000007.history
mv: cannot stat `/opt/PostgreSQL/9.1/archive/00000007.history': No such file
or directory
Restoring 00000006.history
2011-09-23 09:52:31 CEST LOG:  restored log file "00000006.history" from
archive
2011-09-23 09:52:31 CEST LOG:  entering standby mode
Restoring 000000060000000000000065
2011-09-23 09:52:31 CEST LOG:  restored log file "000000060000000000000065"
from archive
2011-09-23 09:52:31 CEST LOG:  redo starts at 0/65000020
2011-09-23 09:52:31 CEST LOG:  consistent recovery state reached at
0/66000000
2011-09-23 09:52:31 CEST LOG:  database system is ready to accept read only
connections
Restoring 000000060000000000000066
mv: cannot stat `/opt/PostgreSQL/9.1/archive/000000060000000000000066': No
such file or directory
Restoring 00000007.history
mv: cannot stat `/opt/PostgreSQL/9.1/archive/00000007.history': No such file
or directory
2011-09-23 09:52:31 CEST LOG:  streaming replication successfully connected
to primary

--
Best regards,
Linas Virbalas
http://flyingclusters.blogspot.com/



Re: Hot Backup with rsync fails at pg_clog if under load

From
Heikki Linnakangas
Date:
On 23.09.2011 11:02, Linas Virbalas wrote:
> On 9/22/11 6:59 PM, "Euler Taveira de Oliveira"<euler@timbira.com>  wrote:
>
>>> If needed, I could do that, if I had the exact procedure... Currently,
>>> during the start of the backup I take the following information:
>>>
>> Just show us the output of pg_start_backup and part of the standby log with
>> the following message 'redo starts at' and the subsequent messages up to the
>> failure.
>
> Unfortunately, it's impossible, because the error message "Could not read
> from file "pg_clog/0001" at offset 32768: Success" is shown (and startup
> aborted) before the turn for "redo starts at" message arrives.

It looks to me that pg_clog/0001 exists, but it is shorter than recovery
expects. Which shouldn't happen, of course, because the start-backup
checkpoint should flush all the clog that's needed by recovery to disk
before the backup procedure begins to copy them.

Can you do "ls -l pg_clog" in the master and the backup, and 
pg_controldata on the backup dir, and post the results, please?
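
(Something along these lines, run on both machines; the paths are assumptions:)

ls -l /opt/PostgreSQL/9.1/data/pg_clog
/opt/PostgreSQL/9.1/bin/pg_controldata /opt/PostgreSQL/9.1/data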

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com


Re: Hot Backup with rsync fails at pg_clog if under load

From
Florian Pflug
Date:
On Sep23, 2011, at 10:41 , Heikki Linnakangas wrote:
> On 23.09.2011 11:02, Linas Virbalas wrote:
>> On 9/22/11 6:59 PM, "Euler Taveira de Oliveira"<euler@timbira.com>  wrote:
>>
>>>> If needed, I could do that, if I had the exact procedure... Currently,
>>>> during the start of the backup I take the following information:
>>>>
>>> Just show us the output of pg_start_backup and part of the standby log with
>>> the following message 'redo starts at' and the subsequent messages up to the
>>> failure.
>>
>> Unfortunately, it's impossible, because the error message "Could not read
>> from file "pg_clog/0001" at offset 32768: Success" is shown (and startup
>> aborted) before the turn for "redo starts at" message arrives.
>
> It looks to me that pg_clog/0001 exists, but it shorter than recovery expects. Which shouldn't happen, of course,
> because the start-backup checkpoint should flush all the clog that's needed by recovery to disk before the backup
> procedure begins to them.

Yeah. What confuses me though is that we fail while trying to read from the clog. When do we do that during normal
(non-standby) recovery?

One other possibility would be that the problem is somehow triggered by vacuum running while the start-backup
checkpoint is commencing. That'd explain why the problem goes away with immediate checkpoints, because those make the
windows much smaller. But I haven't got a concrete theory of what's happening...

best regards,
Florian Pflug



Re: Hot Backup with rsync fails at pg_clog if under load

From
Heikki Linnakangas
Date:
On 23.09.2011 11:48, Florian Pflug wrote:
> On Sep23, 2011, at 10:41 , Heikki Linnakangas wrote:
>> On 23.09.2011 11:02, Linas Virbalas wrote:
>>> On 9/22/11 6:59 PM, "Euler Taveira de Oliveira"<euler@timbira.com>   wrote:
>>>
>>>>> If needed, I could do that, if I had the exact procedure... Currently,
>>>>> during the start of the backup I take the following information:
>>>>>
>>>> Just show us the output of pg_start_backup and part of the standby log with
>>>> the following message 'redo starts at' and the subsequent messages up to the
>>>> failure.
>>>
>>> Unfortunately, it's impossible, because the error message "Could not read
>>> from file "pg_clog/0001" at offset 32768: Success" is shown (and startup
>>> aborted) before the turn for "redo starts at" message arrives.
>>
>> It looks to me that pg_clog/0001 exists, but it shorter than recovery expects. Which shouldn't happen, of course,
>> because the start-backup checkpoint should flush all the clog that's needed by recovery to disk before the backup
>> procedure begins to them.
>
> Yeah. What confuses me though is that we fail while trying to read from the clog. When do we do that during normal
> (non-standby) recovery?

I believe the OP is setting up a standby. He didn't say if it's a hot 
standby, though. That changes the startup sequence a bit.

At the end of recovery, we read the "last" clog page, containing the 
next XID that will be assigned to a transaction, and zero out the future 
part of that clog file (StartupCLOG). In hot standby, we do that 
earlier, after reading the checkpoint record but before we start 
replaying records.

> One other possibility would be that the problem is somehow triggered by vacuum running while the start-backup
> checkpoint is commencing. That'd explain why the problem goes away with immediate checkpoints, because those make the
> windows much smaller. But I haven't got a concrete theory of what's happening...

Hmm, that's a good theory. The logic in SlruPhysicalReadPage() 
tolerates non-existent files, because the old clog files might've been 
truncated away by a vacuum after the backup started. That's OK, because 
they will be recreated, and later deleted again, during the WAL replay. 
But what if something like this happens:

0. pg_start_backup().
1. rsync starts. It gets a list of all files. It notes that pg_clog/0001 
exists.
2. vacuum runs, and deletes pg_clog/0001
3. rsync starts to copy pg_clog/0001. Oops, it's gone. Instead of 
leaving it out altogether, it includes it as an empty file.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com


Re: Hot Backup with rsync fails at pg_clog if under load

From
Linas Virbalas
Date:
On 9/23/11 12:05 PM, "Heikki Linnakangas"
<heikki.linnakangas@enterprisedb.com> wrote:

>>> It looks to me that pg_clog/0001 exists, but it shorter than recovery
>>> expects. Which shouldn't happen, of course, because the start-backup
>>> checkpoint should flush all the clog that's needed by recovery to disk
>>> before the backup procedure begins to them.
>>
>> Yeah. What confuses me though is that we fail while trying to read from the
>> clog. When do we do that during normal (non-standby) recovery?
>
> I believe the OP is setting up a standby. He didn't say if it's a hot
> standby, though. That changes the startup sequence a bit.

I confirm that it is a Hot Standby with Streaming Replication.

>> One other possibility would be that the problem is somehow triggered by
>> vacuum running while the start-backup checkpoint is commencing. That'd
>> explain why the problem goes away with immediate checkpoints, because those
>> make the windows mach smaller. But I haven't got a concrete theory of whats
>> happening..
>
>> Hmm, that's a good theory. The logic in SlruPhysicalReadPage()
> tolerates non-existent files, because the old clog files might've been
> truncated away by a vacuum after the backup started. That's OK, because
> they will be recreated, and later deleted again, during the WAL replay.
> But what if something like this happens:
>
> 0. pg_start_backup().
> 1. rsync starts. It gets a list of all files. It notes that pg_clog/0001
> exists.
> 2. vacuum runs, and deletes pg_clog/0001
> 3. rsync starts to copy pg_clog/0001. Oops, it's gone. Instead of
> leaving it out altogether, it includes it as an empty file.

I cannot confirm that. I have tried this scenario one more time and was
observing how pg_clog/ looked on the master and standby. Here's an overview:

1. pg_start_backup('base_backup') - waiting...

2. Checking the size of pg_clog periodically on the master:

-bash-3.2$ ls -l pg_clog
total 8
-rw------- 1 postgres postgres 8192 Sep 23 14:31 0000
-bash-3.2$ ls -l pg_clog
total 8
-rw------- 1 postgres postgres 8192 Sep 23 14:31 0000
-bash-3.2$ ls -l pg_clog
total 16

3. Somewhere just before step (1) returns and we get into the rsync phase, the
pg_clog size changes:

-rw------- 1 postgres postgres 16384 Sep 23 14:33 0000
-bash-3.2$ ls -l pg_clog
total 16
-rw------- 1 postgres postgres 16384 Sep 23 14:33 0000
-bash-3.2$ ls -l pg_clog
total 16
-rw------- 1 postgres postgres 16384 Sep 23 14:33 0000

4. The rsync does transfer the file:

...
INFO   | jvm 1    | 2011/09/23 14:33:43 | 2011-09-23 14:33:43,881 INFO
management.script.ScriptPlugin base/16394/16405_fsm
INFO   | jvm 1    | 2011/09/23 14:33:43 | 2011-09-23 14:33:43,881 INFO
management.script.ScriptPlugin base/16394/pg_internal.init
INFO   | jvm 1    | 2011/09/23 14:33:43 | 2011-09-23 14:33:43,886 INFO
management.script.ScriptPlugin global/
INFO   | jvm 1    | 2011/09/23 14:33:43 | 2011-09-23 14:33:43,886 INFO
management.script.ScriptPlugin global/12691
INFO   | jvm 1    | 2011/09/23 14:33:43 | 2011-09-23 14:33:43,886 INFO
management.script.ScriptPlugin global/12696
INFO   | jvm 1    | 2011/09/23 14:33:43 | 2011-09-23 14:33:43,886 INFO
management.script.ScriptPlugin global/12697
INFO   | jvm 1    | 2011/09/23 14:33:43 | 2011-09-23 14:33:43,886 INFO
management.script.ScriptPlugin global/pg_internal.init
INFO   | jvm 1    | 2011/09/23 14:33:43 | 2011-09-23 14:33:43,887 INFO
management.script.ScriptPlugin pg_clog/0000
INFO   | jvm 1    | 2011/09/23 14:33:43 | 2011-09-23 14:33:43,887 INFO
management.script.ScriptPlugin pg_multixact/offsets/0000
INFO   | jvm 1    | 2011/09/23 14:33:43 | 2011-09-23 14:33:43,887 INFO
management.script.ScriptPlugin pg_notify/
...

But on the standby its size is the old one (thus, it seems, that the size
changed after the rsync transfer and before the pg_stop_backup() was
called):

ls -l pg_clog/
total 8
-rw------- 1 postgres postgres 8192 Sep 23 14:31 0000

5. Hence, the server doesn't start and pg_log complains:

2011-09-23 14:33:45 CEST LOG:  streaming replication successfully connected
to primary
2011-09-23 14:33:46 CEST FATAL:  the database system is starting up
2011-09-23 14:33:46 CEST FATAL:  the database system is starting up
2011-09-23 14:33:46 CEST FATAL:  the database system is starting up
2011-09-23 14:33:46 CEST FATAL:  the database system is starting up
2011-09-23 14:33:46 CEST FATAL:  the database system is starting up
2011-09-23 14:33:46 CEST FATAL:  the database system is starting up
2011-09-23 14:33:46 CEST FATAL:  the database system is starting up
2011-09-23 14:33:46 CEST FATAL:  could not access status of transaction
37206
2011-09-23 14:33:46 CEST DETAIL:  Could not read from file "pg_clog/0000" at
offset 8192: Success.

6. Now, if I do something that, of course, should never be done, and copy
this 0000 file from master to the standby soon enough (without even
starting/stopping backup), the standby starts up successfully.

--
Hope this helps,
Linas Virbalas
http://flyingclusters.blogspot.com/



Re: Hot Backup with rsync fails at pg_clog if under load

From
Robert Haas
Date:
On Fri, Sep 23, 2011 at 8:47 AM, Linas Virbalas
<linas.virbalas@continuent.com> wrote:
> But on the standby its size is the old one (thus, it seems, that the size
> changed after the rsync transfer and before the pg_stop_backup() was
> called):

Now that seems pretty weird - I don't think that file should ever shrink.

Are you sure that the rsync isn't starting until after
pg_start_backup() completes?  Because if you were starting it just a
tiny bit early, that would explain the observed symptoms perfectly, I
think.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Hot Backup with rsync fails at pg_clog if under load

From
Linas Virbalas
Date:
>> But on the standby its size is the old one (thus, it seems, that the size
>> changed after the rsync transfer and before the pg_stop_backup() was
>> called):

> Now that seems pretty weird - I don't think that file should ever shrink.

It seems I was not clear in my last example. The pg_clog file doesn't shrink. On the contrary, when rsync kicks in it
is still small, but after it completes, somewhere around the pg_stop_backup() call, the pg_clog file grows on the
master. Hence, it is left smaller on the standby.

> Are you sure that the rsync isn't starting until after
> pg_start_backup() completes?

100% - the procedure is scripted in a single-threaded application, so rsync is started only after pg_start_backup(...)
returns.

> Because if you were starting it just a
> tiny bit early, that would explain the observed symptoms perfectly, I
> think.

I agree, but that is not the case.

Re: Hot Backup with rsync fails at pg_clog if under load

From
Aidan Van Dyk
Date:
On Fri, Sep 23, 2011 at 4:41 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

>> Unfortunately, it's impossible, because the error message "Could not read
>> from file "pg_clog/0001" at offset 32768: Success" is shown (and startup
>> aborted) before the turn for "redo starts at" message arrives.
>
> It looks to me that pg_clog/0001 exists, but it shorter than recovery
> expects. Which shouldn't happen, of course, because the start-backup
> checkpoint should flush all the clog that's needed by recovery to disk
> before the backup procedure begins to them.

I think the point here is that recovery *never starts*.  Something in
the standby startup is looking for a value in a clog block that
recovery hadn't had a chance to replay (produce) yet.

So the standby is looking into the data directory *before* recovery
has had a chance to run, and based on that, goes to look for something
in a clog page that wasn't guaranteed to exist at the start of the
backup period, and bombs out before recovery has a chance to start
replaying WAL and write the new clog page.


a.

--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.


Re: Hot Backup with rsync fails at pg_clog if under load

From
Alvaro Herrera
Date:
Excerpts from Linas Virbalas's message of vie sep 23 09:47:20 -0300 2011:
> On 9/23/11 12:05 PM, "Heikki Linnakangas"
> <heikki.linnakangas@enterprisedb.com> wrote:

> But on the standby its size is the old one (thus, it seems, that the size
> changed after the rsync transfer and before the pg_stop_backup() was
> called):
> 
> ls -l pg_clog/
> total 8
> -rw------- 1 postgres postgres 8192 Sep 23 14:31 0000

Sounds like rsync is caching the file size at the start of the run, and
then copying that many bytes, ignoring the growth that occurred after it
started.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Hot Backup with rsync fails at pg_clog if under load

From
Magnus Hagander
Date:
On Sep 23, 2011 5:59 PM, "Alvaro Herrera" <alvherre@commandprompt.com> wrote:
>
> Excerpts from Linas Virbalas's message of vie sep 23 09:47:20 -0300 2011:
> > On 9/23/11 12:05 PM, "Heikki Linnakangas"
> > <heikki.linnakangas@enterprisedb.com> wrote:
> >
> > But on the standby its size is the old one (thus, it seems, that the size
> > changed after the rsync transfer and before the pg_stop_backup() was
> > called):
> >
> > ls -l pg_clog/
> > total 8
> > -rw------- 1 postgres postgres 8192 Sep 23 14:31 0000
>
> Sounds like rsync is caching the file size at the start of the run, and
> then copying that many bytes, ignoring the growth that occurred after it
> started.

That pretty much matches what Daniel does when he got the same failure case.

Is this not allowed? Shouldn't wal replay fix this?

/Magnus

Re: Hot Backup with rsync fails at pg_clog if under load

From
Andres Freund
Date:
Hi,

On Wednesday 21 Sep 2011 16:44:30 Linas Virbalas wrote:
> 2011-09-21 13:41:05 CEST DETAIL:  Could not read from file "pg_clog/0001"
> at offset 32768: Success.
Any chance you can attach gdb to the startup process and provide a backtrace 
from the place where this message is printed?
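
One possible way to capture that, sketched below; the breakpoint assumes
PostgreSQL's usual error-reporting entry point (errfinish), and the pgrep
pattern assumes the startup process is visible in ps as "startup process":

gdb -p "$(pgrep -f 'startup process')" -ex 'break errfinish' -ex 'continue'
# when gdb stops at errfinish while the FATAL is being raised, type: bt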

Greetings,

Andres


Re: Hot Backup with rsync fails at pg_clog if under load

From
Heikki Linnakangas
Date:
On 23.09.2011 19:03, Magnus Hagander wrote:
> On Sep 23, 2011 5:59 PM, "Alvaro Herrera"<alvherre@commandprompt.com>
> wrote:
>>
>>
>> Excerpts from Linas Virbalas's message of vie sep 23 09:47:20 -0300 2011:
>>> On 9/23/11 12:05 PM, "Heikki Linnakangas"
>>> <heikki.linnakangas@enterprisedb.com>  wrote:
>>
>>> But on the standby its size is the old one (thus, it seems, that the size
>>> changed after the rsync transfer and before the pg_stop_backup() was
>>> called):
>>>
>>> ls -l pg_clog/
>>> total 8
>>> -rw------- 1 postgres postgres 8192 Sep 23 14:31 0000
>>
>> Sounds like rsync is caching the file size at the start of the run, and
>> then copying that many bytes, ignoring the growth that occurred after it
>> started.
>
> That pretty much matches what Daniel does when he got the same failure case.
>
> Is this not allowed? Shouldn't wal replay fix this?

That's OK. The effect is the same as if rsync had copied the file at the 
start.

What's not OK is to store the file as an empty or truncated file in the
backup when the file is deleted from pg_clog while rsync is running.
The file must have length >= the length the file had when the backup
started, or the file must be omitted from the backup altogether.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com


Re: Hot Backup with rsync fails at pg_clog if under load

From
Robert Haas
Date:
On Fri, Sep 23, 2011 at 11:43 AM, Aidan Van Dyk <aidan@highrise.ca> wrote:
> On Fri, Sep 23, 2011 at 4:41 AM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>
>>> Unfortunately, it's impossible, because the error message "Could not read
>>> from file "pg_clog/0001" at offset 32768: Success" is shown (and startup
>>> aborted) before the turn for "redo starts at" message arrives.
>>
>> It looks to me that pg_clog/0001 exists, but it shorter than recovery
>> expects. Which shouldn't happen, of course, because the start-backup
>> checkpoint should flush all the clog that's needed by recovery to disk
>> before the backup procedure begins to them.
>
> I think the point here is that recover *never starts*.  Something in
> the standby startup is looking for a value in a clog block that
> recovery hadn't had a chance to replay (produce) yet.

Ah.  I think you are right - Heikki made the same point.  Maybe some
of the stuff that happens just after this comment:
       /*
        * Initialize for Hot Standby, if enabled. We won't let backends in
        * yet, not until we've reached the min recovery point specified in
        * control file and we've established a recovery snapshot from a
        * running-xacts WAL record.
        */


...actually needs to be postponed until after we've reached consistency?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Hot Backup with rsync fails at pg_clog if under load

From
Florian Pflug
Date:
On Sep23, 2011, at 18:03 , Magnus Hagander wrote:
> On Sep 23, 2011 5:59 PM, "Alvaro Herrera" <alvherre@commandprompt.com> wrote:
> > Sounds like rsync is caching the file size at the start of the run, and
> > then copying that many bytes, ignoring the growth that occurred after it
> > started.
>
> That pretty much matches what Daniel does when he got the same failure case.
>
> Is this not allowed? Shouldn't wal replay fix this?

I don't see how it could be forbidden. ISTM that we absolutely *have* to be
able to deal with each byte of a file's data, including its meta-data, being in
any state it was in between the time pg_start_backup() returned and pg_stop_backup()
was issued, with the individual states being in no way synchronized.
(OK, in reality restricting this to individual 512-byte blocks is probably enough,
but still...)

This, I think, is also the reason we need to force full_page_writes to on
during a hot backup. If a page was modified at all between pg_start_backup() and
pg_stop_backup(), we essentially have to assume it's totally garbled.

best regards,
Florian Pflug



Re: Hot Backup with rsync fails at pg_clog if under load

From
Florian Pflug
Date:
On Sep23, 2011, at 18:45 , Robert Haas wrote:
> Ah.  I think you are right - Heikki made the same point.  Maybe some
> of the stuff that happens just after this comment:
>
>        /*
>         * Initialize for Hot Standby, if enabled. We won't let backends in
>         * yet, not until we've reached the min recovery point specified in
>         * control file and we've established a recovery snapshot from a
>         * running-xacts WAL record.
>         */
>
>
> ...actually needs to be postponed until after we've reached consistency?

I came to the same conclusion. It seems that before we've reached consistency,
we shouldn't attempt to read anything, because the data can be pretty arbitrarily
garbled.

best regards,
Florian Pflug



Re: Hot Backup with rsync fails at pg_clog if under load

From
Heikki Linnakangas
Date:
On 23.09.2011 19:49, Florian Pflug wrote:
> On Sep23, 2011, at 18:45 , Robert Haas wrote:
>> Ah.  I think you are right - Heikki made the same point.  Maybe some
>> of the stuff that happens just after this comment:
>>
>>         /*
>>          * Initialize for Hot Standby, if enabled. We won't let backends in
>>          * yet, not until we've reached the min recovery point specified in
>>          * control file and we've established a recovery snapshot from a
>>          * running-xacts WAL record.
>>          */
>>
>>
>> ...actually needs to be postponed until after we've reached consistency?
>
> I came the the same conclusion. It seems the before we've reached consistency,
> we shouldn't attempt to read anything because the data can be pretty arbitrarily
> garbled.

There are pretty clear rules on what state clog can be in. When you 
launch postmaster in a standby:

* Any clog preceding the nextXid from the checkpoint record we start 
recovery from must either be valid, or the clog file must be missing 
altogether (which can happen when it was vacuumed away while the backup 
was in progress - if the clog is still needed at the end of the backup it must 
not be missing, of course).
* Any clog following nextXid can be garbled or missing.

Recovery will overwrite any clog after nextXid from the WAL, but not the 
clog before it.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com


Re: Hot Backup with rsync fails at pg_clog if under load

From
Robert Haas
Date:
On Fri, Sep 23, 2011 at 12:58 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> There are pretty clear rules on what state clog can be in. When you launch
> postmaster in a standby:
>
> * Any clog preceding the nextXid from the checkpoint record we start
> recovery from, must either be valid, or the clog file must be missing
> altogether (which can happen when it was vacuumed away while the backup in
> progress - if the clog is still needed at the end of backup it must not be
> missing, of course).
> * Any clog following nextXid can be garbled or missing.
>
> Recovery will overwrite any clog after nextXid from the WAL, but not the
> clog before it.

So the actual error message in the last test was:

2011-09-21 13:41:05 CEST FATAL:  could not access status of transaction 1188673

...but we can't tell if that was before or after nextXid, which seems
like it would be useful to know.

If Linas can rerun his experiment, but also capture the output of
pg_controldata before firing up the standby for the first time, then
we'd be able to see that information.
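
As a side note, the reported file name and offset are at least internally
consistent with that xid, assuming a default build (8192-byte clog pages,
2 status bits per transaction, 32 pages per segment file); a quick sketch,
with a hypothetical path for the backup directory:

XID=1188673
XACTS_PER_PAGE=$((8192 * 4))          # 4 transaction statuses per byte
PAGE=$((XID / XACTS_PER_PAGE))
printf 'pg_clog/%04X at offset %d\n' $((PAGE / 32)) $(( (PAGE % 32) * 8192 ))
# -> pg_clog/0001 at offset 32768, matching the error message
pg_controldata /path/to/backup/data | grep NextXID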

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Hot Backup with rsync fails at pg_clog if under load

From
Daniel Farina
Date:
On Fri, Sep 23, 2011 at 9:45 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Sep 23, 2011 at 11:43 AM, Aidan Van Dyk <aidan@highrise.ca> wrote:
>> On Fri, Sep 23, 2011 at 4:41 AM, Heikki Linnakangas
>> <heikki.linnakangas@enterprisedb.com> wrote:
>>
>>>> Unfortunately, it's impossible, because the error message "Could not read
>>>> from file "pg_clog/0001" at offset 32768: Success" is shown (and startup
>>>> aborted) before the turn for "redo starts at" message arrives.
>>>
>>> It looks to me that pg_clog/0001 exists, but it shorter than recovery
>>> expects. Which shouldn't happen, of course, because the start-backup
>>> checkpoint should flush all the clog that's needed by recovery to disk
>>> before the backup procedure begins to them.
>>
>> I think the point here is that recover *never starts*.  Something in
>> the standby startup is looking for a value in a clog block that
>> recovery hadn't had a chance to replay (produce) yet.
>
> Ah.  I think you are right - Heikki made the same point.  Maybe some
> of the stuff that happens just after this comment:
>
>        /*
>         * Initialize for Hot Standby, if enabled. We won't let backends in
>         * yet, not until we've reached the min recovery point specified in
>         * control file and we've established a recovery snapshot from a
>         * running-xacts WAL record.
>         */
>
>
> ...actually needs to be postponed until after we've reached consistency?

We have a number of backups that are like this, and the problem is
entirely reproducible for those.  We always get around it by disabling
hot standby for a while (until consistency is reached).  I poked at
xlog.c a bit, and to me it seems entirely likely that StartupCLOG is
being called early -- way too early, or at least parts of it.
Presumably(?) it is being called so early in the hot standby path so
that the status of transactions can be known for the purposes of
querying, but it's happening before consistency is reached, ergo not
many invariants (outside of checkpointed things like pg_controldata)
are likely to hold... such as the clog being the right size to locate the
transaction status of a page.

Anyway, sorry for dropping the ball on pushing that one; we've been
using this workaround for a while after taking a look at the mechanism
and deciding it was probably not a problem (except for a sound night's
sleep).  We've now seen this dozens of times.
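
For anyone else hitting this, the workaround amounts to something like the
following rough sketch (paths are hypothetical, and it assumes hot_standby is
set explicitly in postgresql.conf and that server log output lands in pg_log):

PGDATA=/path/to/standby/data
sed -i "s/^hot_standby *= *on/hot_standby = off/" $PGDATA/postgresql.conf
pg_ctl -D $PGDATA start
until grep -q "consistent recovery state reached" $PGDATA/pg_log/*.log; do
    sleep 5
done
sed -i "s/^hot_standby *= *off/hot_standby = on/" $PGDATA/postgresql.conf
pg_ctl -D $PGDATA restart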

--
fdr


Re: Hot Backup with rsync fails at pg_clog if under load

From
Florian Pflug
Date:
On Sep23, 2011, at 21:10 , Robert Haas wrote:
> So the actual error message in the last test was:
>
> 2011-09-21 13:41:05 CEST FATAL:  could not access status of transaction 1188673
>
> ...but we can't tell if that was before or after nextXid, which seems
> like it would be useful to know.
>
> If Linas can rerun his experiment, but also capture the output of
> pg_controldata before firing up the standby for the first time, then
> we'd able to see that information.

Hm, wouldn't pg_controldata quite certainly show a nextXid beyond the clog
if copied after pg_clog/*?

Linas, could you capture the output of pg_controldata *and* increase the
log level to DEBUG1 on the standby? We should then see nextXid value of
the checkpoint the recovery is starting from.

FWIW, I've had a few more theories about what's going on, but none survived
after looking at the code. My first guess was that there might be circumstances
under which the nextXid from the control file, instead of the one from the
pre-backup checkpoint, ends up becoming the standby's nextXid. But there doesn't
seem to be a way for that to happen.

My next theory was that something increments nextXid before StartupCLOG().
The only possible candidate seems to be PrescanPreparedTransactions(), which
does increment nextXid if it's smaller than some sub-xid of a prepared xact.
But we only call that before StartupCLOG() if we're starting from a
shutdown checkpoint, which shouldn't be the case for the OP.

I also checked what rsync does when a file vanishes after rsync computed the
file list, but before it is sent. rsync 3.0.7 on OSX, at least, complains
loudly, and doesn't sync the file. It BTW also exits non-zero, with a special
exit code for precisely that failure case.

best regards,
Florian Pflug



Re: Hot Backup with rsync fails at pg_clog if under load

From
Linas Virbalas
Date:
> Linas, could you capture the output of pg_controldata *and* increase the
> log level to DEBUG1 on the standby? We should then see nextXid value of
> the checkpoint the recovery is starting from.

I'll try to do that whenever I'm in that territory again... Incidentally,
recently there was a lot of unrelated-to-this-post work to polish things up
for a talk being given at PGWest 2011 Today :)

> I also checked what rsync does when a file vanishes after rsync computed the
> file list, but before it is sent. rsync 3.0.7 on OSX, at least, complains
> loudly, and doesn't sync the file. It BTW also exits non-zero, with a special
> exit code for precisely that failure case.

To be precise, my script has logic to accept the exit code 24, just as
stated in PG manual:

Docs> For example, some versions of rsync return a separate exit code for
Docs> "vanished source files", and you can write a driver script to accept
Docs> this exit code as a non-error case.
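
A minimal driver sketch along those lines (host and paths hypothetical):

rsync -azv --exclude=pg_xlog master:/opt/PostgreSQL/9.1/data/ /opt/PostgreSQL/9.1/data/
rc=$?
# exit code 24 = "some source files vanished during transfer" - accept it
if [ $rc -ne 0 ] && [ $rc -ne 24 ]; then
    echo "rsync failed with exit code $rc" >&2
    exit $rc
fi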

--
Sincerely,
Linas Virbalas
http://flyingclusters.blogspot.com/



Re: Hot Backup with rsync fails at pg_clog if under load

From
Florian Pflug
Date:
On Sep29, 2011, at 17:44 , Linas Virbalas wrote:
>> I also checked what rsync does when a file vanishes after rsync computed the
>> file list, but before it is sent. rsync 3.0.7 on OSX, at least, complains
>> loudly, and doesn't sync the file. It BTW also exits non-zero, with a special
>> exit code for precisely that failure case.
>
> To be precise, my script has logic to accept the exit code 24, just as
> stated in PG manual:
>
> Docs> For example, some versions of rsync return a separate exit code for
> Docs> "vanished source files", and you can write a driver script to accept
> Docs> this exit code as a non-error case.

Oh, cool. I was about to suggest that we add something along these lines
to the docs - guess I should RTFM from time to time ;-)

best regards,
Florian Pflug


>
> --
> Sincerely,
> Linas Virbalas
> http://flyingclusters.blogspot.com/



Re: Hot Backup with rsync fails at pg_clog if under load

From
Chris Redekop
Date:
> > Linas, could you capture the output of pg_controldata *and* increase the
> > log level to DEBUG1 on the standby? We should then see nextXid value of
> > the checkpoint the recovery is starting from.
> I'll try to do that whenever I'm in that territory again... Incidentally,
> recently there was a lot of unrelated-to-this-post work to polish things up
> for a talk being given at PGWest 2011 Today :)
> > I also checked what rsync does when a file vanishes after rsync computed the
> > file list, but before it is sent. rsync 3.0.7 on OSX, at least, complains
> > loudly, and doesn't sync the file. It BTW also exits non-zero, with a special
> > exit code for precisely that failure case.
> To be precise, my script has logic to accept the exit code 24, just as
> stated in PG manual:
> Docs> For example, some versions of rsync return a separate exit code for
> Docs> "vanished source files", and you can write a driver script to accept
> Docs> this exit code as a non-error case.

I am also running into this issue and can reproduce it very reliably.  For me, however, it happens even when doing the "fast backup", i.e. pg_start_backup('whatever', true).  My traffic is more write-heavy than Linas's, though, so that might have something to do with it.  Yesterday it reliably errored out on pg_clog every time, but today it is failing sporadically on pg_subtrans (which seems to be past where the pg_clog error was).  The only thing that has changed is that I've changed the log level to debug1, and I wouldn't think that could be related.  I've linked the requested pg_controldata and debug1 logs for both errors.  Both links contain the output from pg_start_backup, rsync, pg_stop_backup, pg_controldata, and then the postgres debug1 log produced from a subsequent startup attempt.


Any workarounds would be very appreciated. Would copying clog+subtrans before or after the rest of the data directory (or something like that) make any difference?

Thanks!

Re: Hot Backup with rsync fails at pg_clog if under load

From
Chris Redekop
Date:
I can confirm that both the pg_clog and pg_subtrans errors do occur when using pg_basebackup instead of rsync.  The data itself seems to be fine because using the exact same data I can start up a warm standby no problem, it is just the hot standby that will not start up.


On Sat, Oct 15, 2011 at 7:33 PM, Chris Redekop <chris@replicon.com> wrote:
> > Linas, could you capture the output of pg_controldata *and* increase the
> > log level to DEBUG1 on the standby? We should then see nextXid value of
> > the checkpoint the recovery is starting from.
> I'll try to do that whenever I'm in that territory again... Incidentally,
> recently there was a lot of unrelated-to-this-post work to polish things up
> for a talk being given at PGWest 2011 Today :)
> > I also checked what rsync does when a file vanishes after rsync computed the
> > file list, but before it is sent. rsync 3.0.7 on OSX, at least, complains
> > loudly, and doesn't sync the file. It BTW also exits non-zero, with a special
> > exit code for precisely that failure case.
> To be precise, my script has logic to accept the exit code 24, just as
> stated in PG manual:
> Docs> For example, some versions of rsync return a separate exit code for
> Docs> "vanished source files", and you can write a driver script to accept
> Docs> this exit code as a non-error case.

I also am running into this issue and can reproduce it very reliably.  For me, however, it happens even when doing the "fast backup" like so: pg_start_backup('whatever', true)...my traffic is more write-heavy than linas's tho, so that might have something to do with it.  Yesterday it reliably errored out on pg_clog every time, but today it is failing sporadically on pg_subtrans (which seems to be past where the pg_clog error was)....the only thing that has changed is that I've changed the log level to debug1....I wouldn't think that could be related though.  I've linked the requested pg_controldata and debug1 logs for both errors.  Both links contain the output from pg_start_backup, rsync, pg_stop_backup, pg_controldata, and then the postgres debug1 log produced from a subsequent startup attempt.


Any workarounds would be very appreciated.....would copying clog+subtrans before or after the rest of the data directory (or something like that) make any difference?

Thanks!

Re: Hot Backup with rsync fails at pg_clog if under load

From
Chris Redekop
Date:
Well, on the other hand maybe there is something wrong with the data.  Here's the test/steps I just did - 
1. I do the pg_basebackup when the master is under load, hot slave now will not start up but warm slave will.  
2. I start a warm slave and let it catch up to current
3. On the slave I change 'hot_standby=on' and do a 'service postgresql restart'
4. The postgres fails to restart with the same error.  
5. I turn hot_standby back off and postgres starts back up fine as a warm slave
6. I then turn off the load, the slave is all caught up, master and slave are both sitting idle
7. I, again, change 'hot_standby=on' and do a service restart
8. Again it fails, with the same error, even though there is no longer any load.
9. I repeat this warmstart/hotstart cycle a couple more times until to my surprise, instead of failing, it successfully starts up as a hot standby (this is after maybe 5 minutes or so of sitting idle)

So... given that it continued to fail even after the load had been turned off, that makes me believe that the data which was copied over was invalid in some way.  And when a checkpoint/log rotation/something else occurred when not under load, it cleared itself up. I'm shooting in the dark here.

Anyone have any suggestions/ideas/things to try?


On Mon, Oct 17, 2011 at 2:13 PM, Chris Redekop <chris@replicon.com> wrote:
I can confirm that both the pg_clog and pg_subtrans errors do occur when using pg_basebackup instead of rsync.  The data itself seems to be fine because using the exact same data I can start up a warm standby no problem, it is just the hot standby that will not start up.


On Sat, Oct 15, 2011 at 7:33 PM, Chris Redekop <chris@replicon.com> wrote:
> > Linas, could you capture the output of pg_controldata *and* increase the
> > log level to DEBUG1 on the standby? We should then see nextXid value of
> > the checkpoint the recovery is starting from.
> I'll try to do that whenever I'm in that territory again... Incidentally,
> recently there was a lot of unrelated-to-this-post work to polish things up
> for a talk being given at PGWest 2011 Today :)
> > I also checked what rsync does when a file vanishes after rsync computed the
> > file list, but before it is sent. rsync 3.0.7 on OSX, at least, complains
> > loudly, and doesn't sync the file. It BTW also exits non-zero, with a special
> > exit code for precisely that failure case.
> To be precise, my script has logic to accept the exit code 24, just as
> stated in PG manual:
> Docs> For example, some versions of rsync return a separate exit code for
> Docs> "vanished source files", and you can write a driver script to accept
> Docs> this exit code as a non-error case.

I also am running into this issue and can reproduce it very reliably.  For me, however, it happens even when doing the "fast backup" like so: pg_start_backup('whatever', true)...my traffic is more write-heavy than linas's tho, so that might have something to do with it.  Yesterday it reliably errored out on pg_clog every time, but today it is failing sporadically on pg_subtrans (which seems to be past where the pg_clog error was)....the only thing that has changed is that I've changed the log level to debug1....I wouldn't think that could be related though.  I've linked the requested pg_controldata and debug1 logs for both errors.  Both links contain the output from pg_start_backup, rsync, pg_stop_backup, pg_controldata, and then the postgres debug1 log produced from a subsequent startup attempt.


Any workarounds would be very appreciated.....would copying clog+subtrans before or after the rest of the data directory (or something like that) make any difference?

Thanks!


Re: Hot Backup with rsync fails at pg_clog if under load

From
Daniel Farina
Date:
On Mon, Oct 17, 2011 at 11:30 PM, Chris Redekop <chris@replicon.com> wrote:
> Well, on the other hand maybe there is something wrong with the data.
>  Here's the test/steps I just did -
> 1. I do the pg_basebackup when the master is under load, hot slave now will
> not start up but warm slave will.
> 2. I start a warm slave and let it catch up to current
> 3. On the slave I change 'hot_standby=on' and do a 'service postgresql
> restart'
> 4. The postgres fails to restart with the same error.
> 5. I turn hot_standby back off and postgres starts back up fine as a warm
> slave
> 6. I then turn off the load, the slave is all caught up, master and slave
> are both sitting idle
> 7. I, again, change 'hot_standby=on' and do a service restart
> 8. Again it fails, with the same error, even though there is no longer any
> load.
> 9. I repeat this warmstart/hotstart cycle a couple more times until to my
> surprise, instead of failing, it successfully starts up as a hot standby
> (this is after maybe 5 minutes or so of sitting idle)
> So...given that it continued to fail even after the load had been turned off,
> that makes me believe that the data which was copied over was invalid in
> some way.  And when a checkpoint/logrotation/somethingelse occurred when not
> under load it cleared itself up....I'm shooting in the dark here
> Anyone have any suggestions/ideas/things to try?

Having dug into this a little -- but not too much -- the problem
seems to be that postgres is reading the commit logs way, way too
early, that is to say, before it has played enough WAL to be
'consistent' (the WAL between pg_start and pg_stop backup).  I have
not been able to reproduce this problem (I think) after the message
from postgres suggesting it has reached a consistent state; at that
time I am able to go into hot-standby mode.

The message is like: "consistent recovery state reached at %X/%X".
(this is the errmsg)

It doesn't seem meaningful for StartupCLOG (or, indeed, any of the
hot-standby path functionality) to be called before that code is
executed, but it is anyway right now.  I'm not sure if this is
simply an oversight, or indicative of a misplaced assumption
somewhere.  Basically, my thoughts for a fix are to suppress
hot_standby = on (in spirit) before the consistent recovery state is
reached.

--
fdr


Re: Hot Backup with rsync fails at pg_clog if under load

От
Simon Riggs
Дата:
On Sun, Oct 23, 2011 at 9:48 PM, Daniel Farina <daniel@heroku.com> wrote:
> On Mon, Oct 17, 2011 at 11:30 PM, Chris Redekop <chris@replicon.com> wrote:
>> Well, on the other hand maybe there is something wrong with the data.
>>  Here's the test/steps I just did -
>> 1. I do the pg_basebackup when the master is under load, hot slave now will
>> not start up but warm slave will.
>> 2. I start a warm slave and let it catch up to current
>> 3. On the slave I change 'hot_standby=on' and do a 'service postgresql
>> restart'
>> 4. The postgres fails to restart with the same error.
>> 5. I turn hot_standby back off and postgres starts back up fine as a warm
>> slave
>> 6. I then turn off the load, the slave is all caught up, master and slave
>> are both sitting idle
>> 7. I, again, change 'hot_standby=on' and do a service restart
>> 8. Again it fails, with the same error, even though there is no longer any
>> load.
>> 9. I repeat this warmstart/hotstart cycle a couple more times until to my
>> surprise, instead of failing, it successfully starts up as a hot standby
>> (this is after maybe 5 minutes or so of sitting idle)
>> So...given that it continued to fail even after the load had been turned off,
>> that makes me believe that the data which was copied over was invalid in
>> some way.  And when a checkpoint/logrotation/somethingelse occurred when not
>> under load it cleared itself up....I'm shooting in the dark here
>> Anyone have any suggestions/ideas/things to try?
>
> Having dug into this a little -- but not too much -- the problem
> seems to be that postgres is reading the commit logs way, way too
> early, that is to say, before it has played enough WAL to be
> 'consistent' (the WAL between pg_start and pg_stop backup).  I have
> not been able to reproduce this problem (I think) after the message
> from postgres suggesting it has reached a consistent state; at that
> time I am able to go into hot-standby mode.
>
> The message is like: "consistent recovery state reached at %X/%X".
> (this is the errmsg)
>
> It doesn't seem meaningful for StartupCLOG (or, indeed, any of the
> hot-standby path functionality) to be called before that code is
> executed, but it is anyway right now.  I'm not sure if this oversight
> is simply an oversight, or indicative of a misplaced assumption
> somewhere.  Basically, my thoughts for a fix are to suppress
> hot_standby = on (in spirit) before the consistent recovery state is
> reached.

Not sure about that, but I'll look at where this comes from.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Hot Backup with rsync fails at pg_clog if under load

От
Simon Riggs
Дата:
On Sun, Oct 16, 2011 at 2:33 AM, Chris Redekop <chris@replicon.com> wrote:

> pg_subtrans: http://pastebin.com/qAXEHAQt

I confirm this as a HS issue and will investigate from here.

FATAL:  could not access status of transaction 21110784
which, in pg_subtrans, is the first xid on a new subtrans page. So we
have missed zeroing a page.

pg_control shows ... Latest checkpoint's oldestActiveXID:  21116666
which shows quite clearly that the pg_control file is later than it should be.

Chris, can you rearrange the backup so you copy the pg_control file as
the first act after the pg_start_backup?

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Hot Backup with rsync fails at pg_clog if under load

От
Simon Riggs
Дата:
On Sun, Oct 23, 2011 at 9:48 PM, Daniel Farina <daniel@heroku.com> wrote:

> Having dug into this a little -- but not too much -- the problem
> seems to be that postgres is reading the commit logs way, way too
> early, that is to say, before it has played enough WAL to be
> 'consistent' (the WAL between pg_start and pg_stop backup).  I have
> not been able to reproduce this problem (I think) after the message
> from postgres suggesting it has reached a consistent state; at that
> time I am able to go into hot-standby mode.

The WAL appears too early because the other control info is later than
it should be.

So this is approx backwards and nothing related to consistent state,
but thanks for drawing my attention to this.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Hot Backup with rsync fails at pg_clog if under load

От
Florian Pflug
Дата:
On Oct24, 2011, at 01:27 , Simon Riggs wrote:
> FATAL:  could not access status of transaction 21110784
> which, in pg_subtrans, is the first xid on a new subtrans page. So we
> have missed zeroing a page.
>
> pg_control shows ... Latest checkpoint's oldestActiveXID:  21116666
> which shows quite clearly that the pg_control file is later than it should be.

But shouldn't pg_control be largely irrelevant in a hot backup scenario? Most
(all?) of the information contained therein should be overwritten with the
contents of the checkpoint referenced by the backup label, shouldn't it?

best regards,
Florian Pflug



Re: Hot Backup with rsync fails at pg_clog if under load

От
Florian Pflug
Дата:
On Oct23, 2011, at 22:48 , Daniel Farina wrote:
> It doesn't seem meaningful for StartupCLOG (or, indeed, any of the
> hot-standby path functionality) to be called before that code is
> executed, but it is anyway right now.

I think the idea is to check that the CLOG part which recovery *won't*
overwrite is consistent (or rather, given the simplicity of the check,
at least accessible)

Heikki said the following somewhere else in this thread when I suggested
something similar to your proposal:

>> There are pretty clear rules on what state clog can be in. When you launch postmaster in a standby:
>>
>> * Any clog preceding the nextXid from the checkpoint record we start recovery from must either be valid, or the
>> clog file must be missing altogether (which can happen when it was vacuumed away while the backup was in progress -
>> if the clog is still needed at the end of the backup it must not be missing, of course).
>> * Any clog following nextXid can be garbled or missing.
>>
>> Recovery will overwrite any clog after nextXid from the WAL, but not the clog before it.

I think Simon's theory that we're starting recovery from the wrong place,
i.e. should start with an earlier WAL location, is probably correct. The
question is, why?

best regards,
Florian Pflug



Re: Hot Backup with rsync fails at pg_clog if under load

От
Simon Riggs
Дата:
On Mon, Oct 24, 2011 at 7:13 AM, Florian Pflug <fgp@phlo.org> wrote:

> I think Simon's theory that we're starting recovery from the wrong place,
> i.e. should start with an earlier WAL location, is probably correct. The
> question is, why?

Err, that's not what I said and I don't mean that. Having said that,
what I said about pg_control being invalid would imply that, so that
was wrong also.

We are starting recovery at the right place but we are initialising
the clog and subtrans incorrectly. Precisely, the oldestActiveXid is
being derived later than it should be, which can cause problems if
this then means that whole pages are uninitialised in subtrans. The bug
only shows up if you do enough transactions (2048 is always enough) to
move to the next subtrans page between the redo pointer and the
checkpoint record while at the same time we do not have a long running
transaction that spans those two points. That's just enough to happen
reasonably frequently on busy systems and yet just enough to have
slipped through testing.

We must either

1. During CreateCheckpoint() we should derive oldestActiveXid before
we derive the redo location

2. Change the way subtrans pages are initialized during recovery so we
don't rely on oldestActiveXid

I need to think some more before a decision on this in my own mind,
but I lean towards doing (1) as a longer term fix and doing (2) as a
short term fix for existing releases. I expect to have a fix later
today.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
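
As a point of reference, the "2048 is always enough" figure follows from pg_subtrans keeping one 4-byte parent XID per transaction on 8 kB pages, so a single page covers exactly 2048 XIDs. A minimal standalone sketch of that arithmetic (illustrative only, not PostgreSQL source; it assumes the default block size):

#include <stdint.h>
#include <stdio.h>

#define BLCKSZ 8192                     /* default PostgreSQL block size */
typedef uint32_t TransactionId;         /* 4 bytes, as in the real headers */

int main(void)
{
    /* pg_subtrans stores one parent XID per transaction, so one 8 kB page
     * covers 2048 XIDs; doing 2048 transactions between the redo pointer
     * and the checkpoint record always moves nextXid onto the next page. */
    printf("XIDs per pg_subtrans page: %zu\n", BLCKSZ / sizeof(TransactionId));
    return 0;
}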


Re: Hot Backup with rsync fails at pg_clog if under load

От
Simon Riggs
Дата:
On Tue, Oct 25, 2011 at 8:03 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

> We are starting recovery at the right place but we are initialising
> the clog and subtrans incorrectly. Precisely, the oldestActiveXid is
> being derived later than it should be, which can cause problems if
> this then means that whole pages are uninitialised in subtrans. The bug
> only shows up if you do enough transactions (2048 is always enough) to
> move to the next subtrans page between the redo pointer and the
> checkpoint record while at the same time we do not have a long running
> transaction that spans those two points. That's just enough to happen
> reasonably frequently on busy systems and yet just enough to have
> slipped through testing.
>
> We must either
>
> 1. During CreateCheckpoint() we should derive oldestActiveXid before
> we derive the redo location
>
> 2. Change the way subtrans pages are initialized during recovery so we
> don't rely on oldestActiveXid
>
> I need to think some more before a decision on this in my own mind,
> but I lean towards doing (1) as a longer term fix and doing (2) as a
> short term fix for existing releases. I expect to have a fix later
> today.

(1) looks the best way forwards in all cases.

Patch attached. Will be backpatched to 9.0

I think it is possible to avoid taking XidGenLock during
GetRunningTransactions() now, but I haven't included that change in
this patch.

Any other comments before commit?

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: Hot Backup with rsync fails at pg_clog if under load

От
Florian Pflug
Дата:
On Oct25, 2011, at 11:13 , Simon Riggs wrote:
> On Tue, Oct 25, 2011 at 8:03 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> We are starting recovery at the right place but we are initialising
>> the clog and subtrans incorrectly. Precisely, the oldestActiveXid is
>> being derived later than it should be, which can cause problems if
>> this then means that whole pages are uninitialised in subtrans. The bug
>> only shows up if you do enough transactions (2048 is always enough) to
>> move to the next subtrans page between the redo pointer and the
>> checkpoint record while at the same time we do not have a long running
>> transaction that spans those two points. That's just enough to happen
>> reasonably frequently on busy systems and yet just enough to have
>> slipped through testing.
>>
>> We must either
>>
>> 1. During CreateCheckpoint() we should derive oldestActiveXid before
>> we derive the redo location

> (1) looks the best way forwards in all cases.

Let me see if I understand this

The problem seems to be that we currently derive oldestActiveXid at the end of
the checkpoint, just before writing the checkpoint record. Since we use
oldestActiveXid to initialize SUBTRANS, this is wrong. Records written before
that checkpoint record (but after the REDO location, of course) may very well
contain XIDs earlier than that wrongly derived oldestActiveXID, and if we attempt
to touch these XIDs' SUBTRANS state, we error out.

Your patch seems sensible, because the checkpoint "logically" occurs at the
REDO location not the checkpoint's location, so we ought to log an oldestActiveXID
corresponding to that location.

What I don't understand is how this affects the CLOG. How does oldestActiveXID
factor into CLOG initialization?

best regards,
Florian Pflug



Re: Hot Backup with rsync fails at pg_clog if under load

От
Simon Riggs
Дата:
On Tue, Oct 25, 2011 at 12:39 PM, Florian Pflug <fgp@phlo.org> wrote:

> What I don't understand is how this affects the CLOG. How does oldestActiveXID
> factor into CLOG initialization?

It is an entirely different error.

Chris' clog error was caused by a file read error. The file was
opened, we did a seek within the file and then the call to read()
failed to return a complete page from the file.

The xid shown is 22811359, which is the nextxid in the control file.

So StartupClog() must have failed trying to read the clog page from disk.

That isn't a Hot Standby problem or a recovery problem, nor is it certain
it's a PostgreSQL problem.

OTOH SlruPhysicalReadPage() does cope gracefully with missing clog
files during recovery, so maybe we can think of a way to make recovery
cope with a SLRU_READ_FAILED error gracefully also. Any ideas?

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Hot Backup with rsync fails at pg_clog if under load

От
Chris Redekop
Дата:
> Chris, can you rearrange the backup so you copy the pg_control file as
> the first act after the pg_start_backup?

I tried this and it doesn't seem to make any difference.  I also tried the patch and I can no longer reproduce the subtrans error, however instead it now starts up, but never gets to the point where it'll accept connections.  It starts up but if I try to do anything I always get "FATAL:  the database system is starting up"...even if the load is removed from the primary, the standby still never finishes "starting up".  Attached below is a log of one of these startup attempts.  In my testing with the patch applied approx 3 in 10 attempts start up successfully, 7 in 10 attempts go into the "db is starting up" state....the pg_clog error is still there, but seems much harder to reproduce now....I've seen it only once since applying the patch (out of probably 50 or 60 under-load startup attempts).  It does seem to be "moody" like that tho....it will be very difficult to reproduce for a while, and then it will happen damn-near every time for a while...weirdness

On a bit of a side note, I've been thinking of changing my scripts so that they perform an initial rsync prior to doing the startbackup-rsync-stopbackup just so that the second rsync will be faster....so that the backup is in progress for a shorter period of time, as while it is running it will stop other standbys from starting up....this shouldn't cause any issues eh?


2011-10-25 13:43:24.035 MDT [15072]: [1-1] LOG:  database system was interrupted; last known up at 2011-10-25 13:43:11 MDT
2011-10-25 13:43:24.035 MDT [15072]: [2-1] LOG:  creating missing WAL directory "pg_xlog/archive_status"
2011-10-25 13:43:24.037 MDT [15072]: [3-1] LOG:  entering standby mode
DEBUG:  received replication command: IDENTIFY_SYSTEM
DEBUG:  received replication command: START_REPLICATION 2/CF000000
2011-10-25 13:43:24.041 MDT [15073]: [1-1] LOG:  streaming replication successfully connected to primary
2011-10-25 13:43:24.177 MDT [15092]: [1-1] FATAL:  the database system is starting up
2011-10-25 13:43:24.781 MDT [15072]: [4-1] DEBUG:  checkpoint record is at 2/CF81A478
2011-10-25 13:43:24.781 MDT [15072]: [5-1] DEBUG:  redo record is at 2/CF000020; shutdown FALSE
2011-10-25 13:43:24.781 MDT [15072]: [6-1] DEBUG:  next transaction ID: 0/4634700; next OID: 1188228
2011-10-25 13:43:24.781 MDT [15072]: [7-1] DEBUG:  next MultiXactId: 839; next MultiXactOffset: 1686
2011-10-25 13:43:24.781 MDT [15072]: [8-1] DEBUG:  oldest unfrozen transaction ID: 1669, in database 1
2011-10-25 13:43:24.781 MDT [15072]: [9-1] DEBUG:  transaction ID wrap limit is 2147485316, limited by database with OID 1
2011-10-25 13:43:24.783 MDT [15072]: [10-1] DEBUG:  resetting unlogged relations: cleanup 1 init 0
2011-10-25 13:43:24.791 MDT [15072]: [11-1] DEBUG:  initializing for hot standby
2011-10-25 13:43:24.791 MDT [15072]: [12-1] LOG:  consistent recovery state reached at 2/CF81A4D0
2011-10-25 13:43:24.791 MDT [15072]: [13-1] LOG:  redo starts at 2/CF000020
2011-10-25 13:43:25.019 MDT [15072]: [14-1] LOG:  consistent state delayed because recovery snapshot incomplete
2011-10-25 13:43:25.019 MDT [15072]: [15-1] CONTEXT:  xlog redo  running xacts:
nextXid 4634700 latestCompletedXid 4634698 oldestRunningXid 4634336; 130 xacts:
4634336 4634337 4634338 4634339 4634340 4634341 4634342 4634343 4634344 4634345
4634346 4634347 4634348 4634349 4634350 4634351 4634352 4634353 4634354 4634355
4634356 4634357 4634358 4634359 4634360 4634361 4634362 4634363 4634364 4634365
4634366 4634367 4634368 4634369 4634370 4634371 4634515 4634516 4634517 4634518
4634519 4634520 4634521 4634522 4634523 4634524 4634525 4634526 4634527 4634528
4634529 4634530 4634531 4634532 4634533 4634534 4634535 4634536 4634537 4634538
4634539 4634540 4634541 4634542 4634543 4634385 4634386 4634387 4634388 4634389
4634390 4634391 4634392 4634393 4634394 4634395 4634396 4634397 4634398 4634399
4634400 4634401 4634402 4634403 4634404 4634405 4634406 4634407 4634408 4634409
4634410 4634411 4634412 4634413 4634414 4634415 4634416 4634417 4634418 4634419
4634420 4634579 4634580 4634581 4634582 4634583 4634584 4634585 4634586 4634587
4634588 4634589 4634590 4634591 4634592 4634593 4634594 4634595 4634596 4634597
4634598 4634599 4634600 4634601 4634602 4634603 4634604 4634605 4634606 4634607;
 subxid ovf
2011-10-25 13:43:25.240 MDT [15130]: [1-1] FATAL:  the database system is starting up
DEBUG:  standby "sync_rep_test" has now caught up with primary
2011-10-25 13:43:26.304 MDT [15167]: [1-1] FATAL:  the database system is starting up
2011-10-25 13:43:27.366 MDT [15204]: [1-1] FATAL:  the database system is starting up
2011-10-25 13:43:28.426 MDT [15241]: [1-1] FATAL:  the database system is starting up
2011-10-25 13:43:29.461 MDT [15275]: [1-1] FATAL:  the database system is starting up
and so on...


On Tue, Oct 25, 2011 at 6:51 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On Tue, Oct 25, 2011 at 12:39 PM, Florian Pflug <fgp@phlo.org> wrote:

> What I don't understand is how this affects the CLOG. How does oldestActiveXID
> factor into CLOG initialization?

It is an entirely different error.

Chris' clog error was caused by a file read error. The file was
opened, we did a seek within the file and then the call to read()
failed to return a complete page from the file.

The xid shown is 22811359, which is the nextxid in the control file.

So StartupClog() must have failed trying to read the clog page from disk.

That isn't a Hot Standby problem, a recovery problem nor is it certain
its a PostgreSQL problem.

OTOH SlruPhysicalReadPage() does cope gracefully with missing clog
files during recovery, so maybe we can think of a way to make recovery
cope with a SLRU_READ_FAILED error gracefully also. Any ideas?

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Hot Backup with rsync fails at pg_clog if under load

От
Chris Redekop
Дата:
That isn't a Hot Standby problem, a recovery problem nor is it certain
its a PostgreSQL problem.
Do you have any theories on this that I could help investigate?  It happens even when using pg_basebackup and it persists until another sync is performed, so the files must be in some state that it can't recover from....without understanding the internals, just viewing from an outside perspective, I don't really see how this could not be a PostgreSQL problem....

Re: Hot Backup with rsync fails at pg_clog if under load

От
Simon Riggs
Дата:
On Tue, Oct 25, 2011 at 10:06 PM, Chris Redekop <chris@replicon.com> wrote:
>> Chris, can you rearrange the backup so you copy the pg_control file as
>> the first act after the pg_start_backup?
> I tried this and it doesn't seem to make any difference.

It won't, that was a poor initial diagnosis on my part.

> I also tried the
> patch and I can no longer reproduce the subtrans error

Good

>, however instead it
> now it starts up, but never gets to the point where it'll accept
> connections.  It starts up but if I try to do anything I always get "FATAL:
>  the database system is starting up"...even if the load is removed from the
> primary, the standby still never finishes "starting up".
...
> 2011-10-25 13:43:25.019 MDT [15072]: [14-1] LOG:  consistent state delayed
> because recovery snapshot incomplete
...

This is a different problem and has already been reported by one of
your colleagues in a separate thread, and answered in detail by me
there. There is no bug related to this error message.

From here, it looks like the published patch fixes the originally reported problem.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Hot Backup with rsync fails at pg_clog if under load

От
Florian Pflug
Дата:
On Oct25, 2011, at 14:51 , Simon Riggs wrote:
> On Tue, Oct 25, 2011 at 12:39 PM, Florian Pflug <fgp@phlo.org> wrote:
>
>> What I don't understand is how this affects the CLOG. How does oldestActiveXID
>> factor into CLOG initialization?
>
> It is an entirely different error.

Ah, OK. I assumed that you believed the wrong oldestActiveXID computation
solved both the SUBTRANS-related *and* the CLOG-related errors, since you
said "We are starting recovery at the right place but we are initialising
the clog and subtrans incorrectly" at the start of the mail.

> Chris' clog error was caused by a file read error. The file was
> opened, we did a seek within the file and then the call to read()
> failed to return a complete page from the file.
>
> The xid shown is 22811359, which is the nextxid in the control file.
>
> So StartupClog() must have failed trying to read the clog page from disk.

Yep.

> That isn't a Hot Standby problem, a recovery problem nor is it certain
> its a PostgreSQL problem.

It's very likely that it's a PostgreSQL problem, though. It's probably
not a pilot error since it happens even for backups taken with pg_basebackup(),
so the only explanation other than a PostgreSQL bug is broken hardware or
a pretty serious kernel/filesystem bug.

> OTOH SlruPhysicalReadPage() does cope gracefully with missing clog
> files during recovery, so maybe we can think of a way to make recovery
> cope with a SLRU_READ_FAILED error gracefully also. Any ideas?

As long as we don't understand how the CLOG-related errors happen in
the first place, I think it's a bad idea to silence them.

best regards,
Florian Pflug



Re: Hot Backup with rsync fails at pg_clog if under load

От
Florian Pflug
Дата:
On Oct25, 2011, at 13:39 , Florian Pflug wrote:
> On Oct25, 2011, at 11:13 , Simon Riggs wrote:
>> On Tue, Oct 25, 2011 at 8:03 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> We are starting recovery at the right place but we are initialising
>>> the clog and subtrans incorrectly. Precisely, the oldestActiveXid is
>>> being derived later than it should be, which can cause problems if
>>> this then means that whole pages are uninitialised in subtrans. The bug
>>> only shows up if you do enough transactions (2048 is always enough) to
>>> move to the next subtrans page between the redo pointer and the
>>> checkpoint record while at the same time we do not have a long running
>>> transaction that spans those two points. That's just enough to happen
>>> reasonably frequently on busy systems and yet just enough to have
>>> slipped through testing.
>>>
>>> We must either
>>>
>>> 1. During CreateCheckpoint() we should derive oldestActiveXid before
>>> we derive the redo location
>
>> (1) looks the best way forwards in all cases.
>
> Let me see if I understand this
>
> The problem seems to be that we currently derive oldestActiveXid at the end of
> the checkpoint, just before writing the checkpoint record. Since we use
> oldestActiveXid to initialize SUBTRANS, this is wrong. Records written before
> that checkpoint record (but after the REDO location, of course) may very well
> contain XIDs earlier than that wrongly derived oldestActiveXID, and if we attempt
> to touch these XIDs' SUBTRANS state, we error out.
>
> Your patch seems sensible, because the checkpoint "logically" occurs at the
> REDO location not the checkpoint's location, so we ought to log an oldestActiveXID
> corresponding to that location.

Thinking about this some more (and tracing through the code), I realized that
things are a bit more complicated.

What we actually need to ensure, I think, is that the XID we pass to StartupSUBTRANS()
is earlier than any top-level XID in XLOG_XACT_ASSIGNMENT records. Which, at first
glance, implies that we ought to use the nextId at the *beginning* of the checkpoint
for SUBTRANS initialization. At second glance, however, that'd be wrong, because
backends emit XLOG_XACT_ASSIGNMENT only every PGPROC_MAX_CACHED_SUBXIDS sub-xid
assignments. Thus, an XLOG_XACT_ASSIGNMENT written *after* the checkpoint has started
may contain sub-XIDs which were assigned *before* the checkpoint has started.

Using oldestActiveXID works around that because we guarantee that sub-XIDs are always
larger than their parent XIDs and because only active transactions can produce
XLOG_XACT_ASSIGNMENT records.

So your patch is fine, but I think the reasoning about why oldestActiveXID is
the correct value for StartupSUBTRANS deserves an explanation somewhere.

best regards,
Florian Pflug



Re: Hot Backup with rsync fails at pg_clog if under load

От
Simon Riggs
Дата:
On Wed, Oct 26, 2011 at 12:16 PM, Florian Pflug <fgp@phlo.org> wrote:

>> Chris' clog error was caused by a file read error. The file was
>> opened, we did a seek within the file and then the call to read()
>> failed to return a complete page from the file.
>>
>> The xid shown is 22811359, which is the nextxid in the control file.
>>
>> So StartupClog() must have failed trying to read the clog page from disk.
>
> Yep.
>
>> That isn't a Hot Standby problem, a recovery problem nor is it certain
>> its a PostgreSQL problem.
>
> It's very likely that it's a PostgreSQL problem, though. It's probably
> not a pilot error since it happens even for backups taken with pg_basebackup(),
> so the only explanation other than a PostgreSQL bug is broken hardware or
> a pretty serious kernel/filesystem bug.

The way forwards here is for someone to show the clog file that causes
the error and find out why the call to read() fails.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Hot Backup with rsync fails at pg_clog if under load

От
Aidan Van Dyk
Дата:
On Wed, Oct 26, 2011 at 7:43 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

>> It's very likely that it's a PostgreSQL problem, though. It's probably
>> not a pilot error since it happens even for backups taken with pg_basebackup(),
>> so the only explanation other than a PostgreSQL bug is broken hardware or
>> a pretty serious kernel/filesystem bug.
>
> The way forwards here is for someone to show the clog file that causes
> the error and find out why the call to read() fails.

Sorry, I thought the problem was obvious.  Either that, or I've
completely missed something in these threads...  I'll admit to not
following this one very closely anymore...

When the backup started, the clog was small.  So on the "recovering
instance", the clog is small.  PostgreSQL is supposed to be able to
deal with any file as it was when the backup starts.

When the backup is stopped, clog is big.  But that file was copied
after the backup was started, not after the backup finished.  So its
size is only guaranteed to be as big as it was when the backup
started.  Recovery is responsible for extending it as it was extended
during the backup period on the master.

The read fails because there is no data at the location it's trying to
read from, because clog hasn't been extended yet by recovery.

a.
--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.


Re: Hot Backup with rsync fails at pg_clog if under load

От
Simon Riggs
Дата:
On Wed, Oct 26, 2011 at 12:54 PM, Aidan Van Dyk <aidan@highrise.ca> wrote:

> The read fails because there is no data at the location it's trying to
> read from, because clog hasn't been extended yet by recovery.

You don't actually know that, though I agree it seems a reasonable
guess and was my first thought also.

The error is very specifically referring to 22811359, which is the
nextxid from pg_control and updated by checkpoint.

22811359 is mid-way through a clog page, so prior xids will already
have been allocated, pages extended and then those pages fsyncd before
the end of pg_start_backup().  So it shouldn't be possible for that
page to be absent from the base backup, unless the base backup was
taken without a preceding checkpoint, which, it seems, is not the case from
the script output.

Note that if you are correct, then the solution is to extend clog,
which Florian disagrees with as a solution.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
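
For orientation, the file names and offsets in these errors follow directly from the clog layout: two status bits per transaction, 8 kB pages, and 32 pages per SLRU segment. A small standalone sketch of that mapping (illustrative only, not PostgreSQL source; constants assume a stock build) reproduces the location in the original report and places the nextxid mentioned above:

#include <stdint.h>
#include <stdio.h>

#define BLCKSZ                 8192
#define CLOG_XACTS_PER_BYTE    4                           /* 2 status bits per xact */
#define CLOG_XACTS_PER_PAGE    (BLCKSZ * CLOG_XACTS_PER_BYTE)
#define SLRU_PAGES_PER_SEGMENT 32

static void where_is(uint32_t xid)
{
    uint32_t page    = xid / CLOG_XACTS_PER_PAGE;
    uint32_t segment = page / SLRU_PAGES_PER_SEGMENT;
    uint32_t offset  = (page % SLRU_PAGES_PER_SEGMENT) * BLCKSZ;

    printf("xid %u -> pg_clog/%04X at offset %u\n",
           (unsigned) xid, (unsigned) segment, (unsigned) offset);
}

int main(void)
{
    where_is(1188673);      /* Linas's original error: pg_clog/0001 at offset 32768 */
    where_is(22811359);     /* the nextxid discussed just above */
    return 0;
}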


Re: Hot Backup with rsync fails at pg_clog if under load

От
Florian Pflug
Дата:
On Oct26, 2011, at 15:12 , Simon Riggs wrote:
> On Wed, Oct 26, 2011 at 12:54 PM, Aidan Van Dyk <aidan@highrise.ca> wrote:
>
>> The read fails because there is no data at the location it's trying to
>> read from, because clog hasn't been extended yet by recovery.
>
> You don't actually know that, though I agree it seems a reasonable
> guess and was my first thought also.

The actual error message also supports that theory. Here's the relevant
snippet from the OP's log (Found in CA9FD2FE.1D8D2%linas.virbalas@continuent.com)

2011-09-21 13:41:05 CEST FATAL:  could not access status of transaction 1188673
2011-09-21 13:41:05 CEST DETAIL:  Could not read from file "pg_clog/0001" at offset 32768: Success.

Note that it says "Success" at the end of the second log entry. That
can only happen, I think, if we're trying to read the page adjacent to
the last page in the file. The seek would be successful, and the subsequent
read() would indicate EOF by returning zero bytes. None of the calls would
set errno. If there was a real IO error, read() would set errno, and if the
page wasn't adjacent to the last page in the file, seek() would set errno.
In both cases we'd see the corresponding error message, not "Success".
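
To make that concrete, here is a minimal standalone C sketch (not PostgreSQL source; the file name is just a placeholder for a too-short clog segment) showing that a read() at or beyond EOF returns zero bytes without touching errno, which is why the DETAIL line ends in "Success":

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char    buf[8192];
    int     fd = open("some_short_file", O_RDONLY);   /* stand-in for pg_clog/0001 */

    if (fd < 0)
        return 1;

    errno = 0;
    lseek(fd, 32768, SEEK_SET);                 /* the seek itself succeeds */

    ssize_t n = read(fd, buf, sizeof(buf));     /* at EOF: returns 0, errno untouched */
    printf("read returned %zd, errno reports: %s\n", n, strerror(errno));

    close(fd);
    return 0;
}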

> The error is very specifically referring to 22811359, which is the
> nextxid from pg_control and updated by checkpoint.

Where does that XID come from? The reference to that XID in the archives
that I can find is in your message
CA+U5nMKUUoA8kRG=itfsO5nZuE3x_KDJz78EaUN3_FKmq-uMJA@mail.gmail.com

> 22811359 is mid-way through a clog page, so prior xids will already
> have been allocated, pages extended and then those pages fsyncd before
> the end of pg_start_backup().  So it shouldn't be possible for that
> page to be absent from the base backup, unless the base backup was
> taken without a preceding checkpoint, which seems is not the case from
> the script output.

Or unless the nextId we store in the checkpoint is for some reason higher
than it should be. Or unless nextId somehow gets mangled during recovery.
Or unless there's some interaction between VACUUM and CHECKPOINTS that
we're overlooking...

> Note that if you are correct, then the solution is to extend clog,
> which Florian disagrees with as a solution.

That's not what I said. As you said, the CLOG page corresponding to nextId
*should* always be accessible at the start of recovery (unless the whole file
has been removed by VACUUM, that is). So we shouldn't need to extend CLOG.
Yet the error suggests that the CLOG is, in fact, too short. What I said
is that we shouldn't apply any fix (for the CLOG problem) before we understand
the reason for that apparent contradiction.

Doing it nevertheless to get rid of this seems dangerous. What happens, for
example, to the CLOG state of transactions earlier than the checkpoint's
nextId? Their COMMIT record may very well lie before the checkpoint's REDO
pointer, so the CLOG we copied had better contain their correct state. Yet if
it does, then why isn't the nextId's CLOG page accessible?

best regards,
Florian Pflug



Re: Hot Backup with rsync fails at pg_clog if under load

От
Chris Redekop
Дата:
> And I think they also reported that if they didn't run hot standby,
> but just normal recovery into a new master, it didn't have the problem
> either, i.e. without hotstandby, recovery ran, properly extended the
> clog, and then ran as a new master fine.

Yes this is correct...attempting to start as hotstandby will produce the pg_clog error repeatedly and then without changing anything else, just turning hot standby off it will start up successfully.  

> This fits the OP's observation of the
> problem vanishing when pg_start_backup() does an immediate checkpoint.

Note that this is *not* the behaviour I'm seeing....it's possible it happens more frequently without the immediate checkpoint, but I am seeing it happen even with the immediate checkpoint.

> This is a different problem and has already been reported by one of
> your colleagues in a separate thread, and answered in detail by me
> there. There is no bug related to this error message.

Excellent...I will continue this discussion in that thread.

Re: Hot Backup with rsync fails at pg_clog if under load

От
Florian Pflug
Дата:
On Oct26, 2011, at 15:57 , Florian Pflug wrote:
> As you said, the CLOG page corresponding to nextId
> *should* always be accessible at the start of recovery (Unless whole file
> has been removed by VACUUM, that is). So we shouldn't need to extends CLOG.
> Yet the error suggest that the CLOG is, in fact, too short. What I said
> is that we shouldn't apply any fix (for the CLOG problem) before we understand
> the reason for that apparent contradiction.

Ha! I think I've got a working theory.

In CreateCheckPoint(), we determine the nextId that'll go into the checkpoint
record, and then call CheckPointGuts() which does the actual writing and fsyncing.
So far, that's fine. If a transaction ID is assigned before we compute the
checkpoint's nextXid, we'll extend the CLOG accordingly, and CheckPointGuts() will
make sure the new CLOG page goes to disk.

But, if wal_level = hot_standby, we also call LogStandbySnapshot() in
CreateCheckPoint(), and we do that *after* CheckPointGuts(). Which would be
fine too, except that LogStandbySnapshot() re-assigns the *current* value of
ShmemVariableCache->nextXid to the checkpoint's nextXid field.

Thus, if the CLOG is extended after (or in the middle of) CheckPointGuts(), but
before LogStandbySnapshot(), then we end up with a nextXid in the checkpoint
whose CLOG page hasn't necessarily made it to the disk yet. The longer CheckPointGuts()
takes to finish its work the more likely it becomes (assuming that CLOG writing
and syncing doesn't happen at the very end). This fits the OP's observation of the
problem vanishing when pg_start_backup() does an immediate checkpoint.

I dunno how to fix this, though, since I don't really understand why
LogStandbySnapshot() needs to modify the checkpoint's nextXid. Simon, is there some
documentation on what assumptions the hot standby code makes about the various XID
fields included in a checkpoint?

best regards,
Florian Pflug



Re: Hot Backup with rsync fails at pg_clog if under load

От
Aidan Van Dyk
Дата:
On Wed, Oct 26, 2011 at 9:57 AM, Florian Pflug <fgp@phlo.org> wrote:
> On Oct26, 2011, at 15:12 , Simon Riggs wrote:
>> On Wed, Oct 26, 2011 at 12:54 PM, Aidan Van Dyk <aidan@highrise.ca> wrote:
>>
>>> The read fails because there is no data at the location it's trying to
>>> read from, because clog hasn't been extended yet by recovery.
>>
>> You don't actually know that, though I agree it seems a reasonable
>> guess and was my first thought also.
>
> The actual error message also supports that theory. Here's the relevant
> snippet from the OP's log (Found in CA9FD2FE.1D8D2%linas.virbalas@continuent.com)
>
> 2011-09-21 13:41:05 CEST FATAL:  could not access status of transaction 1188673
> 2011-09-21 13:41:05 CEST DETAIL:  Could not read from file "pg_clog/0001" at offset 32768: Success.
>
> Note that it says "Success" at the end of the second log entry. That
> can only happen, I think, if we're trying to read the page adjacent to
> the last page in the file. The seek would be successfull, and the subsequent
> read() would indicate EOF by returning zero bytes. None of the calls would
> set errno. If there was a real IO error, read() would set errno, and if the
> page wasn't adjacent to the last page in the file, seek() would set errno.
> In both cases we'd see the corresponding error messag, not "Success".

And even more pointedly, in the original go around on this:
http://article.gmane.org/gmane.comp.db.postgresql.devel.general/174056

He reported that clog/0000 after the pg_start_backup call was:

  -rw------- 1 postgres postgres 8192 Sep 23 14:31 0000

and changed during the rsync phase to this:

  -rw------- 1 postgres postgres 16384 Sep 23 14:33 0000

But on the slave, of course, it was copied before it was extended so it
was the original size (that's ok, that's the point of recovery after
the backup):

  -rw------- 1 postgres postgres 8192 Sep 23 14:31 0000

With the error:

  2011-09-23 14:33:46 CEST FATAL:  could not access status of transaction 37206
  2011-09-23 14:33:46 CEST DETAIL:  Could not read from file "pg_clog/0000" at offset 8192: Success.

And that error happens *before* recovery even can get attempted.

And that if he copied the "recent" clog/0000 from the master, it did start up.

And I think they also reported that if they didn't run hot standby,
but just normal recovery into a new master, it didn't have the problem
either, i.e. without hotstandby, recovery ran, properly extended the
clog, and then ran as a new master fine.

a.

--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.


Re: Hot Backup with rsync fails at pg_clog if under load

От
Chris Redekop
Дата:
The way forwards here is for someone to show the clog file that causes
the error and find out why the call to read() fails.

Well let me hook ya up :)  Here's the log, controldata, and erroring clog attached


2011-10-26 08:44:15.419 MDT [1544]: [1-1] LOG:  database system was interrupted; last known up at 2011-10-26 08:43:54 MDT
2011-10-26 08:44:15.419 MDT [1544]: [2-1] LOG:  creating missing WAL directory "pg_xlog/archive_status"
2011-10-26 08:44:15.421 MDT [1544]: [3-1] LOG:  entering standby mode
DEBUG:  received replication command: IDENTIFY_SYSTEM
DEBUG:  received replication command: START_REPLICATION 6/CE000000
2011-10-26 08:44:15.425 MDT [1545]: [1-1] LOG:  streaming replication successfully connected to primary
2011-10-26 08:44:16.246 MDT [1547]: [1-1] FATAL:  the database system is starting up
2011-10-26 08:44:17.501 MDT [1544]: [4-1] DEBUG:  checkpoint record is at 6/D0BC36A0
2011-10-26 08:44:17.501 MDT [1544]: [5-1] DEBUG:  redo record is at 6/CE0010E0; shutdown FALSE
2011-10-26 08:44:17.501 MDT [1544]: [6-1] DEBUG:  next transaction ID: 0/10846655; next OID: 2826628
2011-10-26 08:44:17.501 MDT [1544]: [7-1] DEBUG:  next MultiXactId: 1741; next MultiXactOffset: 3498
2011-10-26 08:44:17.501 MDT [1544]: [8-1] DEBUG:  oldest unfrozen transaction ID: 1669, in database 1
2011-10-26 08:44:17.501 MDT [1544]: [9-1] DEBUG:  transaction ID wrap limit is 2147485316, limited by database with OID 1
2011-10-26 08:44:17.742 MDT [1544]: [10-1] DEBUG:  resetting unlogged relations: cleanup 1 init 0
2011-10-26 08:44:17.751 MDT [1544]: [11-1] DEBUG:  initializing for hot standby
2011-10-26 08:44:17.751 MDT [1544]: [12-1] FATAL:  could not access status of transaction 10846655
2011-10-26 08:44:17.751 MDT [1544]: [13-1] DETAIL:  Could not read from file "pg_clog/000A" at offset 90112: Success.
2011-10-26 08:44:17.751 MDT [1537]: [2-1] LOG:  startup process (PID 1544) exited with exit code 1
2011-10-26 08:44:17.751 MDT [1537]: [3-1] LOG:  aborting startup due to startup process failure
2011-10-26 08:44:17.843 MDT [1543]: [1-1] DEBUG:  logger shutting down



pg_control version number:            903
Catalog version number:               201105231
Database system identifier:           5667259861501982685
Database cluster state:               in production
pg_control last modified:             Wed 26 Oct 2011 08:43:54 AM MDT
Latest checkpoint location:           6/D0BC36A0
Prior checkpoint location:            6/877B9950
Latest checkpoint's REDO location:    6/CE0010E0
Latest checkpoint's TimeLineID:       1
Latest checkpoint's NextXID:          0/10846655
Latest checkpoint's NextOID:          2826628
Latest checkpoint's NextMultiXactId:  1741
Latest checkpoint's NextMultiOffset:  3498
Latest checkpoint's oldestXID:        1669
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  10845042
Time of latest checkpoint:            Wed 26 Oct 2011 08:43:44 AM MDT
Minimum recovery ending location:     0/0
Backup start location:                0/0
Current wal_level setting:            hot_standby
Current max_connections setting:      600
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   256
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
Вложения

Re: Hot Backup with rsync fails at pg_clog if under load

От
Florian Pflug
Дата:
On Oct26, 2011, at 17:36 , Chris Redekop wrote:
> > And I think they also reported that if they didn't run hot standby,
> > but just normal recovery into a new master, it didn't have the problem
> > either, i.e. without hotstandby, recovery ran, properly extended the
> > clog, and then ran as a new master fine.
>
> Yes this is correct...attempting to start as hotstandby will produce the
> pg_clog error repeatedly and then without changing anything else, just
> turning hot standby off it will start up successfully.

Yup, because with hot standby disabled (on the client side), StartupCLOG()
happens after recovery has completed. That, at the very least, makes the
problem very unlikely to occur in the non-hot-standby case. I'm not sure
it's completely impossible, though.

Per my theory about the cause of the problem in my other mail, I think you
might see StartupCLOG failures even during crash recovery, provided that
wal_level was set to hot_standby when the primary crashed. Here's how

1) We start a checkpoint, and get as far as LogStandbySnapshot().
2) A backend does AssignTransactionId, and gets as far as GetNewTransactionId().
   The assigned XID requires CLOG extension.
3) The checkpoint continues, and LogStandbySnapshot() advances the
   checkpoint's nextXid to the XID assigned in (2).
4) We crash after writing the checkpoint record, but before the CLOG
   extension makes it to the disk, and before any trace of the XID assigned
   in (2) makes it to the xlog.

Then StartupCLOG() would fail at the end of recovery, because we'd end up
with a nextXid whose corresponding CLOG page doesn't exist.

> > This fits the OP's observation ob the
> > problem vanishing when pg_start_backup() does an immediate checkpoint.
>
> Note that this is *not* the behaviour I'm seeing....it's possible it happens
> more frequently without the immediate checkpoint, but I am seeing it happen
> even with the immediate checkpoint.

Yeah, I should have said "of the problem's likelihood decreasing" instead
of "vanishing". The point is, the longer the checkpoint takes, the higher
the chance the nextId is advanced far enough to require a CLOG extension.

That alone isn't enough to trigger the error - the CLOG extension must also
*not* make it to the disk before the checkpoint completes - but it's
a required precondition for the error to occur.

best regards,
Florian Pflug



Re: Hot Backup with rsync fails at pg_clog if under load

От
Simon Riggs
Дата:
On Wed, Oct 26, 2011 at 3:47 PM, Florian Pflug <fgp@phlo.org> wrote:
> On Oct26, 2011, at 15:57 , Florian Pflug wrote:
>> As you said, the CLOG page corresponding to nextId
>> *should* always be accessible at the start of recovery (Unless whole file
>> has been removed by VACUUM, that is). So we shouldn't need to extends CLOG.
>> Yet the error suggest that the CLOG is, in fact, too short. What I said
>> is that we shouldn't apply any fix (for the CLOG problem) before we understand
>> the reason for that apparent contradiction.
>
> Ha! I think I've got a working theory.
>
> In CreateCheckPoint(), we determine the nextId that'll go into the checkpoint
> record, and then call CheckPointGuts() which does the actual writing and fsyncing.
> So far, that fine. If a transaction ID is assigned before we compute the
> checkpoint's nextXid, we'll extend the CLOG accordingly, and CheckPointGuts() will
> make sure the new CLOG page goes to disk.
>
> But, if wal_level = hot_standby, we also call LogStandbySnapshot() in
> CreateCheckPoint(), and we do that *after* CheckPointGuts(). Which would be
> fine too, except that LogStandbySnapshot() re-assigned the *current* value of
> ShmemVariableCache->nextXid to the checkpoint's nextXid field.
>
> Thus, if the CLOG is extended after (or in the middle of) CheckPointGuts(), but
> before LogStandbySnapshot(), then we end up with a nextXid in the checkpoint
> whose CLOG page hasn't necessarily made it to the disk yet. The longer CheckPointGuts()
> takes to finish it's work the more likely it becomes (assuming that CLOG writing
> and syncing doesn't happen at the very end). This fits the OP's observation ob the
> problem vanishing when pg_start_backup() does an immediate checkpoint.

This is the correct explanation. I've just come back into Wifi range,
so I was just writing to you with this explanation but your original
point that nextxid must be wrong deserves credit. OTOH I was just
waiting to find out what the reason for the physical read was, rather
than guessing.

Notice that the nextxid value isn't wrong, it's just not the correct
value to use for starting clog.

As it turns out the correct fix is actually just to skip StartupClog()
until the end of recovery because it does nothing useful when executed
at that time. When I wrote the original code I remember thinking that
StartupClog() is superfluous at that point.

Brewing a patch now.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Hot Backup with rsync fails at pg_clog if under load

От
Simon Riggs
Дата:
On Wed, Oct 26, 2011 at 5:08 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

> Brewing a patch now.

Latest thinking... confirmations or other error reports please.

This fixes both the subtrans and clog bugs in one patch.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: Hot Backup with rsync fails at pg_clog if under load

От
Simon Riggs
Дата:
On Wed, Oct 26, 2011 at 5:16 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Wed, Oct 26, 2011 at 5:08 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
>> Brewing a patch now.
>
> Latest thinking... confirmations or other error reports please.
>
> This fixes both the subtrans and clog bugs in one patch.

I'll be looking to commit that tomorrow afternoon as two separate
patches with appropriate credits.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Hot Backup with rsync fails at pg_clog if under load

От
Chris Redekop
Дата:
FYI I have given this patch a good test and can now no longer reproduce either the subtrans or the clog error.  Thanks guys!


On Wed, Oct 26, 2011 at 11:09 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On Wed, Oct 26, 2011 at 5:16 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Wed, Oct 26, 2011 at 5:08 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
>> Brewing a patch now.
>
> Latest thinking... confirmations or other error reports please.
>
> This fixes both the subtrans and clog bugs in one patch.

I'll be looking to commit that tomorrow afternoon as two separate
patches with appropriate credits.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Hot Backup with rsync fails at pg_clog if under load

От
Florian Pflug
Дата:
On Oct26, 2011, at 18:08 , Simon Riggs wrote:
> On Wed, Oct 26, 2011 at 3:47 PM, Florian Pflug <fgp@phlo.org> wrote:
>> On Oct26, 2011, at 15:57 , Florian Pflug wrote:
>> Thus, if the CLOG is extended after (or in the middle of) CheckPointGuts(), but
>> before LogStandbySnapshot(), then we end up with a nextXid in the checkpoint
>> whose CLOG page hasn't necessarily made it to the disk yet. The longer CheckPointGuts()
>> takes to finish it's work the more likely it becomes (assuming that CLOG writing
>> and syncing doesn't happen at the very end). This fits the OP's observation of the
>> problem vanishing when pg_start_backup() does an immediate checkpoint.
>
> As it turns out the correct fix is actually just to skip StartupClog()
> until the end of recovery because it does nothing useful when executed
> at that time. When I wrote the original code I remember thinking that
> StartupClog() is superfluous at that point.

Hm, that fixes the problem in the hot standby case, but as I said in my
reply to Chris Redekop, normal crash recovery is also at risk (although
the probability of hitting the bug is much smaller there). Here's my
reasoning from that other mail:

Per my theory about the cause of the problem in my other mail, I think you
might see StartupCLOG failures even during crash recovery, provided that
wal_level was set to hot_standby when the primary crashed. Here's how

1) We start a checkpoint, and get as far as LogStandbySnapshot().
2) A backend does AssignTransactionId, and gets as far as GetNewTransactionId().
   The assigned XID requires CLOG extension.
3) The checkpoint continues, and LogStandbySnapshot() advances the
   checkpoint's nextXid to the XID assigned in (2).
4) We crash after writing the checkpoint record, but before the CLOG
   extension makes it to the disk, and before any trace of the XID assigned
   in (2) makes it to the xlog.

Then StartupCLOG() would fail at the end of recovery, because we'd end up
with a nextXid whose corresponding CLOG page doesn't exist.

Quite aside from that concern, I think it's probably not a good idea
for the nextXid value of a checkpoint to depend on whether wal_level
was set to hot_standby or not. Our recovery code is already quite complex
and hard to test, and this just adds one more combination that has to
be thought about while coding and that needs to be tested.

My suggestion is to fix the CLOG problem in the same way that you fixed
the SUBTRANS problem, i.e. by moving LogStandbySnapshot() to before
CheckPointGuts().

Here's what I imagine CreateCheckPoint() should look like:

1) LogStandbySnapshot() and fill out oldestActiveXid
2) Fill out REDO
3) Wait for concurrent commits
4) Fill out nextXid and the other fields
5) CheckPointGuts()
6) Rest

It's then no longer necessary for LogStandbySnapshot() to modify
the nextXid, since we fill out nextXid after LogStandbySnapshot() and
will thus derive a higher value than LogStandbySnapshot() would have.

We could then also fold GetOldestActiveTransactionId() back into
your proposed LogStandbySnapshot() and thus don't need two ProcArray
traversals.

best regards,
Florian Pflug
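
To illustrate what the reordering buys, here is a self-contained toy model (all names are local stand-ins, not PostgreSQL functions or fields) contrasting the current sequence (flush first, then let the standby snapshot re-advance nextXid) with the proposed one (nextXid settled before anything is flushed):

#include <stdint.h>
#include <stdio.h>

typedef uint32_t Xid;

static Xid shmem_next_xid    = 1000;   /* stand-in for ShmemVariableCache->nextXid */
static Xid clog_flushed_upto = 0;      /* highest XID whose clog page has been fsynced */

static void assign_xid(void)      { shmem_next_xid++; }                  /* concurrent backend */
static void checkpoint_guts(void) { clog_flushed_upto = shmem_next_xid; }

static void report(const char *order, Xid ckpt_next)
{
    printf("%-14s: checkpoint nextXid=%u, clog flushed up to %u -> %s\n",
           order, (unsigned) ckpt_next, (unsigned) clog_flushed_upto,
           ckpt_next > clog_flushed_upto ? "page may be missing" : "ok");
}

int main(void)
{
    /* Current order: flush first; the standby snapshot afterwards re-advances
     * the checkpoint's nextXid past what was flushed. */
    checkpoint_guts();
    assign_xid();                              /* a backend extends the clog meanwhile... */
    report("current order", shmem_next_xid);   /* ...and the snapshot picks up that value */

    /* Proposed order: nextXid is settled before the flush, so later
     * assignments cannot outrun what checkpoint_guts() wrote out. */
    Xid ckpt_next = shmem_next_xid;
    checkpoint_guts();
    assign_xid();
    report("proposed order", ckpt_next);
    return 0;
}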



Re: Hot Backup with rsync fails at pg_clog if under load

От
Robert Haas
Дата:
On Wed, Oct 26, 2011 at 12:16 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Wed, Oct 26, 2011 at 5:08 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
>> Brewing a patch now.
>
> Latest thinking... confirmations or other error reports please.
>
> This fixes both the subtrans and clog bugs in one patch.

I don't see the point of changing StartupCLOG() to be an empty
function and adding a new function TrimCLOG() that does everything
StartupCLOG() used to do.  Seems simpler to just move the calls to
StartupCLOG() wherever they need to be - i.e. remove the one that
happens before WAL replay, and extricate the one at end-of-recovery
from the if block which currently contains it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Hot Backup with rsync fails at pg_clog if under load

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Oct 26, 2011 at 12:16 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> This fixes both the subtrans and clog bugs in one patch.

> I don't see the point of changing StartupCLOG() to be an empty
> function and adding a new function TrimCLOG() that does everything
> StartupCLOG() used to do.

+1 ... I found that overly cute also.
        regards, tom lane


Re: Hot Backup with rsync fails at pg_clog if under load

От
Heikki Linnakangas
Дата:
On 27.10.2011 02:29, Florian Pflug wrote:
> Per my theory about the cause of the problem in my other mail, I think you
> might see StartupCLOG failures even during crash recovery, provided that
> wal_level was set to hot_standby when the primary crashed. Here's how
>
> 1) We start a checkpoint, and get as far as LogStandbySnapshot()
> 2) A backend does AssignTransactionId, and gets as far as GetNewTransactionId().
>    The assigned XID requires CLOG extension.
> 3) The checkpoint continues, and LogStandbySnapshot () advances the
>    checkpoint's nextXid to the XID assigned in (2).
> 4) We crash after writing the checkpoint record, but before the CLOG
>    extension makes it to the disk, and before any trace of the XID assigned
>    in (2) makes it to the xlog.
>
> Then StartupCLOG() would fail at the end of recovery, because we'd end up
> with a nextXid whose corresponding CLOG page doesn't exist.

No, clog extension is WAL-logged while holding the XidGenLock. At step 
3, LogStandbySnapshot() would block until the clog-extension record is 
written to WAL, so crash recovery would see and replay that record 
before calling StartupCLOG().

That can happen during hot standby, though, because StartupCLOG() is 
called earlier.
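
Roughly, the interlock looks like this (heavily simplified from memory of
varsup.c and procarray.c, not verbatim source):

    /* Backend assigning an XID (GetNewTransactionId, simplified): */
    LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
    xid = ShmemVariableCache->nextXid;
    ExtendCLOG(xid);            /* WAL-logs a zeroed clog page when crossing a
                                 * page boundary, before anyone can see xid */
    ExtendSUBTRANS(xid);
    TransactionIdAdvance(ShmemVariableCache->nextXid);
    LWLockRelease(XidGenLock);

    /* Checkpoint side (LogStandbySnapshot -> GetRunningTransactionData,
     * simplified): */
    LWLockAcquire(XidGenLock, LW_SHARED);
    runningNextXid = ShmemVariableCache->nextXid;   /* cannot observe an xid
                                                     * whose clog-extension
                                                     * record isn't in WAL */
    LWLockRelease(XidGenLock);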

> My suggestion is to fix the CLOG problem in that same way that you fixed
> the SUBTRANS problem, i.e. by moving LogStandbySnapshot() to before
> CheckPointGuts().
>
> Here's what I imagine CreateCheckPoint() should look like:
>
> 1) LogStandbySnapshot() and fill out oldestActiveXid
> 2) Fill out REDO
> 3) Wait for concurrent commits
> 4) Fill out nextXid and the other fields
> 5) CheckPointGuts()
> 6) Rest
>
> It's then no longer necessary for LogStandbySnapshot() to modify
> the nextXid, since we fill out nextXid after LogStandbySnapshot() and
> will thus derive a higher value than LogStandbySnapshot() would have.

Hmm, I don't think that fully fixes the problem. Even if you're certain 
that CheckPointGuts() has fsync'd the clog page to disk, VACUUM might 
decide to truncate it away again while the checkpoint is running.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: Hot Backup with rsync fails at pg_clog if under load

От
Heikki Linnakangas
Дата:
On 27.10.2011 09:57, Heikki Linnakangas wrote:
>> My suggestion is to fix the CLOG problem in that same way that you fixed
>> the SUBTRANS problem, i.e. by moving LogStandbySnapshot() to before
>> CheckPointGuts().
>>
>> Here's what I imagine CreateCheckPoint() should look like:
>>
>> 1) LogStandbySnapshot() and fill out oldestActiveXid
>> 2) Fill out REDO
>> 3) Wait for concurrent commits
>> 4) Fill out nextXid and the other fields
>> 5) CheckPointGuts()
>> 6) Rest
>>
>> It's then no longer necessary for LogStandbySnapshot() to modify
>> the nextXid, since we fill out nextXid after LogStandbySnapshot() and
>> will thus derive a higher value than LogStandbySnapshot() would have.
>
> Hmm, I don't think that fully fixes the problem. Even if you're certain
> that CheckPointGuts() has fsync'd the clog page to disk, VACUUM might
> decide to truncate it away again while the checkpoint is running.

Oh, scratch that. During recovery, we merrily treat missing slru files 
as if they were filled with zeros.
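
The relevant bit is in slru.c's physical read path; roughly (paraphrased,
not verbatim):

    /* Sketch of SlruPhysicalReadPage() when a segment file is missing. */
    fd = open(path, O_RDWR | PG_BINARY, 0);
    if (fd < 0)
    {
        if (errno != ENOENT || !InRecovery)
            return false;       /* normal case: report an I/O error */

        /*
         * In recovery, assume the segment was removed later in the WAL
         * sequence and hand back a zero-filled page instead.
         */
        ereport(LOG,
                (errmsg("file \"%s\" doesn't exist, reading as zeroes", path)));
        MemSet(shared->page_buffer[slotno], 0, BLCKSZ);
        return true;
    }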

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: Hot Backup with rsync fails at pg_clog if under load

От
Simon Riggs
Дата:
On Thu, Oct 27, 2011 at 4:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Oct 26, 2011 at 12:16 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> This fixes both the subtrans and clog bugs in one patch.
>
>> I don't see the point of changing StartupCLOG() to be an empty
>> function and adding a new function TrimCLOG() that does everything
>> StartupCLOG() used to do.
>
> +1 ... I found that overly cute also.

It would have been even easier to move StartupCLOG() later, but then
we'd need a big comment explaining why CLOG starts up at one point and
subtrans starts up at another point, since that is a very confusing way
of doing things. I wrote it that way first and it definitely looks
strange.

It's much easier to understand that StartupCLOG() is actually a no-op
and that we need to trim the clog at the end of recovery in all cases.

The patch isn't meant to be cute, just a better way of expressing
what needs to be done, so I think the patch should stay that way.
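
Schematically, the shape in question is something like this (simplified,
not the patch text itself):

    /* Still called at the usual SLRU startup point ... */
    void
    StartupCLOG(void)
    {
        /* ... but intentionally does nothing until recovery has finished. */
    }

    /* New: called once at the end of recovery, for crash and archive
     * recovery alike. */
    void
    TrimCLOG(void)
    {
        TransactionId xid = ShmemVariableCache->nextXid;
        int           pageno = TransactionIdToPage(xid);

        LWLockAcquire(CLogControlLock, LW_EXCLUSIVE);

        /* What StartupCLOG() used to do: establish the current page ... */
        ClogCtl->shared->latest_page_number = pageno;

        /*
         * ... and zero the unused tail of that page beyond nextXid so stale
         * status bits cannot survive (zeroing loop omitted here).
         */

        LWLockRelease(CLogControlLock);
    }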

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Hot Backup with rsync fails at pg_clog if under load

От
Robert Haas
Дата:
On Thu, Oct 27, 2011 at 5:37 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Thu, Oct 27, 2011 at 4:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> On Wed, Oct 26, 2011 at 12:16 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>> This fixes both the subtrans and clog bugs in one patch.
>>
>>> I don't see the point of changing StartupCLOG() to be an empty
>>> function and adding a new function TrimCLOG() that does everything
>>> StartupCLOG() used to do.
>>
>> +1 ... I found that overly cute also.
>
> It would have been even easier to move StartupCLOG() later, but then
> we'd need a big comment explaining why CLOG starts up at one point and
> subtrans starts up at another point, since that is a very confusing way
> of doing things. I wrote it that way first and it definitely looks
> strange.
>
> It's much easier to understand that StartupCLOG() is actually a no-op
> and that we need to trim the clog at the end of recovery in all cases.

If it's a no-op, why have it at all?  I know we have a bunch of places
in the code where we have empty stubs where there used to be
initialization or cleanup code, but I've never found that particularly
good style.  If something no longer requires initialization in a
certain place, I think we should nuke the whole function.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Hot Backup with rsync fails at pg_clog if under load

От
Florian Pflug
Дата:
On Oct27, 2011, at 08:57 , Heikki Linnakangas wrote:
> On 27.10.2011 02:29, Florian Pflug wrote:
>> Per my theory about the cause of the problem in my other mail, I think you
>> might see StartupCLOG failures even during crash recovery, provided that
>> wal_level was set to hot_standby when the primary crashed. Here's how
>>
>> 1) We start a checkpoint, and get as far as LogStandbySnapshot()
>> 2) A backend does AssignTransactionId, and gets as far as GetNewTransactionId().
>>   The assigned XID requires CLOG extension.
>> 3) The checkpoint continues, and LogStandbySnapshot() advances the
>>   checkpoint's nextXid to the XID assigned in (2).
>> 4) We crash after writing the checkpoint record, but before the CLOG
>>   extension makes it to the disk, and before any trace of the XID assigned
>>   in (2) makes it to the xlog.
>>
>> Then StartupCLOG() would fail at the end of recovery, because we'd end up
>> with a nextXid whose corresponding CLOG page doesn't exist.
>
> No, clog extension is WAL-logged while holding the XidGenLock. At step 3,
> LogStandbySnapshot() would block until the clog-extension record is written
> to WAL, so crash recovery would see and replay that record before calling
> StartupCLOG().

Hm, true. But it still seems wrong for LogStandbySnapshot() to modify the
checkpoint's nextXid, and even more wrong to do that only if wal_level =
hot_standby. Plus, I think it's a smart idea to verify that the required
parts of the CLOG are available at the start of recovery. Because if they're
missing, the data on the standby *will* be corrupted. Is there any argument
against doing LogStandbySnapshot() earlier (i.e., at the time oldestActiveXid
is computed)?

best regards,
Florian Pflug




Re: Hot Backup with rsync fails at pg_clog if under load

От
Simon Riggs
Дата:
On Thu, Oct 27, 2011 at 12:36 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Oct 27, 2011 at 5:37 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On Thu, Oct 27, 2011 at 4:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Robert Haas <robertmhaas@gmail.com> writes:
>>>> On Wed, Oct 26, 2011 at 12:16 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>>> This fixes both the subtrans and clog bugs in one patch.
>>>
>>>> I don't see the point of changing StartupCLOG() to be an empty
>>>> function and adding a new function TrimCLOG() that does everything
>>>> StartupCLOG() used to do.
>>>
>>> +1 ... I found that overly cute also.
>>
>> It would have been even easier to move StartupCLOG() later, but then
>> we'd need a big comment explaining why CLOG starts up at one point and
>> subtrans starts up at another point, since that is a very confusing way
>> of doing things. I wrote it that way first and it definitely looks
>> strange.
>>
>> It's much easier to understand that StartupCLOG() is actually a no-op
>> and that we need to trim the clog at the end of recovery in all cases.
>
> If it's a no-op, why have it at all?  I know we have a bunch of places
> in the code where we have empty stubs where there used to be
> initialization or cleanup code, but I've never found that particularly
> good style.  If something no longer requires initialization in a
> certain place, I think we should nuke the whole function.

It is a no-op for exactly the same reason other similar functions are
no-ops - it used to do something but now does not.

Anyone seeing StartupSubtrans and StartupMultiXact but no StartupClog
will immediately ask "why?".
IMHO it's easier to have an obviously named function than a comment -
it's less invasive for a backpatch as well.

I'm following current code style. If you wish to change that, feel
free to change this and all other locations that do this. Until then,
doing this makes most sense and follows current coding style.

If I had done it the way you suggest, I don't doubt someone would say
in about 6 months "Which idiot removed StartupClog()?".

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Hot Backup with rsync fails at pg_clog if under load

От
Simon Riggs
Дата:
On Thu, Oct 27, 2011 at 12:29 AM, Florian Pflug <fgp@phlo.org> wrote:

> Per my theory about the cause of the problem in my other mail, I think you
> might see StartupCLOG failures even during crash recovery, provided that
> wal_level was set to hot_standby when the primary crashed. Here's how
>
> 1) We start a checkpoint, and get as far as LogStandbySnapshot()
> 2) A backend does AssignTransactionId, and gets as far as GetNewTransactionId().
>  The assigned XID requires CLOG extension.
> 3) The checkpoint continues, and LogStandbySnapshot() advances the
>  checkpoint's nextXid to the XID assigned in (2).
> 4) We crash after writing the checkpoint record, but before the CLOG
>  extension makes it to the disk, and before any trace of the XID assigned
>  in (2) makes it to the xlog.
>
> Then StartupCLOG() would fail at the end of recovery, because we'd end up
> with a nextXid whose corresponding CLOG page doesn't exist.

Clog extension holds XidGenLock, as does LogStandbySnapshot, which
specifically excludes the above scenario.


> Quite aside from that concern, I think it's probably not a good idea
> for the nextXid value of a checkpoint to depend on whether wal_level
> was set to hot_standby or not. Our recovery code is already quite complex
> and hard to test, and this just adds one more combination that has to
> be thought about while coding and that needs to be tested.
>
> My suggestion is to fix the CLOG problem in that same way that you fixed
> the SUBTRANS problem, i.e. by moving LogStandbySnapshot() to before
> CheckPointGuts().
>
> Here's what I imagine CreateCheckPoint() should look like:
>
> 1) LogStandbySnapshot() and fill out oldestActiveXid
> 2) Fill out REDO
> 3) Wait for concurrent commits
> 4) Fill out nextXid and the other fields
> 5) CheckPointGuts()
> 6) Rest
>
> It's then no longer necessary for LogStandbySnapshot() to modify
> the nextXid, since we fill out nextXid after LogStandbySnapshot() and
> will thus derive a higher value than LogStandbySnapshot() would have.
>
> We could then also fold GetOldestActiveTransactionId() back into
> your proposed LogStandbySnapshot() and thus don't need two ProcArray
> traversals.

I think you make a good case for doing this.

However, I'm concerned that moving LogStandbySnapshot() in a backpatch
seems more risky than it's worth. We could easily introduce a new bug
into what we would all agree is a complex piece of code. Minimal
change seems best in this case.

And also, two ProcArray traversals are not a problem there.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Hot Backup with rsync fails at pg_clog if under load

От
Florian Pflug
Дата:
On Oct27, 2011, at 15:51 , Simon Riggs wrote:
> On Thu, Oct 27, 2011 at 12:29 AM, Florian Pflug <fgp@phlo.org> wrote:
>> Here's what I imagine CreateCheckPoint() should look like:
>> 
>> 1) LogStandbySnapshot() and fill out oldestActiveXid
>> 2) Fill out REDO
>> 3) Wait for concurrent commits
>> 4) Fill out nextXid and the other fields
>> 5) CheckPointGuts()
>> 6) Rest
>> 
>> It's then no longer necessary for LogStandbySnapshot() to modify
>> the nextXid, since we fill out nextXid after LogStandbySnapshot() and
>> will thus derive a higher value than LogStandbySnapshot() would have.
>> 
>> We could then also fold GetOldestActiveTransactionId() back into
>> your proposed LogStandbySnapshot() and thus don't need two ProcArray
>> traversals.
> 
> I think you make a good case for doing this.
> 
> However, I'm concerned that moving LogStandbySnapshot() in a backpatch
> seems more risky than it's worth. We could easily introduce a new bug
> into what we would all agree is a complex piece of code. Minimal
> change seems best in this case.

OTOH, we currently compute oldestActiveXid within LogStandbySnapshot().
Your proposed patch changes that, which also carries a risk since something
could depend on these values being in sync. Especially since both the logged
snapshot and oldestActiveXid influence the snapshot tracking on the slave.

But since you wrote most of that code, your judgement about the relative
risks of these two approaches obviously outweighs mine.

best regards,
Florian Pflug



Re: Hot Backup with rsync fails at pg_clog if under load

От
Tom Lane
Дата:
Simon Riggs <simon@2ndQuadrant.com> writes:
> On Thu, Oct 27, 2011 at 12:36 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Thu, Oct 27, 2011 at 5:37 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> It's much easier to understand that StartupCLOG() is actually a no-op
>>> and that we need to trim the clog at the end of recovery in all cases.

>> If it's a no-op, why have it at all?

> It is a no-op for exactly the same reason other similar functions are
> no-ops - it used to do something but now does not.

> Anyone seeing StartupSubtrans and StartupMultiXact but no StartupClog
> will immediately ask "why?".

I think it's a good point that StartupCLog doesn't exist in a vacuum
but should be parallel to the init functions for the other SLRU modules.
So at this point I think I agree with Simon's approach.  However, the
obvious next question is whether those other modules don't need to be
changed also, and if not why not.

Another issue is that if StartupCLog is left as a no-op, what will
happen if someone mistakenly tries to access clog before the trim
function is called?  It would be a good idea to make sure that such
a thing results in an easily-identifiable failure.
        regards, tom lane


Re: Hot Backup with rsync fails at pg_clog if under load

От
Simon Riggs
Дата:
On Thu, Oct 27, 2011 at 3:03 PM, Florian Pflug <fgp@phlo.org> wrote:

>> I think you make a good case for doing this.
>>
>> However, I'm concerned that moving LogStandbySnapshot() in a backpatch
>> seems more risky than it's worth. We could easily introduce a new bug
>> into what we would all agree is a complex piece of code. Minimal
>> change seems best in this case.
>
> OTOH, we currently compute oldestActiveXid within LogStandbySnapshot().
> Your proposed patch changes that, which also carries a risk since something
> could depend on these values being in sync. Especially since both the logged
> snapshot and oldestActiveXid influence the snapshot tracking on the slave.
>
> But since you wrote most of that code, your judgement about the relative
> risks of these two approaches obviously outweighs mine.

We must move oldestActiveXid since that is the source of a bug. There
is no need to move LogStandbySnapshot(), so I am suggesting we don't
do that for the backpatch. I was going to implement it the way you
suggest in HEAD, since I agree that is a cleaner way.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Hot Backup with rsync fails at pg_clog if under load

От
Simon Riggs
Дата:
On Thu, Oct 27, 2011 at 3:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> However, the
> obvious next question is whether those other modules don't need to be
> changed also, and if not why not.

Good point.

StartupSubtrans() is also changed by this patch, since it will be
supplied with an earlier initialisation value.

StartupMultiXact() didn't need changing, I thought, but I will review further.

> Another issue is that if StartupCLog is left as a no-op, what will
> happen if someone mistakenly tries to access clog before the trim
> function is called?  It would be a good idea to make sure that such
> a thing results in an easily-identifiable failure.

The old StartupCLOG() didn't do anything that was essential to using
the clog, which is why it's a no-op.

You can still use the clog, just with zero startup.

Maybe setting the current page should go in at startup; I'll think about it.
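
(Roughly this much, that is; just a sketch, leaving everything else to the
trim step at end of recovery:)

    void
    StartupCLOG(void)
    {
        /* Only establish which clog page is current; the trimming waits
         * until the end of recovery. */
        int     pageno = TransactionIdToPage(ShmemVariableCache->nextXid);

        LWLockAcquire(CLogControlLock, LW_EXCLUSIVE);
        ClogCtl->shared->latest_page_number = pageno;
        LWLockRelease(CLogControlLock);
    }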

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Hot Backup with rsync fails at pg_clog if under load

От
Florian Pflug
Дата:
On Oct27, 2011, at 16:30 , Simon Riggs wrote:
> On Thu, Oct 27, 2011 at 3:03 PM, Florian Pflug <fgp@phlo.org> wrote:
>
>>> I think you make a good case for doing this.
>>>
>>> However, I'm concerned that moving LogStandbySnapshot() in a backpatch
>>> seems more risky than it's worth. We could easily introduce a new bug
>>> into what we would all agree is a complex piece of code. Minimal
>>> change seems best in this case.
>>
>> OTOH, we currently compute oldestActiveXid within LogStandbySnapshot().
>> Your proposed patch changes that, which also carries a risk since something
>> could depend on these values being in sync. Especially since both the logged
>> snapshot and oldestActiveXid influence the snapshot tracking on the slave.
>>
>> But since you wrote most of that code, your judgement about the relative
>> risks of these two approaches obviously outweighs mine.
>
> We must move oldestActiveXid since that is the source of a bug. There
> is no need to move LogStandbySnapshot(), so I am suggesting we don't
> do that for the backpatch. I was going to implement it the way you
> suggest in HEAD, since I agree that is a cleaner way.

Sounds good.

best regards,
Florian Pflug



Re: Hot Backup with rsync fails at pg_clog if under load

От
Simon Riggs
Дата:
On Thu, Oct 27, 2011 at 4:25 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

> StartupMultiXact() didn't need changing, I thought, but I will review further.

Good suggestion.

On review, StartupMultiXact() could also suffer a similar error to the
clog failure. This was caused *because* MultiXact is not maintained by
recovery, which I had thought meant it was protected from such
failure.

Revised patch attached.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: Hot Backup with rsync fails at pg_clog if under load

От
Chris Redekop
Дата:
looks like the v3 patch re-introduces the pg_subtrans issue...


On Tue, Nov 1, 2011 at 9:33 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On Thu, Oct 27, 2011 at 4:25 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

> StartupMultiXact() didn't need changing, I thought, but I will review further.

Good suggestion.

On review, StartupMultiXact() could also suffer a similar error to the
clog failure. This was caused *because* MultiXact is not maintained by
recovery, which I had thought meant it was protected from such
failure.

Revised patch attached.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Hot Backup with rsync fails at pg_clog if under load

От
Simon Riggs
Дата:
On Wed, Nov 2, 2011 at 2:40 AM, Chris Redekop <chris@replicon.com> wrote:

> looks like the v3 patch re-introduces the pg_subtrans issue...

No, I just separated the patches to be clearer about the individual changes.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Hot Backup with rsync fails at pg_clog if under load

От
Simon Riggs
Дата:
On Wed, Nov 2, 2011 at 7:34 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Wed, Nov 2, 2011 at 2:40 AM, Chris Redekop <chris@replicon.com> wrote:
>
>> looks like the v3 patch re-introduces the pg_subtrans issue...
>
> No, I just separated the patches to be clearer about the individual changes.

3 bug fixes committed and back patched.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Hot Backup with rsync fails at pg_clog if under load

От
Chris Redekop
Дата:
Okay, sorry, I'm a little confused then.  Should I be able to apply both the v2 patch and the v3 patch, or is it expected that I'd have to merge them manually?


On Wed, Nov 2, 2011 at 1:34 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On Wed, Nov 2, 2011 at 2:40 AM, Chris Redekop <chris@replicon.com> wrote:

> looks like the v3 patch re-introduces the pg_subtrans issue...

No, I just separated the patches to be clearer about the individual changes.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Hot Backup with rsync fails at pg_clog if under load

От
Greg Smith
Дата:
I was curious if anyone running into these problems has gotten a chance 
to test the 3 fixes committed here.  It sounded like Linas even had a 
repeatable test case?

For easier reference the commits are:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=2f55c535e1f026929cf20855b3790d3632062d42

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=ff8451aa14c8513e429cbef09ddc72e79da366a5

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=656bba95af3950b26e9e97d86d29787d89e2b423

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us