Discussion: Warm standby problems

Warm standby problems

From: "David F. Skoll"
Date:

Hi,

I have one PostgreSQL 8.3 machine feeding WAL files to another PostgreSQL 8.3
machine that's running in recovery mode.  However, fairly often (every
few days), the standby machine breaks out of recovery mode with log messages
like:

2009-10-23 21:47:40 EDT LOG:  incorrect resource manager data checksum in record at 11/4E5AA18C
2009-10-23 21:47:40 EDT LOG:  redo done at 11/4E5A9A40
2009-10-23 21:47:40 EDT LOG:  last completed transaction was at log time 2009-10-23 21:28:36.531005-04
2009-10-23 21:47:40 EDT LOG:  restored log file "00000001000000110000004E" from archive
cp: cannot stat `/var/backups/postgres/wal/00000002.history': No such file or directory
2009-10-23 21:47:46 EDT LOG:  selected new timeline ID: 2
cp: cannot stat `/var/backups/postgres/wal/00000001.history': No such file or directory
2009-10-23 21:47:52 EDT LOG:  archive recovery complete
2009-10-23 21:47:55 EDT LOG:  database system is ready to accept connections

I'm pretty sure the WAL is shipped correctly (it's shipped via SCP).
The only thing I can think of is that the shipping version is 8.3.7 and
the consuming version is 8.3.8.  Would this be enough to cause the
"incorrect checksum" message?

Regards,

David.

Re: Warm standby problems

From: Tom Lane
Date:

"David F. Skoll" <dfs@roaringpenguin.com> writes:
> I have one PostgreSQL 8.3 machine feeding WAL files to another PostgreSQL 8.3
> machine that's running in recovery mode.  However, fairly often (every
> few days), the standby machine breaks out of recovery mode with log messages
> like:

> 2009-10-23 21:47:40 EDT LOG:  incorrect resource manager data checksum in record at 11/4E5AA18C

> I'm pretty sure the WAL is shipped correctly (it's shipped via SCP).
> The only thing I can think of is that the shipping version is 8.3.7 and
> the consuming version is 8.3.8.  Would this be enough to cause the
> "incorrect checksum" message?

No; there's no WAL change between 8.3.7 and 8.3.8.  What seems more
likely is that you're somehow shipping the WAL files before they're
quite finished.

            regards, tom lane

Re: Warm standby problems

From: "David F. Skoll"
Date:

Tom Lane wrote:

> No; there's no WAL change between 8.3.7 and 8.3.8.  What seems more
> likely is that you're somehow shipping the WAL files before they're
> quite finished.

I doubt it.  Our archive_command works like this:

1) scp the file over to the backup server as root.  It's stored in a
file named <name_of_wal>.NEW

2) When the scp finishes, ssh to the standby server and chown it to postgres.

3) Then, ssh to the standby server and rename the <name_of_wal>.NEW
file to the real WAL filename.

Only after step (3) does our standby script even see the file.

So unless there's a possibility that the master server invokes our archive
script before the WAL has finished being written, I can't see how this
could happen.

Regards,

David.

Re: Warm standby problems

From: Tom Lane
Date:

"David F. Skoll" <dfs@roaringpenguin.com> writes:
> Tom Lane wrote:
>> No; there's no WAL change between 8.3.7 and 8.3.8.  What seems more
>> likely is that you're somehow shipping the WAL files before they're
>> quite finished.

> So unless there's a possibility that the master server invokes our archive
> script before the WAL has finished being written, I can't see how this
> could happen.

What about the other direction: the script invoked by the archive
returns "done" before the bits have all been shipped?  PG is entirely
capable of overwriting the contents of a WAL file the moment the script
returns "done".

            regards, tom lane

Re: Warm standby problems

From: "David F. Skoll"
Date:

Tom Lane wrote:

> What about the other direction: the script invoked by the archive
> returns "done" before the bits have all been shipped?

Do you mean the wal_archive_command?  It waits for scp to finish.
It's written in Perl; here is the relevant part.

Regards,

David.

[Stuff deleted...]

my $remote_user      = $failover->get_setting('backup_ssh_user');
my $remote_host      = $failover->get_setting('backup_db_host');
my $remote_directory = $failover->get_setting('backup_wal_dir');
my $postgres_user    = $failover->get_setting('pg_userid') || 'postgres';

my $pid = open(CHILD, '-|');
my $wal_filename_only;
(undef, undef, $wal_filename_only) = File::Spec->splitpath( $wal_file );

if( $pid ) { # Parent

    while(<CHILD>) {
        chomp;
        canit_log( "Output from scp: $_" );
    }

    # This should wait for scp to finish
    my $rc = close CHILD;
    my $errno = $!;
    my $child_error = $?;
    alarm(0);

    if( ! $rc ) {
        if( $errno ) {
            canit_warning("Error closing pipe from scp: $errno");
            exit EXIT_FAILURE;
        } elsif ($child_error & 127) {
            canit_warning("scp died with signal " . ($child_error & 127));
            exit EXIT_FAILURE;
        } else {
            canit_warning("scp exited with non-zero exit code " . ($child_error >> 8));
            exit EXIT_FAILURE;
        }
    }

    my $exit_status = EXIT_OK;
    if (system('ssh', "$remote_user\@$remote_host", "chown $postgres_user $remote_directory/$wal_filename_only.NEW") !=
0){ 
        canit_warning("ssh command to chown remote file failed: $?");
        $exit_status = EXIT_FAILURE;
    }

    if (system('ssh', "$remote_user\@$remote_host", "mv -f $remote_directory/$wal_filename_only.NEW
$remote_directory/$wal_filename_only")!= 0) { 
        canit_warning("ssh command to rename remote file failed: $?");
        $exit_status = EXIT_FAILURE;
    }

    if ($exit_status == EXIT_OK) {
        canit_log("Successully archived WAL file $wal_filename_only on $remote_host");
    }

    exit $exit_status;
}

# In the child:
# Make sure we capture STDERR
open(STDERR, '>&STDOUT');

# In new scope to suppress warning about code after an exec - see
# 'perldoc exec' for details
{
    no warnings 'exec';
    exec 'scp', '-q', $wal_file, "$remote_user\@$remote_host:$remote_directory/$wal_filename_only.NEW";
}

Re: Warm standby problems: Followup

From: "David F. Skoll"
Date:

Hi,

In an effort to track down the problem, I switched to using rsync rather
than scp to copy the files.  I also take the SHA1 hash on each end, and
have my archiving script exit with a non-zero status if there's a mismatch.
Sure enough:

Oct 27 14:26:35 colo2vs1 canit-failover-wal-archive[29118]:
Warning: rsync succeeded, but local_sha1 1fe9fc62b2a05d21530decac1c5442969adc5819
!= remote_sha1 4f9f8bcd151129db64acd05470f0f05954b56232 !!

This is a "can't happen" situation, so I have to investigate bugs in rsync,
ssh, the kernel, the network, the disk.... bleah.

But I'm pretty sure it's not a PostgreSQL problem.

(My script exits with non-zero status if the SHA1s mismatch, and PostgreSQL
re-archives the WAL a short time later, and that succeeds, so I'm happy
for now.)
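
The check itself is simple.  A minimal sketch of the idea (not the actual
script; the argument handling is hypothetical, and it assumes Digest::SHA
on the master and sha1sum(1) on the standby):

#!/usr/bin/perl
# Sketch: compare the SHA1 of a local WAL segment with the copy that was
# just shipped to the standby.  Exit non-zero on a mismatch so PostgreSQL
# will retry the archive later.
use strict;
use warnings;
use Digest::SHA;

my ($local_file, $remote_host, $remote_file) = @ARGV;

# SHA1 of the local copy.
my $local_sha1 = Digest::SHA->new(1)->addfile($local_file)->hexdigest;

# SHA1 of the remote copy, obtained over ssh.
my $remote_out = `ssh $remote_host sha1sum $remote_file` || '';
my ($remote_sha1) = split ' ', $remote_out;

if (!defined $remote_sha1 || $local_sha1 ne $remote_sha1) {
    my $shown = defined $remote_sha1 ? $remote_sha1 : '(none)';
    warn "SHA1 mismatch: local_sha1 $local_sha1 != remote_sha1 $shown\n";
    exit 1;
}
exit 0;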

Regards,

David.

Re: Warm standby problems: Followup

From: "Kevin Grittner"
Date:

"David F. Skoll" <dfs@roaringpenguin.com> wrote:

> (My script exits with non-zero status if the SHA1s mismatch, and
> PostgreSQL re-archives the WAL a short time later, and that
> succeeds, so I'm happy for now.)

Just out of curiosity, could you show us the non-comment portions of
your postgresql.conf file?

-Kevin

Re: Warm standby problems: Followup

From: "David F. Skoll"
Date:

Kevin Grittner wrote:

> Just out of curiosity, could you show us the non-comment portions of
> your postgresql.conf file?

Sure!  Here they are.

Regards,

David.

=============================================================================
data_directory = '/var/lib/postgresql/8.3/main'
hba_file = '/etc/postgresql/8.3/main/pg_hba.conf'
ident_file = '/etc/postgresql/8.3/main/pg_ident.conf'
external_pid_file = '/var/run/postgresql/8.3-main.pid'
port = 5432
max_connections = 100
unix_socket_directory = '/var/run/postgresql'
ssl = true
shared_buffers = 24MB
max_fsm_pages = 500000
archive_mode = on
archive_command = '/usr/bin/wal_archive_command.pl %p'
archive_timeout = 240min
log_line_prefix = '%t '
autovacuum = off
datestyle = 'iso, mdy'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
default_text_search_config = 'pg_catalog.english'

Re: Warm standby problems: Followup

From: Tom Lane
Date:

"David F. Skoll" <dfs@roaringpenguin.com> writes:
> Sure enough:

> Oct 27 14:26:35 colo2vs1 canit-failover-wal-archive[29118]:
> Warning: rsync succeeded, but local_sha1 1fe9fc62b2a05d21530decac1c5442969adc5819
> != remote_sha1 4f9f8bcd151129db64acd05470f0f05954b56232 !!

> This is a "can't happen" situation, so I have to investigate bugs in rsync,
> ssh, the kernel, the network, the disk.... bleah.

> But I'm pretty sure it's not a PostgreSQL problem.

> (My script exits with non-zero status if the SHA1s mismatch, and PostgreSQL
> re-archives the WAL a short time later, and that succeeds, so I'm happy
> for now.)

So, when it archives successfully the second time, which if either of
the two mismatched sha1's proves to have been correct?

(I'm still wondering about the possibility that the WAL file is changing
underneath you ...)

            regards, tom lane

Re: Warm standby problems: Followup

From: "David F. Skoll"
Date:

Tom Lane wrote:

> So, when it archives successfully the second time, which if either of
> the two mismatched sha1's proves to have been correct?

The one on the master server (lines wrapped for readability).
"local" refers to the master server, and "remote" to the standby
server.

Oct 27 14:26:35 colo2vs1 canit-failover-wal-archive[29118]: Warning:
rsync succeeded, but local_sha1
1fe9fc62b2a05d21530decac1c5442969adc5819 !=
remote_sha1 4f9f8bcd151129db64acd05470f0f05954b56232 !!

Oct 27 14:26:52 colo2vs1 canit-failover-wal-archive[29161]: local_sha1 ==
remote_sha1 (1fe9fc62b2a05d21530decac1c5442969adc5819)

Oct 27 14:26:53 colo2vs1 canit-failover-wal-archive[29161]:
Successfully archived WAL file 000000010000001200000058 on
colo3vs1.roaringpenguin.com

However, the sha1 is taken after rsync exits, so it's unknown what
the local sha1 actually was while the rsync was running.  Maybe the
sha1 really was 4f9... on the master server until something changed
the file.

> (I'm still wondering about the possibility that the WAL file is changing
> underneath you ...)

Well, can PostgreSQL change the WAL file while the archive_command is
running?  I can't see anything on the backup server end changing that
copy.

Regards,

David.

Re: Warm standby problems: Followup

From: "Kevin Grittner"
Date:

"David F. Skoll" <dfs@roaringpenguin.com> wrote:

> shared_buffers = 24MB

You should probably set that higher.  Different workloads favor
different settings, but if you've got any reasonable amount of RAM for
a modern machine, somewhere between 256MB and one-third of total RAM
is usually best.

> archive_command = '/usr/bin/wal_archive_command.pl %p'

It would probably be safer to pass in %f, too, and use it for the file
name rather than plucking off the last portion of the %p -- at a
minimum it might reduce the chances that you'd use that technique in
restore_command, where it flat out won't work.
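
For instance, something along these lines (same script as in the posted
config, just with %f passed explicitly; the script would then use the
second argument as the file name instead of splitting %p):

archive_command = '/usr/bin/wal_archive_command.pl %p %f'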

> autovacuum = off

Be very careful if you choose to do this -- you've just made yourself
responsible for preventing bloat which can slowly strangle
performance.

There are a few settings which should almost always be overridden for
production use.  You might want to read this for a start:

http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

None of those, however, seem likely to cause the problem you describe.
Back to blaming the OS.

On the other hand, it's possible that tuning these up might change the
timings such that you don't hit the OS bug so often.

-Kevin

Re: Warm standby problems: Followup

From: "David F. Skoll"
Date:

Kevin Grittner wrote:
>> shared_buffers = 24MB

> You should probably set that higher.

Nah.  This machine is totally bored; tweaking PostgreSQL would be pointless
since it's so under-utilized.

>> archive_command = '/usr/bin/wal_archive_command.pl %p'

> It would probably be safer to pass in %f, too, and use it for the file
> name rather than plucking off the last portion of the %p -- at a
> minimum it might reduce the chances that you'd use that technique in
> restore_command, where it flat out won't work.

Nah.  Our Perl code is pretty robust; we use the File::Spec module to split
apart path names and that's safe.

The restore_command is robust (wrapped):

restore_command = '/usr/share/canit/failover/libexec/canit-failover-pg-standby.pl \
-s 2 -t /var/backups/postgres/initiate-failover /var/backups/postgres/wal/ %f %p %r'

and well-tested.

>> autovacuum = off

> Be very careful if you choose to do this -- you've just made yourself
> responsible for preventing bloat which can slowly strangle
> performance.

As I said, the server is completely under-utilized.  Nightly vacuum
takes about 7 minutes:

Oct 26 2009 21:56:53 DETAIL:  A total of 29184 page slots are in use (including overhead).
Oct 26 2009 21:56:53 29184 page slots are required to track all free space.

> There are a few settings which should almost always be overridden for
> production use.  You might want to read this for a start:

I doubt any of those settings is causing the problem.  The one setting
that I *do* suspect is the archive_timeout.  We don't have that set on
other systems, and they do not exhibit the problem.  I tried looking
at the PostgreSQL source to see if a log switch requested because of
archive_timeout could somehow result in the WAL file being changed
during execution of archive_command, but I'm completely unfamiliar
with the Pg source code and quickly got lost. :-(

I'm going to leave the system for a few days and see how many SHA1
mismatches I get.  Then I'm going to disable archive_timeout and see
if that changes anything.

Regards,

David.

Re: Warm standby problems: Followup

From: Tom Lane
Date:

"David F. Skoll" <dfs@roaringpenguin.com> writes:
> Tom Lane wrote:
>> So, when it archives successfully the second time, which if either of
>> the two mismatched sha1's proves to have been correct?

> The one on the master server (lines wrapped for readability).
> However, the sha1 is taken after rsync exits, so it's unknown what
> the local sha1 actually was while the rsync was running.

Be worth taking the sha1 first to see if you get a different answer
about that, or maybe even doing it twice.
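
Something along these lines, say (a sketch only, not a drop-in replacement;
the variable names follow the earlier script and the destination spec is
hypothetical, assuming Digest::SHA is available):

#!/usr/bin/perl
# Sketch: hash the WAL segment before and after the copy, to see whether
# the file changes underneath the archive_command while it runs.
use strict;
use warnings;
use Digest::SHA;

my ($wal_file, $remote_spec) = @ARGV;   # e.g. pg_xlog segment, user@standby:/var/backups/postgres/wal/

sub sha1_of { Digest::SHA->new(1)->addfile($_[0])->hexdigest }

my $before = sha1_of($wal_file);

# A non-zero exit makes PostgreSQL retry the archive later.
system('rsync', '-a', $wal_file, $remote_spec) == 0
    or exit 1;

my $after = sha1_of($wal_file);
warn "WAL segment changed during archiving: $before != $after\n"
    if $before ne $after;

exit 0;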

>> (I'm still wondering about the possibility that the WAL file is changing
>> underneath you ...)

> Well, can PostgreSQL change the WAL file while the archive_command is
> running?

It's not supposed to...

            regards, tom lane