Re: Duplicate history file?

From: Kyotaro Horiguchi
Subject: Re: Duplicate history file?
Date:
Msg-id: 20210603.215208.1092395816133977395.horikyota.ntt@gmail.com
In reply to: Re: Duplicate history file?  (Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp>)
Responses: Re: Duplicate history file?  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
           Re: Duplicate history file?  (Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp>)
List: pgsql-hackers
At Tue, 01 Jun 2021 13:03:22 +0900, Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp> wrote in 
> Hi Horiguchi-san,
> 
> On 2021/05/31 16:58, Kyotaro Horiguchi wrote:
> > So, I started a thread for this topic diverged from the following
> > thread.
> > https://www.postgresql.org/message-id/4698027d-5c0d-098f-9a8e-8cf09e36a555@nttcom.co.jp_1
> > 
> >> So, what should we do for the user? I think we should put some notes
> >> in postgresql.conf or in the documentation. For example, something
> >> like this:
> > I'm not sure about the exact configuration you have in mind, but that
> > would happen on the cascaded standby in the case where the upstream
> > promotes. In this case, the history file for the new timeline is
> > archived twice.  walreceiver triggers archiving of the new history
> > file at the time of the promotion, then startup does the same when it
> > restores the file from archive.  Is it what you complained about?
> 
> 
> Thank you for creating a new thread and explaining this.
> We are not using cascade replication in our environment, but I think
> the situation is similar. As an overview, when I do a promote,
> the archive_command fails due to the history file.

Ah, I remembered that PG-REX starts a primary as a standby then
promotes it.

> I've created a reproduction script that includes building replication,
> and I'll share it with you. (I used Robert's test.sh as a reference
> for creating the reproduction script. Thanks)
> 
> The scenario (sr_test_historyfile.sh) is as follows.
> 
> #1 Start pgprimary as a main
> #2 Create standby
> #3 Start pgstandby as a standby
> #4 Execute archive command
> #5 Shutdown pgprimary
> #6 Start pgprimary as a standby
> #7 Promote pgprimary
> #8 Execute archive_command again, but failed since duplicate history
>    file exists (see pgstandby.log)

Ok, I clearly understood what you meant. (However, this is not a
legitimate state, since the standby is running while the primary is not..)
Anyway, the "test ! -f" can be problematic in that case.
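As a minimal illustration of the failure mode (all paths here are made up for the demonstration, not real PostgreSQL paths), a "test ! -f"-guarded cp succeeds the first time but returns non-zero when the same history file is offered to the archive again:

```shell
#!/bin/sh
# Demo of why 'test ! -f' in archive_command breaks on re-archiving.
# All paths are placeholders for the demonstration.
archive=/tmp/demo_archive
rm -rf "$archive" /tmp/00000002.history
mkdir -p "$archive"
printf 'dummy history content\n' > /tmp/00000002.history

# First attempt: the file is not in the archive yet, so it succeeds.
test ! -f "$archive/00000002.history" && cp /tmp/00000002.history "$archive"
echo "first attempt: $?"

# Second attempt (e.g. the restarted standby re-archives after promotion):
# 'test ! -f' now fails, so the whole archive_command returns non-zero
# even though the content would be identical.
test ! -f "$archive/00000002.history" && cp /tmp/00000002.history "$archive"
echo "second attempt: $?"
```

The second attempt exits non-zero, which is exactly what makes the archiver retry the same file forever.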

> Note that this may not be appropriate if you consider it as a recovery
> procedure for replication configuration. However, I'm sharing it as it
> is
> because this seems to be the procedure used in the customer's
> environment (PG-REX).

Understood.

> Regarding "test ! -f",
> I am wondering how many people are using the test command for
> archive_command. If I remember correctly, the guide provided by
> NTT OSS Center that we are using does not recommend using the test
> command.

I think that, as the PG-REX documentation says, the simple cp works well
as long as PG-REX's assumptions hold - that no double failure happens,
and that the instructions are followed.


On the other hand, I found that the behavior happens more generally.

If a standby with archive_mode=always crashes, it starts recovery from
the last checkpoint. If that checkpoint were in an already-archived
segment, the restarted standby will fetch the already-archived segment
from the archive and then fail to archive it. (See the first attached file.)

So, the fear you stated upthread applies to a wider range of
situations. The following suggestion is rather harmful under the
archive_mode=always setting.

https://www.postgresql.org/docs/14/continuous-archiving.html
> The archive command should generally be designed to refuse to
> overwrite any pre-existing archive file. This is an important safety
> feature to preserve the integrity of your archive in case of
> administrator error (such as sending the output of two different
> servers to the same archive directory).

I'm not sure how we should treat this..  Since the archive must store
the files actually applied to the server data, just being already
archived cannot be a reason for omitting archiving.  We need to make
sure the new file is byte-identical to the already-archived version. We
could compare just the *restored* file to the same file in pg_wal, but
that might be too much of a penalty for the benefit. (See the second
attached file.)
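Outside the server, the same comparison idea can be sketched with cmp (file names below are placeholders, not real WAL paths): only when the restored file and the copy in pg_wal are byte-identical is it safe to treat the archiving as already done.

```shell
#!/bin/sh
# Sketch of the comparison idea: before re-archiving a restored segment,
# check whether it is byte-identical to the copy already in pg_wal.
# File names are placeholders for the demonstration.
restored=/tmp/demo_restored_segment
in_pg_wal=/tmp/demo_pg_wal_segment

printf 'fake segment bytes' > "$restored"
cp "$restored" "$in_pg_wal"

# cmp -s exits 0 only when the two files are byte-identical.
if cmp -s "$restored" "$in_pg_wal"; then
    echo "identical: safe to skip re-archiving"
else
    echo "differs: must not silently skip"
fi
```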

Otherwise, the documentation would need something like the following,
if we keep the current behavior.

> The archive command should generally be designed to refuse to
> overwrite any pre-existing archive file. This is an important safety
> feature to preserve the integrity of your archive in case of
> administrator error (such as sending the output of two different
> servers to the same archive directory).
+ For a standby with archive_mode=always, there are cases where the same
+ file is archived more than once.  For safety, it is recommended that
+ when the destination file already exists, the archive_command return
+ zero if it is byte-identical to the source file.
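A minimal sketch of an archive_command that follows that suggested wording (the directory and helper name are hypothetical, chosen for the demo): return zero when the destination already exists and is byte-identical, refuse otherwise.

```shell
#!/bin/sh
# Sketch: archive_command that tolerates byte-identical duplicates.
# The archive directory and file names are placeholders.
archive_dir=/tmp/demo_wal_archive
rm -rf "$archive_dir"
mkdir -p "$archive_dir"

archive_one() {   # $1 = source path (%p), $2 = file name (%f)
    dst="$archive_dir/$2"
    if [ -f "$dst" ]; then
        cmp -s "$1" "$dst"      # exit 0 only if byte-identical
    else
        cp "$1" "$dst"
    fi
}

printf 'history v1' > /tmp/demo.history
archive_one /tmp/demo.history demo.history && echo "first: archived"
archive_one /tmp/demo.history demo.history && echo "duplicate: tolerated"
printf 'history v2' > /tmp/demo.history
archive_one /tmp/demo.history demo.history || echo "mismatch: refused"
```

In a real setup the helper would be invoked from postgresql.conf as something like archive_command = '/path/to/script %p %f'; the point is only that an identical duplicate exits 0 while a content mismatch still fails loudly.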

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
# Copyright (c) 2021, PostgreSQL Global Development Group

#
# Tests related to WAL archiving and recovery.
#
use strict;
use warnings;
use PostgresNode;
use TestLib;
use Test::More tests => 1;
use Config;

my $backup_name='mybackup';

my $primary = get_new_node('primary');
$primary->init(
    has_archiving    => 1,
    allows_streaming => 1);
$primary->append_conf('postgresql.conf', qq[
wal_keep_size=128MB
archive_mode=always
log_checkpoints=yes

]);
my $primary_archive = $primary->archive_dir;
$primary->start;

$primary->backup($backup_name);
my $standby = get_new_node('standby');
my $standby_archive = $standby->archive_dir;
$standby->init_from_backup($primary, $backup_name, has_streaming=>1);
$standby->append_conf('postgresql.conf', qq[
restore_command='cp $primary_archive/%f  %p'
archive_command='test ! -f $standby_archive/%f && cp %p $standby_archive/%f'
]);
$standby->start;

$primary->psql('postgres', 'CHECKPOINT; SELECT pg_switch_wal(); CREATE TABLE t(); SELECT pg_switch_wal();');
$standby->psql('postgres', 'CHECKPOINT');
$standby->stop('immediate');
$standby->start;

$primary->psql('postgres', 'CHECKPOINT;SELECT pg_switch_wal();CHECKPOINT');
$standby->psql('postgres', 'CHECKPOINT');

my $result;
while (1) {
    $result =
      $standby->safe_psql('postgres',
                          "SELECT last_archived_wal, last_failed_wal FROM pg_stat_archiver");
    # Perl's core sleep() truncates to whole seconds; use select() for 100ms.
    select(undef, undef, undef, 0.1);
    last if ($result ne "|");
}

ok($result =~ /^[^|]+\|$/, 'archive check 1');
diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index 26b023e754..037da0aa3d 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -382,6 +382,7 @@ KeepFileRestoredFromArchive(const char *path, const char *xlogfname)
 {
     char        xlogfpath[MAXPGPATH];
     bool        reload = false;
+    bool        skip_archive = false;
     struct stat statbuf;
 
     snprintf(xlogfpath, MAXPGPATH, XLOGDIR "/%s", xlogfname);
@@ -416,6 +417,56 @@ KeepFileRestoredFromArchive(const char *path, const char *xlogfname)
         /* same-size buffers, so this never truncates */
         strlcpy(oldpath, xlogfpath, MAXPGPATH);
 #endif
+        /*
+         * On a standby with archive_mode=always, there's the case where the
+         * same file is archived more than once. If the archive_command rejects
+         * overwriting, WAL-archiving won't go further than the file forever.
+         * Avoid duplicate archiving attempts when the file is known to have
+         * been archived and the content doesn't change.
+         */
+        if (XLogArchiveMode == ARCHIVE_MODE_ALWAYS &&
+            XLogArchiveCheckDone(xlogfname))
+        {
+            unsigned char srcbuf[XLOG_BLCKSZ];
+            unsigned char dstbuf[XLOG_BLCKSZ];
+            int fd1 = BasicOpenFile(path, O_RDONLY | PG_BINARY);
+            int fd2 = BasicOpenFile(oldpath, O_RDONLY | PG_BINARY);
+            uint32 i;
+            uint32 off = 0;
+
+            /*
+             * Compare the two files' contents.  We don't bother completing if
+             * something's wrong meanwhile.
+             */
+            for (i = 0 ; i < wal_segment_size / XLOG_BLCKSZ ; i++)
+            {
+                if (pg_pread(fd1, srcbuf, XLOG_BLCKSZ, (off_t) off)
+                    != XLOG_BLCKSZ)
+                    break;
+                
+                if (pg_pread(fd2, dstbuf, XLOG_BLCKSZ, (off_t) off)
+                    != XLOG_BLCKSZ)
+                    break;
+
+                if (memcmp(srcbuf, dstbuf, XLOG_BLCKSZ) != 0)
+                    break;
+
+                off += XLOG_BLCKSZ;
+            }
+
+            close(fd1);
+            close(fd2);
+            
+            if (i == wal_segment_size / XLOG_BLCKSZ)
+            {
+                skip_archive = true;
+
+                ereport(LOG,
+                        (errmsg("log file \"%s\" has already been archived, skipping archiving",
+                                xlogfname)));
+            }
+        }
+
         if (unlink(oldpath) != 0)
             ereport(FATAL,
                     (errcode_for_file_access(),
@@ -430,7 +481,7 @@ KeepFileRestoredFromArchive(const char *path, const char *xlogfname)
      * Create .done file forcibly to prevent the restored segment from being
      * archived again later.
      */
-    if (XLogArchiveMode != ARCHIVE_MODE_ALWAYS)
+    if (XLogArchiveMode != ARCHIVE_MODE_ALWAYS || skip_archive)
         XLogArchiveForceDone(xlogfname);
     else
         XLogArchiveNotify(xlogfname);
