Re: [GENERAL] WAL Archive command.

Поиск
Список
Период
Сортировка
От Scott Mead
Тема Re: [GENERAL] WAL Archive command.
Дата
Msg-id CAKq0gvKbrqfTLgF9mcELybhj_Jtf15Qb3PqomyBBS=pY4wn3Og@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [GENERAL] WAL Archive command.  (Jerry Sievers <gsievers19@comcast.net>)
Ответы Re: [GENERAL] WAL Archive command.  (John Britto <john@sodexis.com>)
Список pgsql-general


On Wed, Sep 27, 2017 at 2:55 PM, Jerry Sievers <gsievers19@comcast.net> wrote:
John Britto <john@sodexis.com> writes:

> Hello,
>
> I have a streaming replication setup along with WAL archive.
>
> archive_command = ‘test ! -f /var/pg_archive/%f && cp %p <archive
> location>%f && scp %p postgres@192.168.0.123:<archive location>/%f'
>
> When the SCP command fails, the master repeatedly tries to send the
> archived WAL to standby. But during this time, the pg_xlog directly
> grew with newer WAL files.
>
> The streaming replication hasn't had the problem because on my check,
> the WAL write location on the primary was same with the last WAL
> location received/replayed in standby.
>
> Since the pg_xlog in the master had few newer WAL files, the master
> archive is lagging to pick the current pg_xlog WAL file.  When a new
> WAL occur in the pg_xlog, Master picks the old WAL file to send to
> the standby.

Yup Pg is going to handle the unshipped WALs one at a time and it will
do them in order, oldest (lowest file name) first.

> How should I force the PostgreSQL to batch copy the lagging WAL files
> to pg_archive and then send to standby.  Can I do this manually using
> rsync?  I wonder how PostgreSQL knows the changes because it
> maintains info in archive_status with extension as .ready and .done.

I suggest you fix your basic archiving routine to complete and exit with
success to postgres.

+1

scp %p host:/archive/%f
if [ "$?" -ne 0 ]
then
    echo "`date`:FAILED:%p" >> /var/log/failed_xlog.log
    cp %p /localarchive/%f
    exit 0
fi

  Essentially, always make sure that you are returning a 0 to postgres.  If there is a failure, either log it or handle it separately.  This code snippet is NOT COMPLETE, there's a lot more to do in order to make it production ready and recoverable.  The biggest issue I've had with scp is that you have to set and enforce a timeout and trap the timeout.  Note, the above only works until your local machine (or the /localarchive partition) runs out of space.  It's really important that you have ultra solid logging and alerting around this.




And as for archive command scripts in general, simpler is better.

If you want to manually ship them in bulk, you may do so but then will
need to remove the corresponding archive_status/$foo.ready file so that
postgres won't keep trying to ship the same one.

I'm a huge fan of this strategy, especially if you're sending to a remote datacenter.  

archive_command.sh:

cp %p /localarchive/%f
if [ "$?" -ne 0 ]
then
    echo "`date`:FAILED:%p" >> /var/log/failed_xlog.log
    exit 0
fi

send_archive_to_remote.sh
rsync -avzP /localarchive/* host:/archive/


Of course, now you have to deal with files that are removed from the slave and making sure they get removed from the master appropriately, but, this is fairly straightforward.

--Scott




HTH

> Please assist.
>
> Thanks,
>
> John Britto
>
>
>

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 312.241.7800


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



--
--
Scott Mead
Sr. Architect
OpenSCG

В списке pgsql-general по дате отправления:

Предыдущее
От: Jerry Sievers
Дата:
Сообщение: Re: [GENERAL] WAL Archive command.
Следующее
От: Scott Mead
Дата:
Сообщение: Re: [GENERAL] Rsync to a recovering streaming replica?