Re: Reliable WAL file shipping over unreliable network

From: Laurenz Albe
Subject: Re: Reliable WAL file shipping over unreliable network
Msg-id: 1519852616.13006.7.camel@cybertec.at
In reply to: Re: Reliable WAL file shipping over unreliable network  (Nagy László Zsolt <gandalf@shopzeus.com>)
List: pgsql-admin
Nagy László Zsolt wrote:
> > >  Do I have to copy
> > > segments to temp files, and rename them when they are fully flushed to
> > > disk? Or is it okay to have half complete files in the archive dir for a
> > > while?
> > 
> > I suppose you are talking about "archive_command" here.
> > 
> > If the file restored with "restore_command" is too small,
> > the operation fails, and you get a DEBUG1 message:
> > 
> >   archive file "..." has wrong size: ... instead of ...
> > 
> > So nothing can go wrong there.
> 
> Nothing can go wrong? Does it mean that PostgreSQL will re-execute the
> restore_command if the file was too small? If it won't retry the
> restore_command, then everything goes wrong. It might be documented
> somewhere, but apparently I'm out of luck with documentations.

To verify that restore_command won't accept a file that's too short,
read the code in backend/access/transam/xlogarchive.c
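
If you don't want to dig through the C source, the idea of that check can be
sketched in a few lines of Python. This is only an illustration of the size
comparison, not the server's actual code: the function name is made up, and
the 16 MB segment size is the default, which I am assuming here.

```python
import os

# Default WAL segment size (16 MB). Assumed for this sketch; a cluster
# can be built or initialized with a different value.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def restored_file_is_usable(path, expected_size=WAL_SEGMENT_SIZE):
    """Hypothetical helper: accept a restored file only if it has exactly
    the expected size, and report a mismatch otherwise."""
    try:
        actual_size = os.stat(path).st_size
    except FileNotFoundError:
        # Nothing was restored at all.
        return False
    if actual_size != expected_size:
        print('archive file "%s" has wrong size: %d instead of %d'
              % (path, actual_size, expected_size))
        return False
    return True
```

A half-copied segment is rejected like a missing one, so the caller simply
tries again later instead of replaying a truncated file.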

The behavior for streaming replication with a WAL archive is documented here:
https://www.postgresql.org/docs/current/static/warm-standby.html#STANDBY-SERVER-OPERATION

 "At startup, the standby begins by restoring all WAL available in the archive
  location, calling restore_command. Once it reaches the end of WAL available
  there and restore_command fails, it tries to restore any WAL available in the
  pg_wal directory. If that fails, and streaming replication has been configured,
  the standby tries to connect to the primary server and start streaming WAL
  from the last valid record found in archive or pg_wal. If that fails or
  streaming replication is not configured, or if the connection is later
  disconnected, the standby goes back to step 1 and tries to restore the file
  from the archive again. This loop of retries from the archive, pg_wal, and
  via streaming replication goes on until the server is stopped or failover
  is triggered by a trigger file."
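
The loop described in that paragraph can be modelled roughly as below. This is
a sketch under my own assumptions, not how the server is written: the three
sources are passed in as hypothetical callables that each replay as much WAL
as they can and return when they fail, and the pause between rounds is an
assumed parameter.

```python
import time


def standby_loop(sources, should_stop, sleep=time.sleep, retry_delay=5.0):
    """Cycle through the WAL sources in order until asked to stop.

    sources     -- list of (name, replay_all) pairs, for example archive,
                   pg_wal and streaming; replay_all() is a hypothetical
                   callable that returns how much it replayed before failing
    should_stop -- callable returning True on shutdown or failover
    """
    trace = []
    while not should_stop():
        for name, replay_all in sources:
            # Each source is drained until it fails, then the next is tried.
            trace.append((name, replay_all()))
        # All sources failed for now: wait, then start again from step 1.
        sleep(retry_delay)
    return trace
```

The point for your question is the outer "while": a restore_command that
fails, or that delivers a file of the wrong size, is not the end of the story.
The standby comes back to the archive on the next round.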

> > > And finally: if I also enable streaming replication, then it seems that
> > > log file shipping is not needed at all. If I omit archive_command and
> > > restore_command from the configs, and setup the replication slots and
> > > primary_conninfo only, then it seems to be working just fine. But when
> > > the network goes down for a while, then the slave goes out of sync and
> > > it cannot recover. It was not clear for me from the documentation, but
> > > am I right in that I can combine log file shipping with streaming
> > > replication, and achieve small replication delays plus the ability to
> > > recover after a longer period of network outage?
> > 
> > If you use a replication slot, the standby will never get out of sync
> > because the primary will retain all WAL that the standby has not
> > received yet.
> > 
> > Streaming replication together with archive recovery is only useful
> > if you are *not* using replication slots.
> 
> So you are saying that if I use replication slots, then I can completely
> forget about manual WAL file shipping.

Precisely.

>  There is one thing in the docs
> that contradicts the above statement.
> 
> This is from
> https://www.postgresql.org/docs/10/static/warm-standby.html#STREAMING-REPLICATION
> 
> > If you use streaming replication without file-based continuous
> > archiving, the server might recycle old WAL segments before the
> > standby has received them. If this occurs, the standby will need to be
> > reinitialized from a new base backup.

This only talks about the case where you do not use replication slots.

Read https://www.postgresql.org/docs/current/static/warm-standby.html#STREAMING-REPLICATION-SLOTS:

 "Replication slots provide an automated way to ensure that the master does not
  remove WAL segments until they have been received by all standbys, and that
  the master does not remove rows which could cause a recovery conflict even
  when the standby is disconnected."

The key word is "ensure".
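
To make "ensure" a bit more concrete, here is a simplified model of the rule:
a segment may only go away once every slot's restart position (the restart_lsn
column in pg_replication_slots) has moved past it. The function names are
hypothetical, the 16 MB segment size is the assumed default, and the real
server takes more into account than this sketch does.

```python
# Default WAL segment size (16 MB), assumed for this sketch.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(text):
    """Convert an LSN written as 'hi/lo' in hex, e.g. '0/3000060',
    into a byte position in the WAL stream."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def oldest_segment_to_keep(slot_restart_lsns, segment_size=WAL_SEGMENT_SIZE):
    """Number of the oldest segment still needed by any slot,
    or None if there are no slots."""
    if not slot_restart_lsns:
        return None
    return min(parse_lsn(lsn) for lsn in slot_restart_lsns) // segment_size


def may_remove(segment_no, slot_restart_lsns, segment_size=WAL_SEGMENT_SIZE):
    """True if no slot still needs the given segment (simplified rule)."""
    oldest = oldest_segment_to_keep(slot_restart_lsns, segment_size)
    return oldest is None or segment_no < oldest
```

With slots at 0/3000060 and 0/5000000, segment 2 may be removed and segment 3
may not, however long the slower standby stays disconnected.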

Yours,
Laurenz Albe


