Re: Standby trying "restore_command" before local WAL

From: Alexander Kukushkin
Subject: Re: Standby trying "restore_command" before local WAL
Msg-id: CAFh8B==eaXBUe6F6FC1kjZ6cgQLCPzhS8okLQg3mFsDBkkikLA@mail.gmail.com
In reply to: Re: Standby trying "restore_command" before local WAL  (Stephen Frost <sfrost@snowman.net>)
Responses: Re: Standby trying "restore_command" before local WAL
List: pgsql-hackers

Hi,

2018-07-31 20:25 GMT+02:00 Stephen Frost <sfrost@snowman.net>:
>
>
> There's still a question here, at least from my perspective, as to which
> is actually going to be faster to perform recovery based off of.  A good
> restore command, which pre-fetches the WAL in parallel and gets it local
> and on the same filesystem, meaning that the restore_command only has to
> execute essentially a 'mv' and return back to PG for the next WAL file,
> is really rather fast, compared to streaming that same data over the
> network with a single TCP connection to the primary.  Of course, there's
> a lot of variables there and it depends on the network speed between the
> various pieces, but I've certainly had cases where a replica catches up
> much faster using restore command than streaming from the primary.


Sure, mv is incredibly fast, but not calling an external script or
binary at all is still faster than calling one.

What about the following cases?
1. The replica host crashed, and pg_wal holds a few thousand WAL files.
2. We are creating a new replica with pg_basebackup -X stream; it
takes a long time and again leaves a few thousand WAL files.

In both cases, if there is no restore_command in recovery.conf,
postgres will happily read WAL files from pg_wal, and only when
nothing is left will it try to start streaming.

But if restore_command is defined, postgres will always call it, for
every single WAL file it wants to restore.
If the restore_command exits with a non-zero exit code, postgres
happily restores the file from pg_wal!
And only if the file is not there, or is not valid, does postgres try
to start streaming.

From my point of view, there is no difference between having no
restore_command and relying only on streaming replication, and having
a restore_command which always fails.
Therefore I don't really understand why we stick to
"restore_command => pg_wal => streaming" and why it is not possible to
change it to "pg_wal => restore_command => streaming" or maybe even
"pg_wal => streaming => restore_command".
I am not sure about the last option, but in any case, before going to
some remote place, postgres should try to find (and try to replay) the
WAL file in pg_wal.
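
To make the cost difference concrete, here is a toy model of the two
orderings. It is only a sketch, not PostgreSQL code: the names replay,
make_sources, CURRENT_ORDER and PROPOSED_ORDER are made up for
illustration, streaming is assumed to always succeed, and the source is
chosen independently for every segment (the real startup process keeps
more state than that). It only counts how often the external
restore_command would be spawned.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical labels for the two lookup orders discussed above.
CURRENT_ORDER = ["restore_command", "pg_wal", "streaming"]
PROPOSED_ORDER = ["pg_wal", "restore_command", "streaming"]


def make_sources(
    local: Set[str], archive: Set[str], calls: Dict[str, int]
) -> Dict[str, Callable[[str], bool]]:
    """Build toy WAL sources; each returns True if it can supply the segment."""

    def pg_wal(segment: str) -> bool:
        # Plain local lookup, no external process involved.
        return segment in local

    def restore_command(segment: str) -> bool:
        # Every attempt costs one external process, even when it fails.
        calls["restore_command"] += 1
        return segment in archive

    def streaming(segment: str) -> bool:
        # Assumption: the primary can always supply the segment.
        return True

    return {
        "pg_wal": pg_wal,
        "restore_command": restore_command,
        "streaming": streaming,
    }


def replay(
    segments: List[str], order: List[str], local: Set[str], archive: Set[str]
) -> Tuple[Dict[str, str], int]:
    """Return (segment -> source that served it, restore_command call count)."""
    calls = {"restore_command": 0}
    sources = make_sources(local, archive, calls)
    served_by: Dict[str, str] = {}
    for segment in segments:
        for name in order:
            if sources[name](segment):
                served_by[segment] = name
                break
    return served_by, calls["restore_command"]


if __name__ == "__main__":
    # 3000 segments already sit in pg_wal; the archive has none of them,
    # so restore_command always fails.
    segments = [f"{i:024X}" for i in range(3000)]
    for label, order in (("current", CURRENT_ORDER), ("proposed", PROPOSED_ORDER)):
        served, spawned = replay(segments, order, set(segments), set())
        print(label, "restore_command calls:", spawned)
```

In this model both orders end up replaying every segment from pg_wal,
but the current order spawns the external command 3000 times first,
while the proposed order never spawns it.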

Regards,
--
Alexander Kukushkin

