Re: BUG #15638: pg_basebackup with --wal-method=stream incorrectlygenerates WAL segment created during backup

Поиск
Список
Период
Сортировка
От Andre Piwoni
Тема Re: BUG #15638: pg_basebackup with --wal-method=stream incorrectlygenerates WAL segment created during backup
Дата
Msg-id CAEC-+VH5PrQyrVHZU-t55xfzUwhZ=XSFGtt6M13Aai8rqNMkcQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: BUG #15638: pg_basebackup with --wal-method=stream incorrectlygenerates WAL segment created during backup  (Michael Paquier <michael@paquier.xyz>)
Ответы Re: BUG #15638: pg_basebackup with --wal-method=stream incorrectlygenerates WAL segment created during backup  (Michael Paquier <michael@paquier.xyz>)
Список pgsql-bugs
The reason you may not see a lot of people complaining about fetch vs. stream mode while using pg_basebackup seems to be largely related to the way many people implement failover. Mainly, when you look at majority of all the failover scripts online when failover happens these scripts take pg_basebackup (directly or via repmgr) on slave(s) as opposed to using WAL archiving+streaming for recovery which is extremely inefficient for large databases. See my point here: https://www.pgpool.net/pipermail/pgpool-general/2019-February/006464.html This is ridiculous given WAL-based system, but I suspect a lot of people are bumping into the same issue given that you mentioned this behavior has not changed.

Also, what do you mean that my archive command is not safe? It is straight from PostrgreSQL docs and other resources:

PostgreSQL docs do not mandate separate location for WAL archive for each server. There's a mention of making sure that when cleaning up WAL archive location that all slaves processed segments which makes sense. I'm not going to use replication slots either because this requires very sophisticated monitoring and maintenance to ensure database space does not blow up.

So far I have no problems doing multiple failovers when creating slaves with base backup in fetch mode but your response is not re-assuring. The only downside that I see from docs is that primary may have removed WAL segment during heavy load (default max_wal_size is 1GB) but then pg_basebackup would fail.

It seems you agree that segment created during pg_basebackup with stream mode, which is not marked with .done extension, would be attempted to be archived again. I also understand stream method during backup is default so changing this behavior may have an impact. If there's any free solution you could recommend for backup strategy please let me know.

Thanks,
Andre Piwoni

On Tue, Feb 19, 2019 at 7:50 PM Michael Paquier <michael@paquier.xyz> wrote:
On Tue, Feb 19, 2019 at 09:25:57AM -0800, Andre Piwoni wrote:
> I call pg_ctl -D /var/lib/pgsql/10/data promote to upgrade slave to master
> when failover happens. archive_mode is set to "on" and not "always".
> I repoint slave to the master by stopping it, updating recovery.conf and
> restarting it. Let me know if I'm doing it wrong.

As long as you stop the primary cleanly (stop or smart mode) so as the
primary has the possibility to send its shutdown checkpoint record to
the standby and makes sure that the standby has flushed the record,
that's safe.

> I think this problem is created before promotion when new slave is created
> using pg_basebackup with --wal-method=stream and manifests when actual
> promotion happens.
> What I'm trying to say it does not seem that .partial extension is the
> issue here but lack of .done extension.

Well, sure.  If you begin by reusing an old backup, you have a risk to
potentially archive the same segment multiple times if you use the
same archive location for all your servers.  Since 9.5 this can get
even more complication as archive_mode has gained an "always" mode
which makes also standbys archive segments while in recovery to give
the users a switch for more archiving redundancy, which is useful when
working with async standbys across multiple sites.  My point is that
this stuff has always worked this way.  And people around do not
actually complain about the difference made for archive_status/ when
using the stream of fetch methods with pg_basebackup.  From what I can
see as well, your archive_command is actually unsafe on many points,
so my take is that you should more carefully design it, or rely on an
existing backup solution developed by experts in PostgreSQL backups.
And no, it is not safe to change a behavior that other people may rely
heavily on for their solutions since pg_basebackup got smarter with
its stream mode.
--
Michael


--

Andre Piwoni

Sr. Software Developer, BI/Database

WebMD Health Services

Mobile: 801.541.4722

www.webmdhealthservices.com

В списке pgsql-bugs по дате отправления:

Предыдущее
От: "David G. Johnston"
Дата:
Сообщение: Re: BUG #15646: Inconsistent behavior for current_setting/set_config
Следующее
От: Jerry Sievert
Дата:
Сообщение: Re: BUG #15646: Inconsistent behavior for current_setting/set_config