Re: [HACKERS] gitlab post-mortem: pg_basebackup waiting for checkpoint

From Magnus Hagander
Subject Re: [HACKERS] gitlab post-mortem: pg_basebackup waiting for checkpoint
Date
Msg-id CABUevExpVYuLUgoNgYNNHxFmZqo3PuuaKgcVwYE5B5wCGScZkQ@mail.gmail.com
In reply to [HACKERS] gitlab post-mortem: pg_basebackup waiting for checkpoint  (Michael Banck <michael.banck@credativ.de>)
Responses Re: [HACKERS] gitlab post-mortem: pg_basebackup waiting for checkpoint  (Michael Banck <michael.banck@credativ.de>)
List pgsql-hackers


On Sat, Feb 11, 2017 at 10:38 AM, Michael Banck <michael.banck@credativ.de> wrote:
> Hi,
>
> one take-away from the Gitlab Post-Mortem[1] appears to be that after
> their secondary lost replication, they were confused about what
> pg_basebackup was doing when they tried to rebuild it. It just sat there
> and did nothing (even with --verbose), so they assumed something was
> wrong with either the primary or the connection, and restarted it
> several times.
>
> AFAICT, it turns out the checkpoint was written on the master (they
> probably did not use -c fast), but this wasn't obvious to them:


Yeah, I've seen this happen to a number of people, and it sounds like that's what happened here as well. I've considered something along the lines of the patch you posted, but never got around to actually doing anything about it.



> ISTM that even with WAL streaming, nothing would be written on the
> client server until the checkpoint is complete, as do_pg_start_backup()
> runs the checkpoint and only returns the starting WAL location
> afterwards.
>
> The attached (untested) patch is to kick off a discussion on how to
> improve the situation. It is supposed to mention the checkpoint when
> --verbose is used, and it adds a paragraph about the checkpoint being
> run to the Notes section of the documentation.


Docs look good to me, other than claiming that pg_basebackup runs on a server (it can run anywhere). I would just say "during which pg_basebackup will appear idle". How does that sound to you?

As for the code, while I haven't tested it, isn't the "checkpoint completed" message in the wrong place? Doesn't PQsendQuery() return immediately, meaning the message needs to go *after* the PQgetResult() call?
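
To illustrate the ordering concern, here is a minimal, self-contained sketch in C. It deliberately does not use libpq: FakeConn, fake_send_query() and fake_get_result() are hypothetical stand-ins for a connection, PQsendQuery() and PQgetResult(), and the message strings and the start_backup() helper are made up for illustration rather than taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a client connection; not a libpq type. */
typedef struct
{
    bool query_sent;       /* command has been dispatched to the server */
    bool checkpoint_done;  /* server has finished the checkpoint */
} FakeConn;

/* Stand-in for PQsendQuery(): dispatches the command and returns at once. */
bool fake_send_query(FakeConn *conn)
{
    conn->query_sent = true;
    return true;
}

/*
 * Stand-in for PQgetResult(): in the real client this is where the wait
 * for the server-side checkpoint would happen.
 */
bool fake_get_result(FakeConn *conn)
{
    if (!conn->query_sent)
        return false;
    conn->checkpoint_done = true;
    return true;
}

/* Append one line to a caller-supplied log buffer, truncating if full. */
void append_line(char *log, size_t logsize, const char *line)
{
    size_t used = strlen(log);

    if (used < logsize)
        snprintf(log + used, logsize - used, "%s\n", line);
}

/*
 * Announce the wait right after sending the command, and report
 * completion only after the result has been received.
 */
bool start_backup(FakeConn *conn, bool verbose, char *log, size_t logsize)
{
    assert(log != NULL && logsize > 0);
    log[0] = '\0';

    if (!fake_send_query(conn))
        return false;
    if (verbose)
        append_line(log, logsize, "waiting for checkpoint");

    if (!fake_get_result(conn))
        return false;
    if (verbose)
        append_line(log, logsize, "checkpoint completed");

    return true;
}
```

With verbose set, the log buffer holds the "waiting" line before the "completed" line, and the latter is only written once fake_get_result() has returned. Printing "completed" directly after the send call would report success before the server had done any work.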

--
