Re: pg_basebackup connection closed unexpectedly...

From Mladen Marinović
Subject Re: pg_basebackup connection closed unexpectedly...
Date
Msg-id CAHjkqPRwHXw56GYAPO65ZTQsdFXG7YQ2nJX2LFOhyq-A3eF9wg@mail.gmail.com
In reply to Re: pg_basebackup connection closed unexpectedly...  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general


On Wed, Feb 12, 2020 at 4:09 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Mladen Marinović <mladen.marinovic@kset.org> writes:
> > Recently I am having some strange problems with pg_basebackup. About once a
> > week the backup process ends with an error message like this:
> > 2020-02-11 23:25:40 UTC [25790]: [1-1] user=replicator,db=[unknown] LOG:
> >  could not send data to client: Connection reset by peer
>
> Hmmm ....
>
> > The problem started occurring after a hardware (RAM + SSD) upgrade and an
> > OS Upgrade to Ubuntu 18.04. Both the server and backup process run in
> > separate docker containers on the same machine. This happens randomly on
> > multiple servers with the same configuration and it is probably not
> > hardware related. Also, this happens evenly on 9.4 and 9.6, and using the
> > same docker images that worked flawlessly on the previous installation.
> > I have been investigating the issue for at least a month and found no
> > problems in any log or metric before or after the event. I suspect that
> > this is related to some OS/docker parameter that is not well configured.
>
> How long does the backup run before failing?  If the connection were going
> between different machines my suspicions would lean toward a network
> timeout.  That seems somewhat unlikely in this configuration, but you
> never know.

The backup started at 23:00 and had copied 363 GB by the time the
connection was closed. A full backup of the database (approx. 1.1 TB)
usually takes about 2 hours. I also suspected the network, but the
connection runs over a virtual Docker bridge network on a single machine,
and the backup usually succeeds. If the failure occurred during other
operations (this is a production database) or on every backup, it would
be easier to narrow down, but it is annoyingly random.

> > Would increasing the database log level give me any more info about what
> > caused the connection to close?
>
> Nope, not directly.  It might be useful to figure out whether data
> transfer continues full throttle right up until the connection drop,
> or whether it stops sooner (and then there's some sort of timeout
> before the error occurs).

I can see that pg_basebackup has a verbose switch, but I am not sure it
will report what you describe. The log levels on the database are
currently:
client_min_messages = notice
log_min_messages = warning
log_min_error_statement = error

I assume I should raise the first two to at least debug1 to see anything
useful.
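
One way to answer the question above (does the transfer run at full speed
right up to the drop, or does it stall earlier?) would be to sample the
size of the backup target directory at a fixed interval while
pg_basebackup runs. The following is only a minimal Python sketch and not
something taken from this thread: the path /backup/base, the 10-second
interval, and the function names are assumptions chosen for illustration.

```python
import os
import time
from datetime import datetime, timezone


def dir_size_bytes(path):
    """Return the total size in bytes of all files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # Files can vanish or be renamed while the backup is running.
                pass
    return total


def throughput_mb_s(prev_bytes, cur_bytes, interval_s):
    """Average rate in MB/s (10^6 bytes) between two size samples."""
    return (cur_bytes - prev_bytes) / interval_s / 1e6


def monitor(path, interval_s=10, samples=None):
    """Print a timestamped size/rate line every interval_s seconds.

    Runs until interrupted when samples is None. The rate is approximate,
    because walking a large directory tree itself takes time.
    """
    prev = dir_size_bytes(path)
    count = 0
    while samples is None or count < samples:
        time.sleep(interval_s)
        cur = dir_size_bytes(path)
        stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
        rate = throughput_mb_s(prev, cur, interval_s)
        print(f"{stamp} size={cur} bytes rate={rate:.1f} MB/s", flush=True)
        prev = cur
        count += 1


# Hypothetical usage, with the output redirected to a file for later review:
# monitor("/backup/base", interval_s=10)
```

Comparing the last line that shows a non-zero rate with the timestamp of
the "could not send data to client" entry in the server log should show
whether the transfer stalled before the connection was reset. On a data
directory with many files, a longer interval may be more appropriate.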


>                         regards, tom lane

Regards,
Mladen Marinović
