Thread: pg_basebackup fails: could not receive data from WAL stream: server closed the connection unexpectedly


pg_basebackup fails: could not receive data from WAL stream: server closed the connection unexpectedly

From
AYahorau@ibagroup.eu
Date:
Hello PostgreSQL Community!


Not so long ago I ran into a problem synchronizing a database with the pg_basebackup utility on a Linux SLES 12 machine running PostgreSQL 10.4:

pg_basebackup -h host01 -U dbuser  -D /var/PostgresDb   -w
LOG:  standby "pg_basebackup" is now a synchronous standby with priority 1
pg_basebackup: could not receive data from WAL stream: server closed the connection unexpectedly
                This probably means the server terminated abnormally
                before or while processing the request.
pg_basebackup: child process exited with error 1
pg_basebackup: removing contents of data directory "/var/RtpPostgresDb"


The PostgreSQL log on the master server contains the following error entry:
terminating walsender process due to replication timeout

Currently my configuration is the following:
wal_sender_timeout = 1s
wal_receiver_timeout = 1s
wal_receiver_status_interval = 10s

I see that other people have run into this situation:
https://www.postgresql.org/message-id/109968546.289.1460409481048.JavaMail.pbrunnen%40Station8.local
The suggestion there was to increase the wal_sender_timeout parameter.


On the one hand, I understand that as my database grows it will require a larger wal_sender_timeout value. On the other hand, I set wal_sender_timeout = 1s on purpose, in order to detect a bad network.

Could you please suggest an appropriate configuration, sensible relationships between these configuration parameters, or any other approach to database synchronization that avoids this issue?

Thank you in advance,
Andrei Yahorau

Re: pg_basebackup fails: could not receive data from WAL stream: server closed the connection unexpectedly

From
Shreeyansh Dba
Date:
Hi Andrei Yahorau,

It is difficult to give an exact recommendation. However, we reproduced your exact error with the same parameters in our test environment, and we were able to resolve it by increasing the 'wal_sender_timeout' parameter value.

I think you need to test and verify this against your business and configuration requirements before implementing it on your server.




On Mon, Dec 3, 2018 at 2:17 PM <AYahorau@ibagroup.eu> wrote:

Re: pg_basebackup fails: could not receive data from WAL stream: server closed the connection unexpectedly

From
AYahorau@ibagroup.eu
Date:
Hi,
Thank you very much for the response.

I reckon we can return to the more conventional approach to Postgres database synchronization:
1) SELECT pg_start_backup('label', true);
2) rsync/cp  $PGDATA directory;
3) SELECT pg_stop_backup();


In my opinion this approach looks better because it does not depend on the wal_sender_timeout value.

I have a question: what is your opinion of the pg_basebackup utility and its behaviour under this condition? Is it a bug? Should it be fixed?

Best regards,
Andrei



From:        Shreeyansh Dba <shreeyansh2014@gmail.com>
To:        AYahorau@ibagroup.eu,
Cc:        pgsql-admin <pgsql-admin@postgresql.org>, MikalaiKeida@ibagroup.eu
Date:        03/12/2018 15:15
Subject:        Re: pg_basebackup fails: could not receive data from WAL stream: server closed the connection unexpectedly





Re: pg_basebackup fails: could not receive data from WAL stream: server closed the connection unexpectedly

From
Achilleas Mantzios
Date:
On 4/12/18 2:48 PM, AYahorau@ibagroup.eu wrote:
> Hi,
> Thank you very much for the response.
>
> I reckon we can return to more conventional approach of postgres db synchronization:
> 1) SELECT pg_start_backup('label', true);
> 2) rsync/cp  $PGDATA directory;
> 3) SELECT pg_stop_backup();
>
> In my opinion this approach looks better because it does not depend on wal_sender_timeout value.

But it depends on WAL archiving to a safe location. Do you have this enabled, working and tested?

> I have a question. What is your opinion about pg_basebackup utility and its behaviour for this condition?  Is it a bug? Should it be fixed?
>
> Best regards,
> Andrei






-- 
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt

Re: pg_basebackup fails: could not receive data from WAL stream: server closed the connection unexpectedly

From
Stephen Frost
Date:
Greetings,

* AYahorau@ibagroup.eu (AYahorau@ibagroup.eu) wrote:
> Not so long ago I faced the problem of database synchronization using
> pg_basebackup utility on linux SLES 12 machine using PostgreSQL 10.4:
>
> pg_basebackup -h host01 -U dbuser  -D /var/PostgresDb   -w
> LOG:  standby "pg_basebackup" is now a synchronous standby with priority 1
> pg_basebackup: could not receive data from WAL stream: server closed the
> connection unexpectedly
>                  This probably means the server terminated abnormally
>                  before or while processing the request.
> pg_basebackup: child process exited with error 1
> pg_basebackup: removing contents of data directory "/var/RtpPostgresDb"
>
>
> PostgreSQL log of master server contain the following error entry:
> terminating walsender process due to replication timeout
>
> Currently my configuration is the following:
> wal_sender_timeout = 1s
> wal_receiver_timeout = 1s
> wal_receiver_status_interval = 10s

pg_basebackup has a status-interval option (called statusint in older
versions).  Have you tried setting that option...?

Also, seems like you'd really need to also be running pg_receivewal...
At some point the pg_basebackup will end and disconnect.  Or you could
run a replica instead of using pg_basebackup/pg_receivewal and then
take regular backups and archive your WAL with something like
pgbackrest.

Thanks!

Stephen


Re: pg_basebackup fails: could not receive data from WAL stream: server closed the connection unexpectedly

From
Stephen Frost
Date:
Greetings,

* AYahorau@ibagroup.eu (AYahorau@ibagroup.eu) wrote:
> I reckon we can return to more conventional approach of postgres db
> synchronization:
> 1) SELECT pg_start_backup('label', true);
> 2) rsync/cp  $PGDATA directory;
> 3) SELECT pg_stop_backup();

It doesn't seem clear what the goal here is- if you are looking to have
two DB servers that are synchronized, then using pg_basebackup to get
the initial copy and then running PostgreSQL as a replica would be the
right approach.

I certainly wouldn't recommend trying to hack together something with
rsync or cp or using the exclusive backup mode at all- if the system
crashes when that exclusive backup is happening, the database won't come
back up.

> I have a question. What is your opinion about pg_basebackup utility and
> its behaviour for this condition?  Is it a bug? Should it be fixed?

No, I don't see any bug here, but if you adjust the timeout values on
the server then you need to tell pg_basebackup to send messages to the
server more frequently or it's going to get timed out.  That's what the
--status-interval option in pg_basebackup is for.
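
As a minimal sketch of that suggestion, the helper below only prints (it does not run) the pg_basebackup command from the original report with an explicit --status-interval added. The function name basebackup_cmd is hypothetical; host01, dbuser and /var/PostgresDb are taken from the report above.

```shell
# Sketch only: print the command instead of executing it, so it is safe to
# try without a running server. basebackup_cmd is a hypothetical helper name.
basebackup_cmd() {
    status_interval="$1"   # seconds between status packets sent to the server
    printf 'pg_basebackup -h host01 -U dbuser -D /var/PostgresDb -w --status-interval=%s\n' \
        "$status_interval"
}

# Show the command for a one-second status interval.
basebackup_cmd 1
```

The interval is given in whole seconds, so 1 is the most frequent feedback that can be requested this way.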

Thanks!

Stephen


Re: pg_basebackup fails: could not receive data from WAL stream: server closed the connection unexpectedly

From
AYahorau@ibagroup.eu
Date:
Thanks Stephen,

Is there any relation or dependency between status-interval and wal_sender_timeout?


I am asking because even if I set status-interval for pg_basebackup to 1 second (the most frequent feedback), I get the same error:
pg_basebackup: could not receive data from WAL stream: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
pg_basebackup: child process exited with error 1

because the server terminates the walsender process due to replication timeout.

Best regards,
Andrei



From:        Stephen Frost <sfrost@snowman.net>
To:        AYahorau@ibagroup.eu,
Cc:        Shreeyansh Dba <shreeyansh2014@gmail.com>, MikalaiKeida@ibagroup.eu, pgsql-admin <pgsql-admin@postgresql.org>
Date:        04/12/2018 17:05
Subject:        Re: pg_basebackup fails: could not receive data from WAL stream: server closed the connection unexpectedly





Re: pg_basebackup fails: could not receive data from WAL stream: server closed the connection unexpectedly

From
Stephen Frost
Date:
Greetings,

* AYahorau@ibagroup.eu (AYahorau@ibagroup.eu) wrote:
> Is there any relation/dependency between status-interval and
> wal_sender_timeout?

Yes, if pg_basebackup doesn't ping the server with a status interval
within wal_sender_timeout amount of time then the server is going to
think it's disappeared.

> I am asking this because even if I set status-interval for pg_basebackup
> to 1 second (the most frequent feedback) I get the same error:

Sure- if they're both set to 1s, you're likely to still see the issue.
Networks take time and systems can end up being busy, so having such a
very tight timeout is, frankly, unlikely to work out all that well for
you regardless.

You could try setting wal_sender_timeout to 2s or maybe 5s and see if
that works better.
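
The relationship described above can be sketched as a small pre-flight check: the server-side wal_sender_timeout should exceed the client's status interval with some headroom. The helper name timeout_has_headroom and the default margin of 2 are illustrative assumptions, not values from the PostgreSQL documentation; the right margin depends on network latency and system load.

```shell
# Hypothetical helper: succeeds (exit 0) when wal_sender_timeout is at least
# `margin` times the client's status interval. The default margin of 2 is an
# illustrative assumption, not a PostgreSQL rule.
timeout_has_headroom() {
    wal_sender_timeout_s="$1"   # server-side wal_sender_timeout, whole seconds
    status_interval_s="$2"      # pg_basebackup --status-interval, whole seconds
    margin="${3:-2}"            # required ratio between the two values
    [ "$wal_sender_timeout_s" -ge $(( status_interval_s * margin )) ]
}

# Compare the timeouts discussed above against a 1s status interval.
for timeout in 1 2 5; do
    if timeout_has_headroom "$timeout" 1; then
        echo "wal_sender_timeout=${timeout}s, status-interval=1s: has headroom"
    else
        echo "wal_sender_timeout=${timeout}s, status-interval=1s: too tight"
    fi
done
```

A stricter margin can be passed as the third argument if testing shows that a factor of 2 is not enough on a given network.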

Thanks!

Stephen


Re: pg_basebackup fails: could not receive data from WAL stream: server closed the connection unexpectedly

From
AYahorau@ibagroup.eu
Date:
Thanks for the suggestion.

I understand that wal_sender_timeout should be increased. In my tests, with status-interval=1s it was necessary to set wal_sender_timeout to 15 seconds.


Nevertheless, I found that if I call
pg_basebackup -h host01 -U dbuser  -D /var/PostgresDb   --wal-method=fetch -w
(using the fetch method instead of the default stream method),
pg_basebackup completes successfully.

Best regards,
Andrei



From:        Stephen Frost <sfrost@snowman.net>
To:        AYahorau@ibagroup.eu,
Cc:        MikalaiKeida@ibagroup.eu, pgsql-admin <pgsql-admin@postgresql.org>, Shreeyansh Dba <shreeyansh2014@gmail.com>
Date:        04/12/2018 18:38
Subject:        Re: pg_basebackup fails: could not receive data from WAL stream: server closed the connection unexpectedly



