Thread: replication terminated by primary server


replication terminated by primary server

From: Bruyninckx Kristof

 

In our environment, we have a master/slave replication setup that has been working stably for the last year.

The host systems run Debian Jessie and we are using PostgreSQL 9.4.

Recently the master crashed or hung, and after we restarted the PostgreSQL services on it, replication stopped working. The master itself seems to be running normally, apart from the errors reported when it was restarted. Since then, no errors have been reported.

 

[10192-1] [unknown]@[unknown] LOG: incomplete startup packet

[10222-1] [unknown]@[unknown] LOG: incomplete startup packet

[10033-2] LOG: replication terminated by primary server

[10033-3] DETAIL: End of WAL reached on timeline 2 at 999/A5687790.

[1082-12] LOG: invalid record length at 999/A5687790

[10239-1] LOG: started streaming WAL from primary at 999/A5000000 on timeline 2

[1064-7] LOG: startup process (PID 1082) exited with exit code 1

[1064-8] LOG: terminating any other active server processes

[18749-1] readonly@pal WARNING: terminating connection because of crash of another server process

[25793-1] _readonly@pal WARNING: terminating connection because of crash of another server process
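
As a reading aid for the log lines above, here is a minimal Python sketch that maps the reported position (timeline 2, 999/A5687790) to the WAL segment file that would contain it. It assumes the default 16 MB segment size and the file naming scheme used since PostgreSQL 9.3; the helper names are hypothetical and not part of any PostgreSQL tool.

```python
# Assumption: the cluster uses the default 16 MB WAL segment size.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(text):
    """Convert an LSN such as '999/A5687790' to a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def format_lsn(lsn):
    """Format a 64-bit integer back into the 'X/X' notation used in the logs."""
    return "%X/%X" % (lsn >> 32, lsn & 0xFFFFFFFF)


def wal_segment_name(timeline, lsn):
    """Return the name of the WAL segment file that contains the given LSN."""
    segment_number = lsn // WAL_SEGMENT_SIZE
    segments_per_xlog_id = 0x100000000 // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (
        timeline,
        segment_number // segments_per_xlog_id,
        segment_number % segments_per_xlog_id,
    )


def segment_start(lsn):
    """Return the LSN at which the segment containing lsn begins."""
    return lsn - (lsn % WAL_SEGMENT_SIZE)


end_of_wal = parse_lsn("999/A5687790")
print(wal_segment_name(2, end_of_wal))        # 0000000200000999000000A5
print(format_lsn(segment_start(end_of_wal)))  # 999/A5000000
```

The second value matches the "started streaming WAL from primary at 999/A5000000" line, which is consistent with the standby requesting WAL from the start of the segment that contains the position where it stopped.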

 

 

After a recent crash of the postgres master I'm not able to get the slave to start replicating.

 

I always get the following error message

 

[13247-2] HINT:  Future log output will go to log destination "syslog".

[13247-3] LOCATION:  PostmasterMain, postmaster.c:1228

[13248-1] LOG:  00000: database system was interrupted while in recovery at log time 2017-12-04 15:10:29 CET

[13248-2] HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.

[13248-3] LOCATION:  StartupXLOG, xlog.c:6134

[13248-4] LOG:  00000: entering standby mode

[13248-5] LOCATION:  StartupXLOG, xlog.c:6203

[13247-4] LOG:  00000: startup process (PID 13248) exited with exit code 1

[13247-5] LOCATION:  LogChildExit, postmaster.c:3452

[13247-6] LOG:  00000: aborting startup due to startup process failure

 

I’ve already tried to perform a complete backup and resync procedure on the slave:

pg_basebackup -D /var/lib/postgresql/backups/fullbackup -R -h <IP> --checkpoint=fast --username=<username> --xlog-method=stream

 

This completes without any error message. The odd thing is that the backup folder already contains a recovery.done file. When I run the same command on a test platform, no recovery.done is created.

But the test platform is running 9.5, so I am not sure whether that is related.

 

Also, the recovery.conf contains all the information it should, but the error message stays the same.

cat recovery.conf

recovery_target_timeline='latest'

standby_mode = 'on'

primary_conninfo = 'user=<user>  password=<passwd> host=IP port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres'
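
One way to double-check that a recovery.conf like the one above carries the settings a streaming standby needs is a small script. The following Python sketch is only an illustration: it handles single-quoted values and simple keyword=value pairs in primary_conninfo, and the function names and the list of required settings are assumptions, not part of PostgreSQL.

```python
import re

# Matches lines of the form: name = 'value'   (optional trailing # comment)
SETTING = re.compile(r"^\s*(\w+)\s*=\s*'((?:[^']|'')*)'\s*(?:#.*)?$")

# Assumption: these two settings are the minimum for a streaming standby on 9.4.
REQUIRED = ("standby_mode", "primary_conninfo")

SAMPLE = """\
recovery_target_timeline='latest'
standby_mode = 'on'
primary_conninfo = 'user=<user>  password=<passwd> host=IP port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres'
"""


def parse_recovery_conf(text):
    """Return a dict of the single-quoted settings found in the text."""
    settings = {}
    for line in text.splitlines():
        match = SETTING.match(line)
        if match:
            # A doubled quote inside a value stands for one literal quote.
            settings[match.group(1)] = match.group(2).replace("''", "'")
    return settings


def parse_conninfo(conninfo):
    """Split simple keyword=value pairs; quoted values are not handled."""
    return dict(pair.split("=", 1) for pair in conninfo.split())


def missing_settings(settings):
    """Return the required setting names that are absent."""
    return [name for name in REQUIRED if name not in settings]


conf = parse_recovery_conf(SAMPLE)
print(missing_settings(conf))
print(sorted(parse_conninfo(conf["primary_conninfo"])))
```

For the file shown above nothing is reported as missing, which agrees with the observation that the configuration itself looks complete.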

 

Does this mean that the corruption is on the master system and that it needs to be restored to a point before it crashed? I am not sure what I can do to get the replication working again.

Any ideas?

 

Kind Regards,

 

Kristof

 

 

 

Met vriendelijke groeten / Meilleures salutations / Best regards

Kristof Bruyninckx
System Engineer

 

E  Kristof.Bruyninckx@cegeka.com 
M +32 473 33 50 67

 

CEGEKA Universiteitslaan 9
B-3500 Hasselt, Belgium
T +32 11 24 02 34
WWW.CEGEKA.COM 


 

Re: replication terminated by primary server

From: Payal Singh


On Wed, Dec 6, 2017 at 8:07 AM, Bruyninckx Kristof <Kristof.Bruyninckx@cegeka.com> wrote:

 

> In our environment, we have a master/slave replication setup that has been working stably for the last year.
>
> The host systems run Debian Jessie and we are using PostgreSQL 9.4.
>
> Recently the master crashed or hung, and after we restarted the PostgreSQL services on it, replication stopped working. The master itself seems to be running normally, apart from the errors reported when it was restarted. Since then, no errors have been reported.


You might want to check the master's PostgreSQL logs from during and after the crash as well. Also, check WAL file progress on the master (select * from pg_stat_archiver).

--
Payal Singh
Graduate Student
Department of Computer Science and Electrical Engineering
University of Maryland, Baltimore County

RE: replication terminated by primary server

From: Bruyninckx Kristof

I’ve been going over the log of the system at the time of the crash, but I’m not seeing anything that stands out or tells me any more about the reason for either the crash or the failure to start the replication again. I’m attaching a part of the log file to this mail.

 

Correct me if I’m wrong, but “select * from pg_stat_archiver” is linked to WAL archiving, correct? Currently this system has archiving switched off. For backup purposes we run scheduled pg_dumps of each database.

So it didn’t give me any output, which I think is normal since we switched archiving off.

 

To set up the replication we used pg_basebackup with the --xlog-method=stream option. This way pg_basebackup not only copies the data as it is, but also streams the XLOG generated during the base backup to our destination server.

I’m not sure what the problem is, since it appears to be able to perform these actions without reporting any error.

 

Cheers,

 

Kristof

 

From: Payal Singh [mailto:payals1@umbc.edu]
Sent: Thursday, 7 December 2017 19:29
To: Bruyninckx Kristof <Kristof.Bruyninckx@cegeka.com>
Cc: pgsql-general@lists.postgresql.org
Subject: Re: replication terminated by primary server


Attachments