replication terminated by primary server

From Bruyninckx Kristof
Subject replication terminated by primary server
Date
Msg-id 2e68e11d019147508ba6c155efdfdd35@SVINTMAIL03.cegekanv.corp.local
Replies Re: replication terminated by primary server  (Payal Singh <payals1@umbc.edu>)
List pgsql-general

 

In our environment, we have a master/slave replication setup that has been running stably for the last year.

The host systems run Debian Jessie, and we are using PostgreSQL 9.4.

Recently the master crashed or hung, and after the PostgreSQL services on it were restarted, replication stopped working. The master itself seems to be running normally, apart from the errors reported when it was restarted. Since then, nothing error-related has been logged.

 

[10192-1] [unknown]@[unknown] LOG: incomplete startup packet

[10222-1] [unknown]@[unknown] LOG: incomplete startup packet

[10033-2] LOG: replication terminated by primary server

[10033-3] DETAIL: End of WAL reached on timeline 2 at 999/A5687790.

[1082-12] LOG: invalid record length at 999/A5687790

[10239-1] LOG: started streaming WAL from primary at 999/A5000000 on timeline 2

[1064-7] LOG: startup process (PID 1082) exited with exit code 1

[1064-8] LOG: terminating any other active server processes

[18749-1] readonly@pal WARNING: terminating connection because of crash of another server process

[25793-1] _readonly@pal WARNING: terminating connection because of crash of another server process
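
The two WAL positions in the excerpt above can be compared numerically. The following is a minimal Python sketch, not part of the original report. It assumes the default WAL segment size of 16 MB (the real value depends on how the server was built) and treats an LSN as the two hexadecimal halves of a 64-bit byte position. The helper names `parse_lsn` and `segment_start` are hypothetical.

```python
# Assumption: default WAL segment size of 16 MB; a custom build may differ.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(lsn: str) -> int:
    """Convert an LSN such as '999/A5687790' into a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def segment_start(position: int) -> int:
    """Return the byte position at which the containing WAL segment begins."""
    return position - (position % WAL_SEGMENT_SIZE)


# Values copied from the log excerpt.
end_of_wal = parse_lsn("999/A5687790")    # "End of WAL reached ... at"
stream_start = parse_lsn("999/A5000000")  # "started streaming WAL from primary at"

same_segment = segment_start(end_of_wal) == stream_start
offset_in_segment = end_of_wal - stream_start

print("same segment:", same_segment)
print("offset of the reported position inside the segment:", hex(offset_in_segment))
```

Under that assumption, both positions fall in the same segment, which is consistent with the standby retrying from the start of the segment that contains the record it could not read.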

 

 

Since that crash of the PostgreSQL master, I have not been able to get the slave to start replicating.

 

I always get the following error messages:

 

[13247-2] HINT:  Future log output will go to log destination "syslog".

[13247-3] LOCATION:  PostmasterMain, postmaster.c:1228

[13248-1] LOG:  00000: database system was interrupted while in recovery at log time 2017-12-04 15:10:29 CET

[13248-2] HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.

[13248-3] LOCATION:  StartupXLOG, xlog.c:6134

[13248-4] LOG:  00000: entering standby mode

[13248-5] LOCATION:  StartupXLOG, xlog.c:6203

[13247-4] LOG:  00000: startup process (PID 13248) exited with exit code 1

[13247-5] LOCATION:  LogChildExit, postmaster.c:3452

[13247-6] LOG:  00000: aborting startup due to startup process failure
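
To see which process wrote which line, the excerpt can be grouped by its bracketed prefix. This is a rough Python sketch under one assumption: the prefix has the form `[<pid>-<line number>]`, which matches the log itself (the process reported as "PID 13248" writes the `[13248-n]` lines). The function name `group_by_pid` is made up for illustration.

```python
import re
from collections import defaultdict

# Assumption: each line starts with "[<pid>-<per-process line number>]".
PREFIX = re.compile(r"^\[(\d+)-(\d+)\]\s+(.*)$")


def group_by_pid(lines):
    """Group log lines by process id, ordered by their per-process line number."""
    grouped = defaultdict(list)
    for line in lines:
        match = PREFIX.match(line.strip())
        if match:
            pid, seq, message = match.groups()
            grouped[int(pid)].append((int(seq), message))
    return {pid: sorted(entries) for pid, entries in grouped.items()}


# A few lines copied from the excerpt above.
sample = [
    "[13248-1] LOG:  00000: database system was interrupted while in recovery at log time 2017-12-04 15:10:29 CET",
    "[13248-4] LOG:  00000: entering standby mode",
    "[13247-4] LOG:  00000: startup process (PID 13248) exited with exit code 1",
    "[13247-6] LOG:  00000: aborting startup due to startup process failure",
]

grouped = group_by_pid(sample)
for pid, entries in grouped.items():
    for seq, message in entries:
        print(pid, seq, message)
```

Grouped this way, the last message from process 13248 in the excerpt is "entering standby mode"; no ERROR or FATAL line from that process appears before the postmaster (13247) reports that it exited with code 1.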

 

I’ve already tried a complete backup and resync procedure on the slave:

pg_basebackup -D /var/lib/postgresql/backups/fullbackup -R -h <IP> --checkpoint=fast --username=<username> --xlog-method=stream

 

This completes without any error message. The odd thing is that the backup folder already contains a recovery.done file. When I run the same command on a test platform, no recovery.done is created.

However, the test platform runs 9.5, so I am not sure whether that is related.

 

Also, the recovery.conf contains all the information it should, but the error message stays the same.

cat recovery.conf

recovery_target_timeline='latest'

standby_mode = 'on'

primary_conninfo = 'user=<user>  password=<passwd> host=IP port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres'
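
The settings above can be checked mechanically. The following is a minimal Python sketch that only handles single-quoted values and a space-separated `primary_conninfo` without quoted parts, as in the file shown. The set of settings it treats as required (`standby_mode` and `primary_conninfo`) is an assumption made for this illustration, and both function names are hypothetical.

```python
import re

# Matches lines of the form: name = 'value' (single-quoted values only).
SETTING = re.compile(r"^\s*(\w+)\s*=\s*'(.*)'\s*$")


def parse_recovery_conf(text):
    """Return the name -> value pairs found in a recovery.conf style text."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SETTING.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def parse_conninfo(conninfo):
    """Split a simple key=value connection string (no quoted values)."""
    return dict(item.split("=", 1) for item in conninfo.split())


# Content copied from the file above, placeholders included.
conf_text = """
recovery_target_timeline='latest'
standby_mode = 'on'
primary_conninfo = 'user=<user>  password=<passwd> host=IP port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres'
"""

settings = parse_recovery_conf(conf_text)
conninfo = parse_conninfo(settings["primary_conninfo"])

# Assumption: these two settings are the minimum for a streaming standby.
missing = {"standby_mode", "primary_conninfo"} - settings.keys()

print("missing settings:", sorted(missing))
print("connection keys:", sorted(conninfo))
```

For the file as posted, nothing is reported missing, which matches the observation that the configuration looks complete.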

 

Does this mean that the corruption is on the master system and that it needs to be restored to a point before the crash? I am not sure what I can do to get replication working again.

Any ideas?

 

Kind Regards,

 

Kristof

 

 

 

Met vriendelijke groeten / Meilleures salutations / Best regards

Kristof Bruyninckx
System Engineer

 

E  Kristof.Bruyninckx@cegeka.com 
M +32 473 33 50 67

 

CEGEKA Universiteitslaan 9
B-3500 Hasselt, Belgium
T +32 11 24 02 34
WWW.CEGEKA.COM 


 
