Re: Streaming replication - unable to stop the standby

From: Stefan Kaltenbrunner
Subject: Re: Streaming replication - unable to stop the standby
Msg-id: 4BDF1458.1040807@kaltenbrunner.cc
In reply to: Re: Streaming replication - unable to stop the standby  (Tom Lane <tgl@sss.pgh.pa.us>)
Replies: Re: Streaming replication - unable to stop the standby  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
Tom Lane wrote:
> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
>> I'm currently testing SR/HS in 9.0beta1 and I noticed that it seems 
>> quite easy to end up in a situation where you have a standby that seems 
>> to be stuck in:
> 
>> $ psql -p 5433
>> psql: FATAL:  the database system is shutting down
> 
>> but not actually shutting down ever. I ran into that a few times now
>> (mostly because I'm trying to chase a recovery issue I hit during
>> earlier testing) by simply having the master iterate between a pgbench
>> run and "idle" while simply doing pg_ctl restart in a loop on the standby.
>> I do vaguely recall some discussions of that but I thought the issue got
>> settled somehow?
> 
> Hm, I haven't pushed this hard but "pg_ctl stop" seems to stop the
> standby for me.  Which subprocesses of the slave postmaster are still
> around?  Could you attach to them with gdb and get stack traces?

It does not always fail to shut down, only sometimes. I have not yet
pinpointed what is causing this, but the standby is in a weird state now:

* the master is currently idle
* the standby has no connections at all

logs from the standby:

FATAL:  the database system is shutting down
FATAL:  the database system is shutting down
FATAL:  replication terminated by primary server
LOG:  restored log file "000000010000001900000054" from archive
cp: cannot stat `/mnt/space/wal-archive/000000010000001900000055': No such file or directory
LOG:  record with zero length at 19/55000078
cp: cannot stat `/mnt/space/wal-archive/000000010000001900000055': No such file or directory
FATAL:  could not connect to the primary server: could not connect to server: Connection refused
        Is the server running on host "localhost" and accepting
        TCP/IP connections on port 5432?
could not connect to server: Connection refused
        Is the server running on host "localhost" and accepting
        TCP/IP connections on port 5432?

cp: cannot stat `/mnt/space/wal-archive/000000010000001900000055': No such file or directory
cp: cannot stat `/mnt/space/wal-archive/000000010000001900000055': No such file or directory
LOG:  streaming replication successfully connected to primary
FATAL:  the database system is shutting down


The first two "FATAL: the database system is shutting down" lines are from
me trying to connect using psql after I noticed that pg_ctl had failed to
shut down the slave.
The next thing I tried was restarting the master, which led to the
subsequent log entries: the standby noticed the restart and reconnected,
but you still cannot actually connect to it...

The process tree for the standby is:

29523 pts/2    S      0:00 /home/postgres9/pginst/bin/postgres -D /mnt/space/pgdata_standby
29524 ?        Ss     0:06  \_ postgres: startup process   waiting for 000000010000001900000055
29529 ?        Ss     0:00  \_ postgres: writer process
29835 ?        Ss     0:00  \_ postgres: wal receiver process   streaming 19/55000078
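The startup process is waiting for segment 000000010000001900000055 while the WAL receiver reports position 19/55000078, and the "record with zero length" in the log is at that same position. The small sketch below (not from the thread) checks that these refer to the same file. It assumes the default 16 MB segment size and the usual naming scheme of three 8-digit hex fields (timeline, log id, segment within that log id), roughly what the server-side pg_xlogfile_name() computes; parse_lsn and wal_file_name are hypothetical helper names.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB WAL segments


def parse_lsn(text: str) -> int:
    """Turn an 'X/Y' WAL location (both halves hex) into a 64-bit byte position."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_file_name(timeline: int, lsn: str, seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Name of the WAL segment file that contains `lsn` on `timeline`.

    The log id is the high 32 bits of the position; the segment number is
    the low 32 bits divided by the segment size.
    """
    pos = parse_lsn(lsn)
    log_id = pos >> 32
    seg = (pos & 0xFFFFFFFF) // seg_size
    return "%08X%08X%08X" % (timeline, log_id, seg)


if __name__ == "__main__":
    # Position reported by the wal receiver in the process listing above.
    print(wal_file_name(1, "19/55000078"))
```

With timeline 1, wal_file_name(1, "19/55000078") returns "000000010000001900000055", the segment the startup process is waiting for and that the archive does not yet contain.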



Stefan

