On 25.03.2014 15:36, Alvaro Herrera wrote:
> Tom Lane wrote:
>> postgresql@thequod.de writes:
>>> PostgreSQL just failed to start up after a reboot (which was forced via
>>> remote Ctrl-Alt-Delete on the PostgreSQL container's host):
>>
>>> 2014-03-24 13:32:47 CET LOG: could not receive data from client: Connection reset by peer
>>> 2014-03-25 12:32:17 CET FATAL: no free slots in PMChildFlags array
>>> 2014-03-25 12:32:17 CET LOG: process 9975 releasing ProcSignal slot 108, but it contains 0
>>> 2014-03-25 12:32:17 CET LOG: process 9974 releasing ProcSignal slot 109, but it contains 0
>>> 2014-03-25 12:32:17 CET LOG: process 9976 releasing ProcSignal slot 110, but it contains 0
>>
>> That's odd (and as you say, unexpected) but this log extract doesn't give
>> much clue as to how we got into this state. What was going on before
>> this? In particular, it's hard to call this "failure to start up" when
>> you evidently had a hundred or so postmaster child processes already.
>> Could there have been some unexpected surge in the number of connection
>> attempts just after the database came up? Also, this extract doesn't look
>> like anything that would've caused the postmaster to decide to shut down
>> again, so what happened after that? Or in short, I want to see the rest
>> of the log, not just this part.
That was the whole log.
The rotated one before has only:
2014-03-22 03:51:37 CET LOG: could not receive data from client: Connection reset by peer
2014-03-22 03:52:25 CET LOG: could not receive data from client: Connection reset by peer
2014-03-22 03:59:31 CET LOG: could not receive data from client: Connection reset by peer
2014-03-22 04:00:18 CET LOG: could not receive data from client: Connection reset by peer
2014-03-22 06:03:06 CET LOG: could not receive data from client: Connection reset by peer
Should I increase the logging verbosity, in case this happens again?
If so, to what? (I have not configured logging yet, so it has the defaults
from your Debian package).
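In case it helps, something like the following in postgresql.conf would capture more context next time; these are standard PostgreSQL logging parameters, and the values are only a suggestion, not a recommendation from this thread:

```
# postgresql.conf -- more verbose diagnostics (suggested values only)
log_min_messages = info        # default is 'warning'; logs more server-side detail
log_connections = on           # log each successful connection attempt
log_disconnections = on        # log session end, including duration
log_line_prefix = '%t [%p] '   # prefix every line with timestamp and backend PID
```

With log_connections on, a surge of connection attempts (as Tom suspected) would show up clearly in the log.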
> Here's my guess --- this is a virtualized system that somehow dumped
> some state to disk to hibernate while the host was being rebooted; and
> then, when the host was up again, it tried to resurrect the virtual
> machine and found things to be all inconsistent.
Yes, the container was frozen during reboot:
From the host:
Mar 25 11:54:48 HN kernel: [ 76.237452] CT: 144: started
Mar 25 11:55:03 HN kernel: [ 91.201145] CT: 144: restored
OpenVZ uses "suspend" by default to stop containers on host reboots.
I will change this to "stop" for the PostgreSQL container, but this still
seems like something PostgreSQL should handle better.
FWIW, I have just suspended and started the container manually, and
PostgreSQL kept running (upgraded to 9.3.4 in the meantime).
Maybe it's a bug in OpenVZ and how it restores some resources after
rebooting the host?
Please also note that the PostgreSQL error happened half an hour after the
reboot/resuming of the container.
Thanks,
Daniel.
-- 
http://daniel.hahler.de/