Discussion: str replication failed, restart fixed it


str replication failed, restart fixed it

Willy-Bas Loos

I had a problem today and I fixed it by restarting postgres.
That doesn't make sense to me. What could have been going on?

This is the log:
2014-02-26 04:30:45 CET db: ip: us: FATAL:  could not send data to WAL stream: SSL error: sslv3 alert unexpected message
cp: cannot stat `/data/postgresql/9.1/main/wal_archive/000000010000006400000062': No such file or directory
2014-02-26 04:30:45 CET db: ip: us: LOG:  unexpected pageaddr 64/3FBC6000 in log file 100, segment 98, offset 12345344
cp: cannot stat `/data/postgresql/9.1/main/wal_archive/000000010000006400000062': No such file or directory
2014-02-26 04:30:45 CET db: ip: us: LOG:  streaming replication successfully connected to primary
2014-02-26 04:32:09 CET db: ip: us: LOG:  startup process (PID 5385) was terminated by signal 7: Bus error
2014-02-26 04:32:09 CET db: ip: us: LOG:  terminating any other active server processes
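
The file name in the cp error and the numbers in the "unexpected pageaddr" line refer to the same WAL segment, which a short Python sketch can show. It assumes the default 16 MB segment size and the pre-9.3 naming scheme (8 hex digits each for timeline, log and segment); the function names are made up for illustration. A valid page at that position would carry address 64/62BC6000, while the log reports 64/3FBC6000, which looks like a page left over from an older, recycled segment. That pattern usually just marks the end of valid WAL, so it is probably not related to the bus error that follows.

```python
SEGMENT_SIZE = 16 * 1024 * 1024  # assumed: default WAL segment size (16 MB)


def parse_wal_filename(name):
    """Split a 24-hex-digit WAL file name into (timeline, log, segment)."""
    if len(name) != 24:
        raise ValueError("expected 24 hex digits, got %r" % name)
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)


def expected_pageaddr(log, segment, offset):
    """Address a valid page at this position should carry (log/offset form)."""
    return "%X/%X" % (log, segment * SEGMENT_SIZE + offset)


timeline, log, segment = parse_wal_filename("000000010000006400000062")
print(timeline, log, segment)                     # 1 100 98
print(expected_pageaddr(log, segment, 12345344))  # 64/62BC6000
```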

The cluster was "online" according to pg_lsclusters, but it was not possible to connect to it:
psql: could not connect to server: No such file or directory
    Is the server running locally and accepting
    connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
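
The "No such file or directory" in the psql error means the socket file itself is missing, which is different from a socket that exists but has nothing listening on it. A rough Python sketch to tell the two apart follows; the helper name probe_unix_socket is hypothetical and the path is the one from the error message.

```python
import socket


def probe_unix_socket(path, timeout=2.0):
    """Return 'accepting', 'missing', 'refused', or an 'error: ...' string."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect(path)
        return "accepting"
    except FileNotFoundError:
        return "missing"    # no socket file: the server is most likely not running
    except ConnectionRefusedError:
        return "refused"    # socket file exists, but nothing is listening on it
    except OSError as exc:
        return "error: %s" % exc
    finally:
        s.close()


print(probe_unix_socket("/var/run/postgresql/.s.PGSQL.5432"))
```

A successful connect only shows that something is listening on the socket; it does not prove that the server would accept a login.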

uptime tells me this:
postgres@myserver:~$ uptime
 10:47:27 up 89 days, 42 min,  1 user,  load average: 0.00, 0.00, 0.00

This is PostgreSQL 9.1 on Ubuntu 12.04, running on OpenVZ.

The weirdest thing is that restarting the postgres cluster fixed it.
Does this make any sense to you?


"Quality comes from focus and clarity of purpose" -- Mark Shuttleworth

Re: str replication failed, restart fixed it

Willy-Bas Loos
This is very probably an OpenVZ issue; it can be solved by lowering shared_buffers considerably.
The restart works because the server is in fact down. I think pg_lsclusters showed "online" because of a stale run file.
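
One way to check the stale-file theory is to see whether the PID recorded in the file still belongs to a live process. The following is a minimal Python sketch, not a description of what pg_lsclusters does internally. It assumes the file's first line holds the PID, as in postmaster.pid; the path in the usage comment is a guess based on the paths in the log, and the function names are made up.

```python
import os


def pid_is_alive(pid):
    """True if a process with this PID exists (signal 0 only checks, sends nothing)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def pidfile_is_stale(pidfile):
    """True if the PID on the first line of the file is no longer running."""
    with open(pidfile) as f:
        pid = int(f.readline().strip())
    return not pid_is_alive(pid)


# Usage (hypothetical path, adjust to the actual data directory):
#   pidfile_is_stale("/data/postgresql/9.1/main/postmaster.pid")
```

PIDs get reused, so a live PID does not prove that the process is postgres; a dead one does show that the file is stale.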

I was hoping that the memory allocation improvements in postgres 9.3 would solve these issues, but this post makes me think that they won't:

Does anyone know of a solution?


