Re: [GENERAL] openvz and shared memory trouble

From lst_hoe02@kwsoft.de
Subject Re: [GENERAL] openvz and shared memory trouble
Date
Msg-id 20140331150232.Horde.bhrrXj4fvWVUjEbn9LGGgg7@webmail.kwsoft.de
In reply to Re: [GENERAL] openvz and shared memory trouble  (Willy-Bas Loos <willybas@gmail.com>)
Responses Re: [GENERAL] openvz and shared memory trouble  (Willy-Bas Loos <willybas@gmail.com>)
List pgsql-admin
Quoting Willy-Bas Loos <willybas@gmail.com>:

> On Sat, Mar 29, 2014 at 6:17 PM, Adrian Klaver
> <adrian.klaver@aklaver.com> wrote:
>
>> On 03/29/2014 08:19 AM, Willy-Bas Loos wrote:
>>
>>> The error that shows up is a Bus error.
>>> That's on the replication slave.
>>> Here's the log about it:
>>> 2014-03-29 12:41:33 CET db: ip: us: FATAL:  could not receive data from
>>> WAL stream: server closed the connection unexpectedly
>>>          This probably means the server terminated abnormally
>>>          before or while processing the request.
>>>
>>> cp: cannot stat
>>> `/data/postgresql/9.1/main/wal_archive/00000001000000720000000A': No
>>> such file or directory
>>> 2014-03-29 12:41:33 CET db: ip: us: LOG:  unexpected pageaddr
>>> 71/E9DA0000 in log file 114, segment 10, offset 14286848
>>> cp: cannot stat
>>> `/data/postgresql/9.1/main/wal_archive/00000001000000720000000A': No
>>> such file or directory
>>> 2014-03-29 12:41:33 CET db: ip: us: LOG:  streaming replication
>>> successfully connected to primary
>>> 2014-03-29 12:41:48 CET db: ip: us: LOG:  startup process (PID 17452)
>>> was terminated by signal 7: Bus error
>>> 2014-03-29 12:41:48 CET db: ip: us: LOG:  terminating any other active
>>> server processes
>>> 2014-03-29 12:41:48 CET db:wbloos ip:[local] us:wbloos WARNING:
>>> terminating connection because of crash of another server process
>>> 2014-03-29 12:41:48 CET db:wbloos ip:[local] us:wbloos DETAIL:  The
>>> postmaster has commanded this server process to roll back the current
>>> transaction and exit, because another server process exited abnormally
>>> and possibly corrupted shared memory.
>>> 2014-03-29 12:41:48 CET db:wbloos ip:[local] us:wbloos HINT:  In a
>>> moment you should be able to reconnect to the database and repeat your
>>> command.
>>>
>>>
>> Well what I am seeing are WAL log errors. One saying no file is
>> present, the other pointing at a possible file corruption.
>
> Those are normal notices, nothing to worry about.
>
>
>> Shared memory problems are offered as a possible cause only. Right now I
>> would say we are seeing only half the picture. The Postgres logs from the
>> same time period for the primary server, as well as the system logs for the
>> openvz container would help fill in the other half of the picture.
>>
>
> Here's the log from the primary postgres server:
> 2014-03-29 12:41:29 CET db:wbloos ip:[local] us:wbloos NOTICE:  ALTER TABLE
> will create implicit sequence "test_x_seq" for serial column "test.x"
> 2014-03-29 12:41:33 CET db:[unknown] ip:xxx.xxx.xxx.xxx us:replication
> LOG:  SSL renegotiation failure
> 2014-03-29 12:41:33 CET db:[unknown] ip:xxx.xxx.xxx.xxx us:replication
> LOG:  SSL error: unexpected record
> 2014-03-29 12:41:33 CET db:[unknown] ip:xxx.xxx.xxx.xxx us:replication
> LOG:  could not send data to client: Connection reset by peer
> 2014-03-29 12:41:48 CET db:[unknown] ip:xxx.xxx.xxx.xxx us:replication
> LOG:  could not receive data from client: Connection reset by peer
> 2014-03-29 12:41:48 CET db:[unknown] ip:xxx.xxx.xxx.xxx us:replication
> LOG:  unexpected EOF on standby connection
>
> (the SSL renegotiation failure happens all the time, without the crash)
>
> And here's the syslog from the container:
> Mar 29 12:41:01 mycontainer snmpd[8819]: Connection from UDP:
> [xxx.xxx.xxx.xxx]:59090->[xxx.xxx.xxx.xxx]
> Mar 29 12:42:30 mycontainer snmpd[8819]: Connection from UDP:
> [xxx.xxx.xxx.xxx]:35949->[xxx.xxx.xxx.xxx]
>
> The log on the host doesn't say anything interesting either.
>
>> A cursory look at memory management in openvz shows it is different from
>> other virtualization software and physical machines. Whether that is a
>> problem would seem to be dependent on where you are on the learning curve:)
>>
> That sounds like "there is a solution to the problem, all you have to do is
> find out what it is". There doesn't seem to be a variable in the
> beancounters or anywhere else that can prevent the bus error from happening.
> There seems to be no separate way of guaranteeing shared memory. There's
> no OOM killer active either, nor is host or server running short of memory.
>
> I'm still worried that it's like Tom Lane said in another discussion: "So
> basically, you've got a broken kernel here: it claimed to give PG circa
> (135MB) of memory, but what's actually there is only about (128MB). I don't
> see any connection between those numbers and the shmmax/shmall settings,
> either --- so I think this must be some busted implementation of a VM-level
> limitation."
> (here:
> http://www.postgresql.org/message-id/CAK3UJREBcyVBtr8D7vMfU=uDdkjXkrPnGcuy8erYB0tMfKe1LA@mail.gmail.com
> )
>
> And it makes me wonder what other issues may arise from that. But
> especially, what I can do about it.
>

So what do your "shmpages" setting and the sysctl values inside the
container report as the maximum shared memory available? If you set
shmmax, but the shmpages limit in your container config is too small,
Postgres may have no way to find out other than by crashing
(http://openvz.org/Postgresql_and_shared_memory).
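
As a rough way to make that comparison, here is a minimal Python sketch.
It assumes the usual /proc/user_beancounters layout (resource, held,
maxheld, barrier, limit, failcnt), that shmpages is counted in pages, and
that it is run as root inside the container. The function names
(parse_beancounters, shm_headroom, check_live) are made up for this
example and are not part of any OpenVZ or Postgres tool.

```python
import os

COLUMNS = ("held", "maxheld", "barrier", "limit", "failcnt")


def parse_beancounters(text):
    """Parse user_beancounters text into {resource: {column: int}}.

    Assumes a single container is listed (as seen from inside a container);
    on the host, later containers would overwrite earlier ones.
    """
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # The first row of a container starts with "<uid>:"; drop that token.
        if fields and fields[0].endswith(":"):
            fields = fields[1:]
        # Data rows look like: resource held maxheld barrier limit failcnt
        if len(fields) == 6 and all(f.isdigit() for f in fields[1:]):
            rows[fields[0]] = dict(zip(COLUMNS, map(int, fields[1:])))
    return rows


def shm_headroom(beancounters_text, shmmax_bytes, page_size=4096):
    """Compare kernel.shmmax with the container's shmpages limit."""
    shm = parse_beancounters(beancounters_text)["shmpages"]
    limit_bytes = shm["limit"] * page_size
    return {
        "limit_bytes": limit_bytes,
        "shmmax_bytes": shmmax_bytes,
        # True means shmmax promises more than the container will allow.
        "shmmax_exceeds_limit": shmmax_bytes > limit_bytes,
        # A non-zero failcnt means shared memory requests were refused.
        "failcnt": shm["failcnt"],
    }


def check_live():
    """Read the real values; needs root inside an OpenVZ container."""
    with open("/proc/user_beancounters") as f:
        beancounters_text = f.read()
    with open("/proc/sys/kernel/shmmax") as f:
        shmmax_bytes = int(f.read())
    return shm_headroom(beancounters_text, shmmax_bytes,
                        os.sysconf("SC_PAGE_SIZE"))
```

Calling check_live() inside the container returns the numbers side by
side. If shmmax_exceeds_limit is true or failcnt is non-zero, the
shmpages limit is the first thing I would raise.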

Regards

Andreas

