Thread: slave restarts with kill -9 coming from somewhere, or nowhere


slave restarts with kill -9 coming from somewhere, or nowhere

From: Bert
Date:
Hello,

I'm running the latest postgres version (9.2.3), and today for the first time I encountered this:

12774 2013-04-02 18:13:10 CEST LOG:  server process (PID 28463) was terminated by signal 9: Killed
12774 2013-04-02 18:13:10 CEST DETAIL:  Failed process was running: BEGIN;declare "SQL_CUR0xff25e80" cursor for select distinct .... as "Reservation_date___time" , "C_4F_TRANSACTION"."FTRA_PRICE_VAL
12774 2013-04-02 18:13:10 CEST LOG:  terminating any other active server processes
12774 2013-04-02 18:13:12 CEST LOG:  all server processes terminated; reinitializing
29113 2013-04-02 18:13:15 CEST LOG:  database system was interrupted while in recovery at log time 2013-04-02 18:02:21 CEST
29113 2013-04-02 18:13:15 CEST HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
29113 2013-04-02 18:13:15 CEST LOG:  entering standby mode
29113 2013-04-02 18:13:15 CEST LOG:  redo starts at 6B0/DD0928A0
29113 2013-04-02 18:13:22 CEST LOG:  consistent recovery state reached at 6B0/DE3831E8
12774 2013-04-02 18:13:22 CEST LOG:  database system is ready to accept read only connections
29113 2013-04-02 18:13:22 CEST LOG:  invalid record length at 6B0/DE3859B8
29117 2013-04-02 18:13:22 CEST LOG:  streaming replication successfully connected to primary

As far as I know, it happened twice today. I have no idea where these kills are coming from. I only know these are not nice :)

Does anyone have an idea what exactly happened?

wkr,
Bert

--
Bert Desmet
0477/305361
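
For anyone wanting to spot such events automatically, here is a minimal sketch that scans PostgreSQL log lines for the "terminated by signal" message quoted above. The function name find_signal_terminations is made up for illustration, and the pattern assumes the message wording shown in this log (9.2.3); other versions or locales may word it differently.

```python
import re

# Matches the postmaster message shown in the log above, e.g.
# "server process (PID 28463) was terminated by signal 9: Killed"
SIGNAL_RE = re.compile(
    r"server process \(PID (?P<pid>\d+)\) was terminated by signal "
    r"(?P<signal>\d+)"
)


def find_signal_terminations(log_lines):
    """Return (pid, signal) pairs for backends that died from a signal."""
    hits = []
    for line in log_lines:
        match = SIGNAL_RE.search(line)
        if match:
            hits.append((int(match.group("pid")), int(match.group("signal"))))
    return hits


# Two lines copied from the log in the message above.
sample = [
    "12774 2013-04-02 18:13:10 CEST LOG:  server process (PID 28463) was "
    "terminated by signal 9: Killed",
    "12774 2013-04-02 18:13:10 CEST LOG:  terminating any other active "
    "server processes",
]

print(find_signal_terminations(sample))  # [(28463, 9)]
```

The PID it reports (28463 here) is the one to look for in the kernel log.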

Re: slave restarts with kill -9 coming from somewhere, or nowhere

From: Tom Lane
Date: Tue, Apr 2, 2013 at 8:06 PM
Bert <biertie@gmail.com> writes:
> I'm running the latest postgres version (9.2.3), and today for the first
> time I encountered this:

> 12774 2013-04-02 18:13:10 CEST LOG:  server process (PID 28463) was
> terminated by signal 9: Killed

AFAIK there are only two possible sources of signal 9: a manual kill,
or the Linux kernel's OOM killer.  If it's the latter there should be
a concurrent entry in the kernel logfiles about this.  If you find one,
suggest reading up on how to disable OOM kills, or at least reconfigure
your system to make them less probable.

            regards, tom lane
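
The kernel-log check suggested above can be sketched roughly as follows. This assumes the kernel writes a line containing "Killed process <pid> (<name>)" when the OOM killer fires; the exact wording varies between kernel versions, and find_oom_kills and was_oom_killed are hypothetical helper names, not part of any tool.

```python
import re

# Assumed wording: many Linux kernels log "Killed process <pid> (<name>)"
# when the OOM killer fires. Adjust the pattern if your kernel differs.
OOM_RE = re.compile(r"Killed process (?P<pid>\d+) \((?P<name>[^)]*)\)")


def find_oom_kills(kernel_log_lines):
    """Return (pid, process_name) for each OOM-kill line found."""
    kills = []
    for line in kernel_log_lines:
        match = OOM_RE.search(line)
        if match:
            kills.append((int(match.group("pid")), match.group("name")))
    return kills


def was_oom_killed(pid, kernel_log_lines):
    """True if the given PID appears in an OOM-kill line."""
    return any(killed == pid for killed, _ in find_oom_kills(kernel_log_lines))
```

Feed it the lines of the kernel logfile (or the output of dmesg) and ask for the PID that PostgreSQL reported as terminated by signal 9. If nothing matches, that leaves a manual kill as the other candidate.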


Re: slave restarts with kill -9 coming from somewhere, or nowhere

From: Bert
Date: Wed, Apr 3, 2013 at 8:45 AM
Hi Tom,

Thanks for the tip! It was indeed the OOM killer.

Is it wise to disable the OOM killer? Or will the server really go down without postgres doing something about it?

For now I have already lowered the shared_memory value a bit.

cheers,
Bert



Re: slave restarts with kill -9 coming from somewhere, or nowhere

From: Bert
Date: Wed, Apr 3, 2013 at 10:10 AM
Hi all,

I have set vm.overcommit_memory to 1.

It's pretty much a dedicated machine anyway, except for some postgres maintenance scripts I run in Python / bash from the server.

We'll see how it goes.

cheers,
Bert
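
As a reference for the setting mentioned above, here is a small sketch of what the three vm.overcommit_memory values mean according to the Linux kernel documentation. Note that 1 means "always overcommit", so it would not by itself be expected to stop OOM kills; the PostgreSQL documentation suggests 2 (strict accounting) for that purpose. The helper names are made up, and the /proc path is the usual Linux location, assumed to exist on the machine.

```python
# Meaning of vm.overcommit_memory values per the Linux kernel documentation.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (default); the OOM killer can still fire",
    1: "always overcommit; allocations are never refused, "
       "so the OOM killer can still fire",
    2: "strict accounting; allocations beyond the commit limit fail "
       "with an error instead",
}


def describe_overcommit(mode):
    """Return a short description of an overcommit mode."""
    return OVERCOMMIT_MODES.get(mode, "unknown mode")


def read_overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Read the current mode from procfs (Linux only)."""
    with open(path) as handle:
        return int(handle.read().strip())


for mode in sorted(OVERCOMMIT_MODES):
    print(mode, "->", describe_overcommit(mode))
```

On a Linux host, describe_overcommit(read_overcommit_mode()) reports the mode currently in effect.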



Re: slave restarts with kill -9 coming from somewhere, or nowhere

From: Bert
Date:
hi,

This is strange: a single connection almost killed the server, so it is not the combined load of many connections. I saw one connection grow to over 100GB, and I cancelled it before the OOM killer became active again.

These are my memory settings:
shared_buffers = 20GB 
temp_buffers = 1GB
max_prepared_transactions = 10
work_mem = 4GB
maintenance_work_mem = 1GB
max_stack_depth = 8MB
wal_buffers = 32MB
effective_cache_size = 88GB

The server has 128GB of RAM.

How is it possible that one connection (query) uses all the RAM? And how can I avoid it?

PS: the database is a data warehouse (DWH). I don't need a lot of connections, but I want to process a lot of data fast.

cheers,
Bert





Re: slave restarts with kill -9 coming from somewhere, or nowhere

From: Tom Lane
Date: Thu, Apr 4, 2013 at 8:17 AM
Bert <biertie@gmail.com> writes:
> These are my memory settings:
> work_mem = 4GB

> How is it possible that one connection (query) uses all the ram? And how
> can I avoid it?

Uh ... don't do the above.  work_mem is the allowed memory consumption
per query step, ie per hash or sort operation.  A complex query can
easily use multiples of work_mem.

            regards, tom lane
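
To put numbers on "multiples of work_mem", here is a rough back-of-the-envelope sketch using the settings from this thread (128GB RAM, shared_buffers = 20GB, work_mem = 4GB). The step counts are hypothetical, since the query plan was not posted, and the estimate ignores the OS, other sessions, temp_buffers and any allocations not governed by work_mem.

```python
GB = 1024 ** 3


def worst_case_step_memory(work_mem_bytes, steps):
    """Memory a query may use if every sort/hash step fills work_mem."""
    return work_mem_bytes * steps


def steps_to_exceed(budget_bytes, work_mem_bytes):
    """Smallest number of steps whose combined work_mem exceeds the budget."""
    return budget_bytes // work_mem_bytes + 1


ram = 128 * GB
shared_buffers = 20 * GB
work_mem = 4 * GB

# What is left for query memory after shared_buffers.
remaining = ram - shared_buffers

print(remaining // GB)                             # 108
print(steps_to_exceed(remaining, work_mem))        # 28
print(worst_case_step_memory(work_mem, 25) // GB)  # 100
```

Under these assumptions, a single query with about 25 sort or hash steps could already reach the 100GB observed, and 28 steps would exceed what is left after shared_buffers.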


Re: slave restarts with kill -9 coming from somewhere, or nowhere

From: Bert
Date:
Aha, OK. This was a setting pg_tune suggested, but I can see why it is a bad idea.

wkr,
Bert

