Thread: Preventing OOM kills

Preventing OOM kills

From
Yang Zhang
Date:
PG tends to be picked on by the Linux OOM killer, so lately we've been
forcing the OOM killer to kill other processes first with this script:

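# Re-apply the setting every minute so newly started backends are covered.
while true; do
  for pid in $(pgrep postgres); do
    # -17 exempts the process from the OOM killer (requires root)
    echo -17 > "/proc/$pid/oom_adj"
  done
  sleep 60
done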

Is there a Better Way?  Thanks in advance.

Re: Preventing OOM kills

From
Andrej
Date:
On 25 May 2011 12:32, Yang Zhang <yanghatespam@gmail.com> wrote:
> PG tends to be picked on by the Linux OOM killer, so lately we've been
> forcing the OOM killer to kill other processes first with this script:
>
> while true; do
>  for i in `pgrep postgres`; do
>    echo -17 > /proc/$i/oom_adj
>  done
>  sleep 60
> done
>
> Is there a Better Way?  Thanks in advance.

Add more RAM?  Look at tunables for other processes on
the machine?  At the end of the day making the kernel shoot
anything out of despair shouldn't be the done thing.


Cheers,
Andrej

Re: Preventing OOM kills

From
Scott Marlowe
Date:
On Tue, May 24, 2011 at 6:50 PM, Andrej <andrej.groups@gmail.com> wrote:
> On 25 May 2011 12:32, Yang Zhang <yanghatespam@gmail.com> wrote:
>> PG tends to be picked on by the Linux OOM killer, so lately we've been
>> forcing the OOM killer to kill other processes first with this script:
>>
>> while true; do
>>  for i in `pgrep postgres`; do
>>    echo -17 > /proc/$i/oom_adj
>>  done
>>  sleep 60
>> done
>>
>> Is there a Better Way?  Thanks in advance.
>
> Add more RAM?  Look at tunables for other processes on
> the machine?  At the end of the day making the kernel shoot
> anything out of despair shouldn't be the done thing.

I thought that setting vm.overcommit_memory=2 stopped the OOM killer.
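
For context, in mode 2 the kernel refuses allocations once the committed
address space would exceed CommitLimit, which is roughly swap plus
vm.overcommit_ratio percent of RAM (ignoring huge pages and
vm.overcommit_kbytes). Allocations then fail instead of being overcommitted,
which makes OOM kills much less likely, though not strictly impossible. A
minimal sketch of that arithmetic follows; the function name and the sample
sizes are illustrative assumptions, not something taken from this thread.

```shell
# Approximate CommitLimit (in kB) under vm.overcommit_memory=2.
# Arguments: RAM in kB, swap in kB, vm.overcommit_ratio in percent.
# Huge pages and vm.overcommit_kbytes are ignored for simplicity.
commit_limit_kb() {
    ram_kb=$1
    swap_kb=$2
    ratio=$3
    echo $(( swap_kb + ram_kb * ratio / 100 ))
}

# Example: 8 GiB RAM, 2 GiB swap, default ratio of 50.
commit_limit_kb 8388608 2097152 50   # prints 6291456 (6 GiB)
```

On a real machine the kernel's own figure appears as CommitLimit in
/proc/meminfo, so the sketch is only useful for planning a ratio in advance.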

Re: Preventing OOM kills

From
Devrim GÜNDÜZ
Date:
On Tue, 2011-05-24 at 17:32 -0700, Yang Zhang wrote:
> PG tends to be picked on by the Linux OOM killer, so lately we've been
> forcing the OOM killer to kill other processes first with this script:
>
> while true; do
>   for i in `pgrep postgres`; do
>     echo -17 > /proc/$i/oom_adj
>   done
>   sleep 60
> done
>
> Is there a Better Way?  Thanks in advance.

Why don't you start postmaster with this value? Here is what we do in
the RPM init scripts:

PG_OOM_ADJ=-17
test x"$PG_OOM_ADJ" != x && echo "$PG_OOM_ADJ" > /proc/self/oom_adj
$SU -l postgres -c "$PGENGINE/postmaster -p '$PGPORT' -D '$PGDATA' ${PGOPTS} &" >> "$PGLOG" 2>&1 < /dev/null
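
This works because the adjustment is inherited by child processes, so the
postmaster started afterwards carries it too. On kernels 2.6.36 and later,
/proc/<pid>/oom_adj is deprecated in favor of /proc/<pid>/oom_score_adj
(range -1000 to 1000, where -1000 exempts the process). A hedged sketch of a
helper that prefers the newer file might look like this; set_oom_protect is a
hypothetical name, not part of the RPM init scripts.

```shell
# Hypothetical helper: shield the process whose /proc directory is $1
# from the OOM killer. Prefers oom_score_adj (kernels 2.6.36 and later)
# and falls back to the legacy oom_adj file.
set_oom_protect() {
    procdir=$1
    if [ -w "$procdir/oom_score_adj" ]; then
        echo -1000 > "$procdir/oom_score_adj"   # -1000 exempts the process
    elif [ -w "$procdir/oom_adj" ]; then
        echo -17 > "$procdir/oom_adj"           # -17 is the legacy "disable" value
    else
        echo "no OOM adjustment file under $procdir" >&2
        return 1
    fi
}

# In an init script this would run as root just before launching postmaster:
#   set_oom_protect /proc/self
```

Taking the directory as an argument keeps the helper easy to try against a
scratch directory before pointing it at /proc.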

Regards,
--
Devrim GÜNDÜZ
Principal Systems Engineer @ EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz

Re: Preventing OOM kills

From
John R Pierce
Date:
On 05/24/11 5:50 PM, Andrej wrote:
> Add more RAM?  Look at tunables for other processes on
> the machine?  At the end of the day making the kernel shoot
> anything out of despair shouldn't be the done thing.

Somehow, 'real' Unix neither has an OOM killer nor flat-out dies
under heavy load; it just degrades gracefully.  I've seen Solaris,
AIX, and BSD servers happily chugging along with load factors in the
100s, significant portions of memory paging, etc., without completely
crumbling to a halt.  Sometimes I wonder why Linux even pretends to
support virtual memory, as you sure don't want it to be paging.


--
john r pierce                            N 37, W 123
santa cruz ca                         mid-left coast


Re: Preventing OOM kills

From
Scott Marlowe
Date:
On Tue, May 24, 2011 at 7:01 PM, John R Pierce <pierce@hogranch.com> wrote:
> On 05/24/11 5:50 PM, Andrej wrote:
>>
>> Add more RAM?  Look at tunables for other processes on
>> the machine?  At the end of the day making the kernel shoot
>> anything out of despair shouldn't be the done thing.
>
> somehow, 'real' unix has neither a OOMkiller nor does it flat out die under
> heavy loads, it just degrades gracefully.  I've seen Solaris and AIX and BSD
> servers happily chugging along with load factors in the 100s, significant
> portions of memory paging, etc, without completely crumbling to a halt.
>  Sometimes I wonder why Linux even pretends to support virtual memory, as
> you sure don't want it to be paging.

I've found that on servers with multiple drives and the page file
spread across them, Linux does pretty well when swapping out.  Even
going pretty far back, when I had six 9G SCSI drives on an old Sparc 20
running RHEL with 256M of RAM, the swapping was quite speedy with 100M
or so of swap on each drive.

Re: Preventing OOM kills

From
Marco Colombo
Date:
On 05/25/2011 03:01 AM, John R Pierce wrote:
> On 05/24/11 5:50 PM, Andrej wrote:
>> Add more RAM? Look at tunables for other processes on
>> the machine? At the end of the day making the kernel shoot
>> anything out of despair shouldn't be the done thing.
>
> somehow, 'real' unix has neither a OOMkiller nor does it flat out die
> under heavy loads, it just degrades gracefully. I've seen Solaris and
> AIX and BSD servers happily chugging along with load factors in the
> 100s, significant portions of memory paging, etc, without completely
> crumbling to a halt. Sometimes I wonder why Linux even pretends to
> support virtual memory, as you sure don't want it to be paging.
>
>

http://developers.sun.com/solaris/articles/subprocess/subprocess.html

"Some operating systems (such as Linux, IBM AIX, and HP-UX) have a
feature called memory overcommit (also known as lazy swap allocation).
In a memory overcommit mode, malloc() does not reserve swap space and
always returns a non-NULL pointer, regardless of whether there is enough
VM on the system to support it or not.

The memory overcommit feature has advantages and disadvantages."

(the page goes on with some interesting info) [*]

It appears that, by your definition, none of Linux, AIX, or HP-UX is a
'real' Unix. Oh, wait, FreeBSD overcommits, too, so it can't be 'real' either.

/me wonders now what a 'real' Unix is. :) Must be something related to
'true' SysV derivatives. If memory serves me well, that's where the word
'thrashing' originated, right? Actually, in my experience nothing
'thrashes' better than a SysV, Solaris included.

The solution to the OP's problem is to keep the system from reaching an OOM
state in the first place. That is necessary even with overcommitting
turned off. PG not doing its job because malloc() keeps failing
isn't really a "solution".
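
One way to notice trouble before it arrives is to compare the Committed_AS
and CommitLimit fields of /proc/meminfo. The sketch below is only an
illustration under that assumption: commit_headroom_kb is a hypothetical
name, and CommitLimit is reported in every mode but enforced only when
vm.overcommit_memory=2.

```shell
# Hypothetical check: print how many kB of commit headroom remain,
# given a file in /proc/meminfo format (defaults to /proc/meminfo).
commit_headroom_kb() {
    meminfo=${1:-/proc/meminfo}
    awk '
        $1 == "CommitLimit:"  { limit = $2 }
        $1 == "Committed_AS:" { used = $2 }
        END { print limit - used }
    ' "$meminfo"
}
```

A negative result means more address space has been promised than the limit
allows, which is possible whenever overcommit is enabled.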

.TM.

[*] One missing piece is that overcommitting actually prevents or delays
the OOM state. The article does mention "system memory can be used more
flexibly and efficiently" without really elaborating further. It means that,
given the same amount of memory (RAM+swap), a non-overcommitting system
reaches OOM well before an overcommitting one. Also, it is rarely a
good idea, when running low on memory, to switch to an allocation policy
that is _less_ efficient, memory-wise.