Обсуждение: Setting oom_adj on linux?

Поиск
Список
Период
Сортировка

Setting oom_adj on linux?

От
Magnus Hagander
Дата:
I realize this is a very platform-specific thing, but should we
consider setting the value of /proc/<pid>/oom_adj when running on
linux? See:


http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/filesystems/proc.txt;h=220cc6376ef80e0c9bcfec162d45552e729cdf5a;hb=45d28b097280a78893ce25a5d0db41e6a2717853

section 3.1.

To get the best benefit, I think it needs to be done in cooperation
between the startup scripts and PostgreSQL. We'd want -17 ("never oom
kill") on the postmaster, but some different value on regular backends
(since if it has to kill someone, it's better to pick a regular
backend so we can do a controlled restart). Only root can drop the
value, so the startup script needs to be part of it. But if we then
want to increase it for sub-processes, that'd require something in the
backend itself...

-- Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/


Re: Setting oom_adj on linux?

От
Alvaro Herrera
Дата:
Magnus Hagander wrote:
> I realize this is a very platform-specific thing, but should we
> consider setting the value of /proc/<pid>/oom_adj when running on
> linux? See:

http://archives.postgresql.org/message-id/20080201223336.GC24780%40alvh.no-ip.org

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Setting oom_adj on linux?

От
Magnus Hagander
Дата:
On Mon, Jan 4, 2010 at 16:45, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> Magnus Hagander wrote:
>> I realize this is a very platform-specific thing, but should we
>> consider setting the value of /proc/<pid>/oom_adj when running on
>> linux? See:
>
> http://archives.postgresql.org/message-id/20080201223336.GC24780%40alvh.no-ip.org

Grr. I had zero recollectoin of that :S

Can't find a useful consensus though?


-- Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/


Re: Setting oom_adj on linux?

От
"Joshua D. Drake"
Дата:
On Mon, 4 Jan 2010 16:57:28 +0100, Magnus Hagander <magnus@hagander.net>
wrote:
> On Mon, Jan 4, 2010 at 16:45, Alvaro Herrera
<alvherre@commandprompt.com>
> wrote:
>> Magnus Hagander wrote:
>>> I realize this is a very platform-specific thing, but should we
>>> consider setting the value of /proc/<pid>/oom_adj when running on
>>> linux? See:
>>
>>
http://archives.postgresql.org/message-id/20080201223336.GC24780%40alvh.no-ip.org
>
> Grr. I had zero recollectoin of that :S
>
> Can't find a useful consensus though?

I don't think we should set a setting like that automatically. Perhaps a
warning on startup?

Joshua D. Drake

--
PostgreSQL - XMPP: jdrake(at)jabber(dot)postgresql(dot)org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


Re: Setting oom_adj on linux?

От
Andrew Dunstan
Дата:

Magnus Hagander wrote:
> On Mon, Jan 4, 2010 at 16:45, Alvaro Herrera <alvherre@commandprompt.com> wrote:
>   
>> Magnus Hagander wrote:
>>     
>>> I realize this is a very platform-specific thing, but should we
>>> consider setting the value of /proc/<pid>/oom_adj when running on
>>> linux? See:
>>>       
>> http://archives.postgresql.org/message-id/20080201223336.GC24780%40alvh.no-ip.org
>>     
>
> Grr. I had zero recollectoin of that :S
>
> Can't find a useful consensus though?
>
>   

It is probably worth trying to protect the postmaster in the init 
script. Beyond that things probably start to get fairly difficult.

cheers

andrew


Re: Setting oom_adj on linux?

От
Magnus Hagander
Дата:
On Mon, Jan 4, 2010 at 17:07, Andrew Dunstan <andrew@dunslane.net> wrote:
>
>
> Magnus Hagander wrote:
>>
>> On Mon, Jan 4, 2010 at 16:45, Alvaro Herrera <alvherre@commandprompt.com>
>> wrote:
>>
>>>
>>> Magnus Hagander wrote:
>>>
>>>>
>>>> I realize this is a very platform-specific thing, but should we
>>>> consider setting the value of /proc/<pid>/oom_adj when running on
>>>> linux? See:
>>>>
>>>
>>>
>>> http://archives.postgresql.org/message-id/20080201223336.GC24780%40alvh.no-ip.org
>>>
>>
>> Grr. I had zero recollectoin of that :S
>>
>> Can't find a useful consensus though?
>>
>>
>
> It is probably worth trying to protect the postmaster in the init script.
> Beyond that things probably start to get fairly difficult.

Right. But AFAICS (though I haven't tested with -17), it will become
inherited to children, which is something we'd want to *undo*, no?


-- Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/


Re: Setting oom_adj on linux?

От
Greg Smith
Дата:
Magnus Hagander wrote: <blockquote cite="mid:9837222c1001040757r3a3edc85hc192f52a1f8e0fba@mail.gmail.com"
type="cite"><prewrap="">On Mon, Jan 4, 2010 at 16:45, Alvaro Herrera <a class="moz-txt-link-rfc2396E"
href="mailto:alvherre@commandprompt.com"><alvherre@commandprompt.com></a>wrote: </pre><blockquote
type="cite"><prewrap="">Magnus Hagander wrote:   </pre><blockquote type="cite"><pre wrap="">I realize this is a very
platform-specificthing, but should we
 
consider setting the value of /proc/<pid>/oom_adj when running on
linux? See:     </pre></blockquote><pre wrap=""><a class="moz-txt-link-freetext"
href="http://archives.postgresql.org/message-id/20080201223336.GC24780%40alvh.no-ip.org">http://archives.postgresql.org/message-id/20080201223336.GC24780%40alvh.no-ip.org</a>
 </pre></blockquote><pre wrap="">
 
Can't find a useful consensus though? </pre></blockquote><br /> In <a class="moz-txt-link-freetext"
href="http://archives.postgresql.org/pgsql-hackers/2008-02/msg00049.php">http://archives.postgresql.org/pgsql-hackers/2008-02/msg00049.php</a>
Tompoints out that while you could make this adjustment in the init scripts for PostgreSQL, actually doing so is quite
questionableas a packaging decision.  That's where that thread ended as far as I was concerned.  The best I think
anyonecould do here is to add such a capability into some of the init scripts, but it would probably need to be
disabledby default.  Since that sort of defeats the purpose of the change, I'm not sure what the benefit there is--if
youhave to turn it on, you might as well do something at a higher level instead.<br /><br /><pre class="moz-signature"
cols="72">--
 
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
<a class="moz-txt-link-abbreviated" href="mailto:greg@2ndQuadrant.com">greg@2ndQuadrant.com</a>  <a
class="moz-txt-link-abbreviated"href="http://www.2ndQuadrant.com">www.2ndQuadrant.com</a>
 
</pre>

Re: Setting oom_adj on linux?

От
Andrew Dunstan
Дата:

Magnus Hagander wrote:
> On Mon, Jan 4, 2010 at 17:07, Andrew Dunstan <andrew@dunslane.net> wrote:
>   
>> Magnus Hagander wrote:
>>     
>>> On Mon, Jan 4, 2010 at 16:45, Alvaro Herrera <alvherre@commandprompt.com>
>>> wrote:
>>>
>>>       
>>>> Magnus Hagander wrote:
>>>>
>>>>         
>>>>> I realize this is a very platform-specific thing, but should we
>>>>> consider setting the value of /proc/<pid>/oom_adj when running on
>>>>> linux? See:
>>>>>
>>>>>           
>>>> http://archives.postgresql.org/message-id/20080201223336.GC24780%40alvh.no-ip.org
>>>>
>>>>         
>>> Grr. I had zero recollectoin of that :S
>>>
>>> Can't find a useful consensus though?
>>>
>>>
>>>       
>> It is probably worth trying to protect the postmaster in the init script.
>> Beyond that things probably start to get fairly difficult.
>>     
>
> Right. But AFAICS (though I haven't tested with -17), it will become
> inherited to children, which is something we'd want to *undo*, no?
>   


[experiments]

Yes, darnit, you're right. But it looks like the oom_adj file can be set 
to the default by the process owner:
   [andrew@sophia ~]$ ls -l /proc/6520/oom_adj   -rw-r--r-- 1 andrew andrew 0 2010-01-04 12:37 /proc/6520/oom_adj
[andrew@sophia~]$ cat /proc/6520/oom_adj   0   [andrew@sophia ~]$ id   uid=500(andrew) gid=500(andrew)
groups=10(wheel),500(andrew)  [andrew@sophia ~]$ echo -17 > /proc/6520/oom_adj   -bash: echo: write error: Permission
denied  [andrew@sophia ~]$ echo 0 > /proc/6520/oom_adj   [andrew@sophia ~]$ echo -17 > /proc/6520/oom_adj   -bash:
echo:write error: Permission denied   [andrew@sophia ~]$
 

But that would be a pain to have to do.

OTOH, disabling the OOM killer is not always an option. I recently tried 
it on one system and had to revert it rapidly because the system stopped 
working in minutes. Some software just doesn't live well in such 
environments, sadly.

cheers

andrew




Re: Setting oom_adj on linux?

От
Magnus Hagander
Дата:
On Mon, Jan 4, 2010 at 17:40, Andrew Dunstan <andrew@dunslane.net> wrote:
>
>
> Magnus Hagander wrote:
>>
>> On Mon, Jan 4, 2010 at 17:07, Andrew Dunstan <andrew@dunslane.net> wrote:
>>
>>>
>>> Magnus Hagander wrote:
>>>
>>>>
>>>> On Mon, Jan 4, 2010 at 16:45, Alvaro Herrera
>>>> <alvherre@commandprompt.com>
>>>> wrote:
>>>>
>>>>
>>>>>
>>>>> Magnus Hagander wrote:
>>>>>
>>>>>
>>>>>>
>>>>>> I realize this is a very platform-specific thing, but should we
>>>>>> consider setting the value of /proc/<pid>/oom_adj when running on
>>>>>> linux? See:
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> http://archives.postgresql.org/message-id/20080201223336.GC24780%40alvh.no-ip.org
>>>>>
>>>>>
>>>>
>>>> Grr. I had zero recollectoin of that :S
>>>>
>>>> Can't find a useful consensus though?
>>>>
>>>>
>>>>
>>>
>>> It is probably worth trying to protect the postmaster in the init script.
>>> Beyond that things probably start to get fairly difficult.
>>>
>>
>> Right. But AFAICS (though I haven't tested with -17), it will become
>> inherited to children, which is something we'd want to *undo*, no?
>>
>
>
> [experiments]
>
> Yes, darnit, you're right. But it looks like the oom_adj file can be set to
> the default by the process owner:
>
>   [andrew@sophia ~]$ ls -l /proc/6520/oom_adj
>   -rw-r--r-- 1 andrew andrew 0 2010-01-04 12:37 /proc/6520/oom_adj
>   [andrew@sophia ~]$ cat /proc/6520/oom_adj
>   0
>   [andrew@sophia ~]$ id
>   uid=500(andrew) gid=500(andrew) groups=10(wheel),500(andrew)
>   [andrew@sophia ~]$ echo -17 > /proc/6520/oom_adj
>   -bash: echo: write error: Permission denied
>   [andrew@sophia ~]$ echo 0 > /proc/6520/oom_adj
>   [andrew@sophia ~]$ echo -17 > /proc/6520/oom_adj
>   -bash: echo: write error: Permission denied
>   [andrew@sophia ~]$
>
> But that would be a pain to have to do.
>
> OTOH, disabling the OOM killer is not always an option. I recently tried it
> on one system and had to revert it rapidly because the system stopped
> working in minutes. Some software just doesn't live well in such
> environments, sadly.

Right. Which is why I like the idea of disabling the OOM killer for
the *postmaster*, but not the regular backends. Gives it a chance to
recover. It's not nice, but it's better than nothing.

-- Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/


Re: Setting oom_adj on linux?

От
Tom Lane
Дата:
Greg Smith <greg@2ndquadrant.com> writes:
> In http://archives.postgresql.org/pgsql-hackers/2008-02/msg00049.php Tom 
> points out that while you could make this adjustment in the init scripts 
> for PostgreSQL, actually doing so is quite questionable as a packaging 
> decision.

I just wondered if it would be questioned, I didn't say there was a
problem.

However, the long and the short of this is that we can't do anything
without the close cooperation of an init script.  I think that moves
it out of the realm of what Postgres as a project should be doing.
It seems more like a patch that the Linux-based packagers should be
carrying.

Memo to self: get off duff and prepare such a patch for the Red Hat/Fedora
packages.
        regards, tom lane


Re: Setting oom_adj on linux?

От
Alvaro Herrera
Дата:
Magnus Hagander wrote:

> Right. Which is why I like the idea of disabling the OOM killer for
> the *postmaster*, but not the regular backends. Gives it a chance to
> recover. It's not nice, but it's better than nothing.

It doesn't sound like the init script can reenable the killer for the
child processes though.  So, if there's anything that the core code
ought to do, is re-enable OOM-killer for postmaster children, after
being disabled by the initscript.

BTW, is it possible for pg_ctl to disable OOM-killer?  I guess not,
since it's not run by root ...


Is there a way to disable memory overcommit for particular processes?
That would be very useful -- just disable overcommit for all Postgres
processes, and there shouldn't be much need to enable the killer for
backends.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Setting oom_adj on linux?

От
Magnus Hagander
Дата:
On Mon, Jan 4, 2010 at 17:55, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> Magnus Hagander wrote:
>
>> Right. Which is why I like the idea of disabling the OOM killer for
>> the *postmaster*, but not the regular backends. Gives it a chance to
>> recover. It's not nice, but it's better than nothing.
>
> It doesn't sound like the init script can reenable the killer for the
> child processes though.  So, if there's anything that the core code
> ought to do, is re-enable OOM-killer for postmaster children, after
> being disabled by the initscript.

Yeah, that's why the backend code would need to be involved.


> BTW, is it possible for pg_ctl to disable OOM-killer?  I guess not,
> since it's not run by root ...

No, it has to run as root.


> Is there a way to disable memory overcommit for particular processes?
> That would be very useful -- just disable overcommit for all Postgres
> processes, and there shouldn't be much need to enable the killer for
> backends.

Not that I've been able to find.

-- Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/


Re: Setting oom_adj on linux?

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Is there a way to disable memory overcommit for particular processes?

I would think not --- the very essence of overcommit is that you're
promising more total memory than the system has got, and that's
inherently a global proposition.
        regards, tom lane


Re: Setting oom_adj on linux?

От
Alvaro Herrera
Дата:
Magnus Hagander wrote:
> On Mon, Jan 4, 2010 at 17:55, Alvaro Herrera <alvherre@commandprompt.com> wrote:

> > BTW, is it possible for pg_ctl to disable OOM-killer?  I guess not,
> > since it's not run by root ...
> 
> No, it has to run as root.

We could at least make it work on Windows, since it is often run as
Administrator and drops privileges afterwards ...

... oh, wait ...

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Setting oom_adj on linux?

От
Tom Lane
Дата:
Magnus Hagander <magnus@hagander.net> writes:
> I realize this is a very platform-specific thing, but should we
> consider setting the value of /proc/<pid>/oom_adj when running on
> linux? See:
>
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/filesystems/proc.txt;h=220cc6376ef80e0c9bcfec162d45552e729cdf5a;hb=45d28b097280a78893ce25a5d0db41e6a2717853

One interesting thing I read there is:
   Swapped out tasks are killed first. Half of each child's memory size is   added to the parent's score if they do not
sharethe same memory.                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 

This suggests that PG's shared memory ought not be counted in the
postmaster's OOM score, which would mean that the problem shouldn't be
quite as bad as we've believed.  I wonder if that is a recent change?
Or maybe it's supposed to be that way and is not implemented correctly?

BTW, the given link shows only chapter 1, see 


http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob_plain;f=Documentation/filesystems/proc.txt;hb=45d28b097280a78893ce25a5d0db41e6a2717853

for the whole file.
        regards, tom lane


Re: Setting oom_adj on linux?

От
"Kevin Grittner"
Дата:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> This suggests that PG's shared memory ought not be counted in the
> postmaster's OOM score, which would mean that the problem
> shouldn't be quite as bad as we've believed.  I wonder if that is
> a recent change?  Or maybe it's supposed to be that way and is not
> implemented correctly?
I've wondered about that based on my experience.  When I found that
memory leak back in 8.2devel, running on a SLES 9 SP 3 system, the
OOM killer killed the offending backend rather than the postmaster,
although it took out a couple Java middle tier processes before
starting in on PostgreSQL.
-Kevin


Re: Setting oom_adj on linux?

От
Ron Mayer
Дата:
Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> ...oom_adj...
> 
> One interesting thing I read there is:
>     Swapped out tasks are killed first. Half of each child's memory size is
>     added to the parent's score if they do not share the same memory.
>                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> This suggests that PG's shared memory ought not be counted in the
> postmaster's OOM score, which would mean that the problem shouldn't be
> quite as bad as we've believed.  I wonder if that is a recent change?
> Or maybe it's supposed to be that way and is not implemented correctly?

The code for oom_kill.c looks fairly readable (link below [1]):

96         points = mm->total_vm;
....
117         list_for_each_entry(child, &p->children, sibling) {
118                 task_lock(child);
119                 if (child->mm != mm && child->mm)
120                         points += child->mm->total_vm/2 + 1;
121                 task_unlock(child);
122         }

Which seems to add points for each child who doesn't share the
same mm structure as the parent.  Which I think is a quite a bit
stricter interpretation of "if they do not share the same memory".



[1]
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=mm/oom_kill.c;h=f52481b1c1e5442c9a5b16b06b22221b75b9bb7c;hb=HEAD



Re: Setting oom_adj on linux?

От
Alex Hunsaker
Дата:
On Mon, Jan 4, 2010 at 09:55, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> Magnus Hagander wrote:
>
>> Right. Which is why I like the idea of disabling the OOM killer for
>> the *postmaster*, but not the regular backends. Gives it a chance to
>> recover. It's not nice, but it's better than nothing.
>
> It doesn't sound like the init script can reenable the killer for the
> child processes though.  So, if there's anything that the core code
> ought to do, is re-enable OOM-killer for postmaster children, after
> being disabled by the initscript.

Exactly.

FWIW here is the patch I run.  Stupid as the patch may be, count it as
a +1 for people in the field doing this.  Hence a reason to think
about doing something in core.  maybe.

This has some oddities like it does not reset oom to 0 for the (wal)
writer process.  Plus assuming you do oom, the stats collector has a
good chance of being hit.  Although normal backends will probably have
a higher score.

[ oom_adj gets set to -17 in the startup script.  I run this on top of
disabling overcommit, color me paranoid ]

*** a/src/backend/postmaster/autovacuum.c
--- b/src/backend/postmaster/autovacuum.c
***************
*** 362,367 **** StartAutoVacLauncher(void)
--- 362,370 ----
  #ifndef EXEC_BACKEND
          case 0:
              /* in postmaster child ... */
+
+             oom_adjust();
+
              /* Close the postmaster's sockets */
              ClosePostmasterPorts(false);

*** a/src/backend/postmaster/fork_process.c
--- b/src/backend/postmaster/fork_process.c
***************
*** 65,68 **** fork_process(void)
--- 65,84 ----
      return result;
  }

+ void
+ oom_adjust(void)
+ {
+     /* adjust oom */
+     FILE *oom = fopen("/proc/self/oom_adj", "w");
+
+     /*
+      * ignore errors we dont really care
+      */
+     if (oom)
+     {
+         fprintf(oom, "0\n");
+         fclose(oom);
+     }
+ }
+
  #endif   /* ! WIN32 */
*** a/src/backend/postmaster/pgarch.c
--- b/src/backend/postmaster/pgarch.c
***************
*** 161,166 **** pgarch_start(void)
--- 161,169 ----
  #ifndef EXEC_BACKEND
          case 0:
              /* in postmaster child ... */
+
+             oom_adjust();
+
              /* Close the postmaster's sockets */
              ClosePostmasterPorts(false);

*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
***************
*** 622,627 **** pgstat_start(void)
--- 622,630 ----
  #ifndef EXEC_BACKEND
          case 0:
              /* in postmaster child ... */
+
+             oom_adjust();
+
              /* Close the postmaster's sockets */
              ClosePostmasterPorts(false);

*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
***************
*** 3056,3061 **** BackendStartup(Port *port)
--- 3056,3063 ----
      {
          free(bn);

+         oom_adjust();
+
          /*
           * Let's clean up ourselves as the postmaster child, and close the
           * postmaster's listen sockets.  (In EXEC_BACKEND case this is all
*** a/src/backend/postmaster/syslogger.c
--- b/src/backend/postmaster/syslogger.c
***************
*** 530,535 **** SysLogger_Start(void)
--- 530,538 ----
  #ifndef EXEC_BACKEND
          case 0:
              /* in postmaster child ... */
+
+             oom_adjust();
+
              /* Close the postmaster's sockets */
              ClosePostmasterPorts(true);

*** a/src/include/postmaster/fork_process.h
--- b/src/include/postmaster/fork_process.h
***************
*** 13,17 ****
--- 13,18 ----
  #define FORK_PROCESS_H

  extern pid_t fork_process(void);
+ extern void oom_adjust(void);

  #endif   /* FORK_PROCESS_H */

Вложения

Re: Setting oom_adj on linux?

От
Tom Lane
Дата:
Alex Hunsaker <badalex@gmail.com> writes:
> FWIW here is the patch I run.  Stupid as the patch may be, count it as
> a +1 for people in the field doing this.  Hence a reason to think
> about doing something in core.  maybe.

Thanks for the patch --- it's certainly a fine starting point.

We can either drop this in core (with a lot of #ifdef LINUX added)
or expect Linux packagers to carry it as a patch.  Given that the
packagers would also have to modify their init scripts to go with,
the patch route is not unreasonable.  Comments?

> This has some oddities like it does not reset oom to 0 for the (wal)
> writer process.

FWIW, I think that's probably a feature --- I'd vote for only resetting
in regular backends and possibly autovac workers.
        regards, tom lane


Re: Setting oom_adj on linux?

От
Alex Hunsaker
Дата:
On Thu, Jan 7, 2010 at 20:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alex Hunsaker <badalex@gmail.com> writes:

> We can either drop this in core (with a lot of #ifdef LINUX added)

Any thoughts on doing something like (in fork_process.c)

#ifdef LINUX
void oom_adjust()
{
...
}
#else
void oom_adjust() {}
#endif

So there is only one #ifdef?  It still leaves the ugly calls to the function...

> or expect Linux packagers to carry it as a patch.  Given that the
> packagers would also have to modify their init scripts to go with,
> the patch route is not unreasonable.  Comments?

Id plus +1 for core.  The problem certainly does not look to be going
away soon (if ever).

>> This has some oddities like it does not reset oom to 0 for the (wal)
>> writer process.
>
> FWIW, I think that's probably a feature --- I'd vote for only resetting
> in regular backends and possibly autovac workers.

I think that makes sense +1.  In-fact thats why the patch has it as a
separate function instead of hacked into fork_process().  However its
mainly odd because IIRC I greped for all instances of fork_process()
and added the oom_adjusting to the callers.  Given that it seems the
wall writer procs should also be set to 0.  My guess is its a race
with my startup script launching postgres and then setting oom_adj. Or
maybe I missed a caller? Maybe they don't use fork_process()? Ill
check it out.


Re: Setting oom_adj on linux?

От
Magnus Hagander
Дата:
On Fri, Jan 8, 2010 at 04:46, Alex Hunsaker <badalex@gmail.com> wrote:
> On Thu, Jan 7, 2010 at 20:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Alex Hunsaker <badalex@gmail.com> writes:
>
>> We can either drop this in core (with a lot of #ifdef LINUX added)
>
> Any thoughts on doing something like (in fork_process.c)
>
> #ifdef LINUX
> void oom_adjust()
> {
> ...
> }
> #else
> void oom_adjust() {}
> #endif
>
> So there is only one #ifdef?  It still leaves the ugly calls to the function...

Seems like a much better way, yes. Especially if we in the future want
to do this for more than one platform (if it becomes necessary).


>> or expect Linux packagers to carry it as a patch.  Given that the
>> packagers would also have to modify their init scripts to go with,
>> the patch route is not unreasonable.  Comments?
>
> Id plus +1 for core.  The problem certainly does not look to be going
> away soon (if ever).

Yeah, I think core is better. It's not like it's enough code to cause
a huge maintenance problem, I think.

Do we need to make the value configurable? I'd certainly find it
interesting to set backends to say 5 or something like that, that
makes them less likely to be killed than any old "oops opened too big
file in an editor"-process, but still possible to kill if the system
is *really* running out of memory.



-- Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/


Re: Setting oom_adj on linux?

От
Tom Lane
Дата:
Magnus Hagander <magnus@hagander.net> writes:
> Do we need to make the value configurable? I'd certainly find it
> interesting to set backends to say 5 or something like that, that
> makes them less likely to be killed than any old "oops opened too big
> file in an editor"-process, but still possible to kill if the system
> is *really* running out of memory.

I don't want to go to the trouble of creating (and documenting) a
configure option for this.  Much less a GUC ;-)

What I suggest is that we do something like
#ifdef LINUX_OOM_ADJ...fprintf(oom, "%d\n", LINUX_OOM_ADJ);...#endif

Then, somebody who wants the feature would build with, say,-DLINUX_OOM_ADJ=0
or another value if they want that.
        regards, tom lane


Re: Setting oom_adj on linux?

От
Bruce Momjian
Дата:
Alex Hunsaker wrote:
> On Thu, Jan 7, 2010 at 20:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Alex Hunsaker <badalex@gmail.com> writes:
> 
> > We can either drop this in core (with a lot of #ifdef LINUX added)
> 
> Any thoughts on doing something like (in fork_process.c)
> 
> #ifdef LINUX
> void oom_adjust()
> {
> ...
> }
> #else
> void oom_adjust() {}
> #endif
> 
> So there is only one #ifdef?  It still leaves the ugly calls to the function...

The usual solution for this kind of thing is:
#ifdef LINUX#define OOM_ADJUST oom_adjust()#else#define OOM_ADJUST do {} while (0)#endif

so there is no call or dummy function and you reference it in the code
as:
OOM_ADJUST;

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: Setting oom_adj on linux?

От
Alex Hunsaker
Дата:
On Fri, Jan 8, 2010 at 07:53, Bruce Momjian <bruce@momjian.us> wrote:
> Alex Hunsaker wrote:
>> On Thu, Jan 7, 2010 at 20:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> > Alex Hunsaker <badalex@gmail.com> writes:

> The usual solution for this kind of thing is:
>
>        #ifdef LINUX
>        #define OOM_ADJUST oom_adjust()
>        #else
>        #define OOM_ADJUST do {} while (0)
>        #endif
>
> so there is no call or dummy function and you reference it in the code
> as:

Surely any compiler worth its salt would turn a call to an empty void
function into a noop?  Then again maybe I just hate macros :)


Re: Setting oom_adj on linux?

От
Alex Hunsaker
Дата:
On Fri, Jan 8, 2010 at 07:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Then, somebody who wants the feature would build with, say,
>        -DLINUX_OOM_ADJ=0
> or another value if they want that.

Here is a stab at that.

It sets oom_adj for:
 autovacuum workers
 archivers (pgarch.c)
 regular backends

Also it updates the contrib linux starup script to start under an oom_adj of -17

Comments?

*** a/contrib/start-scripts/linux
--- b/contrib/start-scripts/linux
***************
*** 53,58 **** DAEMON="$prefix/bin/postmaster"
--- 53,63 ----
  # What to use to shut down the postmaster
  PGCTL="$prefix/bin/pg_ctl"

+ # Adjust oom_adj on linux to avoid the postmaster from be killed
+ # note you probably want to compile postgres with -DLINUX_OOM_ADJ=0
+ # so that regular backends will be killed on oom
+ OOM_ADJ=-17
+
  set -e

  # Only start if we can find the postmaster.
***************
*** 62,67 **** test -x $DAEMON || exit 0
--- 67,73 ----
  case $1 in
    start)
    echo -n "Starting PostgreSQL: "
+   echo $OOM_ADJ > /proc/self/oom_adj
    su - $PGUSER -c "$DAEMON -D '$PGDATA' &" >>$PGLOG 2>&1
    echo "ok"
    ;;
*** a/src/backend/postmaster/autovacuum.c
--- b/src/backend/postmaster/autovacuum.c
***************
*** 1403,1408 **** StartAutoVacWorker(void)
--- 1403,1411 ----
            /* Lose the postmaster's on-exit routines */
            on_exit_reset();

+           /* allow us to be killed on oom */
+           oom_adjust();
+
            AutoVacWorkerMain(0, NULL);
            break;
  #endif
*** a/src/backend/postmaster/fork_process.c
--- b/src/backend/postmaster/fork_process.c
***************
*** 66,68 **** fork_process(void)
--- 66,97 ----
  }

  #endif   /* ! WIN32 */
+
+ #if defined(__linux__) && defined(LINUX_OOM_ADJ)
+ /*
+  * By default linux really likes to kill the postmaster on oom.
+  * Because linux does not take SYSV shared mem into account it
+  * (almost) always will SIGKILL the postmaster on an oom event.
+  *
+  * In the event you started the postmaster under a low (negative)
+  * oom_adj.  This will adjust regular backends, autovac and archivers
+  * to LINUX_OOM_ADJ on fork().  Allowing them to be killed in an oom
+  * event.
+  *
+  * Later we might add other OS oom type stuff here.
+  */
+ void
+ oom_adjust(void)
+ {
+   FILE *oom = fopen("/proc/self/oom_adj", "w");
+
+   /* ignore errors we dont really care */
+   if (oom)
+   {
+       fprintf(oom, "%d\n", LINUX_OOM_ADJ);
+       fclose(oom);
+   }
+ }
+ #else
+ void oom_adjust(void) { }
+ #endif
*** a/src/backend/postmaster/pgarch.c
--- b/src/backend/postmaster/pgarch.c
***************
*** 170,175 **** pgarch_start(void)
--- 170,178 ----
            /* Drop our connection to postmaster's shared memory, as well */
            PGSharedMemoryDetach();

+           /* allow us to be killed on oom */
+           oom_adjust();
+
            PgArchiverMain(0, NULL);
            break;
  #endif
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
***************
*** 3076,3081 **** BackendStartup(Port *port)
--- 3076,3084 ----
        /* Perform additional initialization and collect startup packet */
        BackendInitialize(port);

+       /* allow us to be killed on oom */
+       oom_adjust();
+
        /* And run the backend */
        proc_exit(BackendRun(port));
    }
*** a/src/include/postmaster/fork_process.h
--- b/src/include/postmaster/fork_process.h
***************
*** 13,17 ****
--- 13,18 ----
  #define FORK_PROCESS_H

  extern pid_t fork_process(void);
+ extern void oom_adjust(void);

  #endif   /* FORK_PROCESS_H */

Вложения

Re: Setting oom_adj on linux?

От
Tom Lane
Дата:
Alex Hunsaker <badalex@gmail.com> writes:
> On Fri, Jan 8, 2010 at 07:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Then, somebody who wants the feature would build with, say,
>>        -DLINUX_OOM_ADJ=0
>> or another value if they want that.

> Here is a stab at that.

Anybody have an objection to this basic approach?  I'm in a bit of a
hurry to get something like this into the Fedora RPMs, so barring
objections I'm going to review this, commit it into HEAD, and then
make a back-ported patch I can use with 8.4 in Fedora.
        regards, tom lane


Re: Setting oom_adj on linux?

От
Bruce Momjian
Дата:
Tom Lane wrote:
> Alex Hunsaker <badalex@gmail.com> writes:
> > On Fri, Jan 8, 2010 at 07:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Then, somebody who wants the feature would build with, say,
> >>        -DLINUX_OOM_ADJ=0
> >> or another value if they want that.
> 
> > Here is a stab at that.
> 
> Anybody have an objection to this basic approach?  I'm in a bit of a
> hurry to get something like this into the Fedora RPMs, so barring
> objections I'm going to review this, commit it into HEAD, and then
> make a back-ported patch I can use with 8.4 in Fedora.

Go for it, but FYI, we need to udpate the our OOM documentation mention
to reflect this change.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: Setting oom_adj on linux?

От
Stephen Frost
Дата:
* Magnus Hagander (magnus@hagander.net) wrote:
> Do we need to make the value configurable? I'd certainly find it
> interesting to set backends to say 5 or something like that, that
> makes them less likely to be killed than any old "oops opened too big
> file in an editor"-process, but still possible to kill if the system
> is *really* running out of memory.

We do need to make it configurable, in at least the sense of being able
to control if it's done or not.  There are some environments where you
won't be able to set it.  Perhaps just handling failure gracefully would
work, but I'd be happier if you could just disable it in the config
file.
Thanks,
    Stephen

Re: Setting oom_adj on linux?

От
Stephen Frost
Дата:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> I don't want to go to the trouble of creating (and documenting) a
> configure option for this.  Much less a GUC ;-)

Requiring a custom build to disable it would be horrible, in my view.
Or, at best, just means that the packagers won't enable it, which
obviously would be less than ideal.

Sorry if it's a pain, but I think it needs to either be configurable or
not done.  As I said before, it definitely needs to handle failure
gracefully, but I worry that even that won't be sufficient in some
cases.  Just thinking about how we run PG under VServers and Linux
Containers and whatnot, we ran into some issues with OpenSSH trying to
monkey with proc values and I'd really hate to run into the same issues
with PG.
Thanks,
    Stephen

Re: Setting oom_adj on linux?

От
Alex Hunsaker
Дата:
On Fri, Jan 8, 2010 at 10:07, Stephen Frost <sfrost@snowman.net> wrote:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> I don't want to go to the trouble of creating (and documenting) a
>> configure option for this.  Much less a GUC ;-)
>
> Requiring a custom build to disable it would be horrible, in my view.
> Or, at best, just means that the packagers won't enable it, which
> obviously would be less than ideal.

FWIW I agree.

> Sorry if it's a pain, but I think it needs to either be configurable or
> not done.  As I said before, it definitely needs to handle failure
> gracefully, but I worry that even that won't be sufficient in some
> cases.  Just thinking about how we run PG under VServers and Linux
> Containers and whatnot, we ran into some issues with OpenSSH trying to
> monkey with proc values and I'd really hate to run into the same issues
> with PG.

As long as the VM/container you are running under wont kill postmaster
for trying to access proc-- the patch I posted should work fine.  It
just ignores any error (I assumed you might be running in a chroot
without proc or some such).


Re: Setting oom_adj on linux?

От
Stephen Frost
Дата:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Alex Hunsaker <badalex@gmail.com> writes:
> > On Fri, Jan 8, 2010 at 07:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Then, somebody who wants the feature would build with, say,
> >>        -DLINUX_OOM_ADJ=0
> >> or another value if they want that.
>
> > Here is a stab at that.
>
> Anybody have an objection to this basic approach?  I'm in a bit of a
> hurry to get something like this into the Fedora RPMs, so barring
> objections I'm going to review this, commit it into HEAD, and then
> make a back-ported patch I can use with 8.4 in Fedora.

Whoah, I would caution against doing that without being very confident
it won't break when installed under things like VServer, Linux
Containers, SELinux configurations, etc, when back-porting it.  I don't
expect people would be too pleased to discover their "nice, simple,
no-expected-issues" upgrade of a minor point release to all of a sudden
mean their database doesn't start anymore..

Sorry I havn't got time right now to run down the issue I had with
OpenSSH doing something similar, and it might have just been poor coding
in OpenSSH, but I wanted to voice my concern.
Thanks,
    Stephen

Re: Setting oom_adj on linux?

От
Tom Lane
Дата:
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> I don't want to go to the trouble of creating (and documenting) a
>> configure option for this.  Much less a GUC ;-)

> Requiring a custom build to disable it would be horrible, in my view.
> Or, at best, just means that the packagers won't enable it, which
> obviously would be less than ideal.

I'm a packager, and I think that this approach is perfectly fine.
The place where the rubber meets the road is in the init script,
which is the packager's responsibility.  If the packager is going
to provide an init script that sets oom_adj in the first place,
he can turn on the compensation code inside the binary.  If not,
the compensation code has no purpose anyhow.  There are no moving
parts in this as far as the end user is concerned.

> Sorry if it's a pain, but I think it needs to either be configurable or
> not done.  As I said before, it definitely needs to handle failure
> gracefully,

We just ignore any error from the attempt to write to /proc.

> but I worry that even that won't be sufficient in some
> cases.  Just thinking about how we run PG under VServers and Linux
> Containers and whatnot,

I think you are missing the point that the code won't even be compiled
except on platforms where the packager has determined that it's sensible
to have it.
        regards, tom lane


Re: Setting oom_adj on linux?

От
Stephen Frost
Дата:
Alex,

* Alex Hunsaker (badalex@gmail.com) wrote:
> As long as the VM/container you are running under wont kill postmaster
> for trying to access proc-- the patch I posted should work fine.  It
> just ignores any error (I assumed you might be running in a chroot
> without proc or some such).

As I recall, oom_adj wasn't visible in the container because you
explicitly set what proc entries processes can have access to when using
VServers, and OpenSSH didn't handle that cleanly.  Guess what I'm just
saying is "don't just assume everything is as it would be on a 'normal'
system when dealing with /proc and friends", and, of course, test, test,
test when you're talking about back-porting things.
Thanks,
    Stephen

Re: Setting oom_adj on linux?

От
Andrew Dunstan
Дата:

Tom Lane wrote:
> I don't want to go to the trouble of creating (and documenting) a
> configure option for this.  Much less a GUC ;-)
>
> What I suggest is that we do something like
>
>     #ifdef LINUX_OOM_ADJ
>     ...
>     fprintf(oom, "%d\n", LINUX_OOM_ADJ);
>     ...
>     #endif
>
> Then, somebody who wants the feature would build with, say,
>     -DLINUX_OOM_ADJ=0
> or another value if they want that.
>             
>   

+1 for this. Looks like a sound approach.

cheers

andrew


Re: Setting oom_adj on linux?

От
Robert Haas
Дата:
On Fri, Jan 8, 2010 at 12:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Stephen Frost <sfrost@snowman.net> writes:
>> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>>> I don't want to go to the trouble of creating (and documenting) a
>>> configure option for this.  Much less a GUC ;-)
>
>> Requiring a custom build to disable it would be horrible, in my view.
>> Or, at best, just means that the packagers won't enable it, which
>> obviously would be less than ideal.
>
> I'm a packager, and I think that this approach is perfectly fine.
> The place where the rubber meets the road is in the init script,
> which is the packager's responsibility.  If the packager is going
> to provide an init script that sets oom_adj in the first place,
> he can turn on the compensation code inside the binary.  If not,
> the compensation code has no purpose anyhow.  There are no moving
> parts in this as far as the end user is concerned.

There could well be moving parts if the user wants to adjust the value
being written to oom_adj, and can't because it's compiled in.  I don't
see why we can't just add a GUC for this and be done with it.

...Robert


Re: Setting oom_adj on linux?

От
Tom Lane
Дата:
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> I don't want to go to the trouble of creating (and documenting) a
>> configure option for this.  Much less a GUC ;-)

> Requiring a custom build to disable it would be horrible, in my view.

BTW, maybe you're confused about the intention here?  You'd need a
custom build to *enable* it, not to disable it.
        regards, tom lane


Re: Setting oom_adj on linux?

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> There could well be moving parts if the user wants to adjust the value
> being written to oom_adj, and can't because it's compiled in.  I don't
> see why we can't just add a GUC for this and be done with it.

The number of users who will want to do that might be different from
epsilon, but not by enough to justify a GUC.  Furthermore, we don't have
any reasonable infrastructure for supporting platform-specific GUCs,
which means that the amount of effort you're proposing is extremely
large.
        regards, tom lane


Re: Setting oom_adj on linux?

От
Alex Hunsaker
Дата:
On Fri, Jan 8, 2010 at 10:24, Stephen Frost <sfrost@snowman.net> wrote:
> As I recall, oom_adj wasn't visible in the container because you
> explicitly set what proc entries processes can have access to when using
> VServers, and OpenSSH didn't handle that cleanly.  Guess what I'm just
> saying is "don't just assume everything is as it would be on a 'normal'
> system when dealing with /proc and friends", and, of course, test, test,
> test when you're talking about back-porting things.

Sure this was openssh? I just looked through the entire cvs history
for opensshp and found 0 references to 'oom' let alone 'oom_adj'.
Maybe something distro specific?


[ for the curious here is what I tried ]
$ git clone git://git.infradead.org/openssh.git
$ cd openssh
$ git grep oom_adj
$ git grep 'oom'
$ git grep oom | grep -vi loomis | grep -v room
$ git log -p | grep oom | grep -vi loomis | grep -v room | grep -v tsoome


Re: Setting oom_adj on linux?

От
Tom Lane
Дата:
Alex Hunsaker <badalex@gmail.com> writes:
> Sure this was openssh? I just looked through the entire cvs history
> for opensshp and found 0 references to 'oom' let alone 'oom_adj'.
> Maybe something distro specific?

FWIW, I see no evidence that sshd on Fedora does anything to change its
oom score --- the oom_adj file reads as zero for both the parent daemon
and its children.  Kinda scary to realize the OOM killer could easily
lock me out of boxes I run headless.
        regards, tom lane


Re: Setting oom_adj on linux?

От
Alex Hunsaker
Дата:
On Fri, Jan 8, 2010 at 12:45, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alex Hunsaker <badalex@gmail.com> writes:
>> Sure this was openssh? I just looked through the entire cvs history
>> for opensshp and found 0 references to 'oom' let alone 'oom_adj'.
>> Maybe something distro specific?
>
> FWIW, I see no evidence that sshd on Fedora does anything to change its
> oom score --- the oom_adj file reads as zero for both the parent daemon
> and its children.  Kinda scary to realize the OOM killer could easily
> lock me out of boxes I run headless.

[ OT, CC trimmed]

Yeah, for me sshd has a score of 24 and has the 13th lowest oom_score
on my box.  While postgres with 0 connections and shared_buffers =
28MB has a score of 26558 and has the 5th highest oom_score.  Only
chromium and xmonad are above it.  With 5 connections it just about
doubles its score (to 47238). Had I been headless postgres would
certainly die on oom.  But even something like alsamixer or bash is
higher than sshd *shrug*.

For the curious here below is the raw data and how i generated it [
yes its far from perfect... ]:
(for file in /proc/*/; do echo `cat $file/oom_score` `cat
$file/cmdline` $file;  done) | sort -n

6 /sbin/agetty-838400tty1linux /proc/2204/
6 /sbin/agetty-838400tty2linux /proc/1977/
6 /sbin/agetty-838400tty3linux /proc/1978/
6 /usr/sbin/crond /proc/1954/
6 /usr/sbin/uptimed /proc/1971/
10 /usr/sbin/irqbalance /proc/1951/
12 /usr/sbin/ntpd-s /proc/1967/
12 /usr/sbin/smartd /proc/2128/
13 hald-addon-input: Listening on /dev/input/event2 /dev/input/event1
/proc/2044/
13 hald-addon-storage: polling /dev/sr0 (every 2 sec) /proc/2088/
13 /usr/lib/hal/hald-addon-cpufreq /proc/2092/
19 /usr/sbin/syslog-ng /proc/1639/
24 /usr/sbin/sshd /proc/1943/
26 /usr/lib/sa/sadc-F-L-SDISK6006- /proc/2716/
27 supervising syslog-ng /proc/1638/
38 hald-runner /proc/1992/
53 cat/proc/self//cmdline /proc/self/
64 /usr/lib/postfix/master /proc/2089/
90 /usr/bin/X-nolistentcp /proc/2173/
102 -bash /proc/10997/
108 /home/alex/.cabal/bin/xmobar-x1 /proc/2190/
140 /usr/bin/dbus-daemon--fork--print-pid5--print-address7--session /proc/2256/
140 /usr/bin/dbus-daemon--system /proc/1983/
191 /usr/lib/hal/hald-addon-acpi /proc/2093/
194 dbus-launch--autolaunch004d7f457c373938f22d796a4ae05b60--binary-syntax--close-stderr
/proc/2255/
199 /usr/sbin/ntpd-s /proc/1960/
206 ssh-agentxmonad /proc/2188/
287 tail-f/var/log/httpd/error.log /proc/8688/
310 firefox /proc/18571/
339 xbindkeys /proc/2192/
341 -bash /proc/10589/
354 /usr/local/bin/cmus /proc/2205/
400 /bin/sh/usr/bin/startx /proc/2155/
456 sort-n /proc/10998/
487 /usr/sbin/hald /proc/1991/
525 qmgr-l-tfifo-u /proc/2097/
639 /bin/sh/home/alex/.xinitrc /proc/2180/
736 /usr/lib/GConf/gconfd-2 /proc/18578/
861 tail-f/var/log/httpd/error.log /proc/1621/
1122 urxvtd-q-f-o /proc/2189/
1151 /usr/lib/chromium/chromium /proc/2220/
1172 -bash /proc/22276/
1351 mutt-y /proc/27637/
1497 alsamixer /proc/3528/
1528 ssh192.168.0.15 /proc/1523/
1534 -bash /proc/2863/
1534 -bash /proc/2881/
1534 -bash /proc/2891/
1793 /usr/lib/chromium/chromium /proc/2219/
1828 ssh70.98.186.4 /proc/2860/
2066 pickup-l-tfifo-u /proc/26254/
2195 postgres: stats collector process /proc/10602/
2281 -bash /proc/3521/
2299 -bash /proc/1516/
2447 -bash /proc/23990/
2573 -bash /proc/1538/
3082 /usr/lib/chromium/chromium --channel=2219.aa9eb00.196295212
--type=renderer --lang=en-US

--force-fieldtest=AsyncSlowStart/_AsyncSlowStart_off/CacheSize/CacheSizeGroup_6/DnsImpact/_max_500ms_queue_prefetch/GlobalSdch/_global_disable_sdch/SocketLateBinding/_enable_late_binding/
/proc/2392/
3305 -bash /proc/20490/
3796 /usr/sbin/httpd-kstart /proc/1553/
4697 /usr/bin/knotify4 /proc/31084/
6117 /usr/lib/chromium/chromium --channel=2219.afcb450.251437212
--type=renderer --lang=en-US

--force-fieldtest=AsyncSlowStart/_AsyncSlowStart_off/CacheSize/CacheSizeGroup_6/DnsImpact/_max_500ms_queue_prefetch/GlobalSdch/_global_disable_sdch/SocketLateBinding/_enable_late_binding/
/proc/18311/
6564
/proc/self/exe--channel=2219.a684ce8.115295409--type=extension--lang=en-US--force-fieldtest=AsyncSlowStart/_AsyncSlowStart_off/DnsImpact/_max_500ms_queue_prefetch/GlobalSdch/_global_disable_sdch/SocketLateBinding/_enable_late_binding/
/proc/2242/
6921
/proc/self/exe--channel=2219.a6862e0.261342415--type=extension--lang=en-US--force-fieldtest=AsyncSlowStart/_AsyncSlowStart_off/DnsImpact/_max_500ms_queue_prefetch/GlobalSdch/_global_disable_sdch/SocketLateBinding/_enable_late_binding/
/proc/2245/
7453
/proc/self/exe--type=plugin--plugin-path=/usr/lib/mozilla/plugins/libflashplayer.so--lang=en-US--plugin-data-dir=/home/alex/.config/chromium/Default--channel=2219.afd16a58.340779570
/proc/18316/
10132 -bash /proc/2195/
10154 postgres: wal writer process /proc/10600/
10154 postgres: writer process /proc/10599/
10299 postgres: autovacuum launcher process /proc/10601/
11019 init [3] /proc/1/
17332 /usr/sbin/httpd-kstart /proc/1616/
17365 /usr/sbin/httpd-kstart /proc/1611/
17366 /usr/sbin/httpd-kstart /proc/1612/
17398 /usr/sbin/httpd-kstart /proc/1613/
17567 /usr/sbin/httpd-kstart /proc/1614/
19954 xinit/home/alex/.xinitrc--/etc/X11/xinit/xserverrc:0-auth/tmp/serverauth.NHLLZS74xg
/proc/2172/
26558 bin/postgres-Dblah /proc/10597/
26882
/proc/self/exe--channel=2219.b1692d18.1483224477--type=extension--lang=en-US--force-fieldtest=AsyncSlowStart/_AsyncSlowStart_off/CacheSize/CacheSizeGroup_6/DnsImpact/_max_500ms_queue_prefetch/GlobalSdch/_global_disable_sdch/SocketLateBinding/_enable_late_binding/
/proc/1597/
52201 /usr/lib/chromium/chromium--type=zygote /proc/2221/
53839 /home/alex/.xmonad/xmonad-i386-linux /proc/2187/
61530 /usr/lib/chromium/chromium --channel=2219.aa993d0.1197730980
--type=renderer --lang=en-US

--force-fieldtest=AsyncSlowStart/_AsyncSlowStart_off/CacheSize/CacheSizeGroup_6/DnsImpact/_max_500ms_queue_prefetch/GlobalSdch/_global_disable_sdch/SocketLateBinding/_enable_late_binding/
/proc/1527/


Re: Setting oom_adj on linux?

От
Stephen Frost
Дата:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Alex Hunsaker <badalex@gmail.com> writes:
> > Sure this was openssh? I just looked through the entire cvs history
> > for opensshp and found 0 references to 'oom' let alone 'oom_adj'.
> > Maybe something distro specific?
>
> FWIW, I see no evidence that sshd on Fedora does anything to change its
> oom score --- the oom_adj file reads as zero for both the parent daemon
> and its children.  Kinda scary to realize the OOM killer could easily
> lock me out of boxes I run headless.

There were a few issues, as it turns out, the particularly annoying one
was in the init script which caused upgrades to fail due to sshd not
being restarted, bug report here:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=473573

The other issue was with a Debian-specific patch which was applied to
OpenSSH which basically just created noise in the log file, bug report
here:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=487325

In the end, the problem was with errors being returned from attempts to
modify oom_adj.  As long as we can just ignore those hopefully there
won't be any issues.
Thanks,
    Stephen

Re: Setting oom_adj on linux?

От
Tom Lane
Дата:
Stephen Frost <sfrost@snowman.net> writes:
> The other issue was with a Debian-specific patch which was applied to
> OpenSSH which basically just created noise in the log file, bug report
> here:

> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=487325

Hmm, that's pretty interesting, specifically this:

: After some discussion with the Linux-Vserver folks, they found some
: interesting information I thought it worth adding. First EPERM is not
: the error that they expected, and that inside a vserver guest its really
: strict about what options you open it with, both O_CREAT and O_TRUNC are
: forbidden, and O_WRONLY lets you write 0\n to it.

That suggests it might be worth our trouble to use open/write rather
than fopen, so that we can ensure the flags are correct to avoid this
type of restriction.

But in any case the main takeaway seems to be to not insist on the
operation succeeding ;-)
        regards, tom lane


Re: Setting oom_adj on linux?

От
Alex Hunsaker
Дата:
On Fri, Jan 8, 2010 at 15:24, Stephen Frost <sfrost@snowman.net> wrote:
> There were a few issues, as it turns out, the particularly annoying one
> was in the init script which caused upgrades to fail due to sshd not
> being restarted, bug report here:

Thanks for the pointers!

> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=473573

The changes I proposed to the example linux startup script wont suffer
from that.

> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=487325
>
> In the end, the problem was with errors being returned from attempts to
> modify oom_adj.  As long as we can just ignore those hopefully there
> won't be any issues.

Yep sounds good.

Thanks again!

Tom, sounds like you got busy with other stuff :) Should I submit a
new patch that uses open and O_WRONLY?


Re: Setting oom_adj on linux?

От
Tom Lane
Дата:
Alex Hunsaker <badalex@gmail.com> writes:
> Tom, sounds like you got busy with other stuff :) Should I submit a
> new patch that uses open and O_WRONLY?

No, I was just waiting to see if there were more comments.  I can
take it from here.
        regards, tom lane


Re: Setting oom_adj on linux?

От
Peter Eisentraut
Дата:
On fre, 2010-01-08 at 11:37 -0500, Tom Lane wrote:
> Alex Hunsaker <badalex@gmail.com> writes:
> > On Fri, Jan 8, 2010 at 07:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Then, somebody who wants the feature would build with, say,
> >>        -DLINUX_OOM_ADJ=0
> >> or another value if they want that.
> 
> > Here is a stab at that.
> 
> Anybody have an objection to this basic approach?  I'm in a bit of a
> hurry to get something like this into the Fedora RPMs, so barring
> objections I'm going to review this, commit it into HEAD, and then
> make a back-ported patch I can use with 8.4 in Fedora.

I find this whole approach a bit evil.  If word of this gets out, every
server process on Linux will excuse itself from the OOM killer.  And
then the kernel guys will add another setting to override the process
preference.  It's an arms race, but maybe that's what's needed.



Re: Setting oom_adj on linux?

От
Alex Hunsaker
Дата:
On Sat, Jan 9, 2010 at 14:06, Peter Eisentraut <peter_e@gmx.net> wrote:
> I find this whole approach a bit evil.

I would tend to agree but this type of thing has been known since about 2004...

See http://thoughts.j-davis.com/2009/11/29/linux-oom-killer/,
particularly the comment from Greg Smith.

> If word of this gets out, every
> server process on Linux will excuse itself from the OOM killer.  And
> then the kernel guys will add another setting to override the process
> preference.

Yes, and note debian is already doing that with things like ssh.  Who
knows what else. (Id be curious to know)

Plus maybe it will convince them its time to fix the damn thing.
Although postgres really is kind of special in this regard.  All the
other daemons on my system include X had way lower oom scores.
Alsamixer was 3 times more likely to get killed than the first daemon
with the highest score (hald) while postgres was 55 times more likely.Yes its the kernel being stupid, but its been
knownfor more than 6 
years...

(oom scores: alsamxier: 1497, hald: 487, postgres: 26558)

> It's an arms race, but maybe that's what's needed.

Well *shrug* regardless of what core does... Ill certainly be doing it
on my postgres linux builds :)  Maybe it would convince them more if
we could get distros to accept patches that fix the kernel to do
correct/better shared mem accounting?  May I add good luck? :)


Re: Setting oom_adj on linux?

От
Andrew Dunstan
Дата:

Peter Eisentraut wrote:
> On fre, 2010-01-08 at 11:37 -0500, Tom Lane wrote:
>   
>> Alex Hunsaker <badalex@gmail.com> writes:
>>     
>>> On Fri, Jan 8, 2010 at 07:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>       
>>>> Then, somebody who wants the feature would build with, say,
>>>>        -DLINUX_OOM_ADJ=0
>>>> or another value if they want that.
>>>>         
>>> Here is a stab at that.
>>>       
>> Anybody have an objection to this basic approach?  I'm in a bit of a
>> hurry to get something like this into the Fedora RPMs, so barring
>> objections I'm going to review this, commit it into HEAD, and then
>> make a back-ported patch I can use with 8.4 in Fedora.
>>     
>
> I find this whole approach a bit evil.  If word of this gets out, every
> server process on Linux will excuse itself from the OOM killer.  And
> then the kernel guys will add another setting to override the process
> preference.  It's an arms race, but maybe that's what's needed.
>   

The trouble is that the OOM heuristics are pretty bad, and many Linux 
hackers aren't interested in improving them. One of the most prominent 
told me some years ago "Just turn it off."

And the point of this patch is to allow the postmaster to *remove* OOM 
protection from normal postgres backends. We at least would be playing 
nice, and not engaging in an arms race.

cheers

andrew


Re: Setting oom_adj on linux?

От
Tom Lane
Дата:
Alex Hunsaker <badalex@gmail.com> writes:
> On Sat, Jan 9, 2010 at 14:06, Peter Eisentraut <peter_e@gmx.net> wrote:
>> If word of this gets out, every
>> server process on Linux will excuse itself from the OOM killer.  And
>> then the kernel guys will add another setting to override the process
>> preference.

> ... maybe it will convince them its time to fix the damn thing.
> Although postgres really is kind of special in this regard.

Yeah.  If they had saner handling of shared-memory accounting, maybe
there wouldn't be a need for us to kluge around the OOM logic.
        regards, tom lane


Re: Setting oom_adj on linux?

От
Tom Lane
Дата:
Alex Hunsaker <badalex@gmail.com> writes:
> On Fri, Jan 8, 2010 at 07:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Then, somebody who wants the feature would build with, say,
>>        -DLINUX_OOM_ADJ=0
>> or another value if they want that.

> Here is a stab at that.

Applied with some editorialization.  I concluded that it'd be better to
put the oom_adj reset right into fork_process, rather than scattering
the support across several different files.  The latter seems vulnerable
to errors of omission in future versions, and there's no really strong
reason to not have all the child processes behave the same.  Also, a
single-file patch is a lot easier for packagers to borrow and apply to
existing releases, should they choose to.  (Already done and tested in
Fedora packages ...)
        regards, tom lane