Обсуждение: /proc/self/oom_adj is deprecated in newer Linux kernels

Поиск
Список
Период
Сортировка

/proc/self/oom_adj is deprecated in newer Linux kernels

От
Tom Lane
Дата:
While testing 9.1 RPMs on Fedora 15 (2.6.40 kernel), I notice
messages like these in the kernel log:

Sep 11 13:38:56 rhl kernel: [  415.308092] postgres (18040): /proc/18040/oom_adj is deprecated, please use
/proc/18040/oom_score_adjinstead.
 

These don't show up on every single PG process launch, but that probably
just indicates there's a rate-limiter in the kernel reporting mechanism.

So it looks like it behooves us to cater for oom_score_adj in the
future.  The simplest, least risky change that I can think of is to
copy-and-paste the relevant #ifdef code block in fork_process.c.
If we do that, then it would be up to the packager whether to #define
LINUX_OOM_ADJ or LINUX_OOM_SCORE_ADJ or both depending on the behavior
he wants.

That would be good enough for my own purposes in building Fedora/RHEL
packages, since I can predict with confidence which kernel versions a
given build is likely to be used with.  I think probably the same
would be true for most other distro-specific builds.  Does anyone want
to argue for doing something more complicated, and if so what exactly?
        regards, tom lane


Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Greg Stark
Дата:
On Fri, Sep 16, 2011 at 3:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Does anyone want
> to argue for doing something more complicated, and if so what exactly?
>

Well there's no harm trying to write to oom_score_adj and if that
fails with EEXISTS trying to write to oom_adj.

-- 
greg


Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Tom Lane
Дата:
Greg Stark <stark@mit.edu> writes:
> On Fri, Sep 16, 2011 at 3:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Does anyone want
>> to argue for doing something more complicated, and if so what exactly?

> Well there's no harm trying to write to oom_score_adj and if that
> fails with EEXISTS trying to write to oom_adj.

Well, (1) what value are you going to write (they need to be different
for the two files), and (2) the main point of the exercise, at present,
is to avoid kernel log messages.  I'm not sure that trying to create
random files under /proc isn't going to draw bleats in the kernel log.
        regards, tom lane


Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Alvaro Herrera
Дата:
Excerpts from Tom Lane's message of vie sep 16 13:37:46 -0300 2011:
> Greg Stark <stark@mit.edu> writes:
> > On Fri, Sep 16, 2011 at 3:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Does anyone want
> >> to argue for doing something more complicated, and if so what exactly?
> 
> > Well there's no harm trying to write to oom_score_adj and if that
> > fails with EEXISTS trying to write to oom_adj.
> 
> Well, (1) what value are you going to write (they need to be different
> for the two files), and (2) the main point of the exercise, at present,
> is to avoid kernel log messages.  I'm not sure that trying to create
> random files under /proc isn't going to draw bleats in the kernel log.

I guess the question is what semantics the new code has.  In the old
badness() world, child processes inherited the oom_adj value of its
parent.  What the code in fork_process was used for was resetting the
value back to 0 (meaning "kernel is allowed to kill this process on
OOM"), so that you could set the oom_adj in the start script for
postmaster (to a value meaning "never kill this one"), and the backends
would see their values reset to zero.

The new oom_score_adj has a scale of -1000 to +1000, with zero being
neutral and -1000 being "never kill".  So what we want to do here in
most cases, it seems, is set the value to zero whether it's oom_adj or
oom_score_adj -- assuming the new code is still causing children
processes to inherit the "adj" value from the parent.

Now the problem is that we have defined the LINUX_OOM_ADJ symbol as
meaning the value we're going to write.  Maybe this wasn't the best
choice.  I mean, it's very flexible, but it doesn't seem to offer any
benefit over a plain boolean choice.

Is your proposal to create a new LINUX_OOM_SCORE_ADJ cpp symbol with the
same semantics?

The most thorough documentation on this option seems to be this commit:
https://github.com/mirrors/linux-2.6/commit/a63d83f427fbce97a6cea0db2e64b0eb8435cd10#include/linux/oom.h

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Now the problem is that we have defined the LINUX_OOM_ADJ symbol as
> meaning the value we're going to write.  Maybe this wasn't the best
> choice.  I mean, it's very flexible, but it doesn't seem to offer any
> benefit over a plain boolean choice.

> Is your proposal to create a new LINUX_OOM_SCORE_ADJ cpp symbol with the
> same semantics?

Yes, that's what I was thinking.  We could avoid that if we were going
to hard-wire a decision that zero is the thing to write, but I see no
reason to place such a restriction on users.  Who's to say they might
not want backends to adopt some other value?
        regards, tom lane


Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Alex Hunsaker
Дата:
On Fri, Sep 16, 2011 at 10:37, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Greg Stark <stark@mit.edu> writes:
>> On Fri, Sep 16, 2011 at 3:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Does anyone want
>>> to argue for doing something more complicated, and if so what exactly?
>
>> Well there's no harm trying to write to oom_score_adj and if that
>> fails with EEXISTS trying to write to oom_adj.

Yeah, I don't really like the idea of a compile time option that is
kernel version dependent... But i don't feel too strongly about it
either (all my kernels are new enough that they support
oom_score_adj).

I'll also note that on my system we are in the good company of ssd and chromium:
sshd (978): /proc/978/oom_adj is deprecated, please use
/proc/978/oom_score_adj instead.
chromium-sandbo (1377): /proc/1375/oom_adj is deprecated, please use
/proc/1375/oom_score_adj instead.

[ It quite annoying that soon after we decided to stick
-DLINUX_OOM_ADJ in they changed it.  Whatever happened to a stable
userspace API :-( ]


Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Peter Eisentraut
Дата:
On fre, 2011-09-16 at 10:57 -0400, Tom Lane wrote:
> So it looks like it behooves us to cater for oom_score_adj in the
> future.  The simplest, least risky change that I can think of is to
> copy-and-paste the relevant #ifdef code block in fork_process.c.
> If we do that, then it would be up to the packager whether to #define
> LINUX_OOM_ADJ or LINUX_OOM_SCORE_ADJ or both depending on the behavior
> he wants.

There are lots of reasons why that won't work: backports, forward ports,
derivatives, custom kernels, distribution upgrades, virtual hosting.

ISTM that we could at least query the currently used kernel version.



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Tom Lane
Дата:
Peter Eisentraut <peter_e@gmx.net> writes:
> On fre, 2011-09-16 at 10:57 -0400, Tom Lane wrote:
>> So it looks like it behooves us to cater for oom_score_adj in the
>> future.  The simplest, least risky change that I can think of is to
>> copy-and-paste the relevant #ifdef code block in fork_process.c.
>> If we do that, then it would be up to the packager whether to #define
>> LINUX_OOM_ADJ or LINUX_OOM_SCORE_ADJ or both depending on the behavior
>> he wants.

> There are lots of reasons why that won't work: backports, forward ports,
> derivatives, custom kernels, distribution upgrades, virtual hosting.

[ shrug... ]  These are all hypothetical reasons.  A packager who
foresaw needing that could just turn on both write attempts, or for that
matter patch the code to do whatever else he saw fit.  In practice,
we've not had requests for anything significantly smarter than what is
there.

But having said that, it wouldn't be very hard to arrange things so that
if you did have both symbols defined, the code would only attempt to
write oom_adj if it had failed to write oom_score_adj; which is about as
close as you're likely to get to a kernel version test for this.
        regards, tom lane


Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Peter Eisentraut
Дата:
On sön, 2011-09-18 at 12:21 -0400, Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > On fre, 2011-09-16 at 10:57 -0400, Tom Lane wrote:
> >> So it looks like it behooves us to cater for oom_score_adj in the
> >> future.  The simplest, least risky change that I can think of is to
> >> copy-and-paste the relevant #ifdef code block in fork_process.c.
> >> If we do that, then it would be up to the packager whether to #define
> >> LINUX_OOM_ADJ or LINUX_OOM_SCORE_ADJ or both depending on the behavior
> >> he wants.
> 
> > There are lots of reasons why that won't work: backports, forward ports,
> > derivatives, custom kernels, distribution upgrades, virtual hosting.
> 
> [ shrug... ]  These are all hypothetical reasons.  A packager who
> foresaw needing that could just turn on both write attempts, or for that
> matter patch the code to do whatever else he saw fit.  In practice,
> we've not had requests for anything significantly smarter than what is
> there.
> 
> But having said that, it wouldn't be very hard to arrange things so that
> if you did have both symbols defined, the code would only attempt to
> write oom_adj if it had failed to write oom_score_adj; which is about as
> close as you're likely to get to a kernel version test for this.

Why is this feature not a run-time configuration variable or at least a
configure option?  It's awfully well hidden now.  I doubt a lot of
people are using this even though they might wish to.



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Tom Lane
Дата:
Peter Eisentraut <peter_e@gmx.net> writes:
> On sön, 2011-09-18 at 12:21 -0400, Tom Lane wrote:
>> But having said that, it wouldn't be very hard to arrange things so that
>> if you did have both symbols defined, the code would only attempt to
>> write oom_adj if it had failed to write oom_score_adj; which is about as
>> close as you're likely to get to a kernel version test for this.

> Why is this feature not a run-time configuration variable or at least a
> configure option?  It's awfully well hidden now.  I doubt a lot of
> people are using this even though they might wish to.

See the thread in which the feature was designed originally:
http://archives.postgresql.org/pgsql-hackers/2010-01/msg00170.php

The key point is that to get useful behavior, you need cooperation
between both a root-privileged startup script and the PG executable.
That tends to throw the problem into the domain of packagers, more
than end users, and definitely puts a big crimp in the idea that
run-time configuration of just half of the behavior could be helpful.
So far, no Linux packagers have complained that the design is inadequate
(a position that I also hold when wearing my red fedora) so I do not
feel a need to complicate it further.
        regards, tom lane


[PATCH] Use new oom_score_adj without a new compile-time constant

От
Dan McGee
Дата:
This is one way to prevent the kernel warning message without having to
introduce a new constant. Scale the old oom_adj-style value the same way
the kernel internally does it and use that instead if oom_score_adj is
available for writing.

Signed-off-by: Dan McGee <dan@archlinux.org>
---

This addresses some of the concerns raised on the ML and will at least keep
those of us running modern kernels happy.

Alternatively one could switch the symbol used to be the new style and have the
old one computed; however this is a pain for legacy vs. current versions.
src/backend/postmaster/fork_process.c |   22 +++++++++++++++++++++-1 files changed, 21 insertions(+), 1 deletions(-)

diff --git a/src/backend/postmaster/fork_process.c b/src/backend/postmaster/fork_process.c
index b2fe9a1..3cded54 100644
--- a/src/backend/postmaster/fork_process.c
+++ b/src/backend/postmaster/fork_process.c
@@ -81,16 +81,36 @@ fork_process(void)             * Use open() not stdio, to ensure we control the open flags. Some
        * Linux security environments reject anything but O_WRONLY.             */
 
-            int            fd = open("/proc/self/oom_adj", O_WRONLY, 0);
+            int            fd = open("/proc/self/oom_score_adj", O_WRONLY, 0);            /* We ignore all errors */
        if (fd >= 0)            {                char        buf[16];
 
+                int        oom_score_adj;
+                /*
+                 * The compile-time value is the old style oom_adj;
+                 * scale it the same way the kernel does to
+                 * convert to the new style oom_score_adj. This
+                 * should become a constant at compile time.
+                 * Valid values range from -17 (never kill) to
+                 * 15; no attempt of validation is done.
+                 */
+                oom_score_adj = LINUX_OOM_ADJ * 1000 / 17;                snprintf(buf, sizeof(buf), "%d\n",
LINUX_OOM_ADJ);               (void) write(fd, buf, strlen(buf));                close(fd);
 
+            } else if (errno == EEXIST) {
+                int        fd = open("/proc/self/oom_adj", O_WRONLY, 0);
+                if (fd >= 0)
+                {
+                    char    buf[16];
+
+                    snprintf(buf, sizeof(buf), "%d\n", LINUX_OOM_ADJ);
+                    (void) write(fd, buf, strlen(buf));
+                    close(fd);
+                }            }        }#endif   /* LINUX_OOM_ADJ */
-- 
1.7.6.1



Re: [PATCH] Use new oom_score_adj without a new compile-time constant

От
Dan McGee
Дата:
On Mon, Sep 19, 2011 at 3:11 PM, Dan McGee <dan@archlinux.org> wrote:
> This is one way to prevent the kernel warning message without having to
> introduce a new constant. Scale the old oom_adj-style value the same way
> the kernel internally does it and use that instead if oom_score_adj is
> available for writing.
>
> Signed-off-by: Dan McGee <dan@archlinux.org>
> ---
>
> This addresses some of the concerns raised on the ML and will at least keep
> those of us running modern kernels happy.
>
> Alternatively one could switch the symbol used to be the new style and have the
> old one computed; however this is a pain for legacy vs. current versions.
>
>  src/backend/postmaster/fork_process.c |   22 +++++++++++++++++++++-
>  1 files changed, 21 insertions(+), 1 deletions(-)
>
> diff --git a/src/backend/postmaster/fork_process.c b/src/backend/postmaster/fork_process.c
> index b2fe9a1..3cded54 100644
> --- a/src/backend/postmaster/fork_process.c
> +++ b/src/backend/postmaster/fork_process.c
> @@ -81,16 +81,36 @@ fork_process(void)
>                         * Use open() not stdio, to ensure we control the open flags. Some
>                         * Linux security environments reject anything but O_WRONLY.
>                         */
> -                       int                     fd = open("/proc/self/oom_adj", O_WRONLY, 0);
> +                       int                     fd = open("/proc/self/oom_score_adj", O_WRONLY, 0);
>
>                        /* We ignore all errors */
>                        if (fd >= 0)
>                        {
>                                char            buf[16];
> +                               int             oom_score_adj;
>
> +                               /*
> +                                * The compile-time value is the old style oom_adj;
> +                                * scale it the same way the kernel does to
> +                                * convert to the new style oom_score_adj. This
> +                                * should become a constant at compile time.
> +                                * Valid values range from -17 (never kill) to
> +                                * 15; no attempt of validation is done.
> +                                */
> +                               oom_score_adj = LINUX_OOM_ADJ * 1000 / 17;
>                                snprintf(buf, sizeof(buf), "%d\n", LINUX_OOM_ADJ);
Of course it would help to actually use the computed value here; this should be:
snprintf(buf,sizeof(buf), "%d\n", 
oom_score_adj);

>                                (void) write(fd, buf, strlen(buf));
>                                close(fd);
> +                       } else if (errno == EEXIST) {
> +                               int             fd = open("/proc/self/oom_adj", O_WRONLY, 0);
> +                               if (fd >= 0)
> +                               {
> +                                       char    buf[16];
> +
> +                                       snprintf(buf, sizeof(buf), "%d\n", LINUX_OOM_ADJ);
> +                                       (void) write(fd, buf, strlen(buf));
> +                                       close(fd);
> +                               }
>                        }
>                }
>  #endif   /* LINUX_OOM_ADJ */
> --
> 1.7.6.1
>
>

Re: [PATCH] Use new oom_score_adj without a new compile-time constant

От
Robert Haas
Дата:
On Mon, Sep 19, 2011 at 4:36 PM, Dan McGee <dan@archlinux.org> wrote:
> [ patch ]

I suppose it's Tom who really needs to comment on this, but I'm not
too enthusiastic about this approach.  Duplicating the Linux kernel
calculation into our code means that we could drift if the formula
changes again.

I like Tom's previous suggestion (I think) of allowing both constants
to be defined - if they are, then we try oom_score_adj first and fall
back to oom_adj if that fails.  For bonus points, we could have
postmaster stat() its own oom_score_adj file before forking and set a
global variable to indicate the results.  That way we'd only ever need
to test once per postmaster startup (at least until someone figures
out a way to swap out the running kernel without stopping the
server...!).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [PATCH] Use new oom_score_adj without a new compile-time constant

От
Dan McGee
Дата:
On Fri, Sep 23, 2011 at 2:49 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Sep 19, 2011 at 4:36 PM, Dan McGee <dan@archlinux.org> wrote:
>> [ patch ]
>
> I suppose it's Tom who really needs to comment on this, but I'm not
> too enthusiastic about this approach.  Duplicating the Linux kernel
> calculation into our code means that we could drift if the formula
> changes again.
Why would the formula ever change? This seems like a different excuse
way of really saying you don't appreciate the hacky approach, which I
can understand completely. However, it simply doesn't make sense for
them to change this formula, as it scales the -17 to 16 old range to
the new -1000 to 1000 range. Those endpoints won't be changing unless
a third method is introduced, in which case this whole thing is mute
and we'd need to fix it yet again.

> I like Tom's previous suggestion (I think) of allowing both constants
> to be defined - if they are, then we try oom_score_adj first and fall
> back to oom_adj if that fails.  For bonus points, we could have
> postmaster stat() its own oom_score_adj file before forking and set a
> global variable to indicate the results.  That way we'd only ever need
> to test once per postmaster startup (at least until someone figures
> out a way to swap out the running kernel without stopping the
> server...!).
This would be fine, it just seems unreasonably complicated, not to
mention unnecessary. I might as well point this out [1]. I don't think
a soul out there has built without defining this to 0 (if they define
it at all), and not even that many people are using it. Is it all that
bad of an idea to just force it to 0 for both knobs and be done with
it?

-Dan McGee

[1] http://www.google.com/codesearch#search/&q=LINUX_OOM_ADJ=&type=cs
- Slackware and Fedora are the only hits not in the PG codebase, and
both define it to 0.


Re: [PATCH] Use new oom_score_adj without a new compile-time constant

От
Robert Haas
Дата:
On Fri, Sep 23, 2011 at 4:05 PM, Dan McGee <dan@archlinux.org> wrote:
> On Fri, Sep 23, 2011 at 2:49 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Mon, Sep 19, 2011 at 4:36 PM, Dan McGee <dan@archlinux.org> wrote:
>>> [ patch ]
>>
>> I suppose it's Tom who really needs to comment on this, but I'm not
>> too enthusiastic about this approach.  Duplicating the Linux kernel
>> calculation into our code means that we could drift if the formula
>> changes again.
> Why would the formula ever change? This seems like a different excuse
> way of really saying you don't appreciate the hacky approach, which I
> can understand completely. However, it simply doesn't make sense for
> them to change this formula, as it scales the -17 to 16 old range to
> the new -1000 to 1000 range. Those endpoints won't be changing unless
> a third method is introduced, in which case this whole thing is mute
> and we'd need to fix it yet again.
>
>> I like Tom's previous suggestion (I think) of allowing both constants
>> to be defined - if they are, then we try oom_score_adj first and fall
>> back to oom_adj if that fails.  For bonus points, we could have
>> postmaster stat() its own oom_score_adj file before forking and set a
>> global variable to indicate the results.  That way we'd only ever need
>> to test once per postmaster startup (at least until someone figures
>> out a way to swap out the running kernel without stopping the
>> server...!).
> This would be fine, it just seems unreasonably complicated, not to
> mention unnecessary. I might as well point this out [1]. I don't think
> a soul out there has built without defining this to 0 (if they define
> it at all), and not even that many people are using it. Is it all that
> bad of an idea to just force it to 0 for both knobs and be done with
> it?

Did we do anything about this?  Anyone else have an opinion on what
ought to be done?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [PATCH] Use new oom_score_adj without a new compile-time constant

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
>> [ oom_score_adj business ]

> Did we do anything about this?  Anyone else have an opinion on what
> ought to be done?

I held off doing anything because it didn't seem like we had consensus.
OTOH, it may well be that it's not important enough to demand real
consensus, and he who does the work gets to make the choices.
        regards, tom lane


Re: [PATCH] Use new oom_score_adj without a new compile-time constant

От
Robert Haas
Дата:
On Mon, Oct 24, 2011 at 1:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>>> [ oom_score_adj business ]
>
>> Did we do anything about this?  Anyone else have an opinion on what
>> ought to be done?
>
> I held off doing anything because it didn't seem like we had consensus.
> OTOH, it may well be that it's not important enough to demand real
> consensus, and he who does the work gets to make the choices.

Half a loaf is better than none.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Tom Lane
Дата:
I was reminded today that we still haven't done anything about this:

Tom Lane <tgl@sss.pgh.pa.us> writes:
> While testing 9.1 RPMs on Fedora 15 (2.6.40 kernel), I notice
> messages like these in the kernel log:
> Sep 11 13:38:56 rhl kernel: [  415.308092] postgres (18040): /proc/18040/oom_adj is deprecated, please use
/proc/18040/oom_score_adjinstead.
 

At this point there are no shipping Fedora versions that don't emit this
gripe, and F15 is even about to go EOL.

The previous discussion thread at
http://archives.postgresql.org/pgsql-hackers/2011-09/msg00794.php
went off into the weeds of what was in my opinion over-design.
I still think it's sufficient to do what I suggested initially:

> ... The simplest, least risky change that I can think of is to
> copy-and-paste the relevant #ifdef code block in fork_process.c.
> If we do that, then it would be up to the packager whether to #define
> LINUX_OOM_ADJ or LINUX_OOM_SCORE_ADJ or both depending on the behavior
> he wants.

and would like to squeeze that into 9.2 so that we're only a year late
and not two years late in responding to this issue :-(.

Objections?
        regards, tom lane


Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Robert Haas
Дата:
On Tue, Jun 12, 2012 at 12:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I was reminded today that we still haven't done anything about this:
>
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> While testing 9.1 RPMs on Fedora 15 (2.6.40 kernel), I notice
>> messages like these in the kernel log:
>> Sep 11 13:38:56 rhl kernel: [  415.308092] postgres (18040): /proc/18040/oom_adj is deprecated, please use
/proc/18040/oom_score_adjinstead. 
>
> At this point there are no shipping Fedora versions that don't emit this
> gripe, and F15 is even about to go EOL.
>
> The previous discussion thread at
> http://archives.postgresql.org/pgsql-hackers/2011-09/msg00794.php
> went off into the weeds of what was in my opinion over-design.
> I still think it's sufficient to do what I suggested initially:
>
>> ... The simplest, least risky change that I can think of is to
>> copy-and-paste the relevant #ifdef code block in fork_process.c.
>> If we do that, then it would be up to the packager whether to #define
>> LINUX_OOM_ADJ or LINUX_OOM_SCORE_ADJ or both depending on the behavior
>> he wants.
>
> and would like to squeeze that into 9.2 so that we're only a year late
> and not two years late in responding to this issue :-(.
>
> Objections?

I think my position, and the position of the people who responded to
the original thread, was that it seems like you're architecting a
kludge when it wouldn't be that hard to architect a nicer solution.
That having been said, I don't think it's such a large kludge that we
should all have massive dry heaves at the thought of it living in our
repository.  And then too, this isn't the time to be architecting new
9.2 feature anyway.  So I say go for it.  If it makes your life
easier, back-patch it.  Code that requires a #define to enable it
won't break anything for anyone else.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jun 12, 2012 at 12:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I still think it's sufficient to do what I suggested initially:
>>> ... The simplest, least risky change that I can think of is to
>>> copy-and-paste the relevant #ifdef code block in fork_process.c.

> I think my position, and the position of the people who responded to
> the original thread, was that it seems like you're architecting a
> kludge when it wouldn't be that hard to architect a nicer solution.

I'd be the first to agree it's a kluge.  But the end result of the
previous discussion was that it wasn't so obvious how to architect
a nicer solution.  Nor is there all that much room for people to use a
nicer solution, given that we need help from not just fork_process.c
but the root-privileged startup script (or lately in Fedora it's a
systemd unit script doing the privilege-increasing end of this).

In the short run, if we don't have consensus on this, I'll probably
just carry a Fedora patch like so:

-            int        fd = open("/proc/self/oom_adj", O_WRONLY, 0);
+            int        fd = open("/proc/self/oom_score_adj", O_WRONLY, 0);

But it seems a tad silly to be carrying such a patch for a block of
code that is only of interest to Linux packagers anyway, and nearly
all such packagers are facing this same issue either now or in the
very near future.
        regards, tom lane


Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Robert Haas
Дата:
On Tue, Jun 12, 2012 at 1:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Jun 12, 2012 at 12:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I still think it's sufficient to do what I suggested initially:
>>>> ... The simplest, least risky change that I can think of is to
>>>> copy-and-paste the relevant #ifdef code block in fork_process.c.
>
>> I think my position, and the position of the people who responded to
>> the original thread, was that it seems like you're architecting a
>> kludge when it wouldn't be that hard to architect a nicer solution.
>
> I'd be the first to agree it's a kluge.  But the end result of the
> previous discussion was that it wasn't so obvious how to architect
> a nicer solution.  Nor is there all that much room for people to use a
> nicer solution, given that we need help from not just fork_process.c
> but the root-privileged startup script (or lately in Fedora it's a
> systemd unit script doing the privilege-increasing end of this).

Well, I don't think a GUC or two would be such a bad way of doing it, but...

> In the short run, if we don't have consensus on this, I'll probably
> just carry a Fedora patch like so:
>
> -            int        fd = open("/proc/self/oom_adj", O_WRONLY, 0);
> +            int        fd = open("/proc/self/oom_score_adj", O_WRONLY, 0);
>
> But it seems a tad silly to be carrying such a patch for a block of
> code that is only of interest to Linux packagers anyway, and nearly
> all such packagers are facing this same issue either now or in the
> very near future.

...I also agree with this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Gurjeet Singh
Дата:
On Mon, Sep 19, 2011 at 10:18 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> On sön, 2011-09-18 at 12:21 -0400, Tom Lane wrote:
>>> But having said that, it wouldn't be very hard to arrange things so that
>>> if you did have both symbols defined, the code would only attempt to
>>> write oom_adj if it had failed to write oom_score_adj; which is about as
>>> close as you're likely to get to a kernel version test for this.
>
>> Why is this feature not a run-time configuration variable or at least a
>> configure option?  It's awfully well hidden now.  I doubt a lot of
>> people are using this even though they might wish to.
>
> See the thread in which the feature was designed originally:
> http://archives.postgresql.org/pgsql-hackers/2010-01/msg00170.php
>
> The key point is that to get useful behavior, you need cooperation
> between both a root-privileged startup script and the PG executable.
> That tends to throw the problem into the domain of packagers, more
> than end users, and definitely puts a big crimp in the idea that
> run-time configuration of just half of the behavior could be helpful.
> So far, no Linux packagers have complained that the design is inadequate
> (a position that I also hold when wearing my red fedora) so I do not
> feel a need to complicate it further.

Startup scripts are not solely in the domain of packagers. End users
can also be expected to develop/edit their own startup scripts.

Providing it as GUC would have given end users both the peices, but
with a compile-time option they have only one half of the solution;
except if they go compile their own binaries, which forces them into
being packagers.

I am not alone in feeling that if Postgres wishes to provide a control
over child backend's oom_score_adj, it should be a GUC parameter
rather than a compile-time option. Yesterday a customer wanted to
leverage this and couldn't because they refuse to maintain their own
fork of Postgres code.

Please find attached a patch to turn it into a GUC, #ifdef'd by __linux__ macro.

Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/

EDB www.EnterpriseDB.com

Вложения

Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Tom Lane
Дата:
Gurjeet Singh <gurjeet@singh.im> writes:
> Startup scripts are not solely in the domain of packagers. End users
> can also be expected to develop/edit their own startup scripts.

> Providing it as GUC would have given end users both the peices, but
> with a compile-time option they have only one half of the solution;
> except if they go compile their own binaries, which forces them into
> being packagers.

I don't find that this argument holds any water at all.  Anyone who's
developing their own start script can be expected to manage recompiling
Postgres.

Extra GUCs do not have zero cost, especially not ones that are as
complicated-to-explain as this would be.

I would also argue that there's a security issue with making it a GUC.
A non-root DBA should not have the authority to decide whether or not
postmaster child processes run with nonzero OOM adjustment; that decision
properly belongs to whoever has authorized the root-owned startup script
to change the adjustment in the first place.  So seeing this as two
independent pieces is not only wrong but dangerous.
        regards, tom lane



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Robert Haas
Дата:
On Tue, Jun 10, 2014 at 10:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Gurjeet Singh <gurjeet@singh.im> writes:
>> Startup scripts are not solely in the domain of packagers. End users
>> can also be expected to develop/edit their own startup scripts.
>
>> Providing it as GUC would have given end users both the peices, but
>> with a compile-time option they have only one half of the solution;
>> except if they go compile their own binaries, which forces them into
>> being packagers.
>
> I don't find that this argument holds any water at all.  Anyone who's
> developing their own start script can be expected to manage recompiling
> Postgres.

Huh?  Lots of people install PostgreSQL via, say, RPMs, but may still
want to change their startup script locally.  I don't think those
users should have to give up the benefits of RPM packaging because
they want a different oom_adj value.  Then they have to be responsible
for updating the packages every time there's a new minor release,
instead of just typing 'yum update'.  That's a LOT of extra work.

> Extra GUCs do not have zero cost, especially not ones that are as
> complicated-to-explain as this would be.

NOT having them isn't free either.

> I would also argue that there's a security issue with making it a GUC.
> A non-root DBA should not have the authority to decide whether or not
> postmaster child processes run with nonzero OOM adjustment; that decision
> properly belongs to whoever has authorized the root-owned startup script
> to change the adjustment in the first place.  So seeing this as two
> independent pieces is not only wrong but dangerous.

I think the only possible issue is if the DBA doesn't even have shell
access.  If he doesn't have root but does have shell access, he could
have recompiled anyway - it's just more work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Gurjeet Singh
Дата:
On Tue, Jun 10, 2014 at 10:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Gurjeet Singh <gurjeet@singh.im> writes:
>> Startup scripts are not solely in the domain of packagers. End users
>> can also be expected to develop/edit their own startup scripts.
>
>> Providing it as GUC would have given end users both the peices, but
>> with a compile-time option they have only one half of the solution;
>> except if they go compile their own binaries, which forces them into
>> being packagers.
>
> I don't find that this argument holds any water at all.  Anyone who's
> developing their own start script can be expected to manage recompiling
> Postgres.

I respectfully disagree. Writing and managing init/start scripts
requires much different set of expertise than compiling and managing
builds of a critical software like a database product.

I would consider adding `echo -1000 > /proc/self/oom_score_adj` to a
start script as a deployment specific tweak, and not 'developing own
start script'.

> Extra GUCs do not have zero cost, especially not ones that are as
> complicated-to-explain as this would be.
>
> I would also argue that there's a security issue with making it a GUC.
> A non-root DBA should not have the authority to decide whether or not
> postmaster child processes run with nonzero OOM adjustment; that decision
> properly belongs to whoever has authorized the root-owned startup script
> to change the adjustment in the first place.  So seeing this as two
> independent pieces is not only wrong but dangerous.

From experiments last night, I see that child process' oom_score_adj
is capped by the parent process' setting that is forking. So I don't
think it's a security risk from that perspective.

I'll retest and provide the exact findings.

Best regards,
-- 
Gurjeet Singh http://gurjeet.singh.im/

EDB www.EnterpriseDB.com



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Andres Freund
Дата:
On 2014-06-10 07:56:01 -0400, Gurjeet Singh wrote:
> Providing it as GUC would have given end users both the peices, but
> with a compile-time option they have only one half of the solution;
> except if they go compile their own binaries, which forces them into
> being packagers.
> 
> I am not alone in feeling that if Postgres wishes to provide a control
> over child backend's oom_score_adj, it should be a GUC parameter
> rather than a compile-time option. Yesterday a customer wanted to
> leverage this and couldn't because they refuse to maintain their own
> fork of Postgres code.

Independent of the rest of the discussion, I think there's one more
point: Trying to keep your system stable by *increasing* the priority of
normal backends is a bad idea. If you system gets into OOM land you need
to fix that, not whack who gets killed around.
The reason it makes sense to increase the priority of the postmaster is
that that *does* increase the stability by cleaning up resources and
restarting everything.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Robert Haas
Дата:
On Tue, Jun 10, 2014 at 10:32 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-06-10 07:56:01 -0400, Gurjeet Singh wrote:
>> Providing it as GUC would have given end users both the peices, but
>> with a compile-time option they have only one half of the solution;
>> except if they go compile their own binaries, which forces them into
>> being packagers.
>>
>> I am not alone in feeling that if Postgres wishes to provide a control
>> over child backend's oom_score_adj, it should be a GUC parameter
>> rather than a compile-time option. Yesterday a customer wanted to
>> leverage this and couldn't because they refuse to maintain their own
>> fork of Postgres code.
>
> Independent of the rest of the discussion, I think there's one more
> point: Trying to keep your system stable by *increasing* the priority of
> normal backends is a bad idea. If you system gets into OOM land you need
> to fix that, not whack who gets killed around.
> The reason it makes sense to increase the priority of the postmaster is
> that that *does* increase the stability by cleaning up resources and
> restarting everything.

But the priority is inherited by child processes, so to decrease the
priority of JUST the postmaster you need to decrease its priority and
then set the priority of each child back to the original value.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Gurjeet Singh
Дата:
On Tue, Jun 10, 2014 at 10:32 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-06-10 07:56:01 -0400, Gurjeet Singh wrote:
>> Providing it as GUC would have given end users both the peices, but
>> with a compile-time option they have only one half of the solution;
>> except if they go compile their own binaries, which forces them into
>> being packagers.
>>
>> I am not alone in feeling that if Postgres wishes to provide a control
>> over child backend's oom_score_adj, it should be a GUC parameter
>> rather than a compile-time option. Yesterday a customer wanted to
>> leverage this and couldn't because they refuse to maintain their own
>> fork of Postgres code.
>
> Independent of the rest of the discussion, I think there's one more
> point: Trying to keep your system stable by *increasing* the priority of
> normal backends is a bad idea. If you system gets into OOM land you need
> to fix that, not whack who gets killed around.
> The reason it makes sense to increase the priority of the postmaster is
> that that *does* increase the stability by cleaning up resources and
> restarting everything.

As it stands right now, a user can decrease the likelyhood of
Postmaster being killed by adjusting the start script, but that
decreases the likelyhood of al the child processes, too, making the
Postmaster just as likely as a kill-candidate, maybe even higher
because it's the parent, as any backend.

This patch gives the user a control to let the backend's likelyhood of
being killed be different/higher than that of the postmaster.

The users were already capable of doing that, but were required to
custom-compile Postgres to get the benefits. This patch just removes
that requirement.

Best regards,
-- 
Gurjeet Singh http://gurjeet.singh.im/

EDB www.EnterpriseDB.com



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Andres Freund
Дата:
On 2014-06-10 10:42:41 -0400, Robert Haas wrote:
> On Tue, Jun 10, 2014 at 10:32 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2014-06-10 07:56:01 -0400, Gurjeet Singh wrote:
> >> Providing it as GUC would have given end users both the peices, but
> >> with a compile-time option they have only one half of the solution;
> >> except if they go compile their own binaries, which forces them into
> >> being packagers.
> >>
> >> I am not alone in feeling that if Postgres wishes to provide a control
> >> over child backend's oom_score_adj, it should be a GUC parameter
> >> rather than a compile-time option. Yesterday a customer wanted to
> >> leverage this and couldn't because they refuse to maintain their own
> >> fork of Postgres code.
> >
> > Independent of the rest of the discussion, I think there's one more
> > point: Trying to keep your system stable by *increasing* the priority of
> > normal backends is a bad idea. If you system gets into OOM land you need
> > to fix that, not whack who gets killed around.
> > The reason it makes sense to increase the priority of the postmaster is
> > that that *does* increase the stability by cleaning up resources and
> > restarting everything.
> 
> But the priority is inherited by child processes, so to decrease the
> priority of JUST the postmaster you need to decrease its priority and
> then set the priority of each child back to the original value.

Gurjeet argues for making the absolute value of child processes priority
configurable though? My point is that attacking the problem from that
angle is wrong.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jun 10, 2014 at 10:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I don't find that this argument holds any water at all.  Anyone who's
>> developing their own start script can be expected to manage recompiling
>> Postgres.

> Huh?  Lots of people install PostgreSQL via, say, RPMs, but may still
> want to change their startup script locally.

So?  The RPM packager could probably be expected to have compiled with the
oom-adjust-reset option enabled.  If not, why aren't these people lobbying
the packager to meet their expectation?

I remain of the opinion that allowing nonprivileged people to decide
whether that code is active or not is unsafe from a system level.
        regards, tom lane



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Robert Haas
Дата:
On Tue, Jun 10, 2014 at 10:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Jun 10, 2014 at 10:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I don't find that this argument holds any water at all.  Anyone who's
>>> developing their own start script can be expected to manage recompiling
>>> Postgres.
>
>> Huh?  Lots of people install PostgreSQL via, say, RPMs, but may still
>> want to change their startup script locally.
>
> So?  The RPM packager could probably be expected to have compiled with the
> oom-adjust-reset option enabled.  If not, why aren't these people lobbying
> the packager to meet their expectation?

Because that might take years to happen, or the packager might never
do it at all on the theory that what is good for customers in general
is different than what's good for one particular customer, or on the
theory that it's just not a high enough priority for that packager.

Are you seriously saying that you've never needed to customize a
startup script on a RHEL box somewhere?

> I remain of the opinion that allowing nonprivileged people to decide
> whether that code is active or not is unsafe from a system level.

On what factual basis?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Tom Lane
Дата:
Andres Freund <andres@2ndquadrant.com> writes:
> Independent of the rest of the discussion, I think there's one more
> point: Trying to keep your system stable by *increasing* the priority of
> normal backends is a bad idea. If you system gets into OOM land you need
> to fix that, not whack who gets killed around.
> The reason it makes sense to increase the priority of the postmaster is
> that that *does* increase the stability by cleaning up resources and
> restarting everything.

That's half of the reason.  The other half is that, at least back when
we added this code, the Linux kernel's victim-selection code
disproportionately chose to kill the postmaster rather than any child
backend, an outcome definitely not to be preferred.  IIRC this was because
it blamed the postmaster for the sum of childrens' memory sizes *including
shared memory*, counting the shared memory over again for each child.

I don't know whether our switch away from SysV shmem has helped that, or
if recent kernel versions have less brain-dead OOM victim selection.
I'm not terribly hopeful though.

But anyway, yeah, the point of this feature is that the OOM priority
of the postmaster, and *only* the postmaster, should be raised.  Allowing
unprivileged people to break that is not attractive on any level.
        regards, tom lane



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jun 10, 2014 at 10:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So?  The RPM packager could probably be expected to have compiled with the
>> oom-adjust-reset option enabled.  If not, why aren't these people lobbying
>> the packager to meet their expectation?

> Because that might take years to happen,

... unlike adding a GUC?

> or the packager might never
> do it at all on the theory that what is good for customers in general
> is different than what's good for one particular customer, or on the
> theory that it's just not a high enough priority for that packager.

> Are you seriously saying that you've never needed to customize a
> startup script on a RHEL box somewhere?

Sure, but what's that have to do with this?  Any Red Hat or PGDG RPM will
come with this code already enabled in the build, so there is no need for
anyone to have a GUC to play around with the behavior.

>> I remain of the opinion that allowing nonprivileged people to decide
>> whether that code is active or not is unsafe from a system level.

> On what factual basis?

Because it would convert the intended behavior (postmaster and only
postmaster is exempt from OOM kill) into a situation where possibly
all of the database processes are exempt from OOM kill, at the whim
of somebody who should not have the privilege to decide that.

In my view, the root-owned startup script grants OOM exemption to
the postmaster because it *knows* that the postmaster's children
will drop the exemption.  If that trust can be violated because some
clueless DBA decided to frob a GUC setting, I'd be a lot more hesitant
about giving the postmaster the exemption in the first place.
        regards, tom lane



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Andres Freund
Дата:
> Andres Freund <andres@2ndquadrant.com> writes:
> > Independent of the rest of the discussion, I think there's one more
> > point: Trying to keep your system stable by *increasing* the priority of
> > normal backends is a bad idea. If you system gets into OOM land you need
> > to fix that, not whack who gets killed around.
> > The reason it makes sense to increase the priority of the postmaster is
> > that that *does* increase the stability by cleaning up resources and
> > restarting everything.
> 
> That's half of the reason.  The other half is that, at least back when
> we added this code, the Linux kernel's victim-selection code
> disproportionately chose to kill the postmaster rather than any child
> backend, an outcome definitely not to be preferred.  IIRC this was because
> it blamed the postmaster for the sum of childrens' memory sizes *including
> shared memory*, counting the shared memory over again for each child.

That's essentially the same thing. We want the postmaster survive, not
the children...

> I don't know whether our switch away from SysV shmem has helped that, or
> if recent kernel versions have less brain-dead OOM victim selection.
> I'm not terribly hopeful though.

It has gotten much better in the latest few versions. But it'll be a
fair bit till those will be wildly used. And even then, from the
kernel's view, there's just no reason strongly prefer not to kill the
postmaster over other processes without the user providing the information.

On 2014-06-10 11:04:52 -0400, Tom Lane wrote:
> But anyway, yeah, the point of this feature is that the OOM priority
> of the postmaster, and *only* the postmaster, should be raised.  Allowing
> unprivileged people to break that is not attractive on any level.

A superuser could already write a function that echoes into /proc? I
fail to see how a GUC changes the landscape significantly here.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Tom Lane
Дата:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-06-10 11:04:52 -0400, Tom Lane wrote:
>> But anyway, yeah, the point of this feature is that the OOM priority
>> of the postmaster, and *only* the postmaster, should be raised.  Allowing
>> unprivileged people to break that is not attractive on any level.

> A superuser could already write a function that echoes into /proc?

Maybe I'm mistaken, but I thought once the fork_process code has reset our
process's setting to zero it's not possible to lower it again (without
privileges we'd not have).
        regards, tom lane



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Andres Freund
Дата:
On 2014-06-10 11:14:43 -0400, Tom Lane wrote:
> Sure, but what's that have to do with this?  Any Red Hat or PGDG RPM will
> come with this code already enabled in the build, so there is no need for
> anyone to have a GUC to play around with the behavior.

That's imo a fair point. Unless I misunderstand things Gurjeet picked
the topic up again because he wants to increase the priority of the
children. Is that correct Gurjeet?

I do think a GUC might be a nicer interface for this than a
#define. Possibly even with a absolute reset value for the children. But
only because there's no way, afaics, to explicitly reset to the default
value.

> >> I remain of the opinion that allowing nonprivileged people to decide
> >> whether that code is active or not is unsafe from a system level.
> 
> > On what factual basis?
> 
> Because it would convert the intended behavior (postmaster and only
> postmaster is exempt from OOM kill) into a situation where possibly
> all of the database processes are exempt from OOM kill, at the whim
> of somebody who should not have the privilege to decide that.

Meh^3. By that argument we need to forbid superusers to create any form
of untrusted functions. Forbid anything that does malloc(), system(),
fork(), whatever from a user's influence.
Superusers can execute code in the postmaster using
shared_preload_libraries. Feigning that that's not possible isn't buying
us anything.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Tom Lane
Дата:
Gurjeet Singh <gurjeet@singh.im> writes:
> As it stands right now, a user can decrease the likelyhood of
> Postmaster being killed by adjusting the start script, but that
> decreases the likelyhood of al the child processes, too, making the
> Postmaster just as likely as a kill-candidate, maybe even higher
> because it's the parent, as any backend.

Exactly.

> This patch gives the user a control to let the backend's likelyhood of
> being killed be different/higher than that of the postmaster.

If you think your users might want to give the postmaster OOM-exemption,
why don't you just activate the existing code when you build?  Resetting
the OOM setting to zero is safe whether or not the startup script did
anything to the postmaster's setting.

In short, I don't see what a GUC adds here, except uncertainty, which
is not a good thing.
        regards, tom lane



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
andres@anarazel.de (Andres Freund)
Дата:
On 2014-06-10 11:20:28 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-06-10 11:04:52 -0400, Tom Lane wrote:
> >> But anyway, yeah, the point of this feature is that the OOM priority
> >> of the postmaster, and *only* the postmaster, should be raised.  Allowing
> >> unprivileged people to break that is not attractive on any level.
> 
> > A superuser could already write a function that echoes into /proc?
> 
> Maybe I'm mistaken, but I thought once the fork_process code has reset our
> process's setting to zero it's not possible to lower it again (without
> privileges we'd not have).

No, doesn't look that way. It's possible to reset it to the value set at
process start. So unless we introduce double forks for every backend
start it can be reset by ordinary processes.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Robert Haas
Дата:
On Tue, Jun 10, 2014 at 11:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Jun 10, 2014 at 10:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> So?  The RPM packager could probably be expected to have compiled with the
>>> oom-adjust-reset option enabled.  If not, why aren't these people lobbying
>>> the packager to meet their expectation?
>
>> Because that might take years to happen,
>
> ... unlike adding a GUC?

/me shrugs.

Sure, it'll take a release cycle, but once we've made this
configurable, it'll always be configurable.  New packagers who don't
have this set up properly can show up at any time.

>> or the packager might never
>> do it at all on the theory that what is good for customers in general
>> is different than what's good for one particular customer, or on the
>> theory that it's just not a high enough priority for that packager.
>
>> Are you seriously saying that you've never needed to customize a
>> startup script on a RHEL box somewhere?
>
> Sure, but what's that have to do with this?  Any Red Hat or PGDG RPM will
> come with this code already enabled in the build, so there is no need for
> anyone to have a GUC to play around with the behavior.

Well, clearly, somebody hasn't got it right, or there wouldn't be this
complaint.  I'll grant you that "somebody" may be EnterpriseDB's own
packaging in this instance, but I wouldn't like to bet that no one
else has ever got this wrong nor ever will.  Peter asked upthread why
this wasn't a GUC with the comment that "Why is this feature not a
run-time configuration variable or at least a configure option?  It's
awfully well hidden now.  I doubt a lot of people are using this even
though they might wish to."  I think that's quite right, and note that
Peter is in no way affiliated with EnterpriseDB and made that comment
(rather presciently) long before Gurjeet's recent report.

>>> I remain of the opinion that allowing nonprivileged people to decide
>>> whether that code is active or not is unsafe from a system level.
>
>> On what factual basis?
>
> Because it would convert the intended behavior (postmaster and only
> postmaster is exempt from OOM kill) into a situation where possibly
> all of the database processes are exempt from OOM kill, at the whim
> of somebody who should not have the privilege to decide that.

Gurjeet already refused that argument.  Apparently, the child
processes can only increase their chance of being OOM-killed, not
decrease it, so you have this exactly backwards: right now, an
individual system administrator can exempt all of the database
processes from OOM kill, but cannot exempt the postmaster alone.

> In my view, the root-owned startup script grants OOM exemption to
> the postmaster because it *knows* that the postmaster's children
> will drop the exemption.  If that trust can be violated because some
> clueless DBA decided to frob a GUC setting, I'd be a lot more hesitant
> about giving the postmaster the exemption in the first place.

How about using an environment variable?  It seems to me that would
address the concern about DBAs without shell access.  They might be
able to frob GUCs, but presumably not the postmaster's starting
environment.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Robert Haas
Дата:
On Tue, Jun 10, 2014 at 11:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> This patch gives the user a control to let the backend's likelyhood of
>> being killed be different/higher than that of the postmaster.
>
> If you think your users might want to give the postmaster OOM-exemption,
> why don't you just activate the existing code when you build?  Resetting
> the OOM setting to zero is safe whether or not the startup script did
> anything to the postmaster's setting.

The whole scenario here is that the user *doesn't want to recompile*.
You seem to be trying to relitigate an argument that Gurjeet already
discussed in his original post and I already refuted once after that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Tom Lane
Дата:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-06-10 11:14:43 -0400, Tom Lane wrote:
>> Because it would convert the intended behavior (postmaster and only
>> postmaster is exempt from OOM kill) into a situation where possibly
>> all of the database processes are exempt from OOM kill, at the whim
>> of somebody who should not have the privilege to decide that.

> Meh^3. By that argument we need to forbid superusers to create any form
> of untrusted functions. Forbid anything that does malloc(), system(),
> fork(), whatever from a user's influence.

That's utter and complete nonsense.  We're discussing an operation that is
root-privileged (ie, lowering a process's OOM score), not random stuff
that unprivileged processes can do.
        regards, tom lane



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Andres Freund
Дата:
On 2014-06-10 11:35:23 -0400, Robert Haas wrote:
> > Because it would convert the intended behavior (postmaster and only
> > postmaster is exempt from OOM kill) into a situation where possibly
> > all of the database processes are exempt from OOM kill, at the whim
> > of somebody who should not have the privilege to decide that.
> 
> Gurjeet already refused that argument.

Only that he was wrong:
~# echo -500 > /proc/$BASHPID/oom_score_adj
~# su andres
~$ echo -1000 > /proc/$BASHPID/oom_score_adj
bash: echo: write error: Permission denied # so I can't decrease the likelihood
~$ echo -400 > /proc/$BASHPID/oom_score_adj # but I can increase it
~$ echo -500 > /proc/$BASHPID/oom_score_adj # but also *reset* it

Since we have to do the adjustment after the fork - otherwise the
postmaster would be vulnerable for a time - normal backends can reset
their score.

> Apparently, the child
> processes can only increase their chance of being OOM-killed, not
> decrease it, so you have this exactly backwards: right now, an
> individual system administrator can exempt all of the database
> processes from OOM kill, but cannot exempt the postmaster alone.

Well, if you compile postgres with the #define it'll reset the
likelihood after the fork? That's the reset? Or do you mean without
compiling postgres with the option?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Gurjeet Singh
Дата:
On Tue, Jun 10, 2014 at 11:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> In my view, the root-owned startup script grants OOM exemption to
> the postmaster because it *knows* that the postmaster's children
> will drop the exemption.  If that trust can be violated because some
> clueless DBA decided to frob a GUC setting, I'd be a lot more hesitant
> about giving the postmaster the exemption in the first place.

Even if the clueless DBA tries to set the oom_score_adj below that of
Postmaster, Linux kernel prevents that from being done. I demonstrate
that in the below screen share. I used GUC as well as plain command
line to try and set a child's badness (oom_score_adj) to be lower than
that of Postmaster's, and Linux disallows doing that, unless I use
root's powers.

So the argument that this GUC is a security concern, can be ignored.
Root user (one with control of start script) still controls the lowest
badness setting of all Postgres processes. If done at fork_process
time, the child process simply inherits parent's badness setting.

Best regards,

# Set postmaster badness: -1000
$ sudo echo -1000 > /proc/$$/oom_score_adj ; cat /proc/self/oom_score_adj
-1000

# Set backend badness the same as postmaster: -1000
$ perl -pi -e 's/linux_oom_score_adj = .*/linux_oom_score_adj =
-1000/' $PGDATA/postgresql.conf ; grep oom $PGDATA/postgresql.conf
linux_oom_score_adj = -1000

$ pgstop; pgstart
pg_ctl: server is running (PID: 65355)
/home/gurjeet/dev/pgdbuilds/oom_guc/db/bin/postgres "-D"
"/home/gurjeet/dev/pgdbuilds/oom_guc/db/data"
waiting for server to shut down.... done
server stopped
pg_ctl: no server running
waiting for server to start.... done
server started
Database cluster state:               in production

# Backends and the postmaster have the same badness; lower than default 0
$ for p in $(echo $(pgserverPIDList)| cut -d , --output-delimiter=' '
-f 1-); do echo $p $(cat /proc/$p/oom_score_adj) $(cat
/proc/$p/cmdline); done
537 -1000 /home/gurjeet/dev/pgdbuilds/oom_guc/db/bin/postgres-D/home/gurjeet/dev/pgdbuilds/oom_guc/db/data
539 -1000 postgres: checkpointer process
540 -1000 postgres: writer process
541 -1000 postgres: wal writer process
542 -1000 postgres: autovacuum launcher process
543 -1000 postgres: stats collector process

# Set postmaster badness: -100
$ sudo echo -100 > /proc/$$/oom_score_adj ; cat /proc/self/oom_score_adj
-100

# Set backend badness the lower than postmaster: -500 vs. -100
$ perl -pi -e 's/linux_oom_score_adj = .*/linux_oom_score_adj = -500...
linux_oom_score_adj = -500

$ pgstop; pgstart
...

# Backends are capped to parent's badness.
$ for p in $(echo $(pgserverPIDList)| cut -d , ...
716 -100 /home/gurjeet/dev/pgdbuilds/oom_guc/db/bin/postgres-D/home/gurjeet/dev/pgdbuilds/oom_guc/db/data
718 -100 postgres: checkpointer process
719 -100 postgres: writer process
720 -100 postgres: wal writer process
721 -100 postgres: autovacuum launcher process
722 -100 postgres: stats collector process

# Set postmaster badness: -100
$ sudo echo -100 > /proc/$$/oom_score_adj ; cat /proc/self/oom_score_adj
-100

# Set backend badness the higher than postmaster: +500 vs. -100
$ perl -pi -e 's/linux_oom_score_adj = .*/linux_oom_score_adj = 500/'
$PGDATA/postgresql.conf ; grep oom $PGDATA/postgresql.conf
linux_oom_score_adj = 500

$ pgstop; pgstart
...

# Backends have higher badness, hence higher likelyhood to be killed
than the Postmaster.
$ for p in $(echo $(pgserverPIDList)| cut -d , ...
1602 -100 /home/gurjeet/dev/pgdbuilds/oom_guc/db/bin/postgres-D/home/gurjeet/dev/pgdbuilds/oom_guc/db/data
1604 500 postgres: checkpointer process
1605 500 postgres: writer process
1606 500 postgres: wal writer process
1607 500 postgres: autovacuum launcher process
1608 500 postgres: stats collector process

# Set postmaster badness: -100
$ sudo echo -100 > /proc/$$/oom_score_adj ; cat /proc/self/oom_score_adj
-100

# Reset backend badness to default: 0
$ perl -pi -e 's/linux_oom_score_adj = .*/linux_oom_score_adj = 0/'
$PGDATA/postgresql.conf ; grep oom $PGDATA/postgresql.conf
linux_oom_score_adj = 0

$ pgstop; pgstart
...

# Backends have higher badness, hence higher likelyhood to be killed
than the Postmaster.
$ for p in $(echo $(pgserverPIDList)| cut -d , ...
1835 -100 /home/gurjeet/dev/pgdbuilds/oom_guc/db/bin/postgres-D/home/gurjeet/dev/pgdbuilds/oom_guc/db/data
1837 0 postgres: checkpointer process
1838 0 postgres: writer process
1839 0 postgres: wal writer process
1840 0 postgres: autovacuum launcher process
1841 0 postgres: stats collector process

# Lower checkpointer's badness, but keep it higher than postmaster's
$ echo -1 > /proc/1837/oom_score_adj

$ for p in $(echo $(pgserverPIDList)| cut -d , ...
1835 -100 /home/gurjeet/dev/pgdbuilds/oom_guc/db/bin/postgres-D/home/gurjeet/dev/pgdbuilds/oom_guc/db/data
1837 -1 postgres: checkpointer process
...

# Lower checkpointer's badness, but keep it same as postmaster's
$ echo -100 > /proc/1837/oom_score_adj

$ for p in $(echo $(pgserverPIDList)| cut -d , ...
1835 -100 /home/gurjeet/dev/pgdbuilds/oom_guc/db/bin/postgres-D/home/gurjeet/dev/pgdbuilds/oom_guc/db/data
1837 -100 postgres: checkpointer process
...

# Lower checkpointer's badness, and try to lower it below postmaster's
$ echo -101 > /proc/1837/oom_score_adj
bash: echo: write error: Permission denied

$ for p in $(echo $(pgserverPIDList)| cut -d , ...
1835 -100 /home/gurjeet/dev/pgdbuilds/oom_guc/db/bin/postgres-D/home/gurjeet/dev/pgdbuilds/oom_guc/db/data
1837 -100 postgres: checkpointer process
...

# As root, force child process' badness to be lower than postnaster's
$ sudo echo -101 > /proc/1837/oom_score_adj
[sudo] password for gurjeet:

$ for p in $(echo $(pgserverPIDList)| cut -d , ...
1835 -100 /home/gurjeet/dev/pgdbuilds/oom_guc/db/bin/postgres-D/home/gurjeet/dev/pgdbuilds/oom_guc/db/data
1837 -101 postgres: checkpointer process
...

-- 
Gurjeet Singh http://gurjeet.singh.im/

EDB www.EnterpriseDB.com



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> Well, clearly, somebody hasn't got it right, or there wouldn't be this
> complaint.  I'll grant you that "somebody" may be EnterpriseDB's own
> packaging in this instance, but I wouldn't like to bet that no one
> else has ever got this wrong nor ever will.  Peter asked upthread why
> this wasn't a GUC with the comment that "Why is this feature not a
> run-time configuration variable or at least a configure option?  It's
> awfully well hidden now.  I doubt a lot of people are using this even
> though they might wish to."  I think that's quite right, and note that
> Peter is in no way affiliated with EnterpriseDB and made that comment
> (rather presciently) long before Gurjeet's recent report.

I'd be okay with a configure option, if you think that would make this
issue more visible to packagers.  It's delegating the responsibility to
the DBA level that I'm unhappy about.

>> Because it would convert the intended behavior (postmaster and only
>> postmaster is exempt from OOM kill) into a situation where possibly
>> all of the database processes are exempt from OOM kill, at the whim
>> of somebody who should not have the privilege to decide that.

> Gurjeet already refused that argument.

He can refuse it all he likes, but that doesn't make his opinion correct.

> How about using an environment variable?  It seems to me that would
> address the concern about DBAs without shell access.  They might be
> able to frob GUCs, but presumably not the postmaster's starting
> environment.

Hmmm ... yeah, that might work.  It would solve the problem I'm worried
about, which is making sure that the startup script has control of what
happens.
        regards, tom lane



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Andres Freund
Дата:
On 2014-06-10 11:40:25 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-06-10 11:14:43 -0400, Tom Lane wrote:
> >> Because it would convert the intended behavior (postmaster and only
> >> postmaster is exempt from OOM kill) into a situation where possibly
> >> all of the database processes are exempt from OOM kill, at the whim
> >> of somebody who should not have the privilege to decide that.
> 
> > Meh^3. By that argument we need to forbid superusers to create any form
> > of untrusted functions. Forbid anything that does malloc(), system(),
> > fork(), whatever from a user's influence.
> 
> That's utter and complete nonsense.  We're discussing an operation that is
> root-privileged (ie, lowering a process's OOM score), not random stuff
> that unprivileged processes can do.

Oh, comeon. Tom. You a) conveniently left of the part where I said that
the user can execute code from the postmaster. b) fork() can be used to
escape the oom killer. c) Lots of much worse things can be done to the
system with arbitrary system calls than adjusting oom_score_adj.

The postmaster can currently change oom_score_adj. Users can run code as
a postmaster. Simple as that.

Besides, as demonstrated in
http://www.postgresql.org/message-id/20140610154536.GN8406@alap3.anarazel.de
postmaster children can already reset their score.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Robert Haas
Дата:
On Tue, Jun 10, 2014 at 11:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'd be okay with a configure option, if you think that would make this
> issue more visible to packagers.  It's delegating the responsibility to
> the DBA level that I'm unhappy about.
>
[...]
>
>> How about using an environment variable?  It seems to me that would
>> address the concern about DBAs without shell access.  They might be
>> able to frob GUCs, but presumably not the postmaster's starting
>> environment.
>
> Hmmm ... yeah, that might work.  It would solve the problem I'm worried
> about, which is making sure that the startup script has control of what
> happens.

A configure option wouldn't address the issue from my perspective; you
still have to recompile if you don't like what your packager did, and
your packager might still be stupid.  However, an environment variable
seems like it would be just fine.  It might even be more convenient
for packagers that having to compile the desired value into the
binary.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Tom Lane
Дата:
andres@anarazel.de (Andres Freund) writes:
> On 2014-06-10 11:20:28 -0400, Tom Lane wrote:
>> Maybe I'm mistaken, but I thought once the fork_process code has reset our
>> process's setting to zero it's not possible to lower it again (without
>> privileges we'd not have).

> No, doesn't look that way. It's possible to reset it to the value set at
> process start. So unless we introduce double forks for every backend
> start it can be reset by ordinary processes.

That's kind of annoying --- I wonder why they went to the trouble of doing
that?  But anyway, it's probably not worth the cost of double-forking to
prevent it.  I still say though that this is not a reason to make it as
easy as change-a-GUC to break the intended behavior.

Robert's idea of having the start script set an environment variable to
control the OOM adjustment reset seems like it would satisfy my concern.
        regards, tom lane



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Gurjeet Singh
Дата:
On Tue, Jun 10, 2014 at 11:24 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-06-10 11:14:43 -0400, Tom Lane wrote:
>> Sure, but what's that have to do with this?  Any Red Hat or PGDG RPM will
>> come with this code already enabled in the build, so there is no need for
>> anyone to have a GUC to play around with the behavior.
>
> That's imo a fair point. Unless I misunderstand things Gurjeet picked
> the topic up again because he wants to increase the priority of the
> children. Is that correct Gurjeet?

Yes. A DBA would like to prevent the postmaster from being killed by
OOM killer, but let the child processes be still subject to OOM
killer's whim.

Best regards,
-- 
Gurjeet Singh http://gurjeet.singh.im/

EDB www.EnterpriseDB.com



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jun 10, 2014 at 11:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> If you think your users might want to give the postmaster OOM-exemption,
>> why don't you just activate the existing code when you build?  Resetting
>> the OOM setting to zero is safe whether or not the startup script did
>> anything to the postmaster's setting.

> The whole scenario here is that the user *doesn't want to recompile*.

Yeah, I understood that.  The question is why EDB isn't satisfied to just
add "-DLINUX_OOM_ADJ=0" to their build options, but instead would like to
dump a bunch of uncertainty on other packagers who might not like the
implications of a GUC.
        regards, tom lane



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Andres Freund
Дата:
On 2014-06-10 11:52:17 -0400, Tom Lane wrote:
> andres@anarazel.de (Andres Freund) writes:
> > On 2014-06-10 11:20:28 -0400, Tom Lane wrote:
> >> Maybe I'm mistaken, but I thought once the fork_process code has reset our
> >> process's setting to zero it's not possible to lower it again (without
> >> privileges we'd not have).
> 
> > No, doesn't look that way. It's possible to reset it to the value set at
> > process start. So unless we introduce double forks for every backend
> > start it can be reset by ordinary processes.
> 
> That's kind of annoying --- I wonder why they went to the trouble of doing
> that?

My guess is that they want to allow a process to only temporarily reduce
the likelihood of getting killed while it's doing something
important.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Gurjeet Singh
Дата:
On Tue, Jun 10, 2014 at 11:52 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> andres@anarazel.de (Andres Freund) writes:
>> On 2014-06-10 11:20:28 -0400, Tom Lane wrote:
>>> Maybe I'm mistaken, but I thought once the fork_process code has reset our
>>> process's setting to zero it's not possible to lower it again (without
>>> privileges we'd not have).
>
>> No, doesn't look that way. It's possible to reset it to the value set at
>> process start. So unless we introduce double forks for every backend
>> start it can be reset by ordinary processes.
>
> That's kind of annoying --- I wonder why they went to the trouble of doing
> that?  But anyway, it's probably not worth the cost of double-forking to
> prevent it.  I still say though that this is not a reason to make it as
> easy as change-a-GUC to break the intended behavior.
>
> Robert's idea of having the start script set an environment variable to
> control the OOM adjustment reset seems like it would satisfy my concern.

I'm fine with this solution. Should this be a constant 0, or be
configurable based on env. variable's value?

-- 
Gurjeet Singh http://gurjeet.singh.im/

EDB www.EnterpriseDB.com



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Tom Lane
Дата:
Gurjeet Singh <gurjeet@singh.im> writes:
> On Tue, Jun 10, 2014 at 11:52 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert's idea of having the start script set an environment variable to
>> control the OOM adjustment reset seems like it would satisfy my concern.

> I'm fine with this solution. Should this be a constant 0, or be
> configurable based on env. variable's value?

If we're going to do this, I would say that we should also take the
opportunity to get out from under the question of which kernel API
we're dealing with.  So perhaps a design like this:

1. If the environment variable PG_OOM_ADJUST_FILE is defined, it's
the name of a file to write something into after forking.  The typical
value would be "/proc/self/oom_score_adj" or "/proc/self/oom_adj".
If not set, nothing happens.

2. If the environment variable PG_OOM_ADJUST_VALUE is defined, that's
the string we write, otherwise we write "0".
        regards, tom lane



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Robert Haas
Дата:
On Tue, Jun 10, 2014 at 11:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Jun 10, 2014 at 11:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> If you think your users might want to give the postmaster OOM-exemption,
>>> why don't you just activate the existing code when you build?  Resetting
>>> the OOM setting to zero is safe whether or not the startup script did
>>> anything to the postmaster's setting.
>
>> The whole scenario here is that the user *doesn't want to recompile*.
>
> Yeah, I understood that.  The question is why EDB isn't satisfied to just
> add "-DLINUX_OOM_ADJ=0" to their build options, but instead would like to
> dump a bunch of uncertainty on other packagers who might not like the
> implications of a GUC.

Well, I think we should do that, too, and I'm going to propose it to
Dave & team.  If we're not already doing the right thing here (and I
don't know off-hand whether we are), then that is definitely our issue
to fix and there's no reason to change PostgreSQL on that account.

But I think the point is that this is a pretty subtle and
well-concealed thing which a PostgreSQL packager might fail to do
correctly in every instance - or where an individual user might want
to install a different policy than whatever the packager's default is.
Making that easy to do without recompiling seems to me to be a general
good.  Magnus once commented to me that every GUC which is
PGC_POSTMASTER is a bug - "perhaps not an easy bug to fix, but a bug
all the same" (his words), because requiring people to restart the
postmaster to change settings is often problematic.  I think this is
even more true of recompiling.  IMO, it should be an explicit goal of
the project to make reconfiguration of an already-installed system -
perhaps even already in production - as easy as possible. We will
doubtless fail to achieve perfection but that doesn't mean we
shouldn't try.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
David G Johnston
Дата:
In short:

I can accept the idea that picking reasonable specific values is impossible
so let's just ensure that children are always killed before the parent
(basically the default behavior). If you then say that any system that is
capable of implementing that rule should have it set then leaving it
unexposed makes sense.  That line of thought does not require the abstract
cost of a GUC to factor into the equation.

However, in considering system configuration and concurrently running
processes....


Tom Lane-2 wrote
> Extra GUCs do not have zero cost, especially not ones that are as
> complicated-to-explain as this would be.

The explain factor does little for me since if given a reasonable default
the GUC can be ignored until a problem manifests.  At that point not having
a GUC makes resolving the problem require someone to stop using packaging
and instead compile their own source.  This results in a poor user
experience.

So, if someone where to provide a complete patch that introduced a GUC to
enable/disable as well as set priorities - to include documentation and
reasonable postgresql.conf defaults - what specific ongoing GUC costs would
prevent applying said patch?

You mention a security hazard but that is not a cost of GUCs generally but
this one specifically; and one that seems to have been deemed no riskier
than other DBA capabilities.

Help me understand the cost side of the equation since the benefit side
seems clear-cut enough to me.  The OOM problem is real and making PostgreSQL
fit within the overall memory management architecture of a given server is
something that is the responsibility of the system admin in conjunction with
the DBA - not us or the packager.  I'll buy that because this is a linux
issue that the implementation could be packager selectable but given the
dynamics of Linux centralizing a reasonable default implementation into core
makes sense.

David J.




--
View this message in context:
http://postgresql.1045698.n5.nabble.com/proc-self-oom-adj-is-deprecated-in-newer-Linux-kernels-tp4810971p5806697.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
"Joshua D. Drake"
Дата:
On 06/10/2014 07:02 AM, Tom Lane wrote:
> Gurjeet Singh <gurjeet@singh.im> writes:
>> Startup scripts are not solely in the domain of packagers. End users
>> can also be expected to develop/edit their own startup scripts.
>
>> Providing it as GUC would have given end users both the peices, but
>> with a compile-time option they have only one half of the solution;
>> except if they go compile their own binaries, which forces them into
>> being packagers.
>
> I don't find that this argument holds any water at all.  Anyone who's
> developing their own start script can be expected to manage recompiling
> Postgres.

This is certainly not true in any fashion. Production systems require a 
degree of flexibility and configuration that does not require the 
maintaining or compiling of source code.

Sincerely,

Joshua D. Drake

-- 
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, @cmdpromptinc
"If we send our children to Caesar for their education, we should             not be surprised when they come back as
Romans."



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
David G Johnston
Дата:
Gurjeet Singh-4 wrote
> Even if the clueless DBA tries to set the oom_score_adj below that of
> Postmaster, Linux kernel prevents that from being done. I demonstrate
> that in the below screen share. I used GUC as well as plain command
> line to try and set a child's badness (oom_score_adj) to be lower than
> that of Postmaster's, and Linux disallows doing that, unless I use
> root's powers.
> 
> So the argument that this GUC is a security concern, can be ignored.
> Root user (one with control of start script) still controls the lowest
> badness setting of all Postgres processes. If done at fork_process
> time, the child process simply inherits parent's badness setting.

The counter point here is that the postmaster can be set to "no kill" and
the >= condition allows for children to achieve the same while it is our
explicit intent that the children be strictly > parent.

To that end, should the adjustment value be provided as an offset to the
postmasters instead of an absolute value - and disallow <= zero offset
values in the process?

I get and generally agree with the environment variable proposal and it's
stated goal to restrict whom can makes changes. But how much less cost does
an environment variable have than a GUC if one GUC argument is still its
maintenance overhead?

David J.





--
View this message in context:
http://postgresql.1045698.n5.nabble.com/proc-self-oom-adj-is-deprecated-in-newer-Linux-kernels-tp4810971p5806700.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Magnus Hagander
Дата:
<p dir="ltr"><br /> On Jun 10, 2014 7:05 PM, "Joshua D. Drake" <<a
href="mailto:jd@commandprompt.com">jd@commandprompt.com</a>>wrote:<br /> ><br /> ><br /> > On 06/10/2014
07:02AM, Tom Lane wrote:<br /> >><br /> >> Gurjeet Singh <<a
href="mailto:gurjeet@singh.im">gurjeet@singh.im</a>>writes:<br /> >>><br /> >>> Startup scripts
arenot solely in the domain of packagers. End users<br /> >>> can also be expected to develop/edit their own
startupscripts.<br /> >><br /> >><br /> >>> Providing it as GUC would have given end users both
thepeices, but<br /> >>> with a compile-time option they have only one half of the solution;<br />
>>>except if they go compile their own binaries, which forces them into<br /> >>> being packagers.<br
/>>><br /> >><br /> >> I don't find that this argument holds any water at all.  Anyone who's<br />
>>developing their own start script can be expected to manage recompiling<br /> >> Postgres.<br /> ><br
/>><br /> > This is certainly not true in any fashion. Production systems require a degree of flexibility and
configurationthat does not require the maintaining or compiling of source code.<br /> ><p dir="ltr">As JD surely
understands,it pains me a lot but I have to agree with him here.<p dir="ltr">There are also a non-trivial number of
organizationsthat just don't allow compilers on production boxes, adding another hurdle for those. We may think that's
ridiculous,and it may be, but it's reality.<p dir="ltr">That said, I agree that a guc is probably a bad idea. I like
theidea of an environment variable if we need it to be changeable. Or perhaps what we need is just better documentation
ofwhat's there already. Perhaps the documentation should have a section specifically aimed at packagers? <p
dir="ltr">/Magnus 

Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Gurjeet Singh
Дата:
On Tue, Jun 10, 2014 at 1:13 PM, David G Johnston
<david.g.johnston@gmail.com> wrote:
> Gurjeet Singh-4 wrote
>> So the argument that this GUC is a security concern, can be ignored.
>> Root user (one with control of start script) still controls the lowest
>> badness setting of all Postgres processes. If done at fork_process
>> time, the child process simply inherits parent's badness setting.
>
> The counter point here is that the postmaster can be set to "no kill" and

Only the root user can do that, since he/she has control over the
start script. All that the DBA could do with a GUC is to set backends'
badness worse than postmaster's, but bot better.

> the >= condition allows for children to achieve the same while it is our
> explicit intent that the children be strictly > parent.

I don't think anyone argued for that behaviour.

> To that end, should the adjustment value be provided as an offset to the
> postmasters instead of an absolute value - and disallow <= zero offset
> values in the process?

Seems unnecessary, given current knowledge.

> I get and generally agree with the environment variable proposal and it's
> stated goal to restrict whom can makes changes. But how much less cost does
> an environment variable have than a GUC if one GUC argument is still its
> maintenance overhead?

Having it as a GUC would have meant that two entities are required to
get the configuration right: one who controls start scripts, and the
other who controls GUC settings.

With the environment variable approach, a root user alone can control
the behaviour like so in start script:

echo -200 > /proc/self/oom_score_adj
export PG_OOM_ADJUST_FILE=oom_score_adj
export PG_OOM_ADJUST_VALUE=-100

Best regards,
-- 
Gurjeet Singh http://gurjeet.singh.im/

EDB www.EnterpriseDB.com



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Gurjeet Singh
Дата:
On Tue, Jun 10, 2014 at 12:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> If we're going to do this, I would say that we should also take the
> opportunity to get out from under the question of which kernel API
> we're dealing with.  So perhaps a design like this:
>
> 1. If the environment variable PG_OOM_ADJUST_FILE is defined, it's
> the name of a file to write something into after forking.  The typical
> value would be "/proc/self/oom_score_adj" or "/proc/self/oom_adj".
> If not set, nothing happens.
>
> 2. If the environment variable PG_OOM_ADJUST_VALUE is defined, that's
> the string we write, otherwise we write "0".

Please find attached the patch. It includes the doc changes as well.

Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/

EDB www.EnterpriseDB.com

Вложения

Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Tom Lane
Дата:
Gurjeet Singh <gurjeet@singh.im> writes:
> Please find attached the patch. It includes the doc changes as well.

What exactly is the point of the static state you added here?  There
is no situation where that could possibly be useful, because this
code is executed at most once per process, and not at all in the
postmaster.  AFAICS, that's just complication (and permanent waste
of a kilobyte of static data space) for no return.

Also, this seems to try to write the file whether or not the environment
variable was set, which wasn't the plan.

I also don't really see the point of parsing the value variable into an
int.  Why not just spit it out as the original string?  It's not ours to
legislate whether the kernel will take a value that's not an integer.
        regards, tom lane



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Tom Lane
Дата:
Gurjeet Singh <gurjeet@singh.im> writes:
> On Tue, Jun 10, 2014 at 12:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> If we're going to do this, I would say that we should also take the
>> opportunity to get out from under the question of which kernel API
>> we're dealing with.  So perhaps a design like this:
>> 
>> 1. If the environment variable PG_OOM_ADJUST_FILE is defined, it's
>> the name of a file to write something into after forking.  The typical
>> value would be "/proc/self/oom_score_adj" or "/proc/self/oom_adj".
>> If not set, nothing happens.
>> 
>> 2. If the environment variable PG_OOM_ADJUST_VALUE is defined, that's
>> the string we write, otherwise we write "0".

> Please find attached the patch. It includes the doc changes as well.

Applied with some editorialization.
        regards, tom lane



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Gurjeet Singh
Дата:
On Wed, Jun 18, 2014 at 8:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Gurjeet Singh <gurjeet@singh.im> writes:
>
>> Please find attached the patch. It includes the doc changes as well.
>
> Applied with some editorialization.

Thanks!

would it be possible to include this in 9.4 as well?

Best regards,
-- 
Gurjeet Singh http://gurjeet.singh.im/

EDB www.EnterpriseDB.com



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Tom Lane
Дата:
Gurjeet Singh <gurjeet@singh.im> writes:
> would it be possible to include this in 9.4 as well?

While this is clearly an improvement over what we had before, it's
impossible to argue that it's a bug fix, and we are way past the 9.4
feature freeze deadline.  In particular, packagers who've already done
their 9.4 development work might be blindsided by us slipping this into
9.4 release.  So while I wouldn't have a problem with putting this change
into 9.4 from a technical standpoint, it's hard to argue that it'd meet
project norms from a development-process standpoint.
        regards, tom lane



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Gurjeet Singh
Дата:
On Mon, Jun 23, 2014 at 11:52 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Gurjeet Singh <gurjeet@singh.im> writes:
>> would it be possible to include this in 9.4 as well?
>
> While this is clearly an improvement over what we had before, it's
> impossible to argue that it's a bug fix, and we are way past the 9.4
> feature freeze deadline.  In particular, packagers who've already done
> their 9.4 development work might be blindsided by us slipping this into
> 9.4 release.  So while I wouldn't have a problem with putting this change
> into 9.4 from a technical standpoint, it's hard to argue that it'd meet
> project norms from a development-process standpoint.

While I'd love to reduce the number of future installations without
this fix in place, I respect the decision to honor project policy. At
the same time, this change does not break anything. It introduces new
environment variables which change the behaviour, but behaves the old
way in the absence of those variables. So I guess it's not changing
the default behaviour in incompatible way.

BTW, does the project publish the feature-freeze deadlines and other
dates somewhere (apart from on this list). I was looking the other day
and didn't find any reference. Either the commitfest app or the
'Developer' section of the wiki [1] seem to be good candidates for
this kind of information.

[1]: https://wiki.postgresql.org/wiki/Development_information

Best regards,
-- 
Gurjeet Singh http://gurjeet.singh.im/

EDB www.EnterpriseDB.com



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Tom Lane
Дата:
Gurjeet Singh <gurjeet@singh.im> writes:
> On Mon, Jun 23, 2014 at 11:52 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> While this is clearly an improvement over what we had before, it's
>> impossible to argue that it's a bug fix, and we are way past the 9.4
>> feature freeze deadline.  In particular, packagers who've already done
>> their 9.4 development work might be blindsided by us slipping this into
>> 9.4 release.  So while I wouldn't have a problem with putting this change
>> into 9.4 from a technical standpoint, it's hard to argue that it'd meet
>> project norms from a development-process standpoint.

> While I'd love to reduce the number of future installations without
> this fix in place, I respect the decision to honor project policy. At
> the same time, this change does not break anything. It introduces new
> environment variables which change the behaviour, but behaves the old
> way in the absence of those variables.

Uh, no, it doesn't.  We removed the dependence on -DLINUX_OOM_SCORE_ADJ.
If a packager is expecting that to still work in 9.4, he's going to be
unpleasantly surprised, because the system will silently fail to do what
he's expecting: it will run all the backend processes at no-OOM-kill
priority, which is likely to be bad.

It is possible to make packages that will work either way, along the lines
of the advice I sent to the Red Hat guys:
https://bugzilla.redhat.com/show_bug.cgi?id=1110969

but I think it's a bit late in the cycle to be telling packagers to do
that for 9.4.

> BTW, does the project publish the feature-freeze deadlines and other
> dates somewhere (apart from on this list).

It's usually in the dev meeting minutes
https://wiki.postgresql.org/wiki/PgCon_2014_Developer_Meeting#9.5_Schedule
        regards, tom lane



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Gurjeet Singh
Дата:
On Mon, Jun 23, 2014 at 12:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Gurjeet Singh <gurjeet@singh.im> writes:
>> While I'd love to reduce the number of future installations without
>> this fix in place, I respect the decision to honor project policy. At
>> the same time, this change does not break anything. It introduces new
>> environment variables which change the behaviour, but behaves the old
>> way in the absence of those variables.
>
> Uh, no, it doesn't.  We removed the dependence on -DLINUX_OOM_SCORE_ADJ.
> If a packager is expecting that to still work in 9.4, he's going to be
> unpleasantly surprised, because the system will silently fail to do what
> he's expecting: it will run all the backend processes at no-OOM-kill
> priority, which is likely to be bad.

True. I didn't think from a packager's perspective.

> It is possible to make packages that will work either way, along the lines
> of the advice I sent to the Red Hat guys:
> https://bugzilla.redhat.com/show_bug.cgi?id=1110969
>
> but I think it's a bit late in the cycle to be telling packagers to do
> that for 9.4.

Understandable.

>> BTW, does the project publish the feature-freeze deadlines and other
>> dates somewhere (apart from on this list).
>
> It's usually in the dev meeting minutes
> https://wiki.postgresql.org/wiki/PgCon_2014_Developer_Meeting#9.5_Schedule

Thanks. But it doesn't spell out the specific dates, or even the month
of feature-freeze.

Best regards,
-- 
Gurjeet Singh http://gurjeet.singh.im/

EDB www.EnterpriseDB.com



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Christoph Berg
Дата:
Re: Tom Lane 2014-06-23 <17054.1403542164@sss.pgh.pa.us>
> > While I'd love to reduce the number of future installations without
> > this fix in place, I respect the decision to honor project policy. At
> > the same time, this change does not break anything. It introduces new
> > environment variables which change the behaviour, but behaves the old
> > way in the absence of those variables.
> 
> Uh, no, it doesn't.  We removed the dependence on -DLINUX_OOM_SCORE_ADJ.
> If a packager is expecting that to still work in 9.4, he's going to be
> unpleasantly surprised, because the system will silently fail to do what
> he's expecting: it will run all the backend processes at no-OOM-kill
> priority, which is likely to be bad.

Ok I'm late to the party, but the reason I'm still joining is we have
proper unit tests which just told me the 9.5 packages have changed OOM
handling. So it wouldn't just slip through if you changed that in 9.4
as well, but would get fixed.

I have two comments on the patch:

The choice to make the behavior depend first on PG_OOM_ADJUST_FILE and
only secondly on PG_OOM_ADJUST_VALUE seems the wrong way round to me.
On every modestly new kernel oom_score_adj is present, so
PG_OOM_ADJUST_FILE should default to using it. On the other hand, what
people really want to achieve (or tune) with this feature is to set
the OOM adjustment back to 0 (or some other value), so to me, it would
be much more natural to set PG_OOM_ADJUST_VALUE=0 to activate the
feature, and only have to mess with PG_OOM_ADJUST_FILE if my kernel is
older. (Which it isn't on any kernel supported by Debian and Ubuntu
LTS.)

The other bit is the non-deprecation of OOM_ADJ in
contrib/start-scripts/linux. It took me a while to understand the
logic of the variables used there - the interface would be much
clearer if it just was like this:

... set default PG_OOM_ADJUST_FILE=/proc/self/oom_score_adj

... and then this in the configurable part of the script:

PG_MASTER_OOM_SCORE_ADJ=-1000
PG_OOM_SCORE_ADJ=0
# Older Linux kernels may not have /proc/self/oom_score_adj, but instead
# /proc/self/oom_adj, which works similarly except the disable value is -17.
# For such a system, uncomment these three lines instead.
#PG_OOM_ADJUST_FILE=/proc/self/oom_adj
#PG_MASTER_OOM_SCORE_ADJ=-17
#PG_OOM_SCORE_ADJ=0

... and then use PG_OOM_ADJUST_FILE below instead of manually figuring
out which proc file to write to by looking at OOM.*_ADJ.

Christoph
-- 
cb@df7cb.de | http://www.df7cb.de/



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Robert Haas
Дата:
On Tue, Jul 1, 2014 at 8:22 AM, Christoph Berg <cb@df7cb.de> wrote:
> Re: Tom Lane 2014-06-23 <17054.1403542164@sss.pgh.pa.us>
>> > While I'd love to reduce the number of future installations without
>> > this fix in place, I respect the decision to honor project policy. At
>> > the same time, this change does not break anything. It introduces new
>> > environment variables which change the behaviour, but behaves the old
>> > way in the absence of those variables.
>>
>> Uh, no, it doesn't.  We removed the dependence on -DLINUX_OOM_SCORE_ADJ.
>> If a packager is expecting that to still work in 9.4, he's going to be
>> unpleasantly surprised, because the system will silently fail to do what
>> he's expecting: it will run all the backend processes at no-OOM-kill
>> priority, which is likely to be bad.
>
> Ok I'm late to the party, but the reason I'm still joining is we have
> proper unit tests which just told me the 9.5 packages have changed OOM
> handling. So it wouldn't just slip through if you changed that in 9.4
> as well, but would get fixed.
>
> I have two comments on the patch:
>
> The choice to make the behavior depend first on PG_OOM_ADJUST_FILE and
> only secondly on PG_OOM_ADJUST_VALUE seems the wrong way round to me.
> On every modestly new kernel oom_score_adj is present, so
> PG_OOM_ADJUST_FILE should default to using it. On the other hand, what
> people really want to achieve (or tune) with this feature is to set
> the OOM adjustment back to 0 (or some other value), so to me, it would
> be much more natural to set PG_OOM_ADJUST_VALUE=0 to activate the
> feature, and only have to mess with PG_OOM_ADJUST_FILE if my kernel is
> older. (Which it isn't on any kernel supported by Debian and Ubuntu
> LTS.)

Of course, we have no guarantee that the Linux kernel guys won't
change this again.  Apparently "we don't break userspace" is a
somewhat selectively-enforced principle.

> The other bit is the non-deprecation of OOM_ADJ in
> contrib/start-scripts/linux. It took me a while to understand the
> logic of the variables used there - the interface would be much
> clearer if it just was like this:
>
> ... set default PG_OOM_ADJUST_FILE=/proc/self/oom_score_adj
>
> ... and then this in the configurable part of the script:
>
> PG_MASTER_OOM_SCORE_ADJ=-1000
> PG_OOM_SCORE_ADJ=0
> # Older Linux kernels may not have /proc/self/oom_score_adj, but instead
> # /proc/self/oom_adj, which works similarly except the disable value is -17.
> # For such a system, uncomment these three lines instead.
> #PG_OOM_ADJUST_FILE=/proc/self/oom_adj
> #PG_MASTER_OOM_SCORE_ADJ=-17
> #PG_OOM_SCORE_ADJ=0
>
> ... and then use PG_OOM_ADJUST_FILE below instead of manually figuring
> out which proc file to write to by looking at OOM.*_ADJ.

I can't help but agree with this, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jul 1, 2014 at 8:22 AM, Christoph Berg <cb@df7cb.de> wrote:
>> I have two comments on the patch:
>> 
>> The choice to make the behavior depend first on PG_OOM_ADJUST_FILE and
>> only secondly on PG_OOM_ADJUST_VALUE seems the wrong way round to me.

> Of course, we have no guarantee that the Linux kernel guys won't
> change this again.  Apparently "we don't break userspace" is a
> somewhat selectively-enforced principle.

Yeah, I'm unexcited about this proposal.  In any case, given the two
existing APIs we have to deal with, allowing PG_OOM_ADJUST_VALUE to
default to "0" is sane in both APIs but a default for the file name
can work for only one.

>> The other bit is the non-deprecation of OOM_ADJ in
>> contrib/start-scripts/linux. It took me a while to understand the
>> logic of the variables used there - the interface would be much
>> clearer if it just was like this:
>> ... and then use PG_OOM_ADJUST_FILE below instead of manually figuring
>> out which proc file to write to by looking at OOM.*_ADJ.

> I can't help but agree with this, though.

Fair enough.  I went for a minimum-change approach when hacking that
script, but we could change it some more in the name of readability.
Will do something about that.
        regards, tom lane



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Christoph Berg
Дата:
Re: Tom Lane 2014-07-01 <20654.1404247905@sss.pgh.pa.us>
> Yeah, I'm unexcited about this proposal.  In any case, given the two
> existing APIs we have to deal with, allowing PG_OOM_ADJUST_VALUE to
> default to "0" is sane in both APIs but a default for the file name
> can work for only one.

Nod.

> Fair enough.  I went for a minimum-change approach when hacking that
> script, but we could change it some more in the name of readability.
> Will do something about that.

Thanks, it's much nicer now. There's one uglyness left though: The
name PG_CHILD_OOM_SCORE_ADJ should match what actually gets passed to
the backends,

DAEMON_ENV="PG_OOM_ADJUST_FILE=$PG_OOM_ADJUST_FILE PG_OOM_ADJUST_VALUE=$PG_CHILD_OOM_SCORE_ADJ"

would better be PG_OOM_ADJUST_VALUE=$PG_OOM_ADJUST_VALUE.

(Possibly the smart way to fix this would be to change
src/backend/postmaster/fork_process.c to use PG_CHILD_OOM_SCORE_ADJ
instead.)

Christoph
-- 
cb@df7cb.de | http://www.df7cb.de/



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Jonathan Corbet
Дата:
On Tue, 1 Jul 2014 15:57:25 -0400
Robert Haas <robertmhaas@gmail.com> wrote:

> Of course, we have no guarantee that the Linux kernel guys won't
> change this again.  Apparently "we don't break userspace" is a
> somewhat selectively-enforced principle.

It's selectively enforced in that kernel developers don't (and can't)
scan the world for software that might break.  OTOH, if somebody were
to respond to a proposed change by saying "this breaks PostgreSQL," I
would be amazed if the change were to be merged in that form.

In other words, it's important to scream.  There were concerns during
the last round of messing with the OOM logic, but nobody pointed to
something that actually broke.

I predict that somebody will certainly try to rewrite the OOM code
again in the future.  The nature of that code is such that it is never
going to work as well as people would like.  But if application
developers make it clear that they are dependent on the current
interface working, kernel developers will be constrained to keep it
working.

jon



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Tom Lane
Дата:
Christoph Berg <cb@df7cb.de> writes:
> Re: Tom Lane 2014-07-01 <20654.1404247905@sss.pgh.pa.us>
>> Fair enough.  I went for a minimum-change approach when hacking that
>> script, but we could change it some more in the name of readability.
>> Will do something about that.

> Thanks, it's much nicer now. There's one uglyness left though: The
> name PG_CHILD_OOM_SCORE_ADJ should match what actually gets passed to
> the backends,

Meh ... I don't find that to be important.  Making a parallel with
PG_MASTER_OOM_SCORE_ADJ seemed more helpful here.
        regards, tom lane



Re: /proc/self/oom_adj is deprecated in newer Linux kernels

От
Robert Haas
Дата:
On Wed, Jul 2, 2014 at 9:08 AM, Jonathan Corbet <corbet@lwn.net> wrote:
> On Tue, 1 Jul 2014 15:57:25 -0400
> Robert Haas <robertmhaas@gmail.com> wrote:
>> Of course, we have no guarantee that the Linux kernel guys won't
>> change this again.  Apparently "we don't break userspace" is a
>> somewhat selectively-enforced principle.
>
> It's selectively enforced in that kernel developers don't (and can't)
> scan the world for software that might break.  OTOH, if somebody were
> to respond to a proposed change by saying "this breaks PostgreSQL," I
> would be amazed if the change were to be merged in that form.
>
> In other words, it's important to scream.  There were concerns during
> the last round of messing with the OOM logic, but nobody pointed to
> something that actually broke.
>
> I predict that somebody will certainly try to rewrite the OOM code
> again in the future.  The nature of that code is such that it is never
> going to work as well as people would like.  But if application
> developers make it clear that they are dependent on the current
> interface working, kernel developers will be constrained to keep it
> working.

Well, of course the reality is that if you don't break some things,
you can never change anything.  And clearly we want Linux to be able
to change things, to make them better.

On the flip side, interface changes do cause some pain, and it's not
realistic to expect every software project to monitor LKML.

This one is not really a big deal, though; I think we just need to
keep in mind, as you say, that it might change again.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company