Thread: Out of Memory


Out of Memory

From:
greigwise
Date:
Hello, I'm running Postgres 9.6.10 on CentOS 7. I'm seeing the occasional
out-of-memory error when trying to run a query. In the logs I see something
like this:

Grand total: 462104832 bytes in 795 blocks; 142439136 free (819860 chunks);
319665696 used
2018-09-20 18:08:01 UTC xxxx 5ba3e1a2.7a8a dbname ERROR:  out of memory
2018-09-20 18:08:01 UTC xxxx 5ba3e1a2.7a8a dbname DETAIL:  Failed on request
of size 2016.

If I have 142439136 free, then why am I failing on a request of size 2016?

Am I misunderstanding here?

Thanks,
Greig Wise



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html


Re: Out of Memory

From:
Paul Carlucci
Date:
My first two guesses are ulimit or numa.  numactl --hardware will show you how many nodes your box has and whether you're exhausting any of them.
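As a sketch, the ulimit side of that check can also be read programmatically, via the limits a process inherits (run as the postgres OS user; the NUMA side still needs numactl --hardware from the numactl package):

```python
import resource

# Resource limits as the current process (and any postmaster started from
# this session) would inherit them.  RLIM_INFINITY means "unlimited".
def show_limit(label, which):
    soft, hard = resource.getrlimit(which)
    fmt = lambda v: "unlimited" if v == resource.RLIM_INFINITY else str(v)
    print(f"{label}: soft={fmt(soft)} hard={fmt(hard)}")

show_limit("RLIMIT_AS (virtual memory, bytes)", resource.RLIMIT_AS)
show_limit("RLIMIT_DATA (data segment, bytes)", resource.RLIMIT_DATA)
show_limit("RLIMIT_STACK (stack, bytes)", resource.RLIMIT_STACK)
```

This only shows what the current session would pass down; a postmaster started by systemd or an init script may run under different limits, which prlimit -p <pid> can confirm.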

On Thu, Sep 20, 2018, 6:11 PM greigwise <greigwise@comcast.net> wrote:
Hello, I'm running Postgres 9.6.10 on CentOS 7. I'm seeing the occasional
out-of-memory error when trying to run a query. In the logs I see something
like this:

Grand total: 462104832 bytes in 795 blocks; 142439136 free (819860 chunks);
319665696 used
2018-09-20 18:08:01 UTC xxxx 5ba3e1a2.7a8a dbname ERROR:  out of memory
2018-09-20 18:08:01 UTC xxxx 5ba3e1a2.7a8a dbname DETAIL:  Failed on request
of size 2016.

If I have 142439136 free, then why am I failing on a request of size 2016?

Am I misunderstanding here?

Thanks,
Greig Wise




Re: Out of Memory

From:
greigwise
Date:
ulimit -a for the postgres user shows memory unlimited.

numactl --hardware gives command not found.

Thanks again.

Greig





Re: Out of Memory

From:
Tom Lane
Date:
greigwise <greigwise@comcast.net> writes:
> Hello, I'm running Postgres 9.6.10 on CentOS 7. I'm seeing the occasional
> out-of-memory error when trying to run a query. In the logs I see something
> like this:

> Grand total: 462104832 bytes in 795 blocks; 142439136 free (819860 chunks);
> 319665696 used
> 2018-09-20 18:08:01 UTC xxxx 5ba3e1a2.7a8a dbname ERROR:  out of memory
> 2018-09-20 18:08:01 UTC xxxx 5ba3e1a2.7a8a dbname DETAIL:  Failed on request
> of size 2016.

> If I have 142439136 free, then why am I failing on a request of size 2016?

The free space must be in contexts other than the one that last little
request wanted space in.  Overall, you've got about 460MB of space
consumed in that session, so it's not *that* surprising that you got OOM.
(At least, it's unsurprising on a 32-bit machine.  If the server is
64-bit I'd have thought the kernel would be a bit more liberal.)

But anyway, this looks like a mighty inefficient usage pattern at best,
and maybe a memory leak at worst.  Can you create a self-contained test
case that does this?

            regards, tom lane


Re: Out of Memory

From:
greigwise
Date:
Well, we are 64-bit... I'll see if I can make some kind of self-contained
test to repeat it.

Thanks,
Greig





Re: Out of Memory

From:
greigwise
Date:
Well, I've been unsuccessful so far at creating a standalone test.

I have put some scripting in place to capture some additional information on
the server with the out-of-memory issues.  I have a script which just
periodically dumps the output of free -m to a text file.

So, the output of free -m immediately before and after the out-of-memory
error looks like this:

Just before:
              total        used        free      shared  buff/cache   available
Mem:          14877        2978         132        3553       11766        7943
Swap:             0           0           0

Just after:
              total        used        free      shared  buff/cache   available
Mem:          14877        2946         649        3548       11280        7982
Swap:             0           0           0

If I have nearly 8 GB of memory left, why am I getting out of memory errors?
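One thing free -m does not surface is the kernel's commit accounting (CommitLimit and Committed_AS in /proc/meminfo), which can be exhausted while MemAvailable still looks healthy. A sketch of comparing the two, using hypothetical sample values in /proc/meminfo format:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style 'Key:  value kB' lines into a dict of kB ints."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info

# Hypothetical snapshot; on a real box, read open("/proc/meminfo").read()
sample = """\
MemTotal:       15234048 kB
MemAvailable:    8132608 kB
CommitLimit:     7617024 kB
Committed_AS:    7500000 kB
"""

info = parse_meminfo(sample)
# ~8 GB looks "available", yet only ~114 MB of commit headroom is left:
commit_headroom_kb = info["CommitLimit"] - info["Committed_AS"]
```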

Thanks,
Greig





Re: Out of Memory

From:
Tom Lane
Date:
greigwise <greigwise@comcast.net> writes:
> If I have nearly 8 GB of memory left, why am I getting out of memory errors?

Probably the postmaster is running under restrictive ulimit settings.

            regards, tom lane


Re: Out of Memory

From:
greigwise
Date:
Tom Lane-2 wrote
> greigwise <greigwise@comcast.net> writes:
>> If I have nearly 8 GB of memory left, why am I getting out of memory
>> errors?
> 
> Probably the postmaster is running under restrictive ulimit settings.
> 
>             regards, tom lane

If I log in as the user which runs the postmaster, I get this:

-bash-4.2$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 59341
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Max memory size is unlimited as is virtual memory... is there something else
there I should be concerned with in regard to out of memory?

Thanks again.
Greig





Re: Out of Memory

From:
PT
Date:
On Tue, 25 Sep 2018 11:34:19 -0700 (MST)
greigwise <greigwise@comcast.net> wrote:

> Well, I've been unsuccessful so far at creating a standalone test.
>
> I have put some scripting in place to capture some additional information on
> the server with the out-of-memory issues.  I have a script which just
> periodically dumps the output of free -m to a text file.
>
> So, the output of free -m immediately before and after the out-of-memory
> error looks like this:
>
> Just before:
>               total        used        free      shared  buff/cache   available
> Mem:          14877        2978         132        3553       11766        7943
> Swap:             0           0           0
>
> Just after:
>               total        used        free      shared  buff/cache   available
> Mem:          14877        2946         649        3548       11280        7982
> Swap:             0           0           0
>
> If I have nearly 8 GB of memory left, why am I getting out of memory errors?

Doesn't the default NUMA setup mean that it can't actually allocate all
the available memory to a single NUMA zone (or whatever it's called)?

Or am I talking ancient Linux history with that?

-- 
Bill Moran


Re: Out of Memory

From:
Tory M Blue
Date:


On Tue, Sep 25, 2018 at 2:05 PM PT <wmoran@potentialtech.com> wrote:
On Tue, 25 Sep 2018 11:34:19 -0700 (MST)
greigwise <greigwise@comcast.net> wrote:

> Well, I've been unsuccessful so far at creating a standalone test.
>
> I have put some scripting in place to capture some additional information on
> the server with the out-of-memory issues.  I have a script which just
> periodically dumps the output of free -m to a text file.
>
> So, the output of free -m immediately before and after the out-of-memory
> error looks like this:
>
> Just before:
>               total        used        free      shared  buff/cache   available
> Mem:          14877        2978         132        3553       11766        7943
> Swap:             0           0           0
>
> Just after:
>               total        used        free      shared  buff/cache   available
> Mem:          14877        2946         649        3548       11280        7982
> Swap:             0           0           0
>
> If I have nearly 8 GB of memory left, why am I getting out of memory errors?

Doesn't the default NUMA setup mean that it can't actually allocate all
the available memory to a single NUMA zone (or whatever it's called)?

Or am I talking ancient Linux history with that?

--
Bill Moran

 
By the way, we have similar issues running in our smaller stage environment: 9.5.6 on CentOS 7.2.

It's only ever my stage environment, on smaller AWS t2 boxen. Memory looks fine, but Postgres says it has no access to any, and my settings are really low.

Not sure if it's the same issue, but we run into it a few times a year in my stage environment. We also run postgres with an unlimited ulimit.

Tory

Re: Out of Memory

From:
greigwise
Date:
There is also this:

-bash-4.2$ prlimit -p 6590
RESOURCE   DESCRIPTION                             SOFT      HARD UNITS
AS         address space limit                unlimited unlimited bytes
CORE       max core file size                         0 unlimited blocks
CPU        CPU time                           unlimited unlimited seconds
DATA       max data size                      unlimited unlimited bytes
FSIZE      max file size                      unlimited unlimited blocks
LOCKS      max number of file locks held      unlimited unlimited
MEMLOCK    max locked-in-memory address space     65536     65536 bytes
MSGQUEUE   max bytes in POSIX mqueues            819200    819200 bytes
NICE       max nice prio allowed to raise             0         0
NOFILE     max number of open files                1024      4096
NPROC      max number of processes                 4096     59341
RSS        max resident set size              unlimited unlimited pages
RTPRIO     max real-time priority                     0         0
RTTIME     timeout for real-time tasks        unlimited unlimited microsecs
SIGPENDING max number of pending signals          59341     59341
STACK      max stack size                       8388608 unlimited bytes

Is it possible that the fact that my stack size is limited is what is
causing my issue?

Thanks,
Greig





Re: Out of Memory

From:
Tom Lane
Date:
greigwise <greigwise@comcast.net> writes:
> Is it possible that the fact that my stack size is limited is what is
> causing my issue?

No.  If you were hitting that limit you'd get a message specifically
talking about stack.

            regards, tom lane


Re: Out of Memory

From:
greigwise
Date:
Tom Lane-2 wrote
> greigwise <greigwise@comcast.net> writes:
>> Is it possible that the fact that my stack size is limited is what is
>> causing my issue?
> 
> No.  If you were hitting that limit you'd get a message specifically
> talking about stack.
> 
>             regards, tom lane

Well, darn.  I'm at a loss... any suggestions of what I should do next to
troubleshoot this?

Thanks.
Greig





Re: Out of Memory

From:
greigwise
Date:
I think I figured it out:

vm.overcommit_memory = 2
vm.overcommit_ratio = 50

Only allows me to use 50% of my RAM... ugh!  I have 16 GB, so when only 8 is
left, I start seeing OOM.  Will increase this setting and see if it helps.
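That arithmetic checks out: with vm.overcommit_memory = 2, the kernel caps committed memory at swap plus overcommit_ratio percent of RAM. A quick sketch of the formula (per the kernel's overcommit-accounting docs; on a real box, compare against CommitLimit in /proc/meminfo):

```python
def commit_limit_mb(swap_mb, ram_mb, overcommit_ratio):
    # With vm.overcommit_memory = 2, the kernel allows at most
    # swap + RAM * overcommit_ratio / 100 of committed address space.
    return swap_mb + ram_mb * overcommit_ratio // 100

# 16 GB of RAM, no swap, the default ratio of 50 -> an 8 GB ceiling,
# which matches OOM errors appearing with ~8 GB still "free".
limit = commit_limit_mb(swap_mb=0, ram_mb=16384, overcommit_ratio=50)
```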

Thanks everyone for the help.

Greig





Re: Out of Memory

From:
greigwise
Date:
All right.. one more thing here.  Any suggestions for how to set overcommit
on a postgres db server with 16 GB of RAM and no swap?   I think I want
vm.overcommit_memory = 2, as I understand that prevents the OOM killer from
zapping me.   Is 100% the right way to go for overcommit_ratio?  Is there a
drawback to this?

Thanks again.
Greig





Re: Out of Memory

From:
Laurenz Albe
Date:
greigwise wrote:
> All right.. one more thing here.  Any suggestions for how to set overcommit
> on a postgres db server with 16 GB of RAM and no swap?   I think I want
> vm.overcommit_memory = 2, as I understand that prevents the OOM killer from
> zapping me.   Is 100% the right way to go for overcommit_ratio?  Is there a
> drawback to this?

vm.overcommit_memory = 2
vm.overcommit_ratio = 100

Linux commits (swap * overcommit_ratio * RAM / 100),
so without any swap the correct value would be 100.

I don't know how safe it is to go entirely without swap.
I would add a little bit just to be sure.

Yours,
Laurenz Albe



Re: Out of Memory

From:
Christoph Moench-Tegeder
Date:
## Laurenz Albe (laurenz.albe@cybertec.at):

> vm.overcommit_memory = 2
> vm.overcommit_ratio = 100
> 
> Linux commits (swap * overcommit_ratio * RAM / 100),
                      ^
                      That should be a "+".

See Documentation/sysctl/vm.txt and Documentation/vm/overcommit-accounting.rst
in your kernel tree.

> so without any swap the correct value would be 100.
> 
> I don't know how safe it is to go entirely without swap.

That's completely safe. Even more: if your production machine ever needs
swap, it's already mostly dead in the water. So there's also no benefit in
having the kernel copy stuff to swap "just in case" (read: set sysctl
vm.swappiness to 0) and use the swap disk space for something more
productive. If your memory usage ever exceeds available memory, something
already went horribly wrong, and you want to know about it right now.
A good way to know about this kind of mistake is having stuff fail
hard, obviously and loudly (i.e. Out Of Memory). Do not try to save
the day by swapping: stuff gets "sometimes slow", and if you really need
the swap, everything crawls to a halt anyway.
Of course, this does not hold for personal workstations and
the like, where you might want to use the swap space for suspend to
disk, or have some buffer in case of runaway processes.
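Pulling the thread's advice together as a sysctl drop-in fragment (the filename and the ratio value are suggestions to adapt, not gospel; with swap present, a ratio below 100 may be appropriate):

```
# /etc/sysctl.d/99-postgres-overcommit.conf  (hypothetical filename)

# Strict overcommit accounting: allocations fail with ENOMEM instead of
# inviting the OOM killer later.
vm.overcommit_memory = 2

# Commit limit = swap + overcommit_ratio% of RAM; 100 suits a swapless box.
vm.overcommit_ratio = 100

# Don't proactively push pages to swap on a server with little or no swap.
vm.swappiness = 0
```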

Regards,
Christoph

-- 
Spare Space.


Re: Out of Memory

From:
Laurenz Albe
Date:
Christoph Moench-Tegeder wrote:
> ## Laurenz Albe (laurenz.albe@cybertec.at):
> 
> > vm.overcommit_memory = 2
> > vm.overcommit_ratio = 100
> > 
> > Linux commits (swap * overcommit_ratio * RAM / 100),
> 
>                       ^
>                       That should be a "+".

Yes; shame on me for careless typing, and thank you for the
correction.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com



Re: Out of Memory

From:
Rob Sargent
Date:

> On Sep 27, 2018, at 3:45 PM, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> 
> Christoph Moench-Tegeder wrote:
>> ## Laurenz Albe (laurenz.albe@cybertec.at):
>> 
>>> vm.overcommit_memory = 2
>>> vm.overcommit_ratio = 100
>>> 
>>> Linux commits (swap * overcommit_ratio * RAM / 100),
>> 
>>                      ^
>>                      That should be a "+".
> 
> Yes; shame on me for careless typing, and thank you for the
> correction.
Are there any parentheses needed in that formula?


Re: Out of Memory

From:
Laurenz Albe
Date:
Rob Sargent wrote:
> > Christoph Moench-Tegeder wrote:
> > > ## Laurenz Albe (laurenz.albe@cybertec.at):
> > > 
> > > > vm.overcommit_memory = 2
> > > > vm.overcommit_ratio = 100
> > > > 
> > > > Linux commits (swap * overcommit_ratio * RAM / 100),
> > > 
> > >                      ^
> > >                      That should be a "+".
> > 
> > Yes; shame on me for careless typing, and thank you for the
> > correction.
> 
> Are there any parentheses needed in that formula?

No.  It is swap space plus a certain percentage of RAM.

I don't know how the Linux developers came up with that
weird formula.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com



Re: Out of Memory

From:
"Peter J. Holzer"
Date:
On 2018-09-28 07:23:59 +0200, Laurenz Albe wrote:
> Rob Sargent wrote:
> > > Christoph Moench-Tegeder wrote:
> > > > ## Laurenz Albe (laurenz.albe@cybertec.at):
> > > >
> > > > > vm.overcommit_memory = 2
> > > > > vm.overcommit_ratio = 100
> > > > >
> > > > > Linux commits (swap * overcommit_ratio * RAM / 100),
> > > >
> > > >                      ^
> > > >                      That should be a "+".
> > >
> > > Yes; shame on me for careless typing, and thank you for the
> > > correction.
> >
> > Are there any parentheses needed in that formula?
>
> No.  It is swap space plus a certain percentage of RAM.
>
> I don't know how the Linux developers came up with that
> weird formula.

I suspect they copied it from some other Unix.

Traditionally, Unix allocated all memory in the swap space. You could
say that the swap space was the "real memory" and RAM was a cache for
that (Hence the rule that swap must be at least as large as RAM and
should preferably be 2 or 4 times the size of RAM). So, when Unixes
started to allow allocating more virtual memory than swap space, they
were "overcommitting".

But for Linux that doesn't make much sense: a page lived either in
RAM /or/ in swap right from the start, so the limit was always RAM+swap,
not swap alone, and you were only overcommitting if you exceeded the size
of the sum. The overcommitment in Linux is of a different kind: Linux
uses copy on write wherever it can (e.g. when forking processes, but
also when mallocing memory), and a CoW page may or may not be written in
the future. It only needs additional space when it's actually written,
so by counting the page only once (hoping that there will be enough
space if and when that page is written) the kernel is overcommitting
memory.

        hp


--
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | hjp@hjp.at         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
