Re: DB running out of memory issues after upgrade

Поиск
Список
Период
Сортировка
От Peter J. Holzer
Тема Re: DB running out of memory issues after upgrade
Дата
Msg-id 20200223101928.GE12464@hjp.at
обсуждение исходный текст
Ответ на Re: DB running out of memory issues after upgrade  (Nagaraj Raj <nagaraj.sf@yahoo.com>)
Список pgsql-general
On 2020-02-18 18:10:08 +0000, Nagaraj Raj wrote:
> Below are the same configurations ins .conf file before and after updagrade
>
> show max_connections; = 1743
[...]
> show work_mem = "4MB"

This is an interesting combination: So you expect a large number of
connections but each one should use very little RAM?

[...]

> here is some sys logs,
>
> 2020-02-16 21:01:17 UTC         [-]The database process was killed by the OS
> due to excessive memory consumption.
> 2020-02-16 13:41:16 UTC         [-]The database process was killed by the OS
> due to excessive memory consumption.

The oom-killer produces a huge block of messages which you can find with
dmesg or in your syslog. It looks something like this:

Feb 19 19:06:53 akran kernel: [3026711.344817] platzangst invoked oom-killer:
gfp_mask=0x15080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO),nodemask=(null), order=1, oom_score_adj=0 
Feb 19 19:06:53 akran kernel: [3026711.344819] platzangst cpuset=/ mems_allowed=0-1
Feb 19 19:06:53 akran kernel: [3026711.344825] CPU: 7 PID: 2012 Comm: platzangst Tainted: G           OE
4.15.0-74-generic#84-Ubuntu 
Feb 19 19:06:53 akran kernel: [3026711.344826] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.1.7 06/16/2016
Feb 19 19:06:53 akran kernel: [3026711.344827] Call Trace:
Feb 19 19:06:53 akran kernel: [3026711.344835]  dump_stack+0x6d/0x8e
Feb 19 19:06:53 akran kernel: [3026711.344839]  dump_header+0x71/0x285
...
Feb 19 19:06:53 akran kernel: [3026711.344893] RIP: 0033:0x7f292d076b1c
Feb 19 19:06:53 akran kernel: [3026711.344894] RSP: 002b:00007fff187ef240 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
Feb 19 19:06:53 akran kernel: [3026711.344895] RAX: ffffffffffffffda RBX: 00007fff187ef240 RCX: 00007f292d076b1c
Feb 19 19:06:53 akran kernel: [3026711.344896] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
Feb 19 19:06:53 akran kernel: [3026711.344897] RBP: 00007fff187ef2b0 R08: 00007f292d596740 R09: 00000000009d43a0
Feb 19 19:06:53 akran kernel: [3026711.344897] R10: 00007f292d596a10 R11: 0000000000000246 R12: 0000000000000000
Feb 19 19:06:53 akran kernel: [3026711.344898] R13: 0000000000000020 R14: 0000000000000000 R15: 0000000000000000
Feb 19 19:06:53 akran kernel: [3026711.344899] Mem-Info:
Feb 19 19:06:53 akran kernel: [3026711.344905] active_anon:14862589 inactive_anon:1133875 isolated_anon:0
Feb 19 19:06:53 akran kernel: [3026711.344905]  active_file:467 inactive_file:371 isolated_file:0
Feb 19 19:06:53 akran kernel: [3026711.344905]  unevictable:0 dirty:3 writeback:0 unstable:0
...
Feb 19 19:06:53 akran kernel: [3026711.344985] [ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents
oom_score_adjname 
Feb 19 19:06:53 akran kernel: [3026711.344997] [  823]     0   823    44909        0   106496      121             0
lvmetad
Feb 19 19:06:53 akran kernel: [3026711.344999] [ 1354]     0  1354    11901        3   135168      112             0
rpcbind
Feb 19 19:06:53 akran kernel: [3026711.345000] [ 1485]     0  1485    69911       99   180224      159             0
accounts-daemon
...
Feb 19 19:06:53 akran kernel: [3026711.345345] Out of memory: Kill process 25591 (postgres) score 697 or sacrifice
child
Feb 19 19:06:53 akran kernel: [3026711.346563] Killed process 25591 (postgres) total-vm:71116948kB,
anon-rss:52727552kB,file-rss:0kB, shmem-rss:3023196kB 

The most interesting lines are usually the last two: In this case they
tell us that the process killed was a postgres process and it occupied
about 71 GB of virtual memory at that time. That was clearly the right
choice since the machine has only 64 GB of RAM. Sometimes it is less
clear and then you might want to scroll through the (usually long) list
of processes to see if there are other processes which need suspicious
amounts of RAM or maybe if there are just more of them than you would
expect.


> I identified one simple select which consuming more memory and here is the
> query plan,
>
>
>
> "Result  (cost=0.00..94891854.11 rows=3160784900 width=288)"
> "  ->  Append  (cost=0.00..47480080.61 rows=3160784900 width=288)"
> "        ->  Seq Scan on msghist  (cost=0.00..15682777.12 rows=3129490000 width
> =288)"
> "              Filter: (((data -> 'info'::text) ->> 'status'::text) =
> 'CLOSE'::text)"
> "        ->  Seq Scan on msghist msghist_1  (cost=0.00..189454.50 rows=31294900
> width=288)"
> "              Filter: (((data -> 'info'::text) ->> 'status'::text) =
> 'CLOSE'::text)"

So: How much memory does that use? It produces a huge number of rows
(more than 3 billion) but it doesn't do much with them, so I wouldn't
expect the postgres process itself to use much memory. Are you sure its
the postgres process and not the application which uses a lot of memory?

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Вложения

В списке pgsql-general по дате отправления:

Предыдущее
От: "Andrus"
Дата:
Сообщение: Re: How to get error message details from libpq based psqlODBC driver (regression)
Следующее
От: "Peter J. Holzer"
Дата:
Сообщение: Re: Can I trigger an action from a coalesce ?