Postgres Out Of Memory Crash

From: Avi Weinberg
Subject: Postgres Out Of Memory Crash
Msg-id: DB9PR07MB71808BC744552C349DA17E8DCBA6A@DB9PR07MB7180.eurprd07.prod.outlook.com
Responses: Re: Postgres Out Of Memory Crash
Re: Postgres Out Of Memory Crash
List: pgsql-general

Hi experts,

 

I'm running a Patroni-managed Postgres installation and have noticed that Postgres has already crashed twice due to running out of memory. I'm using logical replication with around 30-40 active subscribers on this machine. The machine has 128 GB of RAM, but only 32 GB is allocated to Postgres. How can I find out what is actually causing the out-of-memory condition? Is it caused by a suboptimal Postgres configuration or by something else?

 

    /usr/lib/postgresql/13/bin/postgres -D /home/postgres/pgdata/pgroot/data --config-file=/home/postgres/pgdata/pgroot/data/postgresql.conf --port=5432 --cluster_name=postgres-cluster --wal_level=logical --hot_standby=on --max_connections=533 --max_wal_senders=90 --max_prepared_transactions=0 --max_locks_per_transaction=64 --track_commit_timestamp=on --max_replication_slots=90 --max_worker_processes=30 --wal_log_hints=on

 

 

    Oct 27 07:05:31 node2 kernel: postgres invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=993

    Oct 27 07:05:31 node2 kernel: postgres cpuset=docker-6ae67e04710619972d3b1ab5d4c69c318d001c2da47fecee121cdc60279a14a0.scope mems_allowed=0

    Oct 27 07:05:31 node2 kernel: CPU: 6 PID: 15536 Comm: postgres Kdump: loaded Tainted: G               ------------ T 3.10.0-1160.el7.x86_64 #1

    Oct 27 07:05:31 node2 kernel: Hardware name: Kontron MSP8040/4008, BIOS Core: 5.11, MSP804x: 1.57.0943FC77 05/06/2020

    Oct 27 07:05:31 node2 kernel: Call Trace:

    Oct 27 07:05:31 node2 kernel: [<ffffffffa4581340>] dump_stack+0x19/0x1b

    Oct 27 07:05:31 node2 kernel: [<ffffffffa457bc60>] dump_header+0x90/0x229

    Oct 27 07:05:31 node2 kernel: [<ffffffffa409d008>] ? ep_poll_callback+0xf8/0x220

    Oct 27 07:05:31 node2 kernel: [<ffffffffa3fc1b26>] ? find_lock_task_mm+0x56/0xc0

    Oct 27 07:05:31 node2 kernel: [<ffffffffa403c8c8>] ? try_get_mem_cgroup_from_mm+0x28/0x60

    Oct 27 07:05:31 node2 kernel: [<ffffffffa3fc208d>] oom_kill_process+0x2cd/0x490

    Oct 27 07:05:31 node2 kernel: [<ffffffffa4040cdc>] mem_cgroup_oom_synchronize+0x55c/0x590

    Oct 27 07:05:31 node2 kernel: [<ffffffffa4040140>] ? mem_cgroup_charge_common+0xc0/0xc0

    Oct 27 07:05:31 node2 kernel: [<ffffffffa3fc2974>] pagefault_out_of_memory+0x14/0x90

    Oct 27 07:05:31 node2 kernel: [<ffffffffa457a16c>] mm_fault_error+0x6a/0x157

    Oct 27 07:05:31 node2 kernel: [<ffffffffa458e8d1>] __do_page_fault+0x491/0x500

    Oct 27 07:05:31 node2 kernel: [<ffffffffa458e975>] do_page_fault+0x35/0x90

    Oct 27 07:05:31 node2 kernel: [<ffffffffa458a778>] page_fault+0x28/0x30

    Oct 27 07:05:31 node2 kernel: Task in /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod5984f099_30b0_4506_8730_6c72e7c02b78.slice/docker-6ae67e0471061997$

    Oct 27 07:05:31 node2 kernel: memory: usage 32768000kB, limit 32768000kB, failcnt 144867

    Oct 27 07:05:31 node2 kernel: memory+swap: usage 32768000kB, limit 9007199254740988kB, failcnt 0

    Oct 27 07:05:31 node2 kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0

    Oct 27 07:05:31 node2 kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod5984f099_30b0_4506_8730_6c72e7c02b78.slice: cache:$

    Oct 27 07:05:31 node2 kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod5984f099_30b0_4506_8730_6c72e7c02b78.slice/docker-$

    Oct 27 07:05:31 node2 kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod5984f099_30b0_4506_8730_6c72e7c02b78.slice/docker-$

    Oct 27 07:05:31 node2 kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name

    Oct 27 07:05:31 node2 kernel: [13159]     0 13159      239        1       3        0          -998 pause

    Oct 27 07:05:31 node2 kernel: [13322]     0 13322     1095       97       8        0           993 dumb-init

    Oct 27 07:05:31 node2 kernel: [13335]     0 13335     1156      171       8        0           993 sh

    Oct 27 07:05:31 node2 kernel: [13411]     0 13411     1137       98       8        0           993 runsvdir

    Oct 27 07:05:31 node2 kernel: [13438]     0 13438     1099       98       7        0           993 runsv

    Oct 27 07:05:31 node2 kernel: [13439]     0 13439     1099       98       7        0           993 runsv

    Oct 27 07:05:31 node2 kernel: [13440]   101 13440    27026     1186      54        0           993 pgqd

    Oct 27 07:05:31 node2 kernel: [13441]   101 13441   155215     8237     101        0           993 patroni

    Oct 27 07:05:31 node2 kernel: [19532]   101 19532  1740033    46817     171        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [19542]   101 19542  1767874     6713     121        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [19546]   101 19546  1740173  1445031    3166        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [19547]   101 19547  1740069    20060     171        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [19548]   101 19548  1740027     4821      86        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [19549]   101 19549  1740283     1011      91        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [19550]   101 19550    50320      798      77        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [19551]   101 19551  1740524     2097     117        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [19552]   101 19552  1740766     2043     100        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [19556]   101 19556  1744493    16306     476        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [19588]   101 19588  1744522    16353     484        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [19598]   101 19598  1745580    22539     602        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [19609]   101 19609  5909368  4364063    9947        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [19617]   101 19617  1745576    16669     494        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [19676]   101 19676  1740468     2208     113        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [19744]   101 19744  1741155    19698     615        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [19776]   101 19776  1742008   105333    1733        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [19793]   101 19793  1740738    58109     631        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [19799]   101 19799  1740737    29227     546        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [19806]   101 19806  1740435     1393      96        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [19811]   101 19811  1741021     3658     139        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [19971]   101 19971  1740734    15521     449        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [20007]   101 20007  1740435     1394      96        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [20537]   101 20537  1891639   121701     407        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [20623]   101 20623  1889590   121300     405        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [20840]   101 20840  1889591   120970     406        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [20851]   101 20851  1891639   122165     408        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [20955]   101 20955  1889591   120056     405        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [21117]   101 21117  1891639   121217     405        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [21328]   101 21328  1891639   120761     404        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [21332]   101 21332  1889591   119904     402        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [21451]   101 21451  1891639   120435     403        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [21988]   101 21988  1889591   121723     407        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [ 2456]   101  2456  1889592   120006     406        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [32261]   101 32261  1878518   100278     365        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [ 3352]   101  3352  1926434   149555     459        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [ 3893]   101  3893  1926436   149552     459        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [15217]   101 15217  1880573   101541     370        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [15246]   101 15246  1879605   101892     368        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [14774]   101 14774  1773892    26263     219        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [15904]   101 15904  1773886    26721     220        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [ 9266]   101  9266  1744448    15139     468        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [15536]   101 15536  1800641    46553     261        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [21552]   101 21552  1918253   139792     444        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [22239]   101 22239  1769356    21754     210        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [13344]   101 13344  1767312    19434     206        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [27568]   101 27568  1743509    11079     327        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [27754]   101 27754  1743483    10910     313        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [28036]   101 28036  1743550    12152     385        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [ 7772]   101  7772  1743520    11171     332        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [20855]   101 20855  1743564    13723     422        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [30883]   101 30883  1743518    11180     333        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [31463]   101 31463  1740560     3099     137        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [31466]   101 31466  1740560     3084     137        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [31475]   101 31475  1740560     3101     137        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [31485]   101 31485  1740560     3080     137        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [31513]   101 31513  1740560     3082     137        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [ 3321]   101  3321  1753416    13629     164        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [ 3325]   101  3325  1753416    13756     164        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [16159]   101 16159  1743331     6323     310        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [32368]   101 32368  1743330     6156     311        0           993 postgres

    Oct 27 07:05:31 node2 kernel: [ 1528]   101  1528  2009656    24435     346        0           993 postgres

    -- more lines...

    Oct 27 07:05:31 node2 kernel: Memory cgroup out of memory: Kill process 19609 (postgres) score 1526 or sacrifice child

    Oct 27 07:05:31 node2 kernel: Killed process 19609 (postgres), UID 101, total-vm:23637472kB, anon-rss:16612848kB, file-rss:19156kB, shmem-rss:824248kB

    Oct 27 07:05:32 node2 etcd: 2023-10-27 07:05:32.607769 W | etcdserver: read-only range request "key:\"/registry/services/endpoints/default/kubernetes\" " with result "$

    Oct 27 07:05:34 node2 kubelet: I1027 07:05:34.624970   24409 kubelet.go:1926] SyncLoop (UPDATE, "api"): "postgres-cluster-0_postgres(5984f099-30b0-4506-873$

    Oct 27 07:05:38 node2 kubelet: I1027 07:05:38.823153   24409 setters.go:86] Using node IP: "x.x.x.x"

    Oct 27 07:05:44 node2 kubelet: I1027 07:05:44.011520   24409 container_manager_linux.go:490] [ContainerManager]: Discovered runtime cgroups name: /systemd/system.slice

    Oct 27 07:05:44 node2 kubelet: I1027 07:05:44.826318   24409 kubelet.go:1926] SyncLoop (UPDATE, "api"): "postgres-cluster-0_postgres(5984f099-30b0-4506-873$

    Oct 27 07:05:48 node2 kubelet: I1027 07:05:48.915249   24409 setters.go:86] Using node IP: "x.x.x.x"

    Oct 27 07:05:55 node2 kubelet: I1027 07:05:55.104931   24409 kubelet.go:1926] SyncLoop (UPDATE, "api"): "postgres-cluster-0_postgres(5984f099-30b0-4506-873$

    Oct 27 07:05:58 node2 kubelet: I1027 07:05:58.997936   24409 setters.go:86] Using node IP: "x.x.x.x"

    Oct 27 07:06:01 node2 systemd: Started Session 332365 of user root.

    Oct 27 07:06:06 node2 kubelet: I1027 07:06:06.133210   24409 kubelet.go:1926] SyncLoop (UPDATE, "api"): "postgres-cluster-0_postgres(5984f099-30b0-4506-873$

    Oct 27 07:06:08 node2 kubelet: I1027 07:06:08.029905   24409 kubelet_getters.go:178] "Pod status updated" pod="kube-system/kube-scheduler-node2" status=Running

    Oct 27 07:06:08 node2 kubelet: I1027 07:06:08.029966   24409 kubelet_getters.go:178] "Pod status updated" pod="kube-system/kube-apiserver-node2" status=Running

    Oct 27 07:06:08 node2 kubelet: I1027 07:06:08.029996   24409 kubelet_getters.go:178] "Pod status updated" pod="kube-system/kube-controller-manager-node2" status=Ru$

    Oct 27 07:06:08 node2 kubelet: E1027 07:06:08.874227   24409 remote_runtime.go:392] ExecSync 28f30220f15ec2a26071c2175f24f77d1b20870fcb3c24bc659d824223d93deb '/usr/bin$

    Oct 27 07:06:08 node2 kubelet: I1027 07:06:08.874326   24409 prober.go:117] Readiness probe for "calico-kube-controllers-7c5b64bf96-q6rjm_kube-system(14519a12-7721-43b$

    Oct 27 07:06:09 node2 kubelet: I1027 07:06:09.090193   24409 setters.go:86] Using node IP: "x.x.x.x"

    Oct 27 07:06:09 node2 etcd: 2023-10-27 07:06:09.223921 I | mvcc: store.index: compact 76535976

    Oct 27 07:06:09 node2 etcd: 2023-10-27 07:06:09.253047 I | mvcc: finished scheduled compaction at 76535976 (took 28.190376ms)

    Oct 27 07:06:15 node2 kubelet: I1027 07:06:15.232655   24409 kubelet.go:1926] SyncLoop (UPDATE, "api"): "postgres-cluster-0_postgres(5984f099-30b0-4506-873$

    Oct 27 07:06:19 node2 kubelet: W1027 07:06:19.053762   24409 kubelet_pods.go:880] Unable to retrieve pull secret /registrypullsecret for /-arcmana$

    Oct 27 07:06:19 node2 kubelet: I1027 07:06:19.198356   24409 setters.go:86] Using node IP: "x.x.x.x"

    Oct 27 07:06:25 node2 kubelet: I1027 07:06:25.132514   24409 kubelet.go:1926] SyncLoop (UPDATE, "api"): "postgres-cluster-0_postgres(5984f099-30b0-4506-873$

    Oct 27 07:06:29 node2 kubelet: I1027 07:06:29.279555   24409 setters.go:86] Using node IP: "x.x.x.x"

    Oct 27 07:06:34 node2 kubelet: I1027 07:06:34.596501   24409 kubelet.go:1926] SyncLoop (UPDATE, "api"): "postgres-cluster-0_postgres(5984f099-30b0-4506-873$

    Oct 27 07:06:36 node2 etcd: 2023-10-27 07:06:36.005277 W | etcdserver: read-only range request "key:\"/registry/ingress/\" range_end:\"/registry/ingress0\" count_only:$

    Oct 27 07:06:36 node2 etcd: 2023-10-27 07:06:36.005533 W | etcdserver: request "header:<ID:1981451119914121853 username:\"etcd-node-node1\" auth_revision:1 > txn:<$

    Oct 27 07:06:36 node2 etcd: 2023-10-27 07:06:36.005961 W | etcdserver: read-only range request "key:\"/registry/volumeattachments/\" range_end:\"/registry/volumeattach$

    Oct 27 07:06:36 node2 etcd: 2023-10-27 07:06:36.005996 W | etcdserver: read-only range request "key:\"/registry/crd.projectcalico.org/ipamblocks/\" range_end:\"/regist$

    Oct 27 07:06:36 node2 etcd: 2023-10-27 07:06:36.006012 W | etcdserver: read-only range request "key:\"/registry/leases/kube-system/kube-scheduler\" " with result "rang$

    Oct 27 07:06:36 node2 etcd: 2023-10-27 07:06:36.006092 W | etcdserver: read-only range request "key:\"/registry/namespaces/kube-system\" " with result "range_response_$

    Oct 27 07:06:36 node2 etcd: 2023-10-27 07:06:36.307799 W | etcdserver: read-only range request "key:\"/registry/apiextensions.k8s.io/customresourcedefinitions/\" range$

    Oct 27 07:06:36 node2 etcd: 2023-10-27 07:06:36.308862 W | etcdserver: read-only range request "key:\"/registry/health\" " with result "range_response_count:0 size:7" $

    Oct 27 07:06:36 node2 etcd: 2023-10-27 07:06:36.308906 W | etcdserver: read-only range request "key:\"/registry/namespaces/default\" " with result "r

