Discussion: Abnormally high memory usage / OOM killer triggered
Hello, I'm troubleshooting a problem with a Postgres installation (Linux): a client process got killed by the OOM killer while executing an UPDATE statement. How can I avoid this in the future? From the kernel logs it appears that the process was using ~19 GB of total virtual RAM when it was killed, which seems far too high. Does my configuration look reasonable? I just don't understand how it could possibly use up 19 GB of memory based on the configuration below. Is there a memory leak in there somewhere?

I'm using Postgres 9.4.8 on x86_64-redhat-linux-gnu with 16 GB of physical RAM and 8 GB of swap space.

Postgres configuration:
=======================
wal_level = hot_standby
max_wal_senders = 3
checkpoint_segments = 20
checkpoint_completion_target = 0.8
wal_keep_segments = 500
hot_standby = on
max_standby_streaming_delay = 10s
maintenance_work_mem = 128MB
wal_sender_timeout = 20s
wal_receiver_status_interval = 10s
shared_buffers = 2560MB
maintenance_work_mem = 256MB
autovacuum_max_workers = 3
autovacuum_work_mem = -1
work_mem = 15695kB
effective_cache_size = 8192MB
max_connections = 200

/var/log/messages:
=======================
(See the line that says "Killed process 10540 ..." towards the end)

Jan 16 17:08:37 aimapp1 kernel: ubiatn invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Jan 16 17:08:37 aimapp1 kernel: ubiatn cpuset=/ mems_allowed=0-1
Jan 16 17:08:37 aimapp1 kernel: Pid: 7181, comm: ubiatn Not tainted 2.6.32-504.el6.x86_64 #1
Jan 16 17:08:37 aimapp1 kernel: Call Trace:
Jan 16 17:08:37 aimapp1 kernel: [<ffffffff810d40c1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
Jan 16 17:08:37 aimapp1 kernel: [<ffffffff81127300>] ? dump_header+0x90/0x1b0
Jan 16 17:08:37 aimapp1 kernel: [<ffffffff8122ea2c>] ? security_real_capable_noaudit+0x3c/0x70
Jan 16 17:08:37 aimapp1 kernel: [<ffffffff81127782>] ? oom_kill_process+0x82/0x2a0
Jan 16 17:08:37 aimapp1 kernel: [<ffffffff811276c1>] ?
select_bad_process+0xe1/0x120
Jan 16 17:08:37 aimapp1 kernel: [<ffffffff81127bc0>] ? out_of_memory+0x220/0x3c0
Jan 16 17:08:37 aimapp1 kernel: [<ffffffff811344df>] ? __alloc_pages_nodemask+0x89f/0x8d0
Jan 16 17:08:37 aimapp1 kernel: [<ffffffff8116c69a>] ? alloc_pages_current+0xaa/0x110
Jan 16 17:08:37 aimapp1 kernel: [<ffffffff811246f7>] ? __page_cache_alloc+0x87/0x90
Jan 16 17:08:37 aimapp1 kernel: [<ffffffff811240de>] ? find_get_page+0x1e/0xa0
Jan 16 17:08:37 aimapp1 kernel: [<ffffffff81125697>] ? filemap_fault+0x1a7/0x500
Jan 16 17:08:37 aimapp1 kernel: [<ffffffff8114eae4>] ? __do_fault+0x54/0x530
Jan 16 17:08:37 aimapp1 kernel: [<ffffffff8114f0b7>] ? handle_pte_fault+0xf7/0xb00
Jan 16 17:08:37 aimapp1 kernel: [<ffffffff81063bf3>] ? perf_event_task_sched_out+0x33/0x70
Jan 16 17:08:37 aimapp1 kernel: [<ffffffff810097cc>] ? __switch_to+0x1ac/0x320
Jan 16 17:08:37 aimapp1 kernel: [<ffffffff8114fcea>] ? handle_mm_fault+0x22a/0x300
Jan 16 17:08:37 aimapp1 kernel: [<ffffffff815299be>] ? thread_return+0x4e/0x7d0
Jan 16 17:08:37 aimapp1 kernel: [<ffffffff8104d0d8>] ? __do_page_fault+0x138/0x480
Jan 16 17:08:37 aimapp1 kernel: [<ffffffff810a3def>] ? hrtimer_try_to_cancel+0x3f/0xd0
Jan 16 17:08:37 aimapp1 kernel: [<ffffffff810a3ea2>] ? hrtimer_cancel+0x22/0x30
Jan 16 17:08:37 aimapp1 kernel: [<ffffffff8152c053>] ? do_nanosleep+0x93/0xc0
Jan 16 17:08:37 aimapp1 kernel: [<ffffffff810a3f74>] ? hrtimer_nanosleep+0xc4/0x180
Jan 16 17:08:37 aimapp1 kernel: [<ffffffff810a2dd0>] ? hrtimer_wakeup+0x0/0x30
Jan 16 17:08:37 aimapp1 kernel: [<ffffffff8152ffbe>] ? do_page_fault+0x3e/0xa0
Jan 16 17:08:37 aimapp1 kernel: [<ffffffff8152d375>] ?
page_fault+0x25/0x30
Jan 16 17:08:37 aimapp1 kernel: Mem-Info:
Jan 16 17:08:37 aimapp1 kernel: Node 0 DMA per-cpu:
Jan 16 17:08:37 aimapp1 kernel: CPU 0: hi: 0, btch: 1 usd: 0 [CPUs 1-23 identical]
Jan 16 17:08:37 aimapp1 kernel: Node 0 DMA32 per-cpu:
Jan 16 17:08:37 aimapp1 kernel: CPU 0: hi: 186, btch: 31 usd: 0 [CPUs 1-23 identical]
Jan 16 17:08:37 aimapp1 kernel: Node 0 Normal per-cpu:
Jan 16 17:08:37 aimapp1 kernel: CPU 0: hi: 186, btch: 31 usd: 0 [CPUs 1-23 identical]
Jan 16 17:08:37 aimapp1 kernel: Node 1 Normal per-cpu:
Jan 16 17:08:37 aimapp1 kernel: CPU 0: hi: 186, btch: 31 usd: 0 [CPUs 1-23 identical]
Jan 16 17:08:37 aimapp1 kernel: active_anon:3428843 inactive_anon:500193 isolated_anon:0
Jan 16 17:08:37 aimapp1 kernel: active_file:311 inactive_file:90 isolated_file:0
Jan 16 17:08:37 aimapp1 kernel: unevictable:0 dirty:0 writeback:48 unstable:0
Jan 16 17:08:37 aimapp1 kernel: free:32365 slab_reclaimable:7217 slab_unreclaimable:19877
Jan 16 17:08:37 aimapp1 kernel: mapped:558347 shmem:665176 pagetables:26928 bounce:0
Jan 16 17:08:37 aimapp1 kernel: Node 0 DMA free:15748kB min:84kB low:104kB high:124kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15364kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jan 16 17:08:37 aimapp1 kernel: lowmem_reserve[]: 0 1848 7908 7908
Jan 16 17:08:37 aimapp1 kernel: Node 0 DMA32 free:34592kB min:10404kB low:13004kB high:15604kB active_anon:1219496kB inactive_anon:341812kB active_file:20kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1892572kB mlocked:0kB dirty:0kB writeback:36kB mapped:180384kB shmem:257528kB slab_reclaimable:4456kB slab_unreclaimable:3264kB kernel_stack:32kB pagetables:14376kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:69 all_unreclaimable?
no
Jan 16 17:08:37 aimapp1 kernel: lowmem_reserve[]: 0 0 6060 6060
Jan 16 17:08:37 aimapp1 kernel: Node 0 Normal free:33700kB min:34120kB low:42648kB high:51180kB active_anon:5321596kB inactive_anon:759600kB active_file:992kB inactive_file:460kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:6205440kB mlocked:0kB dirty:0kB writeback:136kB mapped:941364kB shmem:1114560kB slab_reclaimable:10220kB slab_unreclaimable:52948kB kernel_stack:4312kB pagetables:39156kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1387 all_unreclaimable? no
Jan 16 17:08:37 aimapp1 kernel: lowmem_reserve[]: 0 0 0 0
Jan 16 17:08:37 aimapp1 kernel: Node 1 Normal free:45420kB min:45496kB low:56868kB high:68244kB active_anon:7174280kB inactive_anon:899360kB active_file:232kB inactive_file:16kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:8273920kB mlocked:0kB dirty:0kB writeback:20kB mapped:1111640kB shmem:1288616kB slab_reclaimable:14192kB slab_unreclaimable:23296kB kernel_stack:1000kB pagetables:54180kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:82 all_unreclaimable? no
Jan 16 17:08:37 aimapp1 kernel: lowmem_reserve[]: 0 0 0 0
Jan 16 17:08:37 aimapp1 kernel: Node 0 DMA: 3*4kB 3*8kB 2*16kB 0*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15748kB
Jan 16 17:08:37 aimapp1 kernel: Node 0 DMA32: 587*4kB 482*8kB 338*16kB 188*32kB 70*64kB 35*128kB 8*256kB 8*512kB 2*1024kB 0*2048kB 0*4096kB = 34780kB
Jan 16 17:08:37 aimapp1 kernel: Node 0 Normal: 795*4kB 493*8kB 272*16kB 145*32kB 74*64kB 40*128kB 22*256kB 6*512kB 0*1024kB 0*2048kB 0*4096kB = 34676kB
Jan 16 17:08:37 aimapp1 kernel: Node 1 Normal: 1113*4kB 712*8kB 358*16kB 226*32kB 104*64kB 41*128kB 21*256kB 11*512kB 1*1024kB 0*2048kB 0*4096kB = 47044kB
Jan 16 17:08:37 aimapp1 kernel: 721314 total pagecache pages
Jan 16 17:08:37 aimapp1 kernel: 55586 pages in swap cache
Jan 16 17:08:37 aimapp1 kernel: Swap cache stats: add 16359283, delete 16303697, find 15346604869/15347200740
Jan 16 17:08:37 aimapp1 kernel: Free swap  = 0kB
Jan 16 17:08:37 aimapp1 kernel: Total swap = 8388604kB
Jan 16 17:08:37 aimapp1 kernel: 4194303 pages RAM
Jan 16 17:08:37 aimapp1 kernel: 144188 pages reserved
Jan 16 17:08:37 aimapp1 kernel: 2433569 pages shared
Jan 16 17:08:37 aimapp1 kernel: 3447964 pages non-shared
Jan 16 17:08:37 aimapp1 kernel: [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
Jan 16 17:08:37 aimapp1 kernel: [ 1042] 0 1042 2663 8 6 -17 -1000 udevd
Jan 16 17:08:37 aimapp1 kernel: [ 2445] 0 2445 23283 35 19 -17 -1000 auditd
Jan 16 17:08:37 aimapp1 kernel: [ 2475] 0 2475 62991 609 18 0 0 rsyslogd
Jan 16 17:08:37 aimapp1 kernel: [ 2541] 0 2541 4585 55 12 0 0 irqbalance
Jan 16 17:08:37 aimapp1 kernel: [ 2557] 32 2557 4744 15 0 0 0 rpcbind
Jan 16 17:08:37 aimapp1 kernel: [ 2577] 29 2577 5837 1 0 0 0 rpc.statd
Jan 16 17:08:37 aimapp1 kernel: [ 2608] 0 2608 154350 4393 1 0 0 corosync
Jan 16 17:08:37 aimapp1 kernel: [ 2711] 81 2711 5391 9 0 0 0 dbus-daemon
Jan 16 17:08:37 aimapp1 kernel: [ 2729] 0 2729 47352 1 6 0 0 cupsd
Jan 16 17:08:37 aimapp1 kernel: [ 2758] 0 2758 1020 0 6 0 0 acpid
Jan 16 17:08:37 aimapp1 kernel: [ 2768] 68 2768 9714 292 6 0 0 hald
Jan 16 17:08:37 aimapp1 kernel: [ 2769] 0 2769 5100 5 8 0 0 hald-runner
Jan 16 17:08:37 aimapp1 kernel: [ 2801] 0 2801 5630 9 7 0 0 hald-addon-inpu
Jan 16 17:08:37 aimapp1 kernel: [ 2810] 68 2810 4502 1 0 0 0 hald-addon-acpi
Jan 16 17:08:37 aimapp1 kernel: [ 2837] 0 2837 113175 47 12 0 0 automount
Jan 16 17:08:37 aimapp1 kernel: [ 2881] 0 2881 1565 0 19 0 0 mcelog
Jan 16 17:08:37 aimapp1 kernel: [ 2897] 0 2897 16673 8 6 -17 -1000 sshd
Jan 16 17:08:37 aimapp1 kernel: [ 2906] 38 2906 6628 42 0 0 0 ntpd
Jan 16 17:08:37 aimapp1 kernel: [ 2915] 496 2915 10338 17 7 0 0 nrpe
Jan 16 17:08:37 aimapp1 kernel: [ 3019] 0 3019 20333 32 6 0 0 master
Jan 16 17:08:37 aimapp1 kernel: [ 3040] 89 3040 20399 39 0 0 0 qmgr
Jan 16 17:08:37 aimapp1 kernel: [ 3045] 0 3045 28661 9 0 0 0 abrtd
Jan 16 17:08:37 aimapp1 kernel: [ 3057] 0 3057 65362 45 0 0 0 httpd
Jan 16 17:08:37 aimapp1 kernel: [ 3068] 0 3068 29341 30 0 0 0 crond
Jan 16 17:08:37 aimapp1 kernel: [ 3090] 0 3090 5394 4 12 0 0 atd
Jan 16 17:08:37 aimapp1 kernel: [ 3165] 0 3165 26868 25 19 0 0 pacemakerd
Jan 16 17:08:37 aimapp1 kernel: [ 3173] 189 3173 28658 2694 9 0 0 cib
Jan 16 17:08:37 aimapp1 kernel: [ 3175] 0 3175 26870 1336 13 0 0 stonithd
Jan 16 17:08:37 aimapp1 kernel: [ 3176] 0 3176 17978 116 7 0 0 lrmd
Jan 16 17:08:37 aimapp1 kernel: [ 3177] 189 3177 23416 849 0 0 0 attrd
Jan 16 17:08:37 aimapp1 kernel: [ 3179] 189 3179 26847 19 12 0 0 pengine
Jan 16 17:08:37 aimapp1 kernel: [ 3877] 0 3877 1016 1 6 0 0 mingetty
Jan 16 17:08:37 aimapp1 kernel: [ 3879] 0 3879 1016 1 19 0 0 mingetty
Jan 16 17:08:37 aimapp1 kernel: [ 3881] 0 3881 1016 1 22 0 0 mingetty
Jan 16 17:08:37 aimapp1 kernel: [ 3883] 0 3883 1016 1 10 0 0 mingetty
Jan 16 17:08:37 aimapp1 kernel: [ 3885] 0 3885 1016 1 21 0 0 mingetty
Jan 16 17:08:37 aimapp1 kernel: [ 6813] 0 6813 2662 7 6 -17 -1000 udevd
Jan 16 17:08:37 aimapp1 kernel: [ 6814] 0 6814 2662 7 6 -17 -1000 udevd
Jan 16 17:08:37 aimapp1 kernel: [ 1006] 0 1006 1029170 8 12 0 0 console-kit-dae
Jan 16 17:08:37 aimapp1 kernel: [ 8046] 0 8046 1016 6 1 0 0 mingetty
Jan 16 17:08:37 aimapp1 kernel: [15669] 26 15669 723376 12231 16 -17 -1000 postgres
Jan 16 17:08:37 aimapp1 kernel: [15671] 26 15671 44567 40 1 0 0 postgres
Jan 16 17:08:37 aimapp1 kernel: [15673] 26 15673 723800 284257 6 0 0 postgres
Jan 16 17:08:37 aimapp1 kernel: [15674] 26 15674 723732 31588 6 0 0 postgres
Jan 16 17:08:37 aimapp1 kernel: [15675] 26 15675 45277 149 0 0 0 postgres
Jan 16 17:08:37 aimapp1 kernel: [15764] 26 15764 723697 2788 11 0 0 postgres
Jan 16 17:08:37 aimapp1 kernel: [15765] 26 15765 723834 175 6 0 0 postgres
Jan 16 17:08:37 aimapp1 kernel: [16521] 26 16521 723914 164 16 0 0 postgres
Jan 16 17:08:37 aimapp1 kernel: [ 6474] 189 6474 32269 397 0 0 0 crmd
Jan 16 17:08:37 aimapp1 kernel: [14919] 0 14919 24626 88 5 0 0 sshd
Jan 16 17:08:37 aimapp1 kernel: [15679] 0 15679 27580 8 12 0 0 bash
Jan 16 17:08:37 aimapp1 kernel: [10151] 26 10151 729494 337404 12 0 0 postgres
Jan 16 17:08:37 aimapp1 kernel: [10153] 26 10153 728940 993 1 0 0 postgres
Jan 16 17:08:37 aimapp1 kernel: [10156] 26 10156 725774 1153 15 0 0 postgres
Jan 16 17:08:37 aimapp1 kernel: [10157] 26 10157 725921 1061 0 0 0 postgres
Jan 16 17:08:37 aimapp1 kernel: [10244] 26 10244 732881 3691 1 0 0 postgres
Jan 16 17:08:37 aimapp1 kernel: [10245] 26 10245 728956 986 18 0 0 postgres
Jan 16 17:08:37 aimapp1 kernel: [10248] 26 10248 732628 4575 13 0 0 postgres
Jan 16 17:08:37 aimapp1 kernel: [10273] 26 10273 729495 337372 1 0 0 postgres
Jan 16 17:08:37 aimapp1 kernel: [30924] 0 30924 25736 4025 2 0 0 crm_mon
Jan 16 17:08:37 aimapp1 kernel: [ 4951] 0 4951 24689 118 5 0 0 sshd
Jan 16 17:08:37 aimapp1 kernel: [ 5360] 0 5360 27613 8 6 0 0 bash
Jan 16 17:08:37 aimapp1 kernel: [21108] 0 21108 24592 82 17 0 0 sshd
Jan 16 17:08:37 aimapp1 kernel: [21171] 0 21171 27613 63 7 0 0 bash
Jan 16 17:08:37 aimapp1 kernel: [31495] 0 31495 24936 95 0 0 0 sshd
Jan 16 17:08:37 aimapp1 kernel: [31590] 0 31590 14463 9 12 0 0 sftp-server
Jan 16 17:08:37 aimapp1 kernel: [ 7174] 500 7174 1115232 68649 0 0 0 ubiatn
Jan 16 17:08:37 aimapp1 kernel: [14606] 0 14606 25237 19 2 0 0 tail
Jan 16 17:08:37 aimapp1 kernel: [ 8278] 26 8278 724157 1098 1 0 0 postgres
Jan 16 17:08:37 aimapp1 kernel: [ 8468] 26 8468 726500 3147 9 0 0 postgres
Jan 16 17:08:37 aimapp1 kernel: [ 8679] 26 8679 727303 4279 2 0 0 postgres
Jan 16 17:08:37 aimapp1 kernel: [10540] 26 10540 4738280 3427932 18 0 0 postgres
Jan 16 17:08:37 aimapp1 kernel: [ 1276] 26 1276 724168 377697 9 0 0 postgres
Jan 16 17:08:37 aimapp1 kernel: [ 1962] 26 1962 724145 377746 1 0 0 postgres
Jan 16 17:08:37 aimapp1 kernel: [29437] 48 29437 65362 35 15 0 0 httpd
Jan 16 17:08:37 aimapp1 kernel: [29438] 48 29438 65362 37 3 0 0 httpd
Jan 16 17:08:37 aimapp1 kernel: [29439] 48 29439 65362 35 15 0 0 httpd
Jan 16 17:08:37 aimapp1 kernel: [29440] 48 29440 65362 35 17 0 0 httpd
Jan 16 17:08:37 aimapp1 kernel: [29441] 48 29441 65362 36 5 0 0 httpd
Jan 16 17:08:37 aimapp1 kernel: [29442] 48 29442 65362 35 17 0 0 httpd
Jan 16 17:08:37 aimapp1 kernel: [29443] 48 29443 65362 35 5 0 0 httpd
Jan 16 17:08:37 aimapp1 kernel: [29444] 48 29444 65362 35 17 0 0 httpd
Jan 16 17:08:37 aimapp1 kernel: [24913] 91 24913 2447253 216518 20 0 0 java
Jan 16 17:08:37 aimapp1 kernel: [22150] 26 22150 729385 1230 19 0 0 postgres
Jan 16 17:08:37 aimapp1 kernel: [25111] 26 25111 729385 1214 19 0 0 postgres
Jan 16 17:08:37 aimapp1 kernel: [23940] 26 23940 723934 479 19 0 0 postgres
Jan 16 17:08:37 aimapp1 kernel: [21174] 26 21174 724201 48139 7 0 0 postgres
Jan 16 17:08:37 aimapp1 kernel: [27824] 26 27824 724201 76093 18 0 0 postgres
Jan 16 17:08:37 aimapp1 kernel: [16949] 26 16949 729385 1181 20 0 0 postgres
Jan 16 17:08:37 aimapp1 kernel: [  685] 89 685 20353 243 0 0 0 pickup
Jan 16 17:08:37 aimapp1 kernel: [20901] 26 20901 729417 1162 20 0 0 postgres
Jan 16 17:08:37 aimapp1 kernel: Out of memory: Kill process 10540 (postgres) score 738 or sacrifice child
Jan 16 17:08:37 aimapp1 kernel: Killed process 10540, UID 26, (postgres) total-vm:18953120kB, anon-rss:11637464kB, file-rss:2074240kB
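A back-of-envelope check of the settings above shows why the killed backend's footprint is anomalous. This is only a sketch: Postgres does not budget memory this way, and the "4 concurrent sort/hash nodes" figure is an assumption for illustration, not something taken from the query plan.

```python
# Rough worst-case arithmetic from the posted configuration versus the
# total-vm reported by the OOM killer for pid 10540.
KB = 1024
MB = 1024 * KB

work_mem = 15695 * KB      # per sort/hash node, per backend (from the config)
sorts_per_query = 4        # assumed, generous figure for a single query

per_backend = sorts_per_query * work_mem
print(f"expected per-backend peak: ~{per_backend // MB} MB")

observed = 18953120 * KB   # total-vm of pid 10540 from the OOM report
print(f"observed for pid 10540: ~{observed / (1024 * MB):.1f} GB")
```

Even with generous assumptions, one backend should peak at tens of megabytes of private working memory (plus shared buffers it merely maps), nowhere near the ~18 GB observed.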
On Jan 17, 2018, at 2:57 PM, Davlet Panech <dpanech@gmail.com> wrote:
>
> Does my configuration look reasonable? I just don't understand how it could possibly use up 19 GB of memory based on the configuration below. Is there a memory leak in there somewhere?

It does seem awfully high, but... An update can involve a join across multiple tables. Or an update can run a trigger which can cascade. Either of those could result in an "accidental cross product" join, which can always blow up memory.

--
Scott Ribe
https://www.linkedin.com/in/scottribe/
(303) 722-0567
Davlet Panech <dpanech@gmail.com> writes:
> I'm troubleshooting a problem with a Postgres installation (Linux): a
> client process got killed by OOM while executing an update statement,
> Is there a memory leak in there somewhere?
> I'm using Postgres 9.4.8 on x86_64-redhat-linux-gnu with 16GB of
> physical RAM and 8GB of swap space.

I see a possibly relevant entry in the 9.4.10 release notes:

    Fix query-lifespan memory leak in a bulk UPDATE on a table with a
    PRIMARY KEY or REPLICA IDENTITY index

Looking at the relevant commit (ae4760d66), it seems the leak was just a few bytes per row, but if the update touches enough rows ...

			regards, tom lane
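To put "a few bytes per row" in perspective: a query-lifespan leak is only freed when the statement finishes, so it scales with the row count of the bulk UPDATE. Both numbers below are hypothetical illustrations, not figures from the actual commit.

```python
# A tiny per-row, query-lifespan leak adds up over a large bulk UPDATE.
# Both figures are assumed, for illustration only.
leak_per_row_bytes = 32
rows_updated = 300_000_000

leaked_gb = leak_per_row_bytes * rows_updated / 1024**3
print(f"~{leaked_gb:.1f} GB leaked over the statement's lifetime")
```

At that (assumed) scale, a leak of this kind alone could account for most of the anomalous memory.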
On 1/17/2018 5:57 PM, scott ribe wrote:
> On Jan 17, 2018, at 2:57 PM, Davlet Panech <dpanech@gmail.com> wrote:
>>
>> Does my configuration look reasonable? I just don't understand how it could possibly use up 19 GB of memory based on the configuration below. Is there a memory leak in there somewhere?
>
> It does seem awfully high, but... An update can involve a join across multiple tables. Or an update can run a trigger which can cascade. Either of those could result in an "accidental cross product" join, which can always blow up memory.

There must be a way to put an upper limit on memory even for such cases. I was under the impression that parameters such as "work_mem" serve that purpose; is that not the case? So an "accidental cross product" join's memory usage is unbounded? It can't be... could somebody confirm this please?

Thanks,
D.
On Jan 18, 2018, at 10:13 AM, Davlet Panech <dpanech@gmail.com> wrote:
>
> On 1/17/2018 5:57 PM, scott ribe wrote:
>> On Jan 17, 2018, at 2:57 PM, Davlet Panech <dpanech@gmail.com> wrote:
>>>
>>> Does my configuration look reasonable? I just don't understand how it could possibly use up 19 GB of memory based on the configuration below. Is there a memory leak in there somewhere?
>>
>> It does seem awfully high, but... An update can involve a join across multiple tables. Or an update can run a trigger which can cascade. Either of those could result in an "accidental cross product" join, which can always blow up memory.
>
> There must be a way to put an upper limit on memory even for such cases. I was under the impression that parameters such as "work_mem" serve that purpose, is that not the case? So an "accidental cross product" join's memory usage is unbounded? It can't be... could somebody confirm this please?

You are correct as far as I know, so yeah, that case should result in filling disk, not RAM.

--
Scott Ribe
https://www.linkedin.com/in/scottribe/
(303) 722-0567
Davlet Panech <dpanech@gmail.com> writes:
> On 1/17/2018 5:57 PM, scott ribe wrote:
>> It does seem awfully high, but... An update can involve a join across multiple tables. Or an update can run a trigger which can cascade. Either of those could result in an "accidental cross product" join, which can always blow up memory.

> There must be a way to put an upper limit on memory even for such cases.
> I was under the impression that parameters such as "work_mem" serve that
> purpose, is that not the case? So an "accidental cross product" join's
> memory usage is unbounded? It can't be... could somebody confirm this
> please?

A large join result could blow out memory on the client side, unless the client is careful to read it in segments, which most clients aren't. I expect the server to be smarter though.

			regards, tom lane
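On the client-side point: drivers that materialize the entire result set are the usual culprits. In psql, setting the FETCH_COUNT variable makes it fetch via a cursor in batches; in psycopg2, a named (server-side) cursor does the same. The pattern can be sketched without a database; `fetch_batches` below is a stand-in for a driver's fetchmany loop, not a real driver API.

```python
# Reading a huge result set in fixed-size segments instead of materializing
# it all at once. The row source is a lazy iterator, so peak memory is one
# batch, never the whole result.
from itertools import islice

def fetch_batches(rows, batch_size=10_000):
    """Yield lists of at most batch_size rows from an iterator."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# Simulate consuming a 1M-row result with bounded memory.
total = 0
for batch in fetch_batches(range(1_000_000)):
    total += len(batch)
print(total)
```

The same shape applies regardless of driver: process each batch, let it go, fetch the next.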
On Thu, Jan 18, 2018 at 12:13 PM, Davlet Panech <dpanech@gmail.com> wrote:
> On 1/17/2018 5:57 PM, scott ribe wrote:
>> On Jan 17, 2018, at 2:57 PM, Davlet Panech <dpanech@gmail.com> wrote:
>>> Does my configuration look reasonable? I just don't understand how it could possibly use up 19 GB of memory based on the configuration below. Is there a memory leak in there somewhere?
>>
>> It does seem awfully high, but... An update can involve a join across multiple tables. Or an update can run a trigger which can cascade. Either of those could result in an "accidental cross product" join, which can always blow up memory.
>
> There must be a way to put an upper limit on memory even for such cases. I was under the impression that parameters such as "work_mem" serve that purpose, is that not the case? So an "accidental cross product" join's memory usage is unbounded? It can't be... could somebody confirm this please?
>
> Thanks,
> D.

work_mem isn't really an upper limit on overall memory usage. It's just an upper limit on how much is used in certain operations before spilling to disk. A query or group of queries can easily use up all of system memory if it's complex enough, by using multiple instances of work_mem. This is why work_mem shouldn't be set any higher than necessary. The wiki explains this better:

https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

"This size is applied to each and every sort done by each user, and complex queries can use multiple working memory sort buffers. Set it to 50MB, and have 30 users submitting queries, and you are soon using 1.5GB of real memory."

I would go with Tom's suggestion in this case, though, since that bug seems to fit the situation described by the patch he found. It's always important to be running the latest patch release to rule out a bug being the cause of an issue.
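The wiki's arithmetic, and the same calculation applied to the settings from the original post, can be checked directly (the "two sorts per connection" figure below is an assumption for illustration):

```python
# The wiki's example: 30 users, work_mem = 50MB, one sort each.
print(30 * 50 / 1024)  # roughly the wiki's "1.5GB" figure

# The original poster's settings: work_mem = 15695kB, max_connections = 200.
# Assume every connection runs two work_mem-sized sorts at once (hypothetical):
work_mem_mb = 15695 / 1024
print(f"~{200 * 2 * work_mem_mb / 1024:.1f} GB")
```

Note that even this pessimistic whole-cluster figure is well short of the ~19 GB seen in a single backend, which is consistent with the leak theory rather than ordinary work_mem pressure.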
On 1/18/2018 12:45 PM, Keith wrote:
> On Thu, Jan 18, 2018 at 12:13 PM, Davlet Panech <dpanech@gmail.com> wrote:
>> On 1/17/2018 5:57 PM, scott ribe wrote:
>>> On Jan 17, 2018, at 2:57 PM, Davlet Panech <dpanech@gmail.com> wrote:
>>>> Does my configuration look reasonable? I just don't understand how it could possibly use up 19 GB of memory based on the configuration below. Is there a memory leak in there somewhere?
>>>
>>> It does seem awfully high, but... An update can involve a join across multiple tables. Or an update can run a trigger which can cascade. Either of those could result in an "accidental cross product" join, which can always blow up memory.
>>
>> There must be a way to put an upper limit on memory even for such cases. I was under the impression that parameters such as "work_mem" serve that purpose, is that not the case? So an "accidental cross product" join's memory usage is unbounded? It can't be... could somebody confirm this please?
>>
>> Thanks,
>> D.
>
> work_mem isn't really an upper limit on overall memory usage. It's just an upper limit on how much is used in certain processes before spilling to disk. A query or group of queries can easily use up all of system memory if it's complex enough by using multiple instances of work_mem. This is why work_mem shouldn't be set any higher than necessary. The wiki explains this better
>
> https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
>
> "This size is applied to each and every sort done by each user, and complex queries can use multiple working memory sort buffers. Set it to 50MB, and have 30 users submitting queries, and you are soon using 1.5GB of real memory."

I understand, but in my case a single server-side postgres process used 19 GB, which (excluding shared memory etc.) is something like 100 times what I would expect, even for "complex" queries.
> I would go with Tom's suggestion in this case, though, since that bug seems to fit the situation described by the patch he found. It's always important to be running the latest patch release to rule out a bug being the cause of an issue.

OK, so it is likely a memory leak; I just wanted to rule out other explanations. Thanks to all who replied.