Re: postgres invoked oom-killer

From: Lacey Powers
Subject: Re: postgres invoked oom-killer
Date:
Msg-id: 4BE44258.2010603@commandprompt.com
In reply to: Re: postgres invoked oom-killer  (Silvio Brandani <silvio.brandani@tech.sdb.it>)
List: pgsql-admin
Silvio Brandani wrote:
> Lacey Powers wrote:
>> Silvio Brandani wrote:
>>> We have PostgreSQL 8.3.8 on Linux.
>>>
>>> We get the following messages in /var/log/messages:
>>>
>>> May  6 22:31:01 pgblade02 kernel: postgres invoked oom-killer:
>>> gfp_mask=0x201d2, order=0, oomkilladj=0
>>> May  6 22:31:01 pgblade02 kernel:
>>> May  6 22:31:01 pgblade02 kernel: Call Trace:
>>> May  6 22:31:19 pgblade02 kernel:  [<ffffffff800bed05>]
>>> out_of_memory+0x8e/0x2f5
>>> May  6 22:31:19 pgblade02 kernel:  [<ffffffff8000f071>]
>>> __alloc_pages+0x22b/0x2b4
>>> May  6 22:31:19 pgblade02 kernel:  [<ffffffff80012720>]
>>> __do_page_cache_readahead+0x95/0x1d9
>>> May  6 22:31:19 pgblade02 kernel:  [<ffffffff800618e1>]
>>> __wait_on_bit_lock+0x5b/0x66
>>> May  6 22:31:19 pgblade02 kernel:  [<ffffffff881fdc61>]
>>> :dm_mod:dm_any_congested+0x38/0x3f
>>> May  6 22:31:19 pgblade02 kernel:  [<ffffffff800130ab>]
>>> filemap_nopage+0x148/0x322
>>> May  6 22:31:19 pgblade02 kernel:  [<ffffffff800087ed>]
>>> __handle_mm_fault+0x1f8/0xdf4
>>> May  6 22:31:19 pgblade02 kernel:  [<ffffffff80064a6a>]
>>> do_page_fault+0x4b8/0x81d
>>> May  6 22:31:19 pgblade02 kernel:  [<ffffffff80060f29>]
>>> thread_return+0x0/0xeb
>>> May  6 22:31:19 pgblade02 kernel:  [<ffffffff8005bde9>]
>>> error_exit+0x0/0x84
>>> May  6 22:31:27 pgblade02 kernel:
>>> May  6 22:31:28 pgblade02 kernel: Mem-info:
>>> May  6 22:31:28 pgblade02 kernel: Node 0 DMA per-cpu:
>>> May  6 22:31:28 pgblade02 kernel: cpu 0 hot: high 0, batch 1 used:0
>>> May  6 22:31:28 pgblade02 kernel: cpu 0 cold: high 0, batch 1 used:0
>>> May  6 22:31:28 pgblade02 kernel: cpu 1 hot: high 0, batch 1 used:0
>>> May  6 22:31:28 pgblade02 kernel: cpu 1 cold: high 0, batch 1 used:0
>>> May  6 22:31:28 pgblade02 kernel: cpu 2 hot: high 0, batch 1 used:0
>>> May  6 22:31:28 pgblade02 kernel: cpu 2 cold: high 0, batch 1 used:0
>>> May  6 22:31:28 pgblade02 kernel: cpu 3 hot: high 0, batch 1 used:0
>>> May  6 22:31:28 pgblade02 kernel: cpu 3 cold: high 0, batch 1 used:0
>>> May  6 22:31:28 pgblade02 kernel: Node 0 DMA32 per-cpu:
>>> May  6 22:31:28 pgblade02 kernel: cpu 0 hot: high 186, batch 31 used:27
>>> May  6 22:31:29 pgblade02 kernel: cpu 0 cold: high 62, batch 15 used:54
>>> May  6 22:31:29 pgblade02 kernel: cpu 1 hot: high 186, batch 31 used:23
>>> May  6 22:31:29 pgblade02 kernel: cpu 1 cold: high 62, batch 15 used:49
>>> May  6 22:31:29 pgblade02 kernel: cpu 2 hot: high 186, batch 31 used:12
>>> May  6 22:31:29 pgblade02 kernel: cpu 2 cold: high 62, batch 15 used:14
>>> May  6 22:31:29 pgblade02 kernel: cpu 3 hot: high 186, batch 31 used:50
>>> May  6 22:31:29 pgblade02 kernel: cpu 3 cold: high 62, batch 15 used:60
>>> May  6 22:31:29 pgblade02 kernel: Node 0 Normal per-cpu:
>>> May  6 22:31:29 pgblade02 kernel: cpu 0 hot: high 186, batch 31 used:5
>>> May  6 22:31:29 pgblade02 kernel: cpu 0 cold: high 62, batch 15 used:48
>>> May  6 22:31:29 pgblade02 kernel: cpu 1 hot: high 186, batch 31 used:11
>>> May  6 22:31:29 pgblade02 kernel: cpu 1 cold: high 62, batch 15 used:39
>>> May  6 22:31:29 pgblade02 kernel: cpu 2 hot: high 186, batch 31 used:14
>>> May  6 22:31:29 pgblade02 kernel: cpu 2 cold: high 62, batch 15 used:57
>>> May  6 22:31:29 pgblade02 kernel: cpu 3 hot: high 186, batch 31 used:94
>>> May  6 22:31:29 pgblade02 kernel: cpu 3 cold: high 62, batch 15 used:36
>>> May  6 22:31:29 pgblade02 kernel: Node 0 HighMem per-cpu: empty
>>> May  6 22:31:29 pgblade02 kernel: Free pages:       41788kB (0kB
>>> HighMem)
>>> May  6 22:31:29 pgblade02 kernel: Active:974250 inactive:920579
>>> dirty:0 writeback:0 unstable:0 free:10447 slab:11470 mapped-file:985
>>> mapped-anon:1848625 pagetables:111027
>>> May  6 22:31:29 pgblade02 kernel: Node 0 DMA free:11172kB min:12kB
>>> low:12kB high:16kB active:0kB inactive:0kB present:10816kB
>>> pages_scanned:0 all_unreclaimable? yes
>>> May  6 22:31:29 pgblade02 kernel: lowmem_reserve[]: 0 3254 8052 8052
>>> May  6 22:31:29 pgblade02 kernel: Node 0 DMA32 free:23804kB
>>> min:4636kB low:5792kB high:6952kB active:1555260kB
>>> inactive:1566144kB present:3332668kB pages_scanned:35703257
>>> all_unreclaimable? yes
>>> May  6 22:31:29 pgblade02 kernel: lowmem_reserve[]: 0 0 4797 4797
>>> May  6 22:31:29 pgblade02 kernel: Node 0 Normal free:6812kB
>>> min:6836kB low:8544kB high:10252kB active:2342332kB
>>> inactive:2115836kB present:4912640kB pages_scanned:10165709
>>> all_unreclaimable? yes
>>> May  6 22:31:29 pgblade02 kernel: lowmem_reserve[]: 0 0 0 0
>>> May  6 22:31:29 pgblade02 kernel: Node 0 HighMem free:0kB min:128kB
>>> low:128kB high:128kB active:0kB inactive:0kB present:0kB
>>> pages_scanned:0 all_unreclaimable? no
>>> May  6 22:31:29 pgblade02 kernel: lowmem_reserve[]: 0 0 0 0
>>> May  6 22:31:29 pgblade02 kernel: Node 0 DMA: 3*4kB 5*8kB 3*16kB
>>> 6*32kB 4*64kB 3*128kB 0*256kB 0*512kB 2*1024kB 0*2048kB 2*4096kB =
>>> 11172kB
>>> May  6 22:31:29 pgblade02 kernel: Node 0 DMA32: 27*4kB 0*8kB 1*16kB
>>> 0*32kB 2*64kB 4*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 5*4096kB =
>>> 23804kB
>>> May  6 22:31:29 pgblade02 kernel: Node 0 Normal: 21*4kB 9*8kB 26*16kB
>>> 3*32kB 6*64kB 5*128kB 0*256kB 0*512kB 1*1024kB 0*2048kB 1*4096kB = 6812kB
>>> May  6 22:31:29 pgblade02 kernel: Node 0 HighMem: empty
>>> May  6 22:31:29 pgblade02 kernel: Swap cache: add 71286821, delete
>>> 71287152, find 207780333/216904318, race 1387+10506
>>> May  6 22:31:29 pgblade02 kernel: Free swap  = 0kB
>>> May  6 22:31:30 pgblade02 kernel: Total swap = 8388600kB
>>> May  6 22:31:30 pgblade02 kernel: Free swap:            0kB
>>> May  6 22:31:30 pgblade02 kernel: 2293759 pages of RAM
>>> May  6 22:31:30 pgblade02 kernel: 249523 reserved pages
>>> May  6 22:31:30 pgblade02 kernel: 56111 pages shared
>>> May  6 22:31:30 pgblade02 kernel: 260 pages swap cached
>>> May  6 22:31:30 pgblade02 kernel: Out of memory: Killed process
>>> 29076 (postgres).
>>>
>>>
>>> We get the following errors in the postgres log:
>>>
>>> A couple of times:
>>> 2010-05-06 22:26:28 CEST [23001]: [2-1] WARNING:  worker took too
>>> long to start; cancelled
>>> Then:
>>> 2010-05-06 22:31:21 CEST [29059]: [27-1] LOG:  system logger process
>>> (PID 29076) was terminated by signal 9: Killed
>>> Finally:
>>> 2010-05-06 22:50:20 CEST [29059]: [28-1] LOG:  background writer
>>> process (PID 22999) was terminated by signal 9: Killed
>>> 2010-05-06 22:50:20 CEST [29059]: [29-1] LOG:  terminating any other
>>> active server processes
>>>
>>> Any help highly appreciated,
>>>
>>> ---
>>>
>>
>> Hello Silvio,
>>
>> Is this machine dedicated to PostgreSQL?
>>
>> If so, I'd recommend adding these two parameters to your sysctl.conf
>>
>> vm.overcommit_memory = 2
>> vm.overcommit_ratio = 0
>>
>> So that the OOM killer is effectively turned off.
>>
>> PostgreSQL should degrade gracefully if a malloc() fails because it
>> asks for more memory than is actually available.
>>
>> Hope that helps. =)
>>
>> Regards,
>> Lacey
>>
>>
>
> Thanks a lot,
> yes the server is dedicated to PostgreSQL.
>
> Could the fact that the system went out of memory be a bug in
> PostgreSQL? What could be the cause of it?
>
> Regards,
> Silvio
>

Hello Silvio,

This isn't a bug in PostgreSQL.

The OOM killer is an OS-level mechanism (part of the Linux kernel),
designed to free up memory by terminating a low-priority process.

So, something filled up the available memory on your server, and the
OOM killer then decided to ungracefully terminate PostgreSQL. =(

It's equivalent to sending kill -9 <pid> to PostgreSQL, which is never
a good thing to do.

If you have sar running (or other system resource logging) and pair
that data with your other logs, you might be able to get an idea of
what caused the out-of-memory condition, if you're interested.
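
For example, here is a minimal sketch, assuming the sysstat data files
live in /var/log/sa/saDD as on typical RHEL/CentOS systems (DD is the
day of the month, so sa06 for your May 6 incident; the time window is
just an assumption based on your log timestamps):

# Memory and swap utilization leading up to the OOM kill:
sar -r -f /var/log/sa/sa06 -s 22:00:00 -e 22:40:00

# Swapping activity (pages swapped in/out per second):
sar -W -f /var/log/sa/sa06 -s 22:00:00 -e 22:40:00

A steady climb in %memused and %swpused toward 22:31 would suggest
something slowly eating memory (for example, too many backends with a
large work_mem), rather than one runaway allocation.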

But, since this is a dedicated machine, if you just add those two
parameters to your sysctl.conf, this shouldn't happen again. =)
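
If it's useful, here is a minimal sketch of making the change, run as
root (this assumes /etc/sysctl.conf is the live sysctl configuration
file, as it is on most Linux distributions):

# Make the settings permanent across reboots:
echo "vm.overcommit_memory = 2" >> /etc/sysctl.conf
echo "vm.overcommit_ratio = 0"  >> /etc/sysctl.conf

# Apply them to the running kernel immediately, no reboot needed:
sysctl -p

# Verify the active values:
sysctl vm.overcommit_memory vm.overcommit_ratio

With overcommit_memory = 2 the kernel refuses an allocation up front
instead of killing a process later, so PostgreSQL sees a failed
malloc() and can fail gracefully.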

Hope that helps. =)

Regards,

Lacey

--
Lacey Powers

The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564 ext 104
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

