Discussion: Understand this error

Understand this error

From
paulo matadr
Date:
Hi all,
my database went into recovery mode.
Looking through my pg_log, I see this:
system logger process (PID 6517) was terminated by signal 9
background writer process (PID 6519) was terminated by signal 9
terminating any other active server processes

and the OS log (in var/logs) shows:

 kernel:  [<ffffffff800ba475>] out_of_memory+0x53/0x267
 kernel:  [<ffffffff8000f012>] __alloc_pages+0x229/0x2b2
 kernel:  [<ffffffff80031f4b>] read_swap_cache_async+0x45/0xd8
 kernel:  [<ffffffff800bf60c>] swapin_readahead+0x60/0xd3
 kernel:  [<ffffffff80008f3a>] __handle_mm_fault+0x952/0xdf2
 kernel:  [<ffffffff800127fd>] sock_def_readable+0x34/0x5f
 kernel:  [<ffffffff80251481>] unix_dgram_sendmsg+0x43d/0x4cf
 kernel:  [<ffffffff800645a5>] do_page_fault+0x4b8/0x81d
 kernel:  [<ffffffff80037264>] do_sock_write+0xc4/0xce
 kernel:  [<ffffffff8008630f>] dequeue_task+0x18/0x37
 kernel:  [<ffffffff80060ab8>] thread_return+0x0/0xea
 kernel:  [<ffffffff8005be1d>] error_exit+0x0/0x84
 kernel:  [<ffffffff8008bfb3>] do_syslog+0x173/0x3ae
 kernel:  [<ffffffff8008bf81>] do_syslog+0x141/0x3ae
 kernel:  [<ffffffff8009b666>] autoremove_wake_function+0x0/0x2e
 kernel:  [<ffffffff800f65fd>] kmsg_read+0x3a/0x44
 kernel:  [<ffffffff8000b212>] vfs_read+0xcb/0x171
 kernel:  [<ffffffff8001145c>] sys_read+0x45/0x6e
 kernel:  [<ffffffff8005b2c1>] tracesys+0xd1/0xdc

 kernel: Free swap  = 0kB
 kernel: Total swap = 2031608kB
 kernel: Free swap:            0kB
 kernel: 4390912 pages of RAM
 kernel: 280785 reserved pages
 kernel: 10222 pages shared
 kernel: 4 pages swap cached
 kernel: Out of memory: Killed process 6519 (postmaster).

How can I prevent Postgres from using all of the system's memory? Why does this happen?

Thanks to all,
Paulo

Re: Understand this error

From
Scott Marlowe
Date:
On Thu, Apr 30, 2009 at 7:00 AM, paulo matadr <saddoness@yahoo.com.br> wrote:
> Hi all,
> my database went into recovery mode.
> Looking through my pg_log, I see this:
> system logger process (PID 6517) was terminated by signal 9
> background writer process (PID 6519) was terminated by signal 9
> terminating any other active server processes

Yeah, you're getting bitten by the OOM killer.  What changes, if any,
have you made to the postgresql.conf file?

Re: Understand this error

From
Craig Ringer
Date:
paulo matadr wrote:
> Hi all,
> my database went into recovery mode.
> Looking through my pg_log, I see this:
>
> system logger process (PID 6517) was terminated by signal 9
> background writer process (PID 6519) was terminated by signal 9
> terminating any other active server processes

You haven't told us what OS you are on. Based on the log below, though,
it looks like Linux.

`kill -l' on Linux tells us that signal 9 is SIGKILL, a hard kill. That
should only happen if (a) you send it with `kill -9' or `kill -KILL' or
(b) the machine runs out of memory while in overcommit mode (the
default) and the OOM killer picks PostgreSQL as the process to terminate
to free memory.
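
As a small illustration of the point about signal 9, this Python sketch pulls the signal number out of log lines like the ones quoted above and looks up its name with the standard `signal` module. The regular expression and the function name `explain_termination` are assumptions made for this example, not anything PostgreSQL ships; if the log has a line prefix, it would end up in the reported role.

```python
import re
import signal

# Matches postmaster log lines of the form quoted in this thread, e.g.
#   "background writer process (PID 6519) was terminated by signal 9"
TERMINATED = re.compile(
    r"^(?P<role>.+?) \(PID (?P<pid>\d+)\) was terminated by signal (?P<signo>\d+)"
)


def explain_termination(line):
    """Return (role, pid, signal name) for a matching line, else None."""
    match = TERMINATED.search(line.strip())
    if match is None:
        return None
    signo = int(match.group("signo"))
    try:
        name = signal.Signals(signo).name
    except ValueError:
        # Signal number not known to this platform's signal module.
        name = "signal %d" % signo
    return match.group("role"), int(match.group("pid")), name


log = """\
system logger process (PID 6517) was terminated by signal 9
background writer process (PID 6519) was terminated by signal 9
terminating any other active server processes
"""

for line in log.splitlines():
    info = explain_termination(line)
    if info:
        print("%s (PID %d): %s" % info)
```

On Linux, signal 9 resolves to SIGKILL, which a process cannot catch or ignore. That is why the server only notices after the fact that a child has died.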

You should NOT have your server in overcommit mode if you are running
PostgreSQL. See, in the PostgreSQL manual:

http://www.postgresql.org/docs/current/static/kernel-resources.html#AEN22235

>  kernel:  [<ffffffff800ba475>] out_of_memory+0x53/0x267
[snip]
> kernel: Out of memory: Killed process 6519 (postmaster).

> How can I prevent Postgres from using all of the system's memory? Why does this happen?

Read the link in the PostgreSQL manual, above.

Note that it's not very likely that PostgreSQL was the process that used
up all your memory. It was just unlucky enough to be picked as the one
to be killed, because the OOM killer is terrible at estimating which
process is using the most memory when programs like PostgreSQL have
allocated large blocks of shared memory.
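
To see which processes the kernel currently rates as the likeliest victims, one option is to read the per-process `oom_score` files under `/proc`. The Python sketch below does that; the function name `read_oom_scores` and the `proc_root` parameter are made up for this example, and the numbers it prints are the kernel's own scores, not a measure of real memory use.

```python
from pathlib import Path


def read_oom_scores(proc_root="/proc"):
    """Return [(score, pid, name)], highest OOM score first."""
    results = []
    for entry in Path(proc_root).glob("[0-9]*"):
        try:
            pid = int(entry.name)
            score = int((entry / "oom_score").read_text())
            raw = (entry / "cmdline").read_bytes()
        except (OSError, ValueError):
            # Process exited, is unreadable, or this is not a PID directory.
            continue
        # cmdline is NUL-separated; kernel threads have an empty one.
        name = raw.split(b"\0")[0].decode(errors="replace") or "[kernel thread]"
        results.append((score, pid, name))
    return sorted(results, reverse=True)


# Show the five processes the kernel would consider first (Linux only;
# on other systems the list is simply empty).
for score, pid, name in read_oom_scores()[:5]:
    print("%6d  pid %-7d %s" % (score, pid, name))
```

If the postmaster shows up near the top of that list on an otherwise idle database server, that matches the behaviour described above.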

--
Craig Ringer

Re: Understand this error

From
Dennis Brakhane
Date:
On Thu, Apr 30, 2009 at 3:00 PM, paulo matadr <saddoness@yahoo.com.br> wrote:
> Hi all,
> my database went into recovery mode.
> Looking through my pg_log, I see this:
> system logger process (PID 6517) was terminated by signal 9
> background writer process (PID 6519) was terminated by signal 9
> terminating any other active server processes

You are being bitten by the OOM killer. It can lead to severe data loss if it decides
to kill the postmaster. To avoid this, you should always set vm.overcommit_memory
to 2 (which turns overcommit off). See Section 17.4.3 here:

http://www.postgresql.org/docs/8.3/interactive/kernel-resources.html

You should *never* run a production database server in overcommit_memory mode!
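
A quick way to check the current setting is to read `/proc/sys/vm/overcommit_memory`. The Python sketch below interprets the value using the three modes from the Linux kernel documentation; the helper names `describe_overcommit` and `matches_thread_advice` are invented for this example.

```python
from pathlib import Path

# Meanings of vm.overcommit_memory per the Linux kernel documentation.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (kernel default)",
    1: "always overcommit",
    2: "strict accounting, no overcommit",
}


def describe_overcommit(raw):
    """Turn the text of the sysctl file into (mode, description)."""
    mode = int(raw.strip())
    return mode, OVERCOMMIT_MODES.get(mode, "unknown mode")


def matches_thread_advice(raw):
    """True if the setting is 2, the value recommended in this thread."""
    return describe_overcommit(raw)[0] == 2


path = Path("/proc/sys/vm/overcommit_memory")
if path.exists():
    mode, meaning = describe_overcommit(path.read_text())
    print("vm.overcommit_memory = %d (%s)" % (mode, meaning))
    if not matches_thread_advice(str(mode)):
        print("overcommit is enabled; the OOM killer may be invoked")
else:
    print("no overcommit sysctl found (not a Linux system?)")
```

The manual section linked above describes changing the value with `sysctl -w vm.overcommit_memory=2` and making it persistent through an equivalent entry in `/etc/sysctl.conf`.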

Re: Understand this error

From
Tom Lane
Date:
Craig Ringer <craig@postnewspapers.com.au> writes:
> Note that it's not very likely that PostgreSQL was the process that used
> up all your memory. It was just unlucky enough to be picked as the one
> to be killed, because the OOM killer is terrible at estimating which
> process is using the most memory when programs like PostgreSQL have
> allocated large blocks of shared memory.

It's worse than that: the OOM killer is broken by design, because it
intentionally picks on processes that have a lot of large children
--- without reference to the fact that a lot of the "largeness" might
be the same shared memory block.  So the postmaster process very often
looks like a good target to it, even though killing the postmaster will
in fact free a negligible amount of memory.
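
The double counting described here can be shown with a toy calculation. This is only a model of the arithmetic, not the kernel's actual scoring code, and the figures (50 backends, a 1024 MB shared block, 10 MB of private memory each) are hypothetical.

```python
def naive_footprint_mb(children, shared_mb, private_mb):
    """Add up every child's full mapped size, counting the shared block each time."""
    return children * (shared_mb + private_mb)


def actual_footprint_mb(children, shared_mb, private_mb):
    """Count the shared block once, plus each child's private memory."""
    return shared_mb + children * private_mb


# Hypothetical numbers, chosen only to make the gap visible.
children, shared_mb, private_mb = 50, 1024, 10

naive = naive_footprint_mb(children, shared_mb, private_mb)
actual = actual_footprint_mb(children, shared_mb, private_mb)

print("summed over children: %d MB" % naive)
print("really in use:        %d MB" % actual)
print("overestimate factor:  %.1fx" % (naive / actual))
```

With these numbers the per-child sum is tens of times larger than the memory really in use, which is how a parent of many backends can look like the biggest consumer on the machine.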

            regards, tom lane