Thread: vacuum killed because of out of memory

vacuum killed because of out of memory

From
Geoffrey
Date:
I've recently reviewed the various recent threads on out of memory
problems.  We just had a similar issue last night.  We have 11
postmasters running on two machines in a cluster environment.  Five on
one, six on the other.  They've been running in this manner for a little
over a year now.

Configuration:
Quad dual-core Opterons
8 GB memory
Red Hat Advanced Server 4

relevant postgresql.conf settings:

tcpip_socket = true
max_connections = 35
shared_buffers = 16000
checkpoint_segments = 10
log_min_error_statement = warning
log_connections = true
log_pid = true
log_timestamp = true

We run a 'vacuum full analyze' once a week (and I've seen a thread that
says this should not be necessary).

Just the same, last night, while running a nightly 'vacuum full' process
for our largest database (7.5G base), the vacuum process was killed by
the OS because of out of memory issues.

Aug 27 00:59:07 gan-lxc-01 kernel: Out of Memory: Killed process 26169
(postmaster).

The process 26169 does appear to correspond to the vacuum process and
not the database postmaster process.  The postmaster process did not
die.  We did see the following in the database log:

2007-08-27 00:59:07 [13586] LOG:  server process (PID 26169) was
terminated by signal 9
2007-08-27 00:59:07 [13586] LOG:  terminating any other active server
processes
2007-08-27 00:59:07 [7790] WARNING:  terminating connection because of
crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
2007-08-27 00:59:07 [13586] LOG:  all server processes terminated;
reinitializing
2007-08-27 00:59:07 [2031] LOG:  database system was interrupted at
2007-08-27 00:58:59 EDT
2007-08-27 00:59:07 [2031] LOG:  checkpoint record is at 18/B3DF3B94
2007-08-27 00:59:07 [2031] LOG:  redo record is at 18/B3DF3B94; undo
record is at 0/0; shutdown FALSE
2007-08-27 00:59:07 [2031] LOG:  next transaction ID: 63340557; next
OID: 6459085
2007-08-27 00:59:07 [2031] LOG:  database system was not properly shut
down; automatic recovery in progress
2007-08-27 00:59:07 [2031] LOG:  redo starts at 18/B3DF3BD4
2007-08-27 00:59:08 [2033] LOG:  connection received:
host=198.212.166.38 port=33787
2007-08-27 00:59:08 [2033] FATAL:  the database system is starting up
2007-08-27 00:59:11 [2035] LOG:  connection received:
host=XXX.XXX.XXX.XXX port=33788

So, my question is, based on the configuration of this box and the
configuration of postgresql, can anyone point to anything that might
cause this to happen?

--
Until later, Geoffrey

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety.
  - Benjamin Franklin

Re: vacuum killed because of out of memory

From
Alvaro Herrera
Date:
Geoffrey wrote:

> Aug 27 00:59:07 gan-lxc-01 kernel: Out of Memory: Killed process 26169
> (postmaster).

> So, my question is, based on the configuration of this box and the
> configuration of postgresql, can anyone point to anything that might cause
> this to happen?

An operating system configured to overcommit memory will do this.  The
usual suggestion is to configure it not to.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
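
Editorial note: the overcommit behaviour mentioned above can be inspected on Linux through /proc/sys/vm/overcommit_memory. The following is a minimal sketch, not part of the original reply; the function names describe_overcommit and read_overcommit are made up for illustration, and the mode descriptions are a short paraphrase of the kernel's documented values 0, 1 and 2.

```python
from pathlib import Path

# Paraphrased meanings of the Linux vm.overcommit_memory values.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (kernel default); the OOM killer can pick a victim",
    1: "always overcommit; the OOM killer can pick a victim",
    2: "strict accounting; allocations beyond the commit limit are refused",
}


def describe_overcommit(raw: str) -> str:
    """Turn the raw file content (e.g. '0\\n') into a readable description."""
    mode = int(raw.strip())
    return OVERCOMMIT_MODES.get(mode, f"unknown mode {mode}")


def read_overcommit(path: str = "/proc/sys/vm/overcommit_memory"):
    """Return the description for this host, or None if the file is absent."""
    p = Path(path)
    if not p.exists():
        return None
    return describe_overcommit(p.read_text())


if __name__ == "__main__":
    result = read_overcommit()
    print(result if result is not None else "no /proc overcommit setting found")
```

If the host reports mode 0 or 1, the setting usually suggested for a PostgreSQL server is mode 2, which can be applied with `sysctl -w vm.overcommit_memory=2` and made persistent in /etc/sysctl.conf. Whether that suits a box running several postmasters depends on its swap size and on vm.overcommit_ratio.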

Re: vacuum killed because of out of memory

From
"Scott Marlowe"
Date:
On 8/27/07, Geoffrey <lists@serioustechnology.com> wrote:
> I've recently reviewed the various recent threads on out of memory
> problems.  We just had a similar issue last night.  We have 11
> postmasters running on two machines in a cluster environment.  Five on
> one, six on the other.  They've been running in this manner for a little
> over a year now.

> So, my question is, based on the configuration of this box and the
> configuration of postgresql, can anyone point to anything that might
> cause this to happen?


Too high a setting for maintenance_work_mem.
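
Editorial note: a rough way to see why a maintenance memory setting matters on this box is to add up a worst-case budget for all postmasters on one machine. The sketch below is only an estimate under stated assumptions: the 8 kB block size is the PostgreSQL default and is not confirmed in the thread, the per-backend sort memory and maintenance memory values are hypothetical (the original post does not list them), and the helper names are invented for illustration. The figures 16000 buffers, 35 connections and six postmasters come from the original post.

```python
BLOCK_SIZE_KIB = 8  # assumption: default PostgreSQL block size


def shared_buffers_mib(pages: int) -> float:
    """Size of shared_buffers in MiB when the setting is given in pages."""
    return pages * BLOCK_SIZE_KIB / 1024


def worst_case_mib(postmasters: int,
                   shared_buffers_pages: int,
                   max_connections: int,
                   work_mem_kib: int,
                   maintenance_mem_kib: int) -> float:
    """Crude upper bound: every backend uses one sort allocation and each
    postmaster runs one maintenance operation (such as VACUUM) at a time."""
    per_postmaster = (shared_buffers_mib(shared_buffers_pages)
                      + max_connections * work_mem_kib / 1024
                      + maintenance_mem_kib / 1024)
    return postmasters * per_postmaster


if __name__ == "__main__":
    # Hypothetical 1 MiB sort memory; compare 16 MiB and 1 GiB for maintenance.
    for maint_kib in (16 * 1024, 1024 * 1024):
        total = worst_case_mib(6, 16000, 35, 1024, maint_kib)
        print(f"maintenance memory {maint_kib // 1024} MiB -> {total:.0f} MiB total")
```

With the hypothetical 16 MiB maintenance setting, six postmasters come to about 1 GiB in total. With a hypothetical 1 GiB setting, the same six postmasters approach 7 GiB on an 8 GB machine, leaving little room before the kernel starts looking for a process to kill. A real backend can use more than one sort allocation per query, so actual usage may be higher.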