Thread: Backend exited with signal 14?

Backend exited with signal 14?

From
Jeff Boes
Date:
We had a production server crash last night. The postmaster log is given
below:

2003-05-22 03:16:22 [26031]  DEBUG:  server process (pid 16661) was
terminated by signal 14
2003-05-22 03:16:22 [26031]  DEBUG:  terminating any other active
server processes
2003-05-22 03:16:22 [1472]   NOTICE:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.

... followed, of course, by a whole bunch of other backends exiting.

Signal 14 is SIGALRM. What would cause a PG backend to terminate with
this condition?  As near as we can determine, pid 16661 was associated
with an Apache process on a different box.

Particulars are:

Red Hat Linux release 7.1 (Seawolf)
Kernel 2.4.18-17.7.xbigmem on a 2-processor i686
(4 GB memory)
psql (PostgreSQL) 7.2.4

shared_buffers = 131072
max_fsm_relations = 200
max_fsm_pages = 350000
wal_buffers = 32
sort_mem = 65536
vacuum_mem = 65536
wal_files = 2

--
Jeff Boes                                      vox 269.226.9550 ext 24
Database Engineer                                     fax 269.349.9076
Nexcerpt, Inc.                                 http://www.nexcerpt.com
           ...Nexcerpt... Extend your Expertise

Re: Backend exited with signal 14?

From
Tom Lane
Date:
Jeff Boes <jboes@nexcerpt.com> writes:
> We had a production server crash last night. The postmaster log is given
> below:
> 2003-05-22 03:16:22 [26031]  DEBUG:  server process (pid 16661) was
> terminated by signal 14

It's really hard to see how that could happen.  There is no point in the
lifespan of a backend where there is not a valid signal handler for
SIGALRM --- it inherits SIG_IGN from the postmaster, and changes that
to a normal handler routine before it ever enables the alarm anyway.

Color me baffled ... is it possible this is a kernel or glibc bug?

            regards, tom lane