Thread: Segfaults with 8.1.3 on amd64

Segfaults with 8.1.3 on amd64

From
Gavin Hamill
Date:
Hi, our 8.1.3 system on quad Xeon has been happily chugging away for
weeks with no stability problems until yesterday:

/var/log/syslog:May  4 11:57:17 cayenne kernel: postmaster[19291]:
segfault at 0000000000000000 rip 00002aaaab5e8c00 rsp 00007fffffffd418
error 4
/var/log/syslog.0:May  3 09:39:06 cayenne kernel: postmaster[32698]:
segfault at 0000000000000000 rip 00002aaaab5e8c00 rsp 00007fffffffd418
error 4
/var/log/syslog.0:May  3 11:02:00 cayenne kernel: postmaster[12427]:
segfault at 0000000000000000 rip 00002aaaab5e8c00 rsp 00007fffffffd418
error 4

I don't know what the rip + rsp values represent, but is it interesting
that they are identical in all three cases?

Not a single OS change has occurred on the machine - in fact the only
thing happening other than pg itself is me tail'ing the logs..

I'm using Debian sarge with the 8.1.3 debs from backports.org which I
trust; I doubt running postmaster under gdb will be workable due to the
performance penalty.

The pg logs don't show much of interest:

2006-05-04 11:57:17 BST LOG:  server process (PID 19291) was terminated
by signal 11
2006-05-04 11:57:17 BST LOG:  terminating any other active server processes
2006-05-04 11:57:17 BST WARNING:  terminating connection because of
crash of another server process
2006-05-04 11:57:17 BST DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2006-05-04 11:57:17 BST HINT:  In a moment you should be able to
reconnect to the database and repeat your command.

[loads of these]

2006-05-04 11:57:18 BST FATAL:  the database system is in recovery mode
2006-05-04 11:57:18 BST FATAL:  the database system is in recovery mode
2006-05-04 11:57:18 BST FATAL:  the database system is in recovery mode
2006-05-04 11:57:18 BST FATAL:  the database system is in recovery mode
2006-05-04 11:57:18 BST LOG:  all server processes terminated;
reinitializing
2006-05-04 11:57:18 BST FATAL:  the database system is starting up
2006-05-04 11:57:18 BST FATAL:  the database system is starting up
2006-05-04 11:57:18 BST FATAL:  the database system is starting up
2006-05-04 11:57:18 BST FATAL:  the database system is starting up
2006-05-04 11:57:18 BST FATAL:  the database system is starting up
2006-05-04 11:57:18 BST LOG:  database system was interrupted at
2006-05-04 11:56:17 BST
2006-05-04 11:57:18 BST LOG:  checkpoint record is at 68/A9D2F2E8
2006-05-04 11:57:18 BST LOG:  redo record is at 68/A9D17DD0; undo record
is at 0/0; shutdown FALSE
2006-05-04 11:57:18 BST LOG:  next transaction ID: 728532363; next OID:
183302937
2006-05-04 11:57:18 BST LOG:  next MultiXactId: 46957; next
MultiXactOffset: 98539
2006-05-04 11:57:18 BST LOG:  database system was not properly shut
down; automatic recovery in progress
2006-05-04 11:57:18 BST LOG:  redo starts at 68/A9D17DD0
2006-05-04 11:57:18 BST FATAL:  the database system is starting up

[ loads of these]

2006-05-04 11:57:19 BST LOG:  record with zero length at 68/ABAF4F48
2006-05-04 11:57:19 BST LOG:  redo done at 68/ABAF4F18
2006-05-04 11:57:19 BST LOG:  could not truncate directory
"pg_multixact/members": apparent wraparound
2006-05-04 11:57:19 BST LOG:  database system is ready
2006-05-04 11:57:19 BST LOG:  transaction ID wrap limit is 1362094701,
limited by database "postgres"

Encouragingly, pg_config shows that --enable-debug was passed as a
./configure argument:

CONFIGURE = '--build=x86_64-linux' '--prefix=/usr'
'--includedir=/usr/include' '--mandir=/usr/share/man'
'--infodir=/usr/share/info' '--sysconfdir=/etc' '--localstatedir=/var'
'--libexecdir=/usr/lib/postgresql-8.1' '--srcdir=.'
'--disable-maintainer-mode' '--mandir=/usr/share/postgresql/8.1/man'
'--with-docdir=/usr/share/doc/postgresql-doc-8.1'
'--datadir=/usr/share/postgresql/8.1'
'--bindir=/usr/lib/postgresql/8.1/bin'
'--includedir=/usr/include/postgresql/' '--enable-nls'
'--enable-integer-datetimes' '--enable-debug' '--disable-rpath'
'--with-tcl' '--with-perl' '--with-python' '--with-pam' '--with-krb5'
'--with-openssl' '--with-gnu-ld' '--with-tclconfig=/usr/lib/tcl8.4'
'--with-tkconfig=/usr/lib/tk8.4' '--with-includes=/usr/include/tcl8.4'
'--with-pgport=5432' '--enable-thread-safety' 'CC=cc' 'CFLAGS=-g -Wall
-O2 -Wl,--as-needed' 'build_alias=x86_64-linux'
CC = cc
CPPFLAGS = -D_GNU_SOURCE -I/usr/include/tcl8.4
CFLAGS = -g -Wall -O2 -Wl,--as-needed -Wall -Wmissing-prototypes
-Wpointer-arith -Winline -Wendif-labels -fno-strict-aliasing -g
CFLAGS_SL = -fpic
LDFLAGS =
LDFLAGS_SL =
LIBS = -lpgport -lpam -lssl -lcrypto -lkrb5 -lz -lreadline -lcrypt
-lresolv -lnsl -ldl -lm
VERSION = PostgreSQL 8.1.3

How can I enable coredumps or something similarly useful for debugging
purposes?

Cheers,
Gavin.


Re: Segfaults with 8.1.3 on amd64

From
Martijn van Oosterhout
Date:
On Thu, May 04, 2006 at 12:22:01PM +0100, Gavin Hamill wrote:
> Hi, our 8.1.3 system on quad Xeon has been happily chugging away for
> weeks with no stability problems until yesterday:
>
> /var/log/syslog:May  4 11:57:17 cayenne kernel: postmaster[19291]:
> segfault at 0000000000000000 rip 00002aaaab5e8c00 rsp 00007fffffffd418
> error 4

<snip>

> I don't know what the rip + rsp values represent, but is it interesting
> that they are identical in all three cases?

At a guess rip = return instruction pointer, rsp = return stack point.
The fact that they're all the same seems to rule out hardware.

> I'm using Debian sarge with the 8.1.3 debs from backports.org which I
> trust; I doubt running postmaster under gdb will be workable due to the
> performance penalty.

I didn't think attaching gdb had much effect on performance, but you
may be right.

<snip other usual output on server crash>

> How can I enable coredumps or something similarly useful for debugging
> purposes?

Before starting the server, run "ulimit -S -c unlimited"

If done properly it should enable core dumps for the backend.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
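
For a wrapper script that launches the server, the same effect as the "ulimit" advice above can be sketched in Python with the standard resource module. This is only an illustration under stated assumptions: it raises the soft core limit to whatever the hard limit allows (whereas "ulimit -S -c unlimited" asks for unlimited outright and fails if the hard limit is lower), and it uses "sh -c 'ulimit -c'" as a stand-in child process rather than the real postmaster start command, which is not shown in this thread.

```python
import resource
import subprocess


def raise_core_soft_limit():
    """Raise the soft core-file limit as far as the hard limit permits.

    Returns the (soft, hard) pair in effect afterwards.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may always raise its soft limit up to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def core_limit_seen_by_child():
    """Ask a child shell what core limit it inherited (stand-in for the server)."""
    result = subprocess.run(
        ["sh", "-c", "ulimit -c"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


soft, hard = raise_core_soft_limit()
print("soft limit equals hard limit:", soft == hard)
print("child process sees core limit:", core_limit_seen_by_child())
```

Resource limits are inherited by child processes, so the limit has to be raised in the process (or shell) that starts the postmaster; changing it in an unrelated shell has no effect on an already-running server.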


Re: Segfaults with 8.1.3 on amd64

From
"Guy Rouillier"
Date:
Martijn van Oosterhout wrote:
> On Thu, May 04, 2006 at 12:22:01PM +0100, Gavin Hamill wrote:
>> Hi, our 8.1.3 system on quad Xeon has been happily chugging away for
>> weeks with no stability problems until yesterday:
>>
>> /var/log/syslog:May  4 11:57:17 cayenne kernel: postmaster[19291]:
>> segfault at 0000000000000000 rip 00002aaaab5e8c00 rsp
>> 00007fffffffd418 error 4
>
> <snip>
>
>> I don't know what the rip + rsp values represent, but is it
>> interesting that they are identical in all three cases?
>
> At a guess rip = return instruction pointer, rsp = return stack
> point. The fact that they're all the same seems to rule out hardware.

The R* registers in AMD64 are just the 64-bit extensions to the standard
registers.  They couldn't use EIP because that was taken in the
expansion from 16-bit to 32-bit.  So RIP is simply the 64-bit
instruction pointer, RSP the 64-bit stack pointer.

--
Guy Rouillier
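
To make the register discussion concrete, here is a minimal Python sketch that pulls the fields out of one of the kernel lines quoted above and decodes the "error" value. The regular expression is an assumption fitted to the message format seen in this thread, and the error field is assumed to be printed in hexadecimal. The bit meanings are the standard x86 page-fault error code bits: bit 0 set means the page was present (a protection fault), bit 1 set means a write, bit 2 set means the access came from user mode.

```python
import re

# Pattern fitted to the kernel message quoted in this thread (an assumption,
# not a general syslog parser).
SEGFAULT_RE = re.compile(
    r"(?P<prog>\w+)\[(?P<pid>\d+)\]:\s+segfault at (?P<addr>[0-9a-f]+)"
    r"\s+rip (?P<rip>[0-9a-f]+)\s+rsp (?P<rsp>[0-9a-f]+)"
    r"\s+error (?P<err>[0-9a-f]+)"
)


def parse_segfault(line):
    """Return the fields of a kernel segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    return {
        "prog": m.group("prog"),
        "pid": int(m.group("pid")),
        "addr": int(m.group("addr"), 16),   # faulting address
        "rip": int(m.group("rip"), 16),     # 64-bit instruction pointer
        "rsp": int(m.group("rsp"), 16),     # 64-bit stack pointer
        "error": int(m.group("err"), 16),   # page-fault error code
    }


def decode_page_fault_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "page_present": bool(code & 1),
        "write": bool(code & 2),
        "user_mode": bool(code & 4),
    }


LINE = (
    "May  4 11:57:17 cayenne kernel: postmaster[19291]: "
    "segfault at 0000000000000000 rip 00002aaaab5e8c00 "
    "rsp 00007fffffffd418 error 4"
)

info = parse_segfault(LINE)
flags = decode_page_fault_error(info["error"])
print(hex(info["rip"]), "fault address is NULL:", info["addr"] == 0, flags)
```

Read this way, "segfault at 0000000000000000 ... error 4" describes a user-mode read of an unmapped page at address 0, which is what a NULL pointer dereference looks like. The identical rip in all three crashes means the same instruction faulted each time.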



Re: Segfaults with 8.1.3 on amd64

From
Gavin Hamill
Date:
Martijn van Oosterhout wrote:

>On Thu, May 04, 2006 at 12:22:01PM +0100, Gavin Hamill wrote:
>
>
>
>At a guess rip = return instruction pointer, rsp = return stack point.
>The fact that they're all the same seems to rule out hardware.
>
>
>
That's good to hear (in one way... :)

>Before starting the server, run "ulimit -S -c unlimited"
>
>If done properly it should enable core dumps for the backend.
>
>Have a nice day,
>
>
Great stuff - it's crashed again and dropped 6MB of core which points
the finger squarely at Slony - I'll ask on the relevant list :)

Core was generated by `postgres: sharp laterooms 194.24.250.135(54478)
UPDATE                        '.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libpam.so.0...(no debugging symbols found)...done.
....
Reading symbols from /usr/lib/postgresql/8.1/lib/slony1_funcs.so...done.
Loaded symbols for /usr/lib/postgresql/8.1/lib/slony1_funcs.so
Reading symbols from /usr/lib/postgresql/8.1/lib/xxid.so...done.
Loaded symbols for /usr/lib/postgresql/8.1/lib/xxid.so
#0  0x00002aaaab5e8c00 in strlen () from /lib/libc.so.6
(gdb) bt
#0  0x00002aaaab5e8c00 in strlen () from /lib/libc.so.6
#1  0x00002aaaca65b062 in slon_quote_literal (str=0x0) at
slony1_funcs.c:1044
#2  0x00002aaaca65c348 in _Slony_I_logTrigger (fcinfo=0x8f5ec5) at
slony1_funcs.c:783
#3  0x00000000005ca9f9 in fmgr_internal_function ()
#4  0x00000000004ce6a4 in FreeTriggerDesc ()
#5  0x00000000004cf42e in ExecARUpdateTriggers ()
#6  0x00000000004cf873 in ExecARUpdateTriggers ()
#7  0x00000000004cfb10 in AfterTriggerEndQuery ()
#8  0x000000000055ef05 in FreeQueryDesc ()
#9  0x000000000055fecf in PortalRun ()
#10 0x000000000055f78f in PortalRun ()
#11 0x000000000055b721 in pg_plan_queries ()
#12 0x000000000055e14c in PostgresMain ()
#13 0x0000000000539cc1 in ClosePostmasterPorts ()
#14 0x0000000000539797 in ClosePostmasterPorts ()
#15 0x0000000000537d3d in PostmasterMain ()
#16 0x000000000053704e in PostmasterMain ()
#17 0x00000000004fdb58 in main ()

Cheers,
Gavin.
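
The backtrace above can also be read mechanically. The following Python sketch (a hypothetical helper, not part of gdb, PostgreSQL, or Slony) parses the first few frames as pasted in this message and flags any frame that was called with a 0x0 argument. The frame regular expression is an assumption fitted to this particular gdb output.

```python
import re

# Matches lines such as "#1  0x00002aaaca65b062 in slon_quote_literal (str=0x0)".
FRAME_RE = re.compile(
    r"#(?P<num>\d+)\s+(?P<addr>0x[0-9a-f]+) in (?P<func>\w+) \((?P<args>[^)]*)\)"
)


def parse_frames(backtrace):
    """Extract frame number, address, function name and arguments from gdb 'bt' text."""
    frames = []
    for m in FRAME_RE.finditer(backtrace):
        args = {}
        for item in m.group("args").split(","):
            if "=" in item:
                name, value = item.split("=", 1)
                args[name.strip()] = value.strip()
        frames.append({
            "num": int(m.group("num")),
            "addr": int(m.group("addr"), 16),
            "func": m.group("func"),
            "args": args,
        })
    return frames


def null_argument_frames(frames):
    """Return the frames that received at least one 0x0 (NULL) argument."""
    return [f for f in frames if any(v == "0x0" for v in f["args"].values())]


# First four frames, copied from the backtrace in this message.
BACKTRACE = """\
#0  0x00002aaaab5e8c00 in strlen () from /lib/libc.so.6
#1  0x00002aaaca65b062 in slon_quote_literal (str=0x0) at
slony1_funcs.c:1044
#2  0x00002aaaca65c348 in _Slony_I_logTrigger (fcinfo=0x8f5ec5) at
slony1_funcs.c:783
#3  0x00000000005ca9f9 in fmgr_internal_function ()
"""

frames = parse_frames(BACKTRACE)
suspects = null_argument_frames(frames)
print("frames called with NULL:", [f["func"] for f in suspects])
print("innermost frame:", frames[0]["func"], hex(frames[0]["addr"]))
```

The only frame flagged is slon_quote_literal, called with str=0x0 and passing that pointer on to strlen. The address of frame #0 is also the rip value from the syslog lines, so the core dump and the three earlier kernel messages point at the same faulting instruction.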