Re: trouble restarting a server

Поиск
Список
Период
Сортировка
От Peter Koczan
Тема Re: trouble restarting a server
Дата
Msg-id 4544e0330705310955g3c95679vbbda98773a687d22@mail.gmail.com
обсуждение исходный текст
Ответ на Re: trouble restarting a server  ("Peter Koczan" <pjkoczan@gmail.com>)
Список pgsql-admin
It finally reoccurred. Here's what I got from attaching to those processes from gdb. I attached with the postmaster binary, let me know if I should use something else.

vero(su): ps axvw | grep notify
24556 ?        Ss     0:03      0  3265 41262 29672  0.7 postgres: jerel csdb chef(36275) notify interrupt
 2889 ?        Ss     0:04      0  3265 41270 29688  0.7 postgres: ela csdb newton(32777) notify interrupt
 2943 ?        Ss     0:04      0  3265 41270 29684  0.7 postgres: stefan csdb stupid(32788) notify interrupt
 5866 ?        Ss     0:04      0  3265 41270 29680  0.7 postgres: petska csdb brian(32786) notify interrupt
27850 ?        Ss     0:03      0  3265 41270 29768  0.7 postgres: dparter csdb yfandes(35456) notify interrupt
18582 ?        Ss     0:03      0  3265 41270 29732  0.7 postgres: timc csdb tornado(47047) notify interrupt
  449 ?        Ss     0:02      0  3265 41270 29764  0.7 postgres: archer csdb spoon(33141) notify interrupt
12731 pts/0    S+     0:00      0    71  3828  664  0.0 grep notify

vero(su): gdb /s/postgresql/bin/postmaster 24556
[-----  Begin lots of messages -----]
GNU gdb Red Hat Linux (6.3.0.0-1.132.EL4rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/tls/libthread_db.so.1".

Attaching to program: /afs/cs.wisc.edu/s/postgresql-8.2.4/@sys/bin/postmaster, process 5866
Reading symbols from /lib/libpam.so.0...done.
Loaded symbols for /lib/libpam.so.0
Reading symbols from /afs/cs.wisc.edu/s/openssl- 0.9.8d/@sys/lib/libssl.so.0.9.8d...done.
Loaded symbols for /s/openssl-0.9.8d/lib/libssl.so.0.9.8d
Reading symbols from /afs/cs.wisc.edu/s/openssl-0.9.8d/@sys/lib/libcrypto.so.0.9.8d...done.
Loaded symbols for /s/openssl- 0.9.8d/lib/libcrypto.so.0.9.8d
Reading symbols from /afs/cs.wisc.edu/s/krb5-1.5.1/@sys/lib/libkrb5.so.3...done.Loaded symbols for /s/krb5-1.5.1/lib/libkrb5.so.3
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/tls/libm.so.6...done.
Loaded symbols for /lib/tls/libm.so.6
Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /afs/cs.wisc.edu/s/krb5-1.5.1/@sys/lib/libcom_err.so.3...done.
Loaded symbols for /s/krb5-1.5.1/lib/libcom_err.so.3
Reading symbols from /lib/libaudit.so.0...done.
Loaded symbols for /lib/libaudit.so.0
Reading symbols from /afs/cs.wisc.edu/s/krb5-1.5.1/i386_cent40/lib/libk5crypto.so.3...done.
Loaded symbols for /s/krb5-1.5.1/i386_cent40/lib/libk5crypto.so.3
Reading symbols from /afs/cs.wisc.edu/s/krb5- 1.5.1/i386_cent40/lib/libkrb5support.so.0...done.
Loaded symbols for /s/krb5-1.5.1/i386_cent40/lib/libkrb5support.so.0
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/ld- linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /lib/libnss_dns.so.2...done.
Loaded symbols for /lib/libnss_dns.so.2
0x007ef7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
[----- End lots of messages -----]

(gdb) bt
#0  0x007ef7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x008c60f3 in __write_nocancel () from /lib/tls/libc.so.6
#2  0x0064d734 in sock_write () from /s/openssl-0.9.8d/lib/libcrypto.so.0.9.8d
#3  0x00000008 in ?? ()
#4  0x09ede1a2 in ?? ()
#5  0x00000038 in ?? ()
#6  0x09ee50c0 in ?? ()
#7  0x006e06c8 in ?? () from /s/openssl- 0.9.8d/lib/libcrypto.so.0.9.8d
#8  0x09ed4148 in ?? ()
#9  0x00000000 in ?? ()

All the other processes are the same except for addresses in #4, #6, and #8, but they're all within a few MB of each other (they're probably asynchronous interrupts).

Let me know if you need more info.

Peter

On 5/22/07, Peter Koczan <pjkoczan@gmail.com> wrote:
The release is 8.2.4. I haven't been able to reproduce the condition yet, but I will send along stack traces as soon as I can. I have this strange feeling that it's only going to happen when I find a reason to make a restart-worthy config change.

Peter


On 5/21/07, Tom Lane < tgl@sss.pgh.pa.us> wrote:
"Peter Koczan" <pjkoczan@gmail.com> writes:
> [ lots of processes stuck in "notify interrupt" code ]

That's weird.  If it's still in that state, or if you can reproduce it,
could you attach to a few of those processes with gdb and get stack
traces?

Looking at the async.c code, an obvious candidate is that that routine
tries to take ExclusiveLock on pg_listener --- so if something had
managed to exit without releasing a lock on that table, hangups could be
expected.  But if that were the case, you'd think the process status
lines would include "waiting".  My guess is they're blocked on something
lower-level than a table lock, but without a stack trace it's hard to
guess what.

Which PG release is this exactly?

                        regards, tom lane


В списке pgsql-admin по дате отправления:

Предыдущее
От: "Sipos Ferenc"
Дата:
Сообщение: postgresql recovery (PANIC: btree_split_redo: lost left sibling)
Следующее
От: Andrew Sullivan
Дата:
Сообщение: Re: Deletes hurt