Discussion: Vacuum analyze problem
Hi,

My problem is when running the vacuum with analyze an error occurs but it runs ok without the analyse.

From psql:

    gsmain_test=# vacuum;
    VACUUM
    gsmain_test=# vacuum analyze;
    pqReadData() -- backend closed the channel unexpectedly.
            This probably means the backend terminated abnormally
            before or while processing the request.
    The connection to the server was lost. Attempting reset: Failed.
    !#

From the shell:

    bash-2.04$ vacuumdb -d gsmain_test
    VACUUM
    bash-2.04$ vacuumdb -d gsmain_test -z
    pqReadData() -- backend closed the channel unexpectedly.
            This probably means the backend terminated abnormally
            before or while processing the request.
    connection to server was lost
    vacuumdb: vacuum failed
    bash-2.04$

Environment: Linux box, PostgreSQL 7.0.3-2 installed from RPMs, Red Hat 7.0.

Any help is greatly appreciated.

Regards,
John

John Hatfield <jhatfield@g-s.com.au> writes:
> My problem is when running the vacuum with analyze an error occurs but
> it runs ok without the analyse.

Try "vacuum verbose analyze" so you can see which table it's failing on (or just look in the postmaster log to see which table is mentioned last).

There's probably a core file left from the crashed backend; can you get a stack backtrace from it with gdb?

regards, tom lane

The last bit of vacuum verbose analyse:

    NOTICE: --Relation pg_ipl--
    NOTICE: Pages 0: Changed 0, reaped 0, Empty 0, New 0; Tup 0: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 0, MaxLen 0; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec.
    NOTICE: --Relation pg_inheritproc--
    NOTICE: Pages 0: Changed 0, reaped 0, Empty 0, New 0; Tup 0: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 0, MaxLen 0; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec.
    NOTICE: --Relation pg_rewrite--
    pqReadData() -- backend closed the channel unexpectedly.
            This probably means the backend terminated abnormally
            before or while processing the request.
    The connection to the server was lost. Attempting reset: Failed.
    !#

Yes, there is a core file in the directory with the database files:

    bash-2.04$ ls -l core
    -rw------- 1 postgres postgres 2600960 Mar 2 10:02 core
    bash-2.04$ date
    Fri Mar 2 10:02:32 EST 2001
    bash-2.04$ gdb core
    GNU gdb 5.0
    Copyright 2000 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB. Type "show warranty" for details.
    This GDB was configured as "i386-redhat-linux"..."/var/lib/pgsql/data/base/gsmain_test/core": not in executable format: File format not recognized
    (gdb) where
    No stack.
    (gdb)

gdb does not seem to like the core file format, though.

Regards,
John

-----Original Message-----
From: Tom Lane
Sent: Friday, 2 March 2001 11:50 AM
To: jhatfield@g-s.com.au
Cc: 'PostgreSQL Admin'
Subject: Re: [ADMIN] Vacuum analyze problem

[quoted text snipped]

John Hatfield <jhatfield@g-s.com.au> writes:
> The last bit of vacuum verbose analyse
> NOTICE: --Relation pg_rewrite--
> pqReadData() -- backend closed the channel unexpectedly.

OK, so pg_rewrite seems to be broken. Not good...

> This GDB was configured as "i386-redhat-linux"..."/var/lib/pgsql/data/base/gsmain_test/core": not in executable format: File format not recognized

You need to do "gdb /path/to/postgres/executable core". Or try it like this instead:

* fire up psql in one window
* determine PID of backend connected to psql
* attach to live backend process with gdb:
        gdb /path/to/postgres/executable
        attach PID
        cont
* issue vacuum command to psql

gdb should catch the crash and then you can issue "bt".

regards, tom lane

gdb output:

    bash-2.04$ gdb /usr/bin/postgres core
    GNU gdb 5.0
    Copyright 2000 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB. Type "show warranty" for details.
    This GDB was configured as "i386-redhat-linux"...
    (no debugging symbols found)...
    warning: core file may not match specified executable file.
    Core was generated by `/usr/bin/postgres localhos'.
    Program terminated with signal 11, Segmentation fault.
    Reading symbols from /lib/libcrypt.so.1...done.
    Loaded symbols for /lib/libcrypt.so.1
    Reading symbols from /lib/libnsl.so.1...done.
    Loaded symbols for /lib/libnsl.so.1
    Reading symbols from /lib/libdl.so.2...done.
    Loaded symbols for /lib/libdl.so.2
    Reading symbols from /lib/libm.so.6...done.
    Loaded symbols for /lib/libm.so.6
    Reading symbols from /lib/libutil.so.1...done.
    Loaded symbols for /lib/libutil.so.1
    Reading symbols from /usr/lib/libreadline.so.4.1...done.
    Loaded symbols for /usr/lib/libreadline.so.4.1
    Reading symbols from /lib/libtermcap.so.2...done.
    Loaded symbols for /lib/libtermcap.so.2
    Reading symbols from /usr/lib/libncurses.so.5...done.
    Loaded symbols for /usr/lib/libncurses.so.5
    Reading symbols from /lib/libc.so.6...done.
    Loaded symbols for /lib/libc.so.6
    Reading symbols from /lib/ld-linux.so.2...done.
    Loaded symbols for /lib/ld-linux.so.2
    Reading symbols from /lib/libnss_files.so.2...done.
    Loaded symbols for /lib/libnss_files.so.2
    #0 strcoll () at strcoll.c:248
    248 strcoll.c: No such file or directory.
    (gdb) where
    #0 strcoll () at strcoll.c:248
    #1 0x8115b1f in lztext_cmp ()
    #2 0x8115b72 in lztext_eq ()
    #3 0x8098809 in vacuum ()
    #4 0x8096120 in vacuum ()
    #5 0x8095705 in vacuum ()
    #6 0x8094e9d in vacuum ()
    #7 0x8094e21 in vacuum ()
    #8 0x81000b3 in ProcessUtility ()
    #9 0x80fd42c in pg_exec_query_dest ()
    #10 0x80fd378 in pg_plan_query ()
    #11 0x80fe465 in PostgresMain ()
    #12 0x80e5b7b in PostmasterMain ()
    #13 0x80e571c in PostmasterMain ()
    #14 0x80e4889 in PostmasterMain ()
    #15 0x80e420c in PostmasterMain ()
    #16 0x80b571d in main ()
    #17 0x40112b65 in __libc_start_main (main=0x80b5670 <main>, argc=5, ubp_av=0xbffffd34, init=0x80648b8 <_init>, fini=0x814632c <_fini>, rtld_fini=0x4000df24 <_dl_fini>, stack_end=0xbffffd2c) at ../sysdeps/generic/libc-start.c:111

Regards,
John

-----Original Message-----
From: Tom Lane
Sent: Friday, 2 March 2001 12:34 PM
To: jhatfield@g-s.com.au
Cc: 'PostgreSQL Admin'
Subject: Re: [ADMIN] Vacuum analyze problem

[quoted text snipped]

John Hatfield <jhatfield@g-s.com.au> writes:
> (gdb) where
> #0 strcoll () at strcoll.c:248
> #1 0x8115b1f in lztext_cmp ()
> #2 0x8115b72 in lztext_eq ()
> #3 0x8098809 in vacuum ()

Hm. I suspect this is the same problem that a couple of other people reported recently: crashes inside strcoll(), even though the strings being passed to it are perfectly OK.

[ digs in archives... ]

Dave Cramer reported a similar crash on 2001-01-24, and Jukka Honkela reported one on 2001-01-01. Dave was also using 7.0.3, and Jukka current-as-of-then development sources.

Oh, this is interesting: all three of you are running RedHat 7.0!

I had originally assumed that Postgres was somehow clobbering strcoll's internal locale information structures, but given that the problem is only being reported on RH 7.0, a bug in strcoll itself is starting to look like a plausible idea too.

Would one or another of you crank up ye olde debugger and try to figure this out? If it is Postgres' fault, I want to fix it ... but it's hard to do much when I can't reproduce the problem ...

regards, tom lane
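
One way to start separating the two suspects (Postgres clobbering locale state versus a bug in the C library) might be to call strcoll() from a tiny standalone program, with no Postgres code in the process at all. The following is only a rough sketch under stated assumptions: the function names collate_sign and check_strcoll and the sample strings are made up for illustration and are not taken from the thread or from the Postgres sources, and nothing here is known to trigger the crash reported above. It only checks that strcoll() returns without crashing and gives self-consistent answers under a chosen locale.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Reduce strcoll()'s result to -1, 0 or 1 so results are easy to compare. */
int collate_sign(const char *a, const char *b)
{
    int r = strcoll(a, b);
    return (r > 0) - (r < 0);
}

/*
 * Hypothetical helper: set LC_COLLATE to locale_name, then compare every
 * pair of a few made-up sample strings in both directions.
 * Returns the number of inconsistent pairs found, or -1 if the locale
 * could not be set. Pass "" to use the locale from the environment.
 */
int check_strcoll(const char *locale_name)
{
    static const char *const samples[] = {
        "", "a", "A", "abc", "abd", "pg_rewrite", "ON SELECT DO INSTEAD"
    };
    const size_t n = sizeof samples / sizeof samples[0];
    int bad = 0;
    size_t i, j;

    if (setlocale(LC_COLLATE, locale_name) == NULL) {
        fprintf(stderr, "cannot set LC_COLLATE to \"%s\"\n", locale_name);
        return -1;
    }

    for (i = 0; i < n; i++) {
        for (j = i; j < n; j++) {
            int ab = collate_sign(samples[i], samples[j]);
            int ba = collate_sign(samples[j], samples[i]);

            /* a<b must imply b>a, and a string must equal itself. */
            if (ab != -ba || (i == j && ab != 0)) {
                printf("inconsistent: \"%s\" vs \"%s\" -> %d / %d\n",
                       samples[i], samples[j], ab, ba);
                bad++;
            }
        }
    }
    return bad;
}
```

Calling check_strcoll("") from a small main() picks up the locale from the environment (LANG, LC_ALL, LC_COLLATE), while check_strcoll("C") gives a baseline to compare against. If a program like this misbehaved on an affected machine, that would point toward the C library; if it ran cleanly, that would prove little either way, since the backend reaches strcoll() through lztext_cmp() with different data and process state.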