Discussion: Current CVS tip segfaulting
Hackers,

In current (as of a couple hours ago) clean CVS tip sources, without any
of my local changes, I'm getting a postmaster segfault when trying to
connect to a nonexistent database.  The generated core file does not
seem to contain any useful information.  The first time I saw this I
managed to PANIC the system -- I can't seem to be able to reproduce that
part.

(Newly built on an empty vpath, so this should be a case of "make
distcleaning" ...)

Core was generated by `postgres: alvherre asd [local] startup '.
Program terminated with signal 11, Segmentation fault.

warning: current_sos: Can't read pathname for load map: Input/output error

Reading symbols from /lib/libz.so.1...done.
Loaded symbols for /lib/libz.so.1
Reading symbols from /lib/libreadline.so.4.3...done.
Loaded symbols for /lib/libreadline.so.4.3
Reading symbols from /lib/libncurses.so.5...done.
Loaded symbols for /lib/libncurses.so.5
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/tls/libm.so.6...done.
Loaded symbols for /lib/tls/libm.so.6
Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /lib/libgpm.so.1...done.
Loaded symbols for /lib/libgpm.so.1
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /usr/lib/gconv/ISO8859-15.so...done.
Loaded symbols for /usr/lib/gconv/ISO8859-15.so
Reading symbols from /usr/lib/gconv/ISO8859-1.so...done.
Loaded symbols for /usr/lib/gconv/ISO8859-1.so
0x00000000 in ?? ()
(gdb) bt
#0  0x00000000 in ?? ()

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"The only difference is that Saddam would kill you on private, where the
Americans will kill you in public" (Mohammad Saleh, 39, a building contractor)
Please recompile with debug symbols and report back the stack trace.
See the faq on running debug.

---------------------------------------------------------------------------

Alvaro Herrera wrote:
> Hackers,
> 
> In current (as of a couple hours ago) clean CVS tip sources, without any
> of my local changes, I'm getting a postmaster segfault when trying to
> connect to a nonexistent database.  The generated core file does not
> seem to contain any useful information.  The first time I saw this I
> managed to PANIC the system -- I can't seem to be able to reproduce that
> part.
> 
> (Newly built on an empty vpath, so this should be a case of "make
> distcleaning" ...)
> 
> Core was generated by `postgres: alvherre asd [local] startup '.
> Program terminated with signal 11, Segmentation fault.
> 
> warning: current_sos: Can't read pathname for load map: Input/output error
> 
> Reading symbols from /lib/libz.so.1...done.
> Loaded symbols for /lib/libz.so.1
> Reading symbols from /lib/libreadline.so.4.3...done.
> Loaded symbols for /lib/libreadline.so.4.3
> Reading symbols from /lib/libncurses.so.5...done.
> Loaded symbols for /lib/libncurses.so.5
> Reading symbols from /lib/libcrypt.so.1...done.
> Loaded symbols for /lib/libcrypt.so.1
> Reading symbols from /lib/libresolv.so.2...done.
> Loaded symbols for /lib/libresolv.so.2
> Reading symbols from /lib/libnsl.so.1...done.
> Loaded symbols for /lib/libnsl.so.1
> Reading symbols from /lib/libdl.so.2...done.
> Loaded symbols for /lib/libdl.so.2
> Reading symbols from /lib/tls/libm.so.6...done.
> Loaded symbols for /lib/tls/libm.so.6
> Reading symbols from /lib/tls/libc.so.6...done.
> Loaded symbols for /lib/tls/libc.so.6
> Reading symbols from /lib/libgpm.so.1...done.
> Loaded symbols for /lib/libgpm.so.1
> Reading symbols from /lib/ld-linux.so.2...done.
> Loaded symbols for /lib/ld-linux.so.2
> Reading symbols from /lib/libnss_files.so.2...done.
> Loaded symbols for /lib/libnss_files.so.2
> Reading symbols from /usr/lib/gconv/ISO8859-15.so...done.
> Loaded symbols for /usr/lib/gconv/ISO8859-15.so
> Reading symbols from /usr/lib/gconv/ISO8859-1.so...done.
> Loaded symbols for /usr/lib/gconv/ISO8859-1.so
> 0x00000000 in ?? ()
> (gdb) bt
> #0  0x00000000 in ?? ()
> 
> -- 
> Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
> "The only difference is that Saddam would kill you on private, where the
> Americans will kill you in public" (Mohammad Saleh, 39, a building contractor)
> 

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
On Fri, Apr 23, 2004 at 07:00:05PM -0400, Bruce Momjian wrote:
> 
> Please recompile with debug symbols and report back the stack trace.
> See the faq on running debug.

No, I already did that (all my builds are like that anyway and I read
stack traces more frequently than I'd like).  The "can't read pathname"
message I don't understand, but I had never seen it.

-- 
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
The web brings people together because no matter what kind of sexual mutant
you are, you have millions of potential partners.  Type in "find people who
have sex with deer that are on fire", and the computer will say "specify the
type of deer" (Jason Alexander)
On Fri, Apr 23, 2004 at 08:38:29PM -0400, Alvaro Herrera Munoz wrote:
> On Fri, Apr 23, 2004 at 07:00:05PM -0400, Bruce Momjian wrote:
> > 
> > Please recompile with debug symbols and report back the stack trace. 
> > See the faq on running debug.
> 
> No, I already did that (all my builds are like that anyway and I read
> stack traces more frequently than I'd like).  The "can't read pathname"
> message I don't understand, but I had never seen it.
strace'ing the postmaster suggested to me that the dbname string in
utils/init/postinit.c, the InitPostgres function, is the culprit.
In fact, if I apply the following patch to tcop/postgres.c the
whole thing stops happening.  I don't know if this is the correct
fix, but it may suggest something.  Maybe it's a problem with my
platform's argv handling (Mandrakelinux 10, kernel 2.6.3, glibc 2.3.3).
Index: postgres.c
===================================================================
RCS file: /home/alvherre/cvs/pgsql-server/src/backend/tcop/postgres.c,v
retrieving revision 1.400
diff -c -r1.400 postgres.c
*** postgres.c  19 Apr 2004 17:42:58 -0000  1.400
--- postgres.c  24 Apr 2004 02:20:47 -0000
***************
*** 2686,2692 ****
                     errhint("Try \"%s --help\" for more information.", argv[0])));
        }
        else if (argc - optind == 1)
!           dbname = argv[optind];
        else if ((dbname = username) == NULL)
        {
            ereport(FATAL,
--- 2648,2654 ----
                     errhint("Try \"%s --help\" for more information.", argv[0])));
        }
        else if (argc - optind == 1)
!           dbname = pstrdup(argv[optind]);
        else if ((dbname = username) == NULL)
        {
            ereport(FATAL,
-- 
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"Et put se mouve" (Galileo Galilei)
			
		Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> In current (as of a couple hours ago) clean CVS tip sources, without any
> of my local changes, I'm getting a postmaster segfault when trying to
> connect to a nonexistent database.
Hmm, works for me with this morning's sources.  Bruce created a bug of
that ilk a few days ago but fixed it shortly thereafter.  Is it possible
the anon-CVS server is out of date?
        regards, tom lane
			
Alvaro Herrera Munoz wrote:
> On Fri, Apr 23, 2004 at 07:00:05PM -0400, Bruce Momjian wrote:
> > 
> > Please recompile with debug symbols and report back the stack trace.
> > See the faq on running debug.
> 
> No, I already did that (all my builds are like that anyway and I read
> stack traces more frequently than I'd like).  The "can't read pathname"
> message I don't understand, but I had never seen it.

Oh, you mean the line:

> warning: current_sos: Can't read pathname for load map: Input/output error

That is strange.  Does it happen if you call abort() from the C code?
That should dump a core on its own.  The question is whether things are
getting corrupted because of the way it crashed or some other configure
problem.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Tom Lane wrote:
> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > In current (as of a couple hours ago) clean CVS tip sources, without any
> > of my local changes, I'm getting a postmaster segfault when trying to
> > connect to a nonexistent database.
> 
> Hmm, works for me with this morning's sources.  Bruce created a bug of
> that ilk a few days ago but fixed it shortly thereafter.  Is it possible
> the anon-CVS server is out of date?

The bug I fixed was related to a postmaster restart when connecting to a
nonexistent database, and the fix was to prevent the longjmp for
elog(FATAL) if the code hadn't reached the longjmp location yet.

It could be a bug, but if it is, it is a different fix than the one I
did, I think.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
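The guard Bruce describes can be sketched in isolation.  This is only a
minimal illustration and not the actual PostgreSQL code: recovery_point,
recovery_point_valid, last_error_code, try_jump_to_recovery(),
run_with_recovery() and failing_step() are all hypothetical names, and plain
setjmp/longjmp stand in for whatever the backend really uses.  The point is
just that an error path must not jump to a recovery point that has not been
established yet.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical recovery point and a flag saying whether it is usable. */
static jmp_buf recovery_point;
static bool    recovery_point_valid = false;
static int     last_error_code = 0;

/*
 * Jump to the recovery point only if setjmp() has already been executed.
 * Returns false when no recovery point exists, so the caller can take
 * another exit path (for example, terminating the process).
 */
static bool
try_jump_to_recovery(int code)
{
    if (!recovery_point_valid)
        return false;
    last_error_code = code;
    longjmp(recovery_point, 1);
    return false;               /* not reached */
}

/*
 * Establish the recovery point, run the work, and report the error code
 * if the work bailed out through try_jump_to_recovery(); 0 otherwise.
 */
static int
run_with_recovery(void (*work)(void))
{
    assert(work != NULL);

    if (setjmp(recovery_point) != 0)
    {
        recovery_point_valid = false;
        return last_error_code;
    }
    recovery_point_valid = true;
    work();
    recovery_point_valid = false;
    return 0;
}

/* Example work item that always fails with (made-up) error code 3. */
static void
failing_step(void)
{
    (void) try_jump_to_recovery(3);
}
```

Calling try_jump_to_recovery() before run_with_recovery() has set the flag
simply returns false instead of jumping into an uninitialized jmp_buf, which
is the kind of early-startup failure the fix is meant to avoid.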
FYI, I just tried:
    $ psql lkjasdf
    psql: FATAL:  database "lkjasdf" does not exist
    (2) cat /u/pg/server.log
    LOG:  database system was shut down at 2004-04-23 15:23:20 EDT
    LOG:  checkpoint record is at 0/9DCCCC
    LOG:  redo record is at 0/9DCCCC; undo record is at 0/0; shutdown TRUE
    LOG:  next transaction ID: 457; next OID: 17208
    LOG:  database system is ready
    FATAL:  database "lkjasdf" does not exist
 
That looks OK to me on BSD/OS.
I can put a copy of CVS head on my ftp site for testing if you wish.
---------------------------------------------------------------------------
Alvaro Herrera Munoz wrote:
> On Fri, Apr 23, 2004 at 08:38:29PM -0400, Alvaro Herrera Munoz wrote:
> > On Fri, Apr 23, 2004 at 07:00:05PM -0400, Bruce Momjian wrote:
> > > 
> > > Please recompile with debug symbols and report back the stack trace. 
> > > See the faq on running debug.
> > 
> > No, I already did that (all my builds are like that anyway and I read
> > stack traces more frequently than I'd like).  The "can't read pathname"
> > message I don't understand, but I had never seen it.
> 
> strace'ing the postmaster suggested to me that the dbname string in
> utils/init/postinit.c, the InitPostgres function, is the culprit.
> In fact, if I apply the following patch to tcop/postgres.c the
> whole thing stops happening.  I don't know if this is the correct
> fix, but it may suggest something.  Maybe it's a problem with my
> platform's argv handling (Mandrakelinux 10, kernel 2.6.3, glibc 2.3.3).
> 
> Index: postgres.c
> ===================================================================
> RCS file: /home/alvherre/cvs/pgsql-server/src/backend/tcop/postgres.c,v
> retrieving revision 1.400
> diff -c -r1.400 postgres.c
> *** postgres.c  19 Apr 2004 17:42:58 -0000  1.400
> --- postgres.c  24 Apr 2004 02:20:47 -0000
> ***************
> *** 2686,2692 ****
>                      errhint("Try \"%s --help\" for more information.", argv[0])));
>         }
>         else if (argc - optind == 1)
> !           dbname = argv[optind];
>         else if ((dbname = username) == NULL)
>         {
>             ereport(FATAL,
> --- 2648,2654 ----
>                      errhint("Try \"%s --help\" for more information.", argv[0])));
>         }
>         else if (argc - optind == 1)
> !           dbname = pstrdup(argv[optind]);
>         else if ((dbname = username) == NULL)
>         {
>             ereport(FATAL,
> 
> -- 
> Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
> "Et put se mouve" (Galileo Galilei)
> 
-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
			
		Bruce Momjian <pgman@candle.pha.pa.us> writes:
> It could be a bug, but if it is, it is a different fix than the one I
> did, I think.
Re-reading Alvaro's message, I wondered if cranking logging up to a
higher-than-default setting was needed to reproduce the bug.  A quick
experiment in that line didn't show a problem, but maybe I missed the
critical setting.  Alvaro, what postgresql.conf settings are you using?
        regards, tom lane
			
		Alvaro Herrera Munoz <alvherre@dcc.uchile.cl> writes:
> [ bug goes away if ]
> !           dbname = argv[optind];
> [becomes]
> !           dbname = pstrdup(argv[optind]);
Hm, that's interesting.  I could believe this would have something to do
with overwriting the argv area, but we have not touched any of that code
recently; so why would it break for you just now?
Which PS_USE_FOO option does your platform use?  (See
src/backend/utils/misc/ps_status.c)
        regards, tom lane
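The argv-overwriting hazard Tom mentions can be shown with a small
standalone sketch.  It assumes a ps-status routine that rewrites the
original argv storage in place, which is what a PS_USE_CLOBBER_ARGV style
implementation would do; clobber_argv_area() and save_argument() are
hypothetical helpers for illustration, not functions from ps_status.c, and
malloc stands in for pstrdup.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a ps-status update that overwrites the
 * original argv storage in place with a new process title.
 */
static void
clobber_argv_area(char *area, size_t area_len, const char *title)
{
    assert(area != NULL && area_len > 0);

    memset(area, 0, area_len);
    strncpy(area, title, area_len - 1);
}

/*
 * Copy an argument out of the argv area before anything can overwrite it.
 * Returns NULL on allocation failure; the caller owns the result.
 */
static char *
save_argument(const char *arg)
{
    size_t  len = strlen(arg) + 1;
    char   *copy = malloc(len);

    if (copy != NULL)
        memcpy(copy, arg, len);
    return copy;
}
```

A pointer that still aliases the argv area sees the new title after the
overwrite, while the saved copy keeps the original database name.  That is
the only difference the pstrdup() in the patch could make, and only on a
code path that actually reads dbname from argv.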
			
On Sat, Apr 24, 2004 at 12:27:14AM -0400, Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > It could be a bug, but if it is, it is a different fix than the one I
> > did, I think.
> 
> Re-reading Alvaro's message, I wondered if cranking logging up to a
> higher-than-default setting was needed to reproduce the bug.  A quick
> experiment in that line didn't show a problem, but maybe I missed the
> critical setting.  Alvaro, what postgresql.conf settings are you using?

I don't touch the standard settings ... log values are from the default
installation.

In another mail you asked:

> Which PS_USE_FOO option does your platform use?  (See
> src/backend/utils/misc/ps_status.c)

PS_USE_CLOBBER_ARGV AFAICS (ugh, sure uppercase is ugly) ;-)

The relevant strace extract is this (3448 is the backend, 3443 is
postmaster):

3448  write(2, "FATAL:  database \"asd\" does not exist\n", 38) = 38
3448  send(10, "R\0\0\0\10\0\0\0\0E\0\0\0\217SFATAL\0C3D000\0Mdatabase \"asd\" does not exist\0F/home/alvherre/CVS/pgsql/source/00orig/src/backend/utils/init/postinit.c\0L264\0RInitPostgres\0\0", 153, 0) = 153
3448  --- SIGSEGV (Segmentation fault) @ 0 (0) ---
3443  <... select resumed> )            = ? ERESTARTNOHAND (To be restarted)
3443  --- SIGCHLD (Child exited) @ 0 (0) ---

Note that the ereport() did get the line number, file and function name,
the correct database name, etc.  I don't know if the code is changing
the ps status after that; it's difficult to attach a debugger to this ...
huh wait, I'll try the backend's developer switches.

... plays for a while ...

Heh, the -s switch to postmaster seems to behave funny.  The bgwriter
process appears in T status in ps (stopped), but not the postmaster; if
I then send SIGCONT to the bgwriter it seems to continue, it returns to
S status but then postmaster doesn't respond correctly to signals (INT
or TERM don't shut it down).  Has it been always like this?  I haven't
used this switch before.

Anyway, this doesn't allow me to examine the dead backend.  Trying
postmaster -o "-W 60" allows me to attach gdb to the backend before it
dies:

(gdb) bt
#0  0xffffe410 in ?? ()
#1  0xbfffeda8 in ?? ()
#2  0x4025f800 in ?? () from /lib/tls/libc.so.6
#3  0xbfffec04 in ?? ()
#4  0x401cb460 in nanosleep () from /lib/tls/libc.so.6
#5  0x401cb263 in sleep () from /lib/tls/libc.so.6
#6  0x0818791e in PostgresMain (argc=6, argv=0x82dff18, username=0x82dfee0 "alvherre") at stdlib.h:382
#7  0x0815fab0 in BackendRun (port=0x82ed050) at /home/alvherre/CVS/pgsql/source/00orig/src/backend/postmaster/postmaster.c:2664
#8  0x0815f371 in BackendStartup (port=0x82ed050) at /home/alvherre/CVS/pgsql/source/00orig/src/backend/postmaster/postmaster.c:2297
#9  0x0815db6e in ServerLoop () at /home/alvherre/CVS/pgsql/source/00orig/src/backend/postmaster/postmaster.c:1167
#10 0x0815d157 in PostmasterMain (argc=3, argv=0x82deb80) at /home/alvherre/CVS/pgsql/source/00orig/src/backend/postmaster/postmaster.c:928
#11 0x0812f030 in main (argc=3, argv=0x82deb80) at /home/alvherre/CVS/pgsql/source/00orig/src/backend/main/main.c:257
(gdb) cont
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x00000000 in ?? ()
(gdb) bt
#0  0x00000000 in ?? ()

Whoa!  New backend, new gdb, try again:

(gdb) break InitPostgres
Breakpoint 1 at 0x81f3c3c: file /home/alvherre/CVS/pgsql/source/00orig/src/backend/utils/init/postinit.c, line 230.
(gdb) cont
Continuing.

Breakpoint 1, InitPostgres (dbname=0xc <Address 0xc out of bounds>, username=0x80e2540 "U\211åSPè\222Îøÿ\200= ±*\b") at /home/alvherre/CVS/pgsql/source/00orig/src/backend/utils/init/postinit.c:230
230         bool bootstrap = IsBootstrapProcessingMode();
(gdb)

This surely looks suspicious ...

(gdb) p dbname
$2 = 0xc <Address 0xc out of bounds>
(gdb) frame 1
#1  0x08187581 in PostgresMain (argc=6, argv=0x82dff18, username=0x82dfee0 "alvherre") at /home/alvherre/CVS/pgsql/source/00orig/src/backend/tcop/postgres.c:2745
2745        InitPostgres(dbname, username);
(gdb) p argv
$3 = (char **) 0x82dff18
(gdb) p argv[0]
$5 = 0x8265402 "postgres"
(gdb) p argv[1]
$6 = 0x82aa301 "-W"
(gdb) p argv[2]
$7 = 0x82aa304 "60"
(gdb) p argv[3]
$8 = 0xbfffee60 "-v196608"
(gdb) p argv[4]
$9 = 0x826d97a "-p"
(gdb) p argv[5]
$10 = 0x82dfefc "asd"
(gdb) p argv[6]
$11 = 0x0
(gdb) p dbname
$12 = 0x82ea848 "asd"

-- Note that this is not the same as argv[5], it's a copy, and as far as
I can see, it's set by the -p option in the switch/case, in
tcop/postgres.c line 2391, using strdup.

What else?

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
Syntax error: function hell() needs an argument.
Please choose what hell you want to involve.
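The -W trick used above amounts to sleeping at startup so that a debugger
can be attached before the interesting code runs.  A minimal sketch of that
idea follows; it is not the backend's implementation, and
parse_attach_delay(), wait_for_debugger() and the MAX_ATTACH_DELAY limit are
assumptions made purely for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define MAX_ATTACH_DELAY 3600   /* arbitrary upper bound for this sketch */

/*
 * Parse a delay in seconds from a command-line argument.
 * Returns -1 for missing, malformed, negative or oversized values.
 */
static int
parse_attach_delay(const char *arg)
{
    char   *end;
    long    value;

    if (arg == NULL || *arg == '\0')
        return -1;

    errno = 0;
    value = strtol(arg, &end, 10);
    if (errno != 0 || *end != '\0' || value < 0 || value > MAX_ATTACH_DELAY)
        return -1;
    return (int) value;
}

/*
 * Print the pid and sleep so that "gdb <program> <pid>" can attach.
 * Returns true if the process actually waited.
 */
static bool
wait_for_debugger(int seconds)
{
    assert(seconds <= MAX_ATTACH_DELAY);

    if (seconds <= 0)
        return false;

    fprintf(stderr, "pid %ld: waiting %d s for a debugger to attach\n",
            (long) getpid(), seconds);
    sleep((unsigned int) seconds);
    return true;
}
```

When attaching during such a delay, the first backtrace shows the process
inside sleep(), as in the session above; breakpoints set at that point take
effect once the delay expires and execution continues.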
On Fri, Apr 23, 2004 at 10:31:46PM -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > In current (as of a couple hours ago) clean CVS tip sources, without any
> > of my local changes, I'm getting a postmaster segfault when trying to
> > connect to a nonexistent database.
> 
> Hmm, works for me with this morning's sources.  Bruce created a bug of
> that ilk a few days ago but fixed it shortly thereafter.  Is it possible
> the anon-CVS server is out of date?

Did I already say that I use CVSup?  It seems to be up to date with the
latest commits, so I don't think this is it.

I'm starting to think that this could be a problem with my glibc/kernel
combination ...  This is linux-2.6.3-7mdk with glibc 2.3.3-10mdk.  Is
anyone else using Mandrakelinux 10 official?

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"No one is more enslaved than he who believes himself free without being so"
(Goethe)
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
>>> In current (as of a couple hours ago) clean CVS tip sources, without any
>>> of my local changes, I'm getting a postmaster segfault when trying to
>>> connect to a nonexistent database.
Alvaro, did you figure this out?  I've been mostly distracted for the
past week ...
        regards, tom lane
			
On Fri, Apr 30, 2004 at 12:52:10AM -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> >>> In current (as of a couple hours ago) clean CVS tip sources, without any
> >>> of my local changes, I'm getting a postmaster segfault when trying to
> >>> connect to a nonexistent database.
> 
> Alvaro, did you figure this out?  I've been mostly distracted for the
> past week ...

No.  I still see the failure on my platform but I don't know what to
attribute it to.

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"There are those who acquire the bad habit of being unhappy" (M. A. Evans)
> > Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > >>> In current (as of a couple hours ago) clean CVS tip sources, without any
> > >>> of my local changes, I'm getting a postmaster segfault when trying to
> > >>> connect to a nonexistent database.
> >
> > Alvaro, did you figure this out?  I've been mostly distracted for the
> > past week ...
>
> No.  I still see the failure on my platform but I don't know what to
> attribute it to.

I also see that with a database installation built from CVS on April 17.

It also leaves the server in an incoherent state:

Apr 30 17:58:22 sablons postgres[31629]: [31-1] FATAL:  database "toto" does not exist
Apr 30 17:58:22 sablons postgres[31604]: [31-1] LOG:  server process (PID 31629) was terminated by signal 11
Apr 30 17:58:22 sablons postgres[31604]: [32-1] LOG:  terminating any other active server processes
Apr 30 17:58:22 sablons postgres[31532]: [31-1] WARNING:  terminating connection because of crash of another server process
Apr 30 17:58:22 sablons postgres[31532]: [31-2] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server
Apr 30 17:58:22 sablons postgres[31532]: [31-3]  process exited abnormally and possibly corrupted shared memory.
Apr 30 17:58:22 sablons postgres[31532]: [31-4] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
Apr 30 17:58:22 sablons postgres[31604]: [33-1] LOG:  all server processes terminated; reinitializing
Apr 30 17:58:22 sablons postgres[31630]: [34-1] LOG:  database system was interrupted at 2004-04-30 17:54:56 CEST
Apr 30 17:58:22 sablons postgres[31630]: [35-1] LOG:  checkpoint record is at 0/B486F30
Apr 30 17:58:22 sablons postgres[31630]: [36-1] LOG:  redo record is at 0/B486F30; undo record is at 0/0; shutdown TRUE
Apr 30 17:58:22 sablons postgres[31630]: [37-1] LOG:  next transaction ID: 10769; next OID: 123703
Apr 30 17:58:22 sablons postgres[31630]: [38-1] LOG:  database system was not properly shut down; automatic recovery in progress
Apr 30 17:58:22 sablons postgres[31630]: [39-1] LOG:  redo starts at 0/B486F70
Apr 30 17:58:22 sablons postgres[31630]: [40-1] PANIC:  could not create relation 123703/16660: No such file or directory
Apr 30 17:58:22 sablons postgres[31604]: [34-1] LOG:  startup process (PID 31630) was terminated by signal 6
Apr 30 17:58:22 sablons postgres[31604]: [35-1] LOG:  aborting startup due to startup process failure

So it is not a "clean" coredump, if there is such a thing ;-)

-- 
Fabien Coelho - coelho@cri.ensmp.fr
I think we fixed it since then.

---------------------------------------------------------------------------

Fabien COELHO wrote:
> 
> > > Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > > >>> In current (as of a couple hours ago) clean CVS tip sources, without any
> > > >>> of my local changes, I'm getting a postmaster segfault when trying to
> > > >>> connect to a nonexistent database.
> > >
> > > Alvaro, did you figure this out?  I've been mostly distracted for the
> > > past week ...
> >
> > No.  I still see the failure on my platform but I don't know what to
> > attribute it to.
> 
> I also have that for a database installation from CVS on April 17.
> 
> It also leaves the server in some incoherent state:
> 
> Apr 30 17:58:22 sablons postgres[31629]: [31-1] FATAL:  database "toto" does not exist
> Apr 30 17:58:22 sablons postgres[31604]: [31-1] LOG:  server process (PID 31629) was terminated by signal 11
> Apr 30 17:58:22 sablons postgres[31604]: [32-1] LOG:  terminating any other active server processes
> Apr 30 17:58:22 sablons postgres[31532]: [31-1] WARNING:  terminating connection because of crash of another server process
> Apr 30 17:58:22 sablons postgres[31532]: [31-2] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server
> Apr 30 17:58:22 sablons postgres[31532]: [31-3]  process exited abnormally and possibly corrupted shared memory.
> Apr 30 17:58:22 sablons postgres[31532]: [31-4] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
> Apr 30 17:58:22 sablons postgres[31604]: [33-1] LOG:  all server processes terminated; reinitializing
> Apr 30 17:58:22 sablons postgres[31630]: [34-1] LOG:  database system was interrupted at 2004-04-30 17:54:56 CEST
> Apr 30 17:58:22 sablons postgres[31630]: [35-1] LOG:  checkpoint record is at 0/B486F30
> Apr 30 17:58:22 sablons postgres[31630]: [36-1] LOG:  redo record is at 0/B486F30; undo record is at 0/0; shutdown TRUE
> Apr 30 17:58:22 sablons postgres[31630]: [37-1] LOG:  next transaction ID: 10769; next OID: 123703
> Apr 30 17:58:22 sablons postgres[31630]: [38-1] LOG:  database system was not properly shut down; automatic recovery in progress
> Apr 30 17:58:22 sablons postgres[31630]: [39-1] LOG:  redo starts at 0/B486F70
> Apr 30 17:58:22 sablons postgres[31630]: [40-1] PANIC:  could not create relation 123703/16660: No such file or directory
> Apr 30 17:58:22 sablons postgres[31604]: [34-1] LOG:  startup process (PID 31630) was terminated by signal 6
> Apr 30 17:58:22 sablons postgres[31604]: [35-1] LOG:  aborting startup due to startup process failure
> 
> So it is not a "clean" coredump, if some may be;-)
> 
> -- 
> Fabien Coelho - coelho@cri.ensmp.fr
> 

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Alvaro Herrera Munoz <alvherre@dcc.uchile.cl> writes:
> strace'ing the postmaster suggested to me that the dbname string in
> utils/init/postinit.c, the InitPostgres function, is the culprit.
> In fact, if I apply the following patch to tcop/postgres.c the
> whole thing stops happening.
>         else if (argc - optind == 1)
> !           dbname = argv[optind];
> ...
>         else if (argc - optind == 1)
> !           dbname = pstrdup(argv[optind]);
Surely this is a red herring --- that code path does not even execute
except in the case of a standalone backend.
        regards, tom lane
			
On Fri, Apr 30, 2004 at 11:36:36PM -0400, Tom Lane wrote:
> Alvaro Herrera Munoz <alvherre@dcc.uchile.cl> writes:
> > strace'ing the postmaster suggested to me that the dbname string in
> > utils/init/postinit.c, the InitPostgres function, is the culprit.
> > In fact, if I apply the following patch to tcop/postgres.c the
> > whole thing stops happening.
> 
> >         else if (argc - optind == 1)
> > !           dbname = argv[optind];
> > ...
> >         else if (argc - optind == 1)
> > !           dbname = pstrdup(argv[optind]);
> 
> Surely this is a red herring --- that code path does not even execute
> except in the case of a standalone backend.

Yes, I figured that out later (the normal path uses -p instead).  In fact
I then took out the pstrdup() and the fault wasn't happening; so I
recompiled all over again, without the pstrdup, and it was back.

I think maybe there's something clobbering argv.  I thought about tracing
that with gdb but never got to it.  I will do that now and report back.

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Early and provident fear is the mother of safety" (E. Burke)
On Fri, Apr 23, 2004 at 05:10:34PM -0400, Alvaro Herrera wrote:

> In current (as of a couple hours ago) clean CVS tip sources, without any
> of my local changes, I'm getting a postmaster segfault when trying to
> connect to a nonexistent database.

Just to follow up, I no longer see this problem in CVS tip.  I don't
know if somebody fixed it on purpose, but my system is the same as
before and I can't reproduce the bug anymore.

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Man never knows what he is capable of until he tries" (C. Dickens)