Обсуждение: Possible bug in PostgreSQL 8.3beta4 (postgres process segfaults)
While developing a Python program I encountered a situation that makes on of the PostgreSQL 8.3b4's processes crash badly with a segfault. Let me explain. After enabling "debug5" as log level I was able to locate and reproduce the exact sequence of SQL queries made against my test database and crash postgres again using only "psql". See for yourself: foodb=# \o out.txt foodb=# SELECT * FROM foobaz.instrument WHERE code = 'TEST0118'; LOG: duration: 3.683 ms statement: SELECT * FROM foobaz.instrument WHERE code = 'TEST0118'; foodb=# SELECT xpath('//sp:description/text()', content, ARRAY[ARRAY['sp', 'http://www.foobaz.com/']]) foodb-# FROM foobaz.instrument WHERE code = 'TEST0018'; server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. The pgsql log file contains these error logs (the pid 753 is a postgres child process): """"""""" [loluyede:foodb:2007-12-11 14:38:20.992 CET]LOG: 00000: duration: 0.090 ms statement: BEGIN; SET TRANSACTION ISOLATION LEVEL READ COMMITTED [loluyede:foodb:2007-12-11 14:38:20.992 CET]LOCATION: exec_simple_query, postgres.c:1040 [::2007-12-11 14:38:20.993 CET]LOG: 00000: server process (PID 753) was terminated by signal 11: Segmentation fault [::2007-12-11 14:38:20.993 CET]LOCATION: LogChildExit, postmaster.c:2510 [::2007-12-11 14:38:20.993 CET]LOG: 00000: terminating any other active server processes [::2007-12-11 14:38:20.993 CET]LOCATION: HandleChildCrash, postmaster.c:2355 [::2007-12-11 14:38:20.997 CET]LOG: 00000: all server processes terminated; reinitializing [::2007-12-11 14:38:20.997 CET]LOCATION: PostmasterStateMachine, postmaster.c:2663 [::2007-12-11 14:38:21.000 CET]LOG: 00000: database system was interrupted; last known up at 2007-12-11 14:38:02 CET [::2007-12-11 14:38:21.000 CET]LOCATION: StartupXLOG, xlog.c:4789 [::2007-12-11 14:38:21.000 CET]DEBUG: 00000: checkpoint record is at 0/2AB634A0 """""""" The out.txt file contains the output of the "SELECT *" statement code | kind | last_modified | created | modified_by | created_by | content ----------+------------+-------------------------------+------------------------------+-------------+------------+---------------------------------------------------------------------- TEST0118 | bazfoo | 2007-12-12 08:17:01.846919+01 | 2007-12-12 08:17:00.32337+01 | anonymous | anonymous | <xml ... > (1 row) The "instrument" table has the following schema: """ CREATE TABLE foobaz.instrument ( -- Inherited: code character varying(32) NOT NULL, -- Inherited: kind character varying(32) NOT NULL DEFAULT 'bazfoo'::character varying, last_modified timestamp with time zone NOT NULL, created timestamp with time zone NOT NULL DEFAULT ('now'::text)::timestamp(3) with time zone, modified_by character varying(64), created_by character varying(64) NOT NULL DEFAULT 'anonymous'::character varying, content xml NOT NULL, CONSTRAINT instrument_pkey PRIMARY KEY (code), CONSTRAINT instrument_kind_check CHECK (kind::text = 'bazfoo'::text) ) INHERITS (foobaz.instrument_base) WITH (OIDS=FALSE); """ Its "parent" table is instrument_base and has the following schema: """ CREATE TABLE foobaz.instrument_base ( code character varying(32) NOT NULL, kind character varying(32) NOT NULL, CONSTRAINT instrument_base_pkey PRIMARY KEY (code) ) WITH (OIDS=FALSE); """ PostgreSQL is again 8.3beta 4 on Ubuntu Linux 7.10. It's compiled with the following flags: ./configure --with-python --with-openssl --with-pam --with-libxml --with-libxslt --enable-thread-safety --enable-debug libxml is 2.6.30, libxslt is 1.1.21 I think that is all. Is it a bug or am I doing something wrong? -- Lawrence, oluyede.org - neropercaso.it "It is difficult to get a man to understand something when his salary depends on not understanding it" - Upton Sinclair
Hello this is bug. Please send backtrace from core file. Regards Pavel Stehule On 12/12/2007, Lawrence Oluyede <l.oluyede@gmail.com> wrote: > While developing a Python program I encountered a situation that makes > on of the PostgreSQL 8.3b4's processes crash badly with a segfault. > Let me explain. > > After enabling "debug5" as log level I was able to locate and > reproduce the exact sequence of SQL queries made against my test > database > and crash postgres again using only "psql". See for yourself: > > > foodb=# \o out.txt > foodb=# SELECT * FROM foobaz.instrument WHERE code = 'TEST0118'; > LOG: duration: 3.683 ms statement: SELECT * FROM foobaz.instrument > WHERE code = 'TEST0118'; > foodb=# SELECT xpath('//sp:description/text()', content, > ARRAY[ARRAY['sp', 'http://www.foobaz.com/']]) > foodb-# FROM foobaz.instrument WHERE code = 'TEST0018'; > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > The connection to the server was lost. Attempting reset: Failed. > > The pgsql log file contains these error logs (the pid 753 is a > postgres child process): > > """"""""" > [loluyede:foodb:2007-12-11 14:38:20.992 CET]LOG: 00000: duration: > 0.090 ms statement: BEGIN; SET TRANSACTION ISOLATION LEVEL READ > COMMITTED > [loluyede:foodb:2007-12-11 14:38:20.992 CET]LOCATION: > exec_simple_query, postgres.c:1040 > [::2007-12-11 14:38:20.993 CET]LOG: 00000: server process (PID 753) > was terminated by signal 11: Segmentation fault > [::2007-12-11 14:38:20.993 CET]LOCATION: LogChildExit, postmaster.c:2510 > [::2007-12-11 14:38:20.993 CET]LOG: 00000: terminating any other > active server processes > [::2007-12-11 14:38:20.993 CET]LOCATION: HandleChildCrash, postmaster.c:2355 > [::2007-12-11 14:38:20.997 CET]LOG: 00000: all server processes > terminated; reinitializing > [::2007-12-11 14:38:20.997 CET]LOCATION: PostmasterStateMachine, > postmaster.c:2663 > [::2007-12-11 14:38:21.000 CET]LOG: 00000: database system was > interrupted; last known up at 2007-12-11 14:38:02 CET > [::2007-12-11 14:38:21.000 CET]LOCATION: StartupXLOG, xlog.c:4789 > [::2007-12-11 14:38:21.000 CET]DEBUG: 00000: checkpoint record is at 0/2AB634A0 > """""""" > > The out.txt file contains the output of the "SELECT *" statement > > code | kind | last_modified | > created | modified_by | created_by | > content > ----------+------------+-------------------------------+------------------------------+-------------+------------+---------------------------------------------------------------------- > TEST0118 | bazfoo | 2007-12-12 08:17:01.846919+01 | 2007-12-12 > 08:17:00.32337+01 | anonymous | anonymous | <xml ... > > (1 row) > > The "instrument" table has the following schema: > > """ > CREATE TABLE foobaz.instrument > ( > -- Inherited: code character varying(32) NOT NULL, > -- Inherited: kind character varying(32) NOT NULL DEFAULT > 'bazfoo'::character varying, > last_modified timestamp with time zone NOT NULL, > created timestamp with time zone NOT NULL DEFAULT > ('now'::text)::timestamp(3) with time zone, > modified_by character varying(64), > created_by character varying(64) NOT NULL DEFAULT > 'anonymous'::character varying, > content xml NOT NULL, > CONSTRAINT instrument_pkey PRIMARY KEY (code), > CONSTRAINT instrument_kind_check CHECK (kind::text = 'bazfoo'::text) > ) > INHERITS (foobaz.instrument_base) > WITH (OIDS=FALSE); > """ > > Its "parent" table is instrument_base and has the following schema: > > """ > CREATE TABLE foobaz.instrument_base > ( > code character varying(32) NOT NULL, > kind character varying(32) NOT NULL, > CONSTRAINT instrument_base_pkey PRIMARY KEY (code) > ) > WITH (OIDS=FALSE); > """ > > PostgreSQL is again 8.3beta 4 on Ubuntu Linux 7.10. > It's compiled with the following flags: > ./configure --with-python --with-openssl --with-pam --with-libxml > --with-libxslt --enable-thread-safety --enable-debug > > libxml is 2.6.30, libxslt is 1.1.21 > > I think that is all. > > Is it a bug or am I doing something wrong? > > -- > Lawrence, oluyede.org - neropercaso.it > "It is difficult to get a man to understand > something when his salary depends on not > understanding it" - Upton Sinclair > > ---------------------------(end of broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster >
Here it is: postgres$ gdb /usr/local/pgsql/bin/postgres core.1600 (gdb) bt #0 0x082c101c in pfree (pointer=0x8472f00) at mcxt.c:591 #1 0xb7e46513 in xmlCleanupCharEncodingHandlers () from /usr/lib/libxml2.so.2 #2 0xb7e4f091 in xmlCleanupParser () from /usr/lib/libxml2.so.2 #3 0x082940e4 in xpath (fcinfo=0xbfee59c4) at xml.c:3441 #4 0x0817107f in ExecMakeFunctionResult (fcache=0x84723d0, econtext=0x8472348, isNull=0x8472fa8 "", isDone=0x8472fb8) at execQual.c:1351 #5 0x0816f522 in ExecProject (projInfo=0x8472e98, isDone=0xbfee5c78) at execQual.c:4601 #6 0x08175987 in ExecScan (node=0x84722c0, accessMtd=0x817e310 <IndexNext>) at execScan.c:143 #7 0x0817e2e7 in ExecIndexScan (node=0x84722c0) at nodeIndexscan.c:147 #8 0x0816ee7a in ExecProcNode (node=0x84722c0) at execProcnode.c:338 #9 0x0816e113 in ExecutorRun (queryDesc=0x845e908, direction=ForwardScanDirection, count=0) at execMain.c:1233 #10 0x08204ea0 in PortalRunSelect (portal=0x84577a0, forward=<value optimized out>, count=0, dest=0x845d1c8) at pquery.c:943 #11 0x08205eaa in PortalRun (portal=0x84577a0, count=2147483647, isTopLevel=1 '\001', dest=0x845d1c8, altdest=0x845d1c8, completionTag=0xbfee5ee8 "") at pquery.c:797 #12 0x082013c3 in exec_simple_query ( query_string=0x8453790 "SELECT xpath('//sp:description/text()', content, ARRAY[ARRAY['sp', 'http://www.statpro.net/xml/structure/1.0']])\n FROM instrument WHERE code = 'TEST0018';") at postgres.c:963 #13 0x08202ebc in PostgresMain (argc=4, argv=<value optimized out>, username=0x83d2330 "postgres") at postgres.c:3531 #14 0x081d76cf in ServerLoop () at postmaster.c:3180 #15 0x081d83a5 in PostmasterMain (argc=3, argv=0x83ccb08) at postmaster.c:1028 #16 0x081909a0 in main (argc=3, argv=0x82c0c00) at main.c:188 (gdb) l 586 header = (StandardChunkHeader *) 587 ((char *) pointer - STANDARDCHUNKHEADERSIZE); 588 589 AssertArg(MemoryContextIsValid(header->context)); 590 591 (*header->context->methods->free_p) (header->context, pointer); 592 } 593 594 /* 595 * repalloc I also uploaded the entire core file on the net, if you want to inspect it: http://www.oluyede.org/temp/core.1600 (it's ~ 30M) thank you -- Lawrence, oluyede.org - neropercaso.it "It is difficult to get a man to understand something when his salary depends on not understanding it" - Upton Sinclair
Hello I cannot repeat this bug on my test date. Can you send some small set that reproduce the bug? Regards Pavel On 12/12/2007, Lawrence Oluyede <l.oluyede@gmail.com> wrote: > Here it is: > > postgres$ gdb /usr/local/pgsql/bin/postgres core.1600 > (gdb) bt > #0 0x082c101c in pfree (pointer=0x8472f00) at mcxt.c:591 > #1 0xb7e46513 in xmlCleanupCharEncodingHandlers () from /usr/lib/libxml2.so.2 > #2 0xb7e4f091 in xmlCleanupParser () from /usr/lib/libxml2.so.2 > #3 0x082940e4 in xpath (fcinfo=0xbfee59c4) at xml.c:3441 > #4 0x0817107f in ExecMakeFunctionResult (fcache=0x84723d0, > econtext=0x8472348, isNull=0x8472fa8 "", isDone=0x8472fb8) at > execQual.c:1351 > #5 0x0816f522 in ExecProject (projInfo=0x8472e98, isDone=0xbfee5c78) > at execQual.c:4601 > #6 0x08175987 in ExecScan (node=0x84722c0, accessMtd=0x817e310 > <IndexNext>) at execScan.c:143 > #7 0x0817e2e7 in ExecIndexScan (node=0x84722c0) at nodeIndexscan.c:147 > #8 0x0816ee7a in ExecProcNode (node=0x84722c0) at execProcnode.c:338 > #9 0x0816e113 in ExecutorRun (queryDesc=0x845e908, > direction=ForwardScanDirection, count=0) at execMain.c:1233 > #10 0x08204ea0 in PortalRunSelect (portal=0x84577a0, forward=<value > optimized out>, count=0, dest=0x845d1c8) at pquery.c:943 > #11 0x08205eaa in PortalRun (portal=0x84577a0, count=2147483647, > isTopLevel=1 '\001', dest=0x845d1c8, altdest=0x845d1c8, > completionTag=0xbfee5ee8 "") at pquery.c:797 > #12 0x082013c3 in exec_simple_query ( > query_string=0x8453790 "SELECT xpath('//sp:description/text()', > content, ARRAY[ARRAY['sp', > 'http://www.statpro.net/xml/structure/1.0']])\n FROM instrument > WHERE code = 'TEST0018';") at postgres.c:963 > #13 0x08202ebc in PostgresMain (argc=4, argv=<value optimized out>, > username=0x83d2330 "postgres") at postgres.c:3531 > #14 0x081d76cf in ServerLoop () at postmaster.c:3180 > #15 0x081d83a5 in PostmasterMain (argc=3, argv=0x83ccb08) at postmaster.c:1028 > #16 0x081909a0 in main (argc=3, argv=0x82c0c00) at main.c:188 > (gdb) l > 586 header = (StandardChunkHeader *) > 587 ((char *) pointer - STANDARDCHUNKHEADERSIZE); > 588 > 589 AssertArg(MemoryContextIsValid(header->context)); > 590 > 591 (*header->context->methods->free_p) (header->context, pointer); > 592 } > 593 > 594 /* > 595 * repalloc > > I also uploaded the entire core file on the net, if you want to inspect it: > http://www.oluyede.org/temp/core.1600 (it's ~ 30M) > > thank you > > > > -- > Lawrence, oluyede.org - neropercaso.it > "It is difficult to get a man to understand > something when his salary depends on not > understanding it" - Upton Sinclair >
sorry s/date/data/g Pavel
On Dec 12, 2007 1:29 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > I cannot repeat this bug on my test date. Can you send some small set > that reproduce the bug? I dumped the entire table here: http://www.oluyede.org/temp/instrument.txt Let me know if you need anything else -- Lawrence, oluyede.org - neropercaso.it "It is difficult to get a man to understand something when his salary depends on not understanding it" - Upton Sinclair
"Lawrence Oluyede" <l.oluyede@gmail.com> writes: > On Dec 12, 2007 1:29 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> I cannot repeat this bug on my test date. Can you send some small set >> that reproduce the bug? > I dumped the entire table here: http://www.oluyede.org/temp/instrument.txt I didn't have any luck reproducing the crash either. Please try to put together a SQL script that reproduces the crash for you, starting from an empty database. It's not exactly clear to me whether I'm doing the same things you did. regards, tom lane
"Lawrence Oluyede" <l.oluyede@gmail.com> writes: > Here it is: > postgres$ gdb /usr/local/pgsql/bin/postgres core.1600 > (gdb) bt > #0 0x082c101c in pfree (pointer=0x8472f00) at mcxt.c:591 > #1 0xb7e46513 in xmlCleanupCharEncodingHandlers () from /usr/lib/libxml2.so.2 > #2 0xb7e4f091 in xmlCleanupParser () from /usr/lib/libxml2.so.2 > #3 0x082940e4 in xpath (fcinfo=0xbfee59c4) at xml.c:3441 I think we finally found the problem. Please see if things are more stable with 8.3RC1 plus this patch: http://archives.postgresql.org/pgsql-committers/2008-01/msg00190.php regards, tom lane
> I think we finally found the problem. Please see if things are more > stable with 8.3RC1 plus this patch: > > http://archives.postgresql.org/pgsql-committers/2008-01/msg00190.php Sorry for being so late. I'm currently using PostgreSQL RC2 at work and I notice no crashes. Thanks! -- Lawrence, stacktrace.it - oluyede.org - neropercaso.it "It is difficult to get a man to understand something when his salary depends on not understanding it" - Upton Sinclair