Обсуждение: segmentation fault in execTuples.c#ExecStoreVirtualTuple

Поиск
Список
Период
Сортировка

segmentation fault in execTuples.c#ExecStoreVirtualTuple

От
Manuel Kniep
Дата:
Hi, 

we are running postges 9.3.5 on gentoo linux kernel 3.16.5, compiled with gcc 4.8.3 

getting a segfault from time to time with the below core dump.

The error happens only on our production system and is not reproducible a second run after
database recovery always succeed without any error.

From the coredump I can’t imagine where this error might come from.

Any ideas ?

Thanks

Manuel


#0  0x0000000000608f74 in ExecStoreVirtualTuple (slot=0x24a40bd00) at execTuples.c:508
#1  0x00000000005f9d6c in ExecFilterJunk (junkfilter=0x24a40cf50, slot=0x3ecbac18) at execJunk.c:318
#2  0x000000000061eba4 in ExecModifyTable (node=0x59ac150) at nodeModifyTable.c:988
#3  0x00000000005fda57 in ExecProcNode (node=0x59ac150) at execProcnode.c:377
#4  0x0000000000622a62 in CteScanNext (node=0x29500050) at nodeCtescan.c:103
#5  0x00000000006084e7 in ExecScanFetch (node=0x29500050, accessMtd=0x622938 <CteScanNext>, recheckMtd=0x622ac8
<CteScanRecheck>)at execScan.c:82 
#6  0x0000000000608556 in ExecScan (node=0x29500050, accessMtd=0x622938 <CteScanNext>, recheckMtd=0x622ac8
<CteScanRecheck>)at execScan.c:132 
#7  0x0000000000622afd in ExecCteScan (node=0x29500050) at nodeCtescan.c:155
#8  0x00000000005fdb53 in ExecProcNode (node=0x29500050) at execProcnode.c:434
#9  0x00000000005fbcaf in ExecutePlan (estate=0x9280ef0, planstate=0x29500050, operation=CMD_SELECT, sendTuples=1
'\001',numberTuples=0, direction=ForwardScanDirection, dest=0x13b4530) 
    at execMain.c:1473
#10 0x00000000005fa12b in standard_ExecutorRun (queryDesc=0x228e58a78, direction=ForwardScanDirection, count=0) at
execMain.c:307
#11 0x00000000005fa02d in ExecutorRun (queryDesc=0x228e58a78, direction=ForwardScanDirection, count=0) at
execMain.c:255
#12 0x000000000059449a in ExecCreateTableAs (stmt=0x927e3e0, 
    queryString=0x22b880bb0 "\n    CREATE TEMPORARY TABLE unpro_inc_23472 ON COMMIT DROP AS\n    WITH affected AS (\n  
  UPDATE events t SET processed = true\n      WHERE tracker_id BETWEEN $1 AND $2\n      AND ajdate(t.created_at) "...,
params=0x563d0520,completionTag=0x7ffffaa88cc0 "") at createas.c:170 
#13 0x00000000007372c0 in ProcessUtilitySlow (parsetree=0x927e3e0, 
    queryString=0x22b880bb0 "\n    CREATE TEMPORARY TABLE unpro_inc_23472 ON COMMIT DROP AS\n    WITH affected AS (\n  
  UPDATE events t SET processed = true\n      WHERE tracker_id BETWEEN $1 AND $2\n      AND ajdate(t.created_at) "...,
context=PROCESS_UTILITY_QUERY,params=0x563d0520, dest=0xc56fc0 <spi_printtupDR>, completionTag=0x7ffffaa88cc0 "") at
utility.c:1231
#14 0x000000000073688f in standard_ProcessUtility (parsetree=0x927e3e0, 
    queryString=0x22b880bb0 "\n    CREATE TEMPORARY TABLE unpro_inc_23472 ON COMMIT DROP AS\n    WITH affected AS (\n  
  UPDATE events t SET processed = true\n      WHERE tracker_id BETWEEN $1 AND $2\n      AND ajdate(t.created_at) "...,
context=PROCESS_UTILITY_QUERY,params=0x563d0520, dest=0xc56fc0 <spi_printtupDR>, completionTag=0x7ffffaa88cc0 "") at
utility.c:829
#15 0x0000000000735ba3 in ProcessUtility (parsetree=0x927e3e0, 
    queryString=0x22b880bb0 "\n    CREATE TEMPORARY TABLE unpro_inc_23472 ON COMMIT DROP AS\n    WITH affected AS (\n  
  UPDATE events t SET processed = true\n      WHERE tracker_id BETWEEN $1 AND $2\n      AND ajdate(t.created_at) "...,
context=PROCESS_UTILITY_QUERY,params=0x563d0520, dest=0xc56fc0 <spi_printtupDR>, completionTag=0x7ffffaa88cc0 "") at
utility.c:309
#16 0x000000000062e897 in _SPI_execute_plan (plan=0x7ffffaa88d70, paramLI=0x563d0520, snapshot=0x0,
crosscheck_snapshot=0x0,read_only=0 '\000', fire_triggers=1 '\001', tcount=0) at spi.c:2160 
#17 0x000000000062bb9d in SPI_execute_with_args (
    src=0x22b880bb0 "\n    CREATE TEMPORARY TABLE unpro_inc_23472 ON COMMIT DROP AS\n    WITH affected AS (\n    
 UPDATEevents t SET processed = true\n      WHERE tracker_id BETWEEN $1 AND $2\n      AND ajdate(t.created_at) "...,
nargs=4,argtypes=0x22b880ab0, Values=0x22b880af0, Nulls=0x22b881010 "    \002", read_only=0 '\000', tcount=0) at
spi.c:537
#18 0x00007f3635560ed0 in exec_stmt_dynexecute (estate=0x7ffffaa890d0, stmt=0xe2f140) at pl_exec.c:3462
#19 0x00007f363555cfe9 in exec_stmt (estate=0x7ffffaa890d0, stmt=0xe2f140) at pl_exec.c:1450
#20 0x00007f363555cd05 in exec_stmts (estate=0x7ffffaa890d0, stmts=0xe2ef58) at pl_exec.c:1345
#21 0x00007f363555cbb0 in exec_stmt_block (estate=0x7ffffaa890d0, block=0xe2fa80) at pl_exec.c:1283
#22 0x00007f363555ab97 in plpgsql_exec_function (func=0xe1f5b0, fcinfo=0x11daac0) at pl_exec.c:321
#23 0x00007f36355552be in plpgsql_call_handler (fcinfo=0x11daac0) at pl_handler.c:129
#24 0x000000000060114d in ExecMakeFunctionResultNoSets (fcache=0x11daa50, econtext=0x346e2840, isNull=0x7ffffaa89473
"",isDone=0x0) at execQual.c:1992 
#25 0x0000000000601b4b in ExecEvalFunc (fcache=0x11daa50, econtext=0x346e2840, isNull=0x7ffffaa89473 "", isDone=0x0) at
execQual.c:2383
#26 0x00007f3635563f3b in exec_eval_simple_expr (estate=0x7ffffaa89740, expr=0xdc4ab0, result=0x7ffffaa89428,
isNull=0x7ffffaa89473"", rettype=0x7ffffaa89474) at pl_exec.c:5130 
#27 0x00007f363556365e in exec_eval_expr (estate=0x7ffffaa89740, expr=0xdc4ab0, isNull=0x7ffffaa89473 "",
rettype=0x7ffffaa89474)at pl_exec.c:4717 
#28 0x00007f3635561c83 in exec_assign_expr (estate=0x7ffffaa89740, target=0xdbf1d8, expr=0xdc4ab0) at pl_exec.c:3916
#29 0x00007f363555d138 in exec_stmt_assign (estate=0x7ffffaa89740, stmt=0xdc4bd0) at pl_exec.c:1494
#30 0x00007f363555ce1e in exec_stmt (estate=0x7ffffaa89740, stmt=0xdc4bd0) at pl_exec.c:1382
#31 0x00007f363555cd05 in exec_stmts (estate=0x7ffffaa89740, stmts=0xdc3e50) at pl_exec.c:1345
#32 0x00007f363555cbb0 in exec_stmt_block (estate=0x7ffffaa89740, block=0xdc64e0) at pl_exec.c:1283
#33 0x00007f363555ab97 in plpgsql_exec_function (func=0xd888c8, fcinfo=0x7ffffaa89a60) at pl_exec.c:321
#34 0x00007f36355552be in plpgsql_call_handler (fcinfo=0x7ffffaa89a60) at pl_handler.c:129
#35 0x000000000060154f in ExecMakeTableFunctionResult (funcexpr=0x34e7f488, econtext=0x34e7e580, argContext=0xf19b80,
expectedDesc=0x34e7f030,randomAccess=0 '\000') at execQual.c:2162 
#36 0x000000000061ff76 in FunctionNext (node=0x34e7e470) at nodeFunctionscan.c:65
#37 0x00000000006084e7 in ExecScanFetch (node=0x34e7e470, accessMtd=0x61fef8 <FunctionNext>, recheckMtd=0x61ffc0
<FunctionRecheck>)at execScan.c:82 
#38 0x00000000006085ce in ExecScan (node=0x34e7e470, accessMtd=0x61fef8 <FunctionNext>, recheckMtd=0x61ffc0
<FunctionRecheck>)at execScan.c:167 
#39 0x000000000061fff5 in ExecFunctionScan (node=0x34e7e470) at nodeFunctionscan.c:106
#40 0x00000000005fdb29 in ExecProcNode (node=0x34e7e470) at execProcnode.c:426
#41 0x00000000005fbcaf in ExecutePlan (estate=0x34e7e360, planstate=0x34e7e470, operation=CMD_SELECT, sendTuples=1
'\001',numberTuples=0, direction=ForwardScanDirection, dest=0xddb240) 
    at execMain.c:1473
#42 0x00000000005fa12b in standard_ExecutorRun (queryDesc=0x57b20360, direction=ForwardScanDirection, count=0) at
execMain.c:307
---Type <return> to continue, or q <return> to quit---
#43 0x00000000005fa02d in ExecutorRun (queryDesc=0x57b20360, direction=ForwardScanDirection, count=0) at execMain.c:255
#44 0x0000000000734798 in PortalRunSelect (portal=0xcdf590, forward=1 '\001', count=0, dest=0xddb240) at pquery.c:946
#45 0x0000000000734427 in PortalRun (portal=0xcdf590, count=9223372036854775807, isTopLevel=1 '\001', dest=0xddb240,
altdest=0xddb240,completionTag=0x7ffffaa8a290 "") at pquery.c:790 
#46 0x000000000072e4eb in exec_simple_query (
    query_string=0xd633b0 "SELECT 'event' as item, '2014-12-30' as date, 'Backends::Backend9' as backend, '33' as
bucket, * FROM materialize_events('2014-11-20', '2014-12-30')") at postgres.c:1048 
#47 0x0000000000732750 in PostgresMain (argc=1, argv=0xcbf648, dbname=0xcbf570 "adjust_backend9", username=0xcbf550
"postgres")at postgres.c:4007 
#48 0x00000000006d904a in BackendRun (port=0xce3a10) at postmaster.c:4011
#49 0x00000000006d8746 in BackendStartup (port=0xce3a10) at postmaster.c:3685
#50 0x00000000006d5240 in ServerLoop () at postmaster.c:1586
#51 0x00000000006d49d7 in PostmasterMain (argc=4, argv=0xcbe800) at postmaster.c:1253
#52 0x0000000000642930 in main (argc=4, argv=0xcbe800) at main.c:226


--
Manuel Kniep
co founder & Developer

++49 174 175 7069 | www.adjust.com
adjust GmbH, Saarbrücker Straße 36, 10405 Berlin, Germany

Geschäftsführer/Managing Director: Christian Henschel, Paul H. Müller
Jurisdiction: Local Court Berlin-Charlottenburg: HRB 140616 Beim






Re: segmentation fault in execTuples.c#ExecStoreVirtualTuple

От
Michael Paquier
Дата:
On Tue, Jan 6, 2015 at 12:39 AM, Manuel Kniep <manuel@adjust.com> wrote:
> Hi,
>
> we are running postges 9.3.5 on gentoo linux kernel 3.16.5, compiled with gcc 4.8.3
> Any ideas ?

> #17 0x000000000062bb9d in SPI_execute_with_args (   src=0x22b880bb0 "\n    CREATE TEMPORARY TABLE
> [...]
> #33 0x00007f363555ab97 in plpgsql_exec_function (func=0xd888c8, fcinfo=0x7ffffaa89a60) at pl_exec.c:321
> #34 0x00007f36355552be in plpgsql_call_handler (fcinfo=0x7ffffaa89a60) at pl_handler.c:129
> [...]
> #46 0x000000000072e4eb in exec_simple_query (
>     query_string=0xd633b0 "SELECT 'event' as item, '2014-12-30' as date, 'Backends::Backend9' as backend, '33' as
bucket, * FROM materialize_events('2014-11-20', '2014-12-30')") at postgres.c:1048
 
From the backtrace you are showing, you are creating a temporary table
with CREATE TABLE AS within a plpgsql function. Could you provide a
self-contained test case?
-- 
Michael



Re: segmentation fault in execTuples.c#ExecStoreVirtualTuple

От
Manuel Kniep
Дата:
On 6. Januar 2015 at 07:20:21, Michael Paquier (michael.paquier@gmail.com) wrote:
> On Tue, Jan 6, 2015 at 12:39 AM, Manuel Kniep wrote:
> > Hi,
> >
> > we are running postges 9.3.5 on gentoo linux kernel 3.16.5, compiled with gcc 4.8.3
> > Any ideas ?
>
> > #17 0x000000000062bb9d in SPI_execute_with_args (
> src=0x22b880bb0 "\n CREATE TEMPORARY TABLE
> > [...]
> > #33 0x00007f363555ab97 in plpgsql_exec_function (func=0xd888c8, fcinfo=0x7ffffaa89a60)
> at pl_exec.c:321
> > #34 0x00007f36355552be in plpgsql_call_handler (fcinfo=0x7ffffaa89a60) at pl_handler.c:129
> > [...]
> > #46 0x000000000072e4eb in exec_simple_query (
> > query_string=0xd633b0 "SELECT 'event' as item, '2014-12-30' as date, 'Backends::Backend9'
> as backend, '33' as bucket, * FROM materialize_events('2014-11-20', '2014-12-30')")
> at postgres.c:1048
> From the backtrace you are showing, you are creating a temporary table
> with CREATE TABLE AS within a plpgsql function. Could you provide a
> self-contained test case?
> --
> Michael

ok after lot’s of testing I could create a test case 
which can be found here https://gist.github.com/rapimo/3c8c1b35270e5854c524 
it’s written in ruby an depends on the gem activerecord pg and parallel 
the issue is basically the combination of writable CTE on a parent table and concurrent updates on the child table 

the test case create 3 child tables with 1e6 rows each  

CREATE TABLE test_data( 
  tracker_id  integer not null, 
  created_at timestamp, 
  processed boolean 
); 

and then runs the CTE command 

 CREATE TEMPORARY TABLE outtbl ON COMMIT DROP AS 
          WITH affected AS ( 
            UPDATE test_data t SET processed = true 
            WHERE tracker_id BETWEEN 200 AND 300 
            AND NOT processed 
            RETURNING 
              date(t.created_at) as date, tracker_id, created_at 
          ) 
          SELECT * FROM affected 

while this command is running 5 other concurrent updates do a 

UPDATE test_data SET tracker_id = 400 
      WHERE tracker_id = 200 
      and created_at = '2015-01-02 00:03:20' 
      and date(created_at) = '2015-01-02’ 

So my feeling is that the row exclusive lock is not taken properly on inherited tables. 
I tried the tastcase without inheritance and it did not segfault. 

If you don’t like ruby I might be able to rewrite the test case in bash. 

thanks 

Manuel 





Re: segmentation fault in execTuples.c#ExecStoreVirtualTuple

От
Tom Lane
Дата:
Manuel Kniep <manuel@adjust.com> writes:
> ok after lot’s of testing I could create a test case 
> which can be found here https://gist.github.com/rapimo/3c8c1b35270e5854c524 
> it’s written in ruby an depends on the gem activerecord pg and parallel 

Hm.  I don't see a segfault from this.  I do see the CREATE TEMP TABLE
command failing with "ctid is NULL", which probably shouldn't be happening
... but no segfault.  I tried with current HEAD and with current 9.3
branch tip, so it's possible there's been some relevant bugfix since
9.3.5.

> So my feeling is that the row exclusive lock is not taken properly on inherited tables. 

I think it's much more likely something to do with EvalPlanQual.  There
have been some bugs fixed in that area since 9.3.5, but none of them
look real promising as an explanation for this ...
        regards, tom lane



Re: segmentation fault in execTuples.c#ExecStoreVirtualTuple

От
Tom Lane
Дата:
I wrote:
> Manuel Kniep <manuel@adjust.com> writes:
>> ok after lot’s of testing I could create a test case 
>> which can be found here https://gist.github.com/rapimo/3c8c1b35270e5854c524 
>> it’s written in ruby an depends on the gem activerecord pg and parallel 

> Hm.  I don't see a segfault from this.  I do see the CREATE TEMP TABLE
> command failing with "ctid is NULL", which probably shouldn't be happening
> ... but no segfault.

The reason turns out to be that this is a dangling-pointer bug, and I was
using a memory-clobber-enabled build so it was pretty predictable what the
pointer would be pointing at.  I've got no doubt that hard-to-reproduce
misbehavior, including segfaults, would ensue without CLOBBER_FREED_MEMORY
turned on.

You need this patch:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=patch;h=34668c8eca065d745bf1166a92c9efc588e7aee2
        regards, tom lane



Re: segmentation fault in execTuples.c#ExecStoreVirtualTuple

От
Manuel Kniep
Дата:
On 16. Januar 2015 at 00:57:14, Tom Lane (tgl@sss.pgh.pa.us) wrote:
> I wrote:
> > Manuel Kniep writes:
> >> ok after lot’s of testing I could create a test case
> >> which can be found here https://gist.github.com/rapimo/3c8c1b35270e5854c524
> >> it’s written in ruby an depends on the gem activerecord pg and parallel
>
> > Hm. I don't see a segfault from this. I do see the CREATE TEMP TABLE
> > command failing with "ctid is NULL", which probably shouldn't be happening
> > ... but no segfault.
>
> The reason turns out to be that this is a dangling-pointer bug, and I was
> using a memory-clobber-enabled build so it was pretty predictable what the
> pointer would be pointing at. I've got no doubt that hard-to-reproduce
> misbehavior, including segfaults, would ensue without CLOBBER_FREED_MEMORY
> turned on.
>
> You need this patch:
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=patch;h=34668c8eca065d745bf1166a92c9efc588e7aee2

thanks a lot 
we applied the patch to our servers and everything looks great so far

cheers Manuel