Обсуждение: Segfault in 9.0 inlining SRF

Поиск
Список
Период
Сортировка

Segfault in 9.0 inlining SRF

От
Brendan Jurd
Дата:
Hi folks,

I have encountered a reproducible segfault in Postgres, and confirmed
it in 9.0.1 and HEAD on three separate machines.  The bug was not
present in 8.4.  I've attached a copy of the SQL script I have been
using to induce the segfault.

With asserts enabled, I get a failed assertion:

TRAP: FailedAssertion("!(((list) == ((List *) ((void *)0)) ||
((((Node*)((list)))->type) == T_List)))", File: "list.c", Line: 130)

If I attach gdb to the server process, here's my backtrace as the
assert is fired:

#0  0x00007f0247f7f1b5 in raise () from /lib/libc.so.6
#1  0x00007f0247f805e0 in abort () from /lib/libc.so.6
#2  0x00000000006fbf6d in ExceptionalCondition (conditionName=<value
optimized out>,
    errorType=<value optimized out>, fileName=<value optimized out>,
    lineNumber=<value optimized out>) at assert.c:57
#3  0x00000000005acdbb in lappend (list=0xc49e60, datum=0xc562f0) at list.c:130
#4  0x00000000005e536f in record_plan_function_dependency
(glob=0xb6bb78, funcid=16385)
    at setrefs.c:1738
#5  0x00000000005f378f in inline_set_returning_function
(root=0xb6c140, rte=0xb6bc10)
    at clauses.c:4318
#6  0x00000000005eaced in inline_set_returning_functions (root=0xb6c140)
    at prepjointree.c:438
#7  0x00000000005e48b8 in subquery_planner (glob=<value optimized
out>, parse=0xb6b9f0,
    parent_root=0x0, hasRecursion=0 '\000', tuple_fraction=<value
optimized out>,
    subroot=<value optimized out>) at planner.c:341
#8  0x00000000005e5051 in standard_planner (parse=0xb6b9f0, cursorOptions=0,
    boundParams=0x0) at planner.c:200
#9  0x00000000006414b6 in pg_plan_query (querytree=0xb6b9f0, cursorOptions=0,
    boundParams=0x0) at postgres.c:757
#10 0x00000000006415b4 in pg_plan_queries (querytrees=<value optimized out>,
    cursorOptions=0, boundParams=0x0) at postgres.c:816
#11 0x00000000006425d0 in exec_simple_query (query_string=<value optimized out>)
    at postgres.c:981
#12 0x0000000000643498 in PostgresMain (argc=<value optimized out>,
    argv=<value optimized out>, username=<value optimized out>) at
postgres.c:3869
#13 0x0000000000607ba1 in BackendRun () at postmaster.c:3556
#14 BackendStartup () at postmaster.c:3241
#15 ServerLoop () at postmaster.c:1432
#16 0x000000000060a4ad in PostmasterMain (argc=3, argv=0xb645e0) at
postmaster.c:1093
#17 0x00000000005a8b70 in main (argc=3, argv=0xb645d0) at main.c:188

And then the postmaster outputs:

LOG:  server process (PID 18343) was terminated by signal 6: Aborted
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process
exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
LOG:  all server processes terminated; reinitializing
LOG:  database system was interrupted; last known up at 2010-10-25 07:02:45 EST
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  consistent recovery state reached at 0/62C768
LOG:  redo starts at 0/62C768
LOG:  invalid magic number 0000 in log file 0, segment 0, offset 6520832
LOG:  redo done at 0/6370B8
LOG:  database system is ready to accept connections

I had a go at investigating the cause of the bug, but didn't have much
success as I'm not at all familiar with the guts of the optimizer.

Cheers,
BJ

Вложения

Re: Segfault in 9.0 inlining SRF

От
Tom Lane
Дата:
Brendan Jurd <direvus@gmail.com> writes:
> I have encountered a reproducible segfault in Postgres, and confirmed
> it in 9.0.1 and HEAD on three separate machines.  The bug was not
> present in 8.4.  I've attached a copy of the SQL script I have been
> using to induce the segfault.
> ...
> I had a go at investigating the cause of the bug, but didn't have much
> success as I'm not at all familiar with the guts of the optimizer.

Looks like the invalItems list has been clobbered:

(gdb) p *root->glob->invalItems
$6 = {type = 2139062143, length = 2139062143, head = 0x7f7f7f7f,  tail = 0x7f7f7f7f}

I'm guessing it was modified in the temporary memory context and not
properly copied out to the parent context when we finished inlining
the function.
        regards, tom lane


Re: Segfault in 9.0 inlining SRF

От
Brendan Jurd
Дата:
On 25 October 2010 07:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Looks like the invalItems list has been clobbered:
>
> (gdb) p *root->glob->invalItems
> $6 = {type = 2139062143, length = 2139062143, head = 0x7f7f7f7f,
>  tail = 0x7f7f7f7f}
>
> I'm guessing it was modified in the temporary memory context and not
> properly copied out to the parent context when we finished inlining
> the function.

I note that the error goes away if I do any of the following with my test case:
a) define the SRF as having no arguments, orb) make the argument passed into the SRF a constant rather than a
call to a function, orc) make the argument function (year) itself take no arguments, ord) make the argument passed to
yeara constant rather than current_date, ore) define year as VOLATILE, orf) put the call to segfault_setof into a
subqueryin the FROM clause. 

So this only manifests under a pretty specific type of query tree; a
function call as the argument to another function call, which is an
argument to an inlinable SRF in FROM.

Cheers,
BJ


Re: Segfault in 9.0 inlining SRF

От
Brendan Jurd
Дата:
On 25 October 2010 07:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Brendan Jurd <direvus@gmail.com> writes:
>> I have encountered a reproducible segfault in Postgres ...
>
> Looks like the invalItems list has been clobbered:
>
> (gdb) p *root->glob->invalItems
> $6 = {type = 2139062143, length = 2139062143, head = 0x7f7f7f7f,
>  tail = 0x7f7f7f7f}
>
> I'm guessing it was modified in the temporary memory context and not
> properly copied out to the parent context when we finished inlining
> the function.
>

Hi Tom,

Thanks for the hint; I found that the attached patch resolved my
specific segfault, but I am wondering whether it goes far enough.  The
patch just copies invalItems up out of the temporary context before it
is deleted.  Could there also be changes to other elements of
PlannerGlobal that need to be saved?  Should we in fact be copying out
the whole of PlannerGlobal each time, and would that necessitate a new
copyfunc for it?

Cheers,
BJ

Вложения

Re: Segfault in 9.0 inlining SRF

От
Tom Lane
Дата:
Brendan Jurd <direvus@gmail.com> writes:
> On 25 October 2010 07:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm guessing it was modified in the temporary memory context and not
>> properly copied out to the parent context when we finished inlining
>> the function.

> Thanks for the hint; I found that the attached patch resolved my
> specific segfault, but I am wondering whether it goes far enough.  The
> patch just copies invalItems up out of the temporary context before it
> is deleted.  Could there also be changes to other elements of
> PlannerGlobal that need to be saved?  Should we in fact be copying out
> the whole of PlannerGlobal each time, and would that necessitate a new
> copyfunc for it?

Well, it definitely doesn't go far enough, because the invalItems list
has to be restored to its original state if we fail to inline at some
point after calling eval_const_expressions.  I'm currently testing the
attached patch.

I thought about whether we need something more general, but for the
moment I think this is sufficient; eval_const_expressions has only
very limited reason to examine the PlannerInfo data at all, and less
reason to modify it.  Copying the whole structure would be overkill.
Moreover, it wouldn't do anything to improve modularity anyhow: this
function would still have to know which parts of the structure to copy
back to the top level, and which not.  So it'd still need to know
quite a bit about exactly what eval_const_expressions is doing.
        regards, tom lane


diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 13e89ec6678f01e7baf39f722aa960bf4550261b..00156d8eeeadfc18ad7680d283dc14a4db641f62 100644
*** a/src/backend/optimizer/util/clauses.c
--- b/src/backend/optimizer/util/clauses.c
*************** rowtype_field_matches(Oid rowtypeid, int
*** 2018,2028 ****  * will not be pre-evaluated here, although we will reduce their  * arguments as far as possible.  *
* We assume that the tree has already been type-checked and contains  * only operators and functions that are
reasonableto try to execute.  *  * NOTE: "root" can be passed as NULL if the caller never wants to do any
 
!  * Param substitutions.  *  * NOTE: the planner assumes that this will always flatten nested AND and  * OR clauses
intoN-argument form.  See comments in prepqual.c.
 
--- 2018,2033 ----  * will not be pre-evaluated here, although we will reduce their  * arguments as far as possible.
*
+  * Whenever a function is eliminated from the expression by means of
+  * constant-expression evaluation or inlining, we add the function's
+  * OID to root->glob->invalItems.  This ensures the plan is known to
+  * depend on such functions, even though they aren't referenced anymore.
+  *  * We assume that the tree has already been type-checked and contains  * only operators and functions that are
reasonableto try to execute.  *  * NOTE: "root" can be passed as NULL if the caller never wants to do any
 
!  * Param substitutions nor receive info about inlined functions.  *  * NOTE: the planner assumes that this will
alwaysflatten nested AND and  * OR clauses into N-argument form.  See comments in prepqual.c.
 
*************** inline_set_returning_function(PlannerInf
*** 4095,4100 ****
--- 4100,4106 ----     bool        modifyTargetList;     MemoryContext oldcxt;     MemoryContext mycxt;
+     List       *saveInvalItems;     inline_error_callback_arg callback_arg;     ErrorContextCallback sqlerrcontext;
 List       *raw_parsetree_list;
 
*************** inline_set_returning_function(PlannerInf
*** 4181,4186 ****
--- 4187,4202 ----                                   ALLOCSET_DEFAULT_MAXSIZE);     oldcxt =
MemoryContextSwitchTo(mycxt);
 
+     /*
+      * When we call eval_const_expressions below, it might try to add items
+      * to root->glob->invalItems.  Since it is running in the temp context,
+      * those items will be in that context, and will need to be copied out
+      * if we're successful.  Temporarily reset the list so that we can keep
+      * those items separate from the pre-existing list contents.
+      */
+     saveInvalItems = root->glob->invalItems;
+     root->glob->invalItems = NIL;
+      /* Fetch the function body */     tmp = SysCacheGetAttr(PROCOID,                           func_tuple,
*************** inline_set_returning_function(PlannerInf
*** 4307,4312 ****
--- 4323,4331 ----      querytree = copyObject(querytree); 
+     root->glob->invalItems = list_concat(saveInvalItems,
+                                          copyObject(root->glob->invalItems));
+      MemoryContextDelete(mycxt);     error_context_stack = sqlerrcontext.previous;     ReleaseSysCache(func_tuple);
*************** inline_set_returning_function(PlannerInf
*** 4322,4327 ****
--- 4341,4347 ----     /* Here if func is not inlinable: release temp memory and return NULL */ fail:
MemoryContextSwitchTo(oldcxt);
+     root->glob->invalItems = saveInvalItems;     MemoryContextDelete(mycxt);     error_context_stack =
sqlerrcontext.previous;    ReleaseSysCache(func_tuple);
 


Re: Segfault in 9.0 inlining SRF

От
Brendan Jurd
Дата:
On 26 October 2010 03:42, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Brendan Jurd <direvus@gmail.com> writes:
>> Thanks for the hint; I found that the attached patch resolved my
>> specific segfault, but I am wondering whether it goes far enough.
>
> Well, it definitely doesn't go far enough, because the invalItems list
> has to be restored to its original state if we fail to inline at some
> point after calling eval_const_expressions.

Point taken.

> I thought about whether we need something more general, but for the
> moment I think this is sufficient; eval_const_expressions has only
> very limited reason to examine the PlannerInfo data at all, and less
> reason to modify it.  Copying the whole structure would be overkill.
> Moreover, it wouldn't do anything to improve modularity anyhow: this
> function would still have to know which parts of the structure to copy
> back to the top level, and which not.  So it'd still need to know
> quite a bit about exactly what eval_const_expressions is doing.

That makes sense to me.

This whole business of passing around global pointers while switching
memory contexts seems like an optimal breeding-ground for bugs.  It
would be nice to come up with a more ... well, "global" way to manage
PlannerGlobal.  To me it suggests something along the lines of a
dedicated MemoryContext in which PlannerGlobal and all its members
live, and you operate on PlannerGlobal by calling methods, rather than
by directly twiddling its pointers.

But, that is probably way over the top for this, given its narrow area
of effect.

Cheers,
BJ


Re: Segfault in 9.0 inlining SRF

От
Tom Lane
Дата:
Brendan Jurd <direvus@gmail.com> writes:
> This whole business of passing around global pointers while switching
> memory contexts seems like an optimal breeding-ground for bugs.

Yeah.  If it were to get significantly more complicated than this,
the best solution IMO would be to give up on trying to use a temporary
memory context during function inlining, and just accept that whatever
memory we chew up there is going to be leaked for the duration of
planning.
        regards, tom lane