Thread: Core dump with nested CREATE TEMP TABLE


Core dump with nested CREATE TEMP TABLE

From: Jim Nasby
Date: Thu, 27 Aug 2015 19:45:12 -0500
I don't have an independent reproduction yet (though make test in [1]
should reproduce this). I haven't actually been able to reproduce it by
hand yet, but pgTap has something to do with this. This affects at
least 9.4 and a fairly recent HEAD.


-- Bits from top of test/sql/base.sql
\i test/helpers/setup.sql

SET ROLE = DEFAULT;
CREATE ROLE test_role;
GRANT USAGE ON SCHEMA tap TO test_role;
GRANT test_role TO test_factory__owner;

CREATE SCHEMA test AUTHORIZATION test_role;
SET ROLE = test_role;
SET search_path = test, tap;
\i test/helpers/create.sql

SELECT tf.register(
   'customer'
   , array[
     row(
       'insert'
       , $$INSERT INTO customer VALUES (DEFAULT, 'first', 'last' )
RETURNING *$$
     )::tf.test_set
     , row(
       'function'
       , $$SELECT * FROM customer__add( 'func first', 'func last' )$$
     )::tf.test_set
   ]
);
SELECT tf.register(
   'invoice'
   , array[
       row(
         'base'
         , $$INSERT INTO invoice VALUES(
             DEFAULT
             , (tf.get( NULL::customer, 'insert' )).customer_id
             , current_date
             , current_date + 30
           ) RETURNING *$$
       )::tf.test_set
   ]
);
SELECT no_plan();
SELECT lives_ok($$SELECT * FROM tf.get( NULL::invoice, 'base' )$$);
not ok 1
# Failed test 1
#         died: 42703: column "c_data_table_name" does not exist


-- Ok, got an error, but no crash. But now...

SELECT results_eq(
   $$SELECT * FROM tf.get( NULL::invoice, 'base' )$$
   , $$VALUES( 1, 1, current_date, current_date + 30 )$$
   , 'invoice factory output'
);
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

lives_ok() and results_eq() are both pgTap functions. Definitions at [2]
and [3].
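
The property of both functions that matters here is that each runs the
tested query inside a PL/pgSQL EXCEPTION block, i.e. inside its own
subtransaction. A minimal sketch of that shape (hypothetical name and
body, much simplified from the real definitions at [2] and [3]):

-- Hypothetical, simplified lives_ok()-style helper: run the SQL inside
-- an exception block (a subtransaction) and report any error instead
-- of letting it propagate.
CREATE OR REPLACE FUNCTION my_lives_ok(sql text) RETURNS text AS $f$
BEGIN
    EXECUTE sql;
    RETURN 'ok';
EXCEPTION WHEN OTHERS THEN
    RETURN 'not ok -- died: ' || SQLSTATE || ': ' || SQLERRM;
END;
$f$ LANGUAGE plpgsql;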

I've attached a full server log from running make test, but the most
relevant bit is below:

DEBUG:  AbortSubTransaction
CONTEXT:  PL/pgSQL function results_eq(refcursor,refcursor,text) line 11 during exception cleanup
    PL/pgSQL function results_eq(text,text,text) line 9 at assignment
DEBUG:  name: unnamed; blockState:    INPROGRESS; state: INPROGR, xid/subid/cid: 1980/1/979, nestlvl: 1, children: 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999
CONTEXT:  PL/pgSQL function results_eq(refcursor,refcursor,text) line 11 during exception cleanup
    PL/pgSQL function results_eq(text,text,text) line 9 at assignment
DEBUG:  name: pg_psql_temporary_savepoint; blockState:  SUB INPROGRS; state: INPROGR, xid/subid/cid: 2000/34/979, nestlvl: 2, children:
CONTEXT:  PL/pgSQL function results_eq(refcursor,refcursor,text) line 11 during exception cleanup
    PL/pgSQL function results_eq(text,text,text) line 9 at assignment
DEBUG:  name: unnamed; blockState:  SUB INPROGRS; state: INPROGR, xid/subid/cid: 2001/35/979, nestlvl: 3, children:
CONTEXT:  PL/pgSQL function results_eq(refcursor,refcursor,text) line 11 during exception cleanup
    PL/pgSQL function results_eq(text,text,text) line 9 at assignment
TRAP: FailedAssertion("!(rebuild ? !((bool)((relation)->rd_refcnt == 0)) : ((bool)((relation)->rd_refcnt == 0)))", File: "relcache.c", Line: 2055)

[1] https://github.com/BlueTreble/test_factory/tree/crash
[2] https://github.com/theory/pgtap/blob/master/sql/pgtap.sql.in#L746
[3] https://github.com/theory/pgtap/blob/master/sql/pgtap.sql.in#L6541
which is being called by
https://github.com/theory/pgtap/blob/master/sql/pgtap.sql.in#L6591
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

Attachments

Fwd: Core dump with nested CREATE TEMP TABLE

From: Jim Nasby
Looks like a 98k file won't get through the list...


-------- Forwarded Message --------
Subject: Core dump with nested CREATE TEMP TABLE
Date: Thu, 27 Aug 2015 19:45:12 -0500
From: Jim Nasby <Jim.Nasby@BlueTreble.com>
To: Pg Hackers <pgsql-hackers@postgresql.org>


Re: Fwd: Core dump with nested CREATE TEMP TABLE

From: Michael Paquier

On Sat, Aug 29, 2015 at 5:02 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> Looks like a 98k file won't get through the list...

Is it compressed? Note that we sometimes have larger patches than that, but
perhaps those had special permissions from the admins of this list.
--
Michael

Re: Fwd: Core dump with nested CREATE TEMP TABLE

From: Tom Lane
Michael Paquier <michael.paquier@gmail.com> writes:
> On Sat, Aug 29, 2015 at 5:02 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>> Looks like a 98k file won't get through the list...

> Is it compressed? Note that we sometimes have larger patches than that, but
> perhaps those had special permissions from the admins of this list.

Messages significantly larger than that go through all the time.  Maybe
you had it marked with some weird MIME type?
        regards, tom lane



Re: Fwd: Core dump with nested CREATE TEMP TABLE

From: Jim Nasby
On 8/28/15 8:39 PM, Tom Lane wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> On Sat, Aug 29, 2015 at 5:02 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>>> Looks like a 98k file won't get through the list...
>
>> Is it compressed? Note that we sometimes have larger patches than that, but
>> perhaps those had special permissions from the admins of this list.
>
> Messages significantly larger than that go through all the time.  Maybe
> you had it marked with some weird MIME type?

Apparently the original email did go through and my MUA search just 
failed to find it. Sorry for the noise.

Original email: 
http://www.postgresql.org/message-id/55DFAF18.4060601@BlueTreble.com
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Fwd: Core dump with nested CREATE TEMP TABLE

From: Michael Paquier
<div dir="ltr"><br /><div class="gmail_extra"><br /><div class="gmail_quote">On Sat, Aug 29, 2015 at 11:18 AM, Jim
Nasby<span dir="ltr"><<a href="mailto:Jim.Nasby@bluetreble.com"
target="_blank">Jim.Nasby@bluetreble.com</a>></span>wrote:<br /><blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px#ccc solid;padding-left:1ex"><span class="">On 8/28/15 8:39 PM, Tom Lane wrote:<br /><blockquote
class="gmail_quote"style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> Michael Paquier <<a
href="mailto:michael.paquier@gmail.com"target="_blank">michael.paquier@gmail.com</a>> writes:<br /><blockquote
class="gmail_quote"style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> On Sat, Aug 29, 2015 at 5:02
AM,Jim Nasby <<a href="mailto:Jim.Nasby@bluetreble.com" target="_blank">Jim.Nasby@bluetreble.com</a>> wrote:<br
/><blockquoteclass="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> Looks like a
98kfile won't get through the list...<br /></blockquote></blockquote><br /><blockquote class="gmail_quote"
style="margin:00 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> Is it compressed? Note that we have sometimes
largerpatches than that, but<br /> perhaps those had special permissions by the admins of this list.<br
/></blockquote><br/> Messages significantly larger than that go through all the time.  Maybe<br /> you had it marked
withsome weird MIME type?<br /></blockquote><br /></span> Apparently the original email did go through and my MUA
searchjust failed to find it. Sorry for the noise.<br /><br /> Original email: <a
href="http://www.postgresql.org/message-id/55DFAF18.4060601@BlueTreble.com"rel="noreferrer"
target="_blank">http://www.postgresql.org/message-id/55DFAF18.4060601@BlueTreble.com</a></blockquote></div><br
/></div><divclass="gmail_extra">Ah, OK, you meant this file... Yes I was able to receive it as well in your original
email.I'll try to investigate further later, but Tom may beat me first. He usually does.<br />-- <br /><div
class="gmail_signature">Michael<br/></div></div></div> 

Re: Fwd: Core dump with nested CREATE TEMP TABLE

From: Tom Lane
Michael Paquier <michael.paquier@gmail.com> writes:
> Ah, OK, you meant this file... Yes I was able to receive it as well in your
> original email. I'll try to investigate further later, but Tom may beat me
> first. He usually does.

Jim had indicated the bug wasn't reproducible on the strength of that
info, so I was waiting for him to provide a more reproducible case.
        regards, tom lane



Re: Fwd: Core dump with nested CREATE TEMP TABLE

From: Jim Nasby
On 8/29/15 8:04 AM, Tom Lane wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> Ah, OK, you meant this file... Yes I was able to receive it as well in your
>> original email. I'll try to investigate further later, but Tom may beat me
>> first. He usually does.
>
> Jim had indicated the bug wasn't reproducible on the strength of that
> info, so I was waiting for him to provide a more reproducible case.

It was reproducible, just not very self-contained. [1] is a bit better,
but it still involves pgTap as well as test_factory.

Steps to reproduce:
Download https://github.com/BlueTreble/test_factory/archive/crash.zip
Unzip it and cd into the directory
pgxn install pgtap (or just run make test)
make install (if you didn't run make test)
psql -f crash.sql

Stack trace below. Relevant assert:
   1967		 * As per notes above, a rel to be rebuilt MUST have refcnt > 0; while of
   1968		 * course it would be a bad idea to blow away one with nonzero refcnt.
   1969		 */
-> 1970		Assert(rebuild ?
   1971			   !RelationHasReferenceCountZero(relation) :
   1972			   RelationHasReferenceCountZero(relation));

Relevant bits of relation:
> │ ◆─(Relation) relation = 0x00000001165832a8
> │ ├─◆─(RelFileNode) rd_node
> │ ├─◆─(SMgrRelationData *) rd_smgr = 0x0000000000000000
> │ ├─(int) rd_refcnt = 1
> │ ├─(BackendId) rd_backend = 2
> │ ├─(bool) rd_islocaltemp = '\x01'
> │ ├─(bool) rd_isnailed = '\0'
> │ ├─(bool) rd_isvalid = '\x01'
> │ ├─(char) rd_indexvalid = '\0'
> │ ├─(SubTransactionId) rd_createSubid = 16
> │ ├─(SubTransactionId) rd_newRelfilenodeSubid = 0
> │ ├─◆─(Form_pg_class) rd_rel = 0x00000001165838d8
> │ │ ├─◆─(NameData) relname
> │ │ │ └─◆─(char [64]) data "invoice_0000000003"
rebuild is 0.
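
For reference, rd_createSubid = 16 together with rd_islocaltemp says this
relcache entry is a temp table created inside subtransaction 16 of the
current transaction, and the failing call is the relcache cleanup that
runs when such a subtransaction aborts. An illustrative snippet of that
path (this snippet alone completes fine, since nothing still holds a
reference to the relation when the subtransaction aborts):

BEGIN;
SAVEPOINT s1;                             -- open a subtransaction
CREATE TEMP TABLE invoice_demo (id int);  -- rd_createSubid = s1's subxact id,
                                          -- rd_islocaltemp = true
ROLLBACK TO s1;                           -- abort path: AtEOSubXact_RelationCache()
                                          -- reaches RelationClearRelation(rel, false),
                                          -- which asserts rd_refcnt == 0
COMMIT;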

[1] https://github.com/BlueTreble/test_factory/blob/crash/crash.sql

> (lldb) bt
> * thread #1: tid = 0x3b03a0, 0x00007fff92a5a866 libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
>   * frame #0: 0x00007fff92a5a866 libsystem_kernel.dylib`__pthread_kill + 10
>     frame #1: 0x00007fff9001b35c libsystem_pthread.dylib`pthread_kill + 92
>     frame #2: 0x00007fff8cf89b1a libsystem_c.dylib`abort + 125
>     frame #3: 0x000000010cdb4039 postgres`ExceptionalCondition(conditionName=0x000000010cf59cfd, errorType=0x000000010cec392d, fileName=0x000000010cf59045, lineNumber=1972) + 137 at assert.c:54
>     frame #4: 0x000000010cd9b332 postgres`RelationClearRelation(relation=0x0000000116594cd8, rebuild='\0') + 162 at relcache.c:1970
>     frame #5: 0x000000010cd9cc0f postgres`AtEOSubXact_cleanup(relation=0x0000000116594cd8, isCommit='\0', mySubid=15, parentSubid=14) + 79 at relcache.c:2665
>     frame #6: 0x000000010cd9cb92 postgres`AtEOSubXact_RelationCache(isCommit='\0', mySubid=15, parentSubid=14) + 242 at relcache.c:2633
>     frame #7: 0x000000010c9b6e88 postgres`AbortSubTransaction + 440 at xact.c:4373
>     frame #8: 0x000000010c9b8ef2 postgres`RollbackAndReleaseCurrentSubTransaction + 178 at xact.c:3948
>     frame #9: 0x0000000116295c93 plpgsql.so`exec_stmt_block(estate=0x00007fff532bfa50, block=0x00007fa3f56ab600) + 1443 at pl_exec.c:1214
>     frame #10: 0x0000000116294625 plpgsql.so`plpgsql_exec_function(func=0x00007fa3f5369c10, fcinfo=0x00007fa3f5634160, simple_eval_estate=0x0000000000000000) + 837 at pl_exec.c:344
>     frame #11: 0x000000011628dbb6 plpgsql.so`plpgsql_call_handler(fcinfo=0x00007fa3f5634160) + 470 at pl_handler.c:243
>     frame #12: 0x000000010caf2471 postgres`ExecMakeFunctionResultNoSets(fcache=0x00007fa3f56340f0, econtext=<unavailable>, isNull=0x00007fff532bfd5b, isDone=<unavailable>) + 209 at execQual.c:2026
>     frame #13: 0x0000000116299772 plpgsql.so`exec_eval_simple_expr(estate=0x00007fff532c0090, expr=0x00007fa3f53a1e60, result=0x00007fff532bfd10, isNull=0x00007fff532bfd5b, rettype=0x00007fff532bfd5c) + 562 at pl_exec.c:5203
>     frame #14: 0x000000011629911d plpgsql.so`exec_eval_expr(estate=0x00007fff532c0090, expr=0x00007fa3f53a1e60, isNull=0x00007fff532bfd5b, rettype=0x00007fff532bfd5c) + 93 at pl_exec.c:4790
>     frame #15: 0x000000011629ac70 plpgsql.so`exec_assign_expr(estate=0x00007fff532c0090, target=0x00007fa3f541c100, expr=0x00007fa3f53a1e60) + 48 at pl_exec.c:3992
>     frame #16: 0x000000011629b300 plpgsql.so`exec_stmt_assign(estate=0x00007fff532c0090, stmt=0x00007fa3f53a1f50) + 128 at pl_exec.c:1511
>     frame #17: 0x000000011629af6c plpgsql.so`exec_stmt(estate=0x00007fff532c0090, stmt=0x00007fa3f53a1f50) + 252 at pl_exec.c:1399
>     frame #18: 0x000000011629ad1e plpgsql.so`exec_stmts(estate=0x00007fff532c0090, stmts=0x00007fa3f53a1bd0) + 126 at pl_exec.c:1362
>     frame #19: 0x0000000116295f1f plpgsql.so`exec_stmt_block(estate=0x00007fff532c0090, block=0x00007fa3f53a21d0) + 2095 at pl_exec.c:1300
>     frame #20: 0x0000000116294625 plpgsql.so`plpgsql_exec_function(func=0x00007fa3f52b1a30, fcinfo=0x00007fa3f53f9500, simple_eval_estate=0x0000000000000000) + 837 at pl_exec.c:344
>     frame #21: 0x000000011628dbb6 plpgsql.so`plpgsql_call_handler(fcinfo=0x00007fa3f53f9500) + 470 at pl_handler.c:243
>     frame #22: 0x000000010caf2471 postgres`ExecMakeFunctionResultNoSets(fcache=0x00007fa3f53f9490, econtext=<unavailable>, isNull=0x00007fa3f53f9f98, isDone=<unavailable>) + 209 at execQual.c:2026
>     frame #23: 0x000000010caf1a4d postgres`ExecProject [inlined] ExecTargetList(targetlist=<unavailable>, econtext=<unavailable>, values=<unavailable>, isnull=<unavailable>, itemIsDone=<unavailable>, isDone=<unavailable>) + 90 at execQual.c:5307
>     frame #24: 0x000000010caf19f3 postgres`ExecProject(projInfo=<unavailable>, isDone=0x00007fff532c033c) + 323 at execQual.c:5522
>     frame #25: 0x000000010cb07233 postgres`ExecResult(node=0x00007fa3f53f9150) + 179 at nodeResult.c:155
>     frame #26: 0x000000010caeae1c postgres`ExecProcNode(node=0x00007fa3f53f9150) + 92 at execProcnode.c:373
>     frame #27: 0x000000010cae6836 postgres`ExecutePlan(estate=0x00007fa3f53f9038, planstate=0x00007fa3f53f9150, operation=CMD_SELECT, sendTuples='\x01', numberTuples=0, direction=ForwardScanDirection, dest=0x00007fa3f53cd740) + 118 at execMain.c:1490
>     frame #28: 0x000000010cae675b postgres`standard_ExecutorRun(queryDesc=0x00007fa3f54f3638, direction=ForwardScanDirection, count=0) + 523 at execMain.c:319
>     frame #29: 0x000000010cae653d postgres`ExecutorRun(queryDesc=0x00007fa3f54f3638, direction=ForwardScanDirection, count=0) + 77 at execMain.c:267
>     frame #30: 0x000000010cc2a2b7 postgres`PortalRunSelect(portal=0x00007fa3f3012638, forward=<unavailable>, count=0, dest=<unavailable>) + 263 at pquery.c:946
>     frame #31: 0x000000010cc29dd5 postgres`PortalRun(portal=0x00007fa3f3012638, count=9223372036854775807, isTopLevel=<unavailable>, dest=0x00007fa3f53cd740, altdest=<unavailable>, completionTag=0x00007fff532c06d0) + 469 at pquery.c:790
>     frame #32: 0x000000010cc25b86 postgres`exec_simple_query(query_string=0x00007fa3f300e638) + 1302 at postgres.c:1072
>     frame #33: 0x000000010cc24def postgres`PostgresMain(argc=1, argv=0x00007fa3f401c5f0, dbname=0x00007fa3f401c450, username=0x00007fa3f30031f8) + 2895 at postgres.c:4074
>     frame #34: 0x000000010cb9da48 postgres`PostmasterMain [inlined] BackendRun + 8024 at postmaster.c:4155
>     frame #35: 0x000000010cb9da22 postgres`PostmasterMain [inlined] BackendStartup at postmaster.c:3829
>     frame #36: 0x000000010cb9da22 postgres`PostmasterMain [inlined] ServerLoop at postmaster.c:1597
>     frame #37: 0x000000010cb9da22 postgres`PostmasterMain(argc=<unavailable>, argv=<unavailable>) + 7986 at postmaster.c:1244
>     frame #38: 0x000000010cb218cd postgres`main(argc=<unavailable>, argv=<unavailable>) + 1325 at main.c:228
>     frame #39: 0x00007fff8e9a35fd libdyld.dylib`start + 1
> (lldb)

-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Fwd: Core dump with nested CREATE TEMP TABLE

From: Andres Freund
Hi,

On 2015-08-29 11:06:05 -0500, Jim Nasby wrote:
> Stack trace below. Relevant assert:

What exact revision is this on? Either there's some aggressive inlining
going on, or these lines don't entirely match up with HEAD.

That's compiled with optimization, isn't it? Could you compile with -O0?

Can you go up to
> >     frame #19: 0x0000000116295f1f plpgsql.so`exec_stmt_block(estate=0x00007fff532c0090, block=0x00007fa3f53a21d0) + 2095 at pl_exec.c:1300

and do 'disass/m' to see where we actually are?


The area around #19 is likely the PG_CATCH inside exec_stmt_block. So
what's happening is that an exception is raised while handling the
previous exception, and apparently we're not doing that entirely
right. Besides the line numbers, "during exception cleanup" hints at
that.
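
In SQL terms, the suspected shape is something like this (an
illustrative sketch, not a verified reproduction; each EXCEPTION block
in PL/pgSQL is its own subtransaction):

-- Assumed shape of the failure: a temp table is created in the inner
-- subtransaction, the inner handler re-raises, and the outer handler
-- then has to clean up while an error is already being handled.
DO $do$
BEGIN
    BEGIN
        CREATE TEMP TABLE crash_tmp (id int);  -- created in the inner subxact
        RAISE division_by_zero;                -- abort the inner subxact
    EXCEPTION WHEN OTHERS THEN
        RAISE;                                 -- re-raise during exception cleanup
    END;
EXCEPTION WHEN OTHERS THEN
    NULL;  -- outer handler; subxact abort/cleanup of the temp table runs here
END;
$do$;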

Greetings,

Andres Freund



Re: Fwd: Core dump with nested CREATE TEMP TABLE

From: Jim Nasby
On 8/29/15 11:31 AM, Andres Freund wrote:
> Hi,
>
> On 2015-08-29 11:06:05 -0500, Jim Nasby wrote:
>> Stack trace below. Relevant assert:
>
> What exact revision is this on? Either there's some aggressive inlining
> going on, or these lines don't entirely match up with HEAD.

Oops, that was 9.4.1. A trace from freshly pulled HEAD, compiled with -O0, is below.

> That's compiled with optimization, isn't it? Could you compile with -O0?
>
> Can you go up to
>>>     frame #19: 0x0000000116295f1f plpgsql.so`exec_stmt_block(estate=0x00007fff532c0090, block=0x00007fa3f53a21d0) + 2095 at pl_exec.c:1300
>
> and do 'disass/m' to see where we actually are?

That's now 20 instead of 19. disass/m gives an error (note my gdb is
tweaked for Apple, and everything else I've posted here is from lldb).
Output from 'disassemble' in frame 20 is at the bottom.

> The area around #19 likely is the PG_CATCH inside exec_stmt_block. So
> what's happening is that there's an exception raised, while handling the
> previous exception. And apparently we're not doing that entirely
> right. Besides the line numbers "during exception cleanup" hints at
> that.

If the line numbers and the 'gui' mode of lldb are to be believed, frame 
20 is NOT in the try block, but #9 is:

> │ 1242 │                PG_CATCH();
> │ 1243 │                {
> │ 1244 │                        ErrorData  *edata;
> │ 1245 │                        ListCell   *e;
> │ 1246 │
> │ 1247 │                        estate->err_text = gettext_noop("during exception cleanup");
> │ 1248 │
> │ 1249 │                        /* Save error info */
> │ 1250 │                        MemoryContextSwitchTo(oldcontext);
> │ 1251 │                        edata = CopyErrorData();
> │ 1252 │                        FlushErrorState();
> │ 1253 │
> │ 1254 │                        /* Abort the inner transaction */
> │ 1255 │◆                       RollbackAndReleaseCurrentSubTransaction();

Looking at frame 10:

   432          */
   433         estate.err_text = NULL;
   434         estate.err_stmt = (PLpgSQL_stmt *) (func->action);
-> 435         rc = exec_stmt_block(&estate, func->action);
   436         if (rc != PLPGSQL_RC_RETURN)
   437         {
   438             estate.err_stmt = NULL;


func->fn_signature is results_eq(refcursor,refcursor,text), which is a
pgTap function that has an exception handler, and I know the code it's
calling also has an exception handler.
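
Schematically, the call chain is something like the following (invented
names and bodies; the real ones live in pgTap and test_factory, and
this skeleton by itself is not known to crash):

-- Hypothetical skeleton: an outer test function with an exception
-- handler calls an inner factory function that also has one and
-- creates a temp table before failing.
CREATE FUNCTION fake_get() RETURNS void AS $f$
BEGIN
    CREATE TEMP TABLE invoice_tmp (id int);  -- like invoice_0000000003 above
    RAISE undefined_column;                  -- stand-in for the 42703 error
EXCEPTION WHEN OTHERS THEN
    RAISE;                                   -- inner handler re-raises
END;
$f$ LANGUAGE plpgsql;

CREATE FUNCTION fake_results_eq() RETURNS void AS $f$
BEGIN
    PERFORM fake_get();
EXCEPTION WHEN OTHERS THEN
    NULL;  -- outer handler: the subxact abort/cleanup happens here
END;
$f$ LANGUAGE plpgsql;

SELECT fake_results_eq();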

> (lldb) bt
> * thread #1: tid = 0x3c8fb4, 0x00007fff92a5a866 libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
>   * frame #0: 0x00007fff92a5a866 libsystem_kernel.dylib`__pthread_kill + 10
>     frame #1: 0x00007fff9001b35c libsystem_pthread.dylib`pthread_kill + 92
>     frame #2: 0x00007fff8cf89b1a libsystem_c.dylib`abort + 125
>     frame #3: 0x000000010fcf3b99 postgres`ExceptionalCondition(conditionName=0x000000010fe157e9, errorType=0x000000010fd60053, fileName=0x000000010fe14a7f, lineNumber=2055) + 137 at assert.c:54
>     frame #4: 0x000000010fcda0ac postgres`RelationClearRelation(relation=0x000000011954f388, rebuild='\0') + 140 at relcache.c:2053
>     frame #5: 0x000000010fcdb8cf postgres`AtEOSubXact_cleanup(relation=0x000000011954f388, isCommit='\0', mySubid=16, parentSubid=15) + 79 at relcache.c:2752
>     frame #6: 0x000000010fcdb852 postgres`AtEOSubXact_RelationCache(isCommit='\0', mySubid=16, parentSubid=15) + 242 at relcache.c:2720
>     frame #7: 0x000000010f7dd096 postgres`AbortSubTransaction + 486 at xact.c:4606
>     frame #8: 0x000000010f7df322 postgres`RollbackAndReleaseCurrentSubTransaction + 178 at xact.c:4163
>     frame #9: 0x0000000119245a29 plpgsql.so`exec_stmt_block(estate=0x00007fff50509138, block=0x00007fcc4d0367b0) + 1449 at pl_exec.c:1255
>     frame #10: 0x000000011924403b plpgsql.so`plpgsql_exec_function(func=0x00007fcc4d145b18, fcinfo=0x00007fcc4d025ef8, simple_eval_estate=0x0000000000000000) + 1163 at pl_exec.c:435
>     frame #11: 0x000000011923d5c6 plpgsql.so`plpgsql_call_handler(fcinfo=0x00007fcc4d025ef8) + 470 at pl_handler.c:253
>     frame #12: 0x000000010f98cbc9 postgres`ExecMakeFunctionResultNoSets(fcache=0x00007fcc4d025e88, econtext=0x00007fcc4d124350, isNull=0x00007fff505094ff, isDone=0x0000000000000000) + 377 at execQual.c:2019
>     frame #13: 0x000000010f98686d postgres`ExecEvalFunc(fcache=0x00007fcc4d025e88, econtext=0x00007fcc4d124350, isNull=0x00007fff505094ff, isDone=0x0000000000000000) + 205 at execQual.c:2410
>     frame #14: 0x00000001192498c9 plpgsql.so`exec_eval_simple_expr(estate=0x00007fff50509818, expr=0x00007fcc4d14a0d8, result=0x00007fff505094a8, isNull=0x00007fff505094ff, rettype=0x00007fff505094f8, rettypmod=0x00007fff505094f4) + 633 at pl_exec.c:5449
>     frame #15: 0x00000001192491e5 plpgsql.so`exec_eval_expr(estate=0x00007fff50509818, expr=0x00007fcc4d14a0d8, isNull=0x00007fff505094ff, rettype=0x00007fff505094f8, rettypmod=0x00007fff505094f4) + 101 at pl_exec.c:5025
>     frame #16: 0x000000011924b226 plpgsql.so`exec_assign_expr(estate=0x00007fff50509818, target=0x00007fcc4d0f7300, expr=0x00007fcc4d14a0d8) + 118 at pl_exec.c:4201
>     frame #17: 0x000000011924b88a plpgsql.so`exec_stmt_assign(estate=0x00007fff50509818, stmt=0x00007fcc4d14a1c8) + 106 at pl_exec.c:1556
>     frame #18: 0x000000011924b4fc plpgsql.so`exec_stmt(estate=0x00007fff50509818, stmt=0x00007fcc4d14a1c8) + 252 at pl_exec.c:1440
>     frame #19: 0x000000011924b2de plpgsql.so`exec_stmts(estate=0x00007fff50509818, stmts=0x00007fcc4d149e28) + 126 at pl_exec.c:1403
>     frame #20: 0x0000000119245cbe plpgsql.so`exec_stmt_block(estate=0x00007fff50509818, block=0x00007fcc4d14a3e8) + 2110 at pl_exec.c:1341
>     frame #21: 0x000000011924403b plpgsql.so`plpgsql_exec_function(func=0x00007fcc4d144d28, fcinfo=0x00007fcc4d0e6700, simple_eval_estate=0x0000000000000000) + 1163 at pl_exec.c:435
>     frame #22: 0x000000011923d5c6 plpgsql.so`plpgsql_call_handler(fcinfo=0x00007fcc4d0e6700) + 470 at pl_handler.c:253
>     frame #23: 0x000000010f98cbc9 postgres`ExecMakeFunctionResultNoSets(fcache=0x00007fcc4d0e6690, econtext=0x00007fcc4d0e6468, isNull=0x00007fcc4d0e7198, isDone=0x00007fcc4d0e72d8) + 377 at execQual.c:2019
>     frame #24: 0x000000010f98686d postgres`ExecEvalFunc(fcache=0x00007fcc4d0e6690, econtext=0x00007fcc4d0e6468, isNull=0x00007fcc4d0e7198, isDone=0x00007fcc4d0e72d8) + 205 at execQual.c:2410
>     frame #25: 0x000000010f98bd9b postgres`ExecTargetList(targetlist=0x00007fcc4d0e72a0, econtext=0x00007fcc4d0e6468, values=0x00007fcc4d0e7178, isnull=0x00007fcc4d0e7198, itemIsDone=0x00007fcc4d0e72d8, isDone=0x00007fff50509c5c) + 187 at execQual.c:5345
>     frame #26: 0x000000010f98bca4 postgres`ExecProject(projInfo=0x00007fcc4d0e71b8, isDone=0x00007fff50509c5c) + 756 at execQual.c:5560
>     frame #27: 0x000000010f9af2eb postgres`ExecResult(node=0x00007fcc4d0e6350) + 363 at nodeResult.c:155
>     frame #28: 0x000000010f9800a0 postgres`ExecProcNode(node=0x00007fcc4d0e6350) + 160 at execProcnode.c:385
>     frame #29: 0x000000010f97b4f6 postgres`ExecutePlan(estate=0x00007fcc4d0e6238, planstate=0x00007fcc4d0e6350, operation=CMD_SELECT, sendTuples='\x01', numberTuples=0, direction=ForwardScanDirection, dest=0x00007fcc4d01cbb8) + 118 at execMain.c:1549
>     frame #30: 0x000000010f97b419 postgres`standard_ExecutorRun(queryDesc=0x00007fcc4d1b4638, direction=ForwardScanDirection, count=0) + 457 at execMain.c:337
>     frame #31: 0x000000010f97b23d postgres`ExecutorRun(queryDesc=0x00007fcc4d1b4638, direction=ForwardScanDirection, count=0) + 77 at execMain.c:285
>     frame #32: 0x000000010fb4c143 postgres`PortalRunSelect(portal=0x00007fcc4c020c38, forward='\x01', count=0, dest=0x00007fcc4d01cbb8) + 307 at pquery.c:946
>     frame #33: 0x000000010fb4bb04 postgres`PortalRun(portal=0x00007fcc4c020c38, count=9223372036854775807, isTopLevel='\x01', dest=0x00007fcc4d01cbb8, altdest=0x00007fcc4d01cbb8, completionTag=0x00007fff5050a090) + 820 at pquery.c:790
>     frame #34: 0x000000010fb470c6 postgres`exec_simple_query(query_string=0x00007fcc4d863c38) + 1302 at postgres.c:1104
>     frame #35: 0x000000010fb46352 postgres`PostgresMain(argc=1, argv=0x00007fcc4c0028b0, dbname=0x00007fcc4c002710, username=0x00007fcc4c0026f0) + 2546 at postgres.c:4025
>     frame #36: 0x000000010faaa41e postgres`BackendRun(port=0x00007fcc4be000f0) + 686 at postmaster.c:4230
>     frame #37: 0x000000010faa99b0 postgres`BackendStartup(port=0x00007fcc4be000f0) + 384 at postmaster.c:3906
>     frame #38: 0x000000010faa60c7 postgres`ServerLoop + 663 at postmaster.c:1665
>     frame #39: 0x000000010faa3704 postgres`PostmasterMain(argc=3, argv=0x00007fcc4bc042a0) + 5652 at postmaster.c:1274
>     frame #40: 0x000000010f9de85c postgres`main(argc=3, argv=0x00007fcc4bc042a0) + 796 at main.c:223
>     frame #41: 0x00007fff8e9a35fd libdyld.dylib`start + 1

> Dump of assembler code for function exec_stmt_block:
> 0x0000000119245480 <exec_stmt_block+0>:    push   %rbp
> 0x0000000119245481 <exec_stmt_block+1>:    mov    %rsp,%rbp
> 0x0000000119245484 <exec_stmt_block+4>:    sub    $0x190,%rsp
> 0x000000011924548b <exec_stmt_block+11>:    lea    0x173a7(%rip),%rax        # 0x11925c839
> 0x0000000119245492 <exec_stmt_block+18>:    mov    0x18bc7(%rip),%rcx        # 0x11925e060
> 0x0000000119245499 <exec_stmt_block+25>:    mov    (%rcx),%rcx
> 0x000000011924549c <exec_stmt_block+28>:    mov    %rcx,-0x8(%rbp)
> 0x00000001192454a0 <exec_stmt_block+32>:    mov    %rdi,-0xb0(%rbp)
> 0x00000001192454a7 <exec_stmt_block+39>:    mov    %rsi,-0xb8(%rbp)
> 0x00000001192454ae <exec_stmt_block+46>:    movl   $0xffffffff,-0xbc(%rbp)
> 0x00000001192454b8 <exec_stmt_block+56>:    mov    -0xb0(%rbp),%rcx
> 0x00000001192454bf <exec_stmt_block+63>:    mov    %rax,0xb0(%rcx)
> 0x00000001192454c6 <exec_stmt_block+70>:    movl   $0x0,-0xc0(%rbp)
> 0x00000001192454d0 <exec_stmt_block+80>:    mov    -0xc0(%rbp),%eax
> 0x00000001192454d6 <exec_stmt_block+86>:    mov    -0xb8(%rbp),%rcx
> 0x00000001192454dd <exec_stmt_block+93>:    cmp    0x18(%rcx),%eax
> 0x00000001192454e0 <exec_stmt_block+96>:    jge    0x1192457bd <exec_stmt_block+829>
> 0x00000001192454e6 <exec_stmt_block+102>:    movslq -0xc0(%rbp),%rax
> 0x00000001192454ed <exec_stmt_block+109>:    mov    -0xb8(%rbp),%rcx
> 0x00000001192454f4 <exec_stmt_block+116>:    mov    0x20(%rcx),%rcx
> 0x00000001192454f8 <exec_stmt_block+120>:    mov    (%rcx,%rax,4),%edx
> 0x00000001192454fb <exec_stmt_block+123>:    mov    %edx,-0xc4(%rbp)
> 0x0000000119245501 <exec_stmt_block+129>:    movslq -0xc4(%rbp),%rax
> 0x0000000119245508 <exec_stmt_block+136>:    mov    -0xb0(%rbp),%rcx
> 0x000000011924550f <exec_stmt_block+143>:    mov    0x60(%rcx),%rcx
> 0x0000000119245513 <exec_stmt_block+147>:    mov    (%rcx,%rax,8),%rax
> 0x0000000119245517 <exec_stmt_block+151>:    mov    (%rax),%edx
> 0x0000000119245519 <exec_stmt_block+153>:    mov    %edx,%esi
> 0x000000011924551b <exec_stmt_block+155>:    add    $0xfffffffffffffffd,%esi
> 0x000000011924551e <exec_stmt_block+158>:    sub    $0x2,%esi
> 0x0000000119245521 <exec_stmt_block+161>:    mov    %edx,-0x13c(%rbp)
> 0x0000000119245527 <exec_stmt_block+167>:    mov    %esi,-0x140(%rbp)
> 0x000000011924552d <exec_stmt_block+173>:    jb     0x119245750 <exec_stmt_block+720>
> 0x0000000119245533 <exec_stmt_block+179>:    jmpq   0x119245538 <exec_stmt_block+184>
> 0x0000000119245538 <exec_stmt_block+184>:    mov    -0x13c(%rbp),%eax
> 0x000000011924553e <exec_stmt_block+190>:    sub    $0x2,%eax
> 0x0000000119245541 <exec_stmt_block+193>:    mov    %eax,-0x144(%rbp)
> 0x0000000119245547 <exec_stmt_block+199>:    je     0x1192456b8 <exec_stmt_block+568>
> 0x000000011924554d <exec_stmt_block+205>:    jmpq   0x119245552 <exec_stmt_block+210>
> 0x0000000119245552 <exec_stmt_block+210>:    mov    -0x13c(%rbp),%eax
> 0x0000000119245558 <exec_stmt_block+216>:    test   %eax,%eax
> 0x000000011924555a <exec_stmt_block+218>:    jne    0x119245755 <exec_stmt_block+725>
> 0x0000000119245560 <exec_stmt_block+224>:    jmpq   0x119245565 <exec_stmt_block+229>
> 0x0000000119245565 <exec_stmt_block+229>:    mov    $0x0,%rdx
> 0x000000011924556f <exec_stmt_block+239>:    mov    $0x1,%ecx
> 0x0000000119245574 <exec_stmt_block+244>:    mov    $0x0,%r8d
> 0x000000011924557a <exec_stmt_block+250>:    movslq -0xc4(%rbp),%rax
> 0x0000000119245581 <exec_stmt_block+257>:    mov    -0xb0(%rbp),%rsi
> 0x0000000119245588 <exec_stmt_block+264>:    mov    0x60(%rsi),%rsi
> 0x000000011924558c <exec_stmt_block+268>:    mov    (%rsi,%rax,8),%rax
> 0x0000000119245590 <exec_stmt_block+272>:    mov    %rax,-0xd0(%rbp)
> 0x0000000119245597 <exec_stmt_block+279>:    mov    -0xb0(%rbp),%rdi
> 0x000000011924559e <exec_stmt_block+286>:    mov    -0xd0(%rbp),%rsi
> 0x00000001192455a5 <exec_stmt_block+293>:    callq  0x119244dc0 <assign_simple_var>
> 0x00000001192455aa <exec_stmt_block+298>:    mov    -0xd0(%rbp),%rax
> 0x00000001192455b1 <exec_stmt_block+305>:    cmpq   $0x0,0x28(%rax)
> 0x00000001192455b9 <exec_stmt_block+313>:    jne    0x119245692 <exec_stmt_block+530>
> 0x00000001192455bf <exec_stmt_block+319>:    mov    -0xd0(%rbp),%rax
> 0x00000001192455c6 <exec_stmt_block+326>:    mov    0x18(%rax),%rax
> 0x00000001192455ca <exec_stmt_block+330>:    movsbl 0x13(%rax),%ecx
> 0x00000001192455ce <exec_stmt_block+334>:    cmp    $0x64,%ecx
> 0x00000001192455d4 <exec_stmt_block+340>:    jne    0x11924560b <exec_stmt_block+395>
> 0x00000001192455da <exec_stmt_block+346>:    mov    $0x0,%rdx
> 0x00000001192455e4 <exec_stmt_block+356>:    mov    $0x1,%ecx
> 0x00000001192455e9 <exec_stmt_block+361>:    mov    $0x2c1,%r8d
> 0x00000001192455ef <exec_stmt_block+367>:    mov    $0xffffffff,%r9d
> 0x00000001192455f5 <exec_stmt_block+373>:    mov    -0xb0(%rbp),%rdi
> 0x00000001192455fc <exec_stmt_block+380>:    mov    -0xd0(%rbp),%rax
> 0x0000000119245603 <exec_stmt_block+387>:    mov    %rax,%rsi
> 0x0000000119245606 <exec_stmt_block+390>:    callq  0x119247d40 <exec_assign_value>
> 0x000000011924560b <exec_stmt_block+395>:    mov    -0xd0(%rbp),%rax
> 0x0000000119245612 <exec_stmt_block+402>:    cmpl   $0x0,0x24(%rax)
> 0x0000000119245619 <exec_stmt_block+409>:    je     0x11924568d <exec_stmt_block+525>
> 0x000000011924561f <exec_stmt_block+415>:    jmpq   0x119245624 <exec_stmt_block+420>
> 0x0000000119245624 <exec_stmt_block+420>:    mov    $0x14,%edi
> 0x0000000119245629 <exec_stmt_block+425>:    lea    0x16860(%rip),%rsi        # 0x11925be90
> 0x0000000119245630 <exec_stmt_block+432>:    mov    $0x46e,%edx
> 0x0000000119245635 <exec_stmt_block+437>:    lea    0x17232(%rip),%rcx        # 0x11925c86e
> 0x000000011924563c <exec_stmt_block+444>:    lea    0x136fd(%rip),%r8        # 0x119258d40
> 0x0000000119245643 <exec_stmt_block+451>:    callq  0x119256590 <dyld_stub_errstart>
> 0x0000000119245648 <exec_stmt_block+456>:    cmp    $0x0,%al
> 0x000000011924564a <exec_stmt_block+458>:    je     0x119245688 <exec_stmt_block+520>
> 0x0000000119245650 <exec_stmt_block+464>:    mov    $0x4000082,%edi
> 0x0000000119245655 <exec_stmt_block+469>:    callq  0x119256554 <dyld_stub_errcode>
> 0x000000011924565a <exec_stmt_block+474>:    lea    0x1721d(%rip),%rdi        # 0x11925c87e
> 0x0000000119245661 <exec_stmt_block+481>:    mov    -0xd0(%rbp),%rcx
> 0x0000000119245668 <exec_stmt_block+488>:    mov    0x8(%rcx),%rsi
> 0x000000011924566c <exec_stmt_block+492>:    mov    %eax,-0x148(%rbp)
> 0x0000000119245672 <exec_stmt_block+498>:    mov    $0x0,%al
> 0x0000000119245674 <exec_stmt_block+500>:    callq  0x119256578 <dyld_stub_errmsg>
> 0x0000000119245679 <exec_stmt_block+505>:    mov    -0x148(%rbp),%edi
> 0x000000011924567f <exec_stmt_block+511>:    mov    %eax,%esi
> 0x0000000119245681 <exec_stmt_block+513>:    mov    $0x0,%al
> 0x0000000119245683 <exec_stmt_block+515>:    callq  0x11925656c <dyld_stub_errfinish>
> 0x0000000119245688 <exec_stmt_block+520>:    callq  0x119256776 <dyld_stub_abort>
> 0x000000011924568d <exec_stmt_block+525>:    jmpq   0x1192456b3 <exec_stmt_block+563>
> 0x0000000119245692 <exec_stmt_block+530>:    mov    -0xb0(%rbp),%rdi
> 0x0000000119245699 <exec_stmt_block+537>:    mov    -0xd0(%rbp),%rax
> 0x00000001192456a0 <exec_stmt_block+544>:    mov    -0xd0(%rbp),%rcx
> 0x00000001192456a7 <exec_stmt_block+551>:    mov    0x28(%rcx),%rdx
> 0x00000001192456ab <exec_stmt_block+555>:    mov    %rax,%rsi
> 0x00000001192456ae <exec_stmt_block+558>:    callq  0x11924b1b0 <exec_assign_expr>
> 0x00000001192456b3 <exec_stmt_block+563>:    jmpq   0x1192457a2 <exec_stmt_block+802>
> 0x00000001192456b8 <exec_stmt_block+568>:    movslq -0xc4(%rbp),%rax
> 0x00000001192456bf <exec_stmt_block+575>:    mov    -0xb0(%rbp),%rcx
> 0x00000001192456c6 <exec_stmt_block+582>:    mov    0x60(%rcx),%rcx
> 0x00000001192456ca <exec_stmt_block+586>:    mov    (%rcx,%rax,8),%rax
> 0x00000001192456ce <exec_stmt_block+590>:    mov    %rax,-0xd8(%rbp)
> 0x00000001192456d5 <exec_stmt_block+597>:    mov    -0xd8(%rbp),%rax
> 0x00000001192456dc <exec_stmt_block+604>:    cmpb   $0x0,0x28(%rax)
> 0x00000001192456e0 <exec_stmt_block+608>:    je     0x119245701 <exec_stmt_block+641>
> 0x00000001192456e6 <exec_stmt_block+614>:    mov    -0xd8(%rbp),%rax
> 0x00000001192456ed <exec_stmt_block+621>:    mov    0x18(%rax),%rdi
> 0x00000001192456f1 <exec_stmt_block+625>:    callq  0x11925665c <dyld_stub_heap_freetuple>
> 0x00000001192456f6 <exec_stmt_block+630>:    mov    -0xd8(%rbp),%rax
> 0x00000001192456fd <exec_stmt_block+637>:    movb   $0x0,0x28(%rax)
> 0x0000000119245701 <exec_stmt_block+641>:    mov    -0xd8(%rbp),%rax
> 0x0000000119245708 <exec_stmt_block+648>:    cmpb   $0x0,0x29(%rax)
> 0x000000011924570c <exec_stmt_block+652>:    je     0x11924572d <exec_stmt_block+685>
> 0x0000000119245712 <exec_stmt_block+658>:    mov    -0xd8(%rbp),%rax
> 0x0000000119245719 <exec_stmt_block+665>:    mov    0x20(%rax),%rdi
> 0x000000011924571d <exec_stmt_block+669>:    callq  0x119256338 <dyld_stub_FreeTupleDesc>
> 0x0000000119245722 <exec_stmt_block+674>:    mov    -0xd8(%rbp),%rax
> 0x0000000119245729 <exec_stmt_block+681>:    movb   $0x0,0x29(%rax)
> 0x000000011924572d <exec_stmt_block+685>:    mov    -0xd8(%rbp),%rax
> 0x0000000119245734 <exec_stmt_block+692>:    movq   $0x0,0x18(%rax)
> 0x000000011924573c <exec_stmt_block+700>:    mov    -0xd8(%rbp),%rax
> 0x0000000119245743 <exec_stmt_block+707>:    movq   $0x0,0x20(%rax)
> 0x000000011924574b <exec_stmt_block+715>:    jmpq   0x1192457a2 <exec_stmt_block+802>
> 0x0000000119245750 <exec_stmt_block+720>:    jmpq   0x1192457a2 <exec_stmt_block+802>
> 0x0000000119245755 <exec_stmt_block+725>:    jmpq   0x11924575a <exec_stmt_block+730>
> 0x000000011924575a <exec_stmt_block+730>:    lea    0x1672f(%rip),%rdi        # 0x11925be90
> 0x0000000119245761 <exec_stmt_block+737>:    mov    $0x491,%esi
> 0x0000000119245766 <exec_stmt_block+742>:    lea    0x17101(%rip),%rdx        # 0x11925c86e
> 0x000000011924576d <exec_stmt_block+749>:    callq  0x119256548 <dyld_stub_elog_start>
> 0x0000000119245772 <exec_stmt_block+754>:    mov    $0x14,%edi
> 0x0000000119245777 <exec_stmt_block+759>:    lea    0x141c1(%rip),%rsi        # 0x11925993f
> 0x000000011924577e <exec_stmt_block+766>:    movslq -0xc4(%rbp),%rdx
> 0x0000000119245785 <exec_stmt_block+773>:    mov    -0xb0(%rbp),%rax
> 0x000000011924578c <exec_stmt_block+780>:    mov    0x60(%rax),%rax
> 0x0000000119245790 <exec_stmt_block+784>:    mov    (%rax,%rdx,8),%rax
> 0x0000000119245794 <exec_stmt_block+788>:    mov    (%rax),%edx
> 0x0000000119245796 <exec_stmt_block+790>:    mov    $0x0,%al
> 0x0000000119245798 <exec_stmt_block+792>:    callq  0x119256542 <dyld_stub_elog_finish>
> 0x000000011924579d <exec_stmt_block+797>:    callq  0x119256776 <dyld_stub_abort>
> 0x00000001192457a2 <exec_stmt_block+802>:    jmpq   0x1192457a7 <exec_stmt_block+807>
> 0x00000001192457a7 <exec_stmt_block+807>:    mov    -0xc0(%rbp),%eax
> 0x00000001192457ad <exec_stmt_block+813>:    add    $0x1,%eax
> 0x00000001192457b2 <exec_stmt_block+818>:    mov    %eax,-0xc0(%rbp)
> 0x00000001192457b8 <exec_stmt_block+824>:    jmpq   0x1192454d0 <exec_stmt_block+80>
> 0x00000001192457bd <exec_stmt_block+829>:    mov    -0xb8(%rbp),%rax
> 0x00000001192457c4 <exec_stmt_block+836>:    cmpq   $0x0,0x28(%rax)
> 0x00000001192457cc <exec_stmt_block+844>:    je     0x119245c95 <exec_stmt_block+2069>
> 0x00000001192457d2 <exec_stmt_block+850>:    mov    $0x0,%rdi
> 0x00000001192457dc <exec_stmt_block+860>:    lea    0x170d2(%rip),%rax        # 0x11925c8b5
> 0x00000001192457e3 <exec_stmt_block+867>:    mov    0x1881e(%rip),%rcx        # 0x11925e008
> 0x00000001192457ea <exec_stmt_block+874>:    mov    0x1880f(%rip),%rdx        # 0x11925e000
> 0x00000001192457f1 <exec_stmt_block+881>:    mov    (%rdx),%rdx
> 0x00000001192457f4 <exec_stmt_block+884>:    mov    %rdx,-0xe0(%rbp)
> 0x00000001192457fb <exec_stmt_block+891>:    mov    (%rcx),%rcx
> 0x00000001192457fe <exec_stmt_block+894>:    mov    %rcx,-0xe8(%rbp)
> 0x0000000119245805 <exec_stmt_block+901>:    mov    -0xb0(%rbp),%rcx
> 0x000000011924580c <exec_stmt_block+908>:    mov    0xa0(%rcx),%rcx
> 0x0000000119245813 <exec_stmt_block+915>:    mov    %rcx,-0xf0(%rbp)
> 0x000000011924581a <exec_stmt_block+922>:    mov    -0xb0(%rbp),%rcx
> 0x0000000119245821 <exec_stmt_block+929>:    mov    0x30(%rcx),%rcx
> 0x0000000119245825 <exec_stmt_block+933>:    mov    %rcx,-0xf8(%rbp)
> 0x000000011924582c <exec_stmt_block+940>:    mov    -0xb0(%rbp),%rcx
> 0x0000000119245833 <exec_stmt_block+947>:    mov    %rax,0xb0(%rcx)
> 0x000000011924583a <exec_stmt_block+954>:    callq  0x1192562ae <dyld_stub_BeginInternalSubTransaction>
> 0x000000011924583f <exec_stmt_block+959>:    mov    -0xe0(%rbp),%rdi
> 0x0000000119245846 <exec_stmt_block+966>:    callq  0x119245e50 <MemoryContextSwitchTo>
> 0x000000011924584b <exec_stmt_block+971>:    mov    %rax,-0x150(%rbp)
> 0x0000000119245852 <exec_stmt_block+978>:    mov    $0x0,%esi
> 0x0000000119245857 <exec_stmt_block+983>:    lea    -0xa0(%rbp),%rdi
> 0x000000011924585e <exec_stmt_block+990>:    mov    0x18813(%rip),%rax        # 0x11925e078
> 0x0000000119245865 <exec_stmt_block+997>:    mov    0x187bc(%rip),%rcx        # 0x11925e028
> 0x000000011924586c <exec_stmt_block+1004>:    mov    (%rcx),%rcx
> 0x000000011924586f <exec_stmt_block+1007>:    mov    %rcx,-0x100(%rbp)
> 0x0000000119245876 <exec_stmt_block+1014>:    mov    (%rax),%rax
> 0x0000000119245879 <exec_stmt_block+1017>:    mov    %rax,-0x108(%rbp)
> 0x0000000119245880 <exec_stmt_block+1024>:    callq  0x11925679a <dyld_stub_sigsetjmp>
> 0x0000000119245885 <exec_stmt_block+1029>:    cmp    $0x0,%eax
> 0x000000011924588a <exec_stmt_block+1034>:    jne    0x1192459c9 <exec_stmt_block+1353>
> 0x0000000119245890 <exec_stmt_block+1040>:    mov    0x18791(%rip),%rax        # 0x11925e028
> 0x0000000119245897 <exec_stmt_block+1047>:    lea    -0xa0(%rbp),%rcx
> 0x000000011924589e <exec_stmt_block+1054>:    mov    %rcx,(%rax)
> 0x00000001192458a1 <exec_stmt_block+1057>:    mov    -0xb0(%rbp),%rdi
> 0x00000001192458a8 <exec_stmt_block+1064>:    callq  0x11924b0a0 <plpgsql_create_econtext>
> 0x00000001192458ad <exec_stmt_block+1069>:    mov    -0xb0(%rbp),%rax
> 0x00000001192458b4 <exec_stmt_block+1076>:    movq   $0x0,0xb0(%rax)
> 0x00000001192458bf <exec_stmt_block+1087>:    mov    -0xb0(%rbp),%rdi
> 0x00000001192458c6 <exec_stmt_block+1094>:    mov    -0xb8(%rbp),%rax
> 0x00000001192458cd <exec_stmt_block+1101>:    mov    0x10(%rax),%rsi
> 0x00000001192458d1 <exec_stmt_block+1105>:    callq  0x11924b260 <exec_stmts>
> 0x00000001192458d6 <exec_stmt_block+1110>:    lea    0x16ff5(%rip),%rcx        # 0x11925c8d2
> 0x00000001192458dd <exec_stmt_block+1117>:    mov    %eax,-0xbc(%rbp)
> 0x00000001192458e3 <exec_stmt_block+1123>:    mov    -0xb0(%rbp),%rsi
> 0x00000001192458ea <exec_stmt_block+1130>:    mov    %rcx,0xb0(%rsi)
> 0x00000001192458f1 <exec_stmt_block+1137>:    mov    -0xbc(%rbp),%eax
> 0x00000001192458f7 <exec_stmt_block+1143>:    cmp    $0x2,%eax
> 0x00000001192458fc <exec_stmt_block+1148>:    jne    0x119245981 <exec_stmt_block+1281>
> 0x0000000119245902 <exec_stmt_block+1154>:    mov    -0xb0(%rbp),%rax
> 0x0000000119245909 <exec_stmt_block+1161>:    cmpb   $0x0,0x1d(%rax)
> 0x000000011924590d <exec_stmt_block+1165>:    jne    0x119245981 <exec_stmt_block+1281>
> 0x0000000119245913 <exec_stmt_block+1171>:    mov    -0xb0(%rbp),%rax
> 0x000000011924591a <exec_stmt_block+1178>:    cmpb   $0x0,0x10(%rax)
> 0x000000011924591e <exec_stmt_block+1182>:    jne    0x119245981 <exec_stmt_block+1281>
> 0x0000000119245924 <exec_stmt_block+1188>:    mov    -0xb0(%rbp),%rax
> 0x000000011924592b <exec_stmt_block+1195>:    cmpq   $0x0,0x20(%rax)
> 0x0000000119245933 <exec_stmt_block+1203>:    jne    0x119245981 <exec_stmt_block+1281>
> 0x0000000119245939 <exec_stmt_block+1209>:    lea    -0x10a(%rbp),%rsi
> 0x0000000119245940 <exec_stmt_block+1216>:    lea    -0x10b(%rbp),%rdx
> 0x0000000119245947 <exec_stmt_block+1223>:    mov    -0xb0(%rbp),%rax
> 0x000000011924594e <exec_stmt_block+1230>:    mov    0x14(%rax),%edi
> 0x0000000119245951 <exec_stmt_block+1233>:    callq  0x119256620 <dyld_stub_get_typlenbyval>
> 0x0000000119245956 <exec_stmt_block+1238>:    mov    -0xb0(%rbp),%rax
> 0x000000011924595d <exec_stmt_block+1245>:    mov    0x8(%rax),%rdi
> 0x0000000119245961 <exec_stmt_block+1249>:    mov    -0x10b(%rbp),%cl
> 0x0000000119245967 <exec_stmt_block+1255>:    movswl -0x10a(%rbp),%edx
> 0x000000011924596e <exec_stmt_block+1262>:    movsbl %cl,%esi
> 0x0000000119245971 <exec_stmt_block+1265>:    callq  0x119256530 <dyld_stub_datumCopy>
> 0x0000000119245976 <exec_stmt_block+1270>:    mov    -0xb0(%rbp),%rdi
> 0x000000011924597d <exec_stmt_block+1277>:    mov    %rax,0x8(%rdi)
> 0x0000000119245981 <exec_stmt_block+1281>:    callq  0x1192563c8 <dyld_stub_ReleaseCurrentSubTransaction>
> 0x0000000119245986 <exec_stmt_block+1286>:    mov    -0xe0(%rbp),%rdi
> 0x000000011924598d <exec_stmt_block+1293>:    callq  0x119245e50 <MemoryContextSwitchTo>
> 0x0000000119245992 <exec_stmt_block+1298>:    mov    0x1866f(%rip),%rdi        # 0x11925e008
> 0x0000000119245999 <exec_stmt_block+1305>:    mov    -0xe8(%rbp),%rcx
> 0x00000001192459a0 <exec_stmt_block+1312>:    mov    %rcx,(%rdi)
> 0x00000001192459a3 <exec_stmt_block+1315>:    mov    -0xf0(%rbp),%rcx
> 0x00000001192459aa <exec_stmt_block+1322>:    mov    -0xb0(%rbp),%rdi
> 0x00000001192459b1 <exec_stmt_block+1329>:    mov    %rcx,0xa0(%rdi)
> 0x00000001192459b8 <exec_stmt_block+1336>:    mov    %rax,-0x158(%rbp)
> 0x00000001192459bf <exec_stmt_block+1343>:    callq  0x11925646a <dyld_stub_SPI_restore_connection>
> 0x00000001192459c4 <exec_stmt_block+1348>:    jmpq   0x119245c2d <exec_stmt_block+1965>
> 0x00000001192459c9 <exec_stmt_block+1353>:    lea    0x16f1e(%rip),%rax        # 0x11925c8ee
> 0x00000001192459d0 <exec_stmt_block+1360>:    mov    0x186a1(%rip),%rcx        # 0x11925e078
> 0x00000001192459d7 <exec_stmt_block+1367>:    mov    0x1864a(%rip),%rdx        # 0x11925e028
> 0x00000001192459de <exec_stmt_block+1374>:    mov    -0x100(%rbp),%rsi
> 0x00000001192459e5 <exec_stmt_block+1381>:    mov    %rsi,(%rdx)
> 0x00000001192459e8 <exec_stmt_block+1384>:    mov    -0x108(%rbp),%rdx
> 0x00000001192459ef <exec_stmt_block+1391>:    mov    %rdx,(%rcx)
> 0x00000001192459f2 <exec_stmt_block+1394>:    mov    -0xb0(%rbp),%rcx
> 0x00000001192459f9 <exec_stmt_block+1401>:    mov    %rax,0xb0(%rcx)
> 0x0000000119245a00 <exec_stmt_block+1408>:    mov    -0xe0(%rbp),%rdi
> 0x0000000119245a07 <exec_stmt_block+1415>:    callq  0x119245e50 <MemoryContextSwitchTo>
> 0x0000000119245a0c <exec_stmt_block+1420>:    mov    %rax,-0x160(%rbp)
> 0x0000000119245a13 <exec_stmt_block+1427>:    callq  0x1192562c6 <dyld_stub_CopyErrorData>
> 0x0000000119245a18 <exec_stmt_block+1432>:    mov    %rax,-0x118(%rbp)
> 0x0000000119245a1f <exec_stmt_block+1439>:    callq  0x119256320 <dyld_stub_FlushErrorState>
> 0x0000000119245a24 <exec_stmt_block+1444>:    callq  0x1192563da <dyld_stub_RollbackAndReleaseCurrentSubTransaction>
> 0x0000000119245a29 <exec_stmt_block+1449>:    mov    -0xe0(%rbp),%rdi
> 0x0000000119245a30 <exec_stmt_block+1456>:    callq  0x119245e50 <MemoryContextSwitchTo>
> 0x0000000119245a35 <exec_stmt_block+1461>:    mov    0x185cc(%rip),%rcx        # 0x11925e008
> 0x0000000119245a3c <exec_stmt_block+1468>:    mov    -0xe8(%rbp),%rdx
> 0x0000000119245a43 <exec_stmt_block+1475>:    mov    %rdx,(%rcx)
> 0x0000000119245a46 <exec_stmt_block+1478>:    mov    -0xf0(%rbp),%rcx
> 0x0000000119245a4d <exec_stmt_block+1485>:    mov    -0xb0(%rbp),%rdx
> 0x0000000119245a54 <exec_stmt_block+1492>:    mov    %rcx,0xa0(%rdx)
> 0x0000000119245a5b <exec_stmt_block+1499>:    mov    %rax,-0x168(%rbp)
> 0x0000000119245a62 <exec_stmt_block+1506>:    callq  0x11925646a <dyld_stub_SPI_restore_connection>
> 0x0000000119245a67 <exec_stmt_block+1511>:    mov    -0xb0(%rbp),%rax
> 0x0000000119245a6e <exec_stmt_block+1518>:    movq   $0x0,0x90(%rax)
> 0x0000000119245a79 <exec_stmt_block+1529>:    mov    -0xb0(%rbp),%rdi
> 0x0000000119245a80 <exec_stmt_block+1536>:    callq  0x1192453a0 <exec_eval_cleanup>
> 0x0000000119245a85 <exec_stmt_block+1541>:    mov    -0xb8(%rbp),%rax
> 0x0000000119245a8c <exec_stmt_block+1548>:    mov    0x28(%rax),%rax
> 0x0000000119245a90 <exec_stmt_block+1552>:    mov    0x8(%rax),%rdi
> 0x0000000119245a94 <exec_stmt_block+1556>:    callq  0x11924a3e0 <list_head>
> 0x0000000119245a99 <exec_stmt_block+1561>:    mov    %rax,-0x120(%rbp)
> 0x0000000119245aa0 <exec_stmt_block+1568>:    cmpq   $0x0,-0x120(%rbp)
> 0x0000000119245aab <exec_stmt_block+1579>:    je     0x119245bed <exec_stmt_block+1901>
> 0x0000000119245ab1 <exec_stmt_block+1585>:    mov    -0x120(%rbp),%rax
> 0x0000000119245ab8 <exec_stmt_block+1592>:    mov    (%rax),%rax
> 0x0000000119245abb <exec_stmt_block+1595>:    mov    %rax,-0x128(%rbp)
> 0x0000000119245ac2 <exec_stmt_block+1602>:    mov    -0x118(%rbp),%rdi
> 0x0000000119245ac9 <exec_stmt_block+1609>:    mov    -0x128(%rbp),%rax
> 0x0000000119245ad0 <exec_stmt_block+1616>:    mov    0x8(%rax),%rsi
> 0x0000000119245ad4 <exec_stmt_block+1620>:    callq  0x11924b320 <exception_matches_conditions>
> 0x0000000119245ad9 <exec_stmt_block+1625>:    cmp    $0x0,%al
> 0x0000000119245adb <exec_stmt_block+1627>:    je     0x119245bd1 <exec_stmt_block+1873>
> 0x0000000119245ae1 <exec_stmt_block+1633>:    mov    -0xb8(%rbp),%rax
> 0x0000000119245ae8 <exec_stmt_block+1640>:    mov    0x28(%rax),%rax
> 0x0000000119245aec <exec_stmt_block+1644>:    movslq (%rax),%rax
> 0x0000000119245aef <exec_stmt_block+1647>:    mov    -0xb0(%rbp),%rcx
> 0x0000000119245af6 <exec_stmt_block+1654>:    mov    0x60(%rcx),%rcx
> 0x0000000119245afa <exec_stmt_block+1658>:    mov    (%rcx,%rax,8),%rax
> 0x0000000119245afe <exec_stmt_block+1662>:    mov    %rax,-0x130(%rbp)
> 0x0000000119245b05 <exec_stmt_block+1669>:    mov    -0xb8(%rbp),%rax
> 0x0000000119245b0c <exec_stmt_block+1676>:    mov    0x28(%rax),%rax
> 0x0000000119245b10 <exec_stmt_block+1680>:    movslq 0x4(%rax),%rax
> 0x0000000119245b14 <exec_stmt_block+1684>:    mov    -0xb0(%rbp),%rcx
> 0x0000000119245b1b <exec_stmt_block+1691>:    mov    0x60(%rcx),%rcx
> 0x0000000119245b1f <exec_stmt_block+1695>:    mov    (%rcx,%rax,8),%rax
> 0x0000000119245b23 <exec_stmt_block+1699>:    mov    %rax,-0x138(%rbp)
> 0x0000000119245b2a <exec_stmt_block+1706>:    mov    -0xb0(%rbp),%rdi
> 0x0000000119245b31 <exec_stmt_block+1713>:    mov    -0x130(%rbp),%rsi
> 0x0000000119245b38 <exec_stmt_block+1720>:    mov    -0x118(%rbp),%rax
> 0x0000000119245b3f <exec_stmt_block+1727>:    mov    0x38(%rax),%edx
> 0x0000000119245b42 <exec_stmt_block+1730>:    mov    %rdi,-0x170(%rbp)
> 0x0000000119245b49 <exec_stmt_block+1737>:    mov    %edx,%edi
> 0x0000000119245b4b <exec_stmt_block+1739>:    mov    %rsi,-0x178(%rbp)
> 0x0000000119245b52 <exec_stmt_block+1746>:    callq  0x11925674c <dyld_stub_unpack_sql_state>
> 0x0000000119245b57 <exec_stmt_block+1751>:    mov    -0x170(%rbp),%rdi
> 0x0000000119245b5e <exec_stmt_block+1758>:    mov    -0x178(%rbp),%rsi
> 0x0000000119245b65 <exec_stmt_block+1765>:    mov    %rax,%rdx
> 0x0000000119245b68 <exec_stmt_block+1768>:    callq  0x119246cf0 <assign_text_var>
> 0x0000000119245b6d <exec_stmt_block+1773>:    mov    -0xb0(%rbp),%rdi
> 0x0000000119245b74 <exec_stmt_block+1780>:    mov    -0x138(%rbp),%rsi
> 0x0000000119245b7b <exec_stmt_block+1787>:    mov    -0x118(%rbp),%rax
> 0x0000000119245b82 <exec_stmt_block+1794>:    mov    0x40(%rax),%rdx
> 0x0000000119245b86 <exec_stmt_block+1798>:    callq  0x119246cf0 <assign_text_var>
> 0x0000000119245b8b <exec_stmt_block+1803>:    mov    -0x118(%rbp),%rax
> 0x0000000119245b92 <exec_stmt_block+1810>:    mov    -0xb0(%rbp),%rcx
> 0x0000000119245b99 <exec_stmt_block+1817>:    mov    %rax,0x30(%rcx)
> 0x0000000119245b9d <exec_stmt_block+1821>:    mov    -0xb0(%rbp),%rax
> 0x0000000119245ba4 <exec_stmt_block+1828>:    movq   $0x0,0xb0(%rax)
> 0x0000000119245baf <exec_stmt_block+1839>:    mov    -0xb0(%rbp),%rdi
> 0x0000000119245bb6 <exec_stmt_block+1846>:    mov    -0x128(%rbp),%rax
> 0x0000000119245bbd <exec_stmt_block+1853>:    mov    0x10(%rax),%rsi
> 0x0000000119245bc1 <exec_stmt_block+1857>:    callq  0x11924b260 <exec_stmts>
> 0x0000000119245bc6 <exec_stmt_block+1862>:    mov    %eax,-0xbc(%rbp)
> 0x0000000119245bcc <exec_stmt_block+1868>:    jmpq   0x119245bed <exec_stmt_block+1901>
> 0x0000000119245bd1 <exec_stmt_block+1873>:    jmpq   0x119245bd6 <exec_stmt_block+1878>
> 0x0000000119245bd6 <exec_stmt_block+1878>:    mov    -0x120(%rbp),%rax
> 0x0000000119245bdd <exec_stmt_block+1885>:    mov    0x8(%rax),%rax
> 0x0000000119245be1 <exec_stmt_block+1889>:    mov    %rax,-0x120(%rbp)
> 0x0000000119245be8 <exec_stmt_block+1896>:    jmpq   0x119245aa0 <exec_stmt_block+1568>
> 0x0000000119245bed <exec_stmt_block+1901>:    mov    -0xf8(%rbp),%rax
> 0x0000000119245bf4 <exec_stmt_block+1908>:    mov    -0xb0(%rbp),%rcx
> 0x0000000119245bfb <exec_stmt_block+1915>:    mov    %rax,0x30(%rcx)
> 0x0000000119245bff <exec_stmt_block+1919>:    cmpq   $0x0,-0x120(%rbp)
> 0x0000000119245c0a <exec_stmt_block+1930>:    jne    0x119245c1c <exec_stmt_block+1948>
> 0x0000000119245c10 <exec_stmt_block+1936>:    mov    -0x118(%rbp),%rdi
> 0x0000000119245c17 <exec_stmt_block+1943>:    callq  0x1192563b0 <dyld_stub_ReThrowError>
> 0x0000000119245c1c <exec_stmt_block+1948>:    mov    -0x118(%rbp),%rdi
> 0x0000000119245c23 <exec_stmt_block+1955>:    callq  0x119256326 <dyld_stub_FreeErrorData>
> 0x0000000119245c28 <exec_stmt_block+1960>:    jmpq   0x119245c2d <exec_stmt_block+1965>
> 0x0000000119245c2d <exec_stmt_block+1965>:    mov    0x18444(%rip),%rax        # 0x11925e078
> 0x0000000119245c34 <exec_stmt_block+1972>:    mov    0x183ed(%rip),%rcx        # 0x11925e028
> 0x0000000119245c3b <exec_stmt_block+1979>:    mov    -0x100(%rbp),%rdx
> 0x0000000119245c42 <exec_stmt_block+1986>:    mov    %rdx,(%rcx)
> 0x0000000119245c45 <exec_stmt_block+1989>:    mov    -0x108(%rbp),%rcx
> 0x0000000119245c4c <exec_stmt_block+1996>:    mov    %rcx,(%rax)
> 0x0000000119245c4f <exec_stmt_block+1999>:    jmpq   0x119245c54 <exec_stmt_block+2004>
> 0x0000000119245c54 <exec_stmt_block+2004>:    mov    -0xf8(%rbp),%rax
> 0x0000000119245c5b <exec_stmt_block+2011>:    mov    -0xb0(%rbp),%rcx
> 0x0000000119245c62 <exec_stmt_block+2018>:    cmp    0x30(%rcx),%rax
> 0x0000000119245c66 <exec_stmt_block+2022>:    je     0x119245c8b <exec_stmt_block+2059>
> 0x0000000119245c6c <exec_stmt_block+2028>:    lea    0x16c94(%rip),%rdi        # 0x11925c907
> 0x0000000119245c73 <exec_stmt_block+2035>:    lea    0x13488(%rip),%rsi        # 0x119259102
> 0x0000000119245c7a <exec_stmt_block+2042>:    lea    0x1620f(%rip),%rdx        # 0x11925be90
> 0x0000000119245c81 <exec_stmt_block+2049>:    mov    $0x534,%ecx
> 0x0000000119245c86 <exec_stmt_block+2054>:    callq  0x119256314 <dyld_stub_ExceptionalCondition>
> 0x0000000119245c8b <exec_stmt_block+2059>:    jmpq   0x119245c90 <exec_stmt_block+2064>
> 0x0000000119245c90 <exec_stmt_block+2064>:    jmpq   0x119245cc4 <exec_stmt_block+2116>
> 0x0000000119245c95 <exec_stmt_block+2069>:    mov    -0xb0(%rbp),%rax
> 0x0000000119245c9c <exec_stmt_block+2076>:    movq   $0x0,0xb0(%rax)
> 0x0000000119245ca7 <exec_stmt_block+2087>:    mov    -0xb0(%rbp),%rdi
> 0x0000000119245cae <exec_stmt_block+2094>:    mov    -0xb8(%rbp),%rax
> 0x0000000119245cb5 <exec_stmt_block+2101>:    mov    0x10(%rax),%rsi
> 0x0000000119245cb9 <exec_stmt_block+2105>:    callq  0x11924b260 <exec_stmts>
> 0x0000000119245cbe <exec_stmt_block+2110>:    mov    %eax,-0xbc(%rbp)
> 0x0000000119245cc4 <exec_stmt_block+2116>:    mov    -0xb0(%rbp),%rax
> 0x0000000119245ccb <exec_stmt_block+2123>:    movq   $0x0,0xb0(%rax)
> 0x0000000119245cd6 <exec_stmt_block+2134>:    mov    -0xbc(%rbp),%ecx
> 0x0000000119245cdc <exec_stmt_block+2140>:    test   %ecx,%ecx
> 0x0000000119245cde <exec_stmt_block+2142>:    mov    %ecx,-0x17c(%rbp)
> 0x0000000119245ce4 <exec_stmt_block+2148>:    je     0x119245d26 <exec_stmt_block+2214>
> 0x0000000119245cea <exec_stmt_block+2154>:    jmpq   0x119245cef <exec_stmt_block+2159>
> 0x0000000119245cef <exec_stmt_block+2159>:    mov    -0x17c(%rbp),%eax
> 0x0000000119245cf5 <exec_stmt_block+2165>:    sub    $0x1,%eax
> 0x0000000119245cf8 <exec_stmt_block+2168>:    mov    %eax,-0x180(%rbp)
> 0x0000000119245cfe <exec_stmt_block+2174>:    je     0x119245d37 <exec_stmt_block+2231>
> 0x0000000119245d04 <exec_stmt_block+2180>:    jmpq   0x119245d09 <exec_stmt_block+2185>
> 0x0000000119245d09 <exec_stmt_block+2185>:    mov    -0x17c(%rbp),%eax
> 0x0000000119245d0f <exec_stmt_block+2191>:    add    $0xfffffffffffffffe,%eax
> 0x0000000119245d12 <exec_stmt_block+2194>:    sub    $0x1,%eax
> 0x0000000119245d15 <exec_stmt_block+2197>:    mov    %eax,-0x184(%rbp)
> 0x0000000119245d1b <exec_stmt_block+2203>:    ja     0x119245dd2 <exec_stmt_block+2386>
> 0x0000000119245d21 <exec_stmt_block+2209>:    jmpq   0x119245d26 <exec_stmt_block+2214>
> 0x0000000119245d26 <exec_stmt_block+2214>:    mov    -0xbc(%rbp),%eax
> 0x0000000119245d2c <exec_stmt_block+2220>:    mov    %eax,-0xa4(%rbp)
> 0x0000000119245d32 <exec_stmt_block+2226>:    jmpq   0x119245e0d <exec_stmt_block+2445>
> 0x0000000119245d37 <exec_stmt_block+2231>:    mov    -0xb0(%rbp),%rax
> 0x0000000119245d3e <exec_stmt_block+2238>:    cmpq   $0x0,0x28(%rax)
> 0x0000000119245d46 <exec_stmt_block+2246>:    jne    0x119245d5b <exec_stmt_block+2267>
> 0x0000000119245d4c <exec_stmt_block+2252>:    movl   $0x1,-0xa4(%rbp)
> 0x0000000119245d56 <exec_stmt_block+2262>:    jmpq   0x119245e0d <exec_stmt_block+2445>
> 0x0000000119245d5b <exec_stmt_block+2267>:    mov    -0xb8(%rbp),%rax
> 0x0000000119245d62 <exec_stmt_block+2274>:    cmpq   $0x0,0x8(%rax)
> 0x0000000119245d6a <exec_stmt_block+2282>:    jne    0x119245d7f <exec_stmt_block+2303>
> 0x0000000119245d70 <exec_stmt_block+2288>:    movl   $0x1,-0xa4(%rbp)
> 0x0000000119245d7a <exec_stmt_block+2298>:    jmpq   0x119245e0d <exec_stmt_block+2445>
> 0x0000000119245d7f <exec_stmt_block+2303>:    mov    -0xb8(%rbp),%rax
> 0x0000000119245d86 <exec_stmt_block+2310>:    mov    0x8(%rax),%rdi
> 0x0000000119245d8a <exec_stmt_block+2314>:    mov    -0xb0(%rbp),%rax
> 0x0000000119245d91 <exec_stmt_block+2321>:    mov    0x28(%rax),%rsi
> 0x0000000119245d95 <exec_stmt_block+2325>:    callq  0x1192567a6 <dyld_stub_strcmp>
> 0x0000000119245d9a <exec_stmt_block+2330>:    cmp    $0x0,%eax
> 0x0000000119245d9f <exec_stmt_block+2335>:    je     0x119245db4 <exec_stmt_block+2356>
> 0x0000000119245da5 <exec_stmt_block+2341>:    movl   $0x1,-0xa4(%rbp)
> 0x0000000119245daf <exec_stmt_block+2351>:    jmpq   0x119245e0d <exec_stmt_block+2445>
> 0x0000000119245db4 <exec_stmt_block+2356>:    mov    -0xb0(%rbp),%rax
> 0x0000000119245dbb <exec_stmt_block+2363>:    movq   $0x0,0x28(%rax)
> 0x0000000119245dc3 <exec_stmt_block+2371>:    movl   $0x0,-0xa4(%rbp)
> 0x0000000119245dcd <exec_stmt_block+2381>:    jmpq   0x119245e0d <exec_stmt_block+2445>
> 0x0000000119245dd2 <exec_stmt_block+2386>:    jmpq   0x119245dd7 <exec_stmt_block+2391>
> 0x0000000119245dd7 <exec_stmt_block+2391>:    lea    0x160b2(%rip),%rdi        # 0x11925be90
> 0x0000000119245dde <exec_stmt_block+2398>:    mov    $0x55c,%esi
> 0x0000000119245de3 <exec_stmt_block+2403>:    lea    0x16a84(%rip),%rdx        # 0x11925c86e
> 0x0000000119245dea <exec_stmt_block+2410>:    callq  0x119256548 <dyld_stub_elog_start>
> 0x0000000119245def <exec_stmt_block+2415>:    mov    $0x14,%edi
> 0x0000000119245df4 <exec_stmt_block+2420>:    lea    0x16b33(%rip),%rsi        # 0x11925c92e
> 0x0000000119245dfb <exec_stmt_block+2427>:    mov    -0xbc(%rbp),%edx
> 0x0000000119245e01 <exec_stmt_block+2433>:    mov    $0x0,%al
> 0x0000000119245e03 <exec_stmt_block+2435>:    callq  0x119256542 <dyld_stub_elog_finish>
> 0x0000000119245e08 <exec_stmt_block+2440>:    callq  0x119256776 <dyld_stub_abort>
> 0x0000000119245e0d <exec_stmt_block+2445>:    mov    0x1824c(%rip),%rax        # 0x11925e060
> 0x0000000119245e14 <exec_stmt_block+2452>:    mov    -0xa4(%rbp),%ecx
> 0x0000000119245e1a <exec_stmt_block+2458>:    mov    (%rax),%rax
> 0x0000000119245e1d <exec_stmt_block+2461>:    cmp    -0x8(%rbp),%rax
> 0x0000000119245e21 <exec_stmt_block+2465>:    mov    %ecx,-0x188(%rbp)
> 0x0000000119245e27 <exec_stmt_block+2471>:    jne    0x119245e3c <exec_stmt_block+2492>
> 0x0000000119245e2d <exec_stmt_block+2477>:    mov    -0x188(%rbp),%eax
> 0x0000000119245e33 <exec_stmt_block+2483>:    add    $0x190,%rsp
> 0x0000000119245e3a <exec_stmt_block+2490>:    pop    %rbp
> 0x0000000119245e3b <exec_stmt_block+2491>:    retq
> 0x0000000119245e3c <exec_stmt_block+2492>:    callq  0x11925676a <dyld_stub___stack_chk_fail>
> End of assembler dump.

-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Fwd: Core dump with nested CREATE TEMP TABLE

From
Michael Paquier
Date:
On Sun, Aug 30, 2015 at 1:06 AM, Jim Nasby wrote:
> Steps to reproduce:
> Download https://github.com/BlueTreble/test_factory/archive/crash.zip
> Unzip, cd into directory
> pgxn install pgtap (or just make test)

FWIW, make test fails:
! ERROR:  42703: column "c_data_table_name" does not exist
! LINE 4:     , c_data_table_name

> make install if you didn't do make test
> psql -f crash.sql

As does this one with the same error. I am not exactly sure what I am missing.
Regards,
-- 
Michael



Re: Fwd: Core dump with nested CREATE TEMP TABLE

From
Jim Nasby
Date:
On 9/1/15 1:08 AM, Michael Paquier wrote:
> On Sun, Aug 30, 2015 at 1:06 AM, Jim Nasby wrote:
>> Steps to reproduce:
>> Download https://github.com/BlueTreble/test_factory/archive/crash.zip
>> Unzip, cd into directory
>> pgxn install pgtap (or just make test)
>
> FWIW, make test fails:
> ! ERROR:  42703: column "c_data_table_name" does not exist
> ! LINE 4:     , c_data_table_name
>
>> make install if you didn't do make test
>> psql -f crash.sql
>
> As does this one with the same error. I am not exactly sure what I am missing.
> Regards,

Well nuts, pretty sure that means the error isn't reproducing for you. 
:/ Do you maybe have unusual config options or postgresql.conf options?

Here's a log of how I can reproduce from a just-pulled copy of HEAD:
decibel@decina:[17:22]~/pgsql/HEAD (master $=)$head config.log |grep ./conf
  $ ./configure --with-includes=/opt/local/include
    --with-libraries=/opt/local/lib --with-perl --with-python
    --enable-depend -C --enable-tap-tests
    --prefix=/Users/decibel/pgsql/HEAD/i --with-pgport=5555 --enable-debug
    CC=ccache clang -Qunused-arguments -fcolor-diagnostics --enable-cassert
    CFLAGS=-O0 --no-create --no-recursion
decibel@decina:[17:22]~/pgsql/HEAD (master $=)$

make; make install; pg_ctl init; pg_ctl start; createdb

decibel@decina:[17:20]~/tmp$mv ~/Downloads/test_factory-crash.zip .
decibel@decina:[17:20]~/tmp$open test_factory-crash.zip
decibel@decina:[17:20]~/tmp$cd test_factory-crash/
decibel@decina:[17:20]~/tmp/test_factory-crash$pgHEAD
decibel@decina:[17:20]~/tmp/test_factory-crash$make test
rm -rf  sql/test_factory--0.1.1.sql
rm -rf results/ regression.diffs regression.out tmp_check/ log/
cp sql/test_factory.sql sql/test_factory--0.1.1.sql
/bin/sh 
/Users/decibel/pgsql/HEAD/i/lib/pgxs/src/makefiles/../../config/install-sh 
-c -d '/Users/decibel/pgsql/HEAD/i/share/extension'
/bin/sh 
/Users/decibel/pgsql/HEAD/i/lib/pgxs/src/makefiles/../../config/install-sh 
-c -d '/Users/decibel/pgsql/HEAD/i/share/doc/extension'
/usr/bin/install -c -m 644 .//test_factory.control 
'/Users/decibel/pgsql/HEAD/i/share/extension/'
/usr/bin/install -c -m 644 .//doc/test_factory.asc 
'/Users/decibel/pgsql/HEAD/i/share/doc/extension/'
/Users/decibel/pgsql/HEAD/i/lib/pgxs/src/makefiles/../../src/test/regress/pg_regress 
--inputdir=./ --bindir='/Users/decibel/pgsql/HEAD/i/bin' 
--inputdir=test --load-language=plpgsql --dbname=contrib_regression base
(using postmaster on Unix socket, default port)
============== dropping database "contrib_regression" ==============
NOTICE:  database "contrib_regression" does not exist, skipping
DROP DATABASE
============== creating database "contrib_regression" ==============
CREATE DATABASE
ALTER DATABASE
============== installing plpgsql                     ==============
CREATE LANGUAGE
============== running regression test queries        ==============
test base                     ... FAILED (test process exited with exit code 2)

======================
 1 of 1 tests failed. 
======================

The differences that caused some tests to fail can be viewed in the
file "/Users/decibel/tmp/test_factory-crash/regression.diffs".  A copy 
of the test summary that you see
above is saved in the file 
"/Users/decibel/tmp/test_factory-crash/regression.out".

make: [installcheck] Error 1 (ignored)
*** /Users/decibel/tmp/test_factory-crash/test/expected/base.out    Sat Aug 29 08:50:08 2015
--- /Users/decibel/tmp/test_factory-crash/results/base.out    Tue Sep  1 17:21:04 2015
***************
*** 1,18 ****
  \set ECHO none
! ok 1 - Create customer table
! ok 2 - Register test customers
! ok 3 - Create function customer__add
! ok 4 - Create invoice table
! ok 5 - Register test invoices
! ok 6 - customer table is empty
! ok 7 - invoice table is empty
! ok 8 - invoice factory output
! ok 9 - invoice table content
! ok 10 - customer table content
! ok 11 - invoice factory second call
! ok 12 - invoice table content stayed constant
! ok 13 - customer table content stayed constant
! ok 14 - Test function factory
! ok 15 - customer table has new row
! ok 16 - truncate invoice
! ok 17 - invoice factory get remains the same after truncate
--- 1,10 ----
  \set ECHO none
! ok 1 - Register test customers
! ok 2 - Create function customer__add
! ok 3 - Register test invoices
! ok 4 - customer table is empty
! ok 5 - invoice table is empty
! server closed the connection unexpectedly
!     This probably means the server terminated abnormally
!     before or while processing the request.
! connection to server was lost

======================================================================

decibel@decina:[17:21]~/tmp/test_factory-crash$
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Fwd: Core dump with nested CREATE TEMP TABLE

From
Michael Paquier
Date:
On Wed, Sep 2, 2015 at 7:23 AM, Jim Nasby wrote:
> Well nuts, pretty sure that means the error isn't reproducing for you. :/ Do
> you maybe have unusual config options or postgresql.conf options?

Yep. And I have found one reason why it was not working: I have been
using --extra-version with a custom string, and the Makefile of
test_factory failed to detect that (you may want to use VERSION_NUM
from Makefile.global instead), so test_factory--0.1.1.sql was not
installed either.

Even after fixing that, I am facing the same problem when running the test, aka:
psql:crash.sql:42: ERROR:  42703: column "c_data_table_name" does not exist
LINE 4:     , c_data_table_name
And the same error shows up whether I use the crash branch from the
github repo or your zip file, and whether I run make test or make
install/psql -f crash.sql.

test_factory is a jungle to me. Perhaps you could just extract a
self-contained test case? It does not matter if the file is long as
long as the problem can be easily reproduced.
Regards,
-- 
Michael



Re: Fwd: Core dump with nested CREATE TEMP TABLE

From
Michael Paquier
Date:
On Wed, Sep 2, 2015 at 9:39 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Wed, Sep 2, 2015 at 7:23 AM, Jim Nasby wrote:
>> Well nuts, pretty sure that means the error isn't reproducing for you. :/ Do
>> you maybe have unusual config options or postgresql.conf options?
>
> Yep. And I have found one reason why it was not working: I have been
> using --extra-version with a custom string, and the Makefile of
> test_factory failed to detect that (you may want to use VERSION_NUM
> from Makefile.global instead), so test_factory--0.1.1.sql was not
> installed either.
>
> Even after fixing that, I am facing the same problem when running the test, aka:
> psql:crash.sql:42: ERROR:  42703: column "c_data_table_name" does not exist
> LINE 4:     , c_data_table_name
> And the same error shows up whether I use the crash branch from the
> github repo or your zip file, and whether I run make test or make
> install/psql -f crash.sql.
>
> test_factory is a jungle to me. Perhaps you could just extract a
> self-contained test case? It does not matter if the file is long as
> long as the problem can be easily reproduced.

Mea culpa. It is possible to run crash.sql with test_factory 0.2.1
installed in a cluster instead. So I picked it up from github on the
master branch, deployed it, and then crash.sql was able to run. However,
I was not able to reproduce a failure on master, REL9_4_STABLE, or 9.4.1.
Thoughts?
-- 
Michael



Re: Fwd: Core dump with nested CREATE TEMP TABLE

From
Jim Nasby
Date:
On 9/1/15 8:42 PM, Michael Paquier wrote:
> On Wed, Sep 2, 2015 at 9:39 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Wed, Sep 2, 2015 at 7:23 AM, Jim Nasby wrote:
>>> Well nuts, pretty sure that means the error isn't reproducing for you. :/ Do
>>> you maybe have unusual config options or postgresql.conf options?
>>
>> Yep. And I have found one reason why it was not working: I have been
>> using --extra-version with a custom string, and the Makefile of
>> test_factory failed to detect that (you may want to use VERSION_NUM
>> from Makefile.global instead), so test_factory--0.1.1.sql was not
>> installed either.
>>
>> Even after fixing that, I am facing the same problem when running the test, aka:
>> psql:crash.sql:42: ERROR:  42703: column "c_data_table_name" does not exist
>> LINE 4:     , c_data_table_name
>> And the same error shows up whether I use the crash branch from the
>> github repo or your zip file, and whether I run make test or make
>> install/psql -f crash.sql.
>>
>> test_factory is a jungle to me. Perhaps you could just extract a
>> self-contained test case? It does not matter if the file is long as
>> long as the problem can be easily reproduced.
>
> Mea culpa. It is possible to run crash.sql with test_factory 0.2.1
> installed in a cluster instead. So I picked it up from github on the
> master branch, deployed it, and then crash.sql was able to run. However,
> I was not able to reproduce a failure on master, REL9_4_STABLE, or 9.4.1.
> Thoughts?

The crash is triggered by having an exception raised in this particular 
call stack. Since there's no syntax error in master/0.2.1, there's no 
assert failure either. Were you running asserts? What configure line are 
you using? I might be able to track this down to something particular to 
my configure.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Fwd: Core dump with nested CREATE TEMP TABLE

From
Jim Nasby
Date:
On 9/1/15 8:42 PM, Michael Paquier wrote:
>>test_factory is a jungle to me. Perhaps you could just extract a
>>self-contained test case? It does not matter if the file is long as
>>long as the problem can be easily reproduced.

Sorry, more info on what's happening here.

The idea behind test_factory is to allow you to register a command that
creates and returns test data (i.e., INSERT INTO table VALUES( DEFAULT,
'my test data' ) RETURNING *). That insert statement is stored in a
table (_tf._test_factory). When this dynamic statement is executed, the
results will be stored in a specially named table (plpgsql variable
c_data_table_name).

When you call tf.get(), it first attempts to grab all the rows from
c_data_table_name. The first time you do this in a database, that table
won't exist. tf.get's exception block will create a temp table holding
the results of the stored statement (i.e., INSERT INTO table ...).
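
To make that concrete, here is a minimal sketch of the pattern just
described (the names here are hypothetical; the real tf.get is generic over
table types and reads the registered statement from _tf._test_factory):

CREATE OR REPLACE FUNCTION tf_get_sketch() RETURNS SETOF customer
LANGUAGE plpgsql AS $body$
DECLARE
    -- in the real code this name is generated per registered test set
    c_data_table_name CONSTANT text := 'customer_0000000001';
BEGIN
    -- first, try to grab all the rows from the data table
    RETURN QUERY EXECUTE format('SELECT * FROM %I', c_data_table_name);
EXCEPTION WHEN undefined_table THEN
    -- first call in this database: materialize the stored statement
    EXECUTE format(
        $$CREATE TEMP TABLE %I ON COMMIT DROP AS
          WITH i AS (
            INSERT INTO customer VALUES (DEFAULT, 'first', 'last')
            RETURNING *
          )
          SELECT * FROM i$$, c_data_table_name);
    RETURN QUERY EXECUTE format('SELECT * FROM %I', c_data_table_name);
END
$body$;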

Something else important here is that in crash.sql there is a nested
tf.get() call:

-- Invoice
   , $$INSERT INTO invoice VALUES(
       DEFAULT
       , (tf.get( NULL::customer, 'insert' )).customer_id
       , current_date
       , current_date + 30
     ) RETURNING *$$

Note that this calls tf.get for customer (which is a simple INSERT).

This is where stuff gets weird. If you run tf.get( NULL::customer, 
'insert' ) you get a regular plpgsql error. If you simply run tf.get() 
for invoices, *outside* of tap.results_eq(), you also only get a plpgsql 
error. To trigger the assert, you must use tf.get( NULL::invoice, 'base' 
) from within tap.results_eq(). That's the only way I've found to 
trigger this.

AFAICT, that call stack looks like this:

results_eq opens a cursor to run $$SELECT * FROM tf.get( NULL::invoice, 
'base' )$$

plpgsql does its thing and eventually that statement begins execution.
tf.get() does a few things then attempts to read from a non-existent
table. tf.get's outer block catches that exception and runs dynamic SQL
to create a temp table. That dynamic SQL contains (in part) this:

   , (tf.get( NULL::customer, 'insert' )).customer_id

*That* tf.get also attempts to read from a non-existent table and
*successfully* creates its temp table. It then does

   PERFORM _tf.table_create( c_data_table_name );

which fails due to a bug in _tf.table_create().

Now we have a second exception bubbling back up to the exception handler 
of the second tf.get call, which goes up to the exception handler for 
the first tf.get call. That call was in the process of creating a temp 
table (invoice_003). The error continues up to the FETCH command in 
results_eq(). The assert happens somewhere after here, and it's because 
the refcount on that temp table (invoice_003) is unexpected. I'm tracing 
through this scenario by hand right now to try and figure out exactly 
when that assert blows up, but I know it's happening in 
results_eq(refcursor, refcursor, text).

BTW, I just updated the crash branch to ensure that test_factory 0.1.1 
is what gets installed.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Fwd: Core dump with nested CREATE TEMP TABLE

From
Michael Paquier
Date:
On Wed, Sep 2, 2015 at 12:59 PM, Jim Nasby wrote:
> On 9/1/15 8:42 PM, Michael Paquier wrote:
> The crash is triggered by having an exception raised in this particular call
> stack. Since there's no syntax error in master/0.2.1, there's no assert
> failure either. Were you running asserts?

I thought I was, but visibly I was mistaken. So, after re-rolling
configure, I have been able to reproduce the crash, and as far as I
can see all supported versions are impacted. I tested down to 9.1,
where extensions were introduced, and the stack trace as well as the
failing assertion are the same as what Jim has reported. I am
just digging more into this now.
-- 
Michael



Re: Fwd: Core dump with nested CREATE TEMP TABLE

From
Jim Nasby
Date:
On 9/1/15 11:59 PM, Michael Paquier wrote:
> On Wed, Sep 2, 2015 at 12:59 PM, Jim Nasby wrote:
>> On 9/1/15 8:42 PM, Michael Paquier wrote:
>> The crash is triggered by having an exception raised in this particular call
>> stack. Since there's no syntax error in master/0.2.1, there's no assert
>> failure either. Were you running asserts?
>
> I thought I was, but visibly I was mistaken. So, after re-rolling
> configure, I have been able to reproduce the crash, and as far as I
> can see all supported versions are impacted. I tested down to 9.1,
> where extensions were introduced, and the stack trace as well as the
> failing assertion are the same as what Jim has reported. I am
> just digging more into this now.

I just had a theory that reference counts were messed up because there 
was both a temp table and a real table with the same name. Turns out 
that's not the case, but I do now know that the assert is failing for 
the reference count on the temp table.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Fwd: Core dump with nested CREATE TEMP TABLE

From
Michael Paquier
Date:
On Wed, Sep 2, 2015 at 6:13 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 9/1/15 11:59 PM, Michael Paquier wrote:
>> On Wed, Sep 2, 2015 at 12:59 PM, Jim Nasby wrote:
>>> On 9/1/15 8:42 PM, Michael Paquier wrote:
>>> The crash is triggered by having an exception raised in this particular
>>> call
>>> stack. Since there's no syntax error in master/0.2.1, there's no assert
>>> failure either. Were you running asserts?
>>
>> I thought I was, but visibly I was mistaken. So, after re-rolling
>> configure, I have been able to reproduce the crash, and as far as I
>> can see all supported versions are impacted. I tested down to 9.1,
>> where extensions were introduced, and the stack trace as well as the
>> failing assertion are the same as what Jim has reported. I am
>> just digging more into this now.
>
> I just had a theory that reference counts were messed up because there was
> both a temp table and a real table with the same name. Turns out that's not
> the case, but I do now know that the assert is failing for the reference
> count on the temp table.

Yes, that's what I have been looking at actually by having some markers
in relcache.c. The reference count of this relation gets incremented here:
LOG:  increment ref count relation: invoice_0000000003, rd_refcnt: 1
CONTEXT:  SQL statement "
        CREATE TEMP TABLE invoice_0000000003 ON COMMIT DROP AS
        WITH i AS (
              INSERT INTO invoice VALUES(
                    DEFAULT
                    , (tf.get( NULL::customer, 'insert' )).customer_id
                    , current_date
                    , current_date + 30
                  ) RETURNING *
            )
         SELECT *
            FROM i
        "
        PL/pgSQL function tf.get(anyelement,text) line 29 at EXECUTE
        PL/pgSQL function results_eq(refcursor,refcursor,text) line 11 at FETCH
        PL/pgSQL function results_eq(text,text,text) line 9 at assignment
STATEMENT:  SELECT results_eq(
          $$SELECT * FROM tf.get( NULL::invoice, 'base' )$$
          , $$VALUES( 1, 1, current_date, current_date + 30 )$$
          , 'invoice factory output'
        );
And it gets cleared here without being decremented when cleaning up the
second exception:
LOG:  clear relation: invoice_0000000003, rd_refcnt: 1
CONTEXT:  PL/pgSQL function results_eq(refcursor,refcursor,text) line 11 during exception cleanup
        PL/pgSQL function results_eq(text,text,text) line 9 at assignment
STATEMENT:  SELECT results_eq(
          $$SELECT * FROM tf.get( NULL::invoice, 'base' )$$
          , $$VALUES( 1, 1, current_date, current_date + 30 )$$
          , 'invoice factory output'
        );
-- 
Michael

Re: Fwd: Core dump with nested CREATE TEMP TABLE

From
Michael Paquier
Date:
On Wed, Sep 2, 2015 at 6:21 AM, Michael Paquier wrote:
> Yes, that's what I have been looking at actually by having some markers
> in relcache.c. The reference count of this relation gets incremented here:

So, I have been playing more with this code, and as mentioned by
Andres this test case goes twice through the PG_CATCH block in
pl_exec.c, causing the crash as the temporary relation does not get
dereferenced. I may be missing something, but isn't it a matter of
moving the switch to the old memory context to the end of the block so
that we get a correct cleanup of the estate in the first block?
See for example the attached that fixes the problem for me, and passes
make check-world, though I expect Tom and Andres to jump in and say
that this is not correct within the next 12 hours :)
I haven't written a test case yet, but I think that we could reproduce
that simply by having a relation referenced in the exception block of
a first function, calling a second function that itself raises an
exception, causing the referencing error.
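
In SQL terms, the shape I have in mind would be something like this
(hypothetical functions, a rough sketch not yet verified against the
actual assertion):

CREATE FUNCTION second_func() RETURNS void LANGUAGE plpgsql AS $$
BEGIN
    RAISE EXCEPTION 'second failure';
END $$;

CREATE FUNCTION first_func() RETURNS void LANGUAGE plpgsql AS $$
BEGIN
    RAISE EXCEPTION 'first failure';
EXCEPTION WHEN OTHERS THEN
    -- a relation referenced in the exception block of the first function...
    CREATE TEMP TABLE ref_tab ON COMMIT DROP AS SELECT 1 AS x;
    -- ...then a second function that itself raises an exception
    PERFORM second_func();
END $$;

SELECT first_func();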
Regards,
--
Michael

Attachments

Re: Fwd: Core dump with nested CREATE TEMP TABLE

From
Alvaro Herrera
Date:
Michael Paquier wrote:

> I haven't written a test case yet, but I think that we could reproduce
> that simply by having a relation referenced in the exception block of
> a first function, calling a second function that itself raises an
> exception, causing the referencing error.

Hm, so function 2 is called in the exception block of function 1?

Are you going to produce the test case for this?  Jim?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Fwd: Core dump with nested CREATE TEMP TABLE

From
Jim Nasby
Date:
On 9/2/15 2:17 PM, Alvaro Herrera wrote:
> Michael Paquier wrote:
>
>> I haven't written a test case yet, but I think that we could reproduce
>> that simply by having a relation referenced in the exception block of
>> a first function, calling a second function that itself raises an
>> exception, causing the referencing error.
>
> Hm, so function 2 is called in the exception block of function 1?
>
> Are you going to produce the test case for this?  Jim?

I had attempted one but the issue is that it's more than just an 
exception block thing. If it was that simple then we'd still get the 
crash without involving pgTap. So I suspect you need to have a named 
cursor in the mix as well.

Let me make another attempt at something simpler.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Fwd: Core dump with nested CREATE TEMP TABLE

From
Jim Nasby
Date:
On 9/2/15 2:56 PM, Jim Nasby wrote:
> On 9/2/15 2:17 PM, Alvaro Herrera wrote:
>> Michael Paquier wrote:
>>
>>> I haven't written a test case yet, but I think that we could reproduce
>>> that simply by having a relation referenced in the exception block of
>>> a first function, calling a second function that itself raises an
>>> exception, causing the referencing error.
>>
>> Hm, so function 2 is called in the exception block of function 1?
>>
>> Are you going to produce the test case for this?  Jim?
>
> I had attempted one but the issue is that it's more than just an
> exception block thing. If it was that simple then we'd still get the
> crash without involving pgTap. So I suspect you need to have a named
> cursor in the mix as well.
>
> Let me make another attempt at something simpler.

After some time messing around with this I've been able to remove pgTap
from the equation using the attached crash.sql (relevant snippet below).

This only works if you pass a refcursor into the function. It doesn't
work if you do

OPEN have FOR EXECUTE $$SELECT * FROM tf.get( NULL::invoice, 'base' )$$;

inside the function instead. This code also produces different results;
it outputs the error message before crashing and the stack trace shows
the assert is now failing while trying to abort the top-level
transaction. So it looks like the scenario is:

BEGIN;
DECLARE (open?) cursor that calls function with exception handler that
calls function with exception handler that calls something that fails

The double set of exceptions seems to be critical; if instead of calling
tf.get(invoice) (which recursively does tf.get(customer)) I define the
cursor to call tf.get(customer) directly there's no assert.

I can poke at this more tomorrow, but it'd be very helpful if someone
could explain this failure mode, as I'm basically stumbling in the dark
on a test case right now.

CREATE OR REPLACE FUNCTION a(
     have refcursor
) RETURNS void LANGUAGE plpgsql AS $body$
DECLARE
     have_rec record;
BEGIN
     FETCH have INTO have_rec;
END
$body$;
DECLARE h CURSOR FOR SELECT * FROM tf.get( NULL::invoice, 'base' );
SELECT a(
   'h'::refcursor
);
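
For completeness, the DECLARE above presumably runs inside an explicit
transaction block, since a non-holdable cursor cannot be declared outside
of one:

BEGIN;
DECLARE h CURSOR FOR SELECT * FROM tf.get( NULL::invoice, 'base' );
SELECT a( 'h'::refcursor );
COMMIT;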

> (lldb) bt
> * thread #1: tid = 0x722ed8, 0x00007fff92a5a866 libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
>   * frame #0: 0x00007fff92a5a866 libsystem_kernel.dylib`__pthread_kill + 10
>     frame #1: 0x00007fff9001b35c libsystem_pthread.dylib`pthread_kill + 92
>     frame #2: 0x00007fff8cf89b1a libsystem_c.dylib`abort + 125
>     frame #3: 0x000000010cdb4039 postgres`ExceptionalCondition(conditionName=0x000000010cf59cfd, errorType=0x000000010cec392d, fileName=0x000000010cf59045, lineNumber=1972) + 137 at assert.c:54
>     frame #4: 0x000000010cd9b332 postgres`RelationClearRelation(relation=0x000000011658f110, rebuild='\0') + 162 at relcache.c:1970
>     frame #5: 0x000000010cd9cc0f postgres`AtEOSubXact_cleanup(relation=0x000000011658f110, isCommit='\0', mySubid=15, parentSubid=1) + 79 at relcache.c:2665
>     frame #6: 0x000000010cd9cb92 postgres`AtEOSubXact_RelationCache(isCommit='\0', mySubid=15, parentSubid=1) + 242 at relcache.c:2633
>     frame #7: 0x000000010c9b6e88 postgres`AbortSubTransaction + 440 at xact.c:4373
>     frame #8: 0x000000010c9b7208 postgres`AbortCurrentTransaction + 312 at xact.c:2947
>     frame #9: 0x000000010cc249f3 postgres`PostgresMain(argc=1, argv=0x00007fa3f4800378, dbname=0x00007fa3f4800200, username=0x00007fa3f30031f8) + 1875 at postgres.c:3902
>     frame #10: 0x000000010cb9da48 postgres`PostmasterMain [inlined] BackendRun + 8024 at postmaster.c:4155
>     frame #11: 0x000000010cb9da22 postgres`PostmasterMain [inlined] BackendStartup at postmaster.c:3829
>     frame #12: 0x000000010cb9da22 postgres`PostmasterMain [inlined] ServerLoop at postmaster.c:1597
>     frame #13: 0x000000010cb9da22 postgres`PostmasterMain(argc=<unavailable>, argv=<unavailable>) + 7986 at postmaster.c:1244
>     frame #14: 0x000000010cb218cd postgres`main(argc=<unavailable>, argv=<unavailable>) + 1325 at main.c:228
>     frame #15: 0x00007fff8e9a35fd libdyld.dylib`start + 1
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

Attachments

Re: Fwd: Core dump with nested CREATE TEMP TABLE

From
Michael Paquier
Date:


On Thu, Sep 3, 2015 at 1:46 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> The double set of exceptions seems to be critical; if instead of calling
> tf.get(invoice) (which recursively does tf.get(customer)) I define the
> cursor to call tf.get(customer) directly there's no assert.

Finally I have been able to track down the problem, and it can be reproduced with the following test case, for example:
BEGIN;
CREATE OR REPLACE FUNCTION create_self_tab() RETURNS text
LANGUAGE plpgsql AS $$
BEGIN
  CREATE TEMP TABLE new_table ON COMMIT DROP AS SELECT create_self_tab();
  RETURN 'foo';
END $$;
DECLARE h CURSOR FOR SELECT create_self_tab();
SAVEPOINT self_tab_point;
FETCH h; -- error
COMMIT;

When fetching the first tuple, the transaction status is cleaned up back to
the savepoint because the call actually fails on the second pass, where the
temporary relation already exists. The temporary table that had been created
then gets cleaned up while it is still referenced, causing the assertion
failure. So that's not related to the use of the exception blocks. What
directed me to the SAVEPOINT causing the issue is the use of
ON_ERROR_ROLLBACK in the test case Jim proposed. I don't have a patch at
hand yet, but now things are easier to test.
--
Michael

Re: Fwd: Core dump with nested CREATE TEMP TABLE

From
Tom Lane
Date:
Michael Paquier <michael.paquier@gmail.com> writes:
> Finally I have been able to track down the problem, and it can be
> reproduced with the following test case, for example:

Hm.  It looks like the problem is that we're executing the function
within the Portal created for cursor "h", and that Portal is older
than the subtransaction created by the savepoint, so when we abort the
subtransaction we do not clean up the Portal.  That means that resources
held by that Portal haven't been released --- in particular it still has
a relcache pin on the "new_table" --- when subtransaction abort gets to
the point of wanting to drop relcache entries created during the
subtransaction.  So the Assert has caught an actual problem: we'd have
flushed the relcache entry while there was still an open reference to it.
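
In terms of Michael's test case, the sequence is roughly as follows (the
annotations are a sketch of the above analysis, not server output):

BEGIN;
DECLARE h CURSOR FOR SELECT create_self_tab();
    -- Portal "h" is created in the outer transaction
SAVEPOINT self_tab_point;
    -- the subtransaction starts; the Portal is older than it
FETCH h;
    -- create_self_tab() runs within the subtransaction and creates
    -- new_table, so the Portal ends up holding a relcache pin on it; the
    -- recursive call then fails, and subtransaction abort wants to flush
    -- new_table's relcache entry while the never-cleaned-up Portal still
    -- holds that pin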

I'm inclined to think that the only real fix for this type of problem
is that at subtransaction abort, we have to prevent further execution
of any Portals that have executed at all within the subxact (ie force
them into PORTAL_FAILED state and clean out their resources, see
MarkPortalFailed).  It's not good enough to kill the one(s) that are
actively executing at the instant of failure, because anything that's
run at all since the subxact started might be holding a reference to an
object we're about to delete.

I'm a little worried that that will break usage patterns that work today,
though.
        regards, tom lane



Re: Fwd: Core dump with nested CREATE TEMP TABLE

From
Tom Lane
Date:
I wrote:
> I'm inclined to think that the only real fix for this type of problem
> is that at subtransaction abort, we have to prevent further execution
> of any Portals that have executed at all within the subxact (ie force
> them into PORTAL_FAILED state and clean out their resources, see
> MarkPortalFailed).  It's not good enough to kill the one(s) that are
> actively executing at the instant of failure, because anything that's
> run at all since the subxact started might be holding a reference to an
> object we're about to delete.

> I'm a little worried that that will break usage patterns that work today,
> though.

I experimented with the attached patch.  It seems to work to stop the
crash Michael exhibited (I've not tried to duplicate Jim's original
example, though).  However, it also causes a regression test failure,
because transactions.sql does this:

BEGIN;
    DECLARE c CURSOR FOR SELECT unique2 FROM tenk1 ORDER BY unique2;
    SAVEPOINT one;
        FETCH 10 FROM c;
    ROLLBACK TO SAVEPOINT one;
        FETCH 10 FROM c;

which is exactly the case we're trying to reject now.  So that fills
me with fear that this approach would break existing applications.
On the other hand, I do not really see a good alternative :-(.

I thought about trying to detect whether the Portal actually had any
references to "new in subtransaction" objects to decide whether to
kill it or not, but that seems awfully fragile.

Ideas?

            regards, tom lane

diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 2794537..327b2a5 100644
*** a/src/backend/commands/portalcmds.c
--- b/src/backend/commands/portalcmds.c
*************** PersistHoldablePortal(Portal portal)
*** 335,345 ****
      /*
       * Check for improper portal use, and mark portal active.
       */
!     if (portal->status != PORTAL_READY)
!         ereport(ERROR,
!                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
!                  errmsg("portal \"%s\" cannot be run", portal->name)));
!     portal->status = PORTAL_ACTIVE;

      /*
       * Set up global portal context pointers.
--- 335,341 ----
      /*
       * Check for improper portal use, and mark portal active.
       */
!     MarkPortalActive(portal);

      /*
       * Set up global portal context pointers.
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 9c14e8a..0df86a2 100644
*** a/src/backend/tcop/pquery.c
--- b/src/backend/tcop/pquery.c
*************** PortalRun(Portal portal, long count, boo
*** 734,744 ****
      /*
       * Check for improper portal use, and mark portal active.
       */
!     if (portal->status != PORTAL_READY)
!         ereport(ERROR,
!                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
!                  errmsg("portal \"%s\" cannot be run", portal->name)));
!     portal->status = PORTAL_ACTIVE;

      /*
       * Set up global portal context pointers.
--- 734,740 ----
      /*
       * Check for improper portal use, and mark portal active.
       */
!     MarkPortalActive(portal);

      /*
       * Set up global portal context pointers.
*************** PortalRunFetch(Portal portal,
*** 1398,1408 ****
      /*
       * Check for improper portal use, and mark portal active.
       */
!     if (portal->status != PORTAL_READY)
!         ereport(ERROR,
!                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
!                  errmsg("portal \"%s\" cannot be run", portal->name)));
!     portal->status = PORTAL_ACTIVE;

      /*
       * Set up global portal context pointers.
--- 1394,1400 ----
      /*
       * Check for improper portal use, and mark portal active.
       */
!     MarkPortalActive(portal);

      /*
       * Set up global portal context pointers.
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 7530498..74f5089 100644
*** a/src/backend/utils/mmgr/portalmem.c
--- b/src/backend/utils/mmgr/portalmem.c
*************** CreatePortal(const char *name, bool allo
*** 232,237 ****
--- 232,238 ----
      portal->status = PORTAL_NEW;
      portal->cleanup = PortalCleanup;
      portal->createSubid = GetCurrentSubTransactionId();
+     portal->activeSubid = portal->createSubid;
      portal->strategy = PORTAL_MULTI_QUERY;
      portal->cursorOptions = CURSOR_OPT_NO_SCROLL;
      portal->atStart = true;
*************** UnpinPortal(Portal portal)
*** 403,408 ****
--- 404,428 ----
  }

  /*
+  * MarkPortalActive
+  *        Transition a portal from READY to ACTIVE state.
+  *
+  * NOTE: never set portal->status = PORTAL_ACTIVE directly; call this instead.
+  */
+ void
+ MarkPortalActive(Portal portal)
+ {
+     /* For safety, this is a runtime test not just an Assert */
+     if (portal->status != PORTAL_READY)
+         ereport(ERROR,
+                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                  errmsg("portal \"%s\" cannot be run", portal->name)));
+     /* Perform the state transition */
+     portal->status = PORTAL_ACTIVE;
+     portal->activeSubid = GetCurrentSubTransactionId();
+ }
+
+ /*
   * MarkPortalDone
   *        Transition a portal from ACTIVE to DONE state.
   *
*************** PreCommit_Portals(bool isPrepare)
*** 690,695 ****
--- 710,716 ----
               * not belonging to this transaction.
               */
              portal->createSubid = InvalidSubTransactionId;
+             portal->activeSubid = InvalidSubTransactionId;

              /* Report we changed state */
              result = true;
*************** AtCleanup_Portals(void)
*** 836,843 ****
  /*
   * Pre-subcommit processing for portals.
   *
!  * Reassign the portals created in the current subtransaction to the parent
!  * subtransaction.
   */
  void
  AtSubCommit_Portals(SubTransactionId mySubid,
--- 857,864 ----
  /*
   * Pre-subcommit processing for portals.
   *
!  * Reassign portals created or used in the current subtransaction to the
!  * parent subtransaction.
   */
  void
  AtSubCommit_Portals(SubTransactionId mySubid,
*************** AtSubCommit_Portals(SubTransactionId myS
*** 859,872 ****
              if (portal->resowner)
                  ResourceOwnerNewParent(portal->resowner, parentXactOwner);
          }
      }
  }

  /*
   * Subtransaction abort handling for portals.
   *
!  * Deactivate portals created during the failed subtransaction.
!  * Note that per AtSubCommit_Portals, this will catch portals created
   * in descendants of the subtransaction too.
   *
   * We don't destroy any portals here; that's done in AtSubCleanup_Portals.
--- 880,895 ----
              if (portal->resowner)
                  ResourceOwnerNewParent(portal->resowner, parentXactOwner);
          }
+         if (portal->activeSubid == mySubid)
+             portal->activeSubid = parentSubid;
      }
  }

  /*
   * Subtransaction abort handling for portals.
   *
!  * Deactivate portals created or used during the failed subtransaction.
!  * Note that per AtSubCommit_Portals, this will catch portals created/used
   * in descendants of the subtransaction too.
   *
   * We don't destroy any portals here; that's done in AtSubCleanup_Portals.
*************** AtSubAbort_Portals(SubTransactionId mySu
*** 885,900 ****
      {
          Portal        portal = hentry->portal;

          if (portal->createSubid != mySubid)
              continue;

          /*
           * Force any live portals of my own subtransaction into FAILED state.
           * We have to do this because they might refer to objects created or
           * changed in the failed subtransaction, leading to crashes if
           * execution is resumed, or even if we just try to run ExecutorEnd.
-          * (Note we do NOT do this to upper-level portals, since they cannot
-          * have such references and hence may be able to continue.)
           */
          if (portal->status == PORTAL_READY ||
              portal->status == PORTAL_ACTIVE)
--- 908,956 ----
      {
          Portal        portal = hentry->portal;

+         /* Was it created in this subtransaction? */
          if (portal->createSubid != mySubid)
+         {
+             /* No, but maybe it was used in this subtransaction? */
+             if (portal->activeSubid == mySubid)
+             {
+                 /* Maintain activeSubid until the portal is removed */
+                 portal->activeSubid = parentSubid;
+
+                 /*
+                  * Upper-level portals that were used in this subxact must be
+                  * forced into FAILED state, for the same reasons discussed
+                  * below.
+                  */
+                 if (portal->status == PORTAL_READY ||
+                     portal->status == PORTAL_ACTIVE)
+                     MarkPortalFailed(portal);
+
+                 /*
+                  * Also, if we failed it during the current subxact (either
+                  * just above, or earlier), reattach its resowner to the
+                  * current subtransaction's resowner so that its resources get
+                  * cleaned up while we abort this subtransaction.  This is
+                  * needed to avoid getting Assert failures or worse when we
+                  * clean up objects created in the subtransaction.
+                  */
+                 if (portal->status == PORTAL_FAILED &&
+                     portal->resowner)
+                 {
+                     ResourceOwnerNewParent(portal->resowner,
+                                            CurrentResourceOwner);
+                     portal->resowner = NULL;
+                 }
+             }
+             /* If it wasn't created in this subxact, nothing more to do */
              continue;
+         }

          /*
           * Force any live portals of my own subtransaction into FAILED state.
           * We have to do this because they might refer to objects created or
           * changed in the failed subtransaction, leading to crashes if
           * execution is resumed, or even if we just try to run ExecutorEnd.
           */
          if (portal->status == PORTAL_READY ||
              portal->status == PORTAL_ACTIVE)
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index 1267b9e..7215e46 100644
*** a/src/include/utils/portal.h
--- b/src/include/utils/portal.h
*************** typedef struct PortalData
*** 119,130 ****
      MemoryContext heap;            /* subsidiary memory for portal */
      ResourceOwner resowner;        /* resources owned by portal */
      void        (*cleanup) (Portal portal);        /* cleanup hook */
-     SubTransactionId createSubid;        /* the ID of the creating subxact */

      /*
!      * if createSubid is InvalidSubTransactionId, the portal is held over from
!      * a previous transaction
       */

      /* The query or queries the portal will execute */
      const char *sourceText;        /* text of query (as of 8.4, never NULL) */
--- 119,136 ----
      MemoryContext heap;            /* subsidiary memory for portal */
      ResourceOwner resowner;        /* resources owned by portal */
      void        (*cleanup) (Portal portal);        /* cleanup hook */

      /*
!      * State data for remembering which subtransaction(s) the portal was
!      * created or used in.  If the portal is held over from a previous
!      * transaction, both subxids are InvalidSubTransactionId.  Otherwise,
!      * createSubid is the creating subxact and activeSubid is the last subxact
!      * in which we ran the portal.  We must deactivate the portal if either of
!      * these subxacts becomes aborted, since it might hold references to
!      * objects created in said subxact.
       */
+     SubTransactionId createSubid;        /* the creating subxact */
+     SubTransactionId activeSubid;        /* the last subxact with activity */

      /* The query or queries the portal will execute */
      const char *sourceText;        /* text of query (as of 8.4, never NULL) */
*************** extern Portal CreatePortal(const char *n
*** 207,212 ****
--- 213,219 ----
  extern Portal CreateNewPortal(void);
  extern void PinPortal(Portal portal);
  extern void UnpinPortal(Portal portal);
+ extern void MarkPortalActive(Portal portal);
  extern void MarkPortalDone(Portal portal);
  extern void MarkPortalFailed(Portal portal);
  extern void PortalDrop(Portal portal, bool isTopCommit);

Re: Fwd: Core dump with nested CREATE TEMP TABLE

From
Michael Paquier
Date:
On Fri, Sep 4, 2015 at 6:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I experimented with the attached patch.  It seems to work to stop the
> crash Michael exhibited (I've not tried to duplicate Jim's original
> example, though).  However, it also causes a regression test failure,
> because transactions.sql does this:

Neat patch, it would have taken me longer to figure that out... I have
tried with Jim's test case and the patch is working.

> which is exactly the case we're trying to reject now.  So that fills
> me with fear that this approach would break existing applications.
> On the other hand, I do not really see a good alternative :-(.

This behavior has been present since 2004 with fe548629, so that's a real
concern to me, especially if there are drivers around relying on it. There
are for example some code patterns around Postgres ODBC that could be
impacted; I am not sure which ones, but I guess that some internal error
handling is not going to like that.

> I thought about trying to detect whether the Portal actually had any
> references to "new in subtransaction" objects to decide whether to
> kill it or not, but that seems awfully fragile.
>
> Ideas?

Yes. This diff on top of your patch:
@@ -922,8 +922,7 @@ AtSubAbort_Portals(SubTransactionId mySubid,
                                 * must be forced into FAILED state, for the same reasons
                                 * discussed below.
                                 */
-                               if (portal->status == PORTAL_READY ||
-                                       portal->status == PORTAL_ACTIVE)
+                               if (portal->status == PORTAL_ACTIVE)
                                        MarkPortalFailed(portal);
This way only the active portals are marked as failed. The regression tests
that are failing with your patch applied visibly do not activate the portal
they use, just marking it as ready to be used. This seems to be the safest
approach to me on stable branches as well as on master: this way we are sure
that resources of the failed subxacts are cleaned up correctly, and that
existing applications are not impacted.
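
Concretely, with this change the transactions.sql case keeps working because
the cursor is only READY, not ACTIVE, when the subtransaction aborts (an
annotated sketch of the regression test quoted upthread; the annotations are
assumptions based on this reasoning):

BEGIN;
    DECLARE c CURSOR FOR SELECT unique2 FROM tenk1 ORDER BY unique2;
    SAVEPOINT one;
        FETCH 10 FROM c;        -- c returns to READY once the FETCH completes
    ROLLBACK TO SAVEPOINT one;  -- c is READY here, so it is not failed
        FETCH 10 FROM c;        -- still allowed
COMMIT;

In Jim's crash, by contrast, the FETCH is still in progress when the error
is raised, so the portal is ACTIVE and gets marked as failed.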

I am attaching a new patch with what I changed and a test case based on my previous one.
Regards,
--
Michael
Attachments

Re: Fwd: Core dump with nested CREATE TEMP TABLE

From
Tom Lane
Date:
Michael Paquier <michael.paquier@gmail.com> writes:
> On Fri, Sep 4, 2015 at 6:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Ideas?

> Yes. This diff on top of your patch:
> @@ -922,8 +922,7 @@ AtSubAbort_Portals(SubTransactionId mySubid,
>                                  * must be forced into FAILED state, for the same reasons
>                                  * discussed below.
>                                  */
> -                               if (portal->status == PORTAL_READY ||
> -                                       portal->status == PORTAL_ACTIVE)
> +                               if (portal->status == PORTAL_ACTIVE)
>                                         MarkPortalFailed(portal);

> This way only the active portals are marked as failed.

Hmm.  I am not feeling especially comfortable about this: it's not clear
that there's anything preventing a suspended portal from containing a
dangerous reference.  However, a quick trial did not show that it was
possible to break it --- in the cases I tried, we detected that a cached
plan was no longer valid, tried to rebuild it, and noticed the missing
object at that point.  So maybe it's OK.

On reflection I think that the tracking of activeSubid in my patch is
probably overkill if we're attacking it this way.  We can just have
AtSubAbort_Portals fail any ACTIVE portal regardless of subxact level,
which is pretty analogous to what AtAbort_Portals has done for a long
time.

Let me work on this and see if I can get to a simpler patch.
        regards, tom lane



Re: Fwd: Core dump with nested CREATE TEMP TABLE

From
Michael Paquier
Date:


On Fri, Sep 4, 2015 at 9:57 AM, Tom Lane wrote:
> On reflection I think that the tracking of activeSubid in my patch is
> probably overkill if we're attacking it this way.  We can just have
> AtSubAbort_Portals fail any ACTIVE portal regardless of subxact level,
> which is pretty analogous to what AtAbort_Portals has done for a long
> time.

Tracking the subxact in which the portal was activated looked neat from my
side; that's more consistent with what is done when the portal is marked as
ready, particularly with the new routine introduced.
 
> Let me work on this and see if I can get to a simpler patch.

Oh, OK. Thanks!
--
Michael

Re: Fwd: Core dump with nested CREATE TEMP TABLE

From
Tom Lane
Date:
Michael Paquier <michael.paquier@gmail.com> writes:
On Fri, Sep 4, 2015 at 9:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> On reflection I think that the tracking of activeSubid in my patch is
>> probably overkill if we're attacking it this way.  We can just have
>> AtSubAbort_Portals fail any ACTIVE portal regardless of subxact level,
>> which is pretty analogous to what AtAbort_Portals has done for a long
>> time.

> Tracking the activated subxact looked neat from my side, that's more
> consistent with what is done when the portal is marked as ready,
> particularly with the new routine introduced.

Actually, that idea did not work at all: it caused errors inside plpgsql
EXCEPTION blocks to try to kill the portal running the outer function call.
Ooops.
        regards, tom lane



Re: Fwd: Core dump with nested CREATE TEMP TABLE

From
Tom Lane
Date:
I wrote:
> Hmm.  I am not feeling especially comfortable about this: it's not clear
> that there's anything preventing a suspended portal from containing a
> dangerous reference.  However, a quick trial did not show that it was
> possible to break it --- in the cases I tried, we detected that a cached
> plan was no longer valid, tried to rebuild it, and noticed the missing
> object at that point.  So maybe it's OK.

After further thought I concluded that this probably is safe.  The
portal's original query was created and planned when it was opened,
so it cannot contain any references to objects of the failed
subtransaction.  We have a hazard from queries within functions,
but if the portal is suspended then no such functions are in progress.

Attached is an updated patch with comments to this effect and some
minor other code cleanup (mainly, not assuming that CurrentResourceOwner
is the right thing to use in AtSubAbort_Portals).

            regards, tom lane

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index b53d95f..ae9e850 100644
*** a/src/backend/access/transam/xact.c
--- b/src/backend/access/transam/xact.c
*************** AbortSubTransaction(void)
*** 4585,4590 ****
--- 4585,4591 ----
          AfterTriggerEndSubXact(false);
          AtSubAbort_Portals(s->subTransactionId,
                             s->parent->subTransactionId,
+                            s->curTransactionOwner,
                             s->parent->curTransactionOwner);
          AtEOSubXact_LargeObject(false, s->subTransactionId,
                                  s->parent->subTransactionId);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 2794537..327b2a5 100644
*** a/src/backend/commands/portalcmds.c
--- b/src/backend/commands/portalcmds.c
*************** PersistHoldablePortal(Portal portal)
*** 335,345 ****
      /*
       * Check for improper portal use, and mark portal active.
       */
!     if (portal->status != PORTAL_READY)
!         ereport(ERROR,
!                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
!                  errmsg("portal \"%s\" cannot be run", portal->name)));
!     portal->status = PORTAL_ACTIVE;

      /*
       * Set up global portal context pointers.
--- 335,341 ----
      /*
       * Check for improper portal use, and mark portal active.
       */
!     MarkPortalActive(portal);

      /*
       * Set up global portal context pointers.
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 9c14e8a..0df86a2 100644
*** a/src/backend/tcop/pquery.c
--- b/src/backend/tcop/pquery.c
*************** PortalRun(Portal portal, long count, boo
*** 734,744 ****
      /*
       * Check for improper portal use, and mark portal active.
       */
!     if (portal->status != PORTAL_READY)
!         ereport(ERROR,
!                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
!                  errmsg("portal \"%s\" cannot be run", portal->name)));
!     portal->status = PORTAL_ACTIVE;

      /*
       * Set up global portal context pointers.
--- 734,740 ----
      /*
       * Check for improper portal use, and mark portal active.
       */
!     MarkPortalActive(portal);

      /*
       * Set up global portal context pointers.
*************** PortalRunFetch(Portal portal,
*** 1398,1408 ****
      /*
       * Check for improper portal use, and mark portal active.
       */
!     if (portal->status != PORTAL_READY)
!         ereport(ERROR,
!                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
!                  errmsg("portal \"%s\" cannot be run", portal->name)));
!     portal->status = PORTAL_ACTIVE;

      /*
       * Set up global portal context pointers.
--- 1394,1400 ----
      /*
       * Check for improper portal use, and mark portal active.
       */
!     MarkPortalActive(portal);

      /*
       * Set up global portal context pointers.
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 7530498..44a87f7 100644
*** a/src/backend/utils/mmgr/portalmem.c
--- b/src/backend/utils/mmgr/portalmem.c
*************** CreatePortal(const char *name, bool allo
*** 232,237 ****
--- 232,238 ----
      portal->status = PORTAL_NEW;
      portal->cleanup = PortalCleanup;
      portal->createSubid = GetCurrentSubTransactionId();
+     portal->activeSubid = portal->createSubid;
      portal->strategy = PORTAL_MULTI_QUERY;
      portal->cursorOptions = CURSOR_OPT_NO_SCROLL;
      portal->atStart = true;
*************** UnpinPortal(Portal portal)
*** 403,408 ****
--- 404,428 ----
  }

  /*
+  * MarkPortalActive
+  *        Transition a portal from READY to ACTIVE state.
+  *
+  * NOTE: never set portal->status = PORTAL_ACTIVE directly; call this instead.
+  */
+ void
+ MarkPortalActive(Portal portal)
+ {
+     /* For safety, this is a runtime test not just an Assert */
+     if (portal->status != PORTAL_READY)
+         ereport(ERROR,
+                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                  errmsg("portal \"%s\" cannot be run", portal->name)));
+     /* Perform the state transition */
+     portal->status = PORTAL_ACTIVE;
+     portal->activeSubid = GetCurrentSubTransactionId();
+ }
+
+ /*
   * MarkPortalDone
   *        Transition a portal from ACTIVE to DONE state.
   *
*************** PreCommit_Portals(bool isPrepare)
*** 690,695 ****
--- 710,716 ----
               * not belonging to this transaction.
               */
              portal->createSubid = InvalidSubTransactionId;
+             portal->activeSubid = InvalidSubTransactionId;

              /* Report we changed state */
              result = true;
*************** AtCleanup_Portals(void)
*** 836,843 ****
  /*
   * Pre-subcommit processing for portals.
   *
!  * Reassign the portals created in the current subtransaction to the parent
!  * subtransaction.
   */
  void
  AtSubCommit_Portals(SubTransactionId mySubid,
--- 857,864 ----
  /*
   * Pre-subcommit processing for portals.
   *
!  * Reassign portals created or used in the current subtransaction to the
!  * parent subtransaction.
   */
  void
  AtSubCommit_Portals(SubTransactionId mySubid,
*************** AtSubCommit_Portals(SubTransactionId myS
*** 859,872 ****
              if (portal->resowner)
                  ResourceOwnerNewParent(portal->resowner, parentXactOwner);
          }
      }
  }

  /*
   * Subtransaction abort handling for portals.
   *
!  * Deactivate portals created during the failed subtransaction.
!  * Note that per AtSubCommit_Portals, this will catch portals created
   * in descendants of the subtransaction too.
   *
   * We don't destroy any portals here; that's done in AtSubCleanup_Portals.
--- 880,895 ----
              if (portal->resowner)
                  ResourceOwnerNewParent(portal->resowner, parentXactOwner);
          }
+         if (portal->activeSubid == mySubid)
+             portal->activeSubid = parentSubid;
      }
  }

  /*
   * Subtransaction abort handling for portals.
   *
!  * Deactivate portals created or used during the failed subtransaction.
!  * Note that per AtSubCommit_Portals, this will catch portals created/used
   * in descendants of the subtransaction too.
   *
   * We don't destroy any portals here; that's done in AtSubCleanup_Portals.
*************** AtSubCommit_Portals(SubTransactionId myS
*** 874,879 ****
--- 897,903 ----
  void
  AtSubAbort_Portals(SubTransactionId mySubid,
                     SubTransactionId parentSubid,
+                    ResourceOwner myXactOwner,
                     ResourceOwner parentXactOwner)
  {
      HASH_SEQ_STATUS status;
*************** AtSubAbort_Portals(SubTransactionId mySu
*** 885,900 ****
      {
          Portal        portal = hentry->portal;

          if (portal->createSubid != mySubid)
              continue;

          /*
           * Force any live portals of my own subtransaction into FAILED state.
           * We have to do this because they might refer to objects created or
!          * changed in the failed subtransaction, leading to crashes if
!          * execution is resumed, or even if we just try to run ExecutorEnd.
!          * (Note we do NOT do this to upper-level portals, since they cannot
!          * have such references and hence may be able to continue.)
           */
          if (portal->status == PORTAL_READY ||
              portal->status == PORTAL_ACTIVE)
--- 909,966 ----
      {
          Portal        portal = hentry->portal;

+         /* Was it created in this subtransaction? */
          if (portal->createSubid != mySubid)
+         {
+             /* No, but maybe it was used in this subtransaction? */
+             if (portal->activeSubid == mySubid)
+             {
+                 /* Maintain activeSubid until the portal is removed */
+                 portal->activeSubid = parentSubid;
+
+                 /*
+                  * Upper-level portals that failed while running in this
+                  * subtransaction must be forced into FAILED state, for the
+                  * same reasons discussed below.
+                  *
+                  * We assume we can get away without forcing upper-level READY
+                  * portals to fail, even if they were run and then suspended.
+                  * In theory a suspended upper-level portal could have
+                  * acquired some references to objects that are about to be
+                  * destroyed, but there should be sufficient defenses against
+                  * such cases: the portal's original query cannot contain such
+                  * references, and any references within, say, cached plans of
+                  * PL/pgSQL functions are not from active queries and should
+                  * be protected by revalidation logic.
+                  */
+                 if (portal->status == PORTAL_ACTIVE)
+                     MarkPortalFailed(portal);
+
+                 /*
+                  * Also, if we failed it during the current subtransaction
+                  * (either just above, or earlier), reattach its resource
+                  * owner to the current subtransaction's resource owner, so
+                  * that any resources it still holds will be released while
+                  * cleaning up this subtransaction.  This prevents some corner
+                  * cases wherein we might get Asserts or worse while cleaning
+                  * up objects created during the current subtransaction
+                  * (because they're still referenced within this portal).
+                  */
+                 if (portal->status == PORTAL_FAILED && portal->resowner)
+                 {
+                     ResourceOwnerNewParent(portal->resowner, myXactOwner);
+                     portal->resowner = NULL;
+                 }
+             }
+             /* Done if it wasn't created in this subtransaction */
              continue;
+         }

          /*
           * Force any live portals of my own subtransaction into FAILED state.
           * We have to do this because they might refer to objects created or
!          * changed in the failed subtransaction, leading to crashes within
!          * ExecutorEnd when portalcmds.c tries to close down the portal.
           */
          if (portal->status == PORTAL_READY ||
              portal->status == PORTAL_ACTIVE)
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index 1267b9e..c369107 100644
*** a/src/include/utils/portal.h
--- b/src/include/utils/portal.h
*************** typedef struct PortalData
*** 119,130 ****
      MemoryContext heap;            /* subsidiary memory for portal */
      ResourceOwner resowner;        /* resources owned by portal */
      void        (*cleanup) (Portal portal);        /* cleanup hook */
-     SubTransactionId createSubid;        /* the ID of the creating subxact */

      /*
!      * if createSubid is InvalidSubTransactionId, the portal is held over from
!      * a previous transaction
       */

      /* The query or queries the portal will execute */
      const char *sourceText;        /* text of query (as of 8.4, never NULL) */
--- 119,134 ----
      MemoryContext heap;            /* subsidiary memory for portal */
      ResourceOwner resowner;        /* resources owned by portal */
      void        (*cleanup) (Portal portal);        /* cleanup hook */

      /*
!      * State data for remembering which subtransaction(s) the portal was
!      * created or used in.  If the portal is held over from a previous
!      * transaction, both subxids are InvalidSubTransactionId.  Otherwise,
!      * createSubid is the creating subxact and activeSubid is the last subxact
!      * in which we ran the portal.
       */
+     SubTransactionId createSubid;        /* the creating subxact */
+     SubTransactionId activeSubid;        /* the last subxact with activity */

      /* The query or queries the portal will execute */
      const char *sourceText;        /* text of query (as of 8.4, never NULL) */
*************** extern void AtSubCommit_Portals(SubTrans
*** 201,212 ****
--- 205,218 ----
                      ResourceOwner parentXactOwner);
  extern void AtSubAbort_Portals(SubTransactionId mySubid,
                     SubTransactionId parentSubid,
+                    ResourceOwner myXactOwner,
                     ResourceOwner parentXactOwner);
  extern void AtSubCleanup_Portals(SubTransactionId mySubid);
  extern Portal CreatePortal(const char *name, bool allowDup, bool dupSilent);
  extern Portal CreateNewPortal(void);
  extern void PinPortal(Portal portal);
  extern void UnpinPortal(Portal portal);
+ extern void MarkPortalActive(Portal portal);
  extern void MarkPortalDone(Portal portal);
  extern void MarkPortalFailed(Portal portal);
  extern void PortalDrop(Portal portal, bool isTopCommit);
diff --git a/src/test/regress/expected/transactions.out b/src/test/regress/expected/transactions.out
index 5d70863..d9b702d 100644
*** a/src/test/regress/expected/transactions.out
--- b/src/test/regress/expected/transactions.out
*************** fetch from foo;
*** 613,618 ****
--- 613,664 ----
  (1 row)

  abort;
+ -- Test for proper cleanup after a failure in a cursor portal
+ -- that was created in an outer subtransaction
+ CREATE FUNCTION invert(x float8) RETURNS float8 LANGUAGE plpgsql AS
+ $$ begin return 1/x; end $$;
+ CREATE FUNCTION create_temp_tab() RETURNS text
+ LANGUAGE plpgsql AS $$
+ BEGIN
+   CREATE TEMP TABLE new_table (f1 float8);
+   -- case of interest is that we fail while holding an open
+   -- relcache reference to new_table
+   INSERT INTO new_table SELECT invert(0.0);
+   RETURN 'foo';
+ END $$;
+ BEGIN;
+ DECLARE ok CURSOR FOR SELECT * FROM int8_tbl;
+ DECLARE ctt CURSOR FOR SELECT create_temp_tab();
+ FETCH ok;
+  q1  | q2
+ -----+-----
+  123 | 456
+ (1 row)
+
+ SAVEPOINT s1;
+ FETCH ok;  -- should work
+  q1  |        q2
+ -----+------------------
+  123 | 4567890123456789
+ (1 row)
+
+ FETCH ctt; -- error occurs here
+ ERROR:  division by zero
+ CONTEXT:  PL/pgSQL function invert(double precision) line 1 at RETURN
+ SQL statement "INSERT INTO new_table SELECT invert(0.0)"
+ PL/pgSQL function create_temp_tab() line 6 at SQL statement
+ ROLLBACK TO s1;
+ FETCH ok;  -- should work
+         q1        | q2
+ ------------------+-----
+  4567890123456789 | 123
+ (1 row)
+
+ FETCH ctt; -- must be rejected
+ ERROR:  portal "ctt" cannot be run
+ COMMIT;
+ DROP FUNCTION create_temp_tab();
+ DROP FUNCTION invert(x float8);
  -- Test for successful cleanup of an aborted transaction at session exit.
  -- THIS MUST BE THE LAST TEST IN THIS FILE.
  begin;
diff --git a/src/test/regress/sql/transactions.sql b/src/test/regress/sql/transactions.sql
index 9fac4a3..bf9cb05 100644
*** a/src/test/regress/sql/transactions.sql
--- b/src/test/regress/sql/transactions.sql
*************** fetch from foo;
*** 387,392 ****
--- 387,424 ----

  abort;

+
+ -- Test for proper cleanup after a failure in a cursor portal
+ -- that was created in an outer subtransaction
+ CREATE FUNCTION invert(x float8) RETURNS float8 LANGUAGE plpgsql AS
+ $$ begin return 1/x; end $$;
+
+ CREATE FUNCTION create_temp_tab() RETURNS text
+ LANGUAGE plpgsql AS $$
+ BEGIN
+   CREATE TEMP TABLE new_table (f1 float8);
+   -- case of interest is that we fail while holding an open
+   -- relcache reference to new_table
+   INSERT INTO new_table SELECT invert(0.0);
+   RETURN 'foo';
+ END $$;
+
+ BEGIN;
+ DECLARE ok CURSOR FOR SELECT * FROM int8_tbl;
+ DECLARE ctt CURSOR FOR SELECT create_temp_tab();
+ FETCH ok;
+ SAVEPOINT s1;
+ FETCH ok;  -- should work
+ FETCH ctt; -- error occurs here
+ ROLLBACK TO s1;
+ FETCH ok;  -- should work
+ FETCH ctt; -- must be rejected
+ COMMIT;
+
+ DROP FUNCTION create_temp_tab();
+ DROP FUNCTION invert(x float8);
+
+
  -- Test for successful cleanup of an aborted transaction at session exit.
  -- THIS MUST BE THE LAST TEST IN THIS FILE.


Re: Fwd: Core dump with nested CREATE TEMP TABLE

От
Michael Paquier
Дата:
On Fri, Sep 4, 2015 at 12:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> Hmm.  I am not feeling especially comfortable about this: it's not clear
>> that there's anything preventing a suspended portal from containing a
>> dangerous reference.  However, a quick trial did not show that it was
>> possible to break it --- in the cases I tried, we detected that a cached
>> plan was no longer valid, tried to rebuild it, and noticed the missing
>> object at that point.  So maybe it's OK.
>
> After further thought I concluded that this probably is safe.  The
> portal's original query was created and planned when it was opened,
> so it cannot contain any references to objects of the failed
> subtransaction.  We have a hazard from queries within functions,
> but if the portal is suspended then no such functions are in progress.
>
> Attached is an updated patch with comments to this effect and some
> minor other code cleanup (mainly, not assuming that CurrentResourceOwner
> is the right thing to use in AtSubAbort_Portals).

Thanks! This looks good to me.
--
Michael

Re: Fwd: Core dump with nested CREATE TEMP TABLE

От
Tom Lane
Дата:
Michael Paquier <michael.paquier@gmail.com> writes:
> On Fri, Sep 4, 2015 at 12:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Attached is an updated patch with comments to this effect and some
>> minor other code cleanup (mainly, not assuming that CurrentResourceOwner
>> is the right thing to use in AtSubAbort_Portals).

> Thanks! This looks good to me.

If no further comments, I'll see about pushing and back-patching this.

In the stable branches, the new field in Portal needs to go at the end,
to avoid an ABI break for extensions that look at Portals.  Otherwise I'm
inclined to push it as-is.  In particular, that would break any extensions
that call AtSubAbort_Portals() directly, but I'm having a really hard time
believing that there are any.  (If anyone can make a case that there are
some, the fallback would be to remove the new argument in the back
branches and make AtSubAbort_Portals depend on CurrentResourceOwner again;
but I'd rather not do it that way.)
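
For illustration, a minimal sketch of the layout concern (the struct
names here are hypothetical and pared down; only createSubid, activeSubid
and sourceText come from the actual Portal struct):

    /* Sketch only: why field position matters for ABI.  SubTransactionId
     * is uint32 in PostgreSQL; a stub suffices for illustration. */
    typedef unsigned int SubTransactionId;

    /* HEAD: activeSubid can sit next to createSubid, but every member
     * declared after it shifts to a new offset. */
    typedef struct PortalDataHead
    {
        SubTransactionId createSubid;   /* the creating subxact */
        SubTransactionId activeSubid;   /* new field */
        const char *sourceText;         /* offset changes vs. old layout */
        /* ... remaining fields, all shifted ... */
    } PortalDataHead;

    /* Stable branches: appending the field keeps every pre-existing
     * member at its old offset, so an extension compiled against the
     * old headers still reads the struct correctly; it simply never
     * looks at activeSubid. */
    typedef struct PortalDataBackBranch
    {
        SubTransactionId createSubid;   /* the creating subxact */
        const char *sourceText;         /* offset unchanged */
        /* ... remaining fields at their old offsets ... */
        SubTransactionId activeSubid;   /* appended at the end */
    } PortalDataBackBranch;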

Also, I am very strongly tempted to convert the original problem-causing
Assert in relcache.c into a regular runtime test-and-elog.  If we're wrong
about having sealed off this hole completely, I'd much rather see the
result be an elog than silent corruption of the relcache.
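
A sketch of the kind of conversion being contemplated (hypothetical; as
the follow-up below notes, a plain elog(ERROR) here turned out to cause
PANIC loops during error cleanup, and the committed mitigation used
elog(WARNING) in the callers instead):

    /* Hypothetical sketch for RelationClearRelation(): make the Assert's
     * condition a runtime check.  rebuild==true requires refcnt > 0 and
     * rebuild==false requires refcnt == 0, so we report the negation. */
    if (rebuild ?
        RelationHasReferenceCountZero(relation) :
        !RelationHasReferenceCountZero(relation))
        elog(ERROR, "unexpected relcache reference count for relation \"%s\"",
             RelationGetRelationName(relation));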
        regards, tom lane



Re: Fwd: Core dump with nested CREATE TEMP TABLE

От
Michael Paquier
Дата:
On Fri, Sep 4, 2015 at 10:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Also, I am very strongly tempted to convert the original problem-causing
> Assert in relcache.c into a regular runtime test-and-elog.  If we're wrong
> about having sealed off this hole completely, I'd much rather see the
> result be an elog than silent corruption of the relcache.

+1. For a HEAD-only change I suppose.
--
Michael

Re: Fwd: Core dump with nested CREATE TEMP TABLE

От
Tom Lane
Дата:
Michael Paquier <michael.paquier@gmail.com> writes:
> On Fri, Sep 4, 2015 at 10:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Also, I am very strongly tempted to convert the original problem-causing
>> Assert in relcache.c into a regular runtime test-and-elog.  If we're wrong
>> about having sealed off this hole completely, I'd much rather see the
>> result be an elog than silent corruption of the relcache.

> +1. For a HEAD-only change I suppose.

No, the point is to have less dangerous behavior *in the field* if the
situation occurs.  I have every intention of back-patching this bit as
well.

Although after some experimentation with a build lacking the Portal-level
fixes, an elog(ERROR) in RelationClearRelation isn't much fun either,
because the problem case causes errors during error cleanup, soon leading
to "PANIC: ERRORDATA_STACK_SIZE exceeded".

After review of all the callers of RelationClearRelation, it seems like
most of them already have sufficient guards to prevent triggering of the
Assert.  The ones that lack such tests are AtEOXact_cleanup and
AtEOSubXact_cleanup.  So what I'm now thinking is that those should do
something along the lines of
    if (isCommit)
        relation->rd_createSubid = parentSubid;
    else if (RelationHasReferenceCountZero(relation))
    {
        RelationClearRelation(relation, false);
        return;
    }
    else
    {
        elog(WARNING, "rel so-and-so is still referenced");
        relation->rd_createSubid = parentSubid;
    }

so as to mitigate the damage if there's a dangling reference.
        regards, tom lane



Re: Fwd: Core dump with nested CREATE TEMP TABLE

От
Tom Lane
Дата:
I wrote:
> After review of all the callers of RelationClearRelation, it seems like
> most of them already have sufficient guards to prevent triggering of the
> Assert.  The ones that lack such tests are AtEOXact_cleanup and
> AtEOSubXact_cleanup.  So what I'm now thinking is that those should do
> something along the lines of

Specifically, this, which can be shown to mitigate the results of the
problem cases in an otherwise-unpatched build.

            regards, tom lane

diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 44e9509..420ef3d 100644
*** a/src/backend/utils/cache/relcache.c
--- b/src/backend/utils/cache/relcache.c
*************** RelationClearRelation(Relation relation,
*** 2048,2054 ****
  {
      /*
       * As per notes above, a rel to be rebuilt MUST have refcnt > 0; while of
!      * course it would be a bad idea to blow away one with nonzero refcnt.
       */
      Assert(rebuild ?
             !RelationHasReferenceCountZero(relation) :
--- 2048,2056 ----
  {
      /*
       * As per notes above, a rel to be rebuilt MUST have refcnt > 0; while of
!      * course it would be an equally bad idea to blow away one with nonzero
!      * refcnt, since that would leave someone somewhere with a dangling
!      * pointer.  All callers are expected to have verified that this holds.
       */
      Assert(rebuild ?
             !RelationHasReferenceCountZero(relation) :
*************** AtEOXact_cleanup(Relation relation, bool
*** 2654,2664 ****
      {
          if (isCommit)
              relation->rd_createSubid = InvalidSubTransactionId;
!         else
          {
              RelationClearRelation(relation, false);
              return;
          }
      }

      /*
--- 2656,2680 ----
      {
          if (isCommit)
              relation->rd_createSubid = InvalidSubTransactionId;
!         else if (RelationHasReferenceCountZero(relation))
          {
              RelationClearRelation(relation, false);
              return;
          }
+         else
+         {
+             /*
+              * Hmm, somewhere there's a (leaked?) reference to the relation.
+              * We daren't remove the entry for fear of dereferencing a
+              * dangling pointer later.  Bleat, and mark it as not belonging to
+              * the current transaction.  Hopefully it'll get cleaned up
+              * eventually.  This must be just a WARNING to avoid
+              * error-during-error-recovery loops.
+              */
+             relation->rd_createSubid = InvalidSubTransactionId;
+             elog(WARNING, "cannot remove relcache entry for \"%s\" because it has nonzero refcount",
+                  RelationGetRelationName(relation));
+         }
      }

      /*
*************** AtEOSubXact_cleanup(Relation relation, b
*** 2747,2757 ****
      {
          if (isCommit)
              relation->rd_createSubid = parentSubid;
!         else
          {
              RelationClearRelation(relation, false);
              return;
          }
      }

      /*
--- 2763,2786 ----
      {
          if (isCommit)
              relation->rd_createSubid = parentSubid;
!         else if (RelationHasReferenceCountZero(relation))
          {
              RelationClearRelation(relation, false);
              return;
          }
+         else
+         {
+             /*
+              * Hmm, somewhere there's a (leaked?) reference to the relation.
+              * We daren't remove the entry for fear of dereferencing a
+              * dangling pointer later.  Bleat, and transfer it to the parent
+              * subtransaction so we can try again later.  This must be just a
+              * WARNING to avoid error-during-error-recovery loops.
+              */
+             relation->rd_createSubid = parentSubid;
+             elog(WARNING, "cannot remove relcache entry for \"%s\" because it has nonzero refcount",
+                  RelationGetRelationName(relation));
+         }
      }

      /*

Re: Fwd: Core dump with nested CREATE TEMP TABLE

От
Tom Lane
Дата:
I wrote:
>> After review of all the callers of RelationClearRelation, it seems like
>> most of them already have sufficient guards to prevent triggering of the
>> Assert.  The ones that lack such tests are AtEOXact_cleanup and
>> AtEOSubXact_cleanup.  So what I'm now thinking is that those should do
>> something along the lines of

> Specifically, this, which can be shown to mitigate the results of the
> problem cases in an otherwise-unpatched build.

And pushed.  I noticed that while the relcache.c fix mitigates the error
pretty well in 9.3 and up, in the older branches you still end up with
a PANIC due to error stack overflow.  This may indicate that there's
some patch we'd have been better off back-porting.  However, with the
Portal changes in place the test cases work in all branches, so I'm
not excited enough to pursue the point further myself.
        regards, tom lane



Re: Fwd: Core dump with nested CREATE TEMP TABLE

От
Jim Nasby
Дата:
On 9/4/15 12:41 PM, Tom Lane wrote:
> I wrote:
>>> After review of all the callers of RelationClearRelation, it seems like
>>> most of them already have sufficient guards to prevent triggering of the
>>> Assert.  The ones that lack such tests are AtEOXact_cleanup and
>>> AtEOSubXact_cleanup.  So what I'm now thinking is that those should do
>>> something along the lines of
>
>> Specifically, this, which can be shown to mitigate the results of the
>> problem cases in an otherwise-unpatched build.
>
> And pushed.  I noticed that while the relcache.c fix mitigates the error
> pretty well in 9.3 and up, in the older branches you still end up with
> a PANIC due to error stack overflow.  This may indicate that there's
> some patch we'd have been better off back-porting.  However, with the
> Portal changes in place the test cases work in all branches, so I'm
> not excited enough to pursue the point further myself.

I was about to test this and was verifying that the crash still worked 
when I noticed this in the logs (9.4.1, not HEAD). Not sure if there's 
any relation here or not...

WARNING:  relation "pg_proc" page 121 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 122 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 123 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 124 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 125 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 126 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 127 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 128 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 129 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 130 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 131 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 132 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 133 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 134 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 135 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 136 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 137 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 138 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 139 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 140 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 141 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 142 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 143 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 144 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 145 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 146 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 147 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 148 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 149 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 150 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 151 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 152 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 153 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 154 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 155 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 156 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 157 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 158 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 159 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 160 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 161 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 162 is uninitialized --- fixing
WARNING:  relation "pg_proc" page 246 is uninitialized --- fixing
WARNING:  relation "pg_depend" page 82 is uninitialized --- fixing
WARNING:  relation "pg_depend" page 83 is uninitialized --- fixing
WARNING:  relation "pg_depend" page 84 is uninitialized --- fixing
WARNING:  relation "pg_depend" page 85 is uninitialized --- fixing
WARNING:  relation "pg_depend" page 86 is uninitialized --- fixing
WARNING:  relation "pg_depend" page 87 is uninitialized --- fixing
WARNING:  relation "pg_depend" page 88 is uninitialized --- fixing
WARNING:  relation "pg_depend" page 89 is uninitialized --- fixing
WARNING:  relation "pg_depend" page 90 is uninitialized --- fixing
WARNING:  relation "pg_depend" page 91 is uninitialized --- fixing
WARNING:  relation "pg_depend" page 92 is uninitialized --- fixing
WARNING:  relation "pg_depend" page 93 is uninitialized --- fixing
WARNING:  relation "pg_depend" page 94 is uninitialized --- fixing
WARNING:  relation "pg_depend" page 95 is uninitialized --- fixing
WARNING:  relation "pg_depend" page 96 is uninitialized --- fixing
WARNING:  relation "pg_depend" page 112 is uninitialized --- fixing
WARNING:  relation "pg_depend" page 128 is uninitialized --- fixing
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Fwd: Core dump with nested CREATE TEMP TABLE

От
Michael Paquier
Дата:


On Sat, Sep 5, 2015 at 3:41 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> I was about to test this and was verifying that the crash still worked
> when I noticed this in the logs (9.4.1, not HEAD). Not sure if there's
> any relation here or not...
>
> WARNING:  relation "pg_proc" page 121 is uninitialized --- fixing
> WARNING:  relation "pg_proc" page 122 is uninitialized --- fixing

[reading vacuumlazy.c...] This seems unrelated and I would not worry
about it. Those system catalogs have been extended with new pages by a
couple of backends, but a crash happened before they could actually
insert tuples into them and commit. Perhaps you were creating a bunch
of objects when a crash happened, no?

Coming to the point, did you see a new crash with test_factory? Is
that some remnant of a previous test?
--
Michael

Re: Fwd: Core dump with nested CREATE TEMP TABLE

От
Tom Lane
Дата:
Michael Paquier <michael.paquier@gmail.com> writes:
> On Sat, Sep 5, 2015 at 3:41 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>> I was about to test this and was verifying that the crash still worked
>> when I noticed this in the logs (9.4.1, not HEAD). Not sure if there's any
>> relation here or not...
>> 
>> WARNING:  relation "pg_proc" page 121 is uninitialized --- fixing
>> WARNING:  relation "pg_proc" page 122 is uninitialized --- fixing

> [reading vacuumlazy.c...] This seems unrelated and I would not worry about
> it. Those system catalogs have been extended with new pages by a couple of
> backends, but a crash happened before they could actually insert tuples on
> it and commit. Perhaps you were creating a bunch of objects when a crash
> happened, no?

If this is the system you were doing the previous crash testing on, I'd
say it's a direct artifact of those crashes.  You were repeatedly crashing
the system during transactions that created functions and temp tables,
which would surely make entries in pg_proc, pg_depend, etc.  The crash
came before commit and so neither the relation pages nor the WAL entries
would necessarily get to disk.  But extension of the relations to make
room would still happen.  Moreover, there's probably nothing to cause
those new pages to get added to the relations' FSM, so each new test
attempt would add another set of pages.  Eventually autovacuum would
notice, which is what you've got logged here, but it might take a while.
        regards, tom lane



Re: Fwd: Core dump with nested CREATE TEMP TABLE

От
Noah Misch
Дата:
On Thu, Sep 03, 2015 at 11:04:11PM -0400, Tom Lane wrote:
> *************** AtSubAbort_Portals(SubTransactionId mySu

> --- 909,966 ----
>       {
>           Portal        portal = hentry->portal;
>   
> +         /* Was it created in this subtransaction? */
>           if (portal->createSubid != mySubid)
> +         {
> +             /* No, but maybe it was used in this subtransaction? */
> +             if (portal->activeSubid == mySubid)
> +             {
...
> +                 if (portal->status == PORTAL_ACTIVE)
> +                     MarkPortalFailed(portal);

Do you have a test case that reaches this particular MarkPortalFailed() call?
My attempts stumbled over the fact that, before we reach here, each of the
three MarkPortalActive() callers will have already called MarkPortalFailed()
in its PG_CATCH block.  ("make check" does not reach this call.)
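
For context, the pattern in question looks roughly like this in each of
the three callers (a simplified sketch of the PortalRun-style error
handling, not the verbatim source):

    MarkPortalActive(portal);       /* READY -> ACTIVE, else ereport */

    PG_TRY();
    {
        /* ... run the portal's query ... */
    }
    PG_CATCH();
    {
        /*
         * Each MarkPortalActive() caller fails its own portal on error,
         * which is why AtSubAbort_Portals() normally finds such portals
         * already FAILED rather than still ACTIVE.
         */
        MarkPortalFailed(portal);

        /* ... restore ActivePortal, PortalContext, etc. ... */

        PG_RE_THROW();
    }
    PG_END_TRY();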



Re: Fwd: Core dump with nested CREATE TEMP TABLE

От
Tom Lane
Дата:
Noah Misch <noah@leadboat.com> writes:
> On Thu, Sep 03, 2015 at 11:04:11PM -0400, Tom Lane wrote:
>> *************** AtSubAbort_Portals(SubTransactionId mySu

>> --- 909,966 ----
>> {
>> Portal        portal = hentry->portal;
>> 
>> +         /* Was it created in this subtransaction? */
>> if (portal->createSubid != mySubid)
>> +         {
>> +             /* No, but maybe it was used in this subtransaction? */
>> +             if (portal->activeSubid == mySubid)
>> +             {
> ...
>> +                 if (portal->status == PORTAL_ACTIVE)
>> +                     MarkPortalFailed(portal);

> Do you have a test case that reaches this particular MarkPortalFailed() call?
> My attempts stumbled over the fact that, before we reach here, each of the
> three MarkPortalActive() callers will have already called MarkPortalFailed()
> in its PG_CATCH block.  ("make check" does not reach this call.)

Offhand I think that's just belt-and-suspenders-too coding.  As you say,
we'd typically have failed active portals already before getting here.
But the responsibility of this routine is to *guarantee* that no broken
portals remain active, so I'd not want to remove this check.

Do you have a particular reason for asking?
        regards, tom lane



Re: Fwd: Core dump with nested CREATE TEMP TABLE

От
Noah Misch
Дата:
On Sat, Jan 02, 2016 at 01:46:07PM -0500, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Thu, Sep 03, 2015 at 11:04:11PM -0400, Tom Lane wrote:
> >> *************** AtSubAbort_Portals(SubTransactionId mySu
> 
> >> --- 909,966 ----
> >> {
> >> Portal        portal = hentry->portal;
> >> 
> >> +         /* Was it created in this subtransaction? */
> >> if (portal->createSubid != mySubid)
> >> +         {
> >> +             /* No, but maybe it was used in this subtransaction? */
> >> +             if (portal->activeSubid == mySubid)
> >> +             {
> > ...
> >> +                 if (portal->status == PORTAL_ACTIVE)
> >> +                     MarkPortalFailed(portal);
> 
> > Do you have a test case that reaches this particular MarkPortalFailed() call?
> > My attempts stumbled over the fact that, before we reach here, each of the
> > three MarkPortalActive() callers will have already called MarkPortalFailed()
> > in its PG_CATCH block.  ("make check" does not reach this call.)
> 
> Offhand I think that's just belt-and-suspenders-too coding.  As you say,
> we'd typically have failed active portals already before getting here.
> But the responsibility of this routine is to *guarantee* that no broken
> portals remain active, so I'd not want to remove this check.
> 
> Do you have a particular reason for asking?

After reading the log message for and comments added in commit c5454f9, I
understood that the commit changed PostgreSQL to fail portals that did not
fail before.  That is to say, queries that formerly completed without error
would now elicit errors.  I was looking into what sorts of queries it affected
in this way.  If the new MarkPortalFailed() is indeed dead code, then the
commit affects no query that way.  The meat of the commit is then the
ResourceOwnerNewParent() call, which lets queries ERROR cleanly instead of
seeing assertion failures or undefined behavior.

I am inclined to add an Assert(portal->status != PORTAL_ACTIVE) to emphasize
that this is backup only.  MarkPortalActive() callers remain responsible for
updating the status to something else before relinquishing control.



Re: Fwd: Core dump with nested CREATE TEMP TABLE

От
Tom Lane
Дата:
Noah Misch <noah@leadboat.com> writes:
> I am inclined to add an Assert(portal->status != PORTAL_ACTIVE) to emphasize
> that this is backup only.  MarkPortalActive() callers remain responsible for
> updating the status to something else before relinquishing control.

No, I do not think that would be an improvement.  There is no contract
saying that this must be done earlier, IMO.
        regards, tom lane



Re: Fwd: Core dump with nested CREATE TEMP TABLE

От
Noah Misch
Дата:
On Sat, Jan 02, 2016 at 07:22:13PM -0500, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > I am inclined to add an Assert(portal->status != PORTAL_ACTIVE) to emphasize
> > that this is backup only.  MarkPortalActive() callers remain responsible for
> > updating the status to something else before relinquishing control.
> 
> No, I do not think that would be an improvement.  There is no contract
> saying that this must be done earlier, IMO.

Indeed, nobody wrote a contract.  The assertion would record what has been the
sole standing practice for eleven years (since commit a393fbf9).  It would
prompt discussion if a proposed patch would depart from that practice, and
that is a good thing.  Also, every addition of dead code should label that
code to aid future readers.



Re: Fwd: Core dump with nested CREATE TEMP TABLE

От
Noah Misch
Дата:
On Sun, Jan 03, 2016 at 12:35:56AM -0500, Noah Misch wrote:
> On Sat, Jan 02, 2016 at 07:22:13PM -0500, Tom Lane wrote:
> > Noah Misch <noah@leadboat.com> writes:
> > > I am inclined to add an Assert(portal->status != PORTAL_ACTIVE) to emphasize
> > > that this is backup only.  MarkPortalActive() callers remain responsible for
> > > updating the status to something else before relinquishing control.
> >
> > No, I do not think that would be an improvement.  There is no contract
> > saying that this must be done earlier, IMO.
>
> Indeed, nobody wrote a contract.  The assertion would record what has been the
> sole standing practice for eleven years (since commit a393fbf9).  It would
> prompt discussion if a proposed patch would depart from that practice, and
> that is a good thing.  Also, every addition of dead code should label that
> code to aid future readers.

Here's the patch I have in mind.  AtAbort_Portals() has an older
MarkPortalFailed() call, and the story is somewhat different there per its new
comment.  That call is plausible to reach with no bug involved, but
MarkPortalFailed() would then be the wrong thing.

nm

Вложения

Re: Fwd: Core dump with nested CREATE TEMP TABLE

От
Michael Paquier
Дата:
On Thu, Jan 28, 2016 at 12:40 PM, Noah Misch <noah@leadboat.com> wrote:
> On Sun, Jan 03, 2016 at 12:35:56AM -0500, Noah Misch wrote:
>> On Sat, Jan 02, 2016 at 07:22:13PM -0500, Tom Lane wrote:
>> > Noah Misch <noah@leadboat.com> writes:
>> > > I am inclined to add an Assert(portal->status != PORTAL_ACTIVE) to emphasize
>> > > that this is backup only.  MarkPortalActive() callers remain responsible for
>> > > updating the status to something else before relinquishing control.
>> >
>> > No, I do not think that would be an improvement.  There is no contract
>> > saying that this must be done earlier, IMO.
>>
>> Indeed, nobody wrote a contract.  The assertion would record what has been the
>> sole standing practice for eleven years (since commit a393fbf9).  It would
>> prompt discussion if a proposed patch would depart from that practice, and
>> that is a good thing.  Also, every addition of dead code should label that
>> code to aid future readers.
>
> Here's the patch I have in mind.  AtAbort_Portals() has an older
> MarkPortalFailed() call, and the story is somewhat different there per its new
> comment.  That call is plausible to reach with no bug involved, but
> MarkPortalFailed() would then be the wrong thing.

+         * fresh transaction.  No part of core PostgreSQL functions that way,
+         * though it's a fair thing to want.  Such code would wish the portal

From the point of view of core code, this stands true, but, for my 2c,
honestly, I think that is just going to annoy more people working on
plugins and forks of Postgres. When working on Postgres-XC and
developing stuff for the core code, I recall having been annoyed a
couple of times by similar assert limitations, because sometimes they
did not actually make sense in the context of what I was doing, and
other times things suddenly broke after bumping the fork's code base
to a new major release because a routine used up to now had broken
its contract. Perhaps I was doing it wrong at that point though, and
should have used something else.
-- 
Michael



Re: Fwd: Core dump with nested CREATE TEMP TABLE

От
Robert Haas
Дата:
On Wed, Jan 27, 2016 at 10:40 PM, Noah Misch <noah@leadboat.com> wrote:
> On Sun, Jan 03, 2016 at 12:35:56AM -0500, Noah Misch wrote:
>> On Sat, Jan 02, 2016 at 07:22:13PM -0500, Tom Lane wrote:
>> > Noah Misch <noah@leadboat.com> writes:
>> > > I am inclined to add an Assert(portal->status != PORTAL_ACTIVE) to emphasize
>> > > that this is backup only.  MarkPortalActive() callers remain responsible for
>> > > updating the status to something else before relinquishing control.
>> >
>> > No, I do not think that would be an improvement.  There is no contract
>> > saying that this must be done earlier, IMO.
>>
>> Indeed, nobody wrote a contract.  The assertion would record what has been the
>> sole standing practice for eleven years (since commit a393fbf9).  It would
>> prompt discussion if a proposed patch would depart from that practice, and
>> that is a good thing.  Also, every addition of dead code should label that
>> code to aid future readers.
>
> Here's the patch I have in mind.  AtAbort_Portals() has an older
> MarkPortalFailed() call, and the story is somewhat different there per its new
> comment.  That call is plausible to reach with no bug involved, but
> MarkPortalFailed() would then be the wrong thing.

+                Assert(portal->status != PORTAL_ACTIVE);
                 if (portal->status == PORTAL_ACTIVE)
                     MarkPortalFailed(portal);

Now that just looks kooky to me.  We assert that the portal isn't
active, but then cater to the possibility that it might be anyway?
That seems totally contrary to our usual programming practices, and a
bad idea for that reason.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Fwd: Core dump with nested CREATE TEMP TABLE

От
Noah Misch
Дата:
On Thu, Jan 28, 2016 at 12:57:36PM +0900, Michael Paquier wrote:
> On Thu, Jan 28, 2016 at 12:40 PM, Noah Misch <noah@leadboat.com> wrote:
> +         * fresh transaction.  No part of core PostgreSQL functions that way,
> +         * though it's a fair thing to want.  Such code would wish the portal

> From the point of view of core code, this stands true, but, for my 2c,
> honestly, I think that is just going to annoy more people working on
> plugins and forks of Postgres. When working on Postgres-XC and
> developing stuff for the core code, I recall having been annoyed a
> couple of times by similar assert limitations

At first, I left out that assertion in case some extension code did the thing
I described, perhaps in a background worker.  I then realized that
MarkPortalFailed() is the wrong thing for such code, which would want
treatment similar to this bit of PreCommit_Portals():
    /*
     * Do not touch active portals --- this can only happen in the case of
     * a multi-transaction utility command, such as VACUUM.
     *
     * Note however that any resource owner attached to such a portal is
     * still going to go away, so don't leave a dangling pointer.
     */
    if (portal->status == PORTAL_ACTIVE)
    {
        portal->resowner = NULL;
        continue;
    }

If you can think of a case where the code would work okay despite its active
portal being marked as failed, that would be a good reason to omit the one
assertion.  Otherwise, an assertion seems better than silently doing the
known-wrong thing.



Re: Fwd: Core dump with nested CREATE TEMP TABLE

От
Noah Misch
Дата:
On Wed, Jan 27, 2016 at 11:04:33PM -0500, Robert Haas wrote:
> +                Assert(portal->status != PORTAL_ACTIVE);
>                  if (portal->status == PORTAL_ACTIVE)
>                      MarkPortalFailed(portal);
> 
> Now that just looks kooky to me.  We assert that the portal isn't
> active, but then cater to the possibility that it might be anyway?

Right.

> That seems totally contrary to our usual programming practices, and a
> bad idea for that reason.

It is contrary to our usual programming practices, I agree.  I borrowed the
idea from untenured code (da3751c8, 2015-11-11) in load_relcache_init_file():
    if (nailed_rels != NUM_CRITICAL_SHARED_RELS ||
        nailed_indexes != NUM_CRITICAL_SHARED_INDEXES)
    {
        elog(WARNING, "found %d nailed shared rels and %d nailed shared indexes in init file, but expected %d and %d respectively",
             nailed_rels, nailed_indexes,
             NUM_CRITICAL_SHARED_RELS, NUM_CRITICAL_SHARED_INDEXES);
        /* Make sure we get developers' attention about this */
        Assert(false);

I liked this pattern.  It's a good fit for cases that we design to be
impossible yet for which we have a workaround if they do happen.  That being
said, if you feel it's bad, I would be fine using elog(FATAL).  I envision
this as a master-only change in any case.  No PGXN module references
PORTAL_ACTIVE or MarkPortalActive(), so it's unlikely that extension code will
notice the change whether in Assert() form or in elog() form.  What is best?



Re: Fwd: Core dump with nested CREATE TEMP TABLE

От
Robert Haas
Дата:
On Wed, Jan 27, 2016 at 11:47 PM, Noah Misch <noah@leadboat.com> wrote:
> On Wed, Jan 27, 2016 at 11:04:33PM -0500, Robert Haas wrote:
>> +                Assert(portal->status != PORTAL_ACTIVE);
>>                  if (portal->status == PORTAL_ACTIVE)
>>                      MarkPortalFailed(portal);
>>
>> Now that just looks kooky to me.  We assert that the portal isn't
>> active, but then cater to the possibility that it might be anyway?
>
> Right.
>
>> That seems totally contrary to our usual programming practices, and a
>> bad idea for that reason.
>
> It is contrary to our usual programming practices, I agree.  I borrowed the
> idea from untenured code (da3751c8, 2015-11-11) in load_relcache_init_file():
>
>                 if (nailed_rels != NUM_CRITICAL_SHARED_RELS ||
>                         nailed_indexes != NUM_CRITICAL_SHARED_INDEXES)
>                 {
>                         elog(WARNING, "found %d nailed shared rels and %d nailed shared indexes in init file, but expected %d and %d respectively",
>                                  nailed_rels, nailed_indexes,
>                                  NUM_CRITICAL_SHARED_RELS, NUM_CRITICAL_SHARED_INDEXES);
>                         /* Make sure we get developers' attention about this */
>                         Assert(false);
>
> I liked this pattern.  It's a good fit for cases that we design to be
> impossible yet for which we have a workaround if they do happen.  That being
> said, if you feel it's bad, I would be fine using elog(FATAL).  I envision
> this as a master-only change in any case.  No PGXN module references
> PORTAL_ACTIVE or MarkPortalActive(), so it's unlikely that extension code will
> notice the change whether in Assert() form or in elog() form.  What is best?

I'm honestly failing to understand why we should change anything at
all.  I don't believe that doing something more severe than marking
the portal failed actually improves anything.  I suppose if I had to
pick between what you proposed before and elog(FATAL) I'd pick the
latter, but I see no real reason to cut off future code (or
third-party code) at the knees like that.  I don't see it as either
necessary or desirable to forbid something just because there's no
clear present use case for it.  The code you quote emits a warning
about a reasonably foreseeable situation that can never be right, but
there's no particular reason to think that MarkPortalFailed is the
wrong thing to do here if that situation came up.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Fwd: Core dump with nested CREATE TEMP TABLE

От
Noah Misch
Дата:
On Thu, Jan 28, 2016 at 11:12:15AM -0500, Robert Haas wrote:
> I'm honestly failing to understand why we should change anything at
> all.  I don't believe that doing something more severe than marking
> the portal failed actually improves anything.  I suppose if I had to
> pick between what you proposed before and elog(FATAL) I'd pick the
> latter, but I see no real reason to cut off future code (or
> third-party code) at the knees like that.  I don't see it as either
> necessary or desirable to forbid something just because there's no
> clear present use case for it.

As you say, forbidding things makes friction in the event that someone comes
along wanting to do the forbidden thing.  Forbidding things also simplifies
the system, making it easier to verify.  This decision should depend on the
API's audience.  We have prominently-public APIs like lsyscache.h,
stringinfo.h and funcapi.h.  They should cater to reasonably-foreseeable use
cases, not just what yesterday's users have actually used.  We then have
narrow-interest APIs like subselect.h and view.h.  For those, the value of a
simpler system exceeds the risk-adjusted cost of friction.  They should impose
strict constraints on their callers.

Portal status belongs to the second category.  PortalRun(), PortalRunFetch()
and PersistHoldablePortal() are the three core functions that place portals
into PORTAL_ACTIVE status.  No PGXN module does so; none so much as references
a PortalStatus constant or MarkPortal{Active,Done,Failed}() function.  If
someone adds a fourth core portal runner, the system will be simpler and
better if that function replicates the structure of the existing three.

> The code you quote emits a warning
>> about a reasonably foreseeable situation that can never be right, but
> there's no particular reason to think that MarkPortalFailed is the
> wrong thing to do here if that situation came up.

I have two reasons to expect these MarkPortalFailed() calls will be the wrong
thing for hypothetical code reaching them.  First, PortalRun() and peers reset
ActivePortal and PortalContext on error in addition to fixing portal status
with MarkPortalFailed().  If code forgets to update the status, there's a
substantial chance it forgot the other two.  My patch added a comment
explaining the second reason:

+        /*
+         * See similar code in AtSubAbort_Portals().  This would fire if code
+         * orchestrating multiple top-level transactions within a portal, such
+         * as VACUUM, caught errors and continued under the same portal with a
+         * fresh transaction.  No part of core PostgreSQL functions that way,
+         * though it's a fair thing to want.  Such code would wish the portal
+         * to remain ACTIVE, as in PreCommit_Portals(); we don't cater to
+         * that.  Instead, like AtSubAbort_Portals(), interpret this as a bug.
+         */



Re: Fwd: Core dump with nested CREATE TEMP TABLE

От
Robert Haas
Дата:
On Fri, Jan 29, 2016 at 11:19 PM, Noah Misch <noah@leadboat.com> wrote:
> As you say, forbidding things makes friction in the event that someone comes
> along wanting to do the forbidden thing.  Forbidding things also simplifies
> the system, making it easier to verify.  This decision should depend on the
> API's audience.  We have prominently-public APIs like lsyscache.h,
> stringinfo.h and funcapi.h.  They should cater to reasonably-foreseeable use
> cases, not just what yesterday's users have actually used.  We then have
> narrow-interest APIs like subselect.h and view.h.  For those, the value of a
> simpler system exceeds the risk-adjusted cost of friction.  They should impose
> strict constraints on their callers.
>
> Portal status belongs to the second category.  PortalRun(), PortalRunFetch()
> and PersistHoldablePortal() are the three core functions that place portals
> into PORTAL_ACTIVE status.  No PGXN module does so; none so much as references
> a PortalStatus constant or MarkPortal{Active,Done,Failed}() function.  If
> someone adds a fourth core portal runner, the system will be simpler and
> better if that function replicates the structure of the existing three.

I won't argue with that, but it strikes me that if I were reviewing a
patch to do such a thing, I'd surely compare the new caller against
the existing ones, so any failure to adhere to the same design pattern
would be caught that way.  I expect other reviewers and/or committers
would almost certainly do something similar.  If we presuppose
incompetence on the part of future reviewers and committers, no action
taken now can secure us.

>> The code you quote emits a warning
>> about a reasonably forseeable situation that can never be right, but
>> there's no particular reason to think that MarkPortalFailed is the
>> wrong thing to do here if that situation came up.
>
> I have two reasons to expect these MarkPortalFailed() calls will be the wrong
> thing for hypothetical code reaching them.  First, PortalRun() and peers reset
> ActivePortal and PortalContext on error in addition to fixing portal status
> with MarkPortalFailed().  If code forgets to update the status, there's a
> substantial chance it forgot the other two.  My patch added a comment
> explaining the second reason:
>
> +               /*
> +                * See similar code in AtSubAbort_Portals().  This would fire if code
> +                * orchestrating multiple top-level transactions within a portal, such
> +                * as VACUUM, caught errors and continued under the same portal with a
> +                * fresh transaction.  No part of core PostgreSQL functions that way,
> +                * though it's a fair thing to want.  Such code would wish the portal
> +                * to remain ACTIVE, as in PreCommit_Portals(); we don't cater to
> +                * that.  Instead, like AtSubAbort_Portals(), interpret this as a bug.
> +                */

You may be right, but then again Tom had a different opinion, even
after seeing your patch, and he's no dummy.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Fwd: Core dump with nested CREATE TEMP TABLE

От
Noah Misch
Дата:
On Sat, Jan 30, 2016 at 07:37:45AM -0500, Robert Haas wrote:
> On Fri, Jan 29, 2016 at 11:19 PM, Noah Misch <noah@leadboat.com> wrote:
> > As you say, forbidding things makes friction in the event that someone comes
> > along wanting to do the forbidden thing.  Forbidding things also simplifies
> > the system, making it easier to verify.  This decision should depend on the
> > API's audience.  We have prominently-public APIs like lsyscache.h,
> > stringinfo.h and funcapi.h.  They should cater to reasonably-foreseeable use
> > cases, not just what yesterday's users have actually used.  We then have
> > narrow-interest APIs like subselect.h and view.h.  For those, the value of a
> > simpler system exceeds the risk-adjusted cost of friction.  They should impose
> > strict constraints on their callers.
> >
> > Portal status belongs to the second category.  PortalRun(), PortalRunFetch()
> > and PersistHoldablePortal() are the three core functions that place portals
> > into PORTAL_ACTIVE status.  No PGXN module does so; none so much as references
> > a PortalStatus constant or MarkPortal{Active,Done,Failed}() function.  If
> > someone adds a fourth core portal runner, the system will be simpler and
> > better if that function replicates the structure of the existing three.
> 
> I won't argue with that, but it strikes me that if I were reviewing a
> patch to do such a thing, I'd surely compare the new caller against
> the existing ones, so any failure to adhere to the same design pattern
> would be caught that way.  I expect other reviewers and/or committers
> would almost certainly do something similar.  If we presuppose
> incompetence on the part of future reviewers and committers, no action
> taken now can secure us.

You could offer that paragraph as an objection to almost all Assert(), elog(),
and automated tests.  Why levy it against this patch?  The valuable ways
assertions and tests supplement review are well-established.

> You may be right, but then again Tom had a different opinion, even
> after seeing your patch, and he's no dummy.

Eh?  Tom last posted to this thread before I first posted a patch.



Re: Fwd: Core dump with nested CREATE TEMP TABLE

From
Robert Haas
Date:
On Sat, Jan 30, 2016 at 2:13 PM, Noah Misch <noah@leadboat.com> wrote:
> You could offer that paragraph as an objection to almost all Assert(), elog(),
> and automated tests.  Why levy it against this patch?  The valuable ways
> assertions and tests supplement review are well-established.

Sure, that's true, but I don't view all situations in the same way, so
I don't write the same thing in answer to each one.  I think I've
pretty much said what I have to say about this; if nothing I wrote up
until now swayed you, it's unlikely that anything else I say after
this point will either.

>> You may be right, but then again Tom had a different opinion, even
>> after seeing your patch, and he's no dummy.
>
> Eh?  Tom last posted to this thread before I first posted a patch.

http://www.postgresql.org/message-id/29758.1451780533@sss.pgh.pa.us
seems to me to be a vote against the concept embodied by the patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Fwd: Core dump with nested CREATE TEMP TABLE

From
Noah Misch
Date:
On Sat, Jan 30, 2016 at 04:28:47PM -0500, Robert Haas wrote:
> I think I've
> pretty much said what I have to say about this; if nothing I wrote up
> until now swayed you, it's unlikely that anything else I say after
> this point will either.

Say I drop the parts that change the binary.  Does the attached v2 manage to
improve PostgreSQL, or is it neutral-or-harmful like v1?

On Sat, Jan 30, 2016 at 04:28:47PM -0500, Robert Haas wrote:
> On Sat, Jan 30, 2016 at 2:13 PM, Noah Misch <noah@leadboat.com> wrote:
> > On Sat, Jan 30, 2016 at 07:37:45AM -0500, Robert Haas wrote:
> > > On Fri, Jan 29, 2016 at 11:19 PM, Noah Misch <noah@leadboat.com> wrote:
> > > > On Thu, Jan 28, 2016 at 11:12:15AM -0500, Robert Haas wrote:
> > > >> The code you quote emits a warning
> > > >> about a reasonably foreseeable situation that can never be right, but
> > > >> there's no particular reason to think that MarkPortalFailed is the
> > > >> wrong thing to do here if that situation came up.
> > > >
> > > > I have two reasons to expect these MarkPortalFailed() calls will be the wrong
> > > > thing for hypothetical code reaching them.  First, PortalRun() and peers reset
> > > > ActivePortal and PortalContext on error in addition to fixing portal status
> > > > with MarkPortalFailed().  If code forgets to update the status, there's a
> > > > substantial chance it forgot the other two.  My patch added a comment
> > > > explaining the second reason:
> > > >
> > > > +               /*
> > > > +                * See similar code in AtSubAbort_Portals().  This would fire if code
> > > > +                * orchestrating multiple top-level transactions within a portal, such
> > > > +                * as VACUUM, caught errors and continued under the same portal with a
> > > > +                * fresh transaction.  No part of core PostgreSQL functions that way,
> > > > +                * though it's a fair thing to want.  Such code would wish the portal
> > > > +                * to remain ACTIVE, as in PreCommit_Portals(); we don't cater to
> > > > +                * that.  Instead, like AtSubAbort_Portals(), interpret this as a bug.
> > > > +                */
> > >
> > > You may be right, but then again Tom had a different opinion, even
> > > after seeing your patch, and he's no dummy.
> >
> > Eh?  Tom last posted to this thread before I first posted a patch.
>
> http://www.postgresql.org/message-id/29758.1451780533@sss.pgh.pa.us
> seems to me to be a vote against the concept embodied by the patch.

That much is true.  The order of postings is the opposite of what you stated,
and there's no mailing list evidence that anyone other than you has read my
explanation of why MarkPortalFailed() is wrong here.  Alternatively, I demand
the schematics for Tom's time machine.

Attachments

Re: Fwd: Core dump with nested CREATE TEMP TABLE

From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes:
> On Sat, Jan 30, 2016 at 04:28:47PM -0500, Robert Haas wrote:
>> I think I've
>> pretty much said what I have to say about this; if nothing I wrote up
>> until now swayed you, it's unlikely that anything else I say after
>> this point will either.

> Say I drop the parts that change the binary.  Does the attached v2 manage to
> improve PostgreSQL, or is it neutral-or-harmful like v1?

There can surely be no objection to improving these comments.  However,
I'm not convinced that we should word the comments to insist that the
hypothetical cases are bugs.  As I said before, I do not think there is an
API contract that would promise that portals don't reach here in ACTIVE
state.  So IMO it's fair to note that no such case can arise currently,
but not to state that it's a bug if it does.  So for example I'd reword
your last comment addition along the lines of "Currently, every
MarkPortalActive() caller ensures it updates the portal status again
before relinquishing control, so that ACTIVE can't happen here.  If it
does happen, dispose of the portal like existing MarkPortalActive() callers
would."
        regards, tom lane



Re: Fwd: Core dump with nested CREATE TEMP TABLE

From
Robert Haas
Date:
On Sun, Jan 31, 2016 at 11:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Noah Misch <noah@leadboat.com> writes:
>> On Sat, Jan 30, 2016 at 04:28:47PM -0500, Robert Haas wrote:
>>> I think I've
>>> pretty much said what I have to say about this; if nothing I wrote up
>>> until now swayed you, it's unlikely that anything else I say after
>>> this point will either.
>
>> Say I drop the parts that change the binary.  Does the attached v2 manage to
>> improve PostgreSQL, or is it neutral-or-harmful like v1?
>
> There can surely be no objection to improving these comments.  However,
> I'm not convinced that we should word the comments to insist that the
> hypothetical cases are bugs.  As I said before, I do not think there is an
> API contract that would promise that portals don't reach here in ACTIVE
> state.  So IMO it's fair to note that no such case can arise currently,
> but not to state that it's a bug if it does.  So for example I'd reword
> your last comment addition along the lines of "Currently, every
> MarkPortalActive() caller ensures it updates the portal status again
> before relinquishing control, so that ACTIVE can't happen here.  If it
> does happen, dispose of the portal like existing MarkPortalActive() callers
> would."

+1.  I think Noah's comment additions constitute useful and helpful
information, but I too am doubtful about the characterization of the
situations in question as bugs.  I don't see what the evidence for
that is, especially given that it's quite hard to predict how the code
might change in the future.  However, to reiterate, a more neutral
statement of the same facts would get my vote.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company