Обсуждение: assertion in 9.4 with wal_level=logical

Поиск
Список
Период
Сортировка

assertion in 9.4 with wal_level=logical

От
Steve Singer
Дата:
With master/9.4 from today (52e757420fa98a76015c2c88432db94269f3e8f4)

I am getting an assertion when doing a truncate via SPI when I have 
wal_level=logical.

Stack trace is below.

I am just replicating a table with normal slony (2.2) I don't need to 
establish any replication slots to get this.




(gdb) where
#0  0x00007fc9b4f58295 in __GI_raise (sig=sig@entry=6)    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007fc9b4f5b438 in __GI_abort () at abort.c:90
#2  0x00000000007a10f7 in ExceptionalCondition (    conditionName=conditionName@entry=0x955d90 "!(CritSectionCount == 0

|| (CurrentMemoryContext) == ErrorContext || (MyAuxProcType == 
CheckpointerProcess))",    errorType=errorType@entry=0x7da7b0 "FailedAssertion",    fileName=fileName@entry=0x955a2e
"mcxt.c",
 
lineNumber=lineNumber@entry=670)    at assert.c:54
#3  0x00000000007c3090 in palloc (size=16) at mcxt.c:670
#4  0x00000000004dd83f in mXactCacheGetById (members=0x7fff679a3d18, 
multi=58)    at multixact.c:1411
#5  GetMultiXactIdMembers (multi=58, members=members@entry=0x7fff679a3d18,    allow_old=allow_old@entry=0 '\000') at
multixact.c:1080
#6  0x000000000049e43f in MultiXactIdGetUpdateXid (xmax=<optimized out>,    t_infomask=<optimized out>) at
heapam.c:6042
#7  0x00000000004a1ccc in HeapTupleGetUpdateXid (tuple=<optimized out>)    at heapam.c:6083
#8  0x00000000007cf7d9 in HeapTupleHeaderGetCmax 
(tup=tup@entry=0x7fc9ac838e38)    at combocid.c:122
#9  0x000000000049eb98 in log_heap_new_cid (    relation=relation@entry=0x7fc9b5a67dc0, tup=tup@entry=0x7fff679a3ea0)
at heapam.c:7047
 
#10 0x00000000004a48a5 in heap_update 
(relation=relation@entry=0x7fc9b5a67dc0,    otid=otid@entry=0x2678c6c, newtup=newtup@entry=0x2678c68, cid=26,
crosscheck=crosscheck@entry=0x0,wait=wait@entry=1 '\001',    hufd=hufd@entry=0x7fff679a4080,
lockmode=lockmode@entry=0x7fff679a407c)   at heapam.c:3734
 
#11 0x00000000004a5842 in simple_heap_update (    relation=relation@entry=0x7fc9b5a67dc0, otid=otid@entry=0x2678c6c,
tup=tup@entry=0x2678c68)at heapam.c:4010
 
#12 0x0000000000797cf7 in RelationSetNewRelfilenode (    relation=relation@entry=0x7fc9ab270b68, freezeXid=19459,
minmulti=minmulti@entry=58)at relcache.c:2956
 
#13 0x000000000059ddde in ExecuteTruncate (stmt=0x3a, stmt@entry=0x2678a58)    at tablecmds.c:1187
#14 0x00000000006c3870 in standard_ProcessUtility (parsetree=0x2678a58,    queryString=<optimized out>,
context=<optimizedout>, params=0x0,    dest=<optimized out>, completionTag=<optimized out>) at utility.c:515
 
#15 0x00000000005e79d9 in _SPI_execute_plan 
(plan=plan@entry=0x7fff679a4320,    paramLI=paramLI@entry=0x0, snapshot=snapshot@entry=0x0,
crosscheck_snapshot=crosscheck_snapshot@entry=0x0,   read_only=read_only@entry=0 '\000',
fire_triggers=fire_triggers@entry=1'\001', tcount=tcount@entry=0)    at spi.c:2171
 
#16 0x00000000005e83c1 in SPI_execute (
#16 0x00000000005e83c1 in SPI_execute (
---Type <return> to continue, or q <return> to quit---    src=src@entry=0x25bde90 "truncate only
\"disorder\".\"do_restock\"",   read_only=0 '\000', tcount=tcount@entry=0) at spi.c:386
 



Re: assertion in 9.4 with wal_level=logical

От
Andres Freund
Дата:
Hi,

On 2014-04-17 16:23:54 -0400, Steve Singer wrote:
> With master/9.4 from today (52e757420fa98a76015c2c88432db94269f3e8f4)
> 
> I am getting an assertion when doing a truncate via SPI when I have
> wal_level=logical.
> 
> Stack trace is below.
> 
> I am just replicating a table with normal slony (2.2) I don't need to
> establish any replication slots to get this.

Uh, that's somewhat nasty... You probably only get that because of
slony's habit of share locking catalogs. Could that be?

For now, to circumvent the problem you could just revert
4a170ee9e0ebd7021cb1190fabd5b0cbe2effb8e for now.

I'll look into fixing it properly.

Greetings,

Andres Freund



Re: assertion in 9.4 with wal_level=logical

От
Alvaro Herrera
Дата:
Steve Singer wrote:
> With master/9.4 from today (52e757420fa98a76015c2c88432db94269f3e8f4)
> 
> I am getting an assertion when doing a truncate via SPI when I have
> wal_level=logical.
> 
> Stack trace is below.
> 
> I am just replicating a table with normal slony (2.2) I don't need
> to establish any replication slots to get this.

For once, this looks more like a problem in logical decoding, which is
trying to assert about the tuple being updated; the assertion failing is
the one added a week ago about not palloc'ing in a critical section.

Andres told me on IM that there's another weird thing going on (which is
how the catalog tuple gets a multixact in the first place) which is that
Slony does a SELECT FOR SHARE in the catalog tuple.

One simple approach would be to just disable that particular assert when
in a critical section.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: assertion in 9.4 with wal_level=logical

От
Andres Freund
Дата:
On 2014-04-17 17:40:01 -0300, Alvaro Herrera wrote:
> For once, this looks more like a problem in logical decoding, which is
> trying to assert about the tuple being updated; the assertion failing is
> the one added a week ago about not palloc'ing in a critical section.

It's this (older) assertion in HeapTupleHeaderGetCmax():
Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tup)));

That can allocate memory if xmax is a multixact... Does anybody have a
better idea to solve this than adding a CritSectionCount == 0 && in
there?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: assertion in 9.4 with wal_level=logical

От
Steve Singer
Дата:
On 04/17/2014 04:33 PM, Andres Freund wrote:
> Hi,
>
> On 2014-04-17 16:23:54 -0400, Steve Singer wrote:
>> With master/9.4 from today (52e757420fa98a76015c2c88432db94269f3e8f4)
>>
>> I am getting an assertion when doing a truncate via SPI when I have
>> wal_level=logical.
>>
>> Stack trace is below.
>>
>> I am just replicating a table with normal slony (2.2) I don't need to
>> establish any replication slots to get this.
> Uh, that's somewhat nasty... You probably only get that because of
> slony's habit of share locking catalogs. Could that be?

Yes slony does a select from pg_catalog and pg_namespace  in  the stored 
function just before doing the truncate.


> For now, to circumvent the problem you could just revert
> 4a170ee9e0ebd7021cb1190fabd5b0cbe2effb8e for now.
>
> I'll look into fixing it properly.
>
> Greetings,
>
> Andres Freund
>
>




Re: assertion in 9.4 with wal_level=logical

От
Robert Haas
Дата:
On Thu, Apr 17, 2014 at 10:47 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-04-17 17:40:01 -0300, Alvaro Herrera wrote:
>> For once, this looks more like a problem in logical decoding, which is
>> trying to assert about the tuple being updated; the assertion failing is
>> the one added a week ago about not palloc'ing in a critical section.
>
> It's this (older) assertion in HeapTupleHeaderGetCmax():
>
>         Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tup)));
>
> That can allocate memory if xmax is a multixact... Does anybody have a
> better idea to solve this than adding a CritSectionCount == 0 && in
> there?

Blech.  Isn't that just nerfing the assertion?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: assertion in 9.4 with wal_level=logical

От
Alvaro Herrera
Дата:
Robert Haas wrote:
> On Thu, Apr 17, 2014 at 10:47 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2014-04-17 17:40:01 -0300, Alvaro Herrera wrote:
> >> For once, this looks more like a problem in logical decoding, which is
> >> trying to assert about the tuple being updated; the assertion failing is
> >> the one added a week ago about not palloc'ing in a critical section.
> >
> > It's this (older) assertion in HeapTupleHeaderGetCmax():
> >
> >         Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tup)));
> >
> > That can allocate memory if xmax is a multixact... Does anybody have a
> > better idea to solve this than adding a CritSectionCount == 0 && in
> > there?
> 
> Blech.  Isn't that just nerfing the assertion?

Well, that's exactly the point.  Most of the time,
HeapTupleHeaderGetCmax gets called in a non-critical section, and we
want to run the assertion in that case.  But it's not huge trouble if
the assertion is not run in the rare case where HeapTupleHeaderGetCmax
is being used to write a Xlog record.

It's a bit painful that HeapTupleHeaderGetUpdateXid allocates memory,
but to fix that we would have to remove all allocations from
GetMultiXactIdMembers which doesn't sound feasible.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: assertion in 9.4 with wal_level=logical

От
Andres Freund
Дата:
On 2014-04-18 16:44:55 +0200, Robert Haas wrote:
> On Thu, Apr 17, 2014 at 10:47 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2014-04-17 17:40:01 -0300, Alvaro Herrera wrote:
> >> For once, this looks more like a problem in logical decoding, which is
> >> trying to assert about the tuple being updated; the assertion failing is
> >> the one added a week ago about not palloc'ing in a critical section.
> >
> > It's this (older) assertion in HeapTupleHeaderGetCmax():
> >
> >         Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tup)));
> >
> > That can allocate memory if xmax is a multixact... Does anybody have a
> > better idea to solve this than adding a CritSectionCount == 0 && in
> > there?
> 
> Blech.  Isn't that just nerfing the assertion?

Not precicisely sure what you mean, but the only memory allocation in
HeapTupleHeaderGetCmax() and log_heap_new_cid() is that Assert(). And
that's the only "forbidden" thing in that codepath.
Now, we could alternatively restructure the codepaths so they pass in
xmax from outside the critical section, but I had a quick look and the
risk/complications from that seems bigger than the assertion buys us
there.
I don't have a better idea unfortunately :(

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: assertion in 9.4 with wal_level=logical

От
Andres Freund
Дата:
On 2014-04-18 11:50:55 -0300, Alvaro Herrera wrote:
> It's a bit painful that HeapTupleHeaderGetUpdateXid allocates memory,
> but to fix that we would have to remove all allocations from
> GetMultiXactIdMembers which doesn't sound feasible.

I was thinking for a second we could just allocate something during
startup based on max_connections (+ other possible backends), but I
think that won't work because of subtransactions?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: assertion in 9.4 with wal_level=logical

От
Alvaro Herrera
Дата:
Andres Freund wrote:
> On 2014-04-18 11:50:55 -0300, Alvaro Herrera wrote:
> > It's a bit painful that HeapTupleHeaderGetUpdateXid allocates memory,
> > but to fix that we would have to remove all allocations from
> > GetMultiXactIdMembers which doesn't sound feasible.
> 
> I was thinking for a second we could just allocate something during
> startup based on max_connections (+ other possible backends), but I
> think that won't work because of subtransactions?

Yeah, I don't think the number of members in a multixact is bound.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: assertion in 9.4 with wal_level=logical

От
Andres Freund
Дата:
On 2014-04-18 11:50:55 -0300, Alvaro Herrera wrote:
> Robert Haas wrote:
> > On Thu, Apr 17, 2014 at 10:47 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > > It's this (older) assertion in HeapTupleHeaderGetCmax():
> > >
> > >         Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tup)));
> > >
> > > That can allocate memory if xmax is a multixact... Does anybody have a
> > > better idea to solve this than adding a CritSectionCount == 0 && in
> > > there?
> > 
> > Blech.  Isn't that just nerfing the assertion?
> 
> Well, that's exactly the point.  Most of the time,
> HeapTupleHeaderGetCmax gets called in a non-critical section, and we
> want to run the assertion in that case.  But it's not huge trouble if
> the assertion is not run in the rare case where HeapTupleHeaderGetCmax
> is being used to write a Xlog record.
> 
> It's a bit painful that HeapTupleHeaderGetUpdateXid allocates memory,
> but to fix that we would have to remove all allocations from
> GetMultiXactIdMembers which doesn't sound feasible.

Since nobody seemed to have a better idea I've proceeded in doing
so... Not pretty.
I've verified that the assertion could be triggered before, but not
after by doing something like:
S1:
CREATE TABLE a();
BEGIN;
SELECT * FROM pg_class WHERE relname = 'a' FOR SHARE;

S2:
BEGIN;
SELECT * FROM pg_class WHERE relname = 'a' FOR SHARE;
COMMIT;

S1:
ALTER TABLE a RENAME to b;
ALTER TABLE b RENAME to a; -- this triggered the assertion

That seems a tad too obscure for its own isolationtester test.

Thanks for the report, Steve!

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services