Обсуждение: Wrong assert in TransactionGroupUpdateXidStatus

Поиск
Список
Период
Сортировка

Wrong assert in TransactionGroupUpdateXidStatus

От
Dilip Kumar
Дата:
While testing, my colleague Vignesh has hit an assert in
TransactionGroupUpdateXidStatus.  But that is not reproducible.  After
some analysis and code review, I have found the reason for the same.

As shown in the below code, there is an assert in
TransactionGroupUpdateXidStatus, which assumes that an overflowed
transaction can never get registered for the group update.  But,
actually, that is not true because while registering the transaction
for group update, we only check how many committed children this
transaction has because all aborted sub-transaction would have already
updated their status.  So if the transaction once overflowed but later
all its children are aborted (i.e remaining committed children are <=
THRESHOLD_SUBTRANS_CLOG_OPT) then it will be registered for the group
update.

/*
* Overflowed transactions should not use group XID status update
* mechanism.
*/
Assert(!pgxact->overflowed);

A solution could be either we remove this assert or change this assert
to Assert(pgxact->nxids <= THRESHOLD_SUBTRANS_CLOG_OPT);

Note:   I could not come up with the reproducible test case as we can
not ensure whether a backend will try to group updates or not because
that depends upon whether it gets the CLogControlLock or not.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Вложения

Re: Wrong assert in TransactionGroupUpdateXidStatus

От
Andres Freund
Дата:
Hi,

Amit, Robert, IIRC that's mostly your feature?

On 2019-12-10 13:55:40 +0530, Dilip Kumar wrote:
> While testing, my colleague Vignesh has hit an assert in
> TransactionGroupUpdateXidStatus.  But that is not reproducible.  After
> some analysis and code review, I have found the reason for the same.
> 
> As shown in the below code, there is an assert in
> TransactionGroupUpdateXidStatus, which assumes that an overflowed
> transaction can never get registered for the group update.  But,
> actually, that is not true because while registering the transaction
> for group update, we only check how many committed children this
> transaction has because all aborted sub-transaction would have already
> updated their status.  So if the transaction once overflowed but later
> all its children are aborted (i.e remaining committed children are <=
> THRESHOLD_SUBTRANS_CLOG_OPT) then it will be registered for the group
> update.

> /*
> * Overflowed transactions should not use group XID status update
> * mechanism.
> */
> Assert(!pgxact->overflowed);
> 
> A solution could be either we remove this assert or change this assert
> to Assert(pgxact->nxids <= THRESHOLD_SUBTRANS_CLOG_OPT);

Maybe I'm missing something, but isn't this a bug then? IIRC We can't
rely on MyProc->subxids once we overflowed, even if since then the
remaining number of children has become low enough. It seems to me that
the actual fix here is to correct the condition in
TransactionIdSetPageStatus() checking whether group updates are possible
- it seems it'd need to verify that the transaction isn't
overflowed.


Also, it's somewhat odd that TransactionIdSetPageStatus() first has

    /* Can't use group update when PGPROC overflows. */
    StaticAssertStmt(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS,
                     "group clog threshold less than PGPROC cached subxids");

and then, within an if():

        /*
         * We don't try to do group update optimization if a process has
         * overflowed the subxids array in its PGPROC, since in that case we
         * don't have a complete list of XIDs for it.
         */
        Assert(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS);

Even if these weren't redundant, it can't make sense to test such a
static condition only within an if? Is it possible this was actually
intended to test something different?

Greetings,

Andres Freund



Re: Wrong assert in TransactionGroupUpdateXidStatus

От
Amit Kapila
Дата:
On Wed, Dec 11, 2019 at 4:02 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> Amit, Robert, IIRC that's mostly your feature?
>

I will look into this today.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Wrong assert in TransactionGroupUpdateXidStatus

От
Amit Kapila
Дата:
On Wed, Dec 11, 2019 at 4:02 AM Andres Freund <andres@anarazel.de> wrote:
> On 2019-12-10 13:55:40 +0530, Dilip Kumar wrote:
>
> > /*
> > * Overflowed transactions should not use group XID status update
> > * mechanism.
> > */
> > Assert(!pgxact->overflowed);
> >
> > A solution could be either we remove this assert or change this assert
> > to Assert(pgxact->nxids <= THRESHOLD_SUBTRANS_CLOG_OPT);
>
> Maybe I'm missing something, but isn't this a bug then? IIRC We can't
> rely on MyProc->subxids once we overflowed, even if since then the
> remaining number of children has become low enough.
>

AFAICS, the MyProc->subxids is maintained properly if the number of
subtransactions is lesser than PGPROC_MAX_CACHED_SUBXIDS (64).  Can
you explain the case where that won't be true?  Also, even if what you
are saying is true, I think the memcmp in TransactionIdSetPageStatus
should avoid taking us a wrong decision.


>
> Also, it's somewhat odd that TransactionIdSetPageStatus() first has
>
>         /* Can't use group update when PGPROC overflows. */
>         StaticAssertStmt(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS,
>                                          "group clog threshold less than PGPROC cached subxids");
>
> and then, within an if():
>
>                 /*
>                  * We don't try to do group update optimization if a process has
>                  * overflowed the subxids array in its PGPROC, since in that case we
>                  * don't have a complete list of XIDs for it.
>                  */
>                 Assert(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS);
>
> Even if these weren't redundant, it can't make sense to test such a
> static condition only within an if?
>

I don't remember exactly the reason for this, but now I don't find the
Assert within if () meaningful.  I think we should remove the Assert
inside if() unless Robert or someone see any use of it.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Wrong assert in TransactionGroupUpdateXidStatus

От
Amit Kapila
Дата:
On Wed, Dec 11, 2019 at 11:00 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Dec 11, 2019 at 4:02 AM Andres Freund <andres@anarazel.de> wrote:
> > On 2019-12-10 13:55:40 +0530, Dilip Kumar wrote:
> >
> > > /*
> > > * Overflowed transactions should not use group XID status update
> > > * mechanism.
> > > */
> > > Assert(!pgxact->overflowed);
> > >
> > > A solution could be either we remove this assert or change this assert
> > > to Assert(pgxact->nxids <= THRESHOLD_SUBTRANS_CLOG_OPT);
> >
> > Maybe I'm missing something, but isn't this a bug then? IIRC We can't
> > rely on MyProc->subxids once we overflowed, even if since then the
> > remaining number of children has become low enough.
> >
>
> AFAICS, the MyProc->subxids is maintained properly if the number of
> subtransactions is lesser than PGPROC_MAX_CACHED_SUBXIDS (64).  Can
> you explain the case where that won't be true?  Also, even if what you
> are saying is true, I think the memcmp in TransactionIdSetPageStatus
> should avoid taking us a wrong decision.
>

I am able to reproduce the issue by reducing the values of
PGPROC_MAX_CACHED_SUBXIDS and THRESHOLD_SUBTRANS_CLOG_OPT to 2.  Below
is what I did after reducing the values:
Session-1
--------------
postgres=# begin;
BEGIN
postgres=# insert into t1 values(1);
INSERT 0 1
postgres=# savepoint s1;
SAVEPOINT
postgres=# insert into t1 values(2);
INSERT 0 1
postgres=# savepoint s2;
SAVEPOINT
postgres=# insert into t1 values(3);
INSERT 0 1
postgres=# savepoint s3;
SAVEPOINT
postgres=# insert into t1 values(4);
INSERT 0 1
postgres=# rollback to s2;
ROLLBACK

Session-2
---------------
insert into t1 values(4);  -- attach debugger and stop in
TransactionIdSetPageStatus after acquiring CLogControlLock

Session-1
---------------
Commit;  -- This will wait to acquire CLogControlLock in a group
update path (TransactionGroupUpdateXidStatus).  Now, continue in the
session-2 debugger.  After that continue in session-1's debugger and
it will hit the Assert.

The attached patch fixes it by changing the Assert.  I have
additionally removed the redundant Assert in
TransactionIdSetPageStatus as pointed out by Andres.  I am planning to
commit and backpatch this early next week (Monday) unless someone
wants to review it further or has objections to it.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Вложения

Re: Wrong assert in TransactionGroupUpdateXidStatus

От
Alvaro Herrera
Дата:
On 2019-Dec-11, Amit Kapila wrote:

> On Wed, Dec 11, 2019 at 4:02 AM Andres Freund <andres@anarazel.de> wrote:
> > On 2019-12-10 13:55:40 +0530, Dilip Kumar wrote:

> >                 /*
> >                  * We don't try to do group update optimization if a process has
> >                  * overflowed the subxids array in its PGPROC, since in that case we
> >                  * don't have a complete list of XIDs for it.
> >                  */
> >                 Assert(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS);
> >
> > Even if these weren't redundant, it can't make sense to test such a
> > static condition only within an if?
> 
> I don't remember exactly the reason for this, but now I don't find the
> Assert within if () meaningful.  I think we should remove the Assert
> inside if() unless Robert or someone see any use of it.

The more I look at both these asserts, the less sense they make.  Why
does clog.c care about PGPROC at all?  Looking at the callers of that
routine, nowhere do they concern themselves with whether the overflowed
flag has been set or not.  It seems to me that the StaticAssert() should
be near the PGPROC_MAX_CACHED_SUBXIDS definition, not the SUBTRANS
definition (maybe as StaticAssertDecl, as in
201DD0641B056142AC8C6645EC1B5F62014B8E8030@SYD1217 )

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Wrong assert in TransactionGroupUpdateXidStatus

От
Amit Kapila
Дата:
On Thu, Dec 12, 2019 at 6:10 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> On 2019-Dec-11, Amit Kapila wrote:
>
> > On Wed, Dec 11, 2019 at 4:02 AM Andres Freund <andres@anarazel.de> wrote:
> > > On 2019-12-10 13:55:40 +0530, Dilip Kumar wrote:
>
> > >                 /*
> > >                  * We don't try to do group update optimization if a process has
> > >                  * overflowed the subxids array in its PGPROC, since in that case we
> > >                  * don't have a complete list of XIDs for it.
> > >                  */
> > >                 Assert(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS);
> > >
> > > Even if these weren't redundant, it can't make sense to test such a
> > > static condition only within an if?
> >
> > I don't remember exactly the reason for this, but now I don't find the
> > Assert within if () meaningful.  I think we should remove the Assert
> > inside if() unless Robert or someone see any use of it.
>
> The more I look at both these asserts, the less sense they make.  Why
> does clog.c care about PGPROC at all?
>

It is mainly for group updates.  Basically, we want to piggyback the
procs that are trying to update clog at the same time on the proc
which got the CLogControlLock. This avoids taking/releasing that lock
multiple times.  See TransactionGroupUpdateXidStatus.

>  Looking at the callers of that
> routine, nowhere do they concern themselves with whether the overflowed
> flag has been set or not.  It seems to me that the StaticAssert() should
> be near the PGPROC_MAX_CACHED_SUBXIDS definition, not the SUBTRANS
> definition (maybe as StaticAssertDecl, as in
> 201DD0641B056142AC8C6645EC1B5F62014B8E8030@SYD1217 )
>

Sounds reasonable.  We can do that once the patch mentioned by you got
committed.  For now, we are planning to just remove the Assert inside
if() condition. Do you see any problem with that?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Wrong assert in TransactionGroupUpdateXidStatus

От
Alvaro Herrera
Дата:
On 2019-Dec-12, Amit Kapila wrote:

> On Thu, Dec 12, 2019 at 6:10 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> > The more I look at both these asserts, the less sense they make.  Why
> > does clog.c care about PGPROC at all?
> 
> It is mainly for group updates.  Basically, we want to piggyback the
> procs that are trying to update clog at the same time on the proc
> which got the CLogControlLock. This avoids taking/releasing that lock
> multiple times.  See TransactionGroupUpdateXidStatus.

Yeah, I (think I) understand that.  My point is that conceptually, the
fact that a PGPROC has overflowed does not really affect clog.c in any
way.

> > Looking at the callers of that routine, nowhere do they concern
> > themselves with whether the overflowed
> > flag has been set or not.  It seems to me that the StaticAssert() should
> > be near the PGPROC_MAX_CACHED_SUBXIDS definition, not the SUBTRANS
> > definition (maybe as StaticAssertDecl, as in
> > 201DD0641B056142AC8C6645EC1B5F62014B8E8030@SYD1217 )
> 
> Sounds reasonable.  We can do that once the patch mentioned by you got
> committed.  For now, we are planning to just remove the Assert inside
> if() condition. Do you see any problem with that?

Nope.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Wrong assert in TransactionGroupUpdateXidStatus

От
Robert Haas
Дата:
On Tue, Dec 10, 2019 at 4:32 PM Andres Freund <andres@anarazel.de> wrote:
> and then, within an if():
>
>                 /*
>                  * We don't try to do group update optimization if a process has
>                  * overflowed the subxids array in its PGPROC, since in that case we
>                  * don't have a complete list of XIDs for it.
>                  */
>                 Assert(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS);
>
> Even if these weren't redundant, it can't make sense to test such a
> static condition only within an if? Is it possible this was actually
> intended to test something different?

Based on the comment, I imagine it might've been intended to read
Assert(nsubxids <= PGPROC_MAX_CACHED_SUBXIDS).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Wrong assert in TransactionGroupUpdateXidStatus

От
Amit Kapila
Дата:
On Thu, Dec 12, 2019 at 8:44 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Dec 10, 2019 at 4:32 PM Andres Freund <andres@anarazel.de> wrote:
> > and then, within an if():
> >
> >                 /*
> >                  * We don't try to do group update optimization if a process has
> >                  * overflowed the subxids array in its PGPROC, since in that case we
> >                  * don't have a complete list of XIDs for it.
> >                  */
> >                 Assert(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS);
> >
> > Even if these weren't redundant, it can't make sense to test such a
> > static condition only within an if? Is it possible this was actually
> > intended to test something different?
>
> Based on the comment, I imagine it might've been intended to read
> Assert(nsubxids <= PGPROC_MAX_CACHED_SUBXIDS).
>

Do you think we need such an Assert after having StaticAssert for
(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS)  and then
an if statement containing (nsubxids <= THRESHOLD_SUBTRANS_CLOG_OPT)
just before this Assert?  Sure, we can keep this for extra safety, but
I don't see the need for it.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Wrong assert in TransactionGroupUpdateXidStatus

От
Robert Haas
Дата:
On Thu, Dec 12, 2019 at 9:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Do you think we need such an Assert after having StaticAssert for
> (THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS)  and then
> an if statement containing (nsubxids <= THRESHOLD_SUBTRANS_CLOG_OPT)
> just before this Assert?  Sure, we can keep this for extra safety, but
> I don't see the need for it.

I don't have strong feelings about it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Wrong assert in TransactionGroupUpdateXidStatus

От
Amit Kapila
Дата:
On Sun, Dec 15, 2019 at 8:51 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Dec 12, 2019 at 9:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Do you think we need such an Assert after having StaticAssert for
> > (THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS)  and then
> > an if statement containing (nsubxids <= THRESHOLD_SUBTRANS_CLOG_OPT)
> > just before this Assert?  Sure, we can keep this for extra safety, but
> > I don't see the need for it.
>
> I don't have strong feelings about it.
>

Okay, in that case, I am planning to push this patch [1] tomorrow
morning unless I see any other comments.  I am also planning to
backpatch this through 10 where it got introduced, even though this is
not a serious bug, but I think it is better to keep the code
consistent in back branches.

[1] - https://www.postgresql.org/message-id/CAA4eK1JZ5EipQ8Ta6eLMX_ni3CNtZDUrvHg0th1C8n%3D%2Bk%2B0ojg%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Wrong assert in TransactionGroupUpdateXidStatus

От
Amit Kapila
Дата:
On Mon, Dec 16, 2019 at 8:53 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sun, Dec 15, 2019 at 8:51 AM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Thu, Dec 12, 2019 at 9:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > Do you think we need such an Assert after having StaticAssert for
> > > (THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS)  and then
> > > an if statement containing (nsubxids <= THRESHOLD_SUBTRANS_CLOG_OPT)
> > > just before this Assert?  Sure, we can keep this for extra safety, but
> > > I don't see the need for it.
> >
> > I don't have strong feelings about it.
> >
>
> Okay, in that case, I am planning to push this patch [1] tomorrow
> morning unless I see any other comments.  I am also planning to
> backpatch this through 10 where it got introduced,
>

This was introduced in 11, so pushed and backpatched through 11.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com