Обсуждение: BUG #8273: Assertion failure in 9.3 beta2 with serializable and savepoints

Поиск
Список
Период
Сортировка

BUG #8273: Assertion failure in 9.3 beta2 with serializable and savepoints

От
levertond@googlemail.com
Дата:
The following bug has been logged on the website:

Bug reference:      8273
Logged by:          David Leverton
Email address:      levertond@googlemail.com
PostgreSQL version: Unsupported/Unknown
Operating system:   RHEL 5 x86_64
Description:

The following test case causes a backend assertion failure in 9.3 beta2:


START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
CREATE TABLE testing(
  x INTEGER PRIMARY KEY
);
INSERT INTO testing VALUES(1);
SELECT * FROM testing WHERE x = 1 FOR UPDATE;
SAVEPOINT test;
UPDATE testing SET x = 2 WHERE x = 1;
ROLLBACK TO test;
UPDATE testing SET x = 3 WHERE x = 1;
ROLLBACK;


TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File:
"predicate.c", Line: 3936)


Postgres was installed using the RPMs from http://yum.pgrpms.org/, and is
using a near-default configuration (changes to port number, data directory
and pg_hba.conf, and some roles created, but nothing likely to influence
this bug).


The full backtrace is as follows:


#0  0x0000003d66a30265 in raise () from /lib64/libc.so.6
#1  0x0000003d66a31d10 in abort () from /lib64/libc.so.6
#2  0x000000000074af8d in ExceptionalCondition (
    conditionName=<value optimized out>, errorType=<value optimized out>,
    fileName=<value optimized out>, lineNumber=<value optimized out>)
    at assert.c:54
#3  0x00000000006784bf in CheckForSerializableConflictOut (visible=1
'\001',
    relation=<value optimized out>, tuple=<value optimized out>,
    buffer=<value optimized out>, snapshot=<value optimized out>)
    at predicate.c:3936
#4  0x000000000049208e in heap_hot_search_buffer (tid=0x2abfdac,
    relation=0x2ad4b49eba78, buffer=260, snapshot=0x29cb3b0,
    heapTuple=0x2abfda8, all_dead=0x7fff510be26f "\001X\375\253\002",
    first_call=1 '\001') at heapam.c:1730
#5  0x000000000049ad1a in index_fetch_heap (scan=0x2abfd58) at
indexam.c:529
#6  0x000000000049afb3 in index_getnext (scan=0x2abfd58,
    direction=ForwardScanDirection) at indexam.c:612
#7  0x00000000005adafb in IndexNext (node=0x2abe9d0) at nodeIndexscan.c:78
#8  0x00000000005a1cf8 in ExecScanFetch (node=0x2abe9d0,
    accessMtd=0x5adab0 <IndexNext>, recheckMtd=0x5ada60 <IndexRecheck>)
    at execScan.c:82
#9  ExecScan (node=0x2abe9d0, accessMtd=0x5adab0 <IndexNext>,
    recheckMtd=0x5ada60 <IndexRecheck>) at execScan.c:167
#10 0x000000000059a94e in ExecProcNode (node=0x2abe9d0) at
execProcnode.c:404
#11 0x00000000005b1253 in ExecModifyTable (node=0x2abe6c0)
    at nodeModifyTable.c:918
#12 0x000000000059a90c in ExecProcNode (node=0x2abe6c0) at
execProcnode.c:377
#13 0x0000000000599b4d in ExecutePlan (queryDesc=0x2a98408,
direction=12114,
    count=0) at execMain.c:1470
#14 standard_ExecutorRun (queryDesc=0x2a98408, direction=12114, count=0)
    at execMain.c:306
#15 0x0000000000683e5f in ProcessQuery (plan=0x2ab32c0,
    sourceText=0x2a67128 "UPDATE testing SET x = 3 WHERE x = 1;",
    params=<value optimized out>, dest=0x2ab33b8,
    completionTag=0x7fff510be700 "") at pquery.c:185
#16 0x00000000006840d7 in PortalRunMulti (portal=0x29d12a8,
    isTopLevel=<value optimized out>, dest=0x2ab33b8, altdest=0x2ab33b8,
    completionTag=0x7fff510be700 "") at pquery.c:1279
#17 0x0000000000684cf2 in PortalRun (portal=0x29d12a8,
    count=9223372036854775807, isTopLevel=1 '\001', dest=0x2ab33b8,
    altdest=0x2ab33b8, completionTag=0x7fff510be700 "") at pquery.c:816
#18 0x000000000068105d in exec_simple_query (
    query_string=0x2a67128 "UPDATE testing SET x = 3 WHERE x = 1;")
    at postgres.c:1048
#19 0x000000000068255e in PostgresMain (argc=<value optimized out>,
    argv=<value optimized out>, dbname=0x29d6088 "postgres",
    username=<value optimized out>) at postgres.c:3985
#20 0x00000000006355b6 in ServerLoop () at postmaster.c:3987
#21 0x0000000000638a77 in PostmasterMain (argc=5, argv=0x29b0b80)
    at postmaster.c:1246
#22 0x00000000005ce293 in main (argc=5, argv=<value optimized out>)
    at main.c:196

Re: BUG #8273: Assertion failure in 9.3 beta2 with serializable and savepoints

От
Jeff Janes
Дата:
On Mon, Jul 1, 2013 at 3:43 AM, <levertond@googlemail.com> wrote:

> The following bug has been logged on the website:
>
> Bug reference:      8273
> Logged by:          David Leverton
> Email address:      levertond@googlemail.com
> PostgreSQL version: Unsupported/Unknown
> Operating system:   RHEL 5 x86_64
> Description:
>
> The following test case causes a backend assertion failure in 9.3 beta2:
>
>
> START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> CREATE TABLE testing(
>   x INTEGER PRIMARY KEY
> );
> INSERT INTO testing VALUES(1);
> SELECT * FROM testing WHERE x = 1 FOR UPDATE;
> SAVEPOINT test;
> UPDATE testing SET x = 2 WHERE x = 1;
> ROLLBACK TO test;
> UPDATE testing SET x = 3 WHERE x = 1;
> ROLLBACK;
>
>
> TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File:
> "predicate.c", Line: 3936)
>
>
> Postgres was installed using the RPMs from http://yum.pgrpms.org/, and is
> using a near-default configuration (changes to port number, data directory
> and pg_hba.conf, and some roles created, but nothing likely to influence
> this bug).
>


I can confirm this compiling from source on CentOS 6.4.

The bug was introduced in commit: 0ac5ad5... Improve concurrency of foreign
key locking.

I don't know what more to look into on this, so I'm cc Alvaro, the patch
author.

Cheers,

Jeff

Re: BUG #8273: Assertion failure in 9.3 beta2 with serializable and savepoints

От
Alvaro Herrera
Дата:
Jeff Janes escribió:

> The bug was introduced in commit: 0ac5ad5... Improve concurrency of foreign
> key locking.
>
> I don't know what more to look into on this, so I'm cc Alvaro, the patch
> author.

Thanks, will look.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: BUG #8273: Assertion failure in 9.3 beta2 with serializable and savepoints

От
Alvaro Herrera
Дата:
levertond@googlemail.com wrote:

> START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> CREATE TABLE testing(
>   x INTEGER PRIMARY KEY
> );
> INSERT INTO testing VALUES(1);
> SELECT * FROM testing WHERE x = 1 FOR UPDATE;
> SAVEPOINT test;
> UPDATE testing SET x = 2 WHERE x = 1;
> ROLLBACK TO test;
> UPDATE testing SET x = 3 WHERE x = 1;
> ROLLBACK;

Thanks for the test case.  This one-liner patch fixes it:

diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index 55563ea..fa9c9d7 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -1287,7 +1287,7 @@ HeapTupleSatisfiesVacuum(HeapTupleHeader tuple, TransactionId OldestXmin,
         {
             if (tuple->t_infomask & HEAP_XMAX_INVALID)    /* xid invalid */
                 return HEAPTUPLE_INSERT_IN_PROGRESS;
-            if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))
+            if (HeapTupleHeaderIsOnlyLocked(tuple))
                 return HEAPTUPLE_INSERT_IN_PROGRESS;
             /* inserted and then deleted by same xact */
             return HEAPTUPLE_DELETE_IN_PROGRESS;


That is, I was taking a shortcut here by checking only the infomask bits
about locking, and not resolving the MultiXact to see if the updater
(sub)transaction was aborted; but this was wrong, because a tuple which
was created here and updated by an aborted subxact needs to be seen as
in-progress insertion, not an in-progress delete.  This causes trouble
to the predicate.c code because it tries to obtain the update XID for
the tuple, but since the update has aborted, that returned zero, causing
the assert failure.

Sadly, this has performance implications, because what previously was
just an in-place check of bit flags has now become a function call.
Perhaps it'd be smart to optimize it a bit so that we first check the
flags, and only call the function if that fails.

Before pushing this patch, I'm going to examine the other users of
HEAP_XMAX_IS_LOCKED_ONLY to ensure they don't also need the same change.
I think some of them were prepared for the possibility that there were
aborted updaters; but this was a late addition so perhaps I missed
others that aren't.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: BUG #8273: Assertion failure in 9.3 beta2 with serializable and savepoints

От
Andres Freund
Дата:
On 2013-07-19 13:46:44 -0400, Alvaro Herrera wrote:
> levertond@googlemail.com wrote:
>
> > START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> > CREATE TABLE testing(
> >   x INTEGER PRIMARY KEY
> > );
> > INSERT INTO testing VALUES(1);
> > SELECT * FROM testing WHERE x = 1 FOR UPDATE;
> > SAVEPOINT test;
> > UPDATE testing SET x = 2 WHERE x = 1;
> > ROLLBACK TO test;
> > UPDATE testing SET x = 3 WHERE x = 1;
> > ROLLBACK;
>
> Thanks for the test case.  This one-liner patch fixes it:
>
> diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
> index 55563ea..fa9c9d7 100644
> --- a/src/backend/utils/time/tqual.c
> +++ b/src/backend/utils/time/tqual.c
> @@ -1287,7 +1287,7 @@ HeapTupleSatisfiesVacuum(HeapTupleHeader tuple, TransactionId OldestXmin,
>          {
>              if (tuple->t_infomask & HEAP_XMAX_INVALID)    /* xid invalid */
>                  return HEAPTUPLE_INSERT_IN_PROGRESS;
> -            if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))
> +            if (HeapTupleHeaderIsOnlyLocked(tuple))
>                  return HEAPTUPLE_INSERT_IN_PROGRESS;
>              /* inserted and then deleted by same xact */
>              return HEAPTUPLE_DELETE_IN_PROGRESS;
>
>
> That is, I was taking a shortcut here by checking only the infomask bits
> about locking, and not resolving the MultiXact to see if the updater
> (sub)transaction was aborted; but this was wrong, because a tuple which
> was created here and updated by an aborted subxact needs to be seen as
> in-progress insertion, not an in-progress delete.  This causes trouble
> to the predicate.c code because it tries to obtain the update XID for
> the tuple, but since the update has aborted, that returned zero, causing
> the assert failure.

> Sadly, this has performance implications, because what previously was
> just an in-place check of bit flags has now become a function call.

Well, the impact imo primarily comes from actually resolving the
multixact, not from the function call itself... But I don't think we
need to overly worried. That path is only entered if xmin is
in-progress, so that shouldn't have too big implications.

If you're worried you could do a SetHintBits(HEAP_XMAX_INVALID) like
it's done if xmin isn't in progress like it's done when xmin has
committed.

> Perhaps it'd be smart to optimize it a bit so that we first check the
> flags, and only call the function if that fails.

Sounds like a good idea to me. The duplicated amount of work by that
should by fairly minimal.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: BUG #8273: Assertion failure in 9.3 beta2 with serializable and savepoints

От
Alvaro Herrera
Дата:
Andres Freund wrote:
> On 2013-07-19 13:46:44 -0400, Alvaro Herrera wrote:

> > Sadly, this has performance implications, because what previously was
> > just an in-place check of bit flags has now become a function call.
>
> Well, the impact imo primarily comes from actually resolving the
> multixact, not from the function call itself... But I don't think we
> need to overly worried. That path is only entered if xmin is
> in-progress, so that shouldn't have too big implications.

What I am worried about is those code paths that previously executed all
this only by going through straight code in the visibility routine,
without calling any other functions, for the cases where there's no
multixact involved at all.  If we introduce function calls here, that
would cause a performance regression for everybody.  Of course, people
who benefit from the new multixacts can be happy that they don't block
waiting for a remote transaction, but this doesn't help people who don't
benefit from multixacts.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: BUG #8273: Assertion failure in 9.3 beta2 with serializable and savepoints

От
Alvaro Herrera
Дата:
Andres Freund wrote:
> On 2013-07-19 13:46:44 -0400, Alvaro Herrera wrote:

> > Sadly, this has performance implications, because what previously was
> > just an in-place check of bit flags has now become a function call.
>
> Well, the impact imo primarily comes from actually resolving the
> multixact, not from the function call itself... But I don't think we
> need to overly worried. That path is only entered if xmin is
> in-progress, so that shouldn't have too big implications.

True.  I have pushed it, with this tweak:

> > Perhaps it'd be smart to optimize it a bit so that we first check the
> > flags, and only call the function if that fails.
>
> Sounds like a good idea to me. The duplicated amount of work by that
> should by fairly minimal.

Thanks.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services