Обсуждение: BUG #13844: Logical decoding bug with subxact + row locking

Поиск
Список
Период
Сортировка

BUG #13844: Logical decoding bug with subxact + row locking

От
jarred@webriots.com
Дата:
The following bug has been logged on the website:

Bug reference:      13844
Logged by:          Jarred Ward
Email address:      jarred@webriots.com
PostgreSQL version: 9.4.5
Operating system:   Linux
Description:

This is a pretty serious logical decoding bug that bit us pretty hard
in production.

Run the following code:


----------------------------------------------------------------------------
    CREATE TABLE foo(a SERIAL PRIMARY KEY, b TEXT);

    SELECT pg_create_logical_replication_slot('regression_slot',
'test_decoding');

    INSERT INTO foo (b) VALUES ('john');

    BEGIN;
    SELECT * FROM foo WHERE b = 'john' FOR UPDATE;
    SAVEPOINT sp1;
    INSERT INTO foo (b) VALUES ('jane');
    ROLLBACK TO SAVEPOINT sp1;
    INSERT INTO foo (b) VALUES ('doe');
    END;

    SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL);

----------------------------------------------------------------------------

The last select above should return the decoded WAL, but instead it
returns:


----------------------------------------------------------------------------
    ERROR:  subxact logged without previous toplevel record

----------------------------------------------------------------------------

And the process is disconnected with no way to recover the replication
slot without dropping and recreating.

If SELECT statement in the transaction is changed the following:


----------------------------------------------------------------------------
    SELECT * FROM foo WHERE b = 'john' FOR UPDATE;

----------------------------------------------------------------------------

to remove the row level locking the correct decoded change set is
returned.

Re: BUG #13844: Logical decoding bug with subxact + row locking

От
Bruce Momjian
Дата:
On Tue, Jan  5, 2016 at 03:32:49AM +0000, jarred@webriots.com wrote:
> The following bug has been logged on the website:
>
> Bug reference:      13844
> Logged by:          Jarred Ward
> Email address:      jarred@webriots.com
> PostgreSQL version: 9.4.5
> Operating system:   Linux
> Description:
>
> This is a pretty serious logical decoding bug that bit us pretty hard
> in production.
>
> Run the following code:
>
>
> ----------------------------------------------------------------------------
>     CREATE TABLE foo(a SERIAL PRIMARY KEY, b TEXT);
>
>     SELECT pg_create_logical_replication_slot('regression_slot',
> 'test_decoding');
>
>     INSERT INTO foo (b) VALUES ('john');
>
>     BEGIN;
>     SELECT * FROM foo WHERE b = 'john' FOR UPDATE;
>     SAVEPOINT sp1;
>     INSERT INTO foo (b) VALUES ('jane');
>     ROLLBACK TO SAVEPOINT sp1;
>     INSERT INTO foo (b) VALUES ('doe');
>     END;
>
>     SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL);
>
> ----------------------------------------------------------------------------
>
> The last select above should return the decoded WAL, but instead it
> returns:
>
>
> ----------------------------------------------------------------------------
>     ERROR:  subxact logged without previous toplevel record
>
> ----------------------------------------------------------------------------
>
> And the process is disconnected with no way to recover the replication
> slot without dropping and recreating.
>
> If SELECT statement in the transaction is changed the following:
>
>
> ----------------------------------------------------------------------------
>     SELECT * FROM foo WHERE b = 'john' FOR UPDATE;
>
> ----------------------------------------------------------------------------
>
> to remove the row level locking the correct decoded change set is
> returned.

I can confirm this behavior.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +

Re: BUG #13844: Logical decoding bug with subxact + row locking

От
Andres Freund
Дата:
Hi,

On 2016-01-05 03:32:49 +0000, jarred@webriots.com wrote:
> This is a pretty serious logical decoding bug that bit us pretty hard
> in production.
>
> Run the following code:
>
>
> ----------------------------------------------------------------------------
>     CREATE TABLE foo(a SERIAL PRIMARY KEY, b TEXT);
>
>     SELECT pg_create_logical_replication_slot('regression_slot',
> 'test_decoding');
>
>     INSERT INTO foo (b) VALUES ('john');
>
>     BEGIN;
>     SELECT * FROM foo WHERE b = 'john' FOR UPDATE;
>     SAVEPOINT sp1;
>     INSERT INTO foo (b) VALUES ('jane');
>     ROLLBACK TO SAVEPOINT sp1;
>     INSERT INTO foo (b) VALUES ('doe');
>     END;
>
>     SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL);
>
> ----------------------------------------------------------------------------
>
> The last select above should return the decoded WAL, but instead it
> returns:
>
>
> ----------------------------------------------------------------------------
>     ERROR:  subxact logged without previous toplevel record
>
> ----------------------------------------------------------------------------

Ugh, that's an annoying bug. Working on producing a fix.

The problem is that logical decoding expects to know about toplevel xids
before subtransaction/savepoint xids. Which is enforced in the WAL
logging code, so that's ok. But the problem is that right now the WAL
produced looks like (abbreviated):
rmgr: Heap        tx:        585, lsn: 0/01A304A8, desc: LOCK off 1: xid 585 LOCK_ONLY EXCL_LOCK KEYS_UPDATED , blkref
#0:rel 1663/12385/24607 blk 0 
rmgr: Heap        tx:        586, lsn: 0/01A30658, desc: INSERT off 2, blkref #0: rel 1663/12385/24607 blk 0 FPW
rmgr: Transaction tx:        586, lsn: 0/01A30798, desc: ABORT 2016-02-07 12:21:18.624045 CET
rmgr: Standby     tx:          0, lsn: 0/01A307C0, desc: RUNNING_XACTS nextXid 587 latestCompletedXid 586
oldestRunningXid585; 1 xacts: 585 
rmgr: Heap        tx:        587, lsn: 0/01A307F8, desc: INSERT off 3, blkref #0: rel 1663/12385/24607 blk 0
rmgr: Btree       tx:        587, lsn: 0/01A30838, desc: INSERT_LEAF off 3, blkref #0: rel 1663/12385/24614 blk 1
rmgr: Transaction tx:        585, lsn: 0/01A30878, desc: COMMIT 2016-02-07 12:21:28.062978 CET; subxacts: 587

and decode.c doesn't care about LOCK records so far. Which means
reorderbuffer.c doesn't know about the relevant xid. Hm.

Greetings,

Andres Freund

Re: BUG #13844: Logical decoding bug with subxact + row locking

От
Alvaro Herrera
Дата:
Andres Freund wrote:

> Ugh, that's an annoying bug. Working on producing a fix.
>
> The problem is that logical decoding expects to know about toplevel xids
> before subtransaction/savepoint xids. Which is enforced in the WAL
> logging code, so that's ok. But the problem is that right now the WAL
> produced looks like (abbreviated):
> rmgr: Heap        tx:        585, lsn: 0/01A304A8, desc: LOCK off 1: xid 585 LOCK_ONLY EXCL_LOCK KEYS_UPDATED ,
blkref#0: rel 1663/12385/24607 blk 0 
> rmgr: Heap        tx:        586, lsn: 0/01A30658, desc: INSERT off 2, blkref #0: rel 1663/12385/24607 blk 0 FPW
> rmgr: Transaction tx:        586, lsn: 0/01A30798, desc: ABORT 2016-02-07 12:21:18.624045 CET
> rmgr: Standby     tx:          0, lsn: 0/01A307C0, desc: RUNNING_XACTS nextXid 587 latestCompletedXid 586
oldestRunningXid585; 1 xacts: 585 
> rmgr: Heap        tx:        587, lsn: 0/01A307F8, desc: INSERT off 3, blkref #0: rel 1663/12385/24607 blk 0
> rmgr: Btree       tx:        587, lsn: 0/01A30838, desc: INSERT_LEAF off 3, blkref #0: rel 1663/12385/24614 blk 1
> rmgr: Transaction tx:        585, lsn: 0/01A30878, desc: COMMIT 2016-02-07 12:21:28.062978 CET; subxacts: 587
>
> and decode.c doesn't care about LOCK records so far. Which means
> reorderbuffer.c doesn't know about the relevant xid. Hm.

Hi,

Any progress with this?

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: BUG #13844: Logical decoding bug with subxact + row locking

От
Andres Freund
Дата:
On 2016-03-04 12:36:32 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
>
> > Ugh, that's an annoying bug. Working on producing a fix.
> >
> > The problem is that logical decoding expects to know about toplevel xids
> > before subtransaction/savepoint xids. Which is enforced in the WAL
> > logging code, so that's ok. But the problem is that right now the WAL
> > produced looks like (abbreviated):
> > rmgr: Heap        tx:        585, lsn: 0/01A304A8, desc: LOCK off 1: xid 585 LOCK_ONLY EXCL_LOCK KEYS_UPDATED ,
blkref#0: rel 1663/12385/24607 blk 0 
> > rmgr: Heap        tx:        586, lsn: 0/01A30658, desc: INSERT off 2, blkref #0: rel 1663/12385/24607 blk 0 FPW
> > rmgr: Transaction tx:        586, lsn: 0/01A30798, desc: ABORT 2016-02-07 12:21:18.624045 CET
> > rmgr: Standby     tx:          0, lsn: 0/01A307C0, desc: RUNNING_XACTS nextXid 587 latestCompletedXid 586
oldestRunningXid585; 1 xacts: 585 
> > rmgr: Heap        tx:        587, lsn: 0/01A307F8, desc: INSERT off 3, blkref #0: rel 1663/12385/24607 blk 0
> > rmgr: Btree       tx:        587, lsn: 0/01A30838, desc: INSERT_LEAF off 3, blkref #0: rel 1663/12385/24614 blk 1
> > rmgr: Transaction tx:        585, lsn: 0/01A30878, desc: COMMIT 2016-02-07 12:21:28.062978 CET; subxacts: 587
> >
> > and decode.c doesn't care about LOCK records so far. Which means
> > reorderbuffer.c doesn't know about the relevant xid. Hm.
>
> Any progress with this?

Yes. The attached WIP patch fixes this, and some other issues. I'm
working on splitting it up, and adding some more testcases.

Andres

Вложения

Re: BUG #13844: Logical decoding bug with subxact + row locking

От
Alvaro Herrera
Дата:
Andres Freund wrote:
> On 2016-03-04 12:36:32 -0300, Alvaro Herrera wrote:

> > Any progress with this?
>
> Yes. The attached WIP patch fixes this, and some other issues. I'm
> working on splitting it up, and adding some more testcases.

Great, thanks.  Looks interesting.  I started to write a patch today
before posting and in doing so I started to realize it was a bit bigger
than what I was doing.  I'll give it a look later.

Do you think you can get this before the next set of releases?  If you
think I can help you get this done, please let me know.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: BUG #13844: Logical decoding bug with subxact + row locking

От
Andres Freund
Дата:
On 2016-03-04 15:28:40 -0300, Alvaro Herrera wrote:
> Do you think you can get this before the next set of releases?  If you
> think I can help you get this done, please let me know.

Yes, I hope to get it done in the next few hours.

Andres

Re: BUG #13844: Logical decoding bug with subxact + row locking

От
Andres Freund
Дата:
On 2016-03-04 10:30:02 -0800, Andres Freund wrote:
> On 2016-03-04 15:28:40 -0300, Alvaro Herrera wrote:
> > Do you think you can get this before the next set of releases?  If you
> > think I can help you get this done, please let me know.
>
> Yes, I hope to get it done in the next few hours.

Here's updated and separated versions of patches addressing all the open
logical decoding issues that I know of. I plan to take another look over
them tomorrow and then push.  0001 is the patch addressing this specific
issue.

Regards,

Andres

Вложения

Re: BUG #13844: Logical decoding bug with subxact + row locking

От
Andres Freund
Дата:
Hi Jared,

On 2016-01-05 03:32:49 +0000, jarred@webriots.com wrote:
> The following bug has been logged on the website:
>
> Bug reference:      13844
> Logged by:          Jarred Ward
> Email address:      jarred@webriots.com
> PostgreSQL version: 9.4.5
> Operating system:   Linux
> Description:
>
> This is a pretty serious logical decoding bug that bit us pretty hard
> in production.
>
> Run the following code:
>
>
> ----------------------------------------------------------------------------
>     CREATE TABLE foo(a SERIAL PRIMARY KEY, b TEXT);
>
>     SELECT pg_create_logical_replication_slot('regression_slot',
> 'test_decoding');
>
>     INSERT INTO foo (b) VALUES ('john');
>
>     BEGIN;
>     SELECT * FROM foo WHERE b = 'john' FOR UPDATE;
>     SAVEPOINT sp1;
>     INSERT INTO foo (b) VALUES ('jane');
>     ROLLBACK TO SAVEPOINT sp1;
>     INSERT INTO foo (b) VALUES ('doe');
>     END;
>
>     SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL);
>
> ----------------------------------------------------------------------------
>
> The last select above should return the decoded WAL, but instead it
> returns:
>
>
> ----------------------------------------------------------------------------
>     ERROR:  subxact logged without previous toplevel record
>
> ----------------------------------------------------------------------------
>
> And the process is disconnected with no way to recover the replication
> slot without dropping and recreating.
>
> If SELECT statement in the transaction is changed the following:
>
>
> ----------------------------------------------------------------------------
>     SELECT * FROM foo WHERE b = 'john' FOR UPDATE;
>
> ----------------------------------------------------------------------------
>
> to remove the row level locking the correct decoded change set is
> returned.

I pushed a fix for this; it'll be included in the next set of
maintenance releases which are coming up soon-ish. Sorry that it took so
long.

Thanks for the report!

Regards,

Andres