Обсуждение: Logical replication woes

Поиск
Список
Период
Сортировка

Logical replication woes

От
Heikki Linnakangas
Дата:
Spotted while testing pg_recvlogical:

1. Set up pg_recvlogical to receive:

./pg_recvlogical -S fooslot -d postgres --create
./pg_recvlogical -S fooslot -d postgres --start -f -

2. In another terminal, with psql:

create table foo (id int4);
begin; insert into foo values (4); alter table foo alter column id type text;
prepare transaction 'foo';
commit prepared 'foo';
insert into foo values (1);

3.  With current HEAD, after commit 
bb38fb0d43c8d7ff54072bfd8bd63154e536b384, this produces an assertion 
failure:

TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File: 
"reorderbuffer.c", Line: 508)

I believe that's we no longer assign another XID to the transaction that 
does the COMMIT PREPARED. Previously, an extra XID, in addition to the 
XID of the prepared transaction, was assigned for use in locking the 
global transaction entry in shared memory, but that's no longer required.

However, even with that patch reverted, it doesn't work correctly:

ERROR:  could not map filenode "base/12142/16390" to relation OID
LOG:  starting logical decoding for slot fooslot
DETAIL:  streaming transactions committing after 0/16D1670, reading WAL 
from 0/16BC470

- Heikki



Re: Logical replication woes

От
Heikki Linnakangas
Дата:
On 05/15/2014 07:57 PM, Heikki Linnakangas wrote:
> Spotted while testing pg_recvlogical:
>
> 1. Set up pg_recvlogical to receive:
>
> ./pg_recvlogical -S fooslot -d postgres --create
> ./pg_recvlogical -S fooslot -d postgres --start -f -
>
> 2. In another terminal, with psql:
>
> create table foo (id int4);
> begin;
>    insert into foo values (4);
>    alter table foo alter column id type text;
> prepare transaction 'foo';
> commit prepared 'foo';
> insert into foo values (1);
>
> 3.  With current HEAD, after commit
> bb38fb0d43c8d7ff54072bfd8bd63154e536b384, this produces an assertion
> failure:
>
> TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File:
> "reorderbuffer.c", Line: 508)
>
> I believe that's we no longer assign another XID to the transaction that
> does the COMMIT PREPARED. Previously, an extra XID, in addition to the
> XID of the prepared transaction, was assigned for use in locking the
> global transaction entry in shared memory, but that's no longer required.
>
> However, even with that patch reverted, it doesn't work correctly:
>
> ERROR:  could not map filenode "base/12142/16390" to relation OID
> LOG:  starting logical decoding for slot fooslot
> DETAIL:  streaming transactions committing after 0/16D1670, reading WAL
> from 0/16BC470

Ok, so the immediate cause was quick to find: when decoding a
commit-prepared WAL record, we have to use the XID from the record
content (patch attached). The XID in the record header is the XID of the
transaction doing the COMMIT PREPARED, which is always 0 after patch
bb38fb0d43c8d7ff54072bfd8bd63154e536b384 which causes the assertion. But
it was always wrong. After fixing, it no longer asserts or gives the
above "could not map filenode" error.

However, it still doesn't seem right. When I do the above as a regular
transaction, ie:

begin; insert into foo values (6);  alter table foo alter column id type
text; commit;

pg_recvlogical prints this:

BEGIN 708
table public.foo: INSERT: id[text]:'6'
COMMIT 708

But if I do it as a prepared transaction:

begin; insert into foo values (7);  alter table foo alter column id type
text; prepare transaction 'foo'; commit prepared 'foo';

pg_recvlogical prints nothing.

- Heikki

Вложения

Re: Logical replication woes

От
Andres Freund
Дата:
Hi,

On 2014-05-15 20:07:23 +0300, Heikki Linnakangas wrote:
> On 05/15/2014 07:57 PM, Heikki Linnakangas wrote:
> >Spotted while testing pg_recvlogical:
> >
> >1. Set up pg_recvlogical to receive:
> >
> >./pg_recvlogical -S fooslot -d postgres --create
> >./pg_recvlogical -S fooslot -d postgres --start -f -
> >
> >2. In another terminal, with psql:
> >
> >create table foo (id int4);
> >begin;
> >   insert into foo values (4);
> >   alter table foo alter column id type text;
> >prepare transaction 'foo';
> >commit prepared 'foo';
> >insert into foo values (1);
> >
> >3.  With current HEAD, after commit
> >bb38fb0d43c8d7ff54072bfd8bd63154e536b384, this produces an assertion
> >failure:
> >
> >TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File:
> >"reorderbuffer.c", Line: 508)
> >
> >I believe that's we no longer assign another XID to the transaction that
> >does the COMMIT PREPARED. Previously, an extra XID, in addition to the
> >XID of the prepared transaction, was assigned for use in locking the
> >global transaction entry in shared memory, but that's no longer required.
> >
> >However, even with that patch reverted, it doesn't work correctly:
> >
> >ERROR:  could not map filenode "base/12142/16390" to relation OID
> >LOG:  starting logical decoding for slot fooslot
> >DETAIL:  streaming transactions committing after 0/16D1670, reading WAL
> >from 0/16BC470
> 
> Ok, so the immediate cause was quick to find: when decoding a
> commit-prepared WAL record, we have to use the XID from the record content
> (patch attached). The XID in the record header is the XID of the transaction
> doing the COMMIT PREPARED, which is always 0 after patch
> bb38fb0d43c8d7ff54072bfd8bd63154e536b384 which causes the assertion. But it
> was always wrong. After fixing, it no longer asserts or gives the above
> "could not map filenode" error.
> 
> However, it still doesn't seem right. When I do the above as a regular
> transaction, ie:
> 
> begin; insert into foo values (6);  alter table foo alter column id type
> text; commit;
> 
> pg_recvlogical prints this:
> 
> BEGIN 708
> table public.foo: INSERT: id[text]:'6'
> COMMIT 708
> 
> But if I do it as a prepared transaction:
> 
> begin; insert into foo values (7);  alter table foo alter column id type
> text; prepare transaction 'foo'; commit prepared 'foo';

Looking into it. I at some point dropped the prepared xact tests and
that was obviously a mistake. Will re-add them and fix.

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Logical replication woes

От
Andres Freund
Дата:
On 2014-05-15 20:07:23 +0300, Heikki Linnakangas wrote:
> Ok, so the immediate cause was quick to find: when decoding a
> commit-prepared WAL record, we have to use the XID from the record content
> (patch attached). The XID in the record header is the XID of the transaction
> doing the COMMIT PREPARED, which is always 0 after patch
> bb38fb0d43c8d7ff54072bfd8bd63154e536b384 which causes the assertion. But it
> was always wrong. After fixing, it no longer asserts or gives the above
> "could not map filenode" error.
> 
> However, it still doesn't seem right. When I do the above as a regular
> transaction, ie:
> 
> begin; insert into foo values (6);  alter table foo alter column id type
> text; commit;
> 
> pg_recvlogical prints this:
> 
> BEGIN 708
> table public.foo: INSERT: id[text]:'6'
> COMMIT 708
> 
> But if I do it as a prepared transaction:
> 
> begin; insert into foo values (7);  alter table foo alter column id type
> text; prepare transaction 'foo'; commit prepared 'foo';

How very wierd. The reason for this is that
RecordTransactionCommitPrepared() forgets to fill a couple of fields in
xl_xact_commit. Any reason dbId/tsId aren't filled? They aren't strictly
needed because afaics they're only looked at for relcache invalidations
which prepared xacts don't support, but still?
That also explains why decoding for prepared xacts (besides the "typo"
you found) didn't work anymore - the filtering at commit was added
pretty late...

Attached patch fixes things, but I want to add some regression tests
before commit.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Logical replication woes

От
Andres Freund
Дата:
On 2014-05-15 19:46:57 +0200, Andres Freund wrote:
> Attached patch fixes things, but I want to add some regression tests
> before commit.

And now actually attached. Will send a patch with regression tests later
tonight or tomorrow. Need to eat first...

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: Logical replication woes

От
Heikki Linnakangas
Дата:
On 05/15/2014 08:46 PM, Andres Freund wrote:
> On 2014-05-15 20:07:23 +0300, Heikki Linnakangas wrote:
>> Ok, so the immediate cause was quick to find: when decoding a
>> commit-prepared WAL record, we have to use the XID from the record content
>> (patch attached). The XID in the record header is the XID of the transaction
>> doing the COMMIT PREPARED, which is always 0 after patch
>> bb38fb0d43c8d7ff54072bfd8bd63154e536b384 which causes the assertion. But it
>> was always wrong. After fixing, it no longer asserts or gives the above
>> "could not map filenode" error.
>>
>> However, it still doesn't seem right. When I do the above as a regular
>> transaction, ie:
>>
>> begin; insert into foo values (6);  alter table foo alter column id type
>> text; commit;
>>
>> pg_recvlogical prints this:
>>
>> BEGIN 708
>> table public.foo: INSERT: id[text]:'6'
>> COMMIT 708
>>
>> But if I do it as a prepared transaction:
>>
>> begin; insert into foo values (7);  alter table foo alter column id type
>> text; prepare transaction 'foo'; commit prepared 'foo';
>
> How very wierd. The reason for this is that
> RecordTransactionCommitPrepared() forgets to fill a couple of fields in
> xl_xact_commit. Any reason dbId/tsId aren't filled? They aren't strictly
> needed because afaics they're only looked at for relcache invalidations
> which prepared xacts don't support, but still?

Seems like an oversight in commit dd428c79, which added the fields to 
xl_xact_commit. They are needed. A prepared xact can indeed cause 
relcache invalidations, and removal of the init file. For example:
 begin; cluster pg_opclass using pg_opclass_oid_index ; prepare 
transaction 'foo'; commit prepared 'foo';

> Attached patch fixes things, but I want to add some regression tests
> before commit.

Looks good to me.

- Heikki



Re: Logical replication woes

От
Andres Freund
Дата:
On 2014-05-15 22:30:53 +0300, Heikki Linnakangas wrote:
> On 05/15/2014 08:46 PM, Andres Freund wrote:
> >On 2014-05-15 20:07:23 +0300, Heikki Linnakangas wrote:
> >How very wierd. The reason for this is that
> >RecordTransactionCommitPrepared() forgets to fill a couple of fields in
> >xl_xact_commit. Any reason dbId/tsId aren't filled? They aren't strictly
> >needed because afaics they're only looked at for relcache invalidations
> >which prepared xacts don't support, but still?
> 
> Seems like an oversight in commit dd428c79, which added the fields to
> xl_xact_commit. They are needed. A prepared xact can indeed cause relcache
> invalidations, and removal of the init file. For example:

Hm, so that part has to be backpatched to 9.0. Ick, I wonder if that's
been hit in production. Seems like it could cause pretty random looking
errors. It's easy enough to cause errors like:
ERROR:  could not open file "base/12753/12613": No such file or directory
I guess not many people will do relevant stuff in prepared xacts tho.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Logical replication woes

От
Andres Freund
Дата:
On 2014-05-15 22:30:53 +0300, Heikki Linnakangas wrote:
> >Attached patch fixes things, but I want to add some regression tests
> >before commit.
>
> Looks good to me.

Attached are two patches. One for the unitialized dbId/tsId issue; one
for the decoding bug. The former should be backpatched.
Should you wonder about the slight reordering of the assignments in
RecordTransactionCommitPrepared() - I've made it more similar to RecordTransactionCommit()
to make it easier to see they are equivalent.

Thanks for the testing!

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: Logical replication woes

От
Heikki Linnakangas
Дата:
On 05/15/2014 11:58 PM, Andres Freund wrote:
> On 2014-05-15 22:30:53 +0300, Heikki Linnakangas wrote:
>>> Attached patch fixes things, but I want to add some regression tests
>>> before commit.
>>
>> Looks good to me.
>
> Attached are two patches. One for the unitialized dbId/tsId issue; one
> for the decoding bug. The former should be backpatched.

Thanks, committed.

> Should you wonder about the slight reordering of the assignments in
> RecordTransactionCommitPrepared() - I've made it more similar to RecordTransactionCommit()
> to make it easier to see they are equivalent.

Makes sense.

- Heikki