RE: [HACKERS] logical decoding of two-phase transactions

Поиск
Список
Период
Сортировка
От tanghy.fnst@fujitsu.com
Тема RE: [HACKERS] logical decoding of two-phase transactions
Дата
Msg-id OS0PR01MB611332752AFF6ADF580AB849FB259@OS0PR01MB6113.jpnprd01.prod.outlook.com
обсуждение исходный текст
Ответ на Re: [HACKERS] logical decoding of two-phase transactions  (Peter Smith <smithpb2250@gmail.com>)
Ответы Re: [HACKERS] logical decoding of two-phase transactions  (Ajin Cherian <itsajin@gmail.com>)
Список pgsql-hackers
> > 13.
> > @@ -507,7 +558,16 @@ CreateSubscription(CreateSubscriptionStmt *stmt,
> > bool isTopLevel)
> >   {
> >   Assert(slotname);
> >
> > - walrcv_create_slot(wrconn, slotname, false,
> > + /*
> > + * Even if two_phase is set, don't create the slot with
> > + * two-phase enabled. Will enable it once all the tables are
> > + * synced and ready. This avoids race-conditions like prepared
> > + * transactions being skipped due to changes not being applied
> > + * due to checks in should_apply_changes_for_rel() when
> > + * tablesync for the corresponding tables are in progress. See
> > + * comments atop worker.c.
> > + */
> > + walrcv_create_slot(wrconn, slotname, false, false,
> >
> > Can't we enable two_phase if copy_data is false? Because in that case,
> > all relations will be in a READY state. If we do that then we should
> > also set two_phase state as 'enabled' during createsubscription. I
> > think we need to be careful to check that connect option is given and
> > copy_data is false before setting such a state. Now, I guess we may
> > not be able to optimize this to not set 'enabled' state when the
> > subscription has no rels.
> >
> 
> Fixed in v77-0001

I noticed this modification in v77-0001 and executed "CREATE SUBSCRIPTION ... WITH (two_phase = on, copy_data =
false)",but it crashed.
 
-------------
postgres=# CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres' PUBLICATION pub WITH(two_phase = on, copy_data =
false);
WARNING:  relcache reference leak: relation "pg_subscription" not closed
WARNING:  snapshot 0x34278d0 still active
NOTICE:  created replication slot "sub" on publisher
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!?>
-------------

There are two warnings and a segmentation fault in subscriber log:
-------------
2021-05-24 15:08:32.435 CST [2848572] WARNING:  relcache reference leak: relation "pg_subscription" not closed
2021-05-24 15:08:32.435 CST [2848572] WARNING:  snapshot 0x32ce8b0 still active
2021-05-24 15:08:33.012 CST [2848555] LOG:  server process (PID 2848572) was terminated by signal 11: Segmentation
fault
2021-05-24 15:08:33.012 CST [2848555] DETAIL:  Failed process was running: CREATE SUBSCRIPTION sub CONNECTION
'dbname=postgres'PUBLICATION pub WITH(two_phase = on, copy_data = false);
 
-------------

The backtrace about segmentation fault is attached. It happened in table_close function, we got it because
"CurrentResourceOwner"was NULL.
 

I think it was related with the first warning, which reported "relcache reference leak". The backtrace information is
attached,too. When updating two-phase state in CreateSubscription function, it released resource owner and set
"CurrentResourceOwner"as NULL in CommitTransaction function.
 

The second warning about "snapshot still active" was also happened in CommitTransaction function. It called
AtEOXact_Snapshotfunction, checked leftover snapshots and reported the warning.
 
I debugged and found the snapshot was added in function PortalRunUtility by calling PushActiveSnapshot function, the
addressof "ActiveSnapshot" at this time was same as the address in warning.
 

In summary, when creating subscription with two_phase = on and copy_data = false, it calls UpdateTwoPhaseState function
inCreateSubscription function to set two_phase state as 'enabled', and it checked and released relcache and snapshot
tooearly so the NG happened. I think some change should be made to avoid it. Thought?
 

FYI
 I also tested the new released V78* at [1], the above NG still exists.
[1] https://www.postgresql.org/message-id/CAFPTHDab56twVmC%2B0a%3DRNcRw4KuyFdqzW0JAcvJdS63n_fRnOQ%40mail.gmail.com

Regards
Tang


Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Guillaume Lelarge
Дата:
Сообщение: Re: Issue on catalogs.sgml
Следующее
От: Dilip Kumar
Дата:
Сообщение: Re: Assertion failure while streaming toasted data