Re: logical decoding and replication of sequences, take 2

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: logical decoding and replication of sequences, take 2
Дата
Msg-id CAA4eK1KqFtjUnLkY7-qGDnGqBYSdpnCw3Xo4HjRw=CvqkemBfg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: logical decoding and replication of sequences, take 2  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Ответы Re: logical decoding and replication of sequences, take 2  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Список pgsql-hackers
On Mon, Jul 24, 2023 at 4:22 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 7/24/23 12:40, Amit Kapila wrote:
> > On Wed, Jul 5, 2023 at 8:21 PM Ashutosh Bapat
> > <ashutosh.bapat.oss@gmail.com> wrote:
> >>
> >> 0005, 0006 and 0007 are all related to the initial sequence sync. [3]
> >> resulted in 0007 and I think we need it. That leaves 0005 and 0006 to
> >> be reviewed in this response.
> >>
> >> I followed the discussion starting [1] till [2]. The second one
> >> mentions the interlock mechanism which has been implemented in 0005
> >> and 0006. While I don't have an objection to allowing LOCKing a
> >> sequence using the LOCK command, I am not sure whether it will
> >> actually work or is even needed.
> >>
> >> The problem described in [1] seems to be the same as the problem
> >> described in [2]. In both cases we see the sequence moving backwards
> >> during CATCHUP. At the end of catchup the sequence is in the right
> >> state in both the cases.
> >>
> >
> > I think we could see backward sequence value even after the catchup
> > phase (after the sync worker is exited and or the state of rel is
> > marked as 'ready' in pg_subscription_rel). The point is that there is
> > no guarantee that we will process all the pending WAL before
> > considering the sequence state is 'SYNCDONE' and or 'READY'. For
> > example, after copy_sequence, I see values like:
> >
> > postgres=# select * from s;
> >  last_value | log_cnt | is_called
> > ------------+---------+-----------
> >         165 |       0 | t
> > (1 row)
> > postgres=# select nextval('s');
> >  nextval
> > ---------
> >      166
> > (1 row)
> > postgres=# select nextval('s');
> >  nextval
> > ---------
> >      167
> > (1 row)
> > postgres=# select currval('s');
> >  currval
> > ---------
> >      167
> > (1 row)
> >
> > Then during the catchup phase:
> > postgres=# select * from s;
> >  last_value | log_cnt | is_called
> > ------------+---------+-----------
> >          33 |       0 | t
> > (1 row)
> > postgres=# select * from s;
> >  last_value | log_cnt | is_called
> > ------------+---------+-----------
> >          66 |       0 | t
> > (1 row)
> >
> > postgres=# select * from pg_subscription_rel;
> >  srsubid | srrelid | srsubstate | srsublsn
> > ---------+---------+------------+-----------
> >    16394 |   16390 | r          | 0/16374E8
> >    16394 |   16393 | s          | 0/1637700
> > (2 rows)
> >
> > postgres=# select * from pg_subscription_rel;
> >  srsubid | srrelid | srsubstate | srsublsn
> > ---------+---------+------------+-----------
> >    16394 |   16390 | r          | 0/16374E8
> >    16394 |   16393 | r          | 0/1637700
> > (2 rows)
> >
> > Here Sequence relid id 16393. You can see sequence state is marked as ready.
> >
> > postgres=# select * from s;
> >  last_value | log_cnt | is_called
> > ------------+---------+-----------
> >          66 |       0 | t
> > (1 row)
> >
> > Even after that, see below the value of the sequence is still not
> > caught up. Later, when the apply worker processes all the WAL, the
> > sequence state will be caught up.
> >
> > postgres=# select * from s;
> >  last_value | log_cnt | is_called
> > ------------+---------+-----------
> >         165 |       0 | t
> > (1 row)
> >
> > So, there will be a window where the sequence won't be caught up for a
> > certain period of time and any usage of it (even after the sync is
> > finished) during that time could result in inconsistent behaviour.
> >
>
> I'm rather confused about which node these queries are executed on.
> Presumably some of it is on publisher, some on subscriber?
>

These are all on the subscriber.

> Can you create a reproducer (TAP test demonstrating this?) I guess it
> might require adding some sleeps to hit the right timing ...
>

I have used the debugger to reproduce this as it needs quite some
coordination. I just wanted to see if the sequence can go backward and
didn't catch up completely before the sequence state is marked
'ready'. On the publisher side, I created a publication with a table
and a sequence. Then did the following steps:
SELECT nextval('s') FROM generate_series(1,50);
insert into t1 values(1);
SELECT nextval('s') FROM generate_series(51,150);

Then on the subscriber side with some debugging aid, I could find the
values in the sequence shown in the previous email. Sorry, I haven't
recorded each and every step but, if you think it helps, I can again
try to reproduce it and share the steps.

--
With Regards,
Amit Kapila.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Aleksander Alekseev
Дата:
Сообщение: Re: POC: Extension for adding distributed tracing - pg_tracing
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: logicalrep_message_type throws an error