Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop

From Dilip Kumar
Subject Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
Date
Msg-id CAFiTN-uUgKpfdbwSGnn3db3mMQAeviOhQvGWE_pC9icZF7VDKg@mail.gmail.com
In response to Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-bugs
On Wed, Nov 4, 2020 at 10:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Oct 15, 2020 at 8:20 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > On 2020-Oct-14, Petr Jelinek wrote:
> >
> > > It would be nice if the new sentences at the beginning of tablesync.c
> > > started with uppercase, but that's about as nitpicky as I can be :)
> >
> > OK, fixed that :-)  And pushed (to master only).  There's one more
> > change I added at the last minute, which is to remove the 'missing_ok'
> > parameter of GetSubscriptionRelState.  There are some other cosmetic
> > changes, but nothing of substance.
> >
> > > > If I understand correctly, the early exit in tablesync.c is not saving *a
> > > > lot* of time (we don't actually skip replaying any WAL), even if it's
> > > > saving execution of a bunch of code.  So I stand by my position that
> > > > removing the code is better because it's clearer about what is actually
> > > > happening.
> > >
> > > I don't really have any problems with the simplification you propose. The
> > > saved time is probably in order of hundreds of ms which for table sync is
> > > insignificant.
> >
> > Great, thanks.
> >
> > If you think this is done ... I have taken a few notes in the process:
> >
> > * STREAM COMMIT bug?
> >   In apply_handle_stream_commit, we do CommitTransactionCommand, but
> >   apparently in a tablesync worker we shouldn't do it.
> >
>
> In the tablesync stage, we don't allow streaming. See pgoutput_startup
> where we disable streaming for the init phase. As far as I understand,
> for tablesync we create the initial slot during which streaming will
> be disabled then we will copy the table (here logical decoding won't
> be used) and then allow the apply worker to get any other data which
> is inserted in the meantime.

I think this assumption is not completely correct. If the tablesync
worker is behind the apply worker, it starts streaming changes by
itself until it moves from the CATCHUP state to the SYNCDONE state.
So during that time, if streaming is enabled for the subscription,
the tablesync worker will also send the streaming option as on. To
disable streaming in the tablesync worker, I think we can do
something like this.

diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 3a5b733ee3..21ac29f703 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -3084,7 +3084,7 @@ ApplyWorkerMain(Datum main_arg)
                 LOGICALREP_PROTO_STREAM_VERSION_NUM : LOGICALREP_PROTO_VERSION_NUM;
         options.proto.logical.publication_names = MySubscription->publications;
         options.proto.logical.binary = MySubscription->binary;
-       options.proto.logical.streaming = MySubscription->stream;
+       options.proto.logical.streaming = am_tablesync_worker() ? false : MySubscription->stream;
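
To make the intended behaviour easier to see in isolation, here is a
minimal standalone sketch of the same decision. It is not the actual
worker.c code: SketchSubscription, SketchWorkerKind and
sketch_streaming_option() are hypothetical names made up for this
illustration, standing in for MySubscription, am_tablesync_worker()
and the assignment to options.proto.logical.streaming.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the subscription's streaming setting. */
typedef struct SketchSubscription
{
    bool stream;    /* true if streaming was requested for the subscription */
} SketchSubscription;

/* Hypothetical stand-in for the result of am_tablesync_worker(). */
typedef enum SketchWorkerKind
{
    SKETCH_APPLY_WORKER,
    SKETCH_TABLESYNC_WORKER
} SketchWorkerKind;

/*
 * Decide which value of the streaming option the worker should send.
 * A tablesync worker never asks for streaming, even while it is catching
 * up; an apply worker follows the subscription's setting.
 */
bool
sketch_streaming_option(SketchWorkerKind kind, const SketchSubscription *sub)
{
    if (kind == SKETCH_TABLESYNC_WORKER)
        return false;

    return sub->stream;
}
```

With this, a tablesync worker gets false regardless of the
subscription setting, and only the apply worker can get true.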

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


