Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop

From Dilip Kumar
Subject Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
Date
Msg-id CAFiTN-viZixPtZx7X+PLuvZ0rf9djm18OhR74+ZQVx69oJWHew@mail.gmail.com
In response to Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-bugs
On Fri, Nov 20, 2020 at 10:21 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Nov 18, 2020 at 2:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Nov 18, 2020 at 11:19 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > On Wed, Nov 18, 2020 at 3:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > To cut a long story short, a tablesync worker CAN in fact end up
> > > > > processing (e.g. apply_dispatch) streaming messages.
> > > > > So the tablesync worker CAN get into the apply_handle_stream_commit.
> > > > > And this scenario, albeit rare, will crash.
> > > > >
> > > >
> > > > Thank you for reproducing this issue. Dilip, Peter, is anyone of you
> > > > interested in writing a fix for this?
> > >
> > > Hi Amit.
> > >
> > > FYI - Sorry, I am away/offline for the next 5 days.
> > >
> > > However, if this bug still remains unfixed after next Tuesday then I
> > > can look at it then.
> > >
> >
> > Fair enough. Let's see if Dilip or I can get a chance to look into
> > this before that.
> >
> > > ---
> > >
> > > IIUC there are 2 options:
> > > a) Disallow streaming for the tablesync worker.
> > > b) Make streaming work for the tablesync worker.
> > >
> > > I prefer option (a) not only because of the KISS principle, but also
> > > because this is how the tablesync worker was previously thought to
> > > behave anyway. I expect this fix may be like the code that Dilip
> > > already posted [1]
> > > [1] https://www.postgresql.org/message-id/CAFiTN-uUgKpfdbwSGnn3db3mMQAeviOhQvGWE_pC9icZF7VDKg%40mail.gmail.com
> > >
> > > OTOH, option (b) fix may or may not be possible (I don't know), but I
> > > have doubts that it is worthwhile to consider making a special fix for
> > > a scenario which so far has never been reproduced outside of the
> > > debugger.
> > >
> >
> > I would prefer option (b) unless the fix is not possible due to design
> > constraints. I don't think it is a good idea to make tablesync workers
> > behave differently unless we have a reason for doing so.
> >
>
> Okay, I will analyze this and try to submit my finding today.

I have done my analysis. Basically, the table sync worker applies
all the changes (for multiple transactions from upstream) under a
single transaction (on the downstream).  For normal changes, we can
just avoid committing in apply_handle_commit and everything is fine.
For streaming changes, however, we also have a transaction at the
stream level, which we need in order to manage the BufFiles that
store the streamed changes.  So if we want to support streaming
transactions in the table sync worker, we need to avoid committing
the transaction in apply_handle_stream_commit as well as in
apply_handle_stream_stop for the table sync worker.
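
To illustrate the idea, here is a minimal standalone sketch in C. It is
a toy model, not the actual worker.c code: ToyWorker and all toy_*
functions are hypothetical names, and the "transaction" is only a flag
and a counter. It shows the table sync worker skipping the local commit
in every handler, including the stream-level ones, so that all upstream
transactions stay under one downstream transaction.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not PostgreSQL's worker.c. */
typedef struct ToyWorker
{
    bool is_tablesync;     /* true for a table sync worker */
    bool in_local_xact;    /* is a downstream transaction open? */
    int  local_commits;    /* downstream commits performed so far */
    int  applied_changes;  /* changes applied so far */
} ToyWorker;

/* Open a downstream transaction if none is open yet. */
static void toy_begin_if_needed(ToyWorker *w)
{
    if (!w->in_local_xact)
        w->in_local_xact = true;
}

/* Apply one upstream change inside the open transaction. */
static void toy_apply_change(ToyWorker *w)
{
    toy_begin_if_needed(w);
    w->applied_changes++;
}

/* Shared end-of-unit logic: commit unless this is a table sync worker. */
static void toy_end_unit(ToyWorker *w)
{
    assert(w->in_local_xact);
    if (w->is_tablesync)
        return;             /* keep the single sync transaction open */
    w->in_local_xact = false;
    w->local_commits++;
}

/* Normal (non-streamed) upstream commit. */
static void toy_handle_commit(ToyWorker *w)
{
    toy_begin_if_needed(w);
    toy_end_unit(w);
}

/* End of one streamed chunk; the stream-level transaction ends here. */
static void toy_handle_stream_stop(ToyWorker *w)
{
    toy_begin_if_needed(w);
    toy_end_unit(w);
}

/* Commit of a streamed upstream transaction. */
static void toy_handle_stream_commit(ToyWorker *w)
{
    toy_begin_if_needed(w);
    toy_end_unit(w);
}

/* The table sync worker commits exactly once, when the sync is done. */
static void toy_finish_sync(ToyWorker *w)
{
    if (w->in_local_xact)
    {
        w->in_local_xact = false;
        w->local_commits++;
    }
}
```

In this model an apply worker commits at stream stop and again at
stream commit, while a table sync worker performs a single commit, when
toy_finish_sync is called.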

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


