Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
Дата
Msg-id CAA4eK1JmDGpr+Ouvj8x7u9RWVSLWyfsDiz2TRbrMNmdAqDpJ0g@mail.gmail.com
обсуждение исходный текст
Ответ на Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop  (Peter Smith <smithpb2250@gmail.com>)
Ответы Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop  (Peter Smith <smithpb2250@gmail.com>)
Список pgsql-bugs
On Wed, Nov 18, 2020 at 8:18 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Wed, Nov 18, 2020 at 1:29 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > On 2020-Nov-04, Amit Kapila wrote:
> >
> > > On Thu, Oct 15, 2020 at 8:20 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > > > * STREAM COMMIT bug?
> > > >   In apply_handle_stream_commit, we do CommitTransactionCommand, but
> > > >   apparently in a tablesync worker we shouldn't do it.
> > >
> > > In the tablesync stage, we don't allow streaming. See pgoutput_startup
> > > where we disable streaming for the init phase. As far as I understand,
> > > for tablesync we create the initial slot during which streaming will
> > > be disabled then we will copy the table (here logical decoding won't
> > > be used) and then allow the apply worker to get any other data which
> > > is inserted in the meantime. Now, I might be missing something here
> > > but if you can explain it a bit more or share some test to show how we
> > > can reach here via tablesync worker then we can discuss the possible
> > > solution.
> >
> > Hmm, okay, that sounds like there would be no bug then.  Maybe what we
> > need is just an assert in apply_handle_stream_commit that
> > !am_tablesync_worker(), as in the attached patch.  Passes tests.
>
> Hi.
>
> Using the same debugging technique described in a previous mail [1], I
> have tested again but this time using a SUBSCRIPTION capable of
> streaming.
>
> While paused in the debugger (to force an unusual timing situation) I
> can publish INSERTs en masse and cause streaming replication to occur.
>
> To cut a long story short, a tablesync worker CAN in fact end up
> processing (e.g. apply_dispatch) streaming messages.
> So the tablesync worker CAN get into the apply_handle_stream_commit.
> And this scenario, albeit rare, will crash.
>

Thank you for reproducing this issue. Dilip, Peter, is anyone of you
interested in writing a fix for this?

-- 
With Regards,
Amit Kapila.



В списке pgsql-bugs по дате отправления:

Предыдущее
От: Peter Smith
Дата:
Сообщение: Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
Следующее
От: Peter Smith
Дата:
Сообщение: Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop