RE: Logical replication timeout problem

Поиск
Список
Период
Сортировка
От wangw.fnst@fujitsu.com
Тема RE: Logical replication timeout problem
Дата
Msg-id OS3PR01MB62751097A6FA5F3187A0EEFA9ED29@OS3PR01MB6275.jpnprd01.prod.outlook.com
обсуждение исходный текст
Ответ на Re: Logical replication timeout problem  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы RE: Logical replication timeout problem  ("shiy.fnst@fujitsu.com" <shiy.fnst@fujitsu.com>)
Список pgsql-hackers
On Fri, Jan 27, 2023 at 19:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Jan 27, 2023 at 5:18 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > On Wednesday, January 25, 2023 7:26 PM Amit Kapila
> <amit.kapila16@gmail.com>
> > >
> > > On Tue, Jan 24, 2023 at 8:15 AM wangw.fnst@fujitsu.com
> > > <wangw.fnst@fujitsu.com> wrote:
> > > >
> > > > Attach the new patch.
> > > >
> > >
> > > I think the patch missed to handle the case of non-transactional messages
> which
> > > was previously getting handled. I have tried to address that in the attached.
> Is
> > > there a reason that shouldn't be handled?
> >
> > Thanks for updating the patch!
> >
> > I thought about the non-transactional message. I think it seems fine if we
> > don’t handle it for timeout because such message is decoded via:
> >
> > WalSndLoop
> > -XLogSendLogical
> > --LogicalDecodingProcessRecord
> > ---logicalmsg_decode
> > ----ReorderBufferQueueMessage
> > -----rb->message() -- //maybe send the message or do nothing here.
> >
> > After invoking rb->message(), we will directly return to the main
> > loop(WalSndLoop) where we will get a chance to call
> > WalSndKeepaliveIfNecessary() to avoid the timeout.
> >
> 
> Valid point. But this means the previous handling of non-transactional
> messages was also redundant.

Thanks for the analysis, I think it makes sense. So I removed the handling of
non-transactional messages.

> > This is a bit different from transactional changes, because for transactional
> changes, we
> > will buffer them and then send every buffered change one by one(via
> > ReorderBufferProcessTXN) without going back to the WalSndLoop, so we
> don't get
> > a chance to send keepalive message if necessary, which is more likely to cause
> the
> > timeout problem.
> >
> > I will also test the non-transactional message for timeout in case I missed
> something.
> >
> 
> Okay, thanks. Please see if we can test a mix of transactional and
> non-transactional messages as well.

I tested a mix transaction of transactional and non-transactional messages on
the current HEAD and reproduced the timeout problem. I think this result is OK.
Because when decoding a transaction, non-transactional changes are processed
directly and the function WalSndKeepaliveIfNecessary is called, while
transactional changes are cached and processed after decoding. After decoding,
only transactional changes will be processed (in the function
ReorderBufferProcessTXN), so the timeout problem will still be reproduced.

After applying the v8 patch, the test mentioned above didn't reproduce the
timeout problem (Attach this test script 'test_with_nontransactional.sh').

Attach the new patch.

Regards,
Wang Wei

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: run pgindent on a regular basis / scripted manner
Следующее
От: Dean Rasheed
Дата:
Сообщение: Bug #17759: GENERATED columns not computed during MERGE