Re: Logical replication timeout problem

Поиск
Список
Период
Сортировка
От Fabrice Chapuis
Тема Re: Logical replication timeout problem
Дата
Msg-id CAA5-nLDskH8jrshBLXv_dhAsR2aE9ofUhcbskg=SeDzercP=JA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Logical replication timeout problem  (Fabrice Chapuis <fabrice636861@gmail.com>)
Ответы Re: Logical replication timeout problem  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
I keep your patch 0001 and I add these two calls in function WalSndUpdateProgress without modifying WalSndKeepaliveIfNecessary, it works too.
What do your think of this patch?

static void
WalSndUpdateProgress(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid)
{
        static TimestampTz sendTime = 0;
        TimestampTz now = GetCurrentTimestamp();

        ProcessRepliesIfAny();
        WalSndKeepaliveIfNecessary();



        /*
         * Track lag no more than once per WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS to
         * avoid flooding the lag tracker when we commit frequently.
         */
...
Regards

Fabrice

On Fri, Jan 21, 2022 at 2:17 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
Thanks for your patch, it also works well when executing our use case, the timeout no longer appears in the logs. Is it necessary now to refine this patch and make as few changes as possible in order for it to be released?

On Fri, Jan 21, 2022 at 10:51 AM wangw.fnst@fujitsu.com <wangw.fnst@fujitsu.com> wrote:
On Thu, Jan 20, 2022 at 9:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> It might be not reaching the actual send_keep_alive logic in
> WalSndKeepaliveIfNecessary because of below code:
> {
> ...
> /*
> * Don't send keepalive messages if timeouts are globally disabled or
> * we're doing something not partaking in timeouts.
> */
> if (wal_sender_timeout <= 0 || last_reply_timestamp <= 0) return; ..
> }
>
> I think you can add elog before the above return and before updating progress
> in the below code:
> case REORDER_BUFFER_CHANGE_INSERT:
>   if (!relentry->pubactions.pubinsert)
> + {
> + OutputPluginUpdateProgress(ctx);
>   return;
>
> This will help us to rule out one possibility.

Thanks for your advices!

According to your advices, I applied 0001,0002 and 0003 to run the test script.
When subscriber timeout, I filter publisher-side log:
$ grep "before invoking update progress" pub.log | wc -l
60373557
$ grep "return because wal_sender_timeout or last_reply_timestamp" pub.log | wc -l
0
$ grep "return because waiting_for_ping_response" pub.log | wc -l
0

Based on this result, I think function WalSndKeepaliveIfNecessary was invoked,
but function WalSndKeepalive was not invoked because (last_processing >=
ping_time) is false.
So I tried to see changes about last_processing and last_reply_timestamp
(because ping_time is based on last_reply_timestamp).

I found last_processing and last_reply_timestamp is set in function
ProcessRepliesIfAny.
last_processing is set to the time when function ProcessRepliesIfAny is
invoked.
Only when publisher receive a response from subscriber, last_reply_timestamp is
set to last_processing and the flag waiting_for_ping_response is reset to
false.

When we are during the loop to skip all the changes of transaction, IIUC, we do
not invoke function ProcessRepliesIfAny. So I think last_processing and
last_reply_timestamp will not be changed in this loop.
Therefore I think about our use case, we should modify the condition of
invoking WalSndKeepalive.(please refer to
0004-Simple-modification-of-timing.patch, and note that this is only a patch
for testing).
At the same time I modify the input of WalSndKeepalive from true to false. This
is because when input is true, waiting_for_ping_response is set to true in
WalSndKeepalive. As mentioned above, ProcessRepliesIfAny is not invoked in the
loop, so I think waiting_for_ping_response will not be reset to false and
keepalive messages will not be sent.

I tested after applying patches(0001 and 0004), I found the timeout was not
printed in subscriber-side log. And the added messages "begin load changes" and
"commit the log" were printed in publisher-side log:
$ grep -ir "begin load changes" pub.log
2022-01-21 11:17:06.934 CST [2577699] LOG:  begin load changes
$ grep -ir "commit the log" pub.log
2022-01-21 11:21:15.564 CST [2577699] LOG:  commit the log

Attach the patches and test script mentioned above, in case someone wants to
try.

Regards,
Wang wei

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: New developer papercut - Makefile references INSTALL
Следующее
От: Tom Lane
Дата:
Сообщение: Re: New developer papercut - Makefile references INSTALL