Re: High CPU consumption in cascade replication with large number of walsenders

From	Alexander Korotkov
Subject	Re: High CPU consumption in cascade replication with large number of walsenders
Date	October 27, 00:03:23
Msg-id	CAPpHfdsmXcAy3-LYN746oo_es6aoG0_sg95KYei3HcoEemzzgQ@mail.gmail.com
In reply to	Re: High CPU consumption in cascade replication with large number of walsenders (Alexey Makhmutov <a.makhmutov@postgrespro.ru>)
List	pgsql-hackers

Hi, Alexey!

Thank you for your comments and the patch revision. I have some further questions for you.

On Tue, Sep 16, 2025 at 6:20 PM Alexey Makhmutov <a.makhmutov@postgrespro.ru> wrote:
> Thank you very much for looking at the patch and providing valuable
> feedback!
>
> > This approach makes sense to me. Do you think it might have corner
> > cases? I suggest the test scenario might include some delay between
> > "UPDATE" queries. Then we can see how changing of this delay interacts
> > with cascade_replication_batch_delay.
>
> The effect of 'cascade_replication_batch_delay' setting could be more
> easily observed by manually changing a single row in the primary
> database ('A' instance in the test) and then observing the delay before
> the change becomes visible on the 'C' instance. Something like the following:
> On C instance:
> select c0 from test_repli_test_t1 where id=0 \watch 1
> On A instance, first set the initial value:
> update test_repli_test_t1 set c0=0 where id=0;
> ... and then update the row and wait for it to become visible on C instance:
> update test_repli_test_t1 set c0=c0+1 where id=0;
>
> In my tests with enabled batching and without enabling delay limit (i.e.
> by setting the 'cascade_replication_batch_delay' to 0), the change
> became visible in about 5-6 seconds (as walsender on B instance seems to
> wake up by itself anyway). With 'cascade_replication_batch_delay' set to
> 500 (ms) the value became visible almost immediately.
>
> > This comment talks about logical walsenders, but the same will be
> > applied to physical walsenders, right?
>
> Yes, this item probably needs some clarification. In this code path we
> are dealing with logical walsenders, as physical walsenders are notified
> in XLogWalRcvFlush. However, when TLI changes, this code will notify
> both physical and logical walsenders. So, I've changed the comment now
> to describe this behavior more clearly.
>
> Another question is whether we really need to notify physical walsenders
> at this point. This was the logic of the original code, so I kept it
> when adding batching support. However, it seems that a physical sender
> should not be very interested in knowing that logical decoding has
> discovered a change in timeline ID, as it should be either already
> notified by walreceiver or discover it by itself in the stored WAL data
> if recovery was invoked at startup. So, maybe the better approach here
> is just to keep notifications for logical walsenders only.

Could you please also comment on the change from checking AllowCascadeReplication() to checking StandbyWithCascadeReplication()? Do you think this is beneficial and saves us from sending notifications when they are useless?

Also, could you comment on this condition?

if (cascadeReplicationMaxBatchSize <= 1 && appliedRecords == 0)

Does this mean that if batching is disabled in the config and the change is then applied via SIGHUP, we will still wait for the current batch to complete? Would it be better to stop batching immediately?

Also, this patch lacks documentation. I would especially like to see the combinations of GUCs described (cascade_replication_batch_size enabled but cascade_replication_batch_delay disabled, and vice versa).
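
To make the question about GUC combinations concrete, here is a rough sketch of one possible set of semantics. It is not taken from the patch: the BatchConfig struct, the should_notify_walsenders() function, and the reading of "size <= 1 means batching is off" and "delay 0 means no delay limit" are assumptions for illustration only. In this variant, disabling batching flushes any pending records right away instead of waiting for the current batch to finish.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two GUCs discussed in this thread. */
typedef struct BatchConfig
{
	int		max_batch_size;	/* cascade_replication_batch_size;
							 * assumed: <= 1 means batching is off */
	int		max_delay_ms;	/* cascade_replication_batch_delay;
							 * assumed: 0 means no delay limit */
} BatchConfig;

/*
 * Decide whether walsenders should be woken up now.
 *
 * applied_records: records replayed since the last notification
 * elapsed_ms:      time since the first not-yet-announced record
 */
static bool
should_notify_walsenders(const BatchConfig *cfg,
						 int applied_records, int64_t elapsed_ms)
{
	assert(applied_records >= 0 && elapsed_ms >= 0);

	/* Nothing new was replayed, so there is nothing to announce. */
	if (applied_records == 0)
		return false;

	/*
	 * Batching is off: notify for every record.  This also flushes a
	 * batch that was in progress when the setting was changed.
	 */
	if (cfg->max_batch_size <= 1)
		return true;

	/* Size limit reached. */
	if (applied_records >= cfg->max_batch_size)
		return true;

	/* Delay limit reached (only if a limit is configured). */
	if (cfg->max_delay_ms > 0 && elapsed_ms >= cfg->max_delay_ms)
		return true;

	return false;
}
```

Under these assumed semantics, a delay setting has no effect when batching by size is off, and a size setting without a delay limit leaves a partial batch waiting until the walsender wakes up for some other reason. Whether the patch behaves this way is exactly what the documentation should spell out.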

------
Regards,
Alexander Korotkov
Supabase
