Re: Logical replication prefetch

From: Amit Kapila
Subject: Re: Logical replication prefetch
Date:
Msg-id: CAA4eK1+GKn8xwF4mt3WGCvhxOzzdtcHMPjJKcv8HCvDFRQ7mNA@mail.gmail.com
In reply to: Re: Logical replication prefetch (Konstantin Knizhnik <knizhnik@garret.ru>)
Responses: Re: Logical replication prefetch
List: pgsql-hackers
On Fri, Jul 11, 2025 at 7:49 PM Konstantin Knizhnik <knizhnik@garret.ru> wrote:
>
> On 08/07/2025 2:51 pm, Amit Kapila wrote:
> > On Tue, Jul 8, 2025 at 12:06 PM Konstantin Knizhnik <knizhnik@garret.ru> wrote:
> >> There is a well-known Postgres problem: a logical replication
> >> subscriber cannot catch up with the publisher because LR changes are
> >> applied by a single worker, while on the publisher the changes are made
> >> by multiple concurrent backends. The problem is not specific to logical
> >> replication: the physical replication stream is also handled by a single
> >> walreceiver. But for physical replication Postgres now implements
> >> prefetch: by looking at the blocks referenced in WAL records it is quite
> >> easy to predict which pages will be required for redo and prefetch them.
> >> With logical replication the situation is much more complicated.
> >>
> >> My first idea was to implement parallel apply of transactions. But to
> >> do that we need to track dependencies between transactions. Right now
> >> Postgres can apply transactions in parallel, but only if they are
> >> streamed (which is done only for large transactions), and it serializes
> >> them by commits. It is possible to force parallel apply of short
> >> transactions using `debug_logical_replication_streaming`, but then
> >> performance is ~2x slower than in the case of sequential apply by a
> >> single worker.
> >>
> > What is the reason for such a large slowdown? Is it because the amount
> > of network transfer has increased without providing any significant
> > advantage because of the serialization of commits?
>
>
> It is not directly related to the subject, but I do not understand this code:
>
> ```
>      /*
>       * Stop the worker if there are enough workers in the pool.
>       *
>       * XXX Additionally, we also stop the worker if the leader apply worker
>       * serialized part of the transaction data due to a send timeout. This is
>       * because the message could be partially written to the queue and there
>       * is no way to clean the queue other than resending the message until it
>       * succeeds. Instead of trying to send the data which anyway would have
>       * been serialized and then letting the parallel apply worker deal with
>       * the spurious message, we stop the worker.
>       */
>      if (winfo->serialize_changes ||
>          list_length(ParallelApplyWorkerPool) >
>          (max_parallel_apply_workers_per_subscription / 2))
>      {
>          logicalrep_pa_worker_stop(winfo);
>          pa_free_worker_info(winfo);
>
>          return;
>      }
> ```
>
> It stops the worker if the number of workers in the pool is more than
> half of `max_parallel_apply_workers_per_subscription`.
> What I see is that `pa_launch_parallel_worker` spawns new workers, and
> after completion of a transaction the worker is immediately terminated.
> Actually, this leads to an awful slowdown of the apply process.
>

I didn't understand your scenario. pa_launch_parallel_worker() should
spawn a new worker only if all the workers in the pool are busy, and
then it will free the worker if the pool already has enough workers.
So, do you mean to say that the workers in the pool are always busy in
your workload, which leads to the spawn/exit of new workers? Can you
please explain your scenario in some more detail?

--
With Regards,
Amit Kapila.


