Re: Parallel Apply
| From | Amit Kapila |
|---|---|
| Subject | Re: Parallel Apply |
| Date | |
| Msg-id | CAA4eK1KbSOcU2FER=F_nd0ghSeHdGeT=4U4n=dJTRPyCM7ezBA@mail.gmail.com |
| In response to | Re: Parallel Apply (Dilip Kumar <dilipbalaut@gmail.com>) |
| List | pgsql-hackers |
On Mon, Nov 24, 2025 at 9:56 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Sep 16, 2025 at 3:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Sat, Sep 6, 2025 at 10:33 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > I suspect this might not be the most performant default strategy and
> > > could frequently cause a performance dip. In general, we utilize
> > > parallel apply workers, considering that the time taken to apply
> > > changes is much costlier than reading and sending messages to workers.
> > >
> > > The current strategy involves the leader picking one transaction for
> > > itself after distributing transactions to all apply workers, assuming
> > > the apply task will take some time to complete. When the leader takes
> > > on an apply task, it becomes a bottleneck for complete parallelism.
> > > This is because it needs to finish applying previous messages before
> > > accepting any new ones. Consequently, even as workers slowly become
> > > free, they won't receive new tasks because the leader is busy applying
> > > its own transaction.
> > >
> > > This type of strategy might be suitable in scenarios where users
> > > cannot supply more workers due to resource limitations. However, on
> > > high-end machines, it is more efficient to let the leader act solely
> > > as a message transmitter and allow the apply workers to handle all
> > > apply tasks. This could be a configurable parameter, determining
> > > whether the leader also participates in applying changes. I believe
> > > this should not be the default strategy; in fact, the default should
> > > be for the leader to act purely as a transmitter.
> > >
> >
> > I see your point, but consider a scenario where we have two pa workers:
> > pa-1 is waiting for some backend on a unique-key insertion, and pa-2 is
> > waiting for pa-1 to complete its transaction because pa-2 has to perform
> > some change that depends on pa-1's transaction. So, the leader can
> > either simply wait for a third transaction to be distributed or just
> > apply it and process another change. If we follow the former, it is
> > quite possible that the sender fills the network queue while trying to
> > send data and simply times out.
>
> Sorry I took a while to come back to this. I understand your point and
> agree that it's a valid concern. However, I question whether limiting
> this to a single choice is the optimal solution. The core issue
> involves two distinct roles: work distribution and applying changes.
> Work distribution is exclusively handled by the leader, while any
> worker can apply the changes. This is essentially a single-producer,
> multiple-consumer problem.
>
> While it might seem efficient for the producer (leader) to assist
> consumers (workers) when there's a limited number of consumers, I
> believe this isn't the best design. In such scenarios, it's generally
> better to allow the producer to focus solely on its primary task,
> unless there's a severe shortage of processing power.
>
> If computing resources are constrained, allowing producers to join
> consumers in applying changes is acceptable. However, if sufficient
> processing power is available, the producer should ideally be left to
> its own duties. The question then becomes: how do we make this
> decision?
>
> My suggestion is to make this a configurable parameter. Users could
> then decide whether the leader participates in applying changes.
>

We could do this, but another possibility is that the leader distributes
up to some threshold of pending transactions (say 5 or 10) to each of
the workers, and performs the task by itself only if none of the workers
is still available. I think this will avoid the system performing poorly
when the existing workers are waiting on each other, or on a backend, to
finish the current transaction. Having said that, I think this can be
done as a separate optimization patch as well.

--
With Regards,
Amit Kapila.
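[Editor's note: the threshold-based fallback described above can be sketched as a minimal, single-threaded simulation. All names and the threshold value are illustrative, not part of any PostgreSQL code: the leader assigns each transaction to the least-loaded worker queue, and applies it locally only once every worker's backlog has reached the threshold.]

```python
# Hypothetical sketch of the proposed dispatch policy; PostgreSQL's actual
# parallel apply machinery uses shared-memory queues, not Python lists.
PER_WORKER_PENDING_THRESHOLD = 5  # "say 5 or 10" in the discussion above


def dispatch(txn, worker_queues, apply_locally):
    """Assign txn to the least-loaded worker queue; fall back to applying
    it in the leader when every worker already has a full backlog."""
    q = min(worker_queues, key=len)
    if len(q) < PER_WORKER_PENDING_THRESHOLD:
        q.append(txn)  # hand the transaction to a parallel apply worker
        return "worker"
    apply_locally(txn)  # all workers saturated: leader applies it itself
    return "leader"


# Demo: two workers, no draining. The first 10 transactions fill the two
# queues (5 each); only then does the leader start applying transactions.
queues = [[], []]
leader_applied = []
outcomes = [dispatch(i, queues, leader_applied.append) for i in range(12)]
```

With real workers draining their queues concurrently, the leader would rarely hit the fallback path, which is the point: it stays a pure transmitter unless the workers are stuck waiting on each other or on a backend.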