Re: Logical replication prefetch
From: Amit Kapila
Subject: Re: Logical replication prefetch
Msg-id: CAA4eK1JuKQX397YNVWDgig6B_QVeb8eOn4UMruKewx8=2XUv4w@mail.gmail.com
In reply to: Re: Logical replication prefetch (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
On Mon, Jul 14, 2025 at 3:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sun, Jul 13, 2025 at 6:06 PM Konstantin Knizhnik <knizhnik@garret.ru> wrote:
> >
> > On 13/07/2025 1:28 pm, Amit Kapila wrote:
> > > On Tue, Jul 8, 2025 at 12:06 PM Konstantin Knizhnik <knizhnik@garret.ru> wrote:
> > >> There is a well-known Postgres problem: a logical replication
> > >> subscriber cannot catch up with the publisher simply because LR
> > >> changes are applied by a single worker, while at the publisher the
> > >> changes are made by multiple concurrent backends.
> > >>
> > > BTW, do you know how users deal with this lag? For example, one can
> > > imagine creating multiple pub-sub pairs for different sets of tables
> > > so that the workload on the subscriber could also be shared by
> > > multiple apply workers. I can also think of splitting the workload
> > > among multiple pub-sub pairs by using row filters.
> > >
> >
> > Yes, I have seen users start several subscriptions/publications to
> > receive and apply changes in parallel.
> > But it cannot be considered a universal solution:
> > 1. There are not always multiple tables (or partitions of one table)
> > available, so it is not always possible to split them between
> > multiple publications.
> > 2. It violates transactional behavior (consistency): if transactions
> > update several tables included in different publications, then, by
> > applying these changes independently, we can observe at the replica a
> > state where one table is updated and another is not. The same is true
> > for row filters.
> > 3. Each walsender has to scan the WAL, so with N subscriptions we
> > have to read and decode the WAL N times.
> >
>
> I agree that it is not a solution that can be applied in all cases,
> and neither do I want to say that we shouldn't pursue the idea of
> prefetch or parallel apply to improve the speed of apply. It was just
> to know/discuss how users try to work around the lag in cases where
> the lag is large.
>

If you are interested, I would like to know your opinion on a somewhat
related topic, which has triggered my interest in your patch. We are
working on an update_delete conflict detection patch. The exact problem
was explained in the initial email [1]. The basic idea for resolving the
problem is that on the subscriber we maintain a slot that helps retain
dead tuples for a certain period of time, until the concurrent
transactions have been applied on the subscriber. You can read the
commit message of the first patch in email [2].

Now, the problem we are facing is that, because of replication lag in a
scenario similar to the one we are discussing here (many clients on the
publisher and a single apply worker on the subscriber), the slot takes
more time to advance. This leads to retention of dead tuples, which
further slows down the apply worker, especially for update workloads.
Apart from apply, the other transactions running on the system (say, a
pgbench-style workload on the subscriber) also become slower because of
the retention of dead tuples. For workloads where there is no lag, such
as when one splits the workload using the options mentioned above
(splitting it among pub-sub pairs in some way), or when the workload
doesn't consist of a large number of clients operating on the publisher
and subscriber at the same time, we don't observe any major slowdown on
the subscriber.
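A minimal sketch of the split-workload workaround referred to above,
assuming PostgreSQL 15 or later for row filters; the table names, the
connection string, and the id-based split are illustrative assumptions,
and the caveats from points 2 and 3 above (loss of cross-publication
transactional consistency, N-fold WAL decoding) still apply:

    -- On the publisher: one publication per table, or split a single
    -- table by a row filter (id is assumed to be the primary key, so
    -- it is usable in filters for UPDATE/DELETE publications).
    CREATE PUBLICATION pub_accounts FOR TABLE accounts;
    CREATE PUBLICATION pub_orders_even FOR TABLE orders WHERE (id % 2 = 0);
    CREATE PUBLICATION pub_orders_odd  FOR TABLE orders WHERE (id % 2 = 1);

    -- On the subscriber: one subscription per publication, so that
    -- each publication gets its own apply worker.
    CREATE SUBSCRIPTION sub_accounts
        CONNECTION 'host=pub dbname=postgres' PUBLICATION pub_accounts;
    CREATE SUBSCRIPTION sub_orders_even
        CONNECTION 'host=pub dbname=postgres' PUBLICATION pub_orders_even;
    CREATE SUBSCRIPTION sub_orders_odd
        CONNECTION 'host=pub dbname=postgres' PUBLICATION pub_orders_odd;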
We would like to solicit your opinion, as you seem to have some
experience with LR users: could one use this feature, where required, by
enabling it at the subscription level? Users would then have the ability
to disable it if they face any performance regression or additional
bloat (a sketch of such a switch follows the references below). After
having that feature, we can work on additional features such as prefetch
or parallel apply, which will reduce the chances of lag and make the
feature more broadly usable. Does that sound reasonable to you? Feel
free to ignore giving your opinion if you are not interested in that
work.

[1] - https://www.postgresql.org/message-id/OS0PR01MB5716BE80DAEB0EE2A6A5D1F5949D2%40OS0PR01MB5716.jpnprd01.prod.outlook.com
[2] - https://www.postgresql.org/message-id/OS0PR01MB5716ECC539008C85E7AB65C5944FA%40OS0PR01MB5716.jpnprd01.prod.outlook.com

--
With Regards,
Amit Kapila.
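A minimal sketch of the subscription-level switch discussed above,
assuming the feature surfaces as a boolean subscription parameter named
retain_dead_tuples, per the patch series in [2]; the option name,
default, and syntax are assumptions about an in-progress patch, not
committed behavior:

    -- Enable retention of the dead tuples needed for the conflict
    -- detection described above (assumed option name).
    CREATE SUBSCRIPTION sub1
        CONNECTION 'host=pub dbname=postgres' PUBLICATION pub1
        WITH (retain_dead_tuples = true);

    -- Disable it again if apply lag leads to bloat or regressions.
    ALTER SUBSCRIPTION sub1 SET (retain_dead_tuples = false);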