Re: Synchronizing slots from primary to standby

From shveta malik
Subject Re: Synchronizing slots from primary to standby
Date
Msg-id CAJpy0uBLdcExDJcgKtNExztrsybU41Oj0KVf9GHLXWEzNWaUtA@mail.gmail.com
In response to Re: Synchronizing slots from primary to standby  ("Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>)
Responses Re: Synchronizing slots from primary to standby  ("Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>)
List pgsql-hackers
On Fri, Aug 4, 2023 at 2:44 PM Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On 7/28/23 4:39 PM, Bharath Rupireddy wrote:
> > On Mon, Jul 24, 2023 at 9:00 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >>
> >>> 2. All candidate standbys will start one slot sync worker per logical
> >>> slot which might not be scalable.
> >>
> >> Yeah, that doesn't sound like a good idea but IIRC, the proposed patch
> >> is using one worker per database (for all slots corresponding to a
> >> database).
> >
> > Right. It's based on one worker for each database.
> >
> >>> Is having one (or a few more - not
> >>> necessarily one for each logical slot) worker for all logical slots
> >>> enough?
> >>
> >> I guess for a large number of slots there is a possibility of a large
> >> gap in syncing the slots which probably means we need to retain
> >> corresponding WAL for a much longer time on the primary. If we can
> >> prove that the gap won't be large enough to matter then this would be
> >> probably worth considering otherwise, I think we should find a way to
> >> scale the number of workers to avoid the large gap.
> >
> > I think the gap is largely determined by the time taken to advance
> > each slot and the amount of WAL that each logical slot moves ahead on
> > primary.
>
> Sorry to be late, but I gave a second thought and I wonder if we really need this design.
> (i.e start a logical replication background worker on the standby to sync the slots).
>
> Wouldn't that be simpler to "just" update the sync slots "metadata"
> as the https://github.com/EnterpriseDB/pg_failover_slots module (mentioned by Peter
> up-thread) is doing?
> (making use of LogicalConfirmReceivedLocation(), LogicalIncreaseXminForSlot()
> and LogicalIncreaseRestartDecodingForSlot(), If I read synchronize_one_slot() correctly).
>

Agreed. It would be simpler to just update the metadata. I think you
have not had a chance to review the latest posted patch ('v10-0003')
yet; it does the same.

But I do not quite see how we can do it without starting a background
worker. Even the failover-slots extension starts one background
worker. The question here is how many background workers we need to
have. Will one be sufficient, or do we need one per db (as done
earlier by the original patches in this thread), or are we good with
dividing the work among some limited number of workers?

I feel that syncing all slots in one worker may increase the lag
between subsequent syncs of a particular slot, and if the number of
slots is huge, the chances of losing slot data in case of failure are
higher. Starting one worker per db might not be efficient either, as
it will increase the load on the system (both in terms of background
workers and network traffic), especially when the number of dbs is
large. Thus, starting at most 'n' workers, where 'n' is decided by a
GUC, and dividing the work/DBs among them looks like a better option
to me. Please see the discussion in and around the email at [1].

[1]: https://www.postgresql.org/message-id/CAJpy0uCT%2BnpL4eUvCWiV_MBEri9ixcUgJVDdsBCJSqLd0oD1fQ%40mail.gmail.com
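
To illustrate the "at most 'n' workers" option, here is a rough sketch
of dividing DBs among a capped number of workers in round-robin
fashion. It is only a model of the idea, not code from any posted
patch: the function name assign_dbs_to_workers and the max_workers
parameter (standing in for the GUC) are hypothetical.

```python
def assign_dbs_to_workers(db_names, max_workers):
    """Distribute databases round-robin over at most max_workers workers.

    max_workers stands in for a hypothetical GUC; it is an assumption
    made for illustration only.
    """
    if max_workers < 1:
        raise ValueError("max_workers must be at least 1")

    # Never start more workers than there are databases to sync.
    n_workers = min(max_workers, len(db_names))

    assignment = {worker: [] for worker in range(n_workers)}
    for i, db in enumerate(db_names):
        assignment[i % n_workers].append(db)
    return assignment


if __name__ == "__main__":
    # Five databases shared between two workers.
    print(assign_dbs_to_workers(["db1", "db2", "db3", "db4", "db5"], 2))
```

With this kind of split, one worker per db and a single worker for
everything are just the two extremes (n >= number of dbs, and n = 1).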

> > I've measured the time it takes for
> > pg_logical_replication_slot_advance with different amounts of WAL on my
> > system. It took 2595ms/5091ms/31238ms to advance the slot by
> > 3.7GB/7.3GB/13GB respectively. To put things into perspective here,
> > imagine there are 3 logical slots to sync for a single slot sync
> > worker and each of them are in need of advancing the slot by
> > 3.7GB/7.3GB/13GB of WAL. The slot sync worker gets to slot 1 again
> > after 2595ms+5091ms+31238ms (~40sec), gets to slot 2 again after
> > advance time of slot 1 with amount of WAL that the slot has moved
> > ahead on primary during 40sec, gets to slot 3 again after advance time
> > of slot 1 and slot 2 with amount of WAL that the slot has moved ahead
> > on primary and so on. If WAL generation on the primary is pretty fast,
> > and if the logical slot moves pretty fast on the primary, the time it
> > takes for a single sync worker to sync a slot can increase.
>
> That would be way "faster" and we would probably not need to
> worry that much about the number of "sync" workers (if it/they "just" has/have
> to sync slot's "metadata") as proposed above.
>

Agreed, we need not worry about the delay due to
pg_logical_replication_slot_advance if we are only going to update a
few simple things using the function calls mentioned above.
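
For reference, the arithmetic behind the quoted measurement can be
sketched as below. It only models the advance-based approach, using
the three timings reported above; the helper names are hypothetical
and the round-robin split across workers is an assumption made for
illustration.

```python
# Advance times in ms for the three slots, taken from the quoted measurement.
ADVANCE_MS = [2595, 5091, 31238]


def revisit_interval_ms(advance_ms):
    """Time before a single serial worker gets back to the same slot."""
    return sum(advance_ms)


def worst_revisit_interval_ms(advance_ms, n_workers):
    """Worst-case revisit interval when slots are dealt round-robin
    to n_workers workers (assumed distribution, for illustration)."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    per_worker = [0] * n_workers
    for i, t in enumerate(advance_ms):
        per_worker[i % n_workers] += t
    return max(per_worker)


if __name__ == "__main__":
    print(revisit_interval_ms(ADVANCE_MS))           # 38924 ms, i.e. the "~40sec"
    print(worst_revisit_interval_ms(ADVANCE_MS, 3))  # 31238 ms, bounded by the slowest slot
```

Even with one worker per slot, the slowest advance still dominates,
which is why a metadata-only update makes the number of workers much
less of a concern.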

thanks
Shveta


