Re: pg_upgrade: optimize replication slot caught-up check

From Masahiko Sawada
Subject Re: pg_upgrade: optimize replication slot caught-up check
Msg-id CAD21AoA01ThbeFB6VFioObCUpyWE4ss2b9YXTwrHZo+6nwG1Nw@mail.gmail.com
In response to Re: pg_upgrade: optimize replication slot caught-up check  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: pg_upgrade: optimize replication slot caught-up check
Re: pg_upgrade: optimize replication slot caught-up check
List pgsql-hackers
On Tue, Jan 6, 2026 at 12:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Jan 5, 2026 at 6:55 PM Chao Li <li.evan.chao@gmail.com> wrote:
> >
> >
> >
> > > On Jan 6, 2026, at 02:02, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > Hi all,
> > >
> > > Commit 29d0a77fa6 improved pg_upgrade to allow migrating logical
> > > slots. Currently, to check if the slots are ready to be migrated, we
> > > call binary_upgrade_logical_slot_has_caught_up() for every single
> > > slot. This checks if there are any unconsumed WAL records. However, we
> > > noticed a performance issue. If there are many slots (e.g., 100 or
> > > more) or if there is a WAL backlog, checking all slots one by one
> > > takes a long time.
> > >
> > > Here are some test results from my environment:
> > > With an empty cluster: 1.55s
> > > With 200 slots and 30MB backlog: 15.51s
> > >
> > > Commit 6d3d2e8e5 introduced parallel checks per database, but a single
> > > job might still have to check too many slots, causing delays.
> > >
> > > Since binary_upgrade_logical_slot_has_caught_up() essentially checks
> > > if any decodable record exists in the database, IIUC it is not
> > > necessary to check every slot. We can optimize this by checking only
> > > the slot with the minimum confirmed_flush_lsn. If that slot is caught
> > > up, we can assume others are too. The attached patch implements this
> > > optimization. With the patch, the test with 200 slots finished in
> > > 2.512s. The execution time is now stable regardless of the number of
> > > slots.
> > >
> > > One thing to note is that DecodeTXNNeedSkip() also considers
> > > replication origin filters. Theoretically, a plugin could filter out
> > > specific origins, which might lead to different results. However, this
> > > is a very rare case. Even if it happens, it would just result in a
> > > false positive (the upgrade fails safely), so the impact is minimal.
> > > Therefore, the patch simplifies the check to be per-database instead
> > > of per-slot.
> > >
> > > Feedback is very welcome.
> > >
> > > Regards,
> > >
> > > --
> > > Masahiko Sawada
> > > Amazon Web Services: https://aws.amazon.com
> > > <v1-0001-pg_upgrade-Optimize-replication-slot-caught-up-ch.patch>
> >
> > Hi Masahiko-san,
> >
> > Basically the optimization idea makes sense to me. Reducing the check from O(slots × WAL) to O(databases × WAL) sounds like a win. Just a few small comments:
>
> Thank you for reviewing the patch!
>
> >
> > 1
> > ```
> > +static void process_old_cluter_logical_slot_catchup_infos(DbInfo *dbinfo, PGresult *res, void *arg);
> > ```
> >
> > Typo in the function name: cluter->cluster
> >
> > 2
> > ···
> > -               logical_slot_infos_query = get_old_cluster_logical_slot_infos_query();
> > +               const char *slot_info_query = "SELECT slot_name, plugin, two_phase, failover, "
> > ···
> >
> > logical_slot_infos_query is no longer used, should be removed.
>
> Fixed.
>
> >
> > 3
> > ```
> > +                               "  ORDER BY confirmed_flush_lsn ASC "
> > ```
> >
> > I am wondering whether confirmed_flush_lsn can be NULL. If so, we may want to add "NULLS LAST".
>
> Given that the query is not executed during a live check, I think
> confirmed_flush cannot be NULL. Likewise, temporary should never be
> true, but the condition is kept for consistency with slot_info_query.
>
> >
> > 4
> > ```
> > +       Assert(num_slots == dbinfo->slot_arr.nslots);
> > +
> > +       for (int i = 0; i < num_slots; i++)
> > +       {
> > +               char       *slotname;
> > +               bool            caught_up;
> > +
> > +               slotname = PQgetvalue(res, i, PQfnumber(res, "slot_name"));
> > +               caught_up = (strcmp(PQgetvalue(res, i, PQfnumber(res, "caught_up")), "t") == 0);
> > +
> > +               for (int slotnum = 0; slotnum < dbinfo->slot_arr.nslots; slotnum++)
> > +               {
> > ```
> >
> > As num_slots == dbinfo->slot_arr.nslots, this is still an O(num_slots^2) check. Given that this patch's goal is to eliminate the burden of a large number of slots, maybe we can build a hash table to make the check faster.
> >
>
> Or if we sort both queries in the same order we can simply update the
> slots sequentially.
>
> One problem I can see in this approach is that we could wrongly report
> the invalid slots in invalid_logical_slots.txt. Even if the slot with
> the oldest confirmed_flush_lsn has unconsumed decodable records, other
> slots might have already been caught up. I'll think of how to deal
> with this problem.

I came up with a simple idea: instead of
binary_upgrade_logical_slot_has_caught_up() returning true/false, we
can have it return the last decodable WAL record's LSN, and consider a
logical slot caught up if its confirmed_flush_lsn has already passed
that LSN. This way, we can keep scanning at most one logical slot per
database while reporting the same contents as today. The function now
needs to scan the whole WAL backlog, but it's still much better than
the current method.

I've implemented this idea in the v2 patch with some updates:

- incorporated the review comments.
- added logic to support PG18 or older.
- added regression tests.

Feedback is very welcome.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
