Re: pg_upgrade: optimize replication slot caught-up check

From Chao Li
Subject Re: pg_upgrade: optimize replication slot caught-up check
Date
Msg-id F9ED040D-4F1C-404D-80EC-52A5C65290C6@gmail.com
Whole thread Raw
In response to Re: pg_upgrade: optimize replication slot caught-up check  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers

> On Jan 8, 2026, at 04:18, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Jan 6, 2026 at 12:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>> On Mon, Jan 5, 2026 at 6:55 PM Chao Li <li.evan.chao@gmail.com> wrote:
>>>
>>>
>>>
>>>> On Jan 6, 2026, at 02:02, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> Commit 29d0a77fa6 improved pg_upgrade to allow migrating logical
>>>> slots. Currently, to check if the slots are ready to be migrated, we
>>>> call binary_upgrade_logical_slot_has_caught_up() for every single
>>>> slot. This checks if there are any unconsumed WAL records. However, we
>>>> noticed a performance issue. If there are many slots (e.g., 100 or
>>>> more) or if there is a WAL backlog, checking all slots one by one
>>>> takes a long time.
>>>>
>>>> Here are some test results from my environment:
>>>> With an empty cluster: 1.55s
>>>> With 200 slots and 30MB backlog: 15.51s
>>>>
>>>> Commit 6d3d2e8e5 introduced parallel checks per database, but a single
>>>> job might still have to check too many slots, causing delays.
>>>>
>>>> Since binary_upgrade_logical_slot_has_caught_up() essentially checks
>>>> if any decodable record exists in the database, IIUC it is not
>>>> necessary to check every slot. We can optimize this by checking only
>>>> the slot with the minimum confirmed_flush_lsn. If that slot is caught
>>>> up, we can assume others are too. The attached patch implements this
>>>> optimization. With the patch, the test with 200 slots finished in
>>>> 2.512s. The execution time is now stable regardless of the number of
>>>> slots.
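
[Editor's note: for illustration only, not taken from the patch — the per-database check described above might look roughly like the following SQL, which picks just the logical slot that is furthest behind in the current database.]

```sql
-- Illustrative sketch, not the patch's actual query: per database, find the
-- logical slot with the minimum confirmed_flush_lsn; if that slot has caught
-- up, the remaining slots must have as well.
SELECT slot_name, confirmed_flush_lsn
FROM pg_catalog.pg_replication_slots
WHERE slot_type = 'logical'
  AND database = current_database()
ORDER BY confirmed_flush_lsn ASC
LIMIT 1;
```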
>>>>
>>>> One thing to note is that DecodeTXNNeedSkip() also considers
>>>> replication origin filters. Theoretically, a plugin could filter out
>>>> specific origins, which might lead to different results. However, this
>>>> is a very rare case. Even if it happens, it would just result in a
>>>> false positive (the upgrade fails safely), so the impact is minimal.
>>>> Therefore, the patch simplifies the check to be per-database instead
>>>> of per-slot.
>>>>
>>>> Feedback is very welcome.
>>>>
>>>> Regards,
>>>>
>>>> --
>>>> Masahiko Sawada
>>>> Amazon Web Services: https://aws.amazon.com
>>>> <v1-0001-pg_upgrade-Optimize-replication-slot-caught-up-ch.patch>
>>>
>>> Hi Masahiko-san,
>>>
>>> Basically, I think the optimization idea makes sense. Reducing the check from O(slots × WAL) to O(databases × WAL) sounds like a win. Just a few small comments:
>>
>> Thank you for reviewing the patch!
>>
>>>
>>> 1
>>> ```
>>> +static void process_old_cluter_logical_slot_catchup_infos(DbInfo *dbinfo, PGresult *res, void *arg);
>>> ```
>>>
>>> Typo in the function name: cluter->cluster
>>>
>>> 2
>>> ```
>>> -               logical_slot_infos_query = get_old_cluster_logical_slot_infos_query();
>>> +               const char *slot_info_query = "SELECT slot_name, plugin, two_phase, failover, "
>>> ```
>>>
>>> logical_slot_infos_query is no longer used, should be removed.
>>
>> Fixed.
>>
>>>
>>> 3
>>> ```
>>> +                               "  ORDER BY confirmed_flush_lsn ASC "
>>> ```
>>>
>>> I am wondering whether confirmed_flush_lsn can be NULL. If so, we may want to add "NULLS LAST".
>>
>> Given that the query is not executed during a live check, I think
>> confirmed_flush cannot be NULL. temporary should also not be true, but
>> it's kept for consistency with slot_info_query.
>>
>>>
>>> 4
>>> ```
>>> +       Assert(num_slots == dbinfo->slot_arr.nslots);
>>> +
>>> +       for (int i = 0; i < num_slots; i++)
>>> +       {
>>> +               char       *slotname;
>>> +               bool            caught_up;
>>> +
>>> +               slotname = PQgetvalue(res, i, PQfnumber(res, "slot_name"));
>>> +               caught_up = (strcmp(PQgetvalue(res, i, PQfnumber(res, "caught_up")), "t") == 0);
>>> +
>>> +               for (int slotnum = 0; slotnum < dbinfo->slot_arr.nslots; slotnum++)
>>> +               {
>>> ```
>>>
>>> As num_slots == dbinfo->slot_arr.nslots, this is still an O(num_slots^2) check. Given this patch's goal is to
>>> eliminate the burden of a large number of slots, maybe we can build a hash table to make the check faster.
>>>
>>
>> Or, if we sort both query results in the same order, we can simply
>> update the slots sequentially.
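
[Editor's note: for illustration, a minimal C fragment of this sequential-pass idea, as an alternative to the hash table suggested above. It assumes both the catch-up query result and dbinfo->slot_arr.slots are sorted by slot_name, reuses the names from the quoted snippet (res, num_slots, dbinfo), and relies on pg_upgrade's existing LogicalSlotInfo fields; the surrounding function is omitted.]

```c
/*
 * Sketch only: if the query result and dbinfo->slot_arr.slots are both
 * ordered by slot_name, the per-slot update becomes one linear pass
 * instead of the nested O(num_slots^2) lookup.
 */
Assert(num_slots == dbinfo->slot_arr.nslots);

for (int i = 0; i < num_slots; i++)
{
	LogicalSlotInfo *slot = &dbinfo->slot_arr.slots[i];
	char	   *slotname = PQgetvalue(res, i, PQfnumber(res, "slot_name"));
	bool		caught_up;

	caught_up = (strcmp(PQgetvalue(res, i, PQfnumber(res, "caught_up")),
						"t") == 0);

	/* Both sides are in slot_name order, so row i matches slot i. */
	Assert(strcmp(slot->slotname, slotname) == 0);
	slot->caught_up = caught_up;
}
```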
>>
>> One problem I can see in this approach is that we could wrongly report
>> the invalid slots in invalid_logical_slots.txt. Even if the slot with
>> the oldest confirmed_flush_lsn has unconsumed decodable records, other
>> slots might have already been caught up. I'll think of how to deal
>> with this problem.
>
> I came up with a simple idea: instead of
> binary_upgrade_logical_slot_has_caught_up() returning true/false, we
> can have it return the last decodable WAL record's LSN, and consider
> a logical slot as caught up if its confirmed_flush_lsn has already
> passed that LSN. This way, we can keep scanning at most one logical
> slot per database and report the same contents as today. The function
> now needs to scan the entire WAL backlog, but it's still much better
> than the current method.
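
[Editor's note: for illustration, a minimal sketch of the resulting comparison. The function name binary_upgrade_last_decodable_lsn() is a hypothetical stand-in for whatever the v2 patch actually exposes.]

```sql
-- Hypothetical shape only; the function name below is an assumption for
-- illustration, not the v2 patch's API.  A slot counts as caught up if its
-- confirmed_flush_lsn has already passed the last decodable record's LSN.
SELECT slot_name,
       confirmed_flush_lsn >= binary_upgrade_last_decodable_lsn() AS caught_up
FROM pg_catalog.pg_replication_slots
WHERE slot_type = 'logical'
  AND database = current_database();
```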
>

Sounds better, but I need more time to look into the details. I will spend time on this tomorrow.

> I've implemented this idea in the v2 patch with some updates:
>
> - incorporated the review comments.
> - added logic to support PG18 or older.
> - added regression tests.
>
> Feedback is very welcome.
>
> Regards,
>
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com
> <v2-0001-pg_upgrade-Optimize-replication-slot-caught-up-ch.patch>

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/