Обсуждение: Fix pg_upgrade to detect invalid logical replication slots on PG19
Hi Hackers,
invalidation_reason IS NULL, then cross-joins it with the main slot
query. When ALL logical slots in a database are invalid, the CTE
returns zero rows, and the cross join produces an empty result set.
This causes pg_upgrade to silently skip those slots entirely --
neither detecting them as invalid (which should block the upgrade)
nor attempting to migrate them.
The pre-PG19 query path does not have this problem because it queries
pg_replication_slots directly without a cross join. This may not impact
upgrade to PG19 but will change the behavior for PG20 upgrade.
Fix by changing the cross join to a LEFT JOIN,
so that invalid slots still appear in the result set with NULL
caught_up values.
Fix by changing the cross join to a LEFT JOIN,
so that invalid slots still appear in the result set with NULL
caught_up values.
Regards,
Lakshmi
Вложения
On Mon, Apr 20, 2026 at 2:28 PM Lakshmi N <lakshmin.jhs@gmail.com> wrote:
>
> Hi Hackers,
>
> The PG19-optimized slot catchup query uses a CTE that filters on
> invalidation_reason IS NULL, then cross-joins it with the main slot
> query. When ALL logical slots in a database are invalid, the CTE
> returns zero rows, and the cross join produces an empty result set.
> This causes pg_upgrade to silently skip those slots entirely --
> neither detecting them as invalid (which should block the upgrade)
> nor attempting to migrate them.
>
> The pre-PG19 query path does not have this problem because it queries
> pg_replication_slots directly without a cross join. This may not impact
> upgrade to PG19 but will change the behavior for PG20 upgrade.
>
> Fix by changing the cross join to a LEFT JOIN,
> so that invalid slots still appear in the result set with NULL
> caught_up values.
>
I agree with the problem here.
Another way to solve this would be using a scalar subquery(see [1]),
but that would reduce readability. Thus, I prefer a LEFT OUTER JOIN on
TRUE here. There should also be no performance impact, since the
right-hand side query will always return at most one row due to the
LIMIT 1 clause. So IMO, the proposed solution is good. Copying
Sawada-san, as he was the author of the original patch.
[1]:
SELECT slot_name, plugin, two_phase, failover,
CASE
WHEN invalidation_reason IS NOT NULL THEN FALSE
ELSE (
(SELECT last_pending_wal FROM check_caught_up) IS NULL
OR confirmed_flush_lsn > (SELECT last_pending_wal FROM check_caught_up)
)
END as caught_up,
invalidation_reason IS NOT NULL as invalid
FROM pg_catalog.pg_replication_slots
WHERE slot_type = 'logical'
AND database = current_database()
AND temporary IS FALSE;
thanks
Shveta
On Tue, Apr 21, 2026 at 9:27 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Mon, Apr 20, 2026 at 2:28 PM Lakshmi N <lakshmin.jhs@gmail.com> wrote: > > > > Hi Hackers, > > > > The PG19-optimized slot catchup query uses a CTE that filters on > > invalidation_reason IS NULL, then cross-joins it with the main slot > > query. When ALL logical slots in a database are invalid, the CTE > > returns zero rows, and the cross join produces an empty result set. > > This causes pg_upgrade to silently skip those slots entirely -- > > neither detecting them as invalid (which should block the upgrade) > > nor attempting to migrate them. > > > > The pre-PG19 query path does not have this problem because it queries > > pg_replication_slots directly without a cross join. This may not impact > > upgrade to PG19 but will change the behavior for PG20 upgrade. Thank you for the report. > > > > Fix by changing the cross join to a LEFT JOIN, > > so that invalid slots still appear in the result set with NULL > > caught_up values. > > > > I agree with the problem here. I've confirmed the bug and the patch fixes it. The patch looks good to me, so I'm going to push it tomorrow barring any objections. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
> On Apr 20, 2026, at 16:57, Lakshmi N <lakshmin.jhs@gmail.com> wrote: > > Hi Hackers, > > The PG19-optimized slot catchup query uses a CTE that filters on > invalidation_reason IS NULL, then cross-joins it with the main slot > query. When ALL logical slots in a database are invalid, the CTE > returns zero rows, and the cross join produces an empty result set. > This causes pg_upgrade to silently skip those slots entirely -- > neither detecting them as invalid (which should block the upgrade) > nor attempting to migrate them. > > The pre-PG19 query path does not have this problem because it queries > pg_replication_slots directly without a cross join. This may not impact > upgrade to PG19 but will change the behavior for PG20 upgrade. > > Fix by changing the cross join to a LEFT JOIN, > so that invalid slots still appear in the result set with NULL > caught_up values. > > Regards, > Lakshmi > <0001-Fix-pg_upgrade-to-detect-invalid-logical-replication.patch> This changes the inner join to a left join, which preserves the rows from pg_replication_slots. As check_caught_up is intentionallya singleton CTE because ORDER BY … LIMIT 1, so there is no row multiplication risk. So, the patch looks good to me. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/
On Wed, Apr 22, 2026 at 1:02 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Tue, Apr 21, 2026 at 9:27 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Mon, Apr 20, 2026 at 2:28 PM Lakshmi N <lakshmin.jhs@gmail.com> wrote: > > > > > > Hi Hackers, > > > > > > The PG19-optimized slot catchup query uses a CTE that filters on > > > invalidation_reason IS NULL, then cross-joins it with the main slot > > > query. When ALL logical slots in a database are invalid, the CTE > > > returns zero rows, and the cross join produces an empty result set. > > > This causes pg_upgrade to silently skip those slots entirely -- > > > neither detecting them as invalid (which should block the upgrade) > > > nor attempting to migrate them. > > > > > > The pre-PG19 query path does not have this problem because it queries > > > pg_replication_slots directly without a cross join. This may not impact > > > upgrade to PG19 but will change the behavior for PG20 upgrade. > > Thank you for the report. > > > > > > > Fix by changing the cross join to a LEFT JOIN, > > > so that invalid slots still appear in the result set with NULL > > > caught_up values. > > > > > > > I agree with the problem here. > > I've confirmed the bug and the patch fixes it. The patch looks good to > me, so I'm going to push it tomorrow barring any objections. Pushed. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com