Обсуждение: Fix pg_upgrade to detect invalid logical replication slots on PG19

Поиск
Список
Период
Сортировка

Fix pg_upgrade to detect invalid logical replication slots on PG19

От
Lakshmi N
Дата:
Hi Hackers,

The PG19-optimized slot catchup query uses a CTE that filters on
invalidation_reason IS NULL, then cross-joins it with the main slot
query.  When ALL logical slots in a database are invalid, the CTE
returns zero rows, and the cross join produces an empty result set.
This causes pg_upgrade to silently skip those slots entirely --
neither detecting them as invalid (which should block the upgrade)
nor attempting to migrate them.

The pre-PG19 query path does not have this problem because it queries
pg_replication_slots directly without a cross join. This may not impact
upgrade to PG19 but will change the behavior for PG20 upgrade.

Fix by changing the cross join to a LEFT JOIN,
so that invalid slots still appear in the result set with NULL
caught_up values. 

Regards,
Lakshmi
Вложения

Re: Fix pg_upgrade to detect invalid logical replication slots on PG19

От
shveta malik
Дата:
On Mon, Apr 20, 2026 at 2:28 PM Lakshmi N <lakshmin.jhs@gmail.com> wrote:
>
> Hi Hackers,
>
> The PG19-optimized slot catchup query uses a CTE that filters on
> invalidation_reason IS NULL, then cross-joins it with the main slot
> query.  When ALL logical slots in a database are invalid, the CTE
> returns zero rows, and the cross join produces an empty result set.
> This causes pg_upgrade to silently skip those slots entirely --
> neither detecting them as invalid (which should block the upgrade)
> nor attempting to migrate them.
>
> The pre-PG19 query path does not have this problem because it queries
> pg_replication_slots directly without a cross join. This may not impact
> upgrade to PG19 but will change the behavior for PG20 upgrade.
>
> Fix by changing the cross join to a LEFT JOIN,
> so that invalid slots still appear in the result set with NULL
> caught_up values.
>

I agree with the problem here.

Another way to solve this would be using a scalar subquery(see [1]),
but that would reduce readability. Thus, I prefer a LEFT OUTER JOIN on
TRUE here. There should also be no performance impact, since the
right-hand side query will always return at most one row due to the
LIMIT 1 clause. So IMO, the proposed solution is good. Copying
Sawada-san, as he was the author of the original patch.


[1]:
SELECT slot_name, plugin, two_phase, failover,
CASE
  WHEN invalidation_reason IS NOT NULL THEN FALSE
  ELSE (
    (SELECT last_pending_wal FROM check_caught_up) IS NULL
    OR confirmed_flush_lsn > (SELECT last_pending_wal FROM check_caught_up)
  )
END as caught_up,
invalidation_reason IS NOT NULL as invalid
FROM pg_catalog.pg_replication_slots
WHERE slot_type = 'logical'
  AND database = current_database()
  AND temporary IS FALSE;

thanks
Shveta



Re: Fix pg_upgrade to detect invalid logical replication slots on PG19

От
Masahiko Sawada
Дата:
On Tue, Apr 21, 2026 at 9:27 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Mon, Apr 20, 2026 at 2:28 PM Lakshmi N <lakshmin.jhs@gmail.com> wrote:
> >
> > Hi Hackers,
> >
> > The PG19-optimized slot catchup query uses a CTE that filters on
> > invalidation_reason IS NULL, then cross-joins it with the main slot
> > query.  When ALL logical slots in a database are invalid, the CTE
> > returns zero rows, and the cross join produces an empty result set.
> > This causes pg_upgrade to silently skip those slots entirely --
> > neither detecting them as invalid (which should block the upgrade)
> > nor attempting to migrate them.
> >
> > The pre-PG19 query path does not have this problem because it queries
> > pg_replication_slots directly without a cross join. This may not impact
> > upgrade to PG19 but will change the behavior for PG20 upgrade.

Thank you for the report.

> >
> > Fix by changing the cross join to a LEFT JOIN,
> > so that invalid slots still appear in the result set with NULL
> > caught_up values.
> >
>
> I agree with the problem here.

I've confirmed the bug and the patch fixes it. The patch looks good to
me, so I'm going to push it tomorrow barring any objections.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Fix pg_upgrade to detect invalid logical replication slots on PG19

От
Chao Li
Дата:

> On Apr 20, 2026, at 16:57, Lakshmi N <lakshmin.jhs@gmail.com> wrote:
>
> Hi Hackers,
>
> The PG19-optimized slot catchup query uses a CTE that filters on
> invalidation_reason IS NULL, then cross-joins it with the main slot
> query.  When ALL logical slots in a database are invalid, the CTE
> returns zero rows, and the cross join produces an empty result set.
> This causes pg_upgrade to silently skip those slots entirely --
> neither detecting them as invalid (which should block the upgrade)
> nor attempting to migrate them.
>
> The pre-PG19 query path does not have this problem because it queries
> pg_replication_slots directly without a cross join. This may not impact
> upgrade to PG19 but will change the behavior for PG20 upgrade.
>
> Fix by changing the cross join to a LEFT JOIN,
> so that invalid slots still appear in the result set with NULL
> caught_up values.
>
> Regards,
> Lakshmi
> <0001-Fix-pg_upgrade-to-detect-invalid-logical-replication.patch>

This changes the inner join to a left join, which preserves the rows from pg_replication_slots. As check_caught_up is
intentionallya singleton CTE because ORDER BY … LIMIT 1, so there is no row multiplication risk. 

So, the patch looks good to me.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/







Re: Fix pg_upgrade to detect invalid logical replication slots on PG19

От
Masahiko Sawada
Дата:
On Wed, Apr 22, 2026 at 1:02 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Apr 21, 2026 at 9:27 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Mon, Apr 20, 2026 at 2:28 PM Lakshmi N <lakshmin.jhs@gmail.com> wrote:
> > >
> > > Hi Hackers,
> > >
> > > The PG19-optimized slot catchup query uses a CTE that filters on
> > > invalidation_reason IS NULL, then cross-joins it with the main slot
> > > query.  When ALL logical slots in a database are invalid, the CTE
> > > returns zero rows, and the cross join produces an empty result set.
> > > This causes pg_upgrade to silently skip those slots entirely --
> > > neither detecting them as invalid (which should block the upgrade)
> > > nor attempting to migrate them.
> > >
> > > The pre-PG19 query path does not have this problem because it queries
> > > pg_replication_slots directly without a cross join. This may not impact
> > > upgrade to PG19 but will change the behavior for PG20 upgrade.
>
> Thank you for the report.
>
> > >
> > > Fix by changing the cross join to a LEFT JOIN,
> > > so that invalid slots still appear in the result set with NULL
> > > caught_up values.
> > >
> >
> > I agree with the problem here.
>
> I've confirmed the bug and the patch fixes it. The patch looks good to
> me, so I'm going to push it tomorrow barring any objections.

Pushed.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com