Обсуждение: pg_upgradecluster and synchronous replication

Поиск
Список
Период
Сортировка

pg_upgradecluster and synchronous replication

От
Laurenz Albe
Дата:
I ran into a problem today during a training session when I demonstrated

  pg_upgradecluster -v 18 --method=upgrade --link 17 main

The problem was that I had configured synchronous streaming replication first, so after
pg_upgrade, the synchronous standby was lost, and when pg_upgradecluster ran

  vacuumdb ... --analyze-in-stages --missing-stats-only

processing hung, because the synchronous standby was not available and the transaction
could not commit.

Yes, that was my mistake, and yes, the problem was easily fixed by interrupting
pg_upgradecluster, disabling synchronous_standby_names and running the ANALYZE
and v17 cluster removal manually.

Yet I wonder if pg_upgradecluster could improve things by disabling synchronous_standby_names
when the cluster ist started for the "finish" stage.  I have attached a POC patch how this
could be done.  I didn't test it, and my Perl skills are marginal, but you get the idea.

Perhaps this is too much black magic, not sure.  But I wanted to share my experience.

Yours,
Laurenz Albe

Вложения

Re: pg_upgradecluster and synchronous replication

От
Christoph Berg
Дата:
Re: Laurenz Albe
>   vacuumdb ... --analyze-in-stages --missing-stats-only
> 
> processing hung, because the synchronous standby was not available and the transaction
> could not commit.

Interesting find, thanks for sharing.

> Yet I wonder if pg_upgradecluster could improve things by disabling synchronous_standby_names
> when the cluster ist started for the "finish" stage.  I have attached a POC patch how this
> could be done.  I didn't test it, and my Perl skills are marginal, but you get the idea.

That's also the command that does the final cluster start for
production, I wouldn't want such a change to persist.

Maybe we should instead change the analyze hook script to do that
internally? Setting PGOPTIONS should be enough:

    PGOPTIONS="-csynchronous_commit=local"

> Perhaps this is too much black magic, not sure.  But I wanted to share my experience.

There is already some code in pg_upgradecluster that works around
black magic problems:

    # ensure we can upgrade DBs with default read-only transactions
    $ENV{PGOPTIONS} .= " -cdefault_transaction_read_only=off";

This would just add one more wart of that kind.

Christoph



Re: pg_upgradecluster and synchronous replication

От
Laurenz Albe
Дата:
On Wed, 2026-01-14 at 20:39 +0100, Christoph Berg wrote:
> >    vacuumdb ... --analyze-in-stages --missing-stats-only
> >
> > processing hung, because the synchronous standby was not available and the transaction
> > could not commit.
> >
> > I wonder if pg_upgradecluster could improve things by disabling synchronous_standby_names
> > when the cluster ist started for the "finish" stage.  I have attached a POC patch how this
> > could be done.  I didn't test it, and my Perl skills are marginal, but you get the idea.
>
> Maybe we should instead change the analyze hook script to do that
> internally? Setting PGOPTIONS should be enough:
>
>     PGOPTIONS="-csynchronous_commit=local"
>
> > Perhaps this is too much black magic, not sure.  But I wanted to share my experience.
>
> There is already some code in pg_upgradecluster that works around
> black magic problems:
>
>     # ensure we can upgrade DBs with default read-only transactions
>     $ENV{PGOPTIONS} .= " -cdefault_transaction_read_only=off";
>
> This would just add one more wart of that kind.

That looks like a better way to do it, I agree.

> That's also the command that does the final cluster start for
> production, I wouldn't want such a change to persist.

That makes me wonder.  I don't think that it is a great idea to start the cluster for
production with "synchronous_commit" set to a non-standard value.  That would mask the
problem initially, but whenever the cluster is restarted, the parameter is back to its
original value, and suddenly things will stop working.

I can imagine two ways to do better:

1. Set "synchronous_commit=local" during the ANALYZE, then restart the cluster without
   overriding any parameters.  Then the upgrade itself would work, and the problem with
   the missing synchronous standby would manifest right when the user starts using the
   upgraded cluster.  Better than having things start failing after the next restart,
   perhaps months later, when it is harder to trace the problem to its original cause!

2. Add a pre-upgrade check that just aborts with a useful error message if
   "synchronous_standby_names" is set.

Yours,
Laurenz Albe



Re: pg_upgradecluster and synchronous replication

От
Laurenz Albe
Дата:
On Wed, 2026-01-14 at 20:54 +0100, I wrote:
> On Wed, 2026-01-14 at 20:39 +0100, Christoph Berg wrote:
> >
> > Maybe we should instead change the analyze hook script to do that
> > internally? Setting PGOPTIONS should be enough:
> >
> >     PGOPTIONS="-csynchronous_commit=local"
> >
> >
> > There is already some code in pg_upgradecluster that works around
> > black magic problems:
> >
> >     # ensure we can upgrade DBs with default read-only transactions
> >     $ENV{PGOPTIONS} .= " -cdefault_transaction_read_only=off";
> >
> > This would just add one more wart of that kind.
>
> That looks like a better way to do it, I agree.
>
>
> at makes me wonder.  I don't think that it is a great idea to start the cluster for
> production with "synchronous_commit" set to a non-standard value.  That would mask the
> problem initially, but whenever the cluster is restarted, the parameter is back to its
> original value, and suddenly things will stop working.

Forget that: I missed that with PGOPTIONS, the setting is only changed for
the database session that performs the ANALYZE, not for the entire server
runtime.  Yes, I think that is the proper solution!

Yours,
Laurenz Albe



Re: pg_upgradecluster and synchronous replication

От
Laurenz Albe
Дата:
On Thu, 2026-01-15 at 05:40 +0100, I wrote:
> > On Wed, 2026-01-14 at 20:39 +0100, Christoph Berg wrote:
> > >
> > > Maybe we should instead change the analyze hook script to do that
> > > internally? Setting PGOPTIONS should be enough:
> > >
> > >     PGOPTIONS="-csynchronous_commit=local"
>
> Yes, I think that is the proper solution!

How about the attached patch?

Yours,
Laurenz Albe

Вложения

Re: pg_upgradecluster and synchronous replication

От
Christoph Berg
Дата:
Re: Laurenz Albe
> How about the attached patch?

Thanks, pushed with a minor update - with set -e, we need to use
${PGOPTIONS:-} instead of $PGOPTIONS.

Christoph



Re: pg_upgradecluster and synchronous replication

От
Laurenz Albe
Дата:
On Mon, 2026-01-19 at 15:38 +0100, Christoph Berg wrote:
> Re: Laurenz Albe
> > How about the attached patch?
>
> Thanks, pushed with a minor update - with set -e, we need to use
> ${PGOPTIONS:-} instead of $PGOPTIONS.

Thank you!

Yours,
Laurenz Albe