Re: pg_upgrade instructions involving "rsync --size-only" might lead to standby corruption?

Поиск
Список
Период
Сортировка
От Michael Banck
Тема Re: pg_upgrade instructions involving "rsync --size-only" might lead to standby corruption?
Дата
Msg-id 64ac71ff.170a0220.ff95d.0d51@mx.google.com
обсуждение исходный текст
Ответ на Re: pg_upgrade instructions involving "rsync --size-only" might lead to standby corruption?  (Nikolay Samokhvalov <nik@postgres.ai>)
Ответы Re: pg_upgrade instructions involving "rsync --size-only" might lead to standby corruption?  (Nikolay Samokhvalov <nik@postgres.ai>)
Список pgsql-hackers
Hi,

On Mon, Jul 10, 2023 at 01:36:39PM -0700, Nikolay Samokhvalov wrote:
> On Fri, Jul 7, 2023 at 6:31 AM Stephen Frost <sfrost@snowman.net> wrote:
> > * Nikolay Samokhvalov (nik@postgres.ai) wrote:
> > > But this can happen with anyone who follows the procedure from the docs
> > as
> > > is and doesn't do any additional steps, because in step 9 "Prepare for
> > > standby server upgrades":
> > >
> > > 1) there is no requirement to follow specific order to shut down the
> > nodes
> > >    - "Streaming replication and log-shipping standby servers can remain
> > > running until a later step" should probably be changed to a
> > > requirement-like "keep them running"
> >
> > Agreed that it would be good to clarify that the primary should be shut
> > down first, to make sure everything written by the primary has been
> > replicated to all of the replicas.
> 
> Thanks!
> 
> Here is a patch to fix the existing procedure description.

Thanks for that!

> I agree with Andrey – without it, we don't have any good way to upgrade
> large clusters in short time. Default rsync mode (without "--size-only")
> takes a lot of time too, if the load is heavy.
> 
> With these adjustments, can "rsync --size-only" remain in the docs as the
> *fast* and safe method to upgrade standbys, or there are still some
> concerns related to corruption risks?

I hope somebody can answer that definitively, but I read Stephen's mail
to indicate that this procedure should be safe in principle (if you know
what you are doing).

> From: Nikolay Samokhvalov <nik@postgres.ai>
> Date: Mon, 10 Jul 2023 20:07:18 +0000
> Subject: [PATCH] Improve major upgrade docs

Maybe mention standby here, like "Improve major upgrade documentation
for standby servers".

> +++ b/doc/src/sgml/ref/pgupgrade.sgml
> @@ -380,22 +380,28 @@ NET STOP postgresql-&majorversion;
>      </para>
>  
>      <para>
> -     Streaming replication and log-shipping standby servers can
> +     Streaming replication and log-shipping standby servers must
>       remain running until a later step.
>      </para>
>     </step>
>  
> -   <step>
> +   <step id="pgupgrade-step-prepare-standbys">
>
>      <para>
> -     If you are upgrading standby servers using methods outlined in section <xref
> -     linkend="pgupgrade-step-replicas"/>, verify that the old standby
> -     servers are caught up by running <application>pg_controldata</application>
> -     against the old primary and standby clusters.  Verify that the
> -     <quote>Latest checkpoint location</quote> values match in all clusters.
> -     (There will be a mismatch if old standby servers were shut down
> -     before the old primary or if the old standby servers are still running.)
> +     If you are upgrading standby servers using methods outlined in 
> +     <xref linkend="pgupgrade-step-replicas"/>, 

You dropped the "section" before the xref, I think that should be kept
around.

> +                                                ensure that they were running when 
> +     you shut down the primaries in the previous step, so all the latest changes 

You talk of primaries in plural here, that is a bit weird for PostgreSQL
documentation.

> +     and the shutdown checkpoint record were received. You can verify this by running 
> +     <application>pg_controldata</application> against the old primary and standby 
> +     clusters. The <quote>Latest checkpoint location</quote> values must match in all 
> +     nodes. A mismatch might occur if old standby servers were shut down before 
> +     the old primary. To fix a mismatch, start all old servers and return to the 
> +     previous step; proceeding with mismatched 
> +     <quote>Latest checkpoint location</quote> may lead to standby corruption.
> +    </para>
> +
> +    <para>
>       Also, make sure <varname>wal_level</varname> is not set to
>       <literal>minimal</literal> in the <filename>postgresql.conf</filename> file on the
>       new primary cluster.
> @@ -497,7 +503,6 @@ pg_upgrade.exe
>       linkend="warm-standby"/>) standby servers, you can follow these steps to
>       quickly upgrade them.  You will not be running <application>pg_upgrade</application> on
>       the standby servers, but rather <application>rsync</application> on the primary.
> -     Do not start any servers yet.
>      </para>
>  
>      <para>
> @@ -508,6 +513,15 @@ pg_upgrade.exe
>       is running.
>      </para>
>  
> +    <para>
> +     Before running rsync, to avoid standby corruption, it is absolutely
> +     critical to ensure that both primaries are shut down and standbys 
> +     have received the last changes (see <xref linkend="pgupgrade-step-prepare-standbys"/>). 

I think this should be something like "ensure both that the primary is
shut down and that the standbys have received all the changes".

> +     Standbys can be running at this point or fully stopped.

"or be fully stopped." I think.

> +                                                             If they 
> +     are still running, you can stop, upgrade, and start them one by one; this
> +     can be useful to keep the cluster open for read-only transactions.
> +    </para>

Maybe this is clear from the context, but "upgrade" in the above should
maybe more explicitly refer to the rsync method or else people might
think one can run pg_upgrade on them after all?


Michael



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Nathan Bossart
Дата:
Сообщение: Re: Preventing non-superusers from altering session authorization
Следующее
От: "Tristan Partin"
Дата:
Сообщение: Re: Refactoring backend fork+exec code