Re: PG Upgrade with hardlinks, when to start/stop master and replicas

From: Stephen Frost
Subject: Re: PG Upgrade with hardlinks, when to start/stop master and replicas
Msg-id: 20190219111152.GJ6197@tamriel.snowman.net
In reply to: PG Upgrade with hardlinks, when to start/stop master and replicas (Martín Fernández <fmartin91@gmail.com>)
Responses: Re: PG Upgrade with hardlinks, when to start/stop master and replicas (Hellmuth Vargas <hivs77@gmail.com>)
List: pgsql-general

Greetings,

* Martín Fernández (fmartin91@gmail.com) wrote:
> After reading the pg_upgrade documentation multiple times, it seems that after running pg_upgrade on
> the primary instance, we can't start it until we run rsync from the primary to the standby. I'm understanding this from
> the following section in the pg_upgrade manual page.
>
> ```
> You will not be running pg_upgrade on the standby servers, but rather rsync on the
> primary. Do not start any servers yet.
> ```
>
> I'm understanding the `any` as primary and standbys.

Yes, that's correct, you shouldn't start up anything yet.

> On the other hand, we've been doing tests that start the primary instance as soon as pg_upgrade is done. These tests
> have worked perfectly fine so far. We make the rsync call with the primary instance running and the standby can start
> later on after rsync is done and we copy the new configuration files.

This is like taking an online backup of the primary without actually
doing pg_start_backup / pg_stop_backup and following the protocol for
that, meaning that the replica will start up without a backup_label and
will think it's at whatever point in the WAL stream that the pg_control
file says it's at as of whenever the rsync copies that file.

That is NOT SAFE and it's a sure way to end up with corruption.
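
One way to guard against starting the rsync too early is to check what
pg_controldata reports for each cluster first. The following is only a
minimal sketch, not part of the documented procedure: the helper names
are hypothetical, it assumes pg_controldata is on the PATH and prints
its labels in English, and it treats "shut down" (primary) and "shut
down in recovery" (standby) as the only acceptable states. The
pg_upgrade documentation also suggests comparing the "Latest checkpoint
location" of the old primary and standby clusters after they are
stopped, which the last helper illustrates.

```python
import subprocess

# States pg_controldata reports for a cleanly stopped cluster
# (primary and standby respectively).
CLEAN_STATES = {"shut down", "shut down in recovery"}


def parse_controldata(text):
    """Turn pg_controldata output ("Label:   value" lines) into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only; values such as timestamps
        # contain further colons.
        label, sep, value = line.partition(":")
        if sep:
            fields[label.strip()] = value.strip()
    return fields


def is_cleanly_shut_down(text):
    """True if the control file says the cluster was stopped cleanly."""
    state = parse_controldata(text).get("Database cluster state")
    return state in CLEAN_STATES


def same_checkpoint(primary_text, standby_text):
    """True if both clusters report the same latest checkpoint location."""
    key = "Latest checkpoint location"
    primary = parse_controldata(primary_text).get(key)
    standby = parse_controldata(standby_text).get(key)
    return primary is not None and primary == standby


def read_controldata(data_dir):
    """Run pg_controldata against a data directory and return its output."""
    result = subprocess.run(
        ["pg_controldata", data_dir],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

A wrapper script could call read_controldata() on the primary's data
directory and refuse to run rsync unless is_cleanly_shut_down() returns
True.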

The rsync while everything is down should be pretty fast, unless you
have unlogged tables that are big (in which case, you should truncate
them before shutting down the primary) or temporary tables left around
(which you should clean up) or just generally other things that a
replica doesn't normally have.
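
To find the unlogged tables in question, something along these lines
might help. It is a sketch under assumptions: the helper names are
hypothetical, the query relies on pg_class.relpersistence being 'u' for
unlogged relations, and it has to be run in each database separately.
The generated TRUNCATE statements throw the table contents away, so
they are meant to be reviewed rather than executed blindly.

```python
# Lists ordinary unlogged tables in the current database, largest first.
UNLOGGED_TABLES_SQL = """
SELECT n.nspname, c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relpersistence = 'u'
  AND c.relkind = 'r'
ORDER BY pg_total_relation_size(c.oid) DESC
"""


def quote_ident(name):
    """Double-quote an SQL identifier, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def truncate_statements(tables):
    """Build one TRUNCATE statement per (schema, table) pair."""
    return [
        f"TRUNCATE TABLE {quote_ident(schema)}.{quote_ident(table)};"
        for schema, table in tables
    ]
```

The rows returned by UNLOGGED_TABLES_SQL (through psql or any client
library) can be passed straight to truncate_statements() to produce a
script to run just before shutting down the primary.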

If you can't have any downtime during this process then, imv, the answer
is to build out a new replica that will essentially be a 'throw-away',
move all the read load over to it and then go through the documented
pg_upgrade process with the primary and the other replicas, then flip
the traffic back to the primary + original replicas and then you can
either throw away the replica that was kept online or rebuild it using
the traditional methods of pg_basebackup (or for a larger system, you
could use pgbackrest which can run in parallel and is much, much faster
than pg_basebackup).

> If what we are doing is wrong, we need to run `rsync` before starting the primary instance, that would mean that the
> primary and the standby are not usable if pg10 doesn't start correctly in the primary, right?

This is another reason why it's good to have an independent replica, as
it can be a fail-safe if things go completely south (you can just
promote it and have it be the primary and then rebuild replicas using
the regular backup+restore method and figure out what went wrong with
the pg10 migration).

Thanks!

Stephen
