Re: upgrading from pg 9.3 to 10

From: bricklen
Subject: Re: upgrading from pg 9.3 to 10
Date:
Msg-id: CAGrpgQ-fe5XAP0i-AkZEEnrNYv2m4fGEu7PVAAzKbarXnhgRhw@mail.gmail.com
In reply to: Re: upgrading from pg 9.3 to 10  (Stephen Frost <sfrost@snowman.net>)
List: pgsql-general
Hi Stephen, thanks for the feedback, it is much appreciated!

On Mon, Aug 20, 2018 at 12:15 PM Stephen Frost <sfrost@snowman.net> wrote:
* bricklen (bricklen@gmail.com) wrote:
> Using pg_upgrade, it is definitely possible to upgrade 9.3 to 10 in one
> jump. We did this in production earlier this year for 1500 Postgres
> clusters.
> At https://bricklen.github.io/2018-03-27-Postgres-10-upgrade/ I documented
> (in excruciating detail, perhaps) how we upgraded those clusters from 9.3
> to 10 with sub-15 minute downtime per cluster. The only real issues we ran
> into were some corrupted indexes that appeared to be related to 10.1 and
> 10.2. Upgrading to 10.3/10.4 and reindexing fixed those particular
> corrupted indexes.

Interesting write-up.  A few points:

#1: I'd strongly recommend including something in the write-up about
checking for unlogged tables.  Those can add quite a bit of time to the
rsync if they aren't truncated before the primary cluster is shut down.

A note about UNLOGGED has been added to the page, thanks.
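
For anyone following the same path, the unlogged tables are easy to enumerate ahead of time. A rough sketch of the check (database name and port are placeholders, not the values from our environment):

    # List UNLOGGED tables (relpersistence = 'u') in the old cluster, so they can
    # be truncated before the old primary is shut down; otherwise the rsync has to
    # ship their heap files to the standby.
    psql -p 5432 -d mydb -Atc "
      SELECT format('%I.%I', n.nspname, c.relname)
        FROM pg_class c
        JOIN pg_namespace n ON n.oid = c.relnamespace
       WHERE c.relpersistence = 'u' AND c.relkind = 'r';"

    # Then, per table (only if the contents can be discarded or repopulated later):
    # psql -p 5432 -d mydb -c 'TRUNCATE some_schema.some_unlogged_table;'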

 
#2: The issue with timelines leads me to suspect that you had a
restore_command defined and that when PG started up, it found the
timeline history files from the old cluster.  If you don't have a
restore_command set for any of these then I'm very curious what
happened.  The other possibility (though I wouldn't have expected a
timeline complaint from PG...) is that the replica wasn't fully up to
date for whatever reason.

Entirely plausible; unfortunately, I don't recall what the settings were in the replicas, nor what the actual errors were. One puzzling aspect was that every one of those 1500 clusters had the same primary and replica settings, and every one of them had been failed back and forth dozens of times before we upgraded to PG10.
The replica not being fully up to date makes more sense than rsync copying over history files only part of the time, since we were less focused on the remote replica than on ensuring the primary was successfully upgraded.

 
#3: There's a number of checks discussed in the upgrade documentation
around using the rsync-based method, but it doesn't seem like you did
those.  A mention of them might be good. 

If you mean #8 from the docs, where it says "If you are upgrading standby servers using methods outlined in section Step 10, verify that the old standby servers are caught up by running pg_controldata against the old primary and standby clusters. Verify that the “Latest checkpoint location” values match in all clusters. (There will be a mismatch if old standby servers were shut down before the old primary.)", we found that the pg_controldata output from the hot standby never matched the output from the primary. When I was investigating at the time, I read a note somewhere that if the replica was not using streaming replication, the pg_controldata output was unlikely to match. It's possible I misunderstood (or I've forgotten what I read), but the output did not match in the dozen-ish times we tested that particular aspect. It's entirely possible I bungled something, but I was careful to checkpoint, switch WAL files, and shut down the primary cleanly, then check that all the WALs from the primary had been shipped to the hot standby and applied successfully, before gracefully shutting down the hot standby. In all that testing, the systems were quiescent, as we'd already blocked access to the primary.
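
For reference, the comparison we were running amounted to roughly the following, first on the old primary after its clean shutdown and then on the old hot standby after it had replayed everything and been shut down (data directory paths are placeholders):

    # On the old primary:
    pg_controldata /var/lib/pgsql/9.3/data | grep 'Latest checkpoint location'

    # On the old hot standby:
    pg_controldata /var/lib/pgsql/9.3/data | grep 'Latest checkpoint location'

Per the docs those two values should match; in our testing they never did.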
If you meant other checks, I'd be happy to add a note about them if you can point out which ones they were. It's possible we overlooked something in our planning and execution of the pg_upgrade steps.

 
Note that these are
particularly important because the rsync will *not* copy over changes to
the data files except in the relatively rare case of the relfilenode's
size changing (given that most of them are 1G, that's not too likely).
The note you have about the rsync taking more time due to "if the
remote replica was fully caught up when it was shut down" isn't
accurate- there is no WAL replay that happens on the replica using this
method to 'catch up' and if WAL replay was required to have this process
be correct then it simply wouldn't ever work.

No doubt the problem exists between my ears and I'm making invalid assumptions about the problems we ran into.
What I meant to say (and please correct me if this is still inaccurate) was that once the primary was stopped cleanly, all of the WAL it had generated needed to be shipped to the remote replica and applied there, bringing the replica to a point where it was near-identical to the primary. At that point, the rsync would only ship the changes created by the pg_upgrade process itself. The point about being "caught up" was that if the replica hadn't applied all the WAL from the primary, more of its data files would differ from the primary's, resulting in more data to rsync. I'm happy to be corrected if that's wrong.
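
To be concrete about what I mean by "more data to rsync", the command we based ours on is the hard-link form from the standby-upgrade steps in the pg_upgrade docs, roughly like this (paths and hostname are placeholders):

    # Run from the primary after pg_upgrade --link has finished and the old and new
    # clusters on both hosts are shut down. --hard-links plus --size-only means
    # unchanged relation files are recreated on the standby as hard links to the
    # files already present in its old cluster directory, rather than re-transferred.
    rsync --archive --delete --hard-links --size-only --no-inc-recursive \
          /var/lib/pgsql/9.3 \
          /var/lib/pgsql/10 \
          standby.example.com:/var/lib/pgsql

As I understand it, that is exactly your point: nothing in this step replays WAL or otherwise "catches up" the data files themselves.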

 
#4: pg_upgrade absolutely resets the timeline to '1', and you shouldn't
ever copy over history files from the old cluster to the new cluster.
The new replicas will *also* be on timeline '1'.

There was no explicit copying over of a history file, but there was nothing preventing that from happening either, so it's probable the rsync did indeed copy it.

 
#5: There's no such thing as a 'timeline decrement'.  The new cluster
(either on the primary or the replica) should only ever see itself as
being on timeline '1' when starting up after the pg_upgrade and before a
promotion happens.

Poor choice of words. pg_upgrade resets the timeline to 1, but in several cases the replica (after the rsync completed) failed to start because it was looking for a higher timeline. That leads me to believe the history file you mentioned above should have been excluded from the replica.
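
If I were doing it again I would explicitly keep those files out of the upgraded standby's WAL directory before its first start. A sketch, with a placeholder path:

    # Timeline history files carried over from the old cluster can make the upgraded
    # standby look for a timeline higher than 1. Clear them out before the first
    # start, then confirm which timeline the cluster thinks it is on.
    rm -f /var/lib/pgsql/10/data/pg_wal/*.history
    pg_controldata /var/lib/pgsql/10/data | grep -i 'TimeLineID'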

 
#6: In note 33, it's unclear what this is referring to.  There's no WAL
which would have been generated by the pg_upgrade (that process is not
WAL'd).  Perhaps some activity needed to be done on the primary before a
new restorepoint would happen on the replica, due to how pg_basebackup
needs a restorepoint to begin from when working on a replica.

By step #33, both the active primary and the main replica have been started and are accepting client traffic.  This step is for creating the hot standby at the end of the line ("end of the line" as shown in the image at the top of the page), so the WAL isn't from pg_upgrade, but from normal user changes.
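
For what it's worth, here is a sketch of how that end-of-the-line standby gets built, and how I'd now hurry along the restartpoint you describe (host names and paths are placeholders, not the exact commands from the write-up):

    # A fresh checkpoint on the new primary puts a checkpoint record into the WAL
    # stream; once the replica has replayed it, a CHECKPOINT there forces a
    # restartpoint, which is what pg_basebackup needs to start from on a standby.
    psql -h primary.example.com -d postgres -c 'CHECKPOINT;'
    psql -h replica.example.com -d postgres -c 'CHECKPOINT;'

    # Build the downstream hot standby from the replica (PG 10 syntax;
    # the target directory must be empty).
    pg_basebackup -h replica.example.com -D /var/lib/pgsql/10/data \
                  --wal-method=stream --progress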

 
Thanks!

Stephen

Thanks a lot for reviewing my write-up, and for your explanations. Any corrections are welcome; all errors in that document are mine, and I'd be pleased to fix them.
