Re: Better Upgrades

Поиск
Список
Период
Сортировка
От Craig Ringer
Тема Re: Better Upgrades
Дата
Msg-id CAMsr+YHYMfq7f36nsObuzrnOUKGSyGtieh7rb36-+vP9AHw=7g@mail.gmail.com
обсуждение исходный текст
Ответ на Better Upgrades  (David Fetter <david@fetter.org>)
Список pgsql-hackers
On 6 February 2018 at 08:09, David Fetter <david@fetter.org> wrote:
Folks,

While chatting with Bruce about how to make something better than
pg_upgrade, we (and by "we," I mean mostly Bruce) came up with the
following.

What needs improvement:

- pg_upgrade forces a down time event, no matter how cleverly it's done.
- pg_upgrade is very much a blocker for on-disk format changes.

The proposal:

- Add a new script--possibly Perl or Bash, which would:
    - Initdb a new cluster with the new version of PostgreSQL and a
      different port.
    - Start logical replication from the old version to the new
      version.
    - Poll until a pre-determined default amount of replication lag was observed, then:
      * Issue an ALTER SYSTEM on the new server to change its port to the old server's
      * Issue a pg_ctl stop -w to the old server
      * Issue a pg_ctl restart on the new server
      * Happiness!

Assumptions underlying it:

- Disk and similar resources are cheap enough for most users that
  doubling up during the upgrade is feasible.
- The default upgrade path should require exactly one step.
- Errors do not, by and large, have the capacity to violate an SLA.

The proposal has blockers:

- We don't actually have logical decoding for DDL, although I'm given
  to understand that Álvaro Herrera has done some yeoman follow-up
  work on Dimitri Fontaine's PoC patches.

Yep, some DDL support would be key. Lots of apps expect to do DDL online (automatic migrations, etc), and after all it's been one of Pg's selling points for a long time. Not having it means people have to do much more prep - and *will* ignore that prep and break things if you expose it as an easy to use tool.

You will find dealing with DDL that does full table rewrites to be a challenge. Especially ALTER TABLEs that add a DEFAULT using a non-IMMUTABLE expression and ALTER TYPE ... USING with a non-IMMUTABLE expression. But even for the immutable cases some work is needed to let us cleanly replicate the DDL.

Some DDL may initially need to be disallowed because it plays poorly with logical decoding, e.g. CREATE INDEX CONCURRENTLY. It's fixable, it just requires special case handling where the apply side understands that specific statement and knows how to recover if we have an error partway through applying it.

We (2ndQuadrant) routinely do online migrations like this using pglogical, which is a much more capable tool than in-core logical replication currently is. But it still requires careful preparation and application qualification, and preferably a trial run. Improvements in schema change / DDL handling would be needed to address that.

Logical decoding doesn't support a number of postgres features, which still rules its use out for some customers. No sequence support (though pglogical works around that with periodic sequence sync). No pg_largeobject support. Probably more I'm forgetting.

Long txns and txns that change huge numbers of rows, especially if they also dirty the catalogs (even temp tables!) are a challenge. Performance isn't great for them, but more importantly we don't start decoding them until they commit so the replication latency for them can be very large. Care is required to time a cutover so you don't land up stopping writes to the old db then waiting ages before the new db is caught up.

 
- We don't have logical decoding for DCL (GRANT/REVOKE)

In general there are issues with any command affecting the cluster as a whole: capturing them, ensuring replay order between them if run in different dbs, etc.

The simplest option would be to disallow them during migration (make them ERROR). Or possibly restrict them to a single DB where we permit them and add them to a DDL replication queue.

If you let them run in any db and capture them wherever they ran you run into order-of-apply issues because the logical change streams aren't synchronised between the DBs.
 

We also came up with and, we believe, addressed an important issue,
namely how to ensure continuity.  When we issue a `pg_ctl stop -w`,
that's short for "Cancel current commands and stop cleanly."  At this
point, the new server will not have WAL to replay, so a pg_ctl restart
will load the new configuration and come up pretty much immediately,
and the next try will find a brand new server without a down time
event.

You'll need to be able to make sure the old server stays up after the last user commit  for long enough to flush all pending WAL to the new server. There may be committed changes that the new server hasn't applied yet.

If the old server crashes during the grace period you'll have to restart it and retry, since you don't know for sure if the new server got all the changes.

You'll want some kind of read-only mode where you can ensure that exactly one server is writeable by user queries at a time. And to prevent conflicts there'll be some period where both are read-only during the final catchup phase; you have to make the old server read-only, wait until the new server applies pending changes, then make the new server read/write. Some care about when that's done will minimise the write-downtime. And users who're willing to accept conflicts can always force read/write earlier on the new server. But only if they're not using SEQUENCEs.

The cut-over period is needed to make sure all sequences are fully synced over too.

And if 2PC is in use, the cutover must be done during a period where there are no prepared txns on the old master, since we don't currently decode and send txns until COMMIT PREPARED time. So we'd lose prepared-but-not-committed txns if we cut over while they existed.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: WIP: BRIN multi-range indexes
Следующее
От: Tatsuo Ishii
Дата:
Сообщение: Re: [HACKERS] [PATCH] Lockable views