Re: [HACKERS] WIP: Failover Slots

From: Craig Ringer
Subject: Re: [HACKERS] WIP: Failover Slots
Date:
Msg-id: CAMsr+YH0y2V9615s7Aeedzs3DvgrWBt28LP2K3FQi6uKKfJjMw@mail.gmail.com
In reply to: Re: [HACKERS] WIP: Failover Slots (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: [HACKERS] WIP: Failover Slots (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On 3 August 2017 at 04:35, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Jul 25, 2017 at 8:44 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> > No. The whole approach seems to have been bounced from core. I don't agree
> > and continue to think this functionality is desirable but I don't get to
> > make that call.

> I actually think failover slots are quite desirable, especially now
> that we've got logical replication in core.  In a review of this
> thread I don't see anyone saying otherwise.  The debate has really
> been about the right way of implementing that.  Suppose we did
> something like this:
>
> - When a standby connects to a master, it can optionally supply a list
> of slot names that it cares about.

Wouldn't that immediately exclude use for PITR and snapshot recovery? I have people right now who want the ability to promote a PITR-recovered snapshot into the place of a logical replication master and have downstream peers replay from it. It's more complex than that, as there's a resync process required to recover changes the failed node had sent to other peers but that aren't available in the WAL archive, but that's the gist.

If you have a 5TB database, do you want to run an extra replica or two because PostgreSQL can't preserve slots without a running, live replica? Your SAN snapshots + WAL archiving have been fine for everything else so far.
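To make that concrete, the first thing you'd check after promoting such a restore is something like the following, using nothing beyond the stock catalog view:

    -- Which logical slots did downstream peers depend on, and did they
    -- survive the restore?  Slot state isn't WAL-logged, so on unpatched
    -- Pg 10 the restored node's logical slots come back missing or frozen
    -- at snapshot time, which is the gap failover slots were meant to fill.
    SELECT slot_name, plugin, restart_lsn, confirmed_flush_lsn
    FROM pg_replication_slots
    WHERE slot_type = 'logical';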

Requiring live replication connections could also be an issue during service interruptions, surely?  Unless you persist the needed knowledge in the physical replication slot used by the standby-to-master connection, so the master can tell the difference between "downstream went away for a while but will come back" and "downstream is gone forever, toss out its resources."

That's exactly what the catalog_xmin hot_standby_feedback patches in Pg10 do, but they can only tell the master about the oldest resources needed by any existing slot on the replica. Not which slots. And they have the same issues with needing a live, running replica.
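For reference, the moving parts there look roughly like this (the physical slot name is purely illustrative):

    # standby, postgresql.conf
    hot_standby_feedback = on
    # standby, recovery.conf (Pg 10)
    primary_slot_name = 'standby1_phys'

    -- master: the standby's feedback shows up on its physical slot,
    -- including (with the Pg 10 changes) the separately reported catalog_xmin
    SELECT slot_name, xmin, catalog_xmin
    FROM pg_replication_slots
    WHERE slot_name = 'standby1_phys';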

Also, what about cascading? Lots of "pull" model designs I've looked at tend to fall down in cascaded environments. For that matter so do failover slots, but only with the narrower restriction of not being able to actually decode from a failover-enabled slot on a standby; they still work fine in terms of cascading down to leaf nodes.

> - The master responds by periodically notifying the standby of changes
> to the slot contents using some new replication sub-protocol message.
> - The standby applies those updates to its local copies of the slots.

That's pretty much what I expect to have to do for clients to work on unpatched Pg10, probably using a separate bgworker and normal libpq connections to the upstream since we don't have hooks to extend the walsender/walreceiver.

It can work now that the catalog_xmin hot_standby_feedback patches are in, but it'd require some low-level slot state setting that I know Andres is not a fan of. So I expect to carry on relying on an out-of-tree failover slots patch for Pg 10.
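Concretely, the upstream half of such a (hypothetical) slot-mirroring bgworker is little more than polling this over an ordinary libpq connection; it's writing the results into the standby's own copies of the slots that has no supported interface in stock Pg 10:

    -- polled on the master by the slot-mirroring bgworker
    SELECT slot_name, plugin, restart_lsn, confirmed_flush_lsn, catalog_xmin
    FROM pg_replication_slots
    WHERE slot_type = 'logical';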

 
> So, you could create a slot on a standby with an "uplink this" flag of
> some kind, and it would then try to keep it up to date using the
> method described above.  It's not quite clear to me how to handle the
> case where the corresponding slot doesn't exist on the master, or
> initially does but then it's later dropped, or it initially doesn't
> but it's later created.
>
> Thoughts?

Right. So the standby must be running and in active communication. It needs some way to know the master has confirmed slot creation and that it can rely on the slot's resources really being reserved by the master. That turns out to be quite hard, per the decoding-on-standby patches. There needs to be some way to tell the master a standby has gone away forever and to drop its dependent slots, so you're not stuck wondering "is slot xxyz from standby abc that we lost in that crash?". Standbys also need to cope with having created a slot, only to find out there's a name collision with the master.

For all those reasons, I just extended hot_standby_feedback to report catalog_xmin separately to upstreams instead, so the existing physical slot serves all these needs. That's part of the picture, but there's still no way to get slot position change info from the master back down onto the replicas, so the replicas can advance any of their own slots and, via feedback, free up master resources. That's where the bgworker hack to query pg_replication_slots comes in. It seems complex, full of restrictions, and fragile to me compared to just expecting the master to do it.

The only objection I personally understood and accepted re failover slots was that it'd be impossible to create a failover slot on a standby and have that standby "sub-tree" support failover to leaf nodes. Which is true, but instead we have nothing, and no viable-looking roadmap toward anything users can benefit from. So I don't think that's the worst restriction in the world.

I do not understand why logical replication slots are exempt from our usual policy that anything that works on the master should be expected to keep working after failover to a standby. Is there anything persistent across crashes for which that's not the case, except grandfathered-in hash indexes? We're hardly going to say "hey, it's ok to forget about prepared xacts when you fail over to a standby", yet this failover problem with slots in logical decoding and replication is the same sort of showstopper for users of that functionality.

In the medium term I've given up making progress with getting something simple and usable into user hands on this. A tweaked version of failover slots is being carried as an out-of-tree on-disk-format-compatible patch instead, and it's meeting customer needs very well. I've done my dash here and moved on to other things where I can make more progress.

I'd like to continue working on logical decoding on standby support for Pg 11 too, but even if we can get that in place it'll only work for reachable, online standbys. Every application that uses logical decoding will have to maintain a directory of standbys (which it has no way to ask the master for) and advance its slots on each of them via extra walsender connections. They'll do a bunch of unnecessary decoding of WAL they don't need, just to throw the data away. It won't help the PITR and snapshot use cases at all. So for now I'm not able to allocate much priority to that.
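By "unnecessary work" I mean each application ends up doing something like the following against every standby, per slot, whether over a walsender connection (pg_recvlogical) or via the SQL interface, purely so the standby-side slot advances and releases WAL and catalog resources (slot name made up):

    -- decode everything the slot has pending and throw it away
    SELECT count(*)
    FROM pg_logical_slot_get_changes('app_slot', NULL, NULL);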

I'd love to get failover slots in, I still think it's the simplest and best way to do what users need. It doesn't stop us progressing with decoding on standby or paint us into any corners.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
