Re: [HACKERS] WIP: Failover Slots

From Craig Ringer
Subject Re: [HACKERS] WIP: Failover Slots
Msg-id CAMsr+YGbc7vnf+BAEYxbEGG6=4hp=yJ5YXTgZAFm2fqyGa1hew@mail.gmail.com
In reply to Re: [HACKERS] WIP: Failover Slots  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 11 August 2017 at 01:02, Robert Haas <robertmhaas@gmail.com> wrote:
 
Well, anybody's welcome to write code without discussion and drop it to the
list, but if people don't like it, that's the risk you took by not
discussing it first.

Agreed. A patch materializing doesn't mean it should be committed, and there wasn't prior design discussion on this.

It can be hard to elicit that discussion without a patch, but clearly not always; we're doing a good job of it here.
 
> When a replica connects to an upstream it asks via a new walsender msg "send
> me the state of all your failover slots". Any local mirror slots are
> updated. If they are not listed by the upstream they are known deleted, and
> the mirror slots are deleted on the downstream.

What about slots not listed by the upstream that are currently in use?

Yes, it'll also need to send a list of its locally owned and up-mirrored failover slots to the upstream so the upstream can create them or update their state.
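
To make the connect-time exchange concrete, here is a rough sketch in Python. It is only an illustration of the idea, not the walsender protocol itself: `SlotKind`, the `Slot` fields and `reconcile_on_connect` are all names I made up for this example.

```python
from dataclasses import dataclass
from enum import Enum


class SlotKind(Enum):
    OWNED = "owned"              # created locally, may be decoded from
    DOWN_MIRROR = "down-mirror"  # copy of a slot owned further up the tree
    UP_MIRROR = "up-mirror"      # copy of a slot owned further down the tree


@dataclass
class Slot:
    name: str
    kind: SlotKind
    restart_lsn: int
    catalog_xmin: int


def reconcile_on_connect(local, upstream_failover_slots):
    """Hypothetical connect-time sync on a replica.

    local: dict of name -> Slot held by this replica (modified in place).
    upstream_failover_slots: dict of name -> Slot reported by the upstream.
    Returns the slots this replica must report to the upstream.
    """
    # Down-mirrors missing from the upstream's list are known deleted.
    stale = [n for n, s in local.items()
             if s.kind is SlotKind.DOWN_MIRROR
             and n not in upstream_failover_slots]
    for name in stale:
        del local[name]

    # Create or refresh a down-mirror for every slot the upstream listed.
    for name, up in upstream_failover_slots.items():
        mine = local.get(name)
        if mine is not None and mine.kind is not SlotKind.DOWN_MIRROR:
            continue  # owned or up-mirrored here: state flows the other way
        local[name] = Slot(name, SlotKind.DOWN_MIRROR,
                           up.restart_lsn, up.catalog_xmin)

    # Owned and up-mirrored slots are sent up so the upstream can
    # create them or update their state.
    return [s for s in local.values()
            if s.kind in (SlotKind.OWNED, SlotKind.UP_MIRROR)]
```

The point of the last step is that a slot the upstream doesn't list is only dropped if it is a down-mirror; anything owned here or mirrored up from below is reported to the upstream instead.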

> There's one big hole left here. When we create a slot on a cascading leaf or
> inner node, it takes time for hot_standby_feedback to propagate the needed
> catalog_xmin "up" the chain. Until the master has set the needed
> catalog_xmin on the physical slot for the closest branch, the inner node's
> slot's catalog_xmin can only be tentative pending confirmation. That's what
> a whole bunch of gruesomeness in the decoding on standby patch was about.
>
> One possible solution to this is to also mirror slots "up", as you alluded
> to: when you create an "owned" slot on a replica, it tells the master at
> connect time / slot creation time "I have this slot X, please copy it up the
> tree". The slot gets copied "up" to the master via cascading layers with a
> different failover slot type indicating it's an up-mirror. Decoding clients
> aren't allowed to replay from an up-mirror slot and it cannot be promoted
> like a down-mirror slot can, it's only there for resource retention. A node
> knows its owned slot is safe to actually use, and is fully created, when it
> sees the walsender report it in the list of failover slots from the master
> during a slot state update.

I'm not sure that this actually prevents the problem you describe.  It
also seems really complicated.  Maybe you can explain further; perhaps
there is a simpler solution (or perhaps this isn't as complicated as I
currently think it is).


It probably sounds more complex than it is. When a slot is created on a standby, it is created tentatively and marked not yet ready for use. It flows "up" to the master, where it's created as permanent/ready. The permanent/ready state then flows back down to the creator.

When we see a temp slot become permanent, we copy the restart_lsn/catalog_xmin/confirmed_flush_lsn from the upstream slot in case the master had to advance them from our tentative values when it created the slot. After that, slot state updates only flow "out" from the owner: up the tree for up-mirror slots, down the tree for down-mirror slots.
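
As a toy model of that lifecycle (tentative on the standby, permanent on the master, then permanent back down the chain), something like the Python below may help. Everything in it is hypothetical: the `Node` and `Slot` classes, the `floor_lsn`/`floor_xmin` stand-ins for "what the master can still retain", and the method names. It tracks only restart_lsn and catalog_xmin; confirmed_flush_lsn would be copied the same way.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    name: str
    kind: str          # "owned" or "up-mirror" (labels are illustrative)
    permanent: bool    # False while the slot is still tentative
    restart_lsn: int
    catalog_xmin: int


class Node:
    def __init__(self, name, upstream=None, floor_lsn=0, floor_xmin=0):
        self.name = name
        self.upstream = upstream      # None means this node is the master
        self.floor_lsn = floor_lsn    # assumed: oldest WAL the master has
        self.floor_xmin = floor_xmin  # assumed: oldest catalog_xmin it holds
        self.slots = {}

    def create_owned_slot(self, name, restart_lsn, catalog_xmin):
        """Create a slot here; on a standby it starts out tentative."""
        is_master = self.upstream is None
        slot = Slot(name, "owned", is_master, restart_lsn, catalog_xmin)
        self.slots[name] = slot
        if not is_master:
            self.upstream.receive_up_mirror(slot)

    def receive_up_mirror(self, below):
        """Create an up-mirror of a slot that lives further down the tree."""
        if self.upstream is None:
            # On the master the mirror is permanent at once, with tentative
            # values advanced to what the master can actually retain.
            self.slots[below.name] = Slot(
                below.name, "up-mirror", True,
                max(below.restart_lsn, self.floor_lsn),
                max(below.catalog_xmin, self.floor_xmin))
        else:
            mirror = Slot(below.name, "up-mirror", False,
                          below.restart_lsn, below.catalog_xmin)
            self.slots[below.name] = mirror
            self.upstream.receive_up_mirror(mirror)

    def sync_from_upstream(self):
        """Process one slot listing from the upstream."""
        for name, up in self.upstream.slots.items():
            mine = self.slots.get(name)
            if mine is not None and not mine.permanent and up.permanent:
                # One-time copy when the slot first becomes permanent.
                mine.restart_lsn = up.restart_lsn
                mine.catalog_xmin = up.catalog_xmin
                mine.permanent = True

    def start_decoding(self, name):
        slot = self.slots[name]
        if not slot.permanent:
            raise RuntimeError(
                f"slot {name} still being created on master, not ready")
        return slot


# A <- B <- C, with a logical slot "X" created on the leaf "C".
a = Node("A", floor_lsn=120, floor_xmin=60)
b = Node("B", upstream=a)
c = Node("C", upstream=b)
c.create_owned_slot("X", restart_lsn=100, catalog_xmin=50)
```

Until `b.sync_from_upstream()` and then `c.sync_from_upstream()` have run, `c.start_decoding("X")` raises; afterwards the slot on "C" carries the master's adjusted values rather than its own tentative ones.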

A diagram may help. I focused only on the case of a logical slot created on a standby, since I think we're happy with the rest already and I don't want to complicate it.

GMail will probably HTMLize this, sorry:


                          Phys rep          Phys rep
                          using phys        using
                          slot "B"          phys slot "C"
                +-------+         +--------+         +-------+
 T              |  A    <---------+ B      <---------+ C     |
 I              |       |         |        |         |       |
 M              +-------+         +--------+         +-------+
 E                 |                  |                  |
 |                 |                  |                  |CREATEs
 |                 |                  |                  |logical slot X
 v                 |                  |                  |("owned")
                   |                  |                  |as temp slot
                   |                  +<-----------------+
                   |                  |Creates upmirror  |
                   |                  |slot "X" linked   |
                   |                  |to phys slot "C"  |
                   |                  |marked temp       |
                   | <----------------+                  |
                   |Creates upmirror  |                  | <--------------------------+   +-----------------+
                   |slot "X" linked   |                  |   Attempt to decode from "X"   |                 |
                   |to phys slot "B"  |                  |                                | CLIENT          |
                   |marked permanent  |                  |  +------------------------->   |                 |
                   +----------------> |                  |   ERROR: slot X still being    +-----------------+
                   |                  |Sees upmirror     |   created on master, not ready
                   |                  |slot "X" in       |
                   |                  |list from "A",    |
                   |                  |marks it          |
                   |                  |permanent and     |
                   |                  |copies state      |
                   |                  +----------------> |
                   |                  |                  |Sees upmirror slot
                   |                  |                  |"X" on "B" got marked
                   |                  |                  |permanent (because it
                   |                  |                  |appears in B's slot
                   |                  |                  |listings),
                   |                  |                  |marks permanent on C.
                   |                  |                  |Copies state.
                   |                  |                  |
                   |                  |                  |Slot "X" now persistent
                   |                  |                  |and (when decoding on standby
                   |                  |                  |supported) can be used for decoding
                   |                  |                  |on standby.
                   +                  +                  +



To actually use the slot once decoding on standby is supported: a decoding client on "C" can consume xacts and cause slot "X" to advance catalog_xmin, confirmed_flush_lsn, etc. The walreceiver on "C" will tell the walsender on "B" about the new slot state, so it gets synced up-tree; B will then tell A.

Since the slot is already marked permanent, state won't get copied back down-tree; that only happens once, when the slot is first fully created on the master.
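
A small Python sketch of that steady-state flow, under the same caveat as before: `Node`, `SlotState`, `advance_owned`, `report_up` and `on_upstream_listing` are names invented for illustration, and the method calls stand in for walreceiver/walsender messages.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SlotState:
    catalog_xmin: int
    confirmed_flush_lsn: int
    permanent: bool = True


@dataclass
class Node:
    name: str
    upstream: Optional["Node"] = None
    slots: dict = field(default_factory=dict)

    def advance_owned(self, slot, catalog_xmin, confirmed_flush_lsn):
        """A decoding client consumed xacts from an owned slot here."""
        s = self.slots[slot]
        s.catalog_xmin = max(s.catalog_xmin, catalog_xmin)
        s.confirmed_flush_lsn = max(s.confirmed_flush_lsn,
                                    confirmed_flush_lsn)
        self.report_up(slot)

    def report_up(self, slot):
        """Tell the upstream about the new state, one hop at a time."""
        if self.upstream is None:
            return
        mine = self.slots[slot]
        up = self.upstream.slots[slot]
        up.catalog_xmin = mine.catalog_xmin
        up.confirmed_flush_lsn = mine.confirmed_flush_lsn
        self.upstream.report_up(slot)

    def on_upstream_listing(self, slot, listed):
        """Copy state down only while the local slot is still tentative."""
        mine = self.slots[slot]
        if mine.permanent:
            return False  # already fully created: never copied back down
        mine.catalog_xmin = listed.catalog_xmin
        mine.confirmed_flush_lsn = listed.confirmed_flush_lsn
        mine.permanent = True
        return True


a = Node("A", slots={"X": SlotState(60, 120)})
b = Node("B", upstream=a, slots={"X": SlotState(60, 120)})
c = Node("C", upstream=b, slots={"X": SlotState(60, 120)})
c.advance_owned("X", catalog_xmin=75, confirmed_flush_lsn=300)
```

After the call on "C", the up-mirrors on "B" and "A" hold the advanced values, and a stale listing arriving from "B" is ignored on "C" because the slot is already permanent there.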

Some node "D" can exist as a phys rep of "C". If C fails and is replaced with D, the admin can promote the down-mirror slot on "D" to an owned slot.
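
The promotion rule is simple enough to state as code. This is a hypothetical sketch only; `promote_slot` and the string labels are made up here, and no such function exists in any patch.

```python
OWNED, DOWN_MIRROR, UP_MIRROR = "owned", "down-mirror", "up-mirror"


def promote_slot(slots, name):
    """Turn a down-mirror slot into an owned slot after failover.

    slots maps slot name -> kind and is modified in place.
    Up-mirrors exist only for resource retention and are refused.
    """
    kind = slots[name]
    if kind == UP_MIRROR:
        raise ValueError(f"slot {name} is an up-mirror and cannot be promoted")
    if kind == DOWN_MIRROR:
        slots[name] = OWNED
    return slots[name]


# "D" was a phys rep of "C" and has taken over from it.
slots_on_d = {"X": DOWN_MIRROR}
promote_slot(slots_on_d, "X")
```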


Make sense?
 

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
