Re: [HACKERS] WIP: Failover Slots

From: Craig Ringer
Subject: Re: [HACKERS] WIP: Failover Slots
Msg-id: CAMsr+YGX_p9M9mj8X1ExWAWVjis6bQHjqPmUVhZA2_KkYYJ0EQ@mail.gmail.com
In reply to: Re: [HACKERS] WIP: Failover Slots (Craig Ringer <craig@2ndquadrant.com>)
List: pgsql-hackers


On 14 August 2017 at 11:56, Craig Ringer <craig@2ndquadrant.com> wrote:

I don't want to block failover slots on decoding on standby just because decoding on standby would be nice to have.

However, during discussion with Thomas Munro a point has come up that does block failover slots as currently envisioned: silent timeline divergence. It's a solid reason why the current design and implementation are insufficient to solve the problem. This issue exists both with the original failover slots and with the model Robert and I were discussing.

Say a decoding client has replayed from master up to the commit of xid 42 at 1/1000 and confirmed flush, then a failover slots standby of the master is promoted. The standby has only received WAL from the failed master up to 1/500, with most recent xid 20. Now the standby runs some new xacts, pushing xid up to 30 at 1/1000, then continues inserting until xid 50 at lsn 1/2000.

Then the logical client reconnects. The logical client will connect to the failover slot fine, and start replay. But it'll ask for replay to start at 1/1000. The standby will happily fast-forward the slot (as it should), and start replay after 1/1000.

But now we have silent divergence in timelines. The logical replica has received and committed xacts 20...42 at lsn 1/500 through 1/1000, but these are not present on the promoted master. And the replica has skipped over the new master's xids 20...30 at lsns 1/500 through 1/1000, so they're present on the new master but not on the replica.
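To make the arithmetic in that scenario concrete, here is a minimal Python sketch that computes the silently divergent LSN range from two positions: the client's confirmed flush and the last WAL the standby received before promotion. It assumes PostgreSQL's textual `X/Y` LSN form (two hex halves of a 64-bit WAL position); the function names are hypothetical and nothing here is PostgreSQL code.

```python
def parse_lsn(text: str) -> int:
    """Convert PostgreSQL's textual LSN form 'X/Y' (hex halves) to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(value: int) -> str:
    """Convert an integer WAL position back to the 'X/Y' text form."""
    return f"{value >> 32:X}/{value & 0xFFFFFFFF:X}"


def divergent_range(client_confirmed: str, standby_received: str):
    """Return the half-open range (start, end] that diverges after promotion.

    On the old master's timeline this range holds xacts the logical client
    already applied but the promoted standby never received. On the new
    master's timeline the same LSNs hold new xacts that the client skips
    when its slot is fast-forwarded to client_confirmed. Returns None when
    the client was not ahead of the standby, i.e. nothing diverges.
    """
    confirmed = parse_lsn(client_confirmed)
    received = parse_lsn(standby_received)
    if confirmed <= received:
        return None
    return (format_lsn(received), format_lsn(confirmed))


# Numbers from the scenario: client confirmed 1/1000, standby had only 1/500.
print(divergent_range("1/1000", "1/500"))  # -> ('1/500', '1/1000')
```

Nothing in the LSNs alone distinguishes the two histories covering that range, which is why fast-forwarding the slot looks perfectly valid to the promoted standby.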

IMO, this shows that not including the timeline in replication origins was a bit of a mistake, since we'd trivially detect this if they were included - but it's a bit late now.  And anyway, detection would just mean logical rep would break, which doesn't help much.
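As an illustration of the detection a timeline-qualified origin would allow, the following hypothetical Python sketch checks a (timeline, LSN) origin position against the server's timeline history. It assumes a history mapping of ancestor timeline to switchpoint LSN, modelled loosely on the parent-TLI/switchpoint entries in PostgreSQL's timeline history files. Replication origins do not store a timeline today, so `origin_tli` and the function name are assumptions for illustration only.

```python
def parse_lsn(text: str) -> int:
    """Convert PostgreSQL's textual LSN form 'X/Y' (hex halves) to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def origin_is_on_history(origin_tli: int, origin_lsn: str,
                         server_tli: int, history: dict) -> bool:
    """Return True if the client's position is part of the server's WAL history.

    history maps each ancestor timeline of server_tli to the LSN at which
    the server switched away from it (the switchpoint). A position on an
    ancestor timeline is shared history only up to that switchpoint; past
    it, the client has seen WAL the server never had.
    """
    if origin_tli == server_tli:
        return True
    if origin_tli not in history:
        # Not an ancestor of the server's timeline: unrelated history.
        return False
    return parse_lsn(origin_lsn) <= parse_lsn(history[origin_tli])


# Promoted standby is on timeline 2, having branched from timeline 1 at 1/500.
history = {1: "1/500"}
print(origin_is_on_history(1, "1/1000", 2, history))  # -> False (diverged)
print(origin_is_on_history(1, "1/400", 2, history))   # -> True
```

As noted above, this only turns silent divergence into a hard error; it does not bring the logical client back into agreement with the new master.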

The simplest fix, though a rather limited one, is to require that failover candidates be in synchronous_standby_names, and delay ReorderBufferCommit sending the actual commit message until all peers in s_s_n confirm flush of the commit lsn. But that's not much good if you want sync rep for your logical connections too, and is generally a hack.

A more general solution requires that masters be told which peers are failover candidates, so they can ensure ordering between logical decoding and physical failover candidates. Which effectively adds another kind of sync rep, where we do "wait for physical failover candidates to flush, and only then allow logical decoding". This actually seems pretty practical with the design Robert and I discussed, but it's definitely an expansion in scope.
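A rough sketch of that ordering rule on the master side: decoded commits are held back until every configured failover candidate has flushed up to the commit LSN. This is a toy Python model, not ReorderBufferCommit or any walsender code; the candidate names, the toy xids and the `split_releasable` function are hypothetical, and it assumes the master already knows each candidate's flush position.

```python
def parse_lsn(text: str) -> int:
    """Convert PostgreSQL's textual LSN form 'X/Y' (hex halves) to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def split_releasable(pending, candidate_flush):
    """Split decoded commits into those that may be sent now and those that wait.

    pending: list of (commit_lsn, xid) in commit-LSN order, decoded but unsent.
    candidate_flush: mapping of failover candidate name -> flushed LSN.
    A commit may be sent only once every candidate has flushed up to its LSN,
    so a promoted candidate is never behind what a logical client has seen.
    """
    if not candidate_flush:
        # No failover candidates configured: nothing to wait for.
        return list(pending), []
    horizon = min(parse_lsn(lsn) for lsn in candidate_flush.values())
    ready = []
    for commit_lsn, xid in pending:
        if parse_lsn(commit_lsn) > horizon:
            break  # commits are in LSN order, so everything after waits too
        ready.append((commit_lsn, xid))
    return ready, list(pending[len(ready):])


pending = [("1/400", 18), ("1/900", 40), ("1/1000", 42)]
flush = {"standby_a": "1/950", "standby_b": "1/500"}
ready, waiting = split_releasable(pending, flush)
print(ready)    # -> [('1/400', 18)]
print(waiting)  # -> [('1/900', 40), ('1/1000', 42)]
```

Taking the minimum across candidates is what makes this another flavour of sync rep: the slowest failover candidate throttles what logical decoding may send.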

Alternatively, we could require the decoding clients to keep an eye on the flush/replay positions of all failover candidates and delay commit+confirm of decoded xacts until the upstream's failover candidates have received and flushed up to that lsn. That starts to look a lot like a decoding-on-standby based model for logical failover, where the downstream maintains slots on each failover candidate upstream.
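The downstream-side alternative could be modelled roughly like this: the client buffers decoded xacts and commits and confirms them only once every upstream failover candidate reports a flush position at or past the commit LSN. This is a hypothetical Python sketch; how the client learns those positions (for example by polling each candidate's `pg_last_wal_receive_lsn()`) is left out, and the class and method names are invented for illustration.

```python
from collections import deque


def parse_lsn(text: str) -> int:
    """Convert PostgreSQL's textual LSN form 'X/Y' (hex halves) to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(value: int) -> str:
    """Convert an integer WAL position back to the 'X/Y' text form."""
    return f"{value >> 32:X}/{value & 0xFFFFFFFF:X}"


class GatedApplier:
    """Holds decoded xacts until all upstream failover candidates have them."""

    def __init__(self, candidates):
        # Last reported flush LSN of each failover candidate; 0 = nothing yet.
        self.flushed = {name: 0 for name in candidates}
        # Decoded but not yet committed xacts, as (commit_lsn, xid) in LSN order.
        self.buffer = deque()
        # Highest commit LSN committed locally and confirmed to the upstream.
        self.confirmed = 0

    def on_decoded(self, commit_lsn: str, xid: int):
        self.buffer.append((parse_lsn(commit_lsn), xid))
        return self._drain()

    def on_candidate_progress(self, name: str, flush_lsn: str):
        # Flush positions only move forward; ignore stale reports.
        self.flushed[name] = max(self.flushed[name], parse_lsn(flush_lsn))
        return self._drain()

    def _drain(self):
        """Commit and confirm every buffered xact that all candidates flushed."""
        horizon = min(self.flushed.values()) if self.flushed else float("inf")
        committed = []
        while self.buffer and self.buffer[0][0] <= horizon:
            lsn, xid = self.buffer.popleft()
            committed.append(xid)   # stand-in for the local commit
            self.confirmed = lsn    # stand-in for confirming flush upstream
        return committed


applier = GatedApplier(["standby_a", "standby_b"])
applier.on_decoded("1/900", 40)  # held: no candidate has reported yet
applier.on_candidate_progress("standby_a", "1/1000")
print(applier.on_candidate_progress("standby_b", "1/950"))  # -> [40]
print(format_lsn(applier.confirmed))                       # -> 1/900
```

Every downstream ends up tracking every failover candidate here, which is why this approach starts to resemble a decoding-on-standby model.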

So yeah. More work needed here. Even if we suddenly decided the original failover slots model was OK, it's not sufficient to fully solve the problem.

(It's something I'd thought about for BDR failover, but never applied to failover slots: the problem of detecting or preventing divergence when the logical client is ahead of physical receive at the time the physical standby is promoted.)

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
