Re: [HACKERS] Replication origins and timelines

Поиск
Список
Период
Сортировка
От Craig Ringer
Тема Re: [HACKERS] Replication origins and timelines
Дата
Msg-id CAMsr+YE1SZL4oSB9V5U=tS=U6AxXNH8N8xFx-0hNq-BcOCSXeg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] Replication origins and timelines  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
On 1 June 2017 at 09:23, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2017-06-01 09:12:04 +0800, Craig Ringer wrote:
>> TL;DR: replication origins track LSN without timeline. This is
>> ambiguous when physical failover is present since XXXXXXXX/XXXXXXXX
>> can now represent more than one state due to timeline forks with
>> promotions. Replication origins should track timelines so we can tell
>> the difference, I propose to patch them accordingly for pg11.
>
> I'm not quite convinced that this should be tracked at the origin level.
> If you fail over physically, shouldn't we also reconfigure logical
> replication?
>
> Even if we decide this is necessary, I *strongly* suggest trying to get
> the existing standby decoding etc wrapped up before starting something
> nontrival afresh.

Yeah, I'm not thinking of leaping straight to a patch before we've got
the rep on standby stuff nailed down. I just wanted to raise early
discussion to make sure it's not entirely the wrong path and/or
totally hopeless for core.

Logical decoding output plugins would need to keep track of the
timeline and send an extra message informing the downstream of a
timeline change whenever they see a new timeline. Or include it in all
messages (see: extra overhead). Since we don't stop a decoding session
when we hit a timeline boundary and force re-connection. (Nor can we,
since at some point our restart_lsn will be on the old timeline but
the first commits will be on the new timeline). I'll need to think
about if/how the decoding plugin can reliably do that.

>> Take master A, its physical replica B, and logical decoding client X
>> streaming changes from A. B is lagging. A is at lsn 1/1000, B is only
>> at 1/500. C has replicated from A up to 1/1000, when A fails. We
>> promote B to replace A. Now C connects to B, and requests to resume at
>> LSN 1/1000.
>
> Wouldn't it be better to solve this by querying the new master's
> timeline history, and checking whether the current replay point is
> pre/post fork?

That could work.

The decoding client would need to track the last-commit timeline in
its own metadata if we're not letting it put it in the replication
origin. Manageable, if awkward.

Clients would need to know how to fetch and parse timeline history
files, which is an irritating thing for every decoding client that
wants to support failover to have to support. But I guess it's
manageable, if not friendly. And non-Pg-based downstreams would have
to do it anyway.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Stephen Frost
Дата:
Сообщение: Re: [HACKERS] Replication origins and timelines
Следующее
От: Stephen Frost
Дата:
Сообщение: Re: [HACKERS] Replication origins and timelines