Re: Synch Rep for CommitFest 2009-07

From: Dimitri Fontaine
Subject: Re: Synch Rep for CommitFest 2009-07
Msg-id: 87fxcxnjwt.fsf@hi-media-techno.com
In reply to: Re: Synch Rep for CommitFest 2009-07  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List: pgsql-hackers

Hi,

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> I think a better way to address that need is to provide a built-in
> mechanism for the standby to request a base backup and have it sent over
> the wire. That makes the initial setup very easy.

Great idea :) 

So I'll reproduce the sketch I did in this other mail, adding the 'base'
state where the prerequisite base backup is handled; that will help
clarify the next points:

 0. base: slave asks the master for a base-backup, at the end of this it
    reaches the base-lsn

 1. init: slave asks the master for the current LSN and starts streaming
    WAL

 2. setup: slave asks the master for missing WALs from its base-lsn to
    this LSN it just got, and applies them all to reach the initial LSN
    (this happens in parallel to 1.)

 3. catchup: slave has replayed missing WALs and is now replaying the
    stream it received in parallel, which applies from the init LSN
    (just reached)

 4. sync: slave is applying the stream as it gets it, either as part of
    the master transaction or not depending on the GUC settings
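
The five states above can be sketched as a small state machine. This is
only an illustration in Python, not proposed backend code: the event
names are hypothetical labels for "this step is finished", and the
sketch linearizes steps 1 and 2 even though they run in parallel.

```python
from enum import Enum


class StandbyState(Enum):
    BASE = 0     # fetching the base backup from the master
    INIT = 1     # asked for the current LSN, WAL streaming has started
    SETUP = 2    # replaying missing WAL from base-lsn up to the init LSN
    CATCHUP = 3  # replaying the stream buffered since the init LSN
    SYNC = 4     # applying the stream as it arrives


# Hypothetical event names; each one marks the end of a step in the list.
TRANSITIONS = {
    (StandbyState.BASE, "base_backup_done"): StandbyState.INIT,
    (StandbyState.INIT, "streaming_started"): StandbyState.SETUP,
    (StandbyState.SETUP, "init_lsn_reached"): StandbyState.CATCHUP,
    (StandbyState.CATCHUP, "stream_drained"): StandbyState.SYNC,
}


def next_state(state, event):
    """Return the state that follows `state` on `event`, or raise ValueError."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state.name} on {event!r}")
```

Keeping the transitions in one table makes it easy to see that 'sync' is
only reachable through 'catchup'.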
 

> The situation arises also when the standby falls badly behind. A simple
> solution to that is to add a switch in the master to specify "always
> keep X MB of WAL in pg_xlog". The standby will then still find it in
> pg_xlog, making it harder for a standby to fall so much behind that it
> can't find the WAL it needs in the primary anymore. Tom suggested that
> we can just give up and re-sync with a new base backup, but that really
> requires built-in base backup capability, and is only practical for
> small databases.

I think that when the standby is back in business after a connection
glitch (or any other transient error), its current internal state is
still 'sync' and walreceiver asks for the next LSN (RedoPTR?). Now, 2
cases are possible:

 a. primary still has it handy, so the standby is still in sync but
    lagging behind (and primary knows how much)

 b. primary is not able to provide the requested WAL entry, so the slave
    is back to 'setup' state, with base-lsn the point reached just
    before losing sync (the one walreceiver just asked for).
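
The two cases can be written down as a single decision. A minimal
sketch, assuming LSNs are plain integers and that the primary knows the
oldest LSN it still retains; the function and parameter names are
hypothetical, not an existing API.

```python
def on_reconnect(requested_lsn, oldest_retained_lsn, primary_current_lsn):
    """Decide the standby's state after a transient disconnect.

    LSNs are modelled as plain integers (byte positions in the WAL).
    """
    if requested_lsn >= oldest_retained_lsn:
        # Case a: the primary still has the record, so the standby stays
        # in 'sync' and the primary knows how far behind it is.
        return {"state": "sync", "lag": primary_current_lsn - requested_lsn}
    # Case b: the record is gone from the primary, so the standby goes
    # back to 'setup', using the LSN it just asked for as its base-lsn.
    return {"state": "setup", "base_lsn": requested_lsn}
```

For example, `on_reconnect(100, 50, 180)` keeps the standby in 'sync'
with a lag of 80, while `on_reconnect(100, 150, 180)` sends it back to
'setup' with base-lsn 100.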
 

Now, a standby in 'setup' state isn't ready (yet), and for example
synchronous replication won't be possible in this state: we can't ask
the primary to refuse to COMMIT any transaction (e.g. by holding it)
while a standby hasn't reached 'sync' state.
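
To illustrate the point, here is a minimal sketch of how a primary could
pick the standbys a synchronous COMMIT waits for. It assumes the primary
tracks one state string per standby; the function name and the dict
layout are hypothetical.

```python
def standbys_to_wait_for(standby_states):
    """Return the names of standbys a synchronous COMMIT should wait for.

    `standby_states` maps a standby name to its current state string.
    Only standbys that have reached 'sync' are eligible; a standby still
    in 'setup' or 'catchup' must never hold up a COMMIT.
    """
    return sorted(
        name for name, state in standby_states.items() if state == "sync"
    )
```

With `{"a": "sync", "b": "setup"}` only `a` is returned, so `b` cannot
block commits while it is still catching up.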

The way you're talking about the issue makes me think there's a mix
between how to handle a lagging standby and an out-of-sync standby. For
clarity, I think we should have very distinct states and responses. And
yes, as Tom and you keep saying, a synced standby by definition should
not need any access to its primary's archives. So if it does, it's no
longer in sync.

> I think we should definitely have both those features, but it's not
> urgent. The replication works without them, although requires that you
> set up traditional archiving as well.

Agreed, it's not essential for the feature as far as hackers are
concerned.

Regards,
-- 
dim

