Re: Sync Rep: First Thoughts on Code

From Aidan Van Dyk
Subject Re: Sync Rep: First Thoughts on Code
Date
Msg-id 20081211142712.GW26596@yugib.highrise.ca
In reply to Re: Sync Rep: First Thoughts on Code  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Sync Rep: First Thoughts on Code  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
* Simon Riggs <simon@2ndQuadrant.com> [081211 05:45]:
> 
> On Wed, 2008-12-10 at 15:06 -0500, Aidan Van Dyk wrote:
> 
> > Call me thick, but I'm confused... In sync rep, there *can't be* any
> > catching up to do... i.e. if the "slave" isn't accepting the WAL the
> > master "stops" doing *anything*...
> 
> In normal/steady state, yes, you are correct. But there is more...
> 
> The simplest way to configure standby would be to freeze the primary
> while we setup the standby and then go straight into normal/steady
> state. That could mean hours of downtime for large databases, which is
> unacceptable in a feature aimed at increasing availability. So we need
> to allow the primary to continue working while the standby is setup.
> That then creates a log gap between the LSN of the primary and the LSN
> of the standby, which must be resolved.
> 
> So the catchup occurs during the transient initial phase when standby is
> catching up with primary before they continue together in normal/steady
> state. 

But "catchup" *has* to be *done* before PostgreSQL can enter "sync rep".

So, if I start PostgreSQL in sync rep mode, without any capable clients
to rep with....  But I'd rather be buggered there than find out tonight
at 3am that it was in sync rep mode but wasn't really doing sync rep,
because I'd messed up something somewhere (firewall, config, password,
anything) and there was no "caught up" client at the time, and I've
just lost a day's worth of my $$$$$ transactions...
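
To make that concrete, here is a rough Python sketch of the "fail closed"
behaviour I'm arguing for.  It is a toy model only: the class, the
standby_caught_up flag and the exception are names made up for
illustration, not anything from the patch, and a real server would
presumably block rather than raise.

```python
class NoSyncStandby(Exception):
    """Raised when a commit cannot be replicated synchronously."""


class Primary:
    """Toy model of a primary that fails closed in sync rep mode."""

    def __init__(self, sync_rep):
        self.sync_rep = sync_rep
        # Hypothetical flag: True only while a standby is connected
        # and fully caught up with the primary's WAL.
        self.standby_caught_up = False
        self.committed = []

    def commit(self, txn):
        # Fail closed: in sync rep mode, never acknowledge a commit
        # unless a caught-up standby is there to receive the WAL.
        if self.sync_rep and not self.standby_caught_up:
            raise NoSyncStandby(
                "refusing commit of %r: no caught-up standby" % (txn,)
            )
        self.committed.append(txn)
        return True
```

With this, a misconfigured firewall or password shows up as an
immediate refusal at startup time, not as silently unreplicated commits
discovered at 3am.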

> Most of the architectural discussion over last few months has been about
> the need for the initial state and how to handle it. Most of the code
> complexity also.

Well, for me, I'm quite happy with a "restart/stop&start" being a
necessary "downtime" to move to synchronous replication.  This way, I
could see a "setup" routine that looks like:
1) Current "production" DB does normal backups/PITR/WAL archiving
2) I set up new "slave", which involves
   - restore from backup + WAL recovery (pg_standby type)
   - Could take days+++
   - Oh well....
3) Stop production
4) so, now slave is caught up...
5) Start "production" now in sync rep mode as master
6) start slave in sync-rep mode as slave...

So downtime would be limited to the time from the old postmaster
shutdown to the time the slave has replayed the last WAL and connected
to the restarted postmaster as a sync rep slave...
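
As a back-of-the-envelope sketch of that window (Python, with made-up
names): assume LSNs can be treated as plain byte offsets into the WAL
stream and that the slave replays at a roughly constant rate.  Both are
simplifying assumptions of mine, and restart_secs is a hypothetical
allowance for restarting the postmaster and reconnecting.

```python
def switchover_downtime(primary_lsn, standby_lsn,
                        replay_bytes_per_sec, restart_secs=0.0):
    """Estimate downtime for steps 3-6 of the routine above.

    primary_lsn: WAL position of the primary at shutdown (bytes).
    standby_lsn: WAL position the slave has replayed so far (bytes).
    """
    if standby_lsn > primary_lsn:
        raise ValueError("standby cannot be ahead of primary")
    if replay_bytes_per_sec <= 0:
        raise ValueError("replay rate must be positive")
    # WAL the slave still has to replay once production is stopped.
    gap = primary_lsn - standby_lsn
    return gap / replay_bytes_per_sec + restart_secs
```

The point is that the days spent in step 2 don't count: only the WAL
gap left at shutdown, plus the restart itself, contributes to downtime.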

Or am I way too naive to think that a small downtime to "switch" from
non-sync-rep to sync-rep is acceptable...

a.
-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.
