Re: time-delayed standbys
From | Fujii Masao |
---|---|
Subject | Re: time-delayed standbys |
Date | |
Msg-id | BANLkTikSyWhLtK2eUE=e0Sg0dcBRnjongw@mail.gmail.com |
In reply to | Re: time-delayed standbys (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: time-delayed standbys (Robert Haas <robertmhaas@gmail.com>) |
List | pgsql-hackers |
On Fri, Jun 17, 2011 at 3:29 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Even if that were not an issue, I'm still more or less of the opinion
> that trying to solve the time synchronization problem is a rathole
> anyway. To really solve this problem well, you're going to need the
> standby to send a message containing a timestamp, get a reply back
> from the master that contains that timestamp and a master timestamp,
> and then compute based on those two timestamps plus the reply
> timestamp the maximum and minimum possible lag between the two
> machines. Then you're going to need to guess, based on several cycles
> of this activity, what the actual lag is, and adjust it over time (but
> not too quickly, unless of course a large manual step has occurred) as
> the clocks potentially drift apart from each other. This is basically
> what ntpd does, except that it can be virtually guaranteed that our
> implementation will suck by comparison. Time synchronization is
> neither easy nor our core competency, and I think trying to include it
> in this feature is going to result in a net loss of reliability.

Agreed. You've already added the note about time synchronization to the
documentation. That's enough, I think.

>>> errmsg("parameter \"%s\" requires a temporal value", "recovery_time_delay"),
>>
>> Should we s/"a temporal"/"an integer"/?
>
> It seems strange to ask for an integer when what we want is an amount
> of time in seconds or minutes...

OK.

>> http://forge.mysql.com/worklog/task.php?id=344
>> According to the above page, one purpose of time-delayed replication is to
>> protect against user mistakes on the master. But when a user notices his
>> mistaken operation on the master, what should he do next? The WAL records
>> of that operation might have already arrived at the standby, so neither
>> "promote" nor "restart" cancels it. Instead, he should probably shut down
>> the standby, look up the timestamp of the XID of the operation he'd like
>> to cancel, set recovery_target_time, and restart the standby.
>> Should a procedure like this be documented? Or should we implement a new
>> "promote" mode that finishes recovery as soon as "promote" is requested
>> (i.e., without replaying all the available WAL records)?
>
> I like the idea of a new promote mode;

Are you going to implement that mode in this CF, or in the next one?

> and documenting the other
> approach you mention doesn't sound bad either. Either one sounds like
> a job for a separate patch, though.
>
> The other option is to pause recovery and run pg_dump...

Yes, please.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
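Robert argues that doing clock synchronization properly is a rathole. The sketch below is therefore only an illustration of the first step he describes: bounding the clock offset from one timestamp exchange. It is not a proposal for the patch. It is a minimal C sketch under these assumptions:

- Every name (ClockSample, OffsetBounds, sample_bounds, combine_bounds, slew_estimate) is hypothetical and not part of PostgreSQL.
- The master's reply carries exactly one master timestamp.
- Intersecting the intervals from several cycles is only valid if the clocks did not drift between samples.
- The "not too quickly" adjustment is a simple clamp.

Real ntpd uses four timestamps per exchange and far more elaborate filtering.

```c
#include <stdint.h>

/*
 * Hypothetical sketch of the clock-offset bounds described in the thread.
 * None of these names exist in PostgreSQL. All timestamps are in
 * microseconds; "offset" means (master clock - standby clock).
 */
typedef int64_t usec_t;

typedef struct
{
    usec_t standby_send;    /* t0: standby clock when the request was sent */
    usec_t master_stamp;    /* tm: master clock when it built the reply */
    usec_t standby_recv;    /* t3: standby clock when the reply arrived */
} ClockSample;

typedef struct
{
    usec_t min_offset;      /* smallest offset consistent with the samples */
    usec_t max_offset;      /* largest offset consistent with the samples */
} OffsetBounds;

/*
 * The master read its clock at some instant that the standby's clock
 * places between t0 and t3, so the offset lies in [tm - t3, tm - t0].
 */
OffsetBounds
sample_bounds(const ClockSample *s)
{
    OffsetBounds b;

    b.min_offset = s->master_stamp - s->standby_recv;
    b.max_offset = s->master_stamp - s->standby_send;
    return b;
}

/*
 * Intersect the bounds from several exchanges. Only valid if the clocks
 * did not drift or step between samples. Returns 1 and fills *out on
 * success; returns 0 if there are no samples or the intervals are
 * disjoint (e.g. someone stepped a clock by hand).
 */
int
combine_bounds(const ClockSample *samples, int n, OffsetBounds *out)
{
    OffsetBounds acc;

    if (n <= 0)
        return 0;
    acc = sample_bounds(&samples[0]);
    for (int i = 1; i < n; i++)
    {
        OffsetBounds b = sample_bounds(&samples[i]);

        if (b.min_offset > acc.min_offset)
            acc.min_offset = b.min_offset;
        if (b.max_offset < acc.max_offset)
            acc.max_offset = b.max_offset;
    }
    if (acc.min_offset > acc.max_offset)
        return 0;
    *out = acc;
    return 1;
}

/*
 * Move a running offset estimate toward the midpoint of the bounds, but
 * by at most max_step per call, so the estimate changes gradually.
 */
usec_t
slew_estimate(usec_t current, OffsetBounds b, usec_t max_step)
{
    usec_t target = b.min_offset + (b.max_offset - b.min_offset) / 2;
    usec_t delta = target - current;

    if (delta > max_step)
        delta = max_step;
    if (delta < -max_step)
        delta = -max_step;
    return current + delta;
}
```

For example, take the exchanges (t0, tm, t3) = (1000000, 1006000, 1002000) and (2000000, 2005400, 2001000). They give individual bounds of [4000, 6000] and [4400, 5400] microseconds, which intersect to [4400, 5400]. Starting from 0 with max_step = 1000, slew_estimate moves the estimate to 1000 on its first call. A disjoint intersection corresponds to the "large manual step" case, where a real implementation would have to discard its history. That is part of why Robert considers this better left to ntpd.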