Re: Hot standby, recovery infra

From Heikki Linnakangas
Subject Re: Hot standby, recovery infra
Date
Msg-id 498ACB0D.9010307@enterprisedb.com
In response to Re: Hot standby, recovery infra  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Hot standby, recovery infra
List pgsql-hackers
Simon Riggs wrote:
> On Thu, 2009-02-05 at 11:46 +0200, Heikki Linnakangas wrote:
>> Simon Riggs wrote:
> 
>>> So we might end up flushing more often *and* we will be doing it
>>> potentially in the code path of other users.
>> For example, imagine a database that fits completely in shared buffers. 
>> If we update at every XLogFileRead, we have to fsync every 16MB of WAL. 
>> If we update in XLogFlush the way I described, you only need to update 
>> when we flush a page from the buffer cache, which will only happen at 
>> restartpoints. That's far less updates.
> 
> Oh, did you change the bgwriter so it doesn't do normal page cleaning? 

No. Ok, that wasn't completely accurate. The page cleaning by bgwriter 
will perform XLogFlushes, but that should be pretty insignificant. When 
there's little page replacement going on, bgwriter will do a small 
trickle of page cleaning, which won't matter much. If there's more page 
replacement going on, bgwriter is cleaning up pages that will soon be 
replaced, so it's just offsetting work from other backends (or the 
startup process in this case).

>> Expanding that example to a database that doesn't fit in cache, you're 
>> still replacing pages from the buffer cache that have been untouched for 
>> longest. Such pages will have an old LSN, too, so we shouldn't need to 
>> update very often.
> 
> They will tend to be written in ascending LSN order which will mean we
> continually update the control file. Anything out of order does skip a
> write. The better the cache is at finding LRU blocks out the more writes
> we will make.

When minRecoveryPoint is updated, it's not updated to just the LSN that's 
being flushed. It's updated to the recptr of the most recently read WAL 
record. That's the important point that avoids that behavior. It's just 
like XLogFlush, which normally flushes all of the outstanding WAL, not 
just up to the requested LSN.
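
To illustrate the difference, here is a minimal standalone sketch, not the actual patch. The names RecoveryState, flush_during_recovery and count_control_file_writes are hypothetical, and Lsn is just an integer standing in for XLogRecPtr. It counts how many control file updates happen when pages are flushed in ascending LSN order, under each of the two policies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t Lsn;           /* stand-in for XLogRecPtr (assumption) */

typedef struct
{
    Lsn min_recovery_point;     /* value stored in the simulated control file */
    Lsn last_read_rec;          /* end of the most recently read WAL record */
    int control_file_writes;    /* each one would imply a write + fsync */
} RecoveryState;

/*
 * Called when a page with the given LSN is about to be written out
 * during recovery. If advance_to_last_read is nonzero, minRecoveryPoint
 * jumps to the most recently read record; otherwise it only moves to
 * the LSN of the page being flushed.
 */
void
flush_during_recovery(RecoveryState *st, Lsn page_lsn, int advance_to_last_read)
{
    /* A page cannot carry an LSN beyond what has been replayed so far. */
    assert(page_lsn <= st->last_read_rec);

    if (page_lsn <= st->min_recovery_point)
        return;                 /* already covered, no control file update */

    st->min_recovery_point = advance_to_last_read ? st->last_read_rec : page_lsn;
    st->control_file_writes++;
}

/* Flush n pages in the given order and return the number of updates. */
int
count_control_file_writes(const Lsn *page_lsns, size_t n,
                          Lsn last_read_rec, int advance_to_last_read)
{
    RecoveryState st = {0, last_read_rec, 0};
    size_t      i;

    for (i = 0; i < n; i++)
        flush_during_recovery(&st, page_lsns[i], advance_to_last_read);
    return st.control_file_writes;
}
```

With four pages flushed in ascending LSN order, moving only to the flushed LSN updates the control file on every flush, while jumping to the last read record updates it once, and the later flushes are already covered.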

>> I'd like to have the extra protection that this approach gives. If we 
>> let safeStartPoint to be ahead of the actual WAL we've replayed, we have 
>> to just assume we're fine if we reach end of WAL before reaching that 
>> point. That assumption falls down if e.g recovery is stopped, and you go 
>> and remove the last few WAL segments from the archive before restarting 
>> it, or signal pg_standby to trigger failover too early. Tracking the 
>> real safe starting point and enforcing it always protects you from that.
> 
> Doing it this way will require you to remove existing specific error
> messages about ending before end time of backup, to be replaced by more
> general ones that say "consistency not reached" which is harder to
> figure out what to do about it.

Yeah. If that's an important distinction, we could still save the 
original backup stop location somewhere, just so that we can give the 
old error message when we've not passed that location. But perhaps a 
message like "WAL ends before reaching a consistent state" with a hint 
"Make sure you archive all the WAL created during backup" or something 
would suffice.
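
Roughly, the check at end of WAL could look like the sketch below. This is only an illustration under my own assumptions: check_end_of_wal is a hypothetical helper, Lsn is a plain integer standing in for XLogRecPtr, and a real version would report the error with ereport rather than filling a buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t Lsn;           /* stand-in for XLogRecPtr (assumption) */

/*
 * Returns 0 if replay got at least as far as min_recovery_point.
 * Otherwise returns -1 and writes the proposed message and hint to buf.
 */
int
check_end_of_wal(Lsn end_of_wal, Lsn min_recovery_point,
                 char *buf, size_t buflen)
{
    if (end_of_wal >= min_recovery_point)
    {
        if (buflen > 0)
            buf[0] = '\0';      /* nothing to report */
        return 0;
    }

    snprintf(buf, buflen,
             "WAL ends before reaching a consistent state\n"
             "HINT: Make sure you archive all the WAL created during backup");
    return -1;
}
```

The same check covers both the "WAL segments removed from the archive" case and the "failover triggered too early" case, since both show up as end of WAL falling short of the tracked point.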

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

