Re: Proposed WAL changes
From | Tom Lane |
---|---|
Subject | Re: Proposed WAL changes |
Date | |
Msg-id | 26035.984071013@sss.pgh.pa.us |
In reply to | RE: Proposed WAL changes ("Mikheev, Vadim" <vmikheev@SECTORBASE.COM>) |
List | pgsql-hackers |
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes: >> And how well will that approach work if the last checkpoint record >> got written near the start of a log segment file, and then the >> checkpointer discarded all your prior log segments because "you don't >> need those anymore"? If the checkpoint record gets corrupted, >> you have no readable log at all. > The question - why should we have it? It is assumed that data files > are flushed before checkpoint appears in log. If this assumtion > is wrong due to *bogus* fsync/disk/whatever why should we increase > disk space requirements which will affect *good* systems too? > What will we buy with extra logs? Just some data we can't > guarantee consistency anyway? > It seems that you want guarantee more than me, Tom -:) No, but I want a system that's not brittle. You seem to be content to design a system that is reliable as long as the WAL log is OK but loses the entire database unrecoverably as soon as one bit goes bad in the log. I'd like a slightly softer failure mode. WAL logs *will* go bad (even without system crashes; what of unrecoverable disk read errors?) and we ought to be able to deal with that with some degree of grace. Yes, we lost our guarantee of consistency. That doesn't mean we should not do the best we can with what we've got left. > BTW, in some my tests size of on-line logs was ~ 200Mb with default > checkpoint interval. So, it's worth to care about on-line logs size. Okay, but to me that suggests we need a smarter log management strategy, not a management strategy that throws away data we might wish we still had (for manual analysis if nothing else). Perhaps the checkpoint creation rule should be "every M seconds *or* every N megabytes of log, whichever comes first". It'd be fairly easy to signal the postmaster to start up a new checkpoint process when XLogWrite rolls over to a new log segment, if the last checkpoint was further back than some number of segments. Comments? 
> Please convince me that NEXTXID is necessary.
> Why add anything that is not useful?

I'm not convinced that it's not necessary. In particular, consider the
case where we are trying to recover from a crash using an on-line
checkpoint as our last readable WAL entry. In the pre-NEXTXID code,
this checkpoint would contain the current XID counter and an
advanced-beyond-current OID counter. I think both of those numbers
should be advanced beyond current, so that there's some safety margin
against reusing XIDs/OIDs that were allocated by now-lost XLOG entries.
The OID code is doing this right, but the XID code wasn't.

Again, it's a question of brittleness. Yes, as long as everything
operates as designed and the WAL log never drops a bit, we don't need
it. But I want a safety margin for when things aren't perfect.

regards, tom lane
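The safety-margin argument for the XID counter can be illustrated with a toy model. It is a sketch under stated assumptions, not the actual checkpoint code: `checkpoint_next_xid`, `reused_xids`, and the margin of 1024 are hypothetical (the message names no specific value), and XID wraparound is ignored.

```python
def checkpoint_next_xid(current_next_xid: int, margin: int) -> int:
    """Value stored in an on-line checkpoint: the live counter advanced by
    a safety margin. margin=0 models storing the current value as-is."""
    return current_next_xid + margin


def reused_xids(restart_xid: int, lost_xids: list[int]) -> list[int]:
    """XIDs that were handed out after the checkpoint (in log records that
    are now unreadable) and would be handed out again if allocation
    resumes at restart_xid."""
    return [xid for xid in lost_xids if xid >= restart_xid]


# Ten transactions started after the checkpoint; their log records were lost.
lost = list(range(5000, 5010))

# No margin: recovery resumes at 5000 and reissues every lost XID.
without_margin = reused_xids(checkpoint_next_xid(5000, 0), lost)

# Hypothetical margin of 1024: recovery resumes at 6024 and reuses none.
with_margin = reused_xids(checkpoint_next_xid(5000, 1024), lost)
```

The margin only helps while fewer XIDs than the margin were allocated after the checkpoint; it softens the failure mode rather than restoring a consistency guarantee.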