Idea: recycle WAL segments, don't delete/recreate 'em
From | Tom Lane |
---|---|
Subject | Idea: recycle WAL segments, don't delete/recreate 'em |
Date | |
Msg-id | 24901.995381770@sss.pgh.pa.us |
Responses | Re: Idea: recycle WAL segments, don't delete/recreate 'em (Bruce Momjian <pgman@candle.pha.pa.us>) |
List | pgsql-hackers |
I have noticed that a large fraction of the I/O done by 7.1 is associated with initializing new segments of the WAL log for use. (We have to physically fill each segment with zeroes to ensure that the system has actually allocated a whole 16MB to it; otherwise we fall victim to the "hole-saving" allocation technique of most Unix filesystems.) I just had an idea about how to avoid this cost: why not recycle old log segments? At the point where the code currently deletes a no-longer-needed segment, just rename it to become the next created-in-advance segment.

With this approach, shortly after installation the system would converge to a steady state with a constant number of WAL segments (basically CHECKPOINT_SEGMENTS + WAL_FILES + 1, maybe one or two more if load is really high). So, in addition to eliminating initialization writes, we would also reduce the metadata traffic (inode and indirect blocks) to a very low level. That has to be good both for performance and for improving the odds that the WAL files will survive a system crash.

The sole disadvantage I can see to this approach is that a recycled segment would not contain zeroes, but valid WAL records. We'd need to take care that in a recovery situation, we not mistake old records beyond the last one we actually wrote for new records we should redo. While checking the xl_prev back-pointers in each record should be sufficient to detect this, I'd feel more comfortable if we extended the XLogPageHeader record to contain the file/segment number that it belongs to. This'd cost an extra 8 bytes per 8K XLOG page, which seems worth it to me.

Another issue is whether the recycling logic should be "always recycle" (hence number of extant WAL segments will never decrease), or should it be more like "recycle if there are fewer than WAL_FILES advance segments, else delete". If we were supporting WAL-based UNDO then I think it'd have to be the latter, so that we could reduce the WAL usage from a peak created by a long-running transaction. But with the present logic that the WAL log is truncated after each checkpoint, I think it'd be better just to never delete. Otherwise, the behavior is likely to be that the system varies between N and N+1 extant segments due to roundoff effects (ie, depending on just where you are in the current segment when a checkpoint happens). That's exactly what we do not want.

A possible answer is "recycle if there are fewer than WAL_FILES + SLOP advance files, else delete", where SLOP is (say) about three or four segments. That would avoid unwanted oscillations in the number of extant files, while still allowing decrease from a peak for UNDO.

Comments, better ideas?

regards, tom lane
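
For illustration only, here is a rough sketch of the two mechanisms discussed above: the "recycle if there are fewer than WAL_FILES + SLOP advance files, else delete" rule, and the per-page segment stamp used to spot leftover pages in a recycled file. It is written in Python rather than the backend's C, and it is not the actual implementation. The function names, the 16-hex-digit file naming, and the assumption that the directory holds nothing but segment files are all hypothetical; only WAL_FILES and SLOP come from the message.

```python
import os


def advance_segment_count(wal_dir, current_seg):
    """Count segment files numbered above the one currently being written.

    Assumes (hypothetically) that every file in wal_dir is named with the
    segment number in hexadecimal.
    """
    return sum(1 for name in os.listdir(wal_dir) if int(name, 16) > current_seg)


def retire_segment(wal_dir, old_seg, current_seg, wal_files, slop=3):
    """Recycle or delete a no-longer-needed segment; return (action, new_seg)."""
    old_path = os.path.join(wal_dir, f"{old_seg:016X}")
    if advance_segment_count(wal_dir, current_seg) < wal_files + slop:
        # Recycle: rename the old file to the first unused future segment
        # number, so no zero-filling of a brand-new file is needed.
        next_seg = current_seg + 1
        while os.path.exists(os.path.join(wal_dir, f"{next_seg:016X}")):
            next_seg += 1
        os.rename(old_path, os.path.join(wal_dir, f"{next_seg:016X}"))
        return "recycled", next_seg
    # Enough advance segments already exist: shrink back from a peak.
    os.remove(old_path)
    return "deleted", None


def leading_valid_pages(page_stamps, expected_seg):
    """Count leading pages whose header stamp matches the segment being read.

    page_stamps stands in for the segment number proposed for each page
    header; the first mismatch marks leftover data from before recycling.
    """
    count = 0
    for stamp in page_stamps:
        if stamp != expected_seg:
            break
        count += 1
    return count
```

With `slop` at three or four, the count of advance files can drift by a segment or two around a checkpoint without flipping the decision between recycling and deleting, which is the oscillation the proposal is trying to avoid.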