Checkpoints
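
For review purposes, here is a standalone sketch of the two calculations the patch below introduces: how the checkpoint trigger distance is derived from checkpoint_wal_size, and how the estimate of the distance between checkpoints is updated. The function names calc_checkpoint_segments and update_distance_estimate are hypothetical, and XLOG_SEG_SIZE is assumed to be the usual 16 MB. The arithmetic follows CalculateCheckpointSegments() and UpdateCheckPointDistanceEstimate() in the diff, but this is not the patch code itself.

```c
/* Assumption: WAL segments are 16 MB, the usual build-time default. */
#define XLOG_SEG_SIZE (16 * 1024 * 1024)

/*
 * Number of segments between xlog-triggered checkpoints.
 *
 * checkpoint_wal_size_kb is the size budget in kilobytes. WAL is kept for
 * two checkpoint cycles, plus completion_target of a cycle consumed while
 * the checkpoint itself runs, so the budget is divided by
 * (2 + completion_target). The result is rounded down, with a floor of 1.
 */
int
calc_checkpoint_segments(int checkpoint_wal_size_kb, double completion_target)
{
    double target;
    int    segments;

    /* convert the KB budget into a number of segments */
    target = (double) checkpoint_wal_size_kb / (double) (XLOG_SEG_SIZE / 1024);
    target = target / (2.0 + completion_target);

    segments = (int) target;    /* round down */
    if (segments < 1)
        segments = 1;
    return segments;
}

/*
 * New estimate of the WAL distance (in bytes) between checkpoints.
 *
 * If the last cycle used more WAL than estimated, jump straight to the
 * observed value; otherwise let the estimate decay slowly toward it.
 */
double
update_distance_estimate(double estimate, double nbytes)
{
    if (estimate < nbytes)
        return nbytes;
    return 0.90 * estimate + 0.10 * nbytes;
}
```

Under these assumptions, calc_checkpoint_segments(262144, 0.5) yields 6: the 256 MB default corresponds to 16 segments, which is divided by 2.5 and rounded down.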

*** a/doc/src/sgml/config.sgml --- b/doc/src/sgml/config.sgml *************** *** 1036,1042 **** include 'filename' usually require a corresponding increase in checkpoint_segments, in order to spread out the process of writing large quantities of new or changed data over a ! longer period of time. --- 1036,1042 ---- usually require a corresponding increase in checkpoint_segments, in order to spread out the process of writing large quantities of new or changed data over a ! longer period of time. FIXME: What should we suggest here now? *************** *** 1958,1974 **** include 'filename' Checkpoints ! ! checkpoint_segments (integer) ! checkpoint_segments configuration parameter ! Maximum number of log file segments between automatic WAL ! checkpoints (each segment is normally 16 megabytes). The default ! is three segments. Increasing this parameter can increase the ! amount of time needed for crash recovery. This parameter can only be set in the postgresql.conf file or on the server command line. --- 1958,1977 ---- Checkpoints ! ! checkpoint_wal_size (integer) ! checkpoint_wal_size configuration parameter ! Maximum size to let the WAL grow to between automatic WAL ! checkpoints. This is a soft limit; WAL size can exceed ! checkpoint_wal_size under special circumstances, like ! under heavy load, a failing archive_command, or a high ! wal_keep_segments setting. The default is 256 MB. ! Increasing this parameter can increase the amount of time needed for ! crash recovery. This parameter can only be set in the postgresql.conf file or on the server command line. *************** *** 2028,2033 **** include 'filename' --- 2031,2054 ---- + + min_recycle_wal_size (integer) + + min_recycle_wal_size configuration parameter + + + + As long as WAL disk usage stays below this setting, old WAL files are + always recycled for future use at a checkpoint, rather than removed. 
+ This can be used to ensure that enough WAL space is reserved to + handle spikes in WAL usage, for example when running large batch + jobs. The default is 80 MB. + This parameter can only be set in the postgresql.conf + file or on the server command line. + + + + *** a/doc/src/sgml/perform.sgml --- b/doc/src/sgml/perform.sgml *************** *** 1302,1320 **** SELECT * FROM x, y, a, b, c WHERE something AND somethingelse; ! ! Increase <varname>checkpoint_segments</varname> ! Temporarily increasing the configuration variable can also make large data loads faster. This is because loading a large amount of data into PostgreSQL will cause checkpoints to occur more often than the normal checkpoint frequency (specified by the checkpoint_timeout configuration variable). Whenever a checkpoint occurs, all dirty pages must be flushed to disk. By increasing ! checkpoint_segments temporarily during bulk data loads, the number of checkpoints that are required can be reduced. --- 1302,1320 ---- ! ! Increase <varname>checkpoint_wal_size</varname> ! Increasing the configuration variable can also make large data loads faster. This is because loading a large amount of data into PostgreSQL will cause checkpoints to occur more often than the normal checkpoint frequency (specified by the checkpoint_timeout configuration variable). Whenever a checkpoint occurs, all dirty pages must be flushed to disk. By increasing ! checkpoint_wal_size temporarily during bulk data loads, the number of checkpoints that are required can be reduced. *************** *** 1419,1425 **** SELECT * FROM x, y, a, b, c WHERE something AND somethingelse; Set appropriate (i.e., larger than normal) values for maintenance_work_mem and ! checkpoint_segments. --- 1419,1425 ---- Set appropriate (i.e., larger than normal) values for maintenance_work_mem and ! checkpoint_wal_size. 
*************** *** 1486,1492 **** SELECT * FROM x, y, a, b, c WHERE something AND somethingelse; So when loading a data-only dump, it is up to you to drop and recreate indexes and foreign keys if you wish to use those techniques. ! It's still useful to increase checkpoint_segments while loading the data, but don't bother increasing maintenance_work_mem; rather, you'd do that while manually recreating indexes and foreign keys afterwards. --- 1486,1492 ---- So when loading a data-only dump, it is up to you to drop and recreate indexes and foreign keys if you wish to use those techniques. ! It's still useful to increase checkpoint_wal_size while loading the data, but don't bother increasing maintenance_work_mem; rather, you'd do that while manually recreating indexes and foreign keys afterwards. *************** *** 1542,1548 **** SELECT * FROM x, y, a, b, c WHERE something AND somethingelse; ! Increase and ; this reduces the frequency of checkpoints, but increases the storage requirements of /pg_xlog. --- 1542,1548 ---- ! Increase and ; this reduces the frequency of checkpoints, but increases the storage requirements of /pg_xlog. *** a/doc/src/sgml/wal.sgml --- b/doc/src/sgml/wal.sgml *************** *** 471,479 **** The server's checkpointer process automatically performs a checkpoint every so often. A checkpoint is begun every log segments, or every seconds, whichever comes first. ! The default settings are 3 segments and 300 seconds (5 minutes), respectively. If no WAL has been written since the previous checkpoint, new checkpoints will be skipped even if checkpoint_timeout has passed. (If WAL archiving is being used and you want to put a lower limit on how --- 471,480 ---- The server's checkpointer process automatically performs a checkpoint every so often. A checkpoint is begun every seconds, or if ! is about to be exceeded, whichever ! comes first. ! The default settings are 5 minutes and 256 MB, respectively. 
If no WAL has been written since the previous checkpoint, new checkpoints will be skipped even if checkpoint_timeout has passed. (If WAL archiving is being used and you want to put a lower limit on how *************** *** 485,492 **** ! Reducing checkpoint_segments and/or ! checkpoint_timeout causes checkpoints to occur more often. This allows faster after-crash recovery, since less work will need to be redone. However, one must balance this against the increased cost of flushing dirty data pages more often. If --- 486,493 ---- ! Reducing checkpoint_timeout and/or ! checkpoint_wal_size causes checkpoints to occur more often. This allows faster after-crash recovery, since less work will need to be redone. However, one must balance this against the increased cost of flushing dirty data pages more often. If *************** *** 509,519 **** parameter. If checkpoints happen closer together than checkpoint_warning seconds, a message will be output to the server log recommending increasing ! checkpoint_segments. Occasional appearance of such a message is not cause for alarm, but if it appears often then the checkpoint control parameters should be increased. Bulk operations such as large COPY transfers might cause a number of such warnings ! to appear if you have not set checkpoint_segments high enough. --- 510,520 ---- parameter. If checkpoints happen closer together than checkpoint_warning seconds, a message will be output to the server log recommending increasing ! checkpoint_wal_size. Occasional appearance of such a message is not cause for alarm, but if it appears often then the checkpoint control parameters should be increased. Bulk operations such as large COPY transfers might cause a number of such warnings ! to appear if you have not set checkpoint_wal_size high enough. *************** *** 524,533 **** , which is given as a fraction of the checkpoint interval. The I/O rate is adjusted so that the checkpoint finishes when the ! 
given fraction of checkpoint_segments WAL segments ! have been consumed since checkpoint start, or the given fraction of ! checkpoint_timeout seconds have elapsed, ! whichever is sooner. With the default value of 0.5, PostgreSQL can be expected to complete each checkpoint in about half the time before the next checkpoint starts. On a system that's very close to maximum I/O throughput during normal operation, --- 525,534 ---- , which is given as a fraction of the checkpoint interval. The I/O rate is adjusted so that the checkpoint finishes when the ! given fraction of ! checkpoint_timeout seconds have elapsed, or before ! checkpoint_wal_size is exceeded, whichever is sooner. ! With the default value of 0.5, PostgreSQL can be expected to complete each checkpoint in about half the time before the next checkpoint starts. On a system that's very close to maximum I/O throughput during normal operation, *************** *** 544,561 **** ! There will always be at least one WAL segment file, and will normally ! not be more than (2 + checkpoint_completion_target) * checkpoint_segments + 1 ! or checkpoint_segments + + 1 ! files. Each segment file is normally 16 MB (though this size can be ! altered when building the server). You can use this to estimate space ! requirements for WAL. ! Ordinarily, when old log segment files are no longer needed, they ! are recycled (that is, renamed to become future segments in the numbered ! sequence). If, due to a short-term peak of log output rate, there ! are more than 3 * checkpoint_segments + 1 ! segment files, the unneeded segment files will be deleted instead ! of recycled until the system gets back under this limit. --- 545,577 ---- ! The number of WAL segment files in the pg_xlog directory depends on ! checkpoint_wal_size, min_recycle_wal_size and the ! amount of WAL generated in previous checkpoint cycles. When old log ! segment files are no longer needed, they are removed or recycled (that is, ! 
renamed to become future segments in the numbered sequence). If, due to a ! short-term peak of log output rate, checkpoint_wal_size is ! exceeded, the unneeded segment files will be removed until the system ! gets back under this limit. Below that limit, the system recycles enough ! WAL files to cover the estimated need until the next checkpoint, and ! removes the rest. The estimate is based on a moving average of the number ! of WAL files used in previous checkpoint cycles. The moving average ! is increased immediately if the actual usage exceeds the estimate, so it ! accommodates peak usage rather than average usage to some extent. ! min_recycle_wal_size puts a minimum on the amount of WAL files ! recycled for future usage; that much WAL is always recycled for future use, ! even if the system is idle and the WAL usage estimate suggests that little ! WAL is needed. ! ! ! ! Independently of checkpoint_wal_size, ! + 1 most recent WAL files are ! kept at all times. Also, if WAL archiving is used, old segments cannot be ! removed or recycled until they are archived. If WAL archiving cannot keep up ! with the pace at which WAL is generated, or if archive_command ! fails repeatedly, old WAL files will accumulate in pg_xlog ! until the situation is resolved. *************** *** 570,578 **** master because restartpoints can only be performed at checkpoint records. A restartpoint is triggered when a checkpoint record is reached if at least checkpoint_timeout seconds have passed since the last ! restartpoint. In standby mode, a restartpoint is also triggered if at ! least checkpoint_segments log segments have been replayed ! since the last restartpoint. --- 586,593 ---- master because restartpoints can only be performed at checkpoint records. A restartpoint is triggered when a checkpoint record is reached if at least checkpoint_timeout seconds have passed since the last ! restartpoint, or if WAL size is about to exceed ! checkpoint_wal_size. 
*** a/src/backend/access/transam/xlog.c --- b/src/backend/access/transam/xlog.c *************** *** 71,77 **** extern uint32 bootstrap_data_checksum_version; /* User-settable parameters */ ! int CheckPointSegments = 3; int wal_keep_segments = 0; int XLOGbuffers = -1; int XLogArchiveTimeout = 0; --- 71,78 ---- /* User-settable parameters */ ! int checkpoint_wal_size = 262144; /* 256 MB */ ! int min_recycle_wal_size = 81920; /* 80 MB */ int wal_keep_segments = 0; int XLOGbuffers = -1; int XLogArchiveTimeout = 0; *************** *** 86,108 **** int CommitDelay = 0; /* precommit delay in microseconds */ int CommitSiblings = 5; /* # concurrent xacts needed to sleep */ int num_xloginsert_slots = 8; #ifdef WAL_DEBUG bool XLOG_DEBUG = false; #endif ! /* ! * XLOGfileslop is the maximum number of preallocated future XLOG segments. ! * When we are done with an old XLOG segment file, we will recycle it as a ! * future XLOG segment as long as there aren't already XLOGfileslop future ! * segments; else we'll delete it. This could be made a separate GUC ! * variable, but at present I think it's sufficient to hardwire it as ! * 2*CheckPointSegments+1. Under normal conditions, a checkpoint will free ! * no more than 2*CheckPointSegments log segments, and we want to recycle all ! * of them; the +1 allows boundary cases to happen without wasting a ! * delete/create-segment cycle. ! */ ! #define XLOGfileslop (2*CheckPointSegments + 1) ! /* * GUC support --- 87,105 ---- int CommitSiblings = 5; /* # concurrent xacts needed to sleep */ int num_xloginsert_slots = 8; + /* + * Max distance from last checkpoint, before triggering a new xlog-based + * checkpoint. + */ + int CheckPointSegments; + #ifdef WAL_DEBUG bool XLOG_DEBUG = false; #endif ! /* Estimated distance between checkpoints, in bytes */ ! static double CheckPointDistanceEstimate = 0; ! 
static double PrevCheckPointDistance = 0; /* * GUC support *************** *** 740,746 **** static void AdvanceXLInsertBuffer(XLogRecPtr upto, bool opportunistic); static bool XLogCheckpointNeeded(XLogSegNo new_segno); static void XLogWrite(XLogwrtRqst WriteRqst, bool flexible); static bool InstallXLogFileSegment(XLogSegNo *segno, char *tmppath, ! bool find_free, int *max_advance, bool use_lock); static int XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli, int source, bool notexistOk); --- 737,743 ---- static bool XLogCheckpointNeeded(XLogSegNo new_segno); static void XLogWrite(XLogwrtRqst WriteRqst, bool flexible); static bool InstallXLogFileSegment(XLogSegNo *segno, char *tmppath, ! bool find_free, XLogSegNo max_segno, bool use_lock); static int XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli, int source, bool notexistOk); *************** *** 753,759 **** static bool WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess, static int emode_for_corrupt_record(int emode, XLogRecPtr RecPtr); static void XLogFileClose(void); static void PreallocXlogFiles(XLogRecPtr endptr); ! static void RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr endptr); static void UpdateLastRemovedPtr(char *filename); static void ValidateXLOGDirectoryStructure(void); static void CleanupBackupHistory(void); --- 750,756 ---- static int emode_for_corrupt_record(int emode, XLogRecPtr RecPtr); static void XLogFileClose(void); static void PreallocXlogFiles(XLogRecPtr endptr); ! static void RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr); static void UpdateLastRemovedPtr(char *filename); static void ValidateXLOGDirectoryStructure(void); static void CleanupBackupHistory(void); *************** *** 2548,2553 **** AdvanceXLInsertBuffer(XLogRecPtr upto, bool opportunistic) --- 2545,2653 ---- } /* + * Calculate CheckPointSegments based on checkpoint_wal_size and + * checkpoint_completion_target. 
+ */ + static void + CalculateCheckpointSegments(void) + { + double target; + + /*------- + * Calculate the distance at which to trigger a checkpoint, to avoid + * exceeding checkpoint_wal_size. This is based on two assumptions: + * + * a) we keep WAL for two checkpoint cycles, back to the "prev" checkpoint. + * b) during checkpoint, we consume checkpoint_completion_target * + * number of segments consumed between checkpoints. + *------- + */ + target = (double ) checkpoint_wal_size / (double) (XLOG_SEG_SIZE / 1024); + target = target / (2.0 + CheckPointCompletionTarget); + + /* round down */ + CheckPointSegments = (int) target; + + if (CheckPointSegments < 1) + CheckPointSegments = 1; + } + + void + assign_checkpoint_wal_size(int newval, void *extra) + { + checkpoint_wal_size = newval; + CalculateCheckpointSegments(); + } + + void + assign_checkpoint_completion_target(double newval, void *extra) + { + CheckPointCompletionTarget = newval; + CalculateCheckpointSegments(); + } + + /* + * At a checkpoint, how many WAL segments to recycle as preallocated future + * XLOG segments? Returns the highest segment that should be preallocated. + */ + static XLogSegNo + XLOGfileslop(XLogRecPtr PriorRedoPtr) + { + double nsegments; + XLogSegNo minSegNo; + XLogSegNo maxSegNo; + double distance; + XLogSegNo recycleSegNo; + + /* + * Calculate the segment numbers that min_recycle_wal_size and + * checkpoint_wal_size correspond to. Always recycle enough segments + * to meet the minimum, and remove enough segments to stay below the + * maximum. + */ + nsegments = (double) min_recycle_wal_size / (double) (XLOG_SEG_SIZE / 1024); + minSegNo = PriorRedoPtr / XLOG_SEG_SIZE + (int) nsegments; + nsegments = (double) checkpoint_wal_size / (double) (XLOG_SEG_SIZE / 1024); + maxSegNo = PriorRedoPtr / XLOG_SEG_SIZE + (int) nsegments; + + /* + * Between those limits, recycle enough segments to get us through to the + * estimated end of next checkpoint. 
+ * + * To estimate where the next checkpoint will finish, assume that the + * system runs steadily consuming CheckPointDistanceEstimate + * bytes between every checkpoint. + * + * The reason this calculation is done from the prior checkpoint, not the + * one that just finished, is that this behaves better if some checkpoint + * cycles are abnormally short, like if you perform a manual checkpoint + * right after a timed one. The manual checkpoint will make almost a full + * cycle's worth of WAL segments available for recycling, because the + * segments from the prior's prior, fully-sized checkpoint cycle are no + * longer needed. However, the next checkpoint will make only few segments + * available for recycling, the ones generated between the timed + * checkpoint and the manual one right after that. If at the manual + * checkpoint we only retained enough segments to get us to the next timed + * one, and removed the rest, then at the next checkpoint we would not + * have enough segments around for recycling, to get us to the checkpoint + * after that. Basing the calculations on the distance from the prior redo + * pointer largely fixes that problem. + */ + distance = (2.0 + CheckPointCompletionTarget) * CheckPointDistanceEstimate; + /* add 10% for good measure. */ + distance *= 1.10; + + recycleSegNo = (XLogSegNo) ceil(((double) PriorRedoPtr + distance) / XLOG_SEG_SIZE); + + if (recycleSegNo < minSegNo) + recycleSegNo = minSegNo; + if (recycleSegNo > maxSegNo) + recycleSegNo = maxSegNo; + + return recycleSegNo; + } + + /* * Check whether we've consumed enough xlog space that a checkpoint is needed. * * new_segno indicates a log file that has just been filled up (or read *************** *** 3345,3351 **** XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock) char path[MAXPGPATH]; char tmppath[MAXPGPATH]; XLogSegNo installed_segno; ! 
int max_advance; int fd; bool zero_fill = true; --- 3445,3451 ---- char path[MAXPGPATH]; char tmppath[MAXPGPATH]; XLogSegNo installed_segno; ! XLogSegNo max_segno; int fd; bool zero_fill = true; *************** *** 3472,3480 **** XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock) * pre-create a future log segment. */ installed_segno = logsegno; ! max_advance = XLOGfileslop; if (!InstallXLogFileSegment(&installed_segno, tmppath, ! *use_existent, &max_advance, use_lock)) { /* --- 3572,3590 ---- * pre-create a future log segment. */ installed_segno = logsegno; ! ! /* ! * XXX: What should we use as max_segno? We used to use XLOGfileslop when ! * that was a constant, but that was always a bit dubious: normally, at a ! * checkpoint, XLOGfileslop was the offset from the checkpoint record, ! * but here, it was the offset from the insert location. We can't do the ! * normal XLOGfileslop calculation here because we don't have access to ! * the prior checkpoint's redo location. So somewhat arbitrarily, just ! * use CheckPointSegments. ! */ ! max_segno = logsegno + CheckPointSegments; if (!InstallXLogFileSegment(&installed_segno, tmppath, ! *use_existent, max_segno, use_lock)) { /* *************** *** 3597,3603 **** XLogFileCopy(XLogSegNo destsegno, TimeLineID srcTLI, XLogSegNo srcsegno) /* * Now move the segment into place with its final name. */ ! if (!InstallXLogFileSegment(&destsegno, tmppath, false, NULL, false)) elog(ERROR, "InstallXLogFileSegment should not have failed"); } --- 3707,3713 ---- /* * Now move the segment into place with its final name. */ ! if (!InstallXLogFileSegment(&destsegno, tmppath, false, 0, false)) elog(ERROR, "InstallXLogFileSegment should not have failed"); } *************** *** 3617,3638 **** XLogFileCopy(XLogSegNo destsegno, TimeLineID srcTLI, XLogSegNo srcsegno) * number at or after the passed numbers. If FALSE, install the new segment * exactly where specified, deleting any existing segment file there. * ! 
* *max_advance: maximum number of segno slots to advance past the starting ! * point. Fail if no free slot is found in this range. On return, reduced ! * by the number of slots skipped over. (Irrelevant, and may be NULL, ! * when find_free is FALSE.) * * use_lock: if TRUE, acquire ControlFileLock while moving file into * place. This should be TRUE except during bootstrap log creation. The * caller must *not* hold the lock at call. * * Returns TRUE if the file was installed successfully. FALSE indicates that ! * max_advance limit was exceeded, or an error occurred while renaming the * file into place. */ static bool InstallXLogFileSegment(XLogSegNo *segno, char *tmppath, ! bool find_free, int *max_advance, bool use_lock) { char path[MAXPGPATH]; --- 3727,3747 ---- * number at or after the passed numbers. If FALSE, install the new segment * exactly where specified, deleting any existing segment file there. * ! * max_segno: maximum segment number to install the new file as. Fail if no ! * free slot is found between *segno and max_segno. (Ignored when find_free ! * is FALSE.) * * use_lock: if TRUE, acquire ControlFileLock while moving file into * place. This should be TRUE except during bootstrap log creation. The * caller must *not* hold the lock at call. * * Returns TRUE if the file was installed successfully. FALSE indicates that ! * max_segno limit was exceeded, or an error occurred while renaming the * file into place. */ static bool InstallXLogFileSegment(XLogSegNo *segno, char *tmppath, ! bool find_free, XLogSegNo max_segno, bool use_lock) { char path[MAXPGPATH]; *************** *** 3656,3662 **** InstallXLogFileSegment(XLogSegNo *segno, char *tmppath, /* Find a free slot to put it in */ while (stat(path, &stat_buf) == 0) { ! if (*max_advance <= 0) { /* Failed to find a free slot within specified range */ if (use_lock) --- 3765,3771 ---- /* Find a free slot to put it in */ while (stat(path, &stat_buf) == 0) { ! 
if ((*segno) >= max_segno) { /* Failed to find a free slot within specified range */ if (use_lock) *************** *** 3664,3670 **** InstallXLogFileSegment(XLogSegNo *segno, char *tmppath, return false; } (*segno)++; - (*max_advance)--; XLogFilePath(path, ThisTimeLineID, *segno); } } --- 3773,3778 ---- *************** *** 3997,4010 **** UpdateLastRemovedPtr(char *filename) /* * Recycle or remove all log files older or equal to passed segno * ! * endptr is current (or recent) end of xlog; this is used to determine * whether we want to recycle rather than delete no-longer-wanted log files. */ static void ! RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr endptr) { XLogSegNo endlogSegNo; ! int max_advance; DIR *xldir; struct dirent *xlde; char lastoff[MAXFNAMELEN]; --- 4105,4119 ---- /* * Recycle or remove all log files older or equal to passed segno * ! * endptr is current (or recent) end of xlog, and PriorRedoPtr is the ! * redo pointer of the previous checkpoint. These are used to determine * whether we want to recycle rather than delete no-longer-wanted log files. */ static void ! RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr) { XLogSegNo endlogSegNo; ! XLogSegNo recycleSegNo; DIR *xldir; struct dirent *xlde; char lastoff[MAXFNAMELEN]; *************** *** 4016,4026 **** RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr endptr) struct stat statbuf; /* ! * Initialize info about where to try to recycle to. We allow recycling ! * segments up to XLOGfileslop segments beyond the current XLOG location. */ XLByteToPrevSeg(endptr, endlogSegNo); ! max_advance = XLOGfileslop; xldir = AllocateDir(XLOGDIR); if (xldir == NULL) --- 4125,4134 ---- struct stat statbuf; /* ! * Initialize info about where to try to recycle to. */ XLByteToPrevSeg(endptr, endlogSegNo); ! 
recycleSegNo = XLOGfileslop(PriorRedoPtr); xldir = AllocateDir(XLOGDIR); if (xldir == NULL) *************** *** 4069,4088 **** RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr endptr) * for example can create symbolic links pointing to a * separate archive directory. */ ! if (lstat(path, &statbuf) == 0 && S_ISREG(statbuf.st_mode) && InstallXLogFileSegment(&endlogSegNo, path, ! true, &max_advance, true)) { ereport(DEBUG2, (errmsg("recycled transaction log file \"%s\"", xlde->d_name))); CheckpointStats.ckpt_segs_recycled++; /* Needn't recheck that slot on future iterations */ ! if (max_advance > 0) ! { ! endlogSegNo++; ! max_advance--; ! } } else { --- 4177,4193 ---- * for example can create symbolic links pointing to a * separate archive directory. */ ! if (endlogSegNo <= recycleSegNo && ! lstat(path, &statbuf) == 0 && S_ISREG(statbuf.st_mode) && InstallXLogFileSegment(&endlogSegNo, path, ! true, recycleSegNo, true)) { ereport(DEBUG2, (errmsg("recycled transaction log file \"%s\"", xlde->d_name))); CheckpointStats.ckpt_segs_recycled++; /* Needn't recheck that slot on future iterations */ ! endlogSegNo++; } else { *************** *** 7863,7869 **** LogCheckpointEnd(bool restartpoint) elog(LOG, "restartpoint complete: wrote %d buffers (%.1f%%); " "%d transaction log file(s) added, %d removed, %d recycled; " "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; " ! "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s", CheckpointStats.ckpt_bufs_written, (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers, CheckpointStats.ckpt_segs_added, --- 7968,7975 ---- elog(LOG, "restartpoint complete: wrote %d buffers (%.1f%%); " "%d transaction log file(s) added, %d removed, %d recycled; " "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; " ! "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s; " ! 
"distance=%d KB, estimate=%d KB", CheckpointStats.ckpt_bufs_written, (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers, CheckpointStats.ckpt_segs_added, *************** *** 7874,7885 **** LogCheckpointEnd(bool restartpoint) total_secs, total_usecs / 1000, CheckpointStats.ckpt_sync_rels, longest_secs, longest_usecs / 1000, ! average_secs, average_usecs / 1000); else elog(LOG, "checkpoint complete: wrote %d buffers (%.1f%%); " "%d transaction log file(s) added, %d removed, %d recycled; " "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; " ! "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s", CheckpointStats.ckpt_bufs_written, (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers, CheckpointStats.ckpt_segs_added, --- 7980,7994 ---- total_secs, total_usecs / 1000, CheckpointStats.ckpt_sync_rels, longest_secs, longest_usecs / 1000, ! average_secs, average_usecs / 1000, ! (int) (PrevCheckPointDistance / 1024.0), ! (int) (CheckPointDistanceEstimate / 1024.0)); else elog(LOG, "checkpoint complete: wrote %d buffers (%.1f%%); " "%d transaction log file(s) added, %d removed, %d recycled; " "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; " ! "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s; " ! "distance=%d KB, estimate=%d KB", CheckpointStats.ckpt_bufs_written, (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers, CheckpointStats.ckpt_segs_added, *************** *** 7890,7896 **** LogCheckpointEnd(bool restartpoint) total_secs, total_usecs / 1000, CheckpointStats.ckpt_sync_rels, longest_secs, longest_usecs / 1000, ! average_secs, average_usecs / 1000); } /* --- 7999,8046 ---- total_secs, total_usecs / 1000, CheckpointStats.ckpt_sync_rels, longest_secs, longest_usecs / 1000, ! average_secs, average_usecs / 1000, ! (int) (PrevCheckPointDistance / 1024.0), ! (int) (CheckPointDistanceEstimate / 1024.0)); ! } ! ! /* ! * Update the estimate of distance between checkpoints. ! * ! 
* The estimate is used to calculate the number of WAL segments to keep ! * preallocated, see XLOGfileslop(). ! */ ! static void ! UpdateCheckPointDistanceEstimate(uint64 nbytes) ! { ! /* ! * To estimate the number of segments consumed between checkpoints, keep ! * a moving average of the actual number of segments consumed in previous ! * checkpoint cycles. However, if the load is bursty, with quiet periods ! * and busy periods, we want to cater for the peak load. So instead of a ! * plain moving average, let the average decline slowly if the previous ! * cycle used less WAL than estimated, but bump it up immediately if it ! * used more. ! * ! * When checkpoints are triggered by checkpoint_wal_size, this should ! * converge to CheckPointSegments * XLOG_SEG_SIZE. ! * ! * Note: This doesn't pay any attention to what caused the checkpoint. ! * Checkpoints triggered manually with the CHECKPOINT command, or by e.g. ! * starting a base backup, are counted the same as those created ! * automatically. The slow-decline will largely mask them out, if they are ! * not frequent. If they are frequent, it seems reasonable to count them ! * in as any others; if you issue a manual checkpoint every 5 minutes and ! * never let a timed checkpoint happen, it makes sense to base the ! * preallocation on that 5 minute interval rather than whatever ! * checkpoint_timeout is set to. ! */ ! PrevCheckPointDistance = nbytes; ! if (CheckPointDistanceEstimate < nbytes) ! CheckPointDistanceEstimate = nbytes; ! else ! CheckPointDistanceEstimate = ! (0.90 * CheckPointDistanceEstimate + 0.10 * (double) nbytes); } /* *************** *** 7932,7938 **** CreateCheckPoint(int flags) XLogCtlInsert *Insert = &XLogCtl->Insert; XLogRecData rdata; uint32 freespace; ! XLogSegNo _logSegNo; XLogRecPtr curInsert; VirtualTransactionId *vxids; int nvxids; --- 8082,8088 ---- XLogCtlInsert *Insert = &XLogCtl->Insert; XLogRecData rdata; uint32 freespace; ! 
XLogRecPtr PriorRedoPtr; XLogRecPtr curInsert; VirtualTransactionId *vxids; int nvxids; *************** *** 8237,8246 **** CreateCheckPoint(int flags) (errmsg("concurrent transaction log activity while database system is shutting down"))); /* ! * Select point at which we can truncate the log, which we base on the ! * prior checkpoint's earliest info. */ ! XLByteToSeg(ControlFile->checkPointCopy.redo, _logSegNo); /* * Update the control file. --- 8387,8396 ---- (errmsg("concurrent transaction log activity while database system is shutting down"))); /* ! * Remember the prior checkpoint's redo pointer, used later to determine ! * the point where the log can be truncated. */ ! PriorRedoPtr = ControlFile->checkPointCopy.redo; /* * Update the control file. *************** *** 8294,8304 **** CreateCheckPoint(int flags) * Delete old log files (those no longer needed even for previous * checkpoint or the standbys in XLOG streaming). */ ! if (_logSegNo) { KeepLogSeg(recptr, &_logSegNo); _logSegNo--; ! RemoveOldXlogFiles(_logSegNo, recptr); } /* --- 8444,8460 ---- * Delete old log files (those no longer needed even for previous * checkpoint or the standbys in XLOG streaming). */ ! if (PriorRedoPtr != InvalidXLogRecPtr) { + XLogSegNo _logSegNo; + + /* Update the average distance between checkpoints. */ + UpdateCheckPointDistanceEstimate(RedoRecPtr - PriorRedoPtr); + + XLByteToSeg(PriorRedoPtr, _logSegNo); KeepLogSeg(recptr, &_logSegNo); _logSegNo--; ! RemoveOldXlogFiles(_logSegNo, PriorRedoPtr, recptr); } /* *************** *** 8486,8492 **** CreateRestartPoint(int flags) { XLogRecPtr lastCheckPointRecPtr; CheckPoint lastCheckPoint; ! XLogSegNo _logSegNo; TimestampTz xtime; /* use volatile pointer to prevent code rearrangement */ --- 8642,8648 ---- { XLogRecPtr lastCheckPointRecPtr; CheckPoint lastCheckPoint; ! 
XLogRecPtr PriorRedoPtr; TimestampTz xtime; /* use volatile pointer to prevent code rearrangement */ *************** *** 8554,8560 **** CreateRestartPoint(int flags) /* * Update the shared RedoRecPtr so that the startup process can calculate * the number of segments replayed since last restartpoint, and request a ! * restartpoint if it exceeds checkpoint_segments. * * Like in CreateCheckPoint(), hold off insertions to update it, although * during recovery this is just pro forma, because no WAL insertions are --- 8710,8716 ---- /* * Update the shared RedoRecPtr so that the startup process can calculate * the number of segments replayed since last restartpoint, and request a ! * restartpoint if it exceeds CheckPointSegments. * * Like in CreateCheckPoint(), hold off insertions to update it, although * during recovery this is just pro forma, because no WAL insertions are *************** *** 8585,8594 **** CreateRestartPoint(int flags) CheckPointGuts(lastCheckPoint.redo, flags); /* ! * Select point at which we can truncate the xlog, which we base on the ! * prior checkpoint's earliest info. */ ! XLByteToSeg(ControlFile->checkPointCopy.redo, _logSegNo); /* * Update pg_control, using current time. Check that it still shows --- 8741,8750 ---- CheckPointGuts(lastCheckPoint.redo, flags); /* ! * Remember the prior checkpoint's redo pointer, used later to determine ! * the point at which we can truncate the log. */ ! PriorRedoPtr = ControlFile->checkPointCopy.redo; /* * Update pg_control, using current time. Check that it still shows *************** *** 8615,8626 **** CreateRestartPoint(int flags) * checkpoint/restartpoint) to prevent the disk holding the xlog from * growing full. */ ! if (_logSegNo) { XLogRecPtr receivePtr; XLogRecPtr replayPtr; TimeLineID replayTLI; XLogRecPtr endptr; /* * Get the current end of xlog replayed or received, whichever is --- 8771,8785 ---- * checkpoint/restartpoint) to prevent the disk holding the xlog from * growing full. */ ! 
if (PriorRedoPtr != InvalidXLogRecPtr) { XLogRecPtr receivePtr; XLogRecPtr replayPtr; TimeLineID replayTLI; XLogRecPtr endptr; + XLogSegNo _logSegNo; + + XLByteToSeg(PriorRedoPtr, _logSegNo); /* * Get the current end of xlog replayed or received, whichever is *************** *** 8649,8655 **** CreateRestartPoint(int flags) if (RecoveryInProgress()) ThisTimeLineID = replayTLI; ! RemoveOldXlogFiles(_logSegNo, endptr); /* * Make more log segments if needed. (Do this after recycling old log --- 8808,8814 ---- if (RecoveryInProgress()) ThisTimeLineID = replayTLI; ! RemoveOldXlogFiles(_logSegNo, PriorRedoPtr, endptr); /* * Make more log segments if needed. (Do this after recycling old log *** a/src/backend/postmaster/checkpointer.c --- b/src/backend/postmaster/checkpointer.c *************** *** 482,488 **** CheckpointerMain(void) "checkpoints are occurring too frequently (%d seconds apart)", elapsed_secs, elapsed_secs), ! errhint("Consider increasing the configuration parameter \"checkpoint_segments\"."))); /* * Initialize checkpointer-private variables used during --- 482,488 ---- "checkpoints are occurring too frequently (%d seconds apart)", elapsed_secs, elapsed_secs), ! errhint("Consider increasing the configuration parameter \"checkpoint_wal_size\"."))); /* * Initialize checkpointer-private variables used during *************** *** 760,770 **** IsCheckpointOnSchedule(double progress) return false; /* ! * Check progress against WAL segments written and checkpoint_segments. * * We compare the current WAL insert location against the location * computed before calling CreateCheckPoint. The code in XLogInsert that ! * actually triggers a checkpoint when checkpoint_segments is exceeded * compares against RedoRecptr, so this is not completely accurate. * However, it's good enough for our purposes, we're only calculating an * estimate anyway. --- 760,770 ---- return false; /* ! * Check progress against WAL segments written and CheckPointSegments. 
* * We compare the current WAL insert location against the location * computed before calling CreateCheckPoint. The code in XLogInsert that ! * actually triggers a checkpoint when CheckPointSegments is exceeded * compares against RedoRecptr, so this is not completely accurate. * However, it's good enough for our purposes, we're only calculating an * estimate anyway. *** a/src/backend/utils/misc/guc.c --- b/src/backend/utils/misc/guc.c *************** *** 1981,1996 **** static struct config_int ConfigureNamesInt[] = }, { ! {"checkpoint_segments", PGC_SIGHUP, WAL_CHECKPOINTS, ! gettext_noop("Sets the maximum distance in log segments between automatic WAL checkpoints."), ! NULL }, ! &CheckPointSegments, ! 3, 1, INT_MAX, NULL, NULL, NULL }, { {"checkpoint_timeout", PGC_SIGHUP, WAL_CHECKPOINTS, gettext_noop("Sets the maximum time between automatic WAL checkpoints."), NULL, --- 1981,2008 ---- }, { ! {"min_recycle_wal_size", PGC_SIGHUP, WAL_CHECKPOINTS, ! gettext_noop("Sets the minimum size to shrink the WAL to."), ! NULL, ! GUC_UNIT_KB }, ! &min_recycle_wal_size, ! 81920, 32768, INT_MAX, NULL, NULL, NULL }, { + {"checkpoint_wal_size", PGC_SIGHUP, WAL_CHECKPOINTS, + gettext_noop("Sets the maximum WAL size that triggers a checkpoint."), + NULL, + GUC_UNIT_KB + }, + &checkpoint_wal_size, + 262144, 32768, INT_MAX, + NULL, assign_checkpoint_wal_size, NULL + }, + + { {"checkpoint_timeout", PGC_SIGHUP, WAL_CHECKPOINTS, gettext_noop("Sets the maximum time between automatic WAL checkpoints."), NULL, *************** *** 2573,2579 **** static struct config_real ConfigureNamesReal[] = }, &CheckPointCompletionTarget, 0.5, 0.0, 1.0, ! NULL, NULL, NULL }, /* End-of-list marker */ --- 2585,2591 ---- }, &CheckPointCompletionTarget, 0.5, 0.0, 1.0, ! 
NULL, assign_checkpoint_completion_target, NULL }, /* End-of-list marker */ *** a/src/include/access/xlog.h --- b/src/include/access/xlog.h *************** *** 181,187 **** extern XLogRecPtr XactLastRecEnd; extern bool reachedConsistency; /* these variables are GUC parameters related to XLOG */ ! extern int CheckPointSegments; extern int wal_keep_segments; extern int XLOGbuffers; extern int XLogArchiveTimeout; --- 181,188 ---- extern bool reachedConsistency; /* these variables are GUC parameters related to XLOG */ ! extern int min_recycle_wal_size; ! extern int checkpoint_wal_size; extern int wal_keep_segments; extern int XLOGbuffers; extern int XLogArchiveTimeout; *************** *** 192,197 **** extern bool fullPageWrites; --- 193,200 ---- extern bool log_checkpoints; extern int num_xloginsert_slots; + extern int CheckPointSegments; + /* WAL levels */ typedef enum WalLevel { *************** *** 319,324 **** extern bool CheckPromoteSignal(void); --- 322,330 ---- extern void WakeupRecovery(void); extern void SetWalWriterSleeping(bool sleeping); + extern void assign_checkpoint_wal_size(int newval, void *extra); + extern void assign_checkpoint_completion_target(double newval, void *extra); + /* * Starting/stopping a base backup */
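For readers who want to see how the estimate in UpdateCheckPointDistanceEstimate() behaves, here is a minimal standalone sketch of the same update rule: jump up immediately when a cycle uses more WAL than estimated, otherwise decay with 0.90/0.10 weights. The function name next_distance_estimate and its pure-function shape are assumptions made for illustration only; the patch itself keeps the estimate in a static variable.

```c
#include <stdint.h>

/*
 * Hypothetical pure-function version of the update rule in the patch.
 * "estimate" is the previous distance estimate in bytes, "nbytes" is the
 * WAL distance covered by the checkpoint cycle that just finished.
 */
static double
next_distance_estimate(double estimate, uint64_t nbytes)
{
    double used = (double) nbytes;

    /* The cycle used more WAL than estimated: bump the estimate at once. */
    if (estimate < used)
        return used;

    /* Otherwise let the estimate decline slowly towards the new value. */
    return 0.90 * estimate + 0.10 * used;
}
```

With this rule, an estimate of 64 MB followed by a quiet 32 MB cycle only drops to about 60.8 MB, while a single 256 MB cycle raises it to 256 MB right away, which is the "cater for the peak load" behaviour the comment describes.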