Making a Base Backup from Standby Database

*** a/doc/src/sgml/backup.sgml --- b/doc/src/sgml/backup.sgml *************** *** 939,944 **** SELECT pg_stop_backup(); --- 939,1004 ---- + + Making a Base Backup from Standby Database + + + It's possible to make a base backup during recovery, which allows a user + to take a base backup from the standby to offload the expense of + periodic backups from the master. The procedure is similar to that + used during normal running. All these steps must be performed on the standby. + + + + Ensure that hot standby is enabled (see + for more information). + + + + + Connect to the database as a superuser and execute pg_start_backup. + This performs a restartpoint if at least one checkpoint record has been + replayed since the last restartpoint. + + + + + Perform a file system backup. + + + + + Copy the pg_control file from the cluster directory to the global + sub-directory of the backup. For example: + + cp $PGDATA/global/pg_control /mnt/server/backupdir/global + + + + + + Connect to the database as a superuser again, and execute + pg_stop_backup. This terminates the backup mode but, unlike during + normal processing, does not perform a switch to the next WAL segment, + create a backup history file, or wait for all required WAL segments + to be archived. + + + + + + + You cannot use the pg_basebackup tool to take the backup + from the standby. + + + It's not possible to make a base backup from a server in recovery mode + while it is replaying WAL written during a period when full_page_writes + was disabled. If you want to take a base backup from the standby, + full_page_writes must be set to true on the master. 
+ + + Recovering Using a Continuous Archive Backup *** a/doc/src/sgml/config.sgml --- b/doc/src/sgml/config.sgml *************** *** 1682,1687 **** SET ENABLE_SEQSCAN TO OFF; --- 1682,1695 ---- + WAL written while full_page_writes is disabled does not + contain enough information to make a base backup during recovery + (see ), + so full_page_writes must be enabled on the master + to take a backup from the standby. + + + This parameter can only be set in the postgresql.conf file or on the server command line. The default is on. *** a/doc/src/sgml/func.sgml --- b/doc/src/sgml/func.sgml *************** *** 14034,14040 **** SELECT set_config('log_statement_stats', 'off', false); The functions shown in assist in making on-line backups. ! These functions cannot be executed during recovery. --- 14034,14041 ---- The functions shown in assist in making on-line backups. ! These functions except pg_start_backup and pg_stop_backup ! cannot be executed during recovery.
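
As an illustration of the full_page_writes restriction described in the documentation hunks above, here is a minimal, self-contained sketch of the check that the xlog.c changes in this patch appear to perform when pg_start_backup() runs during recovery. It is not code from the patch: `SketchRecPtr`, `sketch_lsn_le` and `sketch_standby_backup_allowed` are hypothetical names, and the two-field record pointer only imitates the XLogRecPtr layout (xlogid, xrecoff) that the patch relies on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the two-field XLogRecPtr used by this patch. */
typedef struct
{
    uint32_t xlogid;    /* log file number */
    uint32_t xrecoff;   /* byte offset within that log file */
} SketchRecPtr;

/* Imitates the intent of XLByteLE(a, b): true if a <= b. */
static bool
sketch_lsn_le(SketchRecPtr a, SketchRecPtr b)
{
    return a.xlogid < b.xlogid ||
           (a.xlogid == b.xlogid && a.xrecoff <= b.xrecoff);
}

/*
 * Hypothetical helper: may a base backup whose starting restartpoint has
 * redo location `startpoint` proceed on a standby?
 *
 * checkpoint_fpw   - full_page_writes value recorded in that checkpoint
 * last_fpw_disable - start of the last replayed record that turned
 *                    full_page_writes off (0/0 if none was replayed)
 */
static bool
sketch_standby_backup_allowed(bool checkpoint_fpw,
                              SketchRecPtr startpoint,
                              SketchRecPtr last_fpw_disable)
{
    /* The starting checkpoint itself was taken with full_page_writes off. */
    if (!checkpoint_fpw)
        return false;

    /* full_page_writes was turned off at or after the backup start point. */
    if (sketch_lsn_le(startpoint, last_fpw_disable))
        return false;

    return true;
}
```

In the patch, pg_stop_backup() repeats only the second condition, comparing the backup start point with lastFpwDisableRecPtr, so a backup is also rejected if full_page_writes was disabled on the master while the file system copy was running.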

*************** *** 14114,14120 **** SELECT set_config('log_statement_stats', 'off', false); database cluster's data directory, performs a checkpoint, and then returns the backup's starting transaction log location as text. The user can ignore this result value, but it is ! provided in case it is useful. postgres=# select pg_start_backup('label_goes_here'); pg_start_backup --- 14115,14123 ---- database cluster's data directory, performs a checkpoint, and then returns the backup's starting transaction log location as text. The user can ignore this result value, but it is ! provided in case it is useful. If pg_start_backup is ! executed during recovery, it performs a restartpoint rather than ! writing a new checkpoint. postgres=# select pg_start_backup('label_goes_here'); pg_start_backup *************** *** 14142,14147 **** postgres=# select pg_start_backup('label_goes_here'); --- 14145,14157 ---- + If pg_stop_backup is executed during recovery, it just + removes the label file, but doesn't create a backup history file and wait for + the ending transaction log file to be archived. The return value is equal to + or bigger than the exact backup's ending transaction log location. + + + pg_switch_xlog moves to the next transaction log file, allowing the current file to be archived (assuming you are using continuous archiving). The return value is the ending transaction log location + 1 within the just-completed transaction log file. *** a/src/backend/access/transam/xlog.c --- b/src/backend/access/transam/xlog.c *************** *** 158,163 **** HotStandbyState standbyState = STANDBY_DISABLED; --- 158,174 ---- static XLogRecPtr LastRec; /* + * During recovery, lastFullPageWrites keeps track of full_page_writes that + * the replayed WAL records indicate. It's initialized with full_page_writes + * that the recovery starting checkpoint record indicates, and then updated + * each time XLOG_FPW_CHANGE record is replayed. 
At the end of startup, + * if it's equal to full_page_writes in postgresql.conf, which means that + * full_page_writes has not been changed since last shutdown or crash, so + * in this case we skip writing an XLOG_FPW_CHANGE record. + */ + static bool lastFullPageWrites; + + /* * Local copy of SharedRecoveryInProgress variable. True actually means "not * known, need to check the shared state". */ *************** *** 356,361 **** typedef struct XLogCtlInsert --- 367,382 ---- bool forcePageWrites; /* forcing full-page writes for PITR? */ /* + * fullPageWrites is shared-memory copy of walwriter's or startup + * process' full_page_writes. All backends use this flag to determine + * whether to write full-page to WAL, instead of using process-local + * one. This is required because, when full_page_writes is changed + * by SIGHUP, we must WAL-log it before it actually affects + * WAL-logging by backends. + */ + bool fullPageWrites; + + /* * exclusiveBackup is true if a backup started with pg_start_backup() is * in progress, and nonExclusiveBackups is a counter indicating the number * of streaming base backups currently in progress. forcePageWrites is set *************** *** 453,458 **** typedef struct XLogCtlData --- 474,485 ---- /* Are we requested to pause recovery? */ bool recoveryPause; + /* + * lastFpwDisableRecPtr points to the start of the last replayed + * XLOG_FPW_CHANGE record that instructs full_page_writes is disabled. + */ + XLogRecPtr lastFpwDisableRecPtr; + slock_t info_lck; /* locks shared variables shown above */ } XLogCtlData; *************** *** 665,671 **** static void xlog_outrec(StringInfo buf, XLogRecord *record); #endif static void pg_start_backup_callback(int code, Datum arg); static bool read_backup_label(XLogRecPtr *checkPointLoc, ! 
bool *backupEndRequired); static void rm_redo_error_callback(void *arg); static int get_sync_bit(int method); --- 692,698 ---- #endif static void pg_start_backup_callback(int code, Datum arg); static bool read_backup_label(XLogRecPtr *checkPointLoc, ! bool *backupEndRequired, bool *backupDuringRecovery); static void rm_redo_error_callback(void *arg); static int get_sync_bit(int method); *************** *** 710,715 **** XLogInsert(RmgrId rmid, uint8 info, XLogRecData *rdata) --- 737,743 ---- bool updrqst; bool doPageWrites; bool isLogSwitch = (rmid == RM_XLOG_ID && info == XLOG_SWITCH); + bool fpwChange = (rmid == RM_XLOG_ID && info == XLOG_FPW_CHANGE); /* cross-check on whether we should be here or not */ if (!XLogInsertAllowed()) *************** *** 761,770 **** begin:; /* * Decide if we need to do full-page writes in this XLOG record: true if * full_page_writes is on or we have a PITR request for it. Since we ! * don't yet have the insert lock, forcePageWrites could change under us, ! * but we'll recheck it once we have the lock. */ ! doPageWrites = fullPageWrites || Insert->forcePageWrites; INIT_CRC32(rdata_crc); len = 0; --- 789,798 ---- /* * Decide if we need to do full-page writes in this XLOG record: true if * full_page_writes is on or we have a PITR request for it. Since we ! * don't yet have the insert lock, fullPageWrites and forcePageWrites ! * could change under us, but we'll recheck them once we have the lock. */ ! doPageWrites = Insert->fullPageWrites || Insert->forcePageWrites; INIT_CRC32(rdata_crc); len = 0; *************** *** 905,916 **** begin:; } /* ! * Also check to see if forcePageWrites was just turned on; if we weren't ! * already doing full-page writes then go back and recompute. (If it was ! * just turned off, we could recompute the record without full pages, but ! * we choose not to bother.) */ ! 
if (Insert->forcePageWrites && !doPageWrites) { /* Oops, must redo it with full-page data */ LWLockRelease(WALInsertLock); --- 933,944 ---- } /* ! * Also check to see if fullPageWrites or forcePageWrites was just turned on; ! * if we weren't already doing full-page writes then go back and recompute. ! * (If it was just turned off, we could recompute the record without full pages, ! * but we choose not to bother.) */ ! if ((Insert->fullPageWrites || Insert->forcePageWrites) && !doPageWrites) { /* Oops, must redo it with full-page data */ LWLockRelease(WALInsertLock); *************** *** 1224,1229 **** begin:; --- 1252,1266 ---- WriteRqst = XLogCtl->xlblocks[curridx]; } + /* + * If the record is an XLOG_FPW_CHANGE, we update full_page_writes + * in shared memory before releasing WALInsertLock. This ensures that + * an XLOG_FPW_CHANGE record precedes any WAL record affected + * by this parameter change. + */ + if (fpwChange) + Insert->fullPageWrites = fullPageWrites; + LWLockRelease(WALInsertLock); if (updrqst) *************** *** 5155,5160 **** BootStrapXLOG(void) --- 5192,5198 ---- checkPoint.redo.xlogid = 0; checkPoint.redo.xrecoff = XLogSegSize + SizeOfXLogLongPHD; checkPoint.ThisTimeLineID = ThisTimeLineID; + checkPoint.fullPageWrites = fullPageWrites; checkPoint.nextXidEpoch = 0; checkPoint.nextXid = FirstNormalTransactionId; checkPoint.nextOid = FirstBootstrapObjectId; *************** *** 6025,6030 **** StartupXLOG(void) --- 6063,6070 ---- uint32 freespace; TransactionId oldestActiveXID; bool backupEndRequired = false; + bool backupDuringRecovery = false; + DBState save_state; /* * Read control file and check XLOG status looks valid. *************** *** 6158,6164 **** StartupXLOG(void) if (StandbyMode) OwnLatch(&XLogCtl->recoveryWakeupLatch); ! if (read_backup_label(&checkPointLoc, &backupEndRequired)) { /* * When a backup_label file is present, we want to roll forward from --- 6198,6205 ---- if (StandbyMode) OwnLatch(&XLogCtl->recoveryWakeupLatch); ! 
if (read_backup_label(&checkPointLoc, &backupEndRequired, ! &backupDuringRecovery)) { /* * When a backup_label file is present, we want to roll forward from *************** *** 6274,6279 **** StartupXLOG(void) --- 6315,6322 ---- */ ThisTimeLineID = checkPoint.ThisTimeLineID; + lastFullPageWrites = checkPoint.fullPageWrites; + RedoRecPtr = XLogCtl->Insert.RedoRecPtr = checkPoint.redo; if (XLByteLT(RecPtr, checkPoint.redo)) *************** *** 6314,6319 **** StartupXLOG(void) --- 6357,6363 ---- * pg_control with any minimum recovery stop point obtained from a * backup history file. */ + save_state = ControlFile->state; if (InArchiveRecovery) ControlFile->state = DB_IN_ARCHIVE_RECOVERY; else *************** *** 6334,6345 **** StartupXLOG(void) } /* ! * set backupStartPoint if we're starting recovery from a base backup */ if (haveBackupLabel) { ControlFile->backupStartPoint = checkPoint.redo; ControlFile->backupEndRequired = backupEndRequired; } ControlFile->time = (pg_time_t) time(NULL); /* No need to hold ControlFileLock yet, we aren't up far enough */ --- 6378,6411 ---- } /* ! * Set backupStartPoint if we're starting recovery from a base backup. ! * ! * Set backupEndPoint if we're starting recovery from a base backup ! * which was taken from the server in recovery mode. We confirm ! * that minRecoveryPoint can be used as the backup end location by ! * checking whether the database system status in pg_control indicates ! * DB_IN_ARCHIVE_RECOVERY. If minRecoveryPoint is not available, ! * there is no way to know the backup end location, so we cannot ! * advance recovery any more. In this case, we have to cancel recovery ! * before changing the database system status in pg_control to ! * DB_IN_ARCHIVE_RECOVERY because otherwise subsequent ! * restarted recovery would go through this check wrongly. 
*/ if (haveBackupLabel) { ControlFile->backupStartPoint = checkPoint.redo; ControlFile->backupEndRequired = backupEndRequired; + + if (backupDuringRecovery) + { + if (save_state != DB_IN_ARCHIVE_RECOVERY) + ereport(FATAL, + (errmsg("database system status mismatches between " + "pg_control and backup_label"), + errhint("This means that the backup is corrupted and you will " + "have to use another backup for recovery."))); + ControlFile->backupEndPoint = ControlFile->minRecoveryPoint; + } } ControlFile->time = (pg_time_t) time(NULL); /* No need to hold ControlFileLock yet, we aren't up far enough */ *************** *** 6625,6630 **** StartupXLOG(void) --- 6691,6718 ---- /* Pop the error context stack */ error_context_stack = errcontext.previous; + if (!XLogRecPtrIsInvalid(ControlFile->backupStartPoint) && + XLByteLE(ControlFile->backupEndPoint, EndRecPtr)) + { + /* + * We have reached the end of base backup, the point where + * the minimum recovery point in pg_control which was + * backed up just before pg_stop_backup() indicates. + * The data on disk is now consistent. Reset backupStartPoint + * and backupEndPoint. + */ + elog(DEBUG1, "end of backup reached"); + + LWLockAcquire(ControlFileLock, LW_EXCLUSIVE); + + MemSet(&ControlFile->backupStartPoint, 0, sizeof(XLogRecPtr)); + MemSet(&ControlFile->backupEndPoint, 0, sizeof(XLogRecPtr)); + ControlFile->backupEndRequired = false; + UpdateControlFile(); + + LWLockRelease(ControlFileLock); + } + /* * Update shared recoveryLastRecPtr after this record has been * replayed. *************** *** 6824,6829 **** StartupXLOG(void) --- 6912,6933 ---- /* Pre-scan prepared transactions to find out the range of XIDs present */ oldestActiveXID = PrescanPreparedTransactions(NULL, NULL); + /* + * Update full_page_writes in shared memory and write an + * XLOG_FPW_CHANGE record before resource manager writes cleanup + * WAL records or checkpoint record is written. 
+ * + * Note that full_page_writes in shared memory is initialized with + * lastFullPageWrites so that UpdateFullPageWrites() can check whether + * it's equal to full_page_writes specified in postgresql.conf (i.e., whether + * full_page_writes has been changed since last shutdown or crash) and + * then skip writing an XLOG_FPW_CHANGE record if not. + */ + Insert->fullPageWrites = lastFullPageWrites; + LocalSetXLogInsertAllowed(); + UpdateFullPageWrites(); + LocalXLogInsertAllowed = -1; + if (InRecovery) { int rmid; *************** *** 7681,7686 **** CreateCheckPoint(int flags) --- 7785,7791 ---- LocalSetXLogInsertAllowed(); checkPoint.ThisTimeLineID = ThisTimeLineID; + checkPoint.fullPageWrites = Insert->fullPageWrites; /* * Compute new REDO record ptr = location of next XLOG record. *************** *** 8382,8387 **** XLogReportParameters(void) --- 8487,8534 ---- } /* + * Update full_page_writes in shared memory, and write an + * XLOG_FPW_CHANGE record if necessary. + */ + void + UpdateFullPageWrites(void) + { + XLogCtlInsert *Insert = &XLogCtl->Insert; + + /* + * Do nothing if full_page_writes has not been changed. + * + * It's safe to check the shared full_page_writes without the lock, + * because we can guarantee that there is no concurrently running + * process which can update it. + */ + if (fullPageWrites == Insert->fullPageWrites) + return; + + /* + * Write an XLOG_FPW_CHANGE record. This allows us to keep + * track of full_page_writes during archive recovery, if required. 
+ */ + if (XLogStandbyInfoActive()) + { + XLogRecData rdata; + + rdata.data = (char *) (&fullPageWrites); + rdata.len = sizeof(bool); + rdata.buffer = InvalidBuffer; + rdata.next = NULL; + + XLogInsert(RM_XLOG_ID, XLOG_FPW_CHANGE, &rdata); + } + else + { + LWLockAcquire(WALInsertLock, LW_EXCLUSIVE); + Insert->fullPageWrites = fullPageWrites; + LWLockRelease(WALInsertLock); + } + } + + /* * XLOG resource manager's routines * * Definitions of info values are in include/catalog/pg_control.h, though *************** *** 8425,8431 **** xlog_redo(XLogRecPtr lsn, XLogRecord *record) * never arrive. */ if (InArchiveRecovery && ! !XLogRecPtrIsInvalid(ControlFile->backupStartPoint)) ereport(ERROR, (errmsg("online backup was canceled, recovery cannot continue"))); --- 8572,8579 ---- * never arrive. */ if (InArchiveRecovery && ! !XLogRecPtrIsInvalid(ControlFile->backupStartPoint) && ! XLogRecPtrIsInvalid(ControlFile->backupEndPoint)) ereport(ERROR, (errmsg("online backup was canceled, recovery cannot continue"))); *************** *** 8594,8599 **** xlog_redo(XLogRecPtr lsn, XLogRecord *record) --- 8742,8771 ---- /* Check to see if any changes to max_connections give problems */ CheckRequiredParameterValues(); } + else if (info == XLOG_FPW_CHANGE) + { + /* use volatile pointer to prevent code rearrangement */ + volatile XLogCtlData *xlogctl = XLogCtl; + bool fpw; + + memcpy(&fpw, XLogRecGetData(record), sizeof(bool)); + + /* + * Update the LSN of the last replayed XLOG_FPW_CHANGE record + * so that pg_start_backup() and pg_stop_backup() can check + * whether full_page_writes has been disabled during online backup. 
+ */ + if (!fpw) + { + SpinLockAcquire(&xlogctl->info_lck); + if (XLByteLT(xlogctl->lastFpwDisableRecPtr, ReadRecPtr)) + xlogctl->lastFpwDisableRecPtr = ReadRecPtr; + SpinLockRelease(&xlogctl->info_lck); + } + + /* Keep track of full_page_writes */ + lastFullPageWrites = fpw; + } } void *************** *** 8607,8616 **** xlog_desc(StringInfo buf, uint8 xl_info, char *rec) CheckPoint *checkpoint = (CheckPoint *) rec; appendStringInfo(buf, "checkpoint: redo %X/%X; " ! "tli %u; xid %u/%u; oid %u; multi %u; offset %u; " "oldest xid %u in DB %u; oldest running xid %u; %s", checkpoint->redo.xlogid, checkpoint->redo.xrecoff, checkpoint->ThisTimeLineID, checkpoint->nextXidEpoch, checkpoint->nextXid, checkpoint->nextOid, checkpoint->nextMulti, --- 8779,8789 ---- CheckPoint *checkpoint = (CheckPoint *) rec; appendStringInfo(buf, "checkpoint: redo %X/%X; " ! "tli %u; fpw %s; xid %u/%u; oid %u; multi %u; offset %u; " "oldest xid %u in DB %u; oldest running xid %u; %s", checkpoint->redo.xlogid, checkpoint->redo.xrecoff, checkpoint->ThisTimeLineID, + checkpoint->fullPageWrites ? "true" : "false", checkpoint->nextXidEpoch, checkpoint->nextXid, checkpoint->nextOid, checkpoint->nextMulti, *************** *** 8675,8680 **** xlog_desc(StringInfo buf, uint8 xl_info, char *rec) --- 8848,8860 ---- xlrec.max_locks_per_xact, wal_level_str); } + else if (info == XLOG_FPW_CHANGE) + { + bool fpw; + + memcpy(&fpw, rec, sizeof(bool)); + appendStringInfo(buf, "full_page_writes: %s", fpw ? 
"true" : "false"); + } else appendStringInfo(buf, "UNKNOWN"); } *************** *** 8888,8893 **** XLogRecPtr --- 9068,9074 ---- do_pg_start_backup(const char *backupidstr, bool fast, char **labelfile) { bool exclusive = (labelfile == NULL); + bool recovery_in_progress = false; XLogRecPtr checkpointloc; XLogRecPtr startpoint; pg_time_t stamp_time; *************** *** 8899,8916 **** do_pg_start_backup(const char *backupidstr, bool fast, char **labelfile) FILE *fp; StringInfoData labelfbuf; if (!superuser() && !is_authenticated_user_replication_role()) ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), errmsg("must be superuser or replication role to run a backup"))); ! if (RecoveryInProgress()) ! ereport(ERROR, ! (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), ! errmsg("recovery is in progress"), ! errhint("WAL control functions cannot be executed during recovery."))); ! ! if (!XLogIsNeeded()) ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("WAL level not sufficient for making an online backup"), --- 9080,9099 ---- FILE *fp; StringInfoData labelfbuf; + recovery_in_progress = RecoveryInProgress(); + if (!superuser() && !is_authenticated_user_replication_role()) ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), errmsg("must be superuser or replication role to run a backup"))); ! /* ! * During recovery, we don't need to check WAL level. Because the fact that ! * we are now executing pg_start_backup() during recovery means that ! * wal_level is set to hot_standby on the master, i.e., WAL level is sufficient ! * for making an online backup. ! */ ! 
if (!recovery_in_progress && !XLogIsNeeded()) ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("WAL level not sufficient for making an online backup"), *************** *** 8932,8939 **** do_pg_start_backup(const char *backupidstr, bool fast, char **labelfile) * we won't have a history file covering the old timeline if pg_xlog * directory was not included in the base backup and the WAL archive was * cleared too before starting the backup. */ ! RequestXLogSwitch(); /* * Mark backup active in shared memory. We must do full-page WAL writes --- 9115,9127 ---- * we won't have a history file covering the old timeline if pg_xlog * directory was not included in the base backup and the WAL archive was * cleared too before starting the backup. + * + * During recovery, we skip forcing XLOG file switch, which means that + * the backup taken during recovery is not available for the special recovery + * case described above. */ ! if (!recovery_in_progress) ! RequestXLogSwitch(); /* * Mark backup active in shared memory. We must do full-page WAL writes *************** *** 8949,8954 **** do_pg_start_backup(const char *backupidstr, bool fast, char **labelfile) --- 9137,9145 ---- * since we expect that any pages not modified during the backup interval * must have been correctly captured by the backup.) * + * Note that forcePageWrites has no effect during an online backup from + * the server in recovery mode. + * * We must hold WALInsertLock to change the value of forcePageWrites, to * ensure adequate interlocking against XLogInsert(). */ *************** *** 8977,8988 **** do_pg_start_backup(const char *backupidstr, bool fast, char **labelfile) do { /* ! * Force a CHECKPOINT. Aside from being necessary to prevent torn * page problems, this guarantees that two successive backup runs * will have different checkpoint positions and hence different * history file names, even if nothing happened in between. 
* * We use CHECKPOINT_IMMEDIATE only if requested by user (via * passing fast = true). Otherwise this can take awhile. */ --- 9168,9189 ---- do { + bool checkpointfpw; + /* ! * Force a CHECKPOINT. Aside from being necessary to prevent torn * page problems, this guarantees that two successive backup runs * will have different checkpoint positions and hence different * history file names, even if nothing happened in between. * + * During recovery, establish a restartpoint if possible. We use the last + * restartpoint as the backup starting checkpoint. This means that two + * successive backup runs can have same checkpoint positions. + * + * Since the fact that we are executing pg_start_backup() during + * recovery means that bgwriter is running, we can use + * RequestCheckpoint() to establish a restartpoint. + * * We use CHECKPOINT_IMMEDIATE only if requested by user (via * passing fast = true). Otherwise this can take awhile. */ *************** *** 8998,9005 **** do_pg_start_backup(const char *backupidstr, bool fast, char **labelfile) --- 9199,9238 ---- LWLockAcquire(ControlFileLock, LW_SHARED); checkpointloc = ControlFile->checkPoint; startpoint = ControlFile->checkPointCopy.redo; + checkpointfpw = ControlFile->checkPointCopy.fullPageWrites; LWLockRelease(ControlFileLock); + if (recovery_in_progress) + { + /* use volatile pointer to prevent code rearrangement */ + volatile XLogCtlData *xlogctl = XLogCtl; + XLogRecPtr recptr; + + /* + * Check to see if all WAL replayed during online backup (i.e., + * since last restartpoint used as backup starting checkpoint) + * contain full-page writes. 
+ */ + SpinLockAcquire(&xlogctl->info_lck); + recptr = xlogctl->lastFpwDisableRecPtr; + SpinLockRelease(&xlogctl->info_lck); + + if (!checkpointfpw || XLByteLE(startpoint, recptr)) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("WAL generated with full_page_writes=off was replayed " + "since last restartpoint"))); + + /* + * During recovery, since we don't use the end-of-backup WAL + * record and don't write the backup history file, the starting WAL + * location doesn't need to be unique. This means that two base + * backups started at the same time might use the same checkpoint + * as starting locations. + */ + gotUniqueStartpoint = true; + } + /* * If two base backups are started at the same time (in WAL sender * processes), we need to make sure that they use different *************** *** 9039,9044 **** do_pg_start_backup(const char *backupidstr, bool fast, char **labelfile) --- 9272,9279 ---- checkpointloc.xlogid, checkpointloc.xrecoff); appendStringInfo(&labelfbuf, "BACKUP METHOD: %s\n", exclusive ? "pg_start_backup" : "streamed"); + appendStringInfo(&labelfbuf, "SYSTEM STATUS: %s\n", + recovery_in_progress ? "recovery" : "in production"); appendStringInfo(&labelfbuf, "START TIME: %s\n", strfbuf); appendStringInfo(&labelfbuf, "LABEL: %s\n", backupidstr); *************** *** 9133,9138 **** pg_start_backup_callback(int code, Datum arg) --- 9368,9375 ---- * history file at the beginning of archive recovery, but we now use the WAL * record for that and the file is for informational and debug purposes only. * + * During recovery, we only remove the backup label file. + * * Note: different from CancelBackup which just cancels online backup mode. 
*/ Datum *************** *** 9159,9164 **** XLogRecPtr --- 9396,9402 ---- do_pg_stop_backup(char *labelfile, bool waitforarchive) { bool exclusive = (labelfile == NULL); + bool recovery_in_progress = false; XLogRecPtr startpoint; XLogRecPtr stoppoint; XLogRecData rdata; *************** *** 9169,9174 **** do_pg_stop_backup(char *labelfile, bool waitforarchive) --- 9407,9413 ---- char stopxlogfilename[MAXFNAMELEN]; char lastxlogfilename[MAXFNAMELEN]; char histfilename[MAXFNAMELEN]; + char systemstatus[20]; uint32 _logId; uint32 _logSeg; FILE *lfp; *************** *** 9178,9196 **** do_pg_stop_backup(char *labelfile, bool waitforarchive) int waits = 0; bool reported_waiting = false; char *remaining; if (!superuser() && !is_authenticated_user_replication_role()) ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), (errmsg("must be superuser or replication role to run a backup")))); ! if (RecoveryInProgress()) ! ereport(ERROR, ! (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), ! errmsg("recovery is in progress"), ! errhint("WAL control functions cannot be executed during recovery."))); ! ! if (!XLogIsNeeded()) ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("WAL level not sufficient for making an online backup"), --- 9417,9438 ---- int waits = 0; bool reported_waiting = false; char *remaining; + char *ptr; + + recovery_in_progress = RecoveryInProgress(); if (!superuser() && !is_authenticated_user_replication_role()) ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), (errmsg("must be superuser or replication role to run a backup")))); ! /* ! * During recovery, we don't need to check WAL level. Because the fact that ! * we are now executing pg_stop_backup() means that wal_level is set to ! * hot_standby on the master, i.e., WAL level is sufficient for making an online ! * backup. ! */ ! 
if (!recovery_in_progress && !XLogIsNeeded()) ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("WAL level not sufficient for making an online backup"), *************** *** 9281,9286 **** do_pg_stop_backup(char *labelfile, bool waitforarchive) --- 9523,9599 ---- remaining = strchr(labelfile, '\n') + 1; /* %n is not portable enough */ /* + * Parse the SYSTEM STATUS line, and check that database system + * status matches between pg_start_backup() and pg_stop_backup(). + */ + ptr = strstr(remaining, "SYSTEM STATUS:"); + if (sscanf(ptr, "SYSTEM STATUS: %19s\n", systemstatus) != 1) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE))); + if (strcmp(systemstatus, "recovery") == 0 && !recovery_in_progress) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("pg_stop_backup() was executed during normal processing " + "though pg_start_backup() was executed during recovery"), + errhint("The database backup will not be usable."))); + + /* + * During recovery, we don't write an end-of-backup record. We can + * assume that pg_control was backed up just before pg_stop_backup() + * and its minimum recovery point can be available as the backup end + * location. Without an end-of-backup record, we can check correctly + * whether we've reached the end of backup when starting recovery + * from this backup. + * + * We don't force a switch to new WAL file and wait for all the required + * files to be archived. This is okay if we use the backup to start + * the standby. But, if it's for an archive recovery, to ensure all the + * required files are available, a user should wait for them to be archived, + * or include them into the backup after pg_stop_backup(). + * + * We return the current minimum recovery point as the backup end + * location. 
Note that it would be bigger than the exact backup end + * location if the minimum recovery point has been updated since the backup + * of pg_control. The return value of pg_stop_backup() is often used + * by a user to calculate the required files. Returning an approximate + * location is harmless for that use because it's guaranteed not to be + * smaller than the exact backup end location. + * + * XXX currently a backup history file is for informational and debug + * purposes only. It's not essential for an online backup. Furthermore, + * even if it's created, it will not be archived during recovery because + * an archiver is not invoked. So it doesn't seem worthwhile to write + * a backup history file during recovery. + */ + if (recovery_in_progress) + { + /* use volatile pointer to prevent code rearrangement */ + volatile XLogCtlData *xlogctl = XLogCtl; + XLogRecPtr recptr; + + /* + * Check to see if all WAL replayed during online backup contain + * full-page writes. + */ + SpinLockAcquire(&xlogctl->info_lck); + recptr = xlogctl->lastFpwDisableRecPtr; + SpinLockRelease(&xlogctl->info_lck); + + if (XLByteLE(startpoint, recptr)) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("WAL generated with full_page_writes=off was replayed " + "during online backup"))); + + LWLockAcquire(ControlFileLock, LW_SHARED); + stoppoint = ControlFile->minRecoveryPoint; + LWLockRelease(ControlFileLock); + + return stoppoint; + } + + /* * Write the backup-end xlog record */ rdata.data = (char *) (&startpoint); *************** *** 9797,9814 **** pg_xlogfile_name(PG_FUNCTION_ARGS) * Returns TRUE if a backup_label was found (and fills the checkpoint * location and its REDO location into *checkPointLoc and RedoStartLSN, * respectively); returns FALSE if not. If this backup_label came from a ! * streamed backup, *backupEndRequired is set to TRUE. */ static bool ! 
read_backup_label(XLogRecPtr *checkPointLoc, bool *backupEndRequired) { char startxlogfilename[MAXFNAMELEN]; TimeLineID tli; FILE *lfp; char ch; char backuptype[20]; *backupEndRequired = false; /* * See if label file is present --- 10110,10131 ---- * Returns TRUE if a backup_label was found (and fills the checkpoint * location and its REDO location into *checkPointLoc and RedoStartLSN, * respectively); returns FALSE if not. If this backup_label came from a ! * streamed backup, *backupEndRequired is set to TRUE. If this backup_label ! * was created during recovery, *backupDuringRecovery is set to TRUE. */ static bool ! read_backup_label(XLogRecPtr *checkPointLoc, bool *backupEndRequired, ! bool *backupDuringRecovery) { char startxlogfilename[MAXFNAMELEN]; TimeLineID tli; FILE *lfp; char ch; char backuptype[20]; + char systemstatus[20]; *backupEndRequired = false; + *backupDuringRecovery = false; /* * See if label file is present *************** *** 9842,9857 **** read_backup_label(XLogRecPtr *checkPointLoc, bool *backupEndRequired) (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE))); /* ! * BACKUP METHOD line is new in 9.1. We can't restore from an older backup ! * anyway, but since the information on it is not strictly required, don't ! * error out if it's missing for some reason. */ ! if (fscanf(lfp, "BACKUP METHOD: %19s", backuptype) == 1) { if (strcmp(backuptype, "streamed") == 0) *backupEndRequired = true; } if (ferror(lfp) || FreeFile(lfp)) ereport(FATAL, (errcode_for_file_access(), --- 10159,10180 ---- (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE))); /* ! * BACKUP METHOD and SYSTEM STATUS lines are new in 9.2. We can't ! * restore from an older backup anyway, but since the information on it ! * is not strictly required, don't error out if it's missing for some reason. */ ! 
if (fscanf(lfp, "BACKUP METHOD: %19s\n", backuptype) == 1) { if (strcmp(backuptype, "streamed") == 0) *backupEndRequired = true; } + if (fscanf(lfp, "SYSTEM STATUS: %19s\n", systemstatus) == 1) + { + if (strcmp(systemstatus, "recovery") == 0) + *backupDuringRecovery = true; + } + if (ferror(lfp) || FreeFile(lfp)) ereport(FATAL, (errcode_for_file_access(), *** a/src/backend/postmaster/postmaster.c --- b/src/backend/postmaster/postmaster.c *************** *** 289,294 **** typedef enum --- 289,296 ---- static PMState pmState = PM_INIT; static bool ReachedNormalRunning = false; /* T if we've reached PM_RUN */ + static bool OnlineBackupAllowed = false; /* T if we've reached PM_RUN or + * PM_HOT_STANDBY */ bool ClientAuthInProgress = false; /* T during new-client * authentication */ *************** *** 2119,2136 **** pmdie(SIGNAL_ARGS) /* and the walwriter too */ if (WalWriterPID != 0) signal_child(WalWriterPID, SIGTERM); ! ! /* ! * If we're in recovery, we can't kill the startup process ! * right away, because at present doing so does not release ! * its locks. We might want to change this in a future ! * release. For the time being, the PM_WAIT_READONLY state ! * indicates that we're waiting for the regular (read only) ! * backends to die off; once they do, we'll kill the startup ! * and walreceiver processes. ! */ ! pmState = (pmState == PM_RUN) ? ! PM_WAIT_BACKUP : PM_WAIT_READONLY; } /* --- 2121,2127 ---- /* and the walwriter too */ if (WalWriterPID != 0) signal_child(WalWriterPID, SIGTERM); ! pmState = PM_WAIT_BACKUP; } /* *************** *** 2313,2318 **** reaper(SIGNAL_ARGS) --- 2304,2310 ---- */ FatalError = false; ReachedNormalRunning = true; + OnlineBackupAllowed = true; pmState = PM_RUN; /* *************** *** 2854,2862 **** PostmasterStateMachine(void) { /* * PM_WAIT_BACKUP state ends when online backup mode is not active. */ if (!BackupInProgress()) ! 
pmState = PM_WAIT_BACKENDS; } if (pmState == PM_WAIT_READONLY) --- 2846,2862 ---- { /* * PM_WAIT_BACKUP state ends when online backup mode is not active. + * + * If we're in recovery, we can't kill the startup process right away, + * because at present doing so does not release its locks. We might + * want to change this in a future release. For the time being, + * the PM_WAIT_READONLY state indicates that we're waiting for + * the regular (read only) backends to die off; once they do, + * we'll kill the startup and walreceiver processes. */ if (!BackupInProgress()) ! pmState = ReachedNormalRunning ? ! PM_WAIT_BACKENDS : PM_WAIT_READONLY; } if (pmState == PM_WAIT_READONLY) *************** *** 3025,3037 **** PostmasterStateMachine(void) /* * Terminate backup mode to avoid recovery after a clean fast * shutdown. Since a backup can only be taken during normal ! * running (and not, for example, while running under Hot Standby) ! * it only makes sense to do this if we reached normal running. If ! * we're still in recovery, the backup file is one we're ! * recovering *from*, and we must keep it around so that recovery ! * restarts from the right place. */ ! if (ReachedNormalRunning) CancelBackup(); /* Normal exit from the postmaster is here */ --- 3025,3037 ---- /* * Terminate backup mode to avoid recovery after a clean fast * shutdown. Since a backup can only be taken during normal ! * running and hot standby, it only makes sense to do this ! * if we reached normal running or hot standby. If we have not ! * reached a consistent recovery state yet, the backup file is ! * one we're recovering *from*, and we must keep it around ! * so that recovery restarts from the right place. */ ! 
if (OnlineBackupAllowed) CancelBackup(); /* Normal exit from the postmaster is here */ *************** *** 4188,4193 **** sigusr1_handler(SIGNAL_ARGS) --- 4188,4194 ---- ereport(LOG, (errmsg("database system is ready to accept read only connections"))); + OnlineBackupAllowed = true; pmState = PM_HOT_STANDBY; } *** a/src/backend/postmaster/walwriter.c --- b/src/backend/postmaster/walwriter.c *************** *** 216,221 **** WalWriterMain(void) --- 216,228 ---- PG_SETMASK(&UnBlockSig); /* + * There is a race condition: full_page_writes might have been changed + * since the startup process updated it in shared memory. To handle + * this case, we always update the shared full_page_writes here. + */ + UpdateFullPageWrites(); + + /* * Loop forever */ for (;;) *************** *** 236,241 **** WalWriterMain(void) --- 243,254 ---- { got_SIGHUP = false; ProcessConfigFile(PGC_SIGHUP); + + /* + * If full_page_writes has been changed by SIGHUP, we update it + * in shared memory and write an XLOG_FPW_CHANGE record. + */ + UpdateFullPageWrites(); } if (shutdown_requested) { *** a/src/backend/utils/misc/guc.c --- b/src/backend/utils/misc/guc.c *************** *** 130,136 **** extern int CommitSiblings; extern char *default_tablespace; extern char *temp_tablespaces; extern bool synchronize_seqscans; - extern bool fullPageWrites; extern int ssl_renegotiation_limit; extern char *SSLCipherSuites; --- 130,135 ---- *** a/src/bin/pg_controldata/pg_controldata.c --- b/src/bin/pg_controldata/pg_controldata.c *************** *** 209,214 **** main(int argc, char *argv[]) --- 209,216 ---- ControlFile.checkPointCopy.redo.xrecoff); printf(_("Latest checkpoint's TimeLineID: %u\n"), ControlFile.checkPointCopy.ThisTimeLineID); + printf(_("Latest checkpoint's full_page_writes: %s\n"), + ControlFile.checkPointCopy.fullPageWrites ? 
_("yes") : _("no")); printf(_("Latest checkpoint's NextXID: %u/%u\n"), ControlFile.checkPointCopy.nextXidEpoch, ControlFile.checkPointCopy.nextXid); *************** *** 232,237 **** main(int argc, char *argv[]) --- 234,242 ---- printf(_("Backup start location: %X/%X\n"), ControlFile.backupStartPoint.xlogid, ControlFile.backupStartPoint.xrecoff); + printf(_("Backup end location: %X/%X\n"), + ControlFile.backupEndPoint.xlogid, + ControlFile.backupEndPoint.xrecoff); printf(_("End-of-backup record required: %s\n"), ControlFile.backupEndRequired ? _("yes") : _("no")); printf(_("Current wal_level setting: %s\n"), *** a/src/bin/pg_ctl/pg_ctl.c --- b/src/bin/pg_ctl/pg_ctl.c *************** *** 885,899 **** do_stop(void) /* * If backup_label exists, an online backup is running. Warn the user * that smart shutdown will wait for it to finish. However, if ! * recovery.conf is also present, we're recovering from an online ! * backup instead of performing one. */ if (shutdown_mode == SMART_MODE && ! stat(backup_file, &statbuf) == 0 && ! stat(recovery_file, &statbuf) != 0) { ! print_msg(_("WARNING: online backup mode is active\n" ! "Shutdown will not complete until pg_stop_backup() is called.\n\n")); } print_msg(_("waiting for server to shut down...")); --- 885,902 ---- /* * If backup_label exists, an online backup is running. Warn the user * that smart shutdown will wait for it to finish. However, if ! * recovery.conf is also present and new connection has not been ! * allowed yet, an online backup mode must not be active. */ if (shutdown_mode == SMART_MODE && ! stat(backup_file, &statbuf) == 0) { ! if (stat(recovery_file, &statbuf) != 0) ! print_msg(_("WARNING: online backup mode is active\n" ! "Shutdown will not complete until pg_stop_backup() is called.\n\n")); ! else ! print_msg(_("WARNING: online backup mode is active if you can connect as a superuser to server\n" ! 
"If so, shutdown will not complete until pg_stop_backup() is called.\n\n")); } print_msg(_("waiting for server to shut down...")); *************** *** 973,987 **** do_restart(void) /* * If backup_label exists, an online backup is running. Warn the user * that smart shutdown will wait for it to finish. However, if ! * recovery.conf is also present, we're recovering from an online ! * backup instead of performing one. */ if (shutdown_mode == SMART_MODE && ! stat(backup_file, &statbuf) == 0 && ! stat(recovery_file, &statbuf) != 0) { ! print_msg(_("WARNING: online backup mode is active\n" ! "Shutdown will not complete until pg_stop_backup() is called.\n\n")); } print_msg(_("waiting for server to shut down...")); --- 976,993 ---- /* * If backup_label exists, an online backup is running. Warn the user * that smart shutdown will wait for it to finish. However, if ! * recovery.conf is also present and new connection has not been ! * allowed yet, an online backup mode must not be active. */ if (shutdown_mode == SMART_MODE && ! stat(backup_file, &statbuf) == 0) { ! if (stat(recovery_file, &statbuf) != 0) ! print_msg(_("WARNING: online backup mode is active\n" ! "Shutdown will not complete until pg_stop_backup() is called.\n\n")); ! else ! print_msg(_("WARNING: online backup mode is active if you can connect as a superuser to server\n" ! 
"If so, shutdown will not complete until pg_stop_backup() is called.\n\n")); } print_msg(_("waiting for server to shut down...")); *** a/src/bin/pg_resetxlog/pg_resetxlog.c --- b/src/bin/pg_resetxlog/pg_resetxlog.c *************** *** 489,494 **** GuessControlValues(void) --- 489,495 ---- ControlFile.checkPointCopy.redo.xlogid = 0; ControlFile.checkPointCopy.redo.xrecoff = SizeOfXLogLongPHD; ControlFile.checkPointCopy.ThisTimeLineID = 1; + ControlFile.checkPointCopy.fullPageWrites = false; ControlFile.checkPointCopy.nextXidEpoch = 0; ControlFile.checkPointCopy.nextXid = FirstNormalTransactionId; ControlFile.checkPointCopy.nextOid = FirstBootstrapObjectId; *************** *** 503,509 **** GuessControlValues(void) ControlFile.time = (pg_time_t) time(NULL); ControlFile.checkPoint = ControlFile.checkPointCopy.redo; ! /* minRecoveryPoint and backupStartPoint can be left zero */ ControlFile.wal_level = WAL_LEVEL_MINIMAL; ControlFile.MaxConnections = 100; --- 504,510 ---- ControlFile.time = (pg_time_t) time(NULL); ControlFile.checkPoint = ControlFile.checkPointCopy.redo; ! /* minRecoveryPoint, backupStartPoint and backupEndPoint can be left zero */ ControlFile.wal_level = WAL_LEVEL_MINIMAL; ControlFile.MaxConnections = 100; *************** *** 569,574 **** PrintControlValues(bool guessed) --- 570,577 ---- sysident_str); printf(_("Latest checkpoint's TimeLineID: %u\n"), ControlFile.checkPointCopy.ThisTimeLineID); + printf(_("Latest checkpoint's full_page_writes: %s\n"), + ControlFile.checkPointCopy.fullPageWrites ? 
_("yes") : _("no")); printf(_("Latest checkpoint's NextXID: %u/%u\n"), ControlFile.checkPointCopy.nextXidEpoch, ControlFile.checkPointCopy.nextXid); *************** *** 637,642 **** RewriteControlFile(void) --- 640,647 ---- ControlFile.minRecoveryPoint.xrecoff = 0; ControlFile.backupStartPoint.xlogid = 0; ControlFile.backupStartPoint.xrecoff = 0; + ControlFile.backupEndPoint.xlogid = 0; + ControlFile.backupEndPoint.xrecoff = 0; ControlFile.backupEndRequired = false; /* *** a/src/include/access/xlog.h --- b/src/include/access/xlog.h *************** *** 197,202 **** extern int XLogArchiveTimeout; --- 197,203 ---- extern bool XLogArchiveMode; extern char *XLogArchiveCommand; extern bool EnableHotStandby; + extern bool fullPageWrites; extern bool log_checkpoints; /* WAL levels */ *************** *** 306,311 **** extern void CreateCheckPoint(int flags); --- 307,313 ---- extern bool CreateRestartPoint(int flags); extern void XLogPutNextOid(Oid nextOid); extern XLogRecPtr XLogRestorePoint(const char *rpName); + extern void UpdateFullPageWrites(void); extern XLogRecPtr GetRedoRecPtr(void); extern XLogRecPtr GetInsertRecPtr(void); extern XLogRecPtr GetFlushRecPtr(void); *** a/src/include/catalog/pg_control.h --- b/src/include/catalog/pg_control.h *************** *** 21,27 **** /* Version identifier for this pg_control format */ ! #define PG_CONTROL_VERSION 921 /* * Body of CheckPoint XLOG records. This is declared here because we keep --- 21,27 ---- /* Version identifier for this pg_control format */ ! #define PG_CONTROL_VERSION 922 /* * Body of CheckPoint XLOG records. This is declared here because we keep *************** *** 33,38 **** typedef struct CheckPoint --- 33,39 ---- XLogRecPtr redo; /* next RecPtr available when we began to * create CheckPoint (i.e. 
REDO start point) */ TimeLineID ThisTimeLineID; /* current TLI */ + bool fullPageWrites; /* current full_page_writes */ uint32 nextXidEpoch; /* higher-order bits of nextXid */ TransactionId nextXid; /* next free XID */ Oid nextOid; /* next free OID */ *************** *** 60,65 **** typedef struct CheckPoint --- 61,67 ---- #define XLOG_BACKUP_END 0x50 #define XLOG_PARAMETER_CHANGE 0x60 #define XLOG_RESTORE_POINT 0x70 + #define XLOG_FPW_CHANGE 0x80 /* *************** *** 138,143 **** typedef struct ControlFileData --- 140,152 ---- * record, to make sure the end-of-backup record corresponds the base * backup we're recovering from. * + * backupEndPoint is the backup end location, if we are recovering from + * an online backup which was taken from the server in recovery mode + * and haven't reached the end of backup yet. It is initialized to + * the minimum recovery point in pg_control which was backed up just + * before pg_stop_backup(). It is reset to zero when the end of backup + * is reached, and we mustn't start up before that. + * * If backupEndRequired is true, we know for sure that we're restoring * from a backup, and must see a backup-end record before we can safely * start up. If it's false, but backupStartPoint is set, a backup_label *************** *** 146,151 **** typedef struct ControlFileData --- 155,161 ---- */ XLogRecPtr minRecoveryPoint; XLogRecPtr backupStartPoint; + XLogRecPtr backupEndPoint; bool backupEndRequired; /*