Discussion: BUG #15402: Hot standby server with archive_mode=on keeps initial WAL segments
BUG #15402: Hot standby server with archive_mode=on keeps initial WAL segments
From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      15402
Logged by:          TAKATSUKA Haruka
Email address:      harukat@sraoss.co.jp
PostgreSQL version: 11beta4
Operating system:   Linux (CentOS 6)
Description:

Hello PostgreSQL hackers,

A hot standby server with "archive_mode = on" keeps initial WAL segment
files that were copied by pg_basebackup. It shows the following status.
000000010000000000000042 will be kept forever in this case.

$ ls data_primary/pg_wal/
000000010000000000000042.00000028.backup  00000001000000000000004D
000000010000000000000048                  00000001000000000000004E
000000010000000000000049                  00000001000000000000004F
00000001000000000000004A                  000000010000000000000050
00000001000000000000004B                  000000010000000000000051
00000001000000000000004C                  archive_status

$ ls data_standby/pg_wal/
000000010000000000000042  00000001000000000000004B  00000001000000000000004F
000000010000000000000048  00000001000000000000004C  archive_status
000000010000000000000049  00000001000000000000004D
00000001000000000000004A  00000001000000000000004E

$ ls data_standby/pg_wal/archive_status/
000000010000000000000042.ready  00000001000000000000004A.done
000000010000000000000048.done   00000001000000000000004B.done
000000010000000000000049.done   00000001000000000000004C.done

Though I understand that manually renaming the .ready file to .done can
clean it up, I would like to fix the server code as in the following patch.
I'd appreciate it if you would consider that.

- - - - -
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5abaeb0..191ba60 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3963,7 +3963,9 @@ RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr RedoRecPtr, XLogRecPtr endptr)
              */
             if (strcmp(xlde->d_name + 8, lastoff + 8) <= 0)
             {
-                if (XLogArchiveCheckDone(xlde->d_name))
+                if (XLogArchiveCheckDone(xlde->d_name) ||
+                    (XLogArchiveMode != ARCHIVE_MODE_ALWAYS &&
+                     XLogCtl->SharedRecoveryInProgress))
                 {
                     /* Update the last removed location in shared memory first */
                     UpdateLastRemovedPtr(xlde->d_name);
- - - - - -

Thanks,
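The manual workaround mentioned in the report (renaming the .ready file to .done) could be scripted roughly as below. This is only a sketch: the helper names list_ready_segments and mark_segment_done are hypothetical, not PostgreSQL tools, and it assumes the status files live in data_standby/pg_wal/archive_status as in the listing above. It is meant for the case described here, a standby with archive_mode = on where the segment will never be archived.

```shell
# Hypothetical helpers (not part of PostgreSQL) for inspecting and clearing
# stale .ready markers in an archive_status directory.

# Print the name of every WAL segment that has a .ready status file.
#   $1: path to the archive_status directory
list_ready_segments() {
    status_dir=$1
    for f in "$status_dir"/*.ready; do
        # If nothing matches, the glob is left as a literal string; skip it.
        [ -e "$f" ] || continue
        basename "$f" .ready
    done
}

# Rename <segment>.ready to <segment>.done.
#   $1: path to the archive_status directory
#   $2: WAL segment name
mark_segment_done() {
    status_dir=$1
    segment=$2
    mv "$status_dir/$segment.ready" "$status_dir/$segment.done"
}

# Example, using the paths from the report:
#   list_ready_segments data_standby/pg_wal/archive_status
#   mark_segment_done data_standby/pg_wal/archive_status 000000010000000000000042
```

Once the marker is .done, the report's expectation is that the segment gets cleaned up like the newer ones.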
Re: BUG #15402: Hot standby server with archive_mode=on keeps initial WAL segments
From
Michael Paquier
Date:
On Wed, Sep 26, 2018 at 03:26:50AM +0000, PG Bug reporting form wrote:
> A hot standby server with "archive_mode = on" keeps initial WAL segment
> files that were copied by pg_basebackup. It shows the following status.
> 000000010000000000000042 will be kept forever in this case.

How did you find yourself in this situation? Did you take a base backup
from a primary which had .ready files in it, and those got copied to the
standby's data folder? We may want to study the possibility of
filtering things when taking a base backup instead.

> Though I understand that manually renaming the .ready file to .done can
> clean it up, I would like to fix the server code as in the following patch.
> I'd appreciate it if you would consider that.

Your patch looks incorrect to me to begin with... What if archive_mode
is switched back and forth between "on" and "always", and some of the
past segments that should have been archived are not?
--
Michael
Attachments
Re: BUG #15402: Hot standby server with archive_mode=on keeps initial WAL segments
From
TAKATSUKA Haruka
Date:
Hello Michael,

First, here are the steps to reproduce.

On Wed, 26 Sep 2018 12:52:05 +0900
Michael Paquier <michael@paquier.xyz> wrote:

> On Wed, Sep 26, 2018 at 03:26:50AM +0000, PG Bug reporting form wrote:
> > A hot standby server with "archive_mode = on" keeps initial WAL segment
> > files that were copied by pg_basebackup. It shows the following status.
> > 000000010000000000000042 will be kept forever in this case.
>
> How did you find yourself in this situation? Did you take a base backup
> from a primary which had .ready files in it, and those got copied to the
> standby's data folder? We may want to study the possibility of
> filtering things when taking a base backup instead.

I can reproduce it with these steps.

$ pg_basebackup -D data_standby -R -Xs
$ pg_ctl start -D data_standby -o '-p 5433'
$ pgbench -i ; pgbench -i ; pgbench -i
$ psql -c 'checkpoint' ; psql -p 5433 -c 'checkpoint'
$ ls data_standby/pg_wal ; ls data_standby/pg_wal/archive_status/

postgresql.conf:
archive_mode = on
archive_command = 'cp %p /tmp/arc/%f'
max_wal_size = 160MB

There are no .ready files right after pg_basebackup. The .ready file is
generated after replication starts.

Thanks,
Haruka Takatsuka

> > Though I understand that manually renaming the .ready file to .done can
> > clean it up, I would like to fix the server code as in the following patch.
> > I'd appreciate it if you would consider that.
>
> Your patch looks incorrect to me to begin with... What if archive_mode
> is switched back and forth between "on" and "always", and some of the
> past segments that should have been archived are not?
> --
> Michael
Re: BUG #15402: Hot standby server with archive_mode=on keeps initial WAL segments
From
TAKATSUKA Haruka
Date:
Hello, Michael

> > > Though I understand that manually renaming the .ready file to .done can
> > > clean it up, I would like to fix the server code as in the following patch.
> > > I'd appreciate it if you would consider that.
> >
> > Your patch looks incorrect to me to begin with... What if archive_mode
> > is switched back and forth between "on" and "always", and some of the
> > past segments that should have been archived are not?

I understand this scenario.

(1) archive_mode=always and archive_command fails
(2) .ready files build up
(3) switch to archive_mode=on
(4) WAL segments with .ready status are swept
(5) switch to archive_mode=always; some WAL segments are lost

I think it is natural that some segments are not archived when you
switch archive_mode back and forth.

I also think a fix that prevents generating the .ready file on a hot
standby server in this case would be better. I don't have a concrete
idea of how to do so now.

Thanks,
Haruka Takatsuka
Re: BUG #15402: Hot standby server with archive_mode=on keeps initial WAL segments
From
Michael Paquier
Date:
On Wed, Sep 26, 2018 at 02:12:36PM +0900, TAKATSUKA Haruka wrote:
> I think it is natural that some segments are not archived when you
> switch archive_mode back and forth.
>
> I also think a fix that prevents generating the .ready file on a hot
> standby server in this case would be better. I don't have a concrete
> idea of how to do so now.

I looked at this problem, and I completely agree. From what I can see,
the restart point run on the standby creates a .ready file for the
oldest segment because there was no .done file present for it, so the
checkpointer thinks that it should mark the file with .ready, and then
makes it ready for archiving, which is never going to happen with
archive_mode = on. All the newer segments are already marked with
.done, so they are getting recycled correctly.

Your patch is not completely correct though, as the origin of the
problem comes from XLogArchiveCheckDone(), which should be made more
solid depending on the archive mode used. The solution attached
actually fixes a second bug, which is less annoying by the way, as past
backup history files may stay behind on standbys. I looked at all the
code paths of XLogArchivingActive() and that's the only problem I can
see.

In guc.c, SHOW archive_command would print as "(disabled)" on
standbys even if archive_mode = on, but we have lived with this behavior
for ages. The user can understand that archiving is enabled only on a
primary this way when looking at a standby, so that should not be
changed.

Takatsuka-san, what do you think?
--
Michael
Attachments
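The reasoning in the message above can be read as a small decision table: whether a segment's archive status matters depends on both archive_mode and whether the server is in recovery. The sketch below is a simplified model of that idea, not the committed patch and not PostgreSQL source code. The function names are hypothetical, and it assumes the documented meaning of archive_mode: "on" archives on a primary only, while "always" also archives on a standby.

```shell
# Simplified model (NOT PostgreSQL source code; names are hypothetical).

# Does archiving apply to this server at all?
#   $1: archive_mode (off | on | always)
#   $2: in recovery? (true for a standby, false for a primary)
archiving_applies() {
    case "$1:$2" in
        on:false|always:false|always:true) return 0 ;;
        *) return 1 ;;
    esac
}

# May a WAL segment be removed or recycled?
#   $1: archive_mode, $2: in recovery?
#   $3: status file of the segment (done | ready | none)
segment_removable() {
    # If archiving does not apply, status files should not block removal.
    archiving_applies "$1" "$2" || return 0
    # Otherwise only segments already archived (.done) may go away.
    [ "$3" = "done" ]
}

# Standby with archive_mode = on: a stale .ready marker does not block removal.
if segment_removable on true ready; then echo "removable"; fi
```

In this model the reported case (standby, archive_mode = on, segment marked .ready) is removable, while the same segment on a primary, or on a standby with archive_mode = always, is kept until it has been archived.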
Re: BUG #15402: Hot standby server with archive_mode=on keeps initial WAL segments
From
Michael Paquier
Date:
On Thu, Sep 27, 2018 at 01:44:30PM +0900, Michael Paquier wrote:
> I looked at this problem, and I completely agree. From what I can see,
> the restart point run on the standby creates a .ready file for the
> oldest segment because there was no .done file present for it, so the
> checkpointer thinks that it should mark the file with .ready, and then
> makes it ready for archiving, which is never going to happen with
> archive_mode = on. All the newer segments are already marked with
> .done, so they are getting recycled correctly.

I have spent a couple of hours on this problem, checked the behavior on
all branches, and committed the fix. Thanks for the report!
--
Michael
Attachments
Re: BUG #15402: Hot standby server with archive_mode=on keeps initial WAL segments
From
TAKATSUKA Haruka
Date:
Hi Michael,

On Thu, 27 Sep 2018 13:44:30 +0900
Michael Paquier <michael@paquier.xyz> wrote:

> On Wed, Sep 26, 2018 at 02:12:36PM +0900, TAKATSUKA Haruka wrote:
> > I think it is natural that some segments are not archived when you
> > switch archive_mode back and forth.
> >
> > I also think a fix that prevents generating the .ready file on a hot
> > standby server in this case would be better. I don't have a concrete
> > idea of how to do so now.
>
> I looked at this problem, and I completely agree. From what I can see,
> the restart point run on the standby creates a .ready file for the
> oldest segment because there was no .done file present for it, so the
> checkpointer thinks that it should mark the file with .ready, and then
> makes it ready for archiving, which is never going to happen with
> archive_mode = on. All the newer segments are already marked with
> .done, so they are getting recycled correctly.
>
> Your patch is not completely correct though, as the origin of the
> problem comes from XLogArchiveCheckDone(), which should be made more
> solid depending on the archive mode used. The solution attached
> actually fixes a second bug, which is less annoying by the way, as past
> backup history files may stay behind on standbys. I looked at all the
> code paths of XLogArchivingActive() and that's the only problem I can
> see.

I fixed only RemoveOldXlogFiles() in my patch merely because I wasn't
sure that keeping an old history file is buggy behavior.

> In guc.c, SHOW archive_command would print as "(disabled)" on
> standbys even if archive_mode = on, but we have lived with this behavior
> for ages. The user can understand that archiving is enabled only on a
> primary this way when looking at a standby, so that should not be
> changed.
>
> Takatsuka-san, what do you think?

I think it is not necessary to change it, though the behavior is
somewhat easy to misunderstand.

Thanks,
Haruka Takatsuka
Re: BUG #15402: Hot standby server with archive_mode=on keeps initial WAL segments
From
Michael Paquier
Date:
On Fri, Sep 28, 2018 at 12:20:03PM +0900, TAKATSUKA Haruka wrote:
> I fixed only RemoveOldXlogFiles() in my patch merely because I wasn't
> sure that keeping an old history file is buggy behavior.

The only caller of CleanupBackupHistory() is clear about wanting past
backup history files to go away at the end of a backup, or to be marked
as .ready if need be.
--
Michael
Attachments
Re: BUG #15402: Hot standby server with archive_mode=on keeps initial WAL segments
From
TAKATSUKA Haruka
Date:
Thanks for your PostgreSQL work!

Haruka Takatsuka

On Fri, 28 Sep 2018 12:02:08 +0900
Michael Paquier <michael@paquier.xyz> wrote:

> On Thu, Sep 27, 2018 at 01:44:30PM +0900, Michael Paquier wrote:
> > I looked at this problem, and I completely agree. From what I can see,
> > the restart point run on the standby creates a .ready file for the
> > oldest segment because there was no .done file present for it, so the
> > checkpointer thinks that it should mark the file with .ready, and then
> > makes it ready for archiving, which is never going to happen with
> > archive_mode = on. All the newer segments are already marked with
> > .done, so they are getting recycled correctly.
>
> I have spent a couple of hours on this problem, checked the behavior on
> all branches, and committed the fix. Thanks for the report!
> --
> Michael