Re: Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby
От | Michael Paquier |
---|---|
Тема | Re: Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby |
Дата | |
Msg-id | CAB7nPqTTGgYckpbqv8Hip+fypD99tSg2WnObhAJZMDh=GUBoNg@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby (Amit Kapila <amit.kapila16@gmail.com>) |
Список | pgsql-hackers |
On Tue, Feb 2, 2016 at 1:42 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Sat, Jan 30, 2016 at 7:38 PM, Michael Paquier <michael.paquier@gmail.com> > wrote: >> >> On Fri, Jan 29, 2016 at 9:13 PM, Michael Paquier wrote: >> > Well, to put it short, I am just trying to find a way to make the >> > backend skip unnecessary checkpoints on an idle system, which results >> > in the following WAL pattern if system is completely idle: >> > CHECKPOINT_ONLINE >> > RUNNING_XACTS >> > RUNNING_XACTS >> > [etc..] >> > >> > The thing is that I am lost with the meaning of this condition to >> > decide if a checkpoint should be skipped or not: >> > if (prevPtr == ControlFile->checkPointCopy.redo && >> > prevPtr / XLOG_SEG_SIZE == curInsert / XLOG_SEG_SIZE) >> > { >> > WALInsertLockRelease(); >> > LWLockRelease(CheckpointLock); >> > return; >> > } >> > As at least one standby snapshot is logged before the checkpoint >> > record, the redo position is never going to match the previous insert >> > LSN, so checkpoints will never be skipped if wal_level >= hot_standby. >> > Skipping such unnecessary checkpoints is what you would like to >> > address, no? Because that's what I would like to do here first. And >> > once we got that right, we could think about addressing the case where >> > WAL segments are forcibly archived for idle systems. >> >> I have put a bit more of brain power into that, and finished with the >> patch attached. A new field called chkpProgressLSN is attached to >> XLogCtl, which is to the current insert position of the checkpoint >> when wal_level <= archive, or the LSN position of the standby snapshot >> taken after a checkpoint. The bgwriter code is modified as well so as >> it uses this progress LSN and compares it with the current insert LSN >> to determine if a standby snapshot should be logged or not. On an >> instance of Postgres completely idle, no useless checkpoints, and no >> useless standby snapshots are generated anymore. >> > > > @@ -8239,7 +8262,7 @@ CreateCheckPoint(int flags) > if ((flags & (CHECKPOINT_IS_SHUTDOWN | > CHECKPOINT_END_OF_RECOVERY | > CHECKPOINT_FORCE)) == 0) > { > - if > (prevPtr == ControlFile->checkPointCopy.redo && > + if (GetProgressRecPtr() == prevPtr && > > prevPtr / XLOG_SEG_SIZE == curInsert / XLOG_SEG_SIZE) > { > > I think such a check won't consider all the WAL-activity happened > during checkpoint (after checkpoint start, till it finishes) which was > not the intention of this code without your patch. Yes, that's true, in v2 this progress LSN can be updated in two ways: - At the position where checkpoint has begun - At the position where standby snapshot was taken after a checkpoint has run. Honestly, even if we log the snapshot for example before CheckpointGuts(), that's not going to make it... The main issue here is that there will be a snapshot after this checkpoint, so the existing condition is a rather broken concept already when wal_level >= hot_standby because the redo pointer is never going to match the previous LSN pointer. Another idea would be to ensure that the snapshot is logged just after the redo pointer, this would be reliable. > I think both this and previous patch (hs-checkpoints-v1) approach > won't fix the issue in all kind of scenario's. > > Let me try to explain what I think should fix this issue based on > discussion above, suggestions by Andres and some of my own > thoughts: > > Have a new variable in WALInsertLock that would indicate the last > insertion position (last_insert_pos) we write to after holding that lock. > Ensure that we don't update last_insert_pos while logging standby > snapshot (simple way is to pass a flag in XLogInsert or distinguish > it based on type of record (XLOG_RUNNING_XACTS) or if you can > think of a more smarter way). Simplest way is in XLogInsertRecord, the record data is available there, there is for example some logic related to segment switches. > Now both during checkpoint and > in bgwriter, to find out whether there is any activity since last > time, we need to check all the WALInsertLocks for latest insert position > (by referring last_insert_pos) and compare it with the latest position > we have noted during last such check and decide whether to proceed > or not based on comparison result. If you think it is not adequate to > store last_insert_pos in WALInsertLock, then we can think of having > it in PGPROC. So the progress check is used when deciding if a checkpoint should be skipped or not, and when deciding if a standby snapshot should be taken by the bgwriter? When bgwriter logs a snapshot, it will update as well the last LSN found (which would be a single variable in for example XLogCtlData, updated from the data taken from the WAL insert slots). But there is a problem here: if there is no activity since the last bgwriter snapshot, the next checkpoint would be skipped as well. It seems to me that this is not acceptable, a checkpoint generation would be decided based on if there has been some data activity since the last snapshot taken, or to put it in other words, no checkpoints would happen if there has been no activity within 15s after the last standby snapshot has been logged by the bgwriter. I have hacked something according to those lines, please see attached (logs are just here for testing purposes). I may be missing something obvious regarding the tracking of the last progress position. Code speaks better than words here I think, so feel free to look at it. > Yet another idea that occurs to me this morning is that why not > have a variable in shared memory in XLogCtlInsert or XLogCtl > similar to CurrBytePos/PrevBytePos which will be updated on > each XLOGInsert apart from the XLOGInsert for standby snapshots > and use that in a patch somewhat close to what you have in > hs-checkpoints-v1. I have considered that as well, but I do not think it is a good idea to add more contention in ReserveXLogInsertLocation() while taking insertpos_lck lock. That's already really hot, and we surely don't want to make it hotter just to do checks on idle systems. If that's what you had in mind. Btw, this is basically what I did in the attached, no? Except that the current insert position is tracked differently. >From v2 I sent upthread: - if (prevPtr == ControlFile->checkPointCopy.redo && + if (GetProgressRecPtr() == prevPtr && prevPtr / XLOG_SEG_SIZE == curInsert / XLOG_SEG_SIZE) This is wrong actually. prevPtr should be replaced by the redo LSN. -- Michael
Вложения
В списке pgsql-hackers по дате отправления:
Следующее
От: Robert HaasДата:
Сообщение: Re: "using previous checkpoint record at" maybe not the greatest idea?