Re: Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby
Дата
Msg-id CAA4eK1J+heuBJ-QpaXzhzW7OcZoxvdePHAjP_EA_KA7cg7CmLA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby  (Michael Paquier <michael.paquier@gmail.com>)
Ответы Re: Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby  (Michael Paquier <michael.paquier@gmail.com>)
Список pgsql-hackers
On Wed, Feb 10, 2016 at 7:17 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
>
> On Tue, Feb 9, 2016 at 10:42 PM, Amit Kapila wrote:
> > On Tue, Feb 9, 2016 at 6:08 PM, Michael Paquier wrote:
> >> Well, the idea is to improve the system responsiveness. Imagine that
> >> the call to GetProgressRecPtr() is done within the exclusive lock
> >> portion, but that while scanning the WAL insert lock entries another
> >> backend is waiting to take a lock on them, and this backend is trying
> >> to insert the first record that updates the progress LSN since the
> >> last checkpoint. In this case the checkpoint would be skipped.
> >> However, imagine that such a record is able to get through it while
> >> scanning the progress values in the WAL insert locks, in which case a
> >> checkpoint would be generated.
> >
> > Such case was not covered without your patch and I don't see the
> > need of same especially at the cost of taking locks.
>
> In this code path that's quite cheap, and it clearly improves the
> decision-making when wal_level >= hs which is now rather broken (say
> not optimized much).
>
> >>  In any case, I think that we should try
> >> to get exclusive lock areas as small as possible if we have ways to do
> >> so.
> >>
> >
> > Sure, but not when you are already going to take lock for longer
> > duration.
>
> Why would an exclusive lock be taken for a longer duration in the
> checkpoint portion?

Consider below code:

/*

+ * Get progress before acquiring insert locks to shorten the locked

+ * section waiting ahead.

+ */

+ progress_lsn = GetProgressRecPtr();

+

+ /*

  * We must block concurrent insertions while examining insert state to

  * determine the checkpoint REDO pointer.

  */

  WALInsertLockAcquireExclusive();



In GetProgressRecPtr(), first it take WALInsertLocks one-by-one to
retrieve the latest value of progressAt and then again it will take
all the WALInsertLocks in WALInsertLockAcquireExclusive() and then
do some work.  Now I think it is okay to retrieve the latest of progressAt
after WALInsertLockAcquireExclusive() as you don't need to take the
locks again.

>
> > - last_snapshot_lsn != GetXLogInsertRecPtr())
> > +
> > GetLastCheckpointRecPtr() < GetProgressRecPtr())
> >
> > How the above check in patch suppose to work?
> > I think it could so happen once bgwriter logs snapshot, both checkpoint
> > and progressAt won't get updated any further and I think this check will
> > allow to log snapshots in such a case as well.
>
> The only purpose of this check is to do the following thing: if no WAL
> activity has happened since last checkpoint, there is no need to log a
> standby snapshot from the perspective of the bgwriter. In case the
> system is idle, we want to skip logging that and avoid unnecessary
> checkpoints because those records would have been generated. If the
> system is not idle, or to put it in other words there has been at
> least one record since the last checkpoint, we would log a standby
> snapshot, enforcing as well a checkpoint to happen the next time
> CreateCheckpoint() is gone through, and a standby snapshot is logged
> as well after the checkpoint contents are flushed. I am not sure I
> understand what you are getting at...
>

Let me try to say again, suppose ControlFile->checkPoint is at
100 and latest value of progressAt returned by GetProgressRecPtr
is 105, so the first time the above check happens, it will allow
to log standby snapshot which is okay, now assume there is no
activity, neither there is any checkpoint and again after
LOG_SNAPSHOT_INTERVAL_MS interval, when the above check
gets executed, it will pass and log the standby snapshot which is
*not* okay, because we don't want to log standby snapshot when
there is no activity.  Am I missing something?



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Kyotaro HORIGUCHI
Дата:
Сообщение: Re: Support for N synchronous standby servers - take 2
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: Speed up Clog Access by increasing CLOG buffers