Re: Fix checkpoint skip logic on idle systems by tracking LSN progress

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема Re: Fix checkpoint skip logic on idle systems by tracking LSN progress
Дата
Msg-id CAB7nPqSCLYtEf7fffVYjSgqpqqkHJaeRd_9qGYa1m-872TddGg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Fix checkpoint skip logic on idle systems by tracking LSN progress  (David Steele <david@pgmasters.net>)
Ответы Re: Fix checkpoint skip logic on idle systems by tracking LSN progress  (David Steele <david@pgmasters.net>)
Список pgsql-hackers
On Thu, Sep 29, 2016 at 7:45 AM, David Steele <david@pgmasters.net> wrote:
> OK, I've done functional testing and this patch seems to work as
> specified (including the caveat noted above).  Some comments:

Thanks!

> * [PATCH 1/3] hs-checkpoints-v12-1
>
> +++ b/src/backend/access/transam/xlog.c
> +        * Taking a lock is as well necessary to prevent potential torn reads
> +        * on some platforms.
>
> How about, "Taking a lock is also necessary..."
>
> +               LWLockAcquire(&WALInsertLocks[i].l.lock, LW_EXCLUSIVE);
>
> That's a lot of exclusive locks and that would seem to have performance
> implications.  It seems to me this is going to be a hard one to
> benchmark because the regression (if any) would only be seen under heavy
> load on a very large system.
>
> In general I agree with the other comments that this could end up being
> a problem.  On the other hand, since the additional locks are only taken
> at checkpoint or archive_timeout it may not be that big a deal.

Yes, I did some tests on my laptop a couple of months back, that has 4
cores. After reducing NUM_XLOGINSERT_LOCKS from 8 to 4 to increase
contention and performing a bunch of INSERT using 4 clients on 4
different relations I could not catch a difference.. Autovacuum was
disabled to eliminate any noise. I tried checkpoint_segments at 30s to
see its effects, as well as larger values to see the impact with the
standby snapshot taken by the bgwriter. Other thoughts are welcome.

> +++ b/src/backend/access/transam/xloginsert.c * Should this record
> include the replication origin if one is set up?
>
> Outdated comment from XLogIncludeOrigin().

Fixed. I added as well some comments on top of XLogSetFlags to mention
what are the flags that can be used. I didn't think that it was
necessary to add an assertion here. Also, I noticed that the comment
on top of XLogInsertRecord mentioned those flags but was incorrect.

> * [PATCH 2/3] hs-checkpoints-v12-2
>
> +++ b/src/backend/postmaster/checkpointer.c
> +                       /* OK, it's time to switch */
> +                       elog(LOG, "Request XLog Switch");
>
> LOG level seems a bit much here, perhaps DEBUG1?

That's from Horiguchi-san's patch, and those would be definitely
better as DEBUG1 by looking at it. Now and in order to keep things
simple I think that we had better discard this patch for now. I was
planning to come back to this thing anyway once we are done with the
first problem.

> * [PATCH 3/3] hs-checkpoints-v12-3
>
> +                * switch segment only when any substantial progress have made from
> +                * reasons will cause last_xlog_switch_lsn stay behind but it doesn't
>
> How about, "Switch segment only when substantial progress has been made
> after the last segment was switched by a timeout.  Segment switching for
> other reasons..."
>
> +++ b/src/backend/access/transam/xlog.c
> +               elog(LOG, "Not a forced or shutdown checkpoint: progress_lsn %X/%X,
> ckpt %X/%X",
> +                       elog(LOG, "Checkpoint is skipped");
> +               elog(LOG, "snapshot taken by checkpoint %X/%X",
>
> Same for the above, seems like it would just be noise for most users.
>
> +++ b/src/backend/postmaster/bgwriter.c
> +                               elog(LOG, "snapshot taken by bgwriter %X/%X",
>
> Ditto.

The original patch was developed to ease debugging, and I chose LOG to
not be polluted with a bunch of DEBUG1 entries :)

Now we can do something, as follows:
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8450,6 +8450,8 @@ CreateCheckPoint(int flags)
    {
        if (progress_lsn == ControlFile->checkPoint)
        {
+           if (log_checkpoints)
+               ereport(LOG, "checkpoint skipped");
            WALInsertLockRelease();
            LWLockRelease(CheckpointLock);
            END_CRIT_SECTION();
Letting users know that the checkpoint has been skipped sounds like a
good idea. Perhaps that's better if squashed with the first patch.

> I don't see any unintended consequences in this patch but it doesn't
> mean there aren't any.  I'm definitely concerned by the exclusive locks
> but it may turn out they do not actually represent a bottleneck.

That's a hard to see a difference. Perhaps I didn't try hard enough..

Well for now attached are two patches, that could just be squashed into one.
--
Michael

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Eisentraut
Дата:
Сообщение: Re: Set log_line_prefix and application name in test drivers
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Set log_line_prefix and application name in test drivers