Обсуждение: checkpoints after database start/immediate checkpoints

Поиск
Список
Период
Сортировка

checkpoints after database start/immediate checkpoints

От
Andres Freund
Дата:
Hi,

Right now it takes checkpoint_timeout till we start a checkpoint, and
checkpoint_timeout + checkpoint_timeout * checkpoint_completion_target
till we complete the first checkpoint after shutdown/forced checkpoints.

That means a) that such checkpoint will often be bigger/more heavyweight
than the following ones, not what you want after a restart/create
database/basebackup... b) reliable benchmarks measuring steady state
have to run for even longer.

There's some logic to that behaviour though: With a target of 0 it'd be
weird to start a checkpoint directly after startup, and even otherwise
there'll be less to write at the beginning.

As an example:
2016-02-02 01:32:58.053 CET 21361 LOG:  checkpoint complete: wrote 186041 buffers (71.0%); 1 transaction log file(s)
added,0 removed, 0 recycled; sort=0.071 s, write=9.504 s, sync=0.000 s, total=9.532 s; sync files=0, longest=0.000 s,
average=0.000s; distance=1460260 kB, estimate=1460260 kB
 
2016-02-02 01:32:58.053 CET 21361 LOG:  checkpoint starting: time
2016-02-02 01:33:08.061 CET 21361 LOG:  checkpoint complete: wrote 127249 buffers (48.5%); 0 transaction log file(s)
added,0 removed, 89 recycled; sort=0.045 s, write=9.987 s, sync=0.000 s, total=10.007 s; sync files=0, longest=0.000 s,
average=0.000s; distance=1187558 kB, estimate=1432989 kB
 
2016-02-02 01:33:58.216 CET 21361 LOG:  checkpoint complete: wrote 124649 buffers (47.5%); 0 transaction log file(s)
added,0 removed, 69 recycled; sort=0.048 s, write=10.160 s, sync=0.000 s, total=10.176 s; sync files=0, longest=0.000
s,average=0.000 s; distance=1135086 kB, estimate=1337776 kB
 
2016-02-02 01:33:58.216 CET 21361 LOG:  checkpoint starting: time
2016-02-02 01:34:08.060 CET 21361 LOG:  checkpoint complete: wrote 123298 buffers (47.0%); 0 transaction log file(s)
added,0 removed, 69 recycled; sort=0.049 s, write=9.838 s, sync=0.000 s, total=9.843 s; sync files=0, longest=0.000 s,
average=0.000s; distance=1184077 kB, estimate=1322406 kB
 
2016-02-02 01:34:08.060 CET 21361 LOG:  checkpoint starting: time
2016-02-02 01:34:18.052 CET 21361 LOG:  checkpoint complete: wrote 118165 buffers (45.1%); 0 transaction log file(s)
added,0 removed, 72 recycled; sort=0.046 s, write=9.987 s, sync=0.000 s, total=9.992 s; sync files=0, longest=0.000 s,
average=0.000s; distance=1198018 kB, estimate=1309967 kB
 
2016-02-02 01:34:18.053 CET 21361 LOG:  checkpoint starting: time
2016-02-02 01:34:28.089 CET 21361 LOG:  checkpoint complete: wrote 120814 buffers (46.1%); 0 transaction log file(s)
added,0 removed, 73 recycled; sort=0.051 s, write=10.020 s, sync=0.000 s, total=10.036 s; sync files=0, longest=0.000
s,average=0.000 s; distance=1203691 kB, estimate=1299339 kB
 
2016-02-02 01:34:28.090 CET 21361 LOG:  checkpoint starting: time
2016-02-02 01:34:39.182 CET 21361 LOG:  checkpoint complete: wrote 110411 buffers (42.1%); 0 transaction log file(s)
added,0 removed, 74 recycled; sort=0.047 s, write=9.908 s, sync=0.000 s, total=11.092 s; sync files=0, longest=0.000 s,
average=0.000s; distance=1073612 kB, estimate=1276767 kB
 

We wrote roughly 1/3 more wal/buffers during the first checkpoint than
following ones (And yes, the above is with an impossibly low timeout,
but that doesn't change anything). You can make that much more extreme
with larget shared buffers and larger timeouts.


I wonder if this essentially point at checkpoint_timeout being wrongly
defined: Currently it means we'll try to finish a checkpoint
(1-checkpoint_completion_target) * timeout before the next one - but
perhaps it should instead be that we start checkpoint_timeout * _target
before the next timeout? Afaics that'd work more graceful in the face of
restarts and forced checkpoints.

Greetings,

Andres Freund



Re: checkpoints after database start/immediate checkpoints

От
Robert Haas
Дата:
On Mon, Feb 1, 2016 at 7:43 PM, Andres Freund <andres@anarazel.de> wrote:
> Right now it takes checkpoint_timeout till we start a checkpoint, and
> checkpoint_timeout + checkpoint_timeout * checkpoint_completion_target
> till we complete the first checkpoint after shutdown/forced checkpoints.
>
> That means a) that such checkpoint will often be bigger/more heavyweight
> than the following ones, not what you want after a restart/create
> database/basebackup... b) reliable benchmarks measuring steady state
> have to run for even longer.
>
> There's some logic to that behaviour though: With a target of 0 it'd be
> weird to start a checkpoint directly after startup, and even otherwise
> there'll be less to write at the beginning.
>
> As an example:
> 2016-02-02 01:32:58.053 CET 21361 LOG:  checkpoint complete: wrote 186041 buffers (71.0%); 1 transaction log file(s)
added,0 removed, 0 recycled; sort=0.071 s, write=9.504 s, sync=0.000 s, total=9.532 s; sync files=0, longest=0.000 s,
average=0.000s; distance=1460260 kB, estimate=1460260 kB 
> 2016-02-02 01:32:58.053 CET 21361 LOG:  checkpoint starting: time
> 2016-02-02 01:33:08.061 CET 21361 LOG:  checkpoint complete: wrote 127249 buffers (48.5%); 0 transaction log file(s)
added,0 removed, 89 recycled; sort=0.045 s, write=9.987 s, sync=0.000 s, total=10.007 s; sync files=0, longest=0.000 s,
average=0.000s; distance=1187558 kB, estimate=1432989 kB 
> 2016-02-02 01:33:58.216 CET 21361 LOG:  checkpoint complete: wrote 124649 buffers (47.5%); 0 transaction log file(s)
added,0 removed, 69 recycled; sort=0.048 s, write=10.160 s, sync=0.000 s, total=10.176 s; sync files=0, longest=0.000
s,average=0.000 s; distance=1135086 kB, estimate=1337776 kB 
> 2016-02-02 01:33:58.216 CET 21361 LOG:  checkpoint starting: time
> 2016-02-02 01:34:08.060 CET 21361 LOG:  checkpoint complete: wrote 123298 buffers (47.0%); 0 transaction log file(s)
added,0 removed, 69 recycled; sort=0.049 s, write=9.838 s, sync=0.000 s, total=9.843 s; sync files=0, longest=0.000 s,
average=0.000s; distance=1184077 kB, estimate=1322406 kB 
> 2016-02-02 01:34:08.060 CET 21361 LOG:  checkpoint starting: time
> 2016-02-02 01:34:18.052 CET 21361 LOG:  checkpoint complete: wrote 118165 buffers (45.1%); 0 transaction log file(s)
added,0 removed, 72 recycled; sort=0.046 s, write=9.987 s, sync=0.000 s, total=9.992 s; sync files=0, longest=0.000 s,
average=0.000s; distance=1198018 kB, estimate=1309967 kB 
> 2016-02-02 01:34:18.053 CET 21361 LOG:  checkpoint starting: time
> 2016-02-02 01:34:28.089 CET 21361 LOG:  checkpoint complete: wrote 120814 buffers (46.1%); 0 transaction log file(s)
added,0 removed, 73 recycled; sort=0.051 s, write=10.020 s, sync=0.000 s, total=10.036 s; sync files=0, longest=0.000
s,average=0.000 s; distance=1203691 kB, estimate=1299339 kB 
> 2016-02-02 01:34:28.090 CET 21361 LOG:  checkpoint starting: time
> 2016-02-02 01:34:39.182 CET 21361 LOG:  checkpoint complete: wrote 110411 buffers (42.1%); 0 transaction log file(s)
added,0 removed, 74 recycled; sort=0.047 s, write=9.908 s, sync=0.000 s, total=11.092 s; sync files=0, longest=0.000 s,
average=0.000s; distance=1073612 kB, estimate=1276767 kB 
>
> We wrote roughly 1/3 more wal/buffers during the first checkpoint than
> following ones (And yes, the above is with an impossibly low timeout,
> but that doesn't change anything). You can make that much more extreme
> with larget shared buffers and larger timeouts.
>
>
> I wonder if this essentially point at checkpoint_timeout being wrongly
> defined: Currently it means we'll try to finish a checkpoint
> (1-checkpoint_completion_target) * timeout before the next one - but
> perhaps it should instead be that we start checkpoint_timeout * _target
> before the next timeout? Afaics that'd work more graceful in the face of
> restarts and forced checkpoints.

There's a certain appeal to that, but at the same time it seems pretty
wonky.  Right now, you can say that a checkpoint is triggered when the
amount of WAL reaches X or the amount of time reaches Y, but with the
alternative definition it's a bit harder to explain what's going on
there.  Maybe we should do it anyway, but...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: checkpoints after database start/immediate checkpoints

От
Andres Freund
Дата:
On 2016-02-03 09:57:00 -0500, Robert Haas wrote:
> On Mon, Feb 1, 2016 at 7:43 PM, Andres Freund <andres@anarazel.de> wrote:
> > I wonder if this essentially point at checkpoint_timeout being wrongly
> > defined: Currently it means we'll try to finish a checkpoint
> > (1-checkpoint_completion_target) * timeout before the next one - but
> > perhaps it should instead be that we start checkpoint_timeout * _target
> > before the next timeout? Afaics that'd work more graceful in the face of
> > restarts and forced checkpoints.
> 
> There's a certain appeal to that, but at the same time it seems pretty
> wonky.  Right now, you can say that a checkpoint is triggered when the
> amount of WAL reaches X or the amount of time reaches Y, but with the
> alternative definition it's a bit harder to explain what's going on
> there.

Hm, but can you, really? We *start* a checkpoint every
checkpoint_timeout, but we only finish it after
checkpoint_completion_target * timeout, or cct * segments. I find it
pretty hard to explain that we have a gap of checkpoint_timeout, where
nothing happens, after an immediate/manual checkpoint.

Defining it as: We try to *finish* a checkpoint every checkpoint_timeout
or checkpoint_segments/(max_wal_size/~3) actually seems simpler to
me. Then we just need to add that we start a checkpoint
checkpoint_completion_target before either of the above are reached.

- Andres



Re: checkpoints after database start/immediate checkpoints

От
Robert Haas
Дата:
On Thu, Feb 4, 2016 at 4:42 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-02-03 09:57:00 -0500, Robert Haas wrote:
>> On Mon, Feb 1, 2016 at 7:43 PM, Andres Freund <andres@anarazel.de> wrote:
>> > I wonder if this essentially point at checkpoint_timeout being wrongly
>> > defined: Currently it means we'll try to finish a checkpoint
>> > (1-checkpoint_completion_target) * timeout before the next one - but
>> > perhaps it should instead be that we start checkpoint_timeout * _target
>> > before the next timeout? Afaics that'd work more graceful in the face of
>> > restarts and forced checkpoints.
>>
>> There's a certain appeal to that, but at the same time it seems pretty
>> wonky.  Right now, you can say that a checkpoint is triggered when the
>> amount of WAL reaches X or the amount of time reaches Y, but with the
>> alternative definition it's a bit harder to explain what's going on
>> there.
>
> Hm, but can you, really? We *start* a checkpoint every
> checkpoint_timeout, but we only finish it after
> checkpoint_completion_target * timeout, or cct * segments. I find it
> pretty hard to explain that we have a gap of checkpoint_timeout, where
> nothing happens, after an immediate/manual checkpoint.
>
> Defining it as: We try to *finish* a checkpoint every checkpoint_timeout
> or checkpoint_segments/(max_wal_size/~3) actually seems simpler to
> me. Then we just need to add that we start a checkpoint
> checkpoint_completion_target before either of the above are reached.

Hmm, I could go for that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company