Thread: Raising the checkpoint_timeout limit


Raising the checkpoint_timeout limit

From: Andres Freund
Hi,

is there any reason for the rather arbitrary and low checkpoint_timeout
limit? Obviously a 10h timeout isn't appropriate on all that many
systems. But if the reaction to a server crashing is going to be a
failover to one of several standbys anyway, setting the timeout to a
high value can make sense.

A high timeout has the advantage that the total amount of full-page
writes is reduced and, especially if the whole system fits into s_b,
that the total amount of writes to disk is drastically reduced.

I'm not sure what'd actually be a good upper limit. I'd be inclined to
even go to as high as a week or so. A lot of our settings have
upper/lower limits that aren't a good idea in general.
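As a sketch of the kind of setup being argued for here (the values are illustrative, not a recommendation, and the 10h timeout would be rejected under the current 1h limit):

```ini
# postgresql.conf sketch -- illustrative values only.
# On a system where a crash is handled by failing over to a standby,
# checkpoints can be spaced far apart to cut full-page-write volume.
checkpoint_timeout = 10h           # currently rejected: upper limit is 1h
checkpoint_completion_target = 0.9 # spread checkpoint I/O out
shared_buffers = 64GB              # ideally large enough for the working set
```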

I'm also wondering if it'd not make sense to raise the default timeout
to 15min or so. The upper ceiling for that really is recovery time, and
that has really shrunk rather drastically due to faster cpus and
architectural improvements in postgres (bgwriter, separate
checkpointer/bgwriter, restartpoints, ...).

Greetings,

Andres Freund



Re: Raising the checkpoint_timeout limit

From: Jim Nasby
On 2/1/16 6:13 PM, Andres Freund wrote:
> I'm not sure what'd actually be a good upper limit. I'd be inclined to
> even go to as high as a week or so. A lot of our settings have
> upper/lower limits that aren't a good idea in general.

The only reason I can see for the 1 hour limit is to try to prevent
footguns. I think that's a valid goal, but there should be a way to
override it. And if we don't want that kind of protection then I'd say
just yank the upper limit.

> I'm also wondering if it'd not make sense to raise the default timeout
> to 15min or so. The upper ceiling for that really is recovery time, and
> that has really shrunk rather drastically due to faster cpus and
> architectural improvements in postgres (bgwriter, separate
> checkpointer/bgwriter, restartpoints, ...).

It would be interesting if someone had a large-ish 9.4 or 9.5 install
that they could test recovery timing on. My suspicion is that as long as
FPWs are on, you'd generally end up limited by how fast you could read
WAL, unless you exceeded the FS cache. (I'm assuming a BBU and that the
FS and controller will do a nice job of ordering writes optimally, so
that you'll get performance similar to reads when it's time to fsync.)
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Raising the checkpoint_timeout limit

From: Euler Taveira
On 01-02-2016 21:13, Andres Freund wrote:
> is there any reason for the rather arbitrary and low checkpoint_timeout
> limit?
> 
AFAICS the only reason is to make recovery run quickly. This setting has
had the same value since day 1.

> A high timeout has the advantage that the total amount of full page
> writes reduces and, especially if the whole system fits into s_b, that
> the total amount of writes to disk is drastically reduced.
> 
This statement could be added to the documentation. Given this use case,
I want to propose raising the c_t upper limit to one day or even a week.

> I'm not sure what'd actually be a good upper limit. I'd be inclined to
> even go to as high as a week or so. A lot of our settings have
> upper/lower limits that aren't a good idea in general.
> 
A week is an insane default value. However, I'm fine with 10 to 20
minutes (those are the most common values I use for c_t).


-- 
Euler Taveira - Timbira - http://www.timbira.com.br/
PostgreSQL: Consulting, Development, 24x7 Support and Training



Re: Raising the checkpoint_timeout limit

From: Noah Misch
On Tue, Feb 02, 2016 at 01:13:20AM +0100, Andres Freund wrote:
> is there any reason for the rather arbitrary and low checkpoint_timeout
> limit?

Not that I know, and it is inconvenient.

> I'm not sure what'd actually be a good upper limit. I'd be inclined to
> even go to as high as a week or so. A lot of our settings have
> upper/lower limits that aren't a good idea in general.

In general, I favor having limits reflect fundamental system limitations
rather than paternalism.  Therefore, I would allow INT_MAX (68 years).

> I'm also wondering if it'd not make sense to raise the default timeout
> to 15min or so. The upper ceiling for that really is recovery time, and
> that has really shrunk rather drastically due to faster cpus and
> architectural improvements in postgres (bgwriter, separate
> checkpointer/bgwriter, restartpoints, ...).

Have those recovery improvements outpaced the increases in max recovery time
from higher core counts generating more WAL per minute?

nm



Re: Raising the checkpoint_timeout limit

From: Michael Paquier
On Tue, Feb 2, 2016 at 1:16 PM, Noah Misch <noah@leadboat.com> wrote:
> On Tue, Feb 02, 2016 at 01:13:20AM +0100, Andres Freund wrote:
>> is there any reason for the rather arbitrary and low checkpoint_timeout
>> limit?
>
> Not that I know, and it is inconvenient.
>
>> I'm not sure what'd actually be a good upper limit. I'd be inclined to
>> even go to as high as a week or so. A lot of our settings have
>> upper/lower limits that aren't a good idea in general.
>
> In general, I favor having limits reflect fundamental system limitations
> rather than paternalism.  Therefore, I would allow INT_MAX (68 years).

+1. This way users can play as they wish.

>> I'm also wondering if it'd not make sense to raise the default timeout
>> to 15min or so. The upper ceiling for that really is recovery time, and
>> that has really shrunk rather drastically due to faster cpus and
>> architectural improvements in postgres (bgwriter, separate
>> checkpointer/bgwriter, restartpoints, ...).
>
> Have those recovery improvements outpaced the increases in max recovery time
> from higher core counts generating more WAL per minute?

Perhaps having some numbers showing that the architectural improvements
in Postgres really matter, at constant checkpoint_segments for a pgbench
load, would help in settling on a better default value. I would tend to
agree that as things speed up it would make sense to increase this
value a bit, though, even if Postgres is usually conservative enough in
default parameters aimed at low-spec machines.
-- 
Michael



Re: Raising the checkpoint_timeout limit

From: Simon Riggs
On 2 February 2016 at 05:54, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Tue, Feb 2, 2016 at 1:16 PM, Noah Misch <noah@leadboat.com> wrote:
>> On Tue, Feb 02, 2016 at 01:13:20AM +0100, Andres Freund wrote:
>>> is there any reason for the rather arbitrary and low checkpoint_timeout
>>> limit?
>>
>> Not that I know, and it is inconvenient.
>>
>>> I'm not sure what'd actually be a good upper limit. I'd be inclined to
>>> even go to as high as a week or so. A lot of our settings have
>>> upper/lower limits that aren't a good idea in general.
>>
>> In general, I favor having limits reflect fundamental system limitations
>> rather than paternalism.  Therefore, I would allow INT_MAX (68 years).
>
> +1. This way users can play as they wish.

If people wish to turn off crash recovery, they can already set fsync=off. I don't wish to see us support a setting that causes problems for people that don't understand what checkpoints are and why everybody needs them.

The current code needs to act differently with regard to very low settings, so that when a small number of blocks remain we don't spend hours performing them. Allowing very large values would make that even more strange.

I would put a limit of 100,000 seconds, i.e. roughly 28 hours.

Some systems offer a recovery_time_objective setting that is used to control how frequently checkpoints occur. That might be a more usable approach.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Raising the checkpoint_timeout limit

From: Andres Freund
On 2016-02-01 23:16:16 -0500, Noah Misch wrote:
> On Tue, Feb 02, 2016 at 01:13:20AM +0100, Andres Freund wrote:
> > I'm not sure what'd actually be a good upper limit. I'd be inclined to
> > even go to as high as a week or so. A lot of our settings have
> > upper/lower limits that aren't a good idea in general.
> 
> In general, I favor having limits reflect fundamental system limitations
> rather than paternalism.  Therefore, I would allow INT_MAX (68 years).

I generally agree with that attitude - I'm disinclined to go just that
high though. Going close to INT_MAX means having to care about overflow
in trivial computations, in a scenario we're unlikely to ever
test. Sure, we can use a debugger to adjust time or accelerate time
progress, but that's all unrealistic if we're honest.  So maybe go with
a year?

> > I'm also wondering if it'd not make sense to raise the default timeout
> > to 15min or so. The upper ceiling for that really is recovery time, and
> > that has really shrunk rather drastically due to faster cpus and
> > architectural improvements in postgres (bgwriter, separate
> > checkpointer/bgwriter, restartpoints, ...).
> 
> Have those recovery improvements outpaced the increases in max recovery time
> from higher core counts generating more WAL per minute?

Mostly yes, imo. I think the biggest problem with max recovery time is
in workloads that are considerably bigger than shared buffers: there the
single-threaded, synchronously reading startup process (without the
benefit of FPWs filling up pages) has to keep up with WAL generated by
many concurrent backends with much higher aggregate IO throughput.  But
even that has considerably improved due to SSDs.

Andres



Re: Raising the checkpoint_timeout limit

From: Andres Freund
On 2016-02-02 11:37:15 +0100, Simon Riggs wrote:
> If people wish to turn off crash recovery, they can already set fsync=off.
> I don't wish to see us support a setting that causes problems for people
> that don't understand what checkpoints are and why everybody needs them.

I don't think fsync=off and very long checkpoints are really
comparable. Many large modern machines, especially with directly
attached storage and/or large amounts of memory, take a *long* while to
boot. So any outage will be dealt with by a failover anyway.  But at the
same time, a database in the 10TB+ range can't easily be copied again.
Thus running with fsync=off isn't something that you'd want in those
scenarios - it'd prevent the previous master/other standbys from failing
back/catching up; the databases could be arbitrarily corrupted after
all.

Additionally, a significant portion of the cost of checkpoints is full-page
writes - you can easily get into the situation where you have
~20MB/sec of normal WAL without FPWs, but 300MB/s with them. That rate is
rather expensive, regardless of fsync=off.

> The current code needs to act differently with regard to very low settings,
> so when we are a small number of blocks remaining we don't spend hours
> performing them. Allowing very large values would make that even more
> strange.

Why is that a good thing? Every checkpoint triggers a new round of full
page writes. I don't see why you'd want to accelerate a checkpoint just
because there are few writes remaining. Yes, the current code partially
behaves that way, but that's imo more an implementation artifact or even
a bug.

> Some systems offer a recovery_time_objective setting that is used to
> control how frequently checkpoints occur. That might be a more usable
> approach.

While desirable, I have no idea how to realistically calculate that :(. It's
also a lot bigger than just adjusting a pointlessly low GUC limit.


Regards,

Andres



Re: Raising the checkpoint_timeout limit

From: Noah Misch
On Tue, Feb 02, 2016 at 12:24:50PM +0100, Andres Freund wrote:
> On 2016-02-01 23:16:16 -0500, Noah Misch wrote:
> > On Tue, Feb 02, 2016 at 01:13:20AM +0100, Andres Freund wrote:
> > > I'm not sure what'd actually be a good upper limit. I'd be inclined to
> > > even go to as high as a week or so. A lot of our settings have
> > > upper/lower limits that aren't a good idea in general.
> > 
> > In general, I favor having limits reflect fundamental system limitations
> > rather than paternalism.  Therefore, I would allow INT_MAX (68 years).
> 
> I generally agree with that attitude - I'm disinclined to go just that
> high though. Going close to INT_MAX means having to care about overflow
> in trivial computations, in a scenario we're unlikely to ever
> test. Sure, we can use a debugger to adjust time or accelerate time
> progress, but that's all unrealistic if we're honest.  So maybe go with
> a year?

Okay.



Re: Raising the checkpoint_timeout limit

From: Robert Haas
On Tue, Feb 2, 2016 at 8:09 PM, Noah Misch <noah@leadboat.com> wrote:
> On Tue, Feb 02, 2016 at 12:24:50PM +0100, Andres Freund wrote:
>> On 2016-02-01 23:16:16 -0500, Noah Misch wrote:
>> > On Tue, Feb 02, 2016 at 01:13:20AM +0100, Andres Freund wrote:
>> > > I'm not sure what'd actually be a good upper limit. I'd be inclined to
>> > > even go to as high as a week or so. A lot of our settings have
>> > > upper/lower limits that aren't a good idea in general.
>> >
>> > In general, I favor having limits reflect fundamental system limitations
>> > rather than paternalism.  Therefore, I would allow INT_MAX (68 years).
>>
>> I generally agree with that attitude - I'm disinclined to go just that
>> high though. Going close to INT_MAX means having to care about overflow
>> in trivial computations, in a scenario we're unlikely to ever
>> test. Sure, we can use a debugger to adjust time or accelerate time
>> progress, but that's all unrealistic if we're honest.  So maybe go with
>> a year?
>
> Okay.

Sounds good to me, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Raising the checkpoint_timeout limit

From: Tom Lane
Noah Misch <noah@leadboat.com> writes:
> On Tue, Feb 02, 2016 at 12:24:50PM +0100, Andres Freund wrote:
>> On 2016-02-01 23:16:16 -0500, Noah Misch wrote:
>>> In general, I favor having limits reflect fundamental system limitations
>>> rather than paternalism.  Therefore, I would allow INT_MAX (68 years).

>> I generally agree with that attitude - I'm disinclined to go just that
>> high though. Going close to INT_MAX means having to care about overflow
>> in trivial computations, in a scenario we're unlikely to ever
>> test. Sure, we can use a debugger to adjust time or accelerate time
>> progress, but that's all unrealistic if we're honest.  So maybe go with
>> a year?

> Okay.

I've gotta go with the "paternalism" side of the argument here.  Suppose
you configure your system to checkpoint once a year --- what is going to
happen when the year is up?  Or when you try to shut it down?  You *will*
regret such a setting.

I don't think we should allow the checkpoint distances to be so large that
checkpoints don't happen in the normal course of events.  I'd be okay with
the max being a day, perhaps.
        regards, tom lane



Re: Raising the checkpoint_timeout limit

From: Robert Haas
On Tue, Feb 2, 2016 at 10:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I've gotta go with the "paternalism" side of the argument here.  Suppose
> you configure your system to checkpoint once a year --- what is going to
> happen when the year is up?  Or when you try to shut it down?  You *will*
> regret such a setting.
>
> I don't think we should allow the checkpoint distances to be so large that
> checkpoints don't happen in the normal course of events.  I'd be okay with
> the max being a day, perhaps.

If smart people[1] want to set checkpoint_timeout to a value higher
than 1 day[2], then I think we should let them.

I think what will happen if you set checkpoint_timeout to 1 year is
that you will checkpoint solely based on WAL volume, which does not
seem like a manifestly unreasonable thing to want.  It's true that if
you set BOTH max_wal_size AND checkpoint_timeout to $WAYTOOBIG then
something bad might happen to you, but even such configurations are
actually not totally crazy: for example, you could ingest data into a
temporary PostgreSQL instance and then do logical replication from
there to another cluster for permanent storage.  You don't really need
recovery or shutdown to happen in the lifetime of the cluster, so no
harm, no foul.  Now, you could also set such configuration settings in
a situation where it will not work out well.  But that is true of most
configuration settings.
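For the ingest-only scenario described above, the "both limits way too big" configuration might look something like this (a sketch only; the values are made up for illustration, and the timeout would be rejected under the current limit):

```ini
# Throwaway ingest instance: crash recovery is irrelevant here, since the
# data's permanent home is a downstream cluster fed by logical replication.
checkpoint_timeout = 365d     # effectively "never checkpoint on time"
max_wal_size = 10000GB        # effectively "never checkpoint on WAL volume"
```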

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] like Andres
[2] see original post



Re: Raising the checkpoint_timeout limit

From: David Steele
On 2/2/16 11:10 PM, Robert Haas wrote:
> On Tue, Feb 2, 2016 at 10:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I've gotta go with the "paternalism" side of the argument here.  Suppose
>> you configure your system to checkpoint once a year --- what is going to
>> happen when the year is up?  Or when you try to shut it down?  You *will*
>> regret such a setting.
>>
>> I don't think we should allow the checkpoint distances to be so large that
>> checkpoints don't happen in the normal course of events.  I'd be okay with
>> the max being a day, perhaps.
>
> If smart people[1] want to set checkpoint_timeout to a value higher
> than 1 day[2], then I think we should let them.

Agreed - I have a specific instance where I would prefer the daily
backups or checkpoint segments to be the primary source of checkpoints
with checkpoint_timeout set to 36 hours and used only as a fallback.

A limit of 1 day would be lower than I like though still better than
what I have now, 1 hour.

For this case I would probably configure:

checkpoint_segments = 256
checkpoint_timeout = 36h

So the daily backups would generally trigger the checkpoint unless there
was unusually high volume, in which case it would be checkpoint segments.
Finally, if the backups weren't working and volume was normal, then
checkpoint_timeout would come along to save the day.

In this case it's all about reducing full-page writes on a medium-sized
system with heavy churn in order to make a long backup/archive retention
more cost-effective.  Recovery time is not as much of an issue for this
application.

--
-David
david@pgmasters.net


Re: Raising the checkpoint_timeout limit

From: Jim Nasby
On 2/2/16 10:10 PM, Robert Haas wrote:
> Now, you could also set such configuration settings in
> a situation where it will not work out well.  But that is true of most
> configuration settings.

Yeah, if we're going to start playing parent then I think the first
thing to do is remove the fsync GUC. The AWS team has done testing that
shows it to be worthless from a performance standpoint now that we have
synchronous commit, and it's an extremely large foot-bazooka to have
lying around.
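The trade-off being alluded to can be sketched as follows (illustrative config fragment, not a recommendation):

```ini
# Safe performance lever: a crash may lose the last few commits,
# but the cluster stays consistent and recoverable.
synchronous_commit = off

# Unsafe lever: an OS or hardware crash can corrupt the whole cluster.
# Leave at the default unless the instance is genuinely disposable.
fsync = on
```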
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Raising the checkpoint_timeout limit

From: Andres Freund
On 2016-02-03 15:07:12 -0600, Jim Nasby wrote:
> On 2/2/16 10:10 PM, Robert Haas wrote:
> >Now, you could also set such configuration settings in
> >a situation where it will not work out well.  But that is true of most
> >configuration settings.

By that argument we should probably raise the lower limit of a bunch of
parameters :P

> Yeah, if we're going to start playing parent then I think the first thing to
> do is remove the fsync GUC. The AWS team has done testing that shows it to
> be worthless from a performance standpoint now that we have synchronous
> commit, and it's an extremely large foot-bazooka to have laying around.

Meh, I don't buy that. There are workloads where that's the case, but
also ones where it's not true. Try e.g. 2PC. And yes, there are definitely
cases where 2PC makes sense, even if you don't need durability on a
local basis.

Andres



Re: Raising the checkpoint_timeout limit

From: Peter Geoghegan
On Mon, Feb 1, 2016 at 8:16 PM, Noah Misch <noah@leadboat.com> wrote:
>> I'm not sure what'd actually be a good upper limit. I'd be inclined to
>> even go to as high as a week or so. A lot of our settings have
>> upper/lower limits that aren't a good idea in general.
>
> In general, I favor having limits reflect fundamental system limitations
> rather than paternalism.  Therefore, I would allow INT_MAX (68 years).

I agree. I'm in favor of having things be what is sometimes called
foolproof, but I think that you can only take that so far, and it's
mostly a matter of guiding a more or less reasonable user in the right
direction. Making it easy to do the right thing and hard to do the
wrong thing.

I don't think you can effectively design anything around a user who
makes perversely bad decisions at every turn. If you investigate why a
user made a bad decision, there will usually be a chain of faulty but
not outrageous reasoning behind it.

-- 
Peter Geoghegan