Re: Redesigning checkpoint_segments

From: Heikki Linnakangas
Subject: Re: Redesigning checkpoint_segments
Msg-id: 51B05EFE.6060108@vmware.com
In reply to: Re: Redesigning checkpoint_segments ("Joshua D. Drake" <jd@commandprompt.com>)
List: pgsql-hackers

On 06.06.2013 11:42, Joshua D. Drake wrote:
> On 6/6/2013 1:11 AM, Heikki Linnakangas wrote:
>>> Yes checkpoint_segments is awkward. We shouldn't have to set it at all.
>>> It should be gone.
>>
>> The point of having checkpoint_segments or max_wal_size is to put a
>> limit (albeit a soft one) on the amount of disk space used. If you
>> don't care about that, I guess we could allow max_wal_size=-1 to mean
>> infinite, and checkpoints would be driven off purely based on time,
>> not WAL consumption.
>
> I would not only agree with that, I would argue that max_wal_size
> doesn't need to be there at least as a default. Perhaps as an "advanced"
> configuration option that only those in the know see.

Well, we have checkpoint_segments=3 as the default currently, which in 
the proposed scheme would be about equal to max_wal_size=120MB. For 
better or worse, our defaults are generally geared towards small 
systems, and that sounds about right for that.
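
For reference, here is one way to arrive at a figure in that range. It is only a sketch: it assumes the default 16 MB segment size, the default checkpoint_completion_target of 0.5, and the documentation's rule of thumb that WAL normally stays under (2 + checkpoint_completion_target) * checkpoint_segments + 1 segments. The function name and the extra_segment flag are made up for illustration.

```python
SEGMENT_MB = 16  # default WAL segment size (assumed)

def approx_wal_mb(checkpoint_segments, completion_target=0.5, extra_segment=False):
    """Rough WAL footprint implied by checkpoint_segments, in MB."""
    segments = (2 + completion_target) * checkpoint_segments
    if extra_segment:
        # the "+ 1" term from the documentation's rule of thumb
        segments += 1
    return segments * SEGMENT_MB

print(approx_wal_mb(3))                      # 120.0
print(approx_wal_mb(3, extra_segment=True))  # 136.0
```

Either way, the default lands in the low hundreds of megabytes.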

>>> Basically we start with X amount perhaps to be set at
>>> initdb time. That X amount changes dynamically based on the amount of
>>> data being written. In order to not suffer from recycling and creation
>>> penalties we always keep X+N where N is enough to keep up with new data.
>>
>> To clarify, here you're referring to controlling the number of WAL
>> segments preallocated/recycled, rather than how often checkpoints are
>> triggered. Currently, both are derived from checkpoint_segments, but I
>> proposed to separate them. The above is exactly what I proposed to do
>> for the preallocation/recycling, it would be tuned automatically, but
>> you still need something like max_wal_size for the other thing, to
>> trigger a checkpoint if too much WAL is being consumed.
>
> You think so? I agree with 90% of this paragraph but it seems to me that
> we can find an algorithm that manages this without the idea of
> max_wal_size (at least as a user settable).

We are in violent agreement :-). max_wal_size would not directly 
affect the preallocation of segments. The preallocation would be driven 
off the actual number of segments used in previous checkpoint cycles, 
not on max_wal_size.

Now, max_wal_size would affect when checkpoints happen (i.e. if you're 
about to reach max_wal_size, a checkpoint would be triggered), which 
would in turn affect the number of segments used between cycles. But 
there would be no direct connection between the two; the code to 
calculate how much to preallocate would not refer to max_wal_size.
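
A rough sketch of that separation, in Python for brevity. All names here are hypothetical, the smoothing rule is just one plausible choice and not necessarily what a patch would do, and triggering exactly at the limit (rather than a bit before it) is a simplification.

```python
def checkpoint_needed(wal_since_checkpoint_mb, max_wal_size_mb,
                      secs_since_checkpoint, checkpoint_timeout_secs=300):
    """Trigger on time, or on WAL volume unless max_wal_size is -1 (no limit)."""
    if secs_since_checkpoint >= checkpoint_timeout_secs:
        return True
    if max_wal_size_mb == -1:
        # purely time-driven checkpoints
        return False
    return wal_since_checkpoint_mb >= max_wal_size_mb


def segments_to_preallocate(prev_estimate, segments_used_last_cycle, smoothing=0.1):
    """Estimate from past cycles only; note there is no max_wal_size argument."""
    if segments_used_last_cycle >= prev_estimate:
        # usage went up: follow it immediately
        return float(segments_used_last_cycle)
    # usage went down: decay slowly towards the new level
    return prev_estimate + smoothing * (segments_used_last_cycle - prev_estimate)
```

The only coupling is indirect: a smaller max_wal_size makes checkpoint_needed() fire sooner, which lowers segments_used_last_cycle on the next round.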

Maybe max_wal_size should set an upper limit on how much to preallocate, 
though. If you want to limit the WAL size, we probably shouldn't exceed 
it on purpose by preallocating segments, even if the algorithm based on 
previous cycles suggests we should. This situation would arise if 
the checkpoints can't keep up, so that each checkpoint cycle is longer 
than we'd want, and we'd exceed max_wal_size because of that.
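
In sketch form (Python, hypothetical names, 16 MB segments assumed), the cap would sit outside the history-based estimate rather than inside it:

```python
SEGMENT_MB = 16  # default WAL segment size (assumed)

def capped_preallocation(estimated_segments, max_wal_size_mb):
    """Clamp the estimate so preallocation alone never exceeds max_wal_size."""
    if max_wal_size_mb == -1:
        # -1 meaning "no limit", as floated upthread
        return estimated_segments
    limit = max_wal_size_mb // SEGMENT_MB
    return min(estimated_segments, limit)
```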

>>> This makes sense except I don't see a need for the parameter. Why not
>>> just specify how the algorithm works and adhere to that without the need
>>> for another GUC?
>>
>> Because you want to limit the amount of disk space used for WAL. It's
>> a soft limit, but still.
>
> Why? This is the point that confuses me. Why do we care? We don't care
> how much disk space PGDATA takes... why do we all of a sudden care about
> pg_xlog?

Hmm, dunno. We have always had the checkpoint_segments setting to limit 
that, I was just thinking of retaining that functionality.

A few reasons spring to mind: First, running out of WAL space leads to a 
PANIC, which is not nice (I know, we talked about fixing that). 
Secondly, because we can. If a user inserts 10 GB of data into a table, 
we'll have to just store it, but with WAL, we can always issue a 
checkpoint to shrink it. People have asked for quotas for user data too, 
so some people do want to limit disk usage.

Mind you, it's possible to have a tiny database with a high TPS rate, 
such that the WAL grows really big compared to the size of the user 
data. Something with a small hot table that's updated a lot. In such a 
scenario, limiting the WAL size makes sense, and it won't affect 
performance much either because checkpointing a small database is very 
cheap.

- Heikki


