Re: Partitioned checkpointing

From: Tomas Vondra
Subject: Re: Partitioned checkpointing
Date:
Msg-id: 55F2F946.30403@2ndquadrant.com
In reply to: Re: Partitioned checkpointing  (Simon Riggs <simon@2ndQuadrant.com>)
Responses: Re: Partitioned checkpointing  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers

On 09/11/2015 03:56 PM, Simon Riggs wrote:
>
> The idea to do a partial pass through shared buffers and only write a
> fraction of dirty buffers, then fsync them is a good one.
>
> The key point is that we spread out the fsyncs across the whole
> checkpoint period.

I doubt that's really what we want to do, as it defeats one of the 
purposes of spread checkpoints. With spread checkpoints, we write the 
data to the page cache and then let the OS actually write it to disk. 
The kernel handles this: it marks the dirty data as expired after some 
time (say, 30 seconds) and then flushes it to disk.

The goal is to have everything already written to disk by the time we 
call fsync at the beginning of the next checkpoint, so that the fsyncs 
are cheap and don't cause I/O issues.

What you propose (spreading the fsyncs) changes that significantly, 
because it cuts the time the OS has for writing the data to disk in 
the background down to 1/N of the checkpoint interval. That's a big 
change, and I'd bet it's for the worse.
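To make the 1/N argument concrete, here's a toy calculation (illustrative only, not PostgreSQL code, and the function name and interval are made up): with deferred fsyncs a dirty page has up to the whole checkpoint interval for background writeback, while with fsyncs spread across N partitions each page gets only about 1/N of it.

```python
# Toy illustration of the writeback-window argument. Not backend code;
# the 300 s interval and N=16 are arbitrary example values.

def background_window(checkpoint_interval_s, n_partitions):
    """Rough upper bound on the background-writeback time available
    to a dirty page before its fsync is issued."""
    return checkpoint_interval_s / n_partitions

# Deferred fsyncs (current behaviour): the whole interval is available.
deferred = background_window(300, 1)       # 300 s
# Fsyncs spread across N=16 partitions: each page gets far less.
partitioned = background_window(300, 16)   # 18.75 s
```

With a 5-minute checkpoint interval and 16 partitions, the kernel's ~30-second expiry never even fires before the fsync arrives.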

>
> I think we should be writing out all buffers for a particular file
> in one pass, then issue one fsync per file. >1 fsyncs per file seems
> a bad idea.
>
> So we'd need logic like this
> 1. Run through shared buffers and analyze the files contained in there
> 2. Assign files to one of N batches so we can make N roughly equal sized
> mini-checkpoints
> 3. Make N passes through shared buffers, writing out files assigned to
> each batch as we go
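Steps 1-2 above could be sketched roughly like this (a hypothetical illustration, not PostgreSQL internals; the function and variable names are made up): count dirty buffers per file, then greedily assign files, largest first, to the emptiest batch so the N batches come out roughly equal.

```python
# Hypothetical sketch of assigning files to N roughly equal batches.
from collections import Counter
import heapq

def assign_batches(dirty_buffers, n_batches):
    """dirty_buffers: iterable of file identifiers, one per dirty buffer.
    Returns {file: batch_index} with batch sizes roughly balanced."""
    per_file = Counter(dirty_buffers)
    # Min-heap of (current batch size, batch index).
    heap = [(0, i) for i in range(n_batches)]
    heapq.heapify(heap)
    assignment = {}
    # Placing the largest files first keeps the greedy split balanced.
    for f, n in sorted(per_file.items(), key=lambda kv: -kv[1]):
        size, idx = heapq.heappop(heap)
        assignment[f] = idx
        heapq.heappush(heap, (size + n, idx))
    return assignment

# Files with 10, 7, 5 and 2 dirty buffers split into two batches of 12.
batches = assign_batches(["a"]*10 + ["b"]*7 + ["c"]*5 + ["d"]*2, 2)
```

Step 3 would then iterate shared buffers once per batch, writing only the buffers whose file belongs to the current batch.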

What I think might work better is keeping the write/fsync phases we 
have now, but instead of postponing the fsyncs until the next 
checkpoint, spreading them out after the writes. So with target=0.5 
we'd do the writes in the first half of the checkpoint interval, then 
the fsyncs in the other half. Of course, we should sort the data as 
you propose, and issue the fsyncs in the same order (so that the OS 
has time to write the data to the devices).
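The schedule above could be sketched as follows (hypothetical illustration only; the function name and timing model are made up for clarity): with target=0.5, the first half of the interval is spent on sorted writes and the second half issuing fsyncs in the same file order, so each file's data has had the longest possible time in the page cache before its fsync.

```python
# Sketch of a write-then-fsync checkpoint schedule. Not backend code.

def checkpoint_schedule(files, interval_s, target=0.5):
    """Return (time_offset_s, action, file) tuples for one checkpoint:
    sorted writes spread over the first target fraction of the interval,
    fsyncs in the same order spread over the rest."""
    write_window = interval_s * target
    fsync_window = interval_s * (1.0 - target)
    ordered = sorted(files)          # write in sorted order...
    plan = []
    for i, f in enumerate(ordered):
        plan.append((write_window * i / len(ordered), "write", f))
    for i, f in enumerate(ordered):  # ...and fsync in the same order
        plan.append((write_window + fsync_window * i / len(ordered),
                     "fsync", f))
    return plan
```

For two files and a 300 s interval this writes at 0 s and 75 s and fsyncs at 150 s and 225 s, so the first file's data has had 150 seconds of background writeback before its fsync.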

I wonder how much the original paper (written in 1996) is effectively 
obsoleted by spread checkpoints, but the benchmark results posted by 
Horikawa-san suggest there's a possible gain. But perhaps partitioning 
the checkpoints is not the best approach?

regards

-- 
Tomas Vondra                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


