Re: checkpoint writeback via sync_file_range

From: Simon Riggs
Subject: Re: checkpoint writeback via sync_file_range
Date:
Msg-id: CA+U5nM+2DwGECG3O0BAihqH8eEhegk-W9kp2Co4yh2u1o4iGBA@mail.gmail.com
In response to: Re: checkpoint writeback via sync_file_range  (Greg Smith <greg@2ndQuadrant.com>)
Responses: Re: checkpoint writeback via sync_file_range  (Simon Riggs <simon@2ndQuadrant.com>)
          Re: checkpoint writeback via sync_file_range  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On Wed, Jan 11, 2012 at 4:38 AM, Greg Smith <greg@2ndquadrant.com> wrote:
> On 1/10/12 9:14 PM, Robert Haas wrote:
>>
>> Based on that, I whipped up the attached patch, which,
>> if sync_file_range is available, simply iterates through everything
>> that will eventually be fsync'd before beginning the write phase and
>> tells the Linux kernel to put them all under write-out.
>
>
> I hadn't really thought of using it that way.  The kernel expects that when
> this is called the normal way, you're going to track exactly which segments
> you want it to sync.  And that data isn't really passed through the fsync
> absorption code yet; the list of things to fsync has already lost that level
> of detail.
>
> What you're doing here doesn't care though, and I hadn't considered that
> SYNC_FILE_RANGE_WRITE could be used that way on my last pass through its
> docs.  Used this way, it's basically fsync without the wait or guarantee; it
> just tries to push what's already dirty further ahead of the write queue
> than those writes would otherwise be.

I don't think this will help at all; I think it will just make things worse.

The problem comes from hammering the fsyncs one after the other. What
this patch does is initiate all of the fsyncs at the same time, so it
will max out the disks even more, because it hits all disks at once.

It does open the door to various other uses, so I think this work will
be useful.
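For reference, the pattern under discussion (a minimal sketch, not the
actual patch; the helper name is invented) is simply to ask the kernel to
start write-out on each file that will later be fsync'd:

```c
/* Sketch of starting writeback on a pending-fsync file via Linux's
 * sync_file_range(); the function name here is made up for illustration. */
#define _GNU_SOURCE
#include <fcntl.h>

/* Ask the kernel to begin write-out of all dirty pages in the file,
 * without waiting for completion and with no durability guarantee.
 * With offset 0 and nbytes 0, the range covers the whole file. */
static int start_writeback(int fd)
{
    return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE);
}
```

In this usage the later fsync() still provides the actual guarantee; the
sync_file_range() call only tries to get the dirty data moving earlier.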


> One idea I was thinking about here was building a little hash table inside
> of the fsync absorb code, tracking how many absorb operations have happened
> for whatever the most popular relation files are.  The idea is that we might
> say "use sync_file_range every time <N> calls for a relation have come in",
> just to keep from ever accumulating too many writes to any one file before
> trying to nudge some of it out of there. The bat that keeps hitting me in
> the head here is that right now, a single fsync might have a full 1GB of
> writes to flush out, perhaps because it extended a table and then wrote more
> than that to it.  And in everything but a SSD or giant SAN cache situation,
> 1GB of I/O is just too much to fsync at a time without the OS choking a
> little on it.

A better idea. Seems like it should be easy enough to keep a counter.

I see some other uses around large writes also.
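That counter idea could be sketched roughly as follows (the struct, names,
and threshold are invented here; this is not code from any patch):

```c
/* Sketch of counting absorbed fsync requests per file and nudging the
 * kernel every Nth request, assuming Linux sync_file_range(). */
#define _GNU_SOURCE
#include <fcntl.h>

#define FLUSH_EVERY_N 32   /* arbitrary threshold, for illustration only */

struct absorb_entry {
    int fd;        /* file the fsync requests refer to */
    int absorbed;  /* requests absorbed since the last nudge */
};

/* Called once per absorbed fsync request.  Every Nth request for the
 * same file, start write-out of its dirty pages so that no single file
 * accumulates too much data to flush when the real fsync arrives. */
static int absorb_request(struct absorb_entry *e)
{
    if (++e->absorbed < FLUSH_EVERY_N)
        return 0;
    e->absorbed = 0;
    /* no wait, no guarantee; just push the dirty pages ahead */
    return sync_file_range(e->fd, 0, 0, SYNC_FILE_RANGE_WRITE);
}
```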

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

