On Tuesday, July 16, 2013 10:16 PM Ants Aasma wrote:
> On Jul 14, 2013 9:46 PM, "Greg Smith" <greg@2ndquadrant.com> wrote:
> > I updated and re-reviewed that in 2011:
> > http://www.postgresql.org/message-id/4D31AE64.3000202@2ndquadrant.com
> > and commented on why I think the improvement was difficult to reproduce
> > back then. The improvement didn't follow for me either. It would take
> > a really amazing bit of data to get me to believe write sorting code is
> > worthwhile after that. On large systems capable of dirtying enough
> > blocks to cause a problem, the operating system and RAID controllers
> > are already sorting blocks. And *that* sorting is also considering
> > concurrent read requests, which are a lot more important to an
> > efficient schedule than anything the checkpoint process knows about.
> > The database doesn't have nearly enough information yet to compete
> > against OS level sorting.
>
> That reasoning makes no sense. OS level sorting can only see the
> writes in the time window between the PostgreSQL write and the data
> being forced to disk. Spread checkpoints sprinkle the writes out over
> a long period, and the general tuning advice is to heavily bound the
> amount of memory the OS is willing to keep dirty. This makes the
> probability of scheduling adjacent writes together quite low, the
> merging window being limited by either dirty_bytes or
> dirty_expire_centisecs. The checkpointer has the best long term
> overview of the situation here; OS scheduling only has the short term
> view of outstanding read and write requests. By sorting checkpoint
> writes it is much more likely that adjacent blocks are visible to OS
> writeback at the same time and will be issued together.
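
To make the merging-window argument above concrete, here is a toy model.
It is only my own sketch, not PostgreSQL or kernel code: the function
name count_write_requests, the fixed-size "window" standing in for the
dirty_bytes limit, and the strided write order are all assumptions made
for illustration. It counts how many separate write requests result
when the OS can merge only those adjacent blocks that are dirty within
the same window.

```python
def count_write_requests(write_order, window):
    """Count I/O requests under a toy model of OS writeback.

    write_order: block numbers in the order the checkpointer writes them.
    window: how many dirty blocks the OS holds before flushing
            (a hypothetical stand-in for a dirty_bytes style limit).
    Adjacent block numbers inside one window merge into one request.
    """
    requests = 0
    for start in range(0, len(write_order), window):
        batch = sorted(write_order[start:start + window])
        # The first block of a batch always starts a new request.
        requests += 1
        for prev, cur in zip(batch, batch[1:]):
            # A gap in block numbers means a separate request.
            if cur != prev + 1:
                requests += 1
    return requests


# 64 dirty blocks, written in a scattered (strided) order, as might
# happen when the write order follows the buffer pool layout.
scattered = [(i * 17) % 64 for i in range(64)]
ordered = sorted(scattered)

scattered_requests = count_write_requests(scattered, window=8)
ordered_requests = count_write_requests(ordered, window=8)
```

In this particular example the scattered order never places two
adjacent blocks in the same window, so every block is its own request,
while the sorted order yields one request per window. Real workloads
will fall somewhere in between, and the model ignores concurrent reads
entirely.
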
I think Oracle also uses a similar concept to make writes efficient, and
they also hold a patent on this technology, which you can find at the
link below:
http://www.google.com/patents/US7194589?dq=645987&hl=en&sa=X&ei=kn7mUZ-PIsWqrAe99oDgBw&sqi=2&pjf=1&ved=0CEcQ6AEwAw
Although Oracle has a different concept for performing checkpoint writes,
I thought I would share the above link with you so that we do not
unknowingly go down the wrong path.
AFAIK, instead of depending on OS buffers, they use direct I/O, and in
fact in the patent above they use a temporary buffer (Claim 3) to sort
the writes, which is not the same idea, as far as I can understand from
reading the above thread.
With Regards,
Amit Kapila.