Re: Sorted writes in checkpoint

From: Gregory Maxwell
Subject: Re: Sorted writes in checkpoint
Msg-id: e692861c0706141937p47212c5y4ac6177ecd086430@mail.gmail.com
In response to: Re: Sorted writes in checkpoint  ("Simon Riggs" <simon@2ndquadrant.com>)
Responses: Re: Sorted writes in checkpoint  (Greg Smith <gsmith@gregsmith.com>)
List: pgsql-hackers

On 6/14/07, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Thu, 2007-06-14 at 16:39 +0900, ITAGAKI Takahiro wrote:
> > Greg Smith <gsmith@gregsmith.com> wrote:
> >
> > > On Mon, 11 Jun 2007, ITAGAKI Takahiro wrote:
> > > > If the kernel can treat sequential writes better than random writes, is
> > > > it worth sorting dirty buffers in block order per file at the start of
> > > > checkpoints?
> >
> > I wrote and tested the attached sorted-writes patch based on Heikki's
> > ldc-justwrites-1.patch. There was an obvious performance win on an OLTP workload.
> >
> >   tests                    | pgbench | DBT-2 response time (avg/90%/max)
> > ---------------------------+---------+-----------------------------------
> >  LDC only                  | 181 tps | 1.12 / 4.38 / 12.13 s
> >  + BM_CHECKPOINT_NEEDED(*) | 187 tps | 0.83 / 2.68 /  9.26 s
> >  + Sorted writes           | 224 tps | 0.36 / 0.80 /  8.11 s
> >
> > (*) Don't write buffers that were dirtied after starting the checkpoint.
> >
> > machine : 2GB-ram, SCSI*4 RAID-5
> > pgbench : -s400 -t40000 -c10  (about 5GB of database)
> > DBT-2   : 60WH (about 6GB of database)
>
> I'm very surprised by the BM_CHECKPOINT_NEEDED results. What percentage
> of writes has been saved by doing that? We would expect a small
> percentage of blocks only and so that shouldn't make a significant
> difference. I thought we discussed this before, about a year ago. It
> would be easy to get that wrong and to avoid writing a block that had
> been re-dirtied after the start of checkpoint, but was already dirty
> beforehand. How long was the write phase of the checkpoint, how long
> between checkpoints?
>
> I can see the sorted writes having an effect because the OS may not
> receive blocks within a sufficient time window to fully optimise them.
> That effect would grow with increasing sizes of shared_buffers and
> decrease with size of controller cache. How big was the shared buffers
> setting? What OS scheduler are you using? The effect would be greatest
> when using Deadline.

Linux has some instrumentation that might be useful for this testing:

echo 1 > /proc/sys/vm/block_dump

This makes the kernel log all physical I/O. Disable syslog writing to
disk before turning it on if you don't want the system to blow up:
otherwise every logged line causes more disk writes, which get logged
in turn.
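
To check whether the logged writes actually reach the disk in block order, the kernel log could be post-processed along the following lines. This is only a minimal Python sketch. It assumes log lines shaped like "postgres(4242): WRITE block 100 on sda1", which is how I remember block_dump output looking; the exact format may differ between kernel versions, and the function names are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed line shape: "<comm>(<pid>): WRITE block <n> on <dev>".
# Anything before or after that (timestamps, sector counts) is ignored.
LINE_RE = re.compile(
    r"(?P<comm>\S+)\((?P<pid>\d+)\): (?P<op>READ|WRITE) "
    r"block (?P<block>\d+) on (?P<dev>\S+)"
)


def parse_block_dump(lines):
    """Yield (comm, op, device, block) for each line matching the assumed format."""
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            yield m["comm"], m["op"], m["dev"], int(m["block"])


def ascending_fraction(lines, op="WRITE"):
    """Per device, the fraction of consecutive requests whose block number
    did not decrease. A value near 1.0 suggests mostly ordered I/O."""
    last = {}
    ascending = defaultdict(int)
    total = defaultdict(int)
    for _comm, this_op, dev, block in parse_block_dump(lines):
        if this_op != op:
            continue
        if dev in last:
            total[dev] += 1
            if block >= last[dev]:
                ascending[dev] += 1
        last[dev] = block
    return {dev: ascending[dev] / total[dev] for dev in total}


if __name__ == "__main__":
    import sys
    # Example use: dmesg | python this_script.py
    for dev, frac in sorted(ascending_fraction(sys.stdin).items()):
        print(f"{dev}: {frac:.2%} of writes in ascending block order")
```

Comparing this number between a run with and without the sorted-writes patch would show whether the elevator is already reordering the writes or not.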

Certainly the OS elevator should be working well enough that sorting
in the application does not yield that much of an improvement. Perhaps
frequent fsync behavior is having an unintended interaction with the
elevator?  ... It might be worthwhile to contact some Linux kernel
developers and see if there is some misunderstanding.
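
For reference, the effect under discussion can be illustrated with a toy model. The Python sketch below is not the patch and not how the elevator works internally; it only assumes that dirty buffers can be described as (file, block) pairs, and counts how many contiguous runs the write stream contains before and after sorting. The names are hypothetical.

```python
def sort_dirty_buffers(dirty):
    """Order (relfile, block) pairs by file, then by block number."""
    return sorted(dirty, key=lambda b: (b[0], b[1]))


def count_runs(writes):
    """Count maximal runs of writes that hit consecutive blocks of one file.
    Fewer runs means fewer seeks if the writes are issued in this order."""
    runs = 0
    prev = None
    for relfile, block in writes:
        if prev is None or prev[0] != relfile or block != prev[1] + 1:
            runs += 1
        prev = (relfile, block)
    return runs


# Dirty buffers in the arbitrary order a buffer-pool scan might find them.
dirty = [("t1", 7), ("t2", 3), ("t1", 5), ("t1", 6), ("t2", 4), ("t1", 20)]

print(count_runs(dirty))                      # 5 runs in scan order
print(count_runs(sort_dirty_buffers(dirty)))  # 3 runs after sorting
```

If the elevator sees all of these requests together it can do the same merging itself, so a large gain from sorting in the application suggests that something, such as the fsync pattern, keeps the requests from sitting in the queue at the same time.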

