Re: Improvement of checkpoint IO scheduler for stable transaction responses

From james
Subject Re: Improvement of checkpoint IO scheduler for stable transaction responses
Msg-id 51E31812.1030908@mansionfamily.plus.com
In response to Re: Improvement of checkpoint IO scheduler for stable transaction responses  (Greg Smith <greg@2ndQuadrant.com>)
Responses Re: Improvement of checkpoint IO scheduler for stable transaction responses  (Greg Smith <greg@2ndQuadrant.com>)
List pgsql-hackers
On 14/07/2013 20:13, Greg Smith wrote:
> The most efficient way to write things out is to delay those writes as 
> long as possible.

That doesn't smell right to me.  It might be that delaying allows more 
combining and allows the kernel to see more at once and optimise it, but 
I think the counter-argument is that it is an efficiency loss to have 
either CPU or disk idle waiting on the other.  It cannot make sense from 
a throughput point of view to have disks doing nothing and then become 
overloaded so they are a bottleneck (primarily seeking) and the CPU does 
nothing.

Now I have NOT measured behaviour but I'd observe that we see disks that 
can stream 100MB/s but do only 5% of that if they are doing random IO.  
Some random seeks during sync can't be helped, but if they are done when 
we aren't waiting for sync completion then they are in effect free.  The 
flip side is that we can't really know whether they will get merged with 
adjacent writes later so it's hard to schedule them early.  But we can 
observe that if we have a bunch of writes to adjacent data then a seek 
to do the write is effectively amortised across them.
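
To put rough numbers on that amortisation, here is a minimal sketch.  It 
is a toy model and not a measurement: it assumes the only cost of random 
IO is one fixed seek per run of adjacent blocks, and it takes the 5% 
figure above as the single-block case.  The function name and the run 
lengths are made up for illustration.

```python
# Toy model (an assumption, not measured): one fixed seek per run of
# adjacent blocks, then the blocks in the run transfer at streaming speed.
RANDOM_FRACTION = 0.05  # single-block random IO achieves 5% of streaming


def streaming_fraction(run_length, random_fraction=RANDOM_FRACTION):
    """Fraction of streaming throughput achieved when each seek is
    followed by `run_length` adjacent block writes."""
    # Seek cost expressed in units of one block's transfer time.
    # For a run of 1 the result must equal random_fraction:
    #   1 / (1 + seek_cost) = random_fraction
    seek_cost = 1.0 / random_fraction - 1.0
    return run_length / (run_length + seek_cost)


if __name__ == "__main__":
    for n in (1, 4, 19, 64, 256):
        print(f"run of {n:3d} blocks: {streaming_fraction(n):.0%} of streaming rate")
```

Under those assumptions a seek costs as much as 19 block transfers, so a 
run of 19 adjacent blocks already gets to half the streaming rate.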

So it occurs to me that perhaps we can watch for patterns where we have 
groups of adjacent writes that might stream, and when they form we might 
schedule them to be pushed out early (if not immediately), ideally out 
as far as the drive (but not flushed from its cache) and without forcing 
all other data to be flushed too.  And perhaps we should always look to 
keep drives dedicated to the DBMS doing something, even if the work 
turns out to have been redundant in the end.
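
The "watch for groups of adjacent writes" idea might look something like 
the sketch below.  This is only an illustration: the function names, the 
block numbers and the min_run threshold are hypothetical, and it only 
finds the candidate runs.  Actually pushing a run out early (on Linux 
perhaps with something like sync_file_range() and 
SYNC_FILE_RANGE_WRITE) is not shown.

```python
def find_runs(dirty_blocks):
    """Group dirty block numbers into maximal runs of consecutive
    blocks, returned as (start_block, length) tuples in block order."""
    runs = []
    for block in sorted(set(dirty_blocks)):
        if runs and block == runs[-1][0] + runs[-1][1]:
            # Block directly follows the last run: extend that run.
            start, length = runs[-1]
            runs[-1] = (start, length + 1)
        else:
            # Gap before this block: start a new run.
            runs.append((block, 1))
    return runs


def early_write_candidates(dirty_blocks, min_run=4):
    """Runs long enough that one seek is amortised over several
    writes.  min_run is a made-up threshold for illustration."""
    return [run for run in find_runs(dirty_blocks) if run[1] >= min_run]


if __name__ == "__main__":
    dirty = [7, 3, 4, 5, 6, 20, 40, 41]
    print(find_runs(dirty))
    print(early_write_candidates(dirty))
```

With the sample list above, only the run starting at block 3 is long 
enough to be a candidate; the isolated block and the short pair would be 
left for the normal sync.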

That's not necessarily easy on Linux without using direct unbuffered 
IO, but to me that is Linux's problem.  For a start it's not the only 
target system, and feedback about what 'we need' from db and mail system 
groups to the NT kernel devs hasn't hurt, and it never hurt Solaris to 
hear what Oracle and Sybase devs felt they needed either.



