Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance

From: Gregory Smith
Subject: Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance
Date:
Msg-id: 52D99161.60305@gmail.com
In reply to: Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance  (Mel Gorman <mgorman@suse.de>)
Responses: Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance  (Mel Gorman <mgorman@suse.de>)
           Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance  (Jim Nasby <jim@nasby.net>)
List: pgsql-hackers
On 1/17/14 10:37 AM, Mel Gorman wrote:
> There is not an easy way to tell. To be 100%, it would require an 
> instrumentation patch or a systemtap script to detect when a 
> particular page is being written back and track the context. There are 
> approximations though. Monitor nr_dirty pages over time.

I have a benchmarking wrapper for the pgbench testing program called 
pgbench-tools:  https://github.com/gregs1104/pgbench-tools  As of 
October, on Linux it now plots the "Dirty" value from /proc/meminfo over 
time.  You get that on the same time axis as the transaction latency 
data.  The report at the end includes things like the maximum amount of 
dirty memory observed during the test sampling. That doesn't tell you 
exactly what's happening to the level someone reworking the kernel logic 
might want, but you can easily see things like the database's checkpoint 
cycle reflected by watching the dirty memory total.  This works really 
well for monitoring production servers too.  I have a lot of data from a 
plugin for the Munin monitoring system that plots the same way.  Once 
you have some history about what's normal, it's easy to see when systems 
fall behind in a way that's ruining writes, and the high water mark 
often correlates with bad responsiveness periods.
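
For anyone who wants to eyeball the same signal without either of those 
tools, a minimal sketch like the following is enough; the one-second 
interval and output format are just my choices here, not anything 
pgbench-tools or the Munin plugin actually do:

    # Log the kernel's Dirty counter (in kB) once a second with an epoch
    # timestamp, so it can be lined up against transaction latency on the
    # same time axis.
    while true; do
        echo "$(date +%s) $(awk '/^Dirty:/ {print $2}' /proc/meminfo)"
        sleep 1
    done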

Another recent change is that pgbench for the upcoming PostgreSQL 9.4 
now allows you to specify a target transaction rate.  Seeing the write 
latency behavior with that in place is far more interesting than 
anything we were able to watch with pgbench before.  The pgbench write 
tests we've been doing for years mainly told you the throughput rate 
when all of the caches were always as full as the database could make 
them, and tuning for that is not very useful.  It turns out to be far more 
interesting to run at 50% of what the storage is capable of, and then watch 
what happens to latency when you adjust things like the dirty_* parameters.
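
As a rough sketch of what such a run looks like (the client count, rate, 
and duration are made-up numbers, and the database is assumed to already 
be initialized with pgbench -i):

    # Hold the load at a fixed 5000 transactions/second instead of flooring
    # the storage; -P prints latency every 5 seconds and -l writes a
    # per-transaction latency log that can be plotted against the dirty
    # memory samples.  $DBNAME is a placeholder for the test database.
    pgbench -c 16 -j 4 -T 900 -R 5000 -P 5 -l "$DBNAME"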

I've been working on the problem of how we can make a benchmark test 
case that acts enough like real busy PostgreSQL servers that we can 
share it with kernel developers, and then everyone has an objective way 
to measure changes.  These rate limited tests are working much better 
for that than anything I came up with before.

I am skeptical that the database will take over very much of this work 
and perform better than the Linux kernel does.  My take is that our most 
useful role would be providing test cases kernel developers can add to a 
performance regression suite.  Ugly "we never thought that would happen" 
situations seem to be at the root of many of the kernel performance 
regressions people here get nailed by.

Effective I/O scheduling is very hard, and we are unlikely to ever 
out-innovate the kernel hacking community by pulling more of that into the 
database.  It's already possible to experiment with moving in that 
direction with tuning changes.  Use a larger database shared_buffers 
value, tweak checkpoints to spread I/O out, and reduce things like 
dirty_ratio.  I do some of that, but I've learned it's dangerous to 
wander too far that way.
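
To make that concrete, the general shape of such tuning is sketched below; 
the particular values are placeholders rather than recommendations, and 
pushing them much further is exactly the sort of wandering I mean:

    # postgresql.conf side:  bigger database-managed cache, checkpoints
    # spread across most of the interval (shown as comments):
    #   shared_buffers = 8GB
    #   checkpoint_completion_target = 0.9
    # Kernel side:  shrink the amount of dirty memory the kernel will hold.
    sysctl -w vm.dirty_background_ratio=1
    sysctl -w vm.dirty_ratio=5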

If instead you let Linux do even more work--give it a lot of memory to 
manage and room to re-order I/O--that can work out quite well. For 
example, I've seen a lot of people try to keep latency down by using the 
deadline scheduler and very low settings for the expire times.  The theory 
is great, but it never works out that way in the real world for me.  
Here's the sort of deadline tuning I deploy instead now:
    echo 500     > ${DEV}/queue/iosched/read_expire
    echo 300000  > ${DEV}/queue/iosched/write_expire
    echo 1048576 > ${DEV}/queue/iosched/writes_starved
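
To be clear about what those lines assume, ${DEV} is the sysfs directory 
for the block device holding the database, and the deadline scheduler has 
to be selected before its iosched knobs exist; for example:

    # The device name here is just an example; point it at the data volume.
    DEV=/sys/block/sda
    # Switch to the deadline I/O scheduler so the iosched/ tunables apply.
    echo deadline > ${DEV}/queue/scheduler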
 

These numbers look insane compared to the defaults, but I assure you 
they're from a server that's happily chugging through 5 to 10K 
transactions/second around the clock.  PostgreSQL forces writes out with 
fsync when they must go out, but this sort of tuning is basically giving 
up on it managing writes beyond that.  We really have no idea what order 
they should go out in.  I just let the kernel have a large pile of work 
queued up, and trust that things like the kernel's block elevator and 
congestion code are smarter than the database can possibly be.

-- 
Greg Smith greg.smith@crunchydatasolutions.com
Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/


