Re: Re: significant performance hit whenever autovacuum runs after upgrading from 9.0 -> 9.1
От | Gavin Flower |
---|---|
Тема | Re: Re: significant performance hit whenever autovacuum runs after upgrading from 9.0 -> 9.1 |
Дата | |
Msg-id | 4FBD3C4C.2010409@archidevsys.co.nz обсуждение исходный текст |
Ответ на | Re: Re: significant performance hit whenever autovacuum runs after upgrading from 9.0 -> 9.1 (Lonni J Friedman <netllama@gmail.com>) |
Ответы |
Re: Re: significant performance hit whenever autovacuum
runs after upgrading from 9.0 -> 9.1
(Lonni J Friedman <netllama@gmail.com>)
|
Список | pgsql-general |
On Wed, May 23, 2012 at 9:37 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:Lonni J Friedman <netllama@gmail.com> writes:After banging my head on the wall for a long time, I happened to notice that khugepaged was consuming 100% CPU every time autovacuum was running. I did: echo "madvise" > /sys/kernel/mm/transparent_hugepage/defrag and immediately the entire problem went away.Fascinating.In hindsight, sure. Before that, it was 2 days of horror.So this looks like a nasty Fedora16 kernel bug to me, or maybe postgresql & Fedora16's default kernel settings are just not compatible?I agree, kernel bug. What kernel version are you using exactly?I'm using the stock 3.3.5-2.fc16.x86_64 kernel that is in Fedora updates.Is anyone else using Fedora16 & PostgreSQL-9.1 ?I use an F16 box daily, but can't claim to have done major performance testing with it. Can you put together a summary of your nondefault Postgres settings? I wonder whether it only kicks in for a certain size of shared memory for instance.Oh yea, I'm quite certain that this is somehow related to my setup, and not a generic problem with all F16/pgsql systems. For starters, this problem isn't happening on any of the 3 standby systems, which are all otherwise identical to the master in every respect. Also when we had done some testing (prior to the upgrades), we never ran into any of these problems. However our test environment was on smaller scale hardware, with a much smaller number of clients (and overall load). Here are the non default settings in postgresql.conf : wal_level = hot_standby archive_mode = on archive_timeout = 61 max_wal_senders = 10 wal_keep_segments = 5000 hot_standby = on log_autovacuum_min_duration = 2500 autovacuum_max_workers = 4 maintenance_work_mem = 1GB checkpoint_completion_target = 0.7 effective_cache_size = 88GB work_mem = 576MB wal_buffers = 16MB checkpoint_segments = 64 shared_buffers = 8GB max_connections = 350 Let me know if you have any other questions. I'd be happy to provide as much information as possible if it can aid in fixing this bug.
I think they will need details of things like: RAM, number/type processors, number & type
of disks, disk controllers & any other hardware specs that might be relevant etc.- at very
least: total RAM & number of spindles
Also anything else running on the box.
Plus transaction load pattern - over time and read/write ratios.
type/nature of queries
size of heavily accessed tables and their indexes
Apologies, if you have already supplied the above!
Take the above with a grain of salt, I have read about performance issues in databases,
but never had to chase issues myself. Apart from once where I reduced the query time
of one particular query down from 30+ minutes to about 10 seconds PROGRESS
4GL/Oracle db (not relevant here)!
Cheers,
Gavin
(who has the luxury of designing a very sophisticated db with very low transaction rates,
so doesn't need to worry about performance issues)
В списке pgsql-general по дате отправления: