Обсуждение: Configuration for a new server.
Scott,
I don’t know if you received my private email, but just in case you did not I am posting the infomration here.
I have a new set of servers coming in – Dual Xeon E5620’s, 96GB RAM, 18 spindles (1 RAID1 for OS – SATA, 12 disk RAID10 for data – SAS, RAID-1 for logs – SAS, 2 hot spares SAS). They are replacing a single Dual Xeon E5406 with 16GB RAM and 2x RAID1 – one for OS/Data, one for Logs.
Current server is using 3840MB of shared buffers.
It will be running FreeBSD 8.1 x64, PG 9.0.2, running streaming replication to a like server.
I have read the performance tuning book written by Greg Smith, and am using it as a guide to configure it for performance.
The main questions which I have are the following:
Is the 25% RAM for shared memory still a good number to go with for this size server?
There are approximately 50 tables which get updated with almost 100% records updated every 5 minutes – what is a good number of autovacuum processes to have on these? The current server I am replacing only has 3 of them but I think I may gain a benefit from having more.
Currently I have what I believe to be an aggressive bgwriter setting as follows:
bgwriter_delay = 200ms # 10-10000ms between rounds
bgwriter_lru_maxpages = 1000 # 0-1000 max buffers written/round
bgwriter_lru_multiplier = 10 # 0-10.0 multipler on buffers scanned/round
Does this look right?
I have the following settings:
work_mem = 64MB # min 64kB
maintenance_work_mem = 128MB # min 1MB
And, of course, some of the most critical ones – the WAL settings. Right now, in order to give the best performance to the end users due to the size of the current box, I have a very unoptimal setting in my opinion
fsync = off # turns forced synchronization on or off
#synchronous_commit = on # immediate fsync at commit
#wal_sync_method = fsync # the default is the first option
# supported by the operating system:
# open_datasync
# fdatasync
# fsync
# fsync_writethrough
# open_sync
full_page_writes = on # recover from partial page writes
wal_buffers = 16MB
#wal_buffers = 1024KB # min 32kB
# (change requires restart)
# wal_writer_delay = 100ms # 1-10000 milliseconds
#commit_delay = 0 # range 0-100000, in microseconds
#commit_siblings = 5 # range 1-1000
# - Checkpoints -
#checkpoint_segments = 128 # in logfile segments, min 1, 16MB each
checkpoint_segments = 1024
checkpoint_timeout = 60min # range 30s-1h
#checkpoint_completion_target = 0.5 # checkpoint target duration, 0.0 - 1.0
checkpoint_completion_target = 0.1
checkpoint_warning = 45min # 0 disables
These are values which I arrived to by playing with them to make sure that the end user performance did not suffer. The checkpoints are taking about 8 minutes to complete, but between checkpoints the disk i/o on the data partition is very minimal – when I had lower segments running a 15 minute timeout with a .9 completion target, the platform was fairly slow vis-à-vis the end user.
The above configuration is using PG 8.4.
Thanks in advance for any insight.
have a new set of servers coming in – Dual Xeon E5620’s, 96GB RAM, 18 spindles (1 RAID1 for OS – SATA, 12 disk RAID10 for data – SAS, RAID-1 for logs – SAS, 2 hot spares SAS).
You didn't mention the RAID controller and its cache setup. That's a critical piece to get write, err, right. Presumably you've got a battery-backed RAID cache on your SAS controller. Knowing that and what model it is (to make sure it's one of the ones that performs well) would be good info to pass along here.
Is the 25% RAM for shared memory still a good number to go with for this size server?
Several people have reported to me they see drop-offs in performance between 8GB and 10GB for that setting. I currently recommend limiting shared_buffers to 8GB until we have more data on why that is. You suggested already having checkpoint issues, too; if that's true, you don't want to dedicate too much RAM to the database for that reason, too.
There are approximately 50 tables which get updated with almost 100% records updated every 5 minutes – what is a good number of autovacuum processes to have on these? The current server I am replacing only has 3 of them but I think I may gain a benefit from having more.
Watch pg_stat_user_tables and you can figure this out for your workload. There are no generic answers in this area.
Currently I have what I believe to be an aggressive bgwriter setting as follows:
bgwriter_delay = 200ms # 10-10000ms between rounds
bgwriter_lru_maxpages = 1000 # 0-1000 max buffers written/round
bgwriter_lru_multiplier = 10 # 0-10.0 multipler on buffers scanned/round
Does this look right?
You'd probably be better off decreasing the delay rather than pushing up the other two parameters. It's easy to tell if you did it right or not; just look at pg_stat_bgwriter. If buffers_backend is high relative to the others, that means the multiplier or delay is wrong. Or if maxwritten_clean is increasing fast, that means bgwriter_lru_maxpages is too low.
These are values which I arrived to by playing with them to make sure that the end user performance did not suffer. The checkpoints are taking about 8 minutes to complete, but between checkpoints the disk i/o on the data partition is very minimal – when I had lower segments running a 15 minute timeout with a .9 completion target, the platform was fairly slow vis-à-vis the end user.
The completion target isn't the main driver here, the number of segments/timeout is. When you space checkpoints out further, the actual amount of total I/O the server does decreases, both to the WAL and to the main database. So I suspect your tweaking the target had little impact, and it's possible you might even get smoother performance if you put it back to a higher value again.
-- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
Greg,
Thank you very much for your quick response.
The servers are using Areca 1600 series controllers with battery backup and 2GB cache.
I really enjoyed your book (actually, both of the books your company published). Found them extremely helpful and they filled a lot of gaps in my still gappy knowledge J
From: Greg Smith [mailto:greg@2ndquadrant.com]
Sent: Tuesday, February 01, 2011 4:54 AM
To: Benjamin Krajmalnik
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Configuration for a new server.
Benjamin Krajmalnik wrote:
have a new set of servers coming in – Dual Xeon E5620’s, 96GB RAM, 18 spindles (1 RAID1 for OS – SATA, 12 disk RAID10 for data – SAS, RAID-1 for logs – SAS, 2 hot spares SAS).
You didn't mention the RAID controller and its cache setup. That's a critical piece to get write, err, right. Presumably you've got a battery-backed RAID cache on your SAS controller. Knowing that and what model it is (to make sure it's one of the ones that performs well) would be good info to pass along here.
Is the 25% RAM for shared memory still a good number to go with for this size server?
Several people have reported to me they see drop-offs in performance between 8GB and 10GB for that setting. I currently recommend limiting shared_buffers to 8GB until we have more data on why that is. You suggested already having checkpoint issues, too; if that's true, you don't want to dedicate too much RAM to the database for that reason, too.
There are approximately 50 tables which get updated with almost 100% records updated every 5 minutes – what is a good number of autovacuum processes to have on these? The current server I am replacing only has 3 of them but I think I may gain a benefit from having more.
Watch pg_stat_user_tables and you can figure this out for your workload. There are no generic answers in this area.
Currently I have what I believe to be an aggressive bgwriter setting as follows:
bgwriter_delay = 200ms # 10-10000ms between rounds
bgwriter_lru_maxpages = 1000 # 0-1000 max buffers written/round
bgwriter_lru_multiplier = 10 # 0-10.0 multipler on buffers scanned/round
Does this look right?
You'd probably be better off decreasing the delay rather than pushing up the other two parameters. It's easy to tell if you did it right or not; just look at pg_stat_bgwriter. If buffers_backend is high relative to the others, that means the multiplier or delay is wrong. Or if maxwritten_clean is increasing fast, that means bgwriter_lru_maxpages is too low.
These are values which I arrived to by playing with them to make sure that the end user performance did not suffer. The checkpoints are taking about 8 minutes to complete, but between checkpoints the disk i/o on the data partition is very minimal – when I had lower segments running a 15 minute timeout with a .9 completion target, the platform was fairly slow vis-à-vis the end user.
The completion target isn't the main driver here, the number of segments/timeout is. When you space checkpoints out further, the actual amount of total I/O the server does decreases, both to the WAL and to the main database. So I suspect your tweaking the target had little impact, and it's possible you might even get smoother performance if you put it back to a higher value again.
--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
There are approximately 50 tables which get updated with almost 100% records updated every 5 minutes – what is a good number of autovacuum processes to have on these? The current server I am replacing only has 3 of them but I think I may gain a benefit from having more.
Watch pg_stat_user_tables and you can figure this out for your workload. There are no generic answers in this area.
What in particular should I be looking at to help me decide?
Currently I have what I believe to be an aggressive bgwriter setting as follows:
bgwriter_delay = 200ms # 10-10000ms between rounds
bgwriter_lru_maxpages = 1000 # 0-1000 max buffers written/round
bgwriter_lru_multiplier = 10 # 0-10.0 multipler on buffers scanned/round
Does this look right?
You'd probably be better off decreasing the delay rather than pushing up the other two parameters. It's easy to tell if you did it right or not; just look at pg_stat_bgwriter. If buffers_backend is high relative to the others, that means the multiplier or delay is wrong. Or if maxwritten_clean is increasing fast, that means bgwriter_lru_maxpages is too low.
checkpoints_timed = 261
checkpoints_req = 0
buffers_checkpoint = 49058438
buffers_clean = 3562421
maxwritten_clean = 243
buffers_backend = 11774254
buffers_alloc = 42816578
There are approximately 50 tables which get updated with almost 100% records updated every 5 minutes – what is a good number of autovacuum processes to have on these? The current server I am replacing only has 3 of them but I think I may gain a benefit from having more.
Watch pg_stat_user_tables and you can figure this out for your workload. There are no generic answers in this area.
What in particular should I be looking at to help me decide?
The information reported that's related to vacuuming. If you don't have enough workers, you can watch the dead row counts pop upwards without enough "last autovacuum time" updates on enough tables to suggest it's keeping up. If you see >20% dead rows on lots of tables and they're not being processed by AV and having their timestamps, that's your sign that you don't have enough workers.
You'd probably be better off decreasing the delay rather than pushing up the other two parameters. It's easy to tell if you did it right or not; just look at pg_stat_bgwriter. If buffers_backend is high relative to the others, that means the multiplier or delay is wrong. Or if maxwritten_clean is increasing fast, that means bgwriter_lru_maxpages is too low.
checkpoints_timed = 261checkpoints_req = 0
buffers_checkpoint = 49,058,438
buffers_clean = 3,562,421
maxwritten_clean = 243
buffers_backend = 11,774,254
buffers_alloc = 42,816,578
See how buffers_backend is much larger than buffers_clean, even though maxwritten_clean is low? That means the background writer isn't running often enough to keep up with cleaning things, even though it does a lot of work when it does kick in. In your situation I'd normally do a first pass by cutting bgwriter_lru_maxpages to 1/4 of what it is now, cut bgwriter_delay to 1/4 as well (to 50ms), and then see how the proportions change. You can probably cut the multiplier, too, yet still see more pages written by the cleaner.
I recommend saving a snapsot of this data with a timestamp, i.e.:
select now(),* from pg_stat_bgwriter;
Anytime you make a change to one of the background writer or checkpoint timing parameters. That way you have a new baseline to compare against. These numbers aren't very useful with a single value, but once you get two of them with timestamps you can compute all sorts of fun statistics from the pair.
-- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
>See how buffers_backend is much larger than buffers_clean, even though maxwritten_clean is low? That means the backgroundwriter isn't running often enough to keep up with cleaning things, even though >it does a lot of work when it doeskick in. In your situation I'd normally do a first pass by cutting bgwriter_lru_maxpages to 1/4 of what it is now, cutbgwriter_delay to 1/4 as well (to 50ms), and >then see how the proportions change. You can probably cut the multiplier,too, yet still see more pages written by the cleaner. >I recommend saving a snapsot of this data with a timestamp, i.e.: >select now(),* from pg_stat_bgwriter; >Anytime you make a change to one of the background writer or checkpoint timing parameters. That way you have a new baselineto compare against. These numbers aren't very useful with a single value, >but once you get two of them with timestampsyou can compute all sorts of fun statistics from the pair. So, if I understand correctly, I should strive for a relative increase in buffers_clean to buffers_backend
Benjamin Krajmalnik wrote: > So, if I understand correctly, I should strive for a relative increase in buffers_clean to buffers_backend > Right. Buffers written by a backend are the least efficient way to get data out of the buffer cache, because the client running into that is stuck waiting for a write call before it can use the resuling free block. You want to avoid those in favor of checkpoint and background-writer cleaner writes instead. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books