Thread: Configuration for a new server.


Configuration for a new server.

From:
"Benjamin Krajmalnik"
Date:

Scott,

I don’t know if you received my private email, but just in case you did not, I am posting the information here.

 

I have a new set of servers coming in – Dual Xeon E5620’s, 96GB RAM, 18 spindles (1 RAID1 for OS – SATA, 12 disk RAID10 for data – SAS, RAID-1 for logs – SAS, 2 hot spares SAS).  They are replacing a single Dual Xeon E5406 with 16GB RAM and 2x RAID1 – one for OS/Data, one for Logs.

Current server is using 3840MB of shared buffers.

 

It will be running FreeBSD 8.1 x64 and PG 9.0.2, with streaming replication to an identical standby server.

I have read the performance tuning book written by Greg Smith and am using it as a guide to configure the new servers for performance.

The main questions which I have are the following:

 

Is the 25% RAM for shared memory still a good number to go with for this size server?

There are approximately 50 tables in which almost 100% of the records are updated every 5 minutes – what is a good number of autovacuum processes to have for these?  The current server I am replacing only has 3 of them, but I think I may gain a benefit from having more.

Currently I have what I believe to be an aggressive bgwriter setting as follows:

 

bgwriter_delay = 200ms                  # 10-10000ms between rounds

bgwriter_lru_maxpages = 1000            # 0-1000 max buffers written/round    

bgwriter_lru_multiplier = 10            # 0-10.0 multiplier on buffers scanned/round

 

Does this look right?

 

I have the following settings:

work_mem = 64MB                         # min 64kB

maintenance_work_mem = 128MB            # min 1MB

 

And, of course, some of the most critical ones – the WAL settings.  Right now, in order to give the best performance to the end users given the limited size of the current box, I have what I consider a very suboptimal setting:

 

fsync = off                             # turns forced synchronization on or off

#synchronous_commit = on                # immediate fsync at commit

#wal_sync_method = fsync                # the default is the first option

                                        # supported by the operating system:

                                        #   open_datasync

                                        #   fdatasync

                                        #   fsync

                                        #   fsync_writethrough

                                        #   open_sync

full_page_writes = on                   # recover from partial page writes

wal_buffers = 16MB

#wal_buffers = 1024KB                   # min 32kB

                                        # (change requires restart)

# wal_writer_delay = 100ms              # 1-10000 milliseconds

                                       

#commit_delay = 0                       # range 0-100000, in microseconds

#commit_siblings = 5                    # range 1-1000

 

# - Checkpoints -

 

#checkpoint_segments = 128              # in logfile segments, min 1, 16MB each

checkpoint_segments = 1024

checkpoint_timeout = 60min              # range 30s-1h

#checkpoint_completion_target = 0.5     # checkpoint target duration, 0.0 - 1.0

checkpoint_completion_target = 0.1

checkpoint_warning = 45min              # 0 disables

 

These are values I arrived at by experimenting to make sure that end-user performance did not suffer.  The checkpoints take about 8 minutes to complete, but between checkpoints the disk I/O on the data partition is very minimal – when I ran fewer segments with a 15-minute timeout and a 0.9 completion target, the platform was fairly slow vis-à-vis the end users.

 

The above configuration is from the current server, which is running PG 8.4.

 

Thanks in advance for any insight.

Re: Configuration for a new server.

From:
Greg Smith
Date:
Benjamin Krajmalnik wrote:

I have a new set of servers coming in – Dual Xeon E5620’s, 96GB RAM, 18 spindles (1 RAID1 for OS – SATA, 12 disk RAID10 for data – SAS, RAID-1 for logs – SAS, 2 hot spares SAS).


You didn't mention the RAID controller and its cache setup.  That's a critical piece to get write, err, right.  Presumably you've got a battery-backed RAID cache on your SAS controller.  Knowing that and what model it is (to make sure it's one of the ones that performs well) would be good info to pass along here.
 

Is the 25% RAM for shared memory still a good number to go with for this size server?


Several people have reported to me that they see drop-offs in performance between 8GB and 10GB for that setting.  I currently recommend limiting shared_buffers to 8GB until we have more data on why that is.  You also suggested you're already having checkpoint issues; if that's true, that's one more reason not to dedicate too much RAM to the database.
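
In postgresql.conf terms, that cap would look something like this on the 96GB box described above (just a sketch of the one value, not a full config):

shared_buffers = 8GB                    # capped per the advice above; 25% of 96GB would have been 24GB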

There are approximately 50 tables in which almost 100% of the records are updated every 5 minutes – what is a good number of autovacuum processes to have for these?  The current server I am replacing only has 3 of them, but I think I may gain a benefit from having more.


Watch pg_stat_user_tables and you can figure this out for your workload.  There are no generic answers in this area.

Currently I have what I believe to be an aggressive bgwriter setting as follows:

 

bgwriter_delay = 200ms                  # 10-10000ms between rounds

bgwriter_lru_maxpages = 1000            # 0-1000 max buffers written/round    

bgwriter_lru_multiplier = 10            # 0-10.0 multiplier on buffers scanned/round

 

Does this look right?


You'd probably be better off decreasing the delay rather than pushing up the other two parameters.  It's easy to tell if you did it right or not; just look at pg_stat_bgwriter.  If buffers_backend is high relative to the others, that means the multiplier or delay is wrong.  Or if maxwritten_clean is increasing fast, that means bgwriter_lru_maxpages is too low.
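
For reference, all of those counters come from that one view and can be pulled with something like the following (standard pg_stat_bgwriter columns in 8.4/9.0):

select checkpoints_timed, checkpoints_req,
       buffers_checkpoint, buffers_clean, maxwritten_clean,
       buffers_backend, buffers_alloc
from pg_stat_bgwriter;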


These are values I arrived at by experimenting to make sure that end-user performance did not suffer.  The checkpoints take about 8 minutes to complete, but between checkpoints the disk I/O on the data partition is very minimal – when I ran fewer segments with a 15-minute timeout and a 0.9 completion target, the platform was fairly slow vis-à-vis the end users.


The completion target isn't the main driver here; the number of segments and the timeout are.  When you space checkpoints out further, the total amount of I/O the server does decreases, both to the WAL and to the main database.  So I suspect your tweaking of the target had little impact, and it's possible you might even get smoother performance if you put it back to a higher value again.
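
If you do raise it again, it would look something like this against the settings already posted (keeping the existing segments and timeout; 0.9 is just the value used earlier, not a magic number):

checkpoint_segments = 1024
checkpoint_timeout = 60min
checkpoint_completion_target = 0.9      # back to a higher value, per the suggestion above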

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

Re: Configuration for a new server.

From:
"Benjamin Krajmalnik"
Date:

Greg,

 

Thank you very much for your quick response.

The servers are using Areca 1600 series controllers with battery backup and 2GB cache.

I really enjoyed your book (actually, both of the books your company published).  I found them extremely helpful; they filled a lot of gaps in my still-gappy knowledge. :)

Re: Configuration for a new server.

From:
"Benjamin Krajmalnik"
Date:

There are approximately 50 tables in which almost 100% of the records are updated every 5 minutes – what is a good number of autovacuum processes to have for these?  The current server I am replacing only has 3 of them, but I think I may gain a benefit from having more.


Watch pg_stat_user_tables and you can figure this out for your workload.  There are no generic answers in this area.

What in particular should I be looking at to help me decide?

 

 

Currently I have what I believe to be an aggressive bgwriter setting as follows:

 

bgwriter_delay = 200ms                  # 10-10000ms between rounds

bgwriter_lru_maxpages = 1000            # 0-1000 max buffers written/round    

bgwriter_lru_multiplier = 10            # 0-10.0 multiplier on buffers scanned/round

 

Does this look right?


You'd probably be better off decreasing the delay rather than pushing up the other two parameters.  It's easy to tell if you did it right or not; just look at pg_stat_bgwriter.  If buffers_backend is high relative to the others, that means the multiplier or delay is wrong.  Or if maxwritten_clean is increasing fast, that means bgwriter_lru_maxpages is too low.

checkpoints_timed = 261

checkpoints_req = 0

buffers_checkpoint = 49058438

buffers_clean = 3562421

maxwritten_clean = 243

buffers_backend = 11774254

buffers_alloc = 42816578

Re: Configuration for a new server.

From:
Greg Smith
Date:
Benjamin Krajmalnik wrote:

There are approximately 50 tables in which almost 100% of the records are updated every 5 minutes – what is a good number of autovacuum processes to have for these?  The current server I am replacing only has 3 of them, but I think I may gain a benefit from having more.


Watch pg_stat_user_tables and you can figure this out for your workload.  There are no generic answers in this area.

What in particular should I be looking at to help me decide?



The information reported there that relates to vacuuming.  If you don't have enough workers, you'll see dead row counts climbing while the "last autovacuum time" on those tables isn't being updated often enough to suggest autovacuum is keeping up.  If you see >20% dead rows on lots of tables and they're not being processed by AV (their timestamps aren't advancing), that's your sign that you don't have enough workers.
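
Something like the following convenience query against pg_stat_user_tables (adjust to taste) will show the tables with the most dead rows and when autovacuum last touched them:

select relname,
       n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 1) as pct_dead,
       last_autovacuum
from pg_stat_user_tables
order by n_dead_tup desc
limit 20;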


You'd probably be better off decreasing the delay rather than pushing up the other two parameters.  It's easy to tell if you did it right or not; just look at pg_stat_bgwriter.  If buffers_backend is high relative to the others, that means the multiplier or delay is wrong.  Or if maxwritten_clean is increasing fast, that means bgwriter_lru_maxpages is too low.

checkpoints_timed = 261

checkpoints_req = 0

buffers_checkpoint = 49,058,438

buffers_clean = 3,562,421

maxwritten_clean = 243

buffers_backend = 11,774,254

buffers_alloc = 42,816,578


See how buffers_backend is much larger than buffers_clean, even though maxwritten_clean is low?  That means the background writer isn't running often enough to keep up with cleaning things, even though it does a lot of work when it does kick in.  In your situation I'd normally do a first pass by cutting bgwriter_lru_maxpages to 1/4 of what it is now and bgwriter_delay to 1/4 as well (to 50ms), then seeing how the proportions change.  You can probably cut the multiplier, too, yet still see more pages written by the cleaner.
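
As a rough sketch against the current values (the multiplier value here is only an example, since the advice above just says it can probably be cut):

bgwriter_delay = 50ms                   # 1/4 of the current 200ms
bgwriter_lru_maxpages = 250             # 1/4 of the current 1000
bgwriter_lru_multiplier = 4.0           # example only; pick a lower value than the current 10 and re-check the stats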

I recommend saving a snapshot of this data with a timestamp, i.e.:

select now(),* from pg_stat_bgwriter;

any time you make a change to one of the background writer or checkpoint timing parameters.  That way you have a new baseline to compare against.  These numbers aren't very useful as a single sample, but once you have two of them with timestamps you can compute all sorts of fun statistics from the pair.
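
One possible way to keep those snapshots around, using a hypothetical bgwriter_log table (just a sketch; any logging approach works):

-- one-time setup: empty table with the same columns plus a timestamp
create table bgwriter_log as
  select now() as snapped_at, * from pg_stat_bgwriter limit 0;

-- run this each time a bgwriter or checkpoint setting is changed
insert into bgwriter_log select now(), * from pg_stat_bgwriter;

-- delta between the two most recent snapshots (counters are cumulative)
select max(snapped_at) - min(snapped_at)         as elapsed,
       max(buffers_clean)   - min(buffers_clean)   as cleaner_writes,
       max(buffers_backend) - min(buffers_backend) as backend_writes
from (select * from bgwriter_log order by snapped_at desc limit 2) s;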


-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

Re: Configuration for a new server.

From:
"Benjamin Krajmalnik"
Date:

> See how buffers_backend is much larger than buffers_clean, even though maxwritten_clean is low?  That means the background writer isn't running often enough to keep up with cleaning things, even though it does a lot of work when it does kick in.  In your situation I'd normally do a first pass by cutting bgwriter_lru_maxpages to 1/4 of what it is now and bgwriter_delay to 1/4 as well (to 50ms), then seeing how the proportions change.  You can probably cut the multiplier, too, yet still see more pages written by the cleaner.

> I recommend saving a snapshot of this data with a timestamp, i.e.:

> select now(),* from pg_stat_bgwriter;

> any time you make a change to one of the background writer or checkpoint timing parameters.  That way you have a new baseline to compare against.  These numbers aren't very useful as a single sample, but once you have two of them with timestamps you can compute all sorts of fun statistics from the pair.

So, if I understand correctly, I should strive for an increase in buffers_clean relative to buffers_backend?



Re: Configuration for a new server.

From:
Greg Smith
Date:
Benjamin Krajmalnik wrote:
> So, if I understand correctly, I should strive for an increase in buffers_clean relative to buffers_backend?
>

Right.  Buffers written by a backend are the least efficient way to get data out of the buffer cache, because the client running into that is stuck waiting for a write call to finish before it can use the resulting free block.  You want to avoid those in favor of checkpoint and background-writer cleaner writes instead.
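
As a convenience (just a shortcut, not required), the proportion described above can be watched with a single query:

select round(100.0 * buffers_backend
             / nullif(buffers_checkpoint + buffers_clean + buffers_backend, 0), 1)
       as pct_backend_writes
from pg_stat_bgwriter;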

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books