Discussion: Now I am back; next thing: final PG tuning.
Ok,
I have left the previous thread. After changing the last permissions, even though it said Access Denied, PostgreSQL suddenly started to work again. I will not dig any further into the strangeness.
I copied the content of the .conf from the tuning wizard and restarted. Still working!
I want to say thanks to several people on that thread :) Thank you!
I would like to further tune the tuning wizard's recommendations, though. I think it put itself on the lower scale.
I have 8GB memory and an Intel quad core at 2.4GHz with 8MB L2 cache. I am running Windows Web Server 2008 x64 and will be running a Java (64-bit) application.
I want to give the Java app room to work with 2-3GB. The operating system is currently consuming around 1GB, but let's give it a little more room; let's give it a total of 2GB.
That leaves 8-3-2 = 3 GB for PostgreSQL alone. Sounds good?
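For orientation, that budget might translate into postgresql.conf lines roughly like the following. These numbers are hypothetical placeholder starting points to benchmark, not recommendations from this thread:

```ini
# Hypothetical starting points for ~3GB earmarked for PostgreSQL on
# Windows; every value here should be benchmarked before trusting it.
shared_buffers = 512MB          # Windows tends to prefer modest values here
effective_cache_size = 2GB      # an estimate of OS caching, not an allocation
work_mem = 4MB                  # per sort/hash operation, per session
maintenance_work_mem = 128MB    # used only by VACUUM, CREATE INDEX, etc.
```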
Here is my config file :
http://85.235.31.35/resources/postgresql.conf
I see there is a setting
max_connections = 100
What does this do? Should I be looking at this as max simultaneous queries? Is 100 really enough? I think I want to raise this.
I am looking at a worst-case scenario of around 50-100 simultaneous user clicks (per second?). But the querying might be around 200 queries per second; not really, but I want to be prepared. :)
I would appreciate it if we could have a discussion on these topics: on what's important and what's not.
Here are some other settings I am thinking about:
effective_cache_size = 449697
is this kilobytes? Is this a good value?
maintenance_work_mem = 16384
work_mem = 1024 # I think this is kB. Way too low, right? What is a better value?
shared_buffers = 1024 # min 128kB or max_connections*16kB ## Also too low, right? I've got 3GB to work with!
wal_buffers = 256 # Also kB...
Please give your thoughts. I was also wondering about VACUUM, forced REINDEX and such. Are those things good to run once in a while? Force sounds a little brutal, though!
Something else I should consider?
/ Jennifer
In response to Jennifer Trey <jennifer.trey@gmail.com>:
> I have 8GB memory, Intel Quad Core 2.4Ghz with 8MB L2 cache. I am running
> Windows Web Server 2008 x64 and will be running a Java (64 bit version)
> application.
>
> I want to give the java app room for working on 2-3GB. The operating system
> is currently consuming around 1GB but lets give it a little more room. Lets
> give it a total of 2GB.
>
> That leaves 8-3-2 = 3 GB for PostgreSQL alone. Sounds good?
>
> Here is my config file :
>
> http://85.235.31.35/resources/postgresql.conf
>
> I see there is a setting
>
> *max_connections = 100*
>
> What does this do? Should I be looking at this as max similtaneous queries ?
> is 100 really enough? I think I want to max this more.
>
> I am looking for a worst scenario around like 50-100 similitaneous user
> clicks (per second?). But the querying might be around like 200 queries per
> seocond, not really, but I want to be prepared. :)
Depends on how long your "clicks" take to process. If you're doing 100
page views (clicks) per second and each view takes 2 seconds to process,
you're tying up 200 connections on a continual basis.
Unless you're using some sort of connection pooling ... I'm no Java expert,
but doesn't Java have connection pooling built in? If so, it becomes
more difficult to estimate the # of simultaneous connections because each
instance of a running script might share a connection with other scripts.
In that case, you'll probably have to test to see what a good max is, as
it's going to be difficult or impossible to estimate.
In any event, 100 is probably a good starting point (based on my
experience). Note that if you find that you have to raise that value too
high, (much over a few hundred) then you probably want to investigate some
form of connection pooling, such as pgpool.
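As a sanity check on whether 100 is enough in practice, the server can report its own connection count; pg_stat_activity is a standard system view, so something like this (run from psql or pgAdmin) shows usage at any moment:

```sql
-- how many connections are open right now, vs. the configured ceiling
SELECT count(*) AS connections_in_use FROM pg_stat_activity;
SHOW max_connections;
```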
> Here is some other settings I am thinking about :
>
> *effective_cache_size = 449697*
What version of Postgres? In modern versions, you can specify MB, GB, etc.
This value should be the memory that's left unused when everything is
running (including Postgres). It helps the planner estimate how much of
the filesystem is cached in memory. Based on the other numbers you've
mentioned, this should probably be set to about 2G.
> *maintenance_work_mem = 16384*
>
> *work_mem = 1024 # I think this is kb. Way to low, right? What is a better
> value?*
I haven't noticed much value in tweaking this. It only affects a few
commands, such as vacuum and analyze. Test to see if tweaking it speeds
up vacuum without pushing the system into swap.
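That test doesn't require touching the config file; maintenance_work_mem can be raised for a single session (the table name here is hypothetical):

```sql
SET maintenance_work_mem = '256MB';  -- session-local; reverts on disconnect
VACUUM ANALYZE my_big_table;         -- time this against the 16MB default
```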
> *shared_buffers = 1024 # min 128kB or max_connections*16kB ## Also to low.
> Right? I've got 3GB to work with!*
Assuming that's equating to 1G, then the value is about right. Common
best practice is to set this value to 1/4 - 1/3 of the memory available
for PostgreSQL. You're saying you'll have ~3G for PG, so 1G is about
right to start with.
Once the system is up and running, you can install pg_buffercache to
monitor usage and help tune it.
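pg_buffercache ships as a contrib module (in 8.3 you install it by running its bundled SQL script against the database). Once installed, a typical usage sketch looks something like:

```sql
-- which relations in the current database occupy the most shared buffers
SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = c.relfilenode
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;
```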
> *wal_buffers = 256 # Also kB...*
>
> Please give your thoughts. I was also wondering about the Vacuum, force
> reindex and stuff. Are those things good to run once in a while? Force
> sounds a little brutal though!
Turn on autovacuum. I've found it's the best way to go in 99% of installs
(the corner cases being servers that have _very_ predictable workloads ...
in which case explicit, scheduled vacuums are better).
REINDEXing is an occasional topic of discussion. Doing it occasionally
definitely saves disk space on frequently updated databases, but the
impact (if any) on performance is a subject for debate. I've yet to see
any drastic performance improvement from REINDEXing, but if you've got
obvious off-peak times (i.e., if nobody uses the system over weekends or
something) it probably doesn't hurt to reindex everything on a regular
schedule. Don't obsess over it, though.
--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
In response to Jennifer Trey <jennifer.trey@gmail.com>:
> *maintenance_work_mem = 16384*
If your vacuums and/or CREATE INDEX runs are taking ages, a higher value here may be useful. I would need to know more about the database before suggesting one, though. My gut feeling is that this is a good starting place.
> *work_mem = 1024 # I think this is kb. Way to low, right? What is a better
> value?*
If your application makes 100 connections to the database and your work_mem = 1GB, and you are running big nasty ORDER BYs, you could be swapping 100 GB. This is a pretty extreme example, but I think it's important.
As a rule of thumb, I like to start with 1-2 MB and almost always leave it there. If you're doing joins and ORDER BYs on many, many gigs later on, then it could be an issue.
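One way to get the best of both under that rule of thumb (small global value, occasional big sort): work_mem can be raised per session, so only the expensive query gets the extra memory:

```sql
SET work_mem = '64MB';  -- affects only this session's sorts and hashes
-- ... run the one big ORDER BY / join here ...
RESET work_mem;         -- back to the conservative server-wide default
```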
> *shared_buffers = 1024 # min 128kB or max_connections*16kB ## Also to low.
> Right? I've got 3GB to work with!*
Assuming that's equating to 1G, then the value is about right. Common
best practice is to set this value to 1/4 - 1/3 of the memory available
for PostgreSQL. You're saying you'll have ~3G for PG, so 1G is about
right to start with.
The idea here is to be conservative with shared_buffers and then use effective_cache_size to tell the optimizer how much ram the OS can use for buffering data. 1 GB is a good start place.
Once the system is up and running, you can install pg_buffercache to
monitor usage and help tune it.
Good advice
> *wal_buffers = 256 # Also kB...*
>
> Please give your thoughts. I was also wondering about the Vacuum, force
> reindex and stuff. Are those things good to run once in a while? Force
> sounds a little brutal though!
Turn on autovacuum. I've found it's the best way to go in 99% of installs
(the corner cases being servers that have _very_ predictable workloads ...
in which case explicit, scheduled vacuums are better).
+ 1
REINDEXing is an occasional topic of discussion. Doing it occasionally
definitely saves disk space on frequently updated databases, but the
impact (if any) on performance is a subject for debate. I've yet to see
any drastic performance improvement from REINDEXing, but if you've got
obvious off-peak times (i.e., if nobody uses the system over weekends or
something) it probably doesn't hurt to reindex everything on a regular
schedule. Don't obsess over it, though.
Just remember that the REINDEX command is a locking command, so using 'create index concurrently' is recommended.
You can also use the pg_stat_all_indexes view to look at index scans vs. tuples being read; this can sometimes hint at index 'bloat'. I would also recommend pg_stattuple, which has a pgstatindex function for looking at index fragmentation.
--Scott
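For reference, the two checks Scott describes might look like this (the index name in the second query is a placeholder; pgstatindex comes from the pgstattuple contrib module):

```sql
-- index scans vs. tuples read; unused or bloated indexes stand out here
SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_read
FROM pg_stat_all_indexes
ORDER BY idx_tup_read DESC;

-- fragmentation details for one index (name is hypothetical)
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('public.my_index');
```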
Bill, Jennifer,
"documenting" that for the wiki is still on my backlog; so, here:
> *shared_buffers = 1024 # min 128kB or max_connections*16kB ## Also to low.
> Right? I've got 3GB to work with!*
Assuming that's equating to 1G, then the value is about right. Common
best practice is to set this value to 1/4 - 1/3 of the memory available
for PostgreSQL. You're saying you'll have ~3G for PG, so 1G is about
right to start with.
shared_buffers of PostgreSQL on Windows != shared_buffers of PostgreSQL on Unix
My experience is that raising shared_buffers on Windows above minimum + ~20% does not help performance; it's more effective to leave that memory to Windows for caching (at least up to Server 2003).
Harald
--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
no fx, no carrier pigeon
-
LASIK good, steroids bad?
On Wed, 8 Apr 2009, Massa, Harald Armin wrote:
> "documenting" that for the wiki is still on my backlog; so, here:
> shared_buffers of PostgreSQL on Windows != shared_buffers of PostgreSQL on Unix
There are already comments about that in the shared_buffers section of
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
effective_cache_size = 2048MB # Old value 439MB --> Even older : 128MB
#Is this too high?
maintenance_work_mem = 96MB # Old: 16MB. Would 64MB be better? Updates,
and therefore re-indexing of tuples, happen quite frequently.
work_mem = 3MB
# Old was 1MB!? That is too low.
# Scott, you mentioned an example with 1 GB. I guess this is the working
memory per query, used for sorts, joins and so on. I will be doing those
things quite often.
# After all, if I understand the concept correctly, it will only use
it if it needs to; otherwise performance will take a hit.
# Scott, you say that I might need to change this later on when I have
several gigs of data. But will it hurt when I don't?
# I think 4-8MB should be enough and relatively safe to start with. I
am scared of going higher. But 1MB is low.
shared_buffers = 1024MB # Kept it
random_page_cost = 3 # I have pretty fast disks.
wal_buffers = 1024KB
Scott, you mentioned :
You can also use the pg_stat_all_indexes table to look at index scans
vs. tuples being read, this can sometimes hint at index 'bloat'. I
would also recommend pg_stattuple which has a pg_statindex function
for looking at index fragmentation.
From where can I see these stats? Is there any graphical tool?
Thanks all / Jennifer
max_connections = 150 # A compromise :)
> Scott, you mentioned :
>> You can also use the pg_stat_all_indexes table to look at index scans
>> vs. tuples being read, this can sometimes hint at index 'bloat'. I
>> would also recommend pg_stattuple which has a pg_statindex function
>> for looking at index fragmentation.
> From where can I see these stats ? Is there any graphic tool?
From pgAdmin, you could:
select * from pg_stat_all_indexes;
You will see this system view in pgAdmin by:
database +
Catalogs +
PostgreSQL (pg_catalog) +
Views +
You should be able to see the structure there.
--Scott
I think I might have misunderstood effective_cache_size. It's measured in 8kB blocks, so the old number, 449697, equals about 3.5 GB, which is quite a lot. Should I lower this? I had planned to use 2.75GB max. Can I put 2.75GB there? Should I leave it?
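That block math can be double-checked from psql itself; pg_size_pretty is built in:

```sql
-- 449697 blocks x 8192 bytes/block, shown human-readably (about 3.4 GB)
SELECT pg_size_pretty(449697 * 8192::bigint);
```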
Also, Greg: since I use Java, prepared statements are quite natural. And I read this part of the guide, which I understand you are part of:
http://www.postgresql.org/docs/current/static/runtime-config-resource.html#GUC-MAX-PREPARED-TRANSACTIONS
Should I change this value? Not sure... :S
I'm worried about the locks... what's your thought on this? Should I just leave it alone?
Sincerely / Jennifer
On Wed, Apr 8, 2009 at 12:24 PM, Jennifer Trey <jennifer.trey@gmail.com> wrote:
> I think I might have misunderstood the effective cache size. Its measured in
> 8kB blocks. So the old number 449697 equals 3.5 GB, which is quite much.
> Should I lower this? I had plans to use 2.75GB max. Can I put 2.75GB there?
> Should I leave it?
The effective cache size setting is merely letting postgres know how
much caching it can expect the OS to be doing. If you know that the OS
isn't going to have more than 2.75 GB available for caching DB files,
then by all means reduce it. The setting by itself doesn't affect
postgres memory usage at all, though.
--
- David T. Wilson
david.t.wilson@gmail.com
Well, no, I don't know that. But in a worst-case scenario, where everything is using its max, there won't be 3.5 GB left for the OS. For the OS + Postgres combined, there will be 2.5 + 2.75, though. It seems there is no great danger in the effective cache, but a good setting would be nice :) Is the effective cache only the one for the OS, not for them combined?
Sincerely / Jen
Jennifer Trey wrote:
> Scott, thank you.
>
> I think I might have misunderstood the effective cache size. Its
> measured in 8kB blocks. So the old number 449697 equals 3.5 GB, which
> is quite much. Should I lower this? I had plans to use 2.75GB max. Can
> I put 2.75GB there? Should I leave it?
effective_cache_size is an estimate of how much disk data the OS is
likely to have cached in memory. Postgres uses this to guess whether or
not recently read data is likely to be 'fast' (in the system cache) or
'slow' (on the physical disk, hence requiring disk IO to read). This
value is used in some fairly abstract heuristics; it does NOT need to be
that accurate, it's just a ballpark estimate.
You should run your system under your expected workload, then view the
actual working cache size in Task Manager ("System Cache" on the
Performance tab of the task manager in XP, I dunno about 2008 Server).
Now, some of that cache probably belongs to processes other than
postgres, so round down a bit. On my desktop system at the moment, I'm
showing 1.3GB.
On Wed, Apr 8, 2009 at 8:01 AM, Jennifer Trey <jennifer.trey@gmail.com> wrote:
> I would like to further tune the tuning wizards recommendations though. I
> think it put itself on the lower scale.
OK, instead of blindly guessing at better values, and making a lot of
concurrent changes, you need to set up some kind of simple yet realistic
benchmark for your database. It doesn't have to be perfect, but it should
realistically reflect the number of clients you'll have connecting at
once and the types of queries they're likely to run. Write a simple
pgbench script and use it to test your changes.
> I have 8GB memory, Intel Quad Core 2.4Ghz with 8MB L2 cache. I am running
> Windows Web Server 2008 x64 and will be running a Java (64 bit version)
> application.
Note that the very first thing you could do to performance tune your
server would be to run it on something other than Windows. This is not
to bash Windows; it's a simple fact of PostgreSQL's architecture not
being a great match for Windows under heavy load.
> I want to give the java app room for working on 2-3GB. The operating system
> is currently consuming around 1GB but lets give it a little more room. Lets
> give it a total of 2GB.
>
> That leaves 8-3-2 = 3 GB for PostgreSQL alone. Sounds good?
Note that PostgreSQL relies on the OS caching as much as its own, and
this tends to be even more true in Windows environments.
> Here is my config file :
>
> max_connections = 100
>
> What does this do? Should I be looking at this as max similtaneous queries ?
> is 100 really enough? I think I want to max this more.
max_connections is the maximum number of clients that can connect at the
same time. Each connection uses up a few megs of memory and can start a
query independent of other connections. Generally a hundred or so is a
reasonable place to start, but capacity planning will tell you how many
you really need. If you find yourself going over 100 by much, start
looking at connection pooling.
> I am looking for a worst scenario around like 50-100 similitaneous user
> clicks (per second?). But the querying might be around like 200 queries per
> seocond, not really, but I want to be prepared. :)
OK, the real issue here will be whether or not you have persistent
pooled connections. Creating PostgreSQL connections is expensive,
especially so on Windows. Definitely look at pooling.
> I would appreciate if could have a discussion on these topics. On whats
> important and whats not.
>
> Here is some other settings I am thinking about :
>
> effective_cache_size = 449697
>
> is this kilo bytes ? Is this a good value?
In 8.3 you can put the actual unit after, so 400MB would be a nicer way
to put that number in. effective_cache_size just tells the planner about
how much cache there is in the OS and PostgreSQL. It's not a very
fine-grained control, so just guesstimate it at, say, 3000MB for now.
> maintenance_work_mem = 16384
Again, it's a good idea to put units after in 8.3. Since
maintenance_work_mem is only used by things like vacuum, you can set it
pretty high (256MB is common) and not worry too much.
> work_mem = 1024 # I think this is kb. Way to low, right? What is a better
> value?
Again, throw a unit on the end; the default is kB. 1MB is fine for now.
Again, implement some kind of benchmark, and increase it when it
provably makes a difference for most of your queries. If there's a lone
query that can use a lot more, then set work_mem higher in that session,
or for a special user, so it's not higher for everybody. work_mem is PER
SORT-type op / PER SESSION. So, if you have 100 users doing 2 sorts
each, you can theoretically use up 100 x 2 x work_mem memory. A machine
that's running fine one moment can collapse under load as the number of
processes increases and memory gets allocated out of control. That said,
on my servers, with 100 to 200 connections, it's set to 8 meg. That
machine has 32 gig of RAM, so 800 to 1600 meg of RAM theoretically
getting used won't cause some kind of swap storm.
> shared_buffers = 1024 # min 128kB or max_connections*16kB ## Also to low.
> Right? I've got 3GB to work with!
But you're on Windows, and the shared_buffers implementation there
doesn't scale as well as it does on Linux or other flavors of Unix. So,
while setting it a bit higher is good, don't set it any higher than it
needs to be to hold the current working set of all queries, which is
usually in the hundreds of megabytes, not the gigabyte range. Again,
benchmark and test, but a good starting point is likely in the 128MB to
512MB range for Windows.
> Please give your thoughts. I was also wondering about the Vacuum, force
> reindex and stuff. Are those things good to run once in a while? Force
> sounds a little brutal though!
It's much better to monitor your db for such things and vacuum full /
reindex only when / if needed, and do what you can to head those things
off.
> Something else I should consider?
If your dataset can fit in memory, consider lowering random_page_cost to
something in the range of, but above, 1.0.
On Wed, 8 Apr 2009, Jennifer Trey wrote:
> shared_buffer = 1024MB # Kept it
As mentioned a couple of times here, this is a really large setting for
Windows. Something like 256MB would work better, and you might even find
some people making a case for 64MB or less on Windows. I don't really
know for sure myself.
> Is the effective cache only the one for the OS ? not for them combined ?
It is sizing the combination of the shared_buffers *plus* what you
expect in the OS buffer cache. I normally look at the size of the OS
buffer cache before the PostgreSQL server is started as a rough estimate
here.
> Since I use Java, prepared statements are quite natural.
Prepared statements are not prepared transactions. It's unlikely you've
got any code that uses PREPARE TRANSACTION, so you shouldn't need to
increase max_prepared_transactions.
All three of the above are not really clear in the tuning guide on the
wiki; I'll do an update to improve those sections when I get a minute.
--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD