Discussion: High load,
Hi all,

we are running a fairly big Ruby on Rails application on Postgres 8.4. Our traffic grew quite a bit lately, and since then we have been facing DB performance issues. System load occasionally explodes (around 170 yesterday on a 16-core system), which seems to be caused by disk I/O (iowait in our Munin graphs goes up significantly during these periods). At other times the load stays rather low under pretty much the same circumstances.

There are 6 application servers with 18 unicorns each, as well as 12 beanstalk workers talking to the DB. I know the problem description is very vague, but so far we haven't consistently managed to reproduce the problem. Turning off the beanstalk workers usually leads to a great decrease in writes and system load, but during yesterday's debugging session they obviously ran fine (thanks, Murphy).

Below you'll find our system information and Postgres config; maybe someone could be so kind as to point out any obvious flaws in our current configuration while I'm trying to get a better description of the underlying problem.
Postgres version: 8.4.6

Number of logical CPUs: 16 (4x Quadcore Xeon E5520 @ 2.27GHz)

RAM: 16GB

             total       used       free     shared    buffers     cached
Mem:      16461012   16399520      61492          0      72392   12546112
-/+ buffers/cache:    3781016   12679996
Swap:       999992     195336     804656

HDD: 2x 120 GB OCZ Vertex 2 SSD; RAID 1

Concurrent connections (according to our monitoring tool): 7 (min), 74 (avg), 197 (max)

Our config (all other settings at default value):

max_connections = 200
ssl = true
shared_buffers = 4096MB
work_mem = 256MB
maintenance_work_mem = 512MB
synchronous_commit = off
wal_buffers = 8MB
checkpoint_segments = 30
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
random_page_cost = 2.0
effective_cache_size = 8192MB
logging_collector = on
log_directory = '/var/log/postgresql'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_min_duration_statement = 1000
log_connections = on
log_disconnections = on
log_line_prefix = '%t '
datestyle = 'iso, mdy'
gin_fuzzy_search_limit = 10000

The config options are a mix of the article "Configuring PostgreSQL for Pretty Good Performance" [1] and the talk "PostgreSQL as a secret weapon for high-performance Ruby on Rails applications" [2].

Thanks,
Michael

[1] http://www.linux.com/learn/tutorials/394523-configuring-postgresql-for-pretty-good-performance
[2] http://www.pgcon.org/2010/schedule/events/210.en.html
2011/1/27 Michael Kohl <michael.kohl@tupalo.com>:
> Hi all,
>
> we are running a fairly big Ruby on Rails application on Postgres 8.4.
> Our traffic grew quite a bit lately, and since then we are facing DB
> performance issues. System load occasionally explodes (around 170
> yesterday on a 16 core system), which seems to be caused by disk I/O
> (iowait in our Munin graphs goes up significantly during these
> periods). At other times the load stays rather low under pretty much
> the same circumstances.
>
> There are 6 application servers with 18 unicorns each, as well as 12
> beanstalk workers talking to the DB. I know the problem description is
> very vague, but so far we haven't consistently managed to reproduce
> the problem. Turning off the beanstalk workers usually leads to a great
> decrease in writes and system load, but during yesterday's debugging
> session they obviously ran fine (thanks, Murphy).
>
> Below you'll find our system information and Postgres config, maybe
> someone could be so kind as to point out any obvious flaws in our
> current configuration while I'm trying to get a better description of
> the underlying problem.
>
> Postgres version: 8.4.6
>
> Number of logical CPUs: 16 (4x Quadcore Xeon E5520 @ 2.27GHz)
>
> RAM: 16GB
>
>              total       used       free     shared    buffers     cached
> Mem:      16461012   16399520      61492          0      72392   12546112
> -/+ buffers/cache:    3781016   12679996
> Swap:       999992     195336     804656

You have swap used; is there IO on the swap partition? Can you paste
/proc/meminfo? Also turn on log_checkpoints if it is not already, and check
the duration of writing the data out. You didn't say the DB size (and the
size of the active part of it); it would help here.

> HDD: 2x 120 GB OCZ Vertex 2 SSD; RAID 1
>
> Concurrent connections (according to our monitoring tool): 7 (min), 74
> (avg), 197 (max)
>
> Our config (all other settings at default value):
>
> max_connections = 200
> ssl = true
> shared_buffers = 4096MB
> work_mem = 256MB

It is too much with 200 connections.
You may hit cases where you try to use more than the memory available. See
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server --> work_mem,
maintenance_work_mem

> maintenance_work_mem = 512MB

128MB is usually enough.

> synchronous_commit = off
> wal_buffers = 8MB

16MB should work well.

> checkpoint_segments = 30
> checkpoint_timeout = 15min
> checkpoint_completion_target = 0.9
> random_page_cost = 2.0
> effective_cache_size = 8192MB

12-14GB looks better.

> logging_collector = on
> log_directory = '/var/log/postgresql'
> log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
> log_min_duration_statement = 1000
> log_connections = on
> log_disconnections = on
> log_line_prefix = '%t '
> datestyle = 'iso, mdy'
> gin_fuzzy_search_limit = 10000

Do you use full-text search?

> The config options are a mix of the article "Configuring PostgreSQL
> for Pretty Good Performance" [1] and the talk "PostgreSQL as a secret
> weapon for high-performance Ruby on Rails applications" [2].

Do you monitor the locks? And the commits/rollbacks?

> Thanks,
> Michael
>
> [1] http://www.linux.com/learn/tutorials/394523-configuring-postgresql-for-pretty-good-performance
> [2] http://www.pgcon.org/2010/schedule/events/210.en.html
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance

--
Cédric Villemain
2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
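As a rough sketch of the lock and commit/rollback monitoring Cédric suggests, the statistics views that ship with 8.4 can be polled periodically (the database name here is a placeholder):

```sql
-- commit/rollback counters since the last stats reset:
SELECT datname, xact_commit, xact_rollback
FROM pg_stat_database
WHERE datname = 'yourdb';  -- placeholder name

-- sessions currently waiting on a lock:
SELECT locktype, relation::regclass, mode, pid
FROM pg_locks
WHERE NOT granted;
```

Graphing the deltas of xact_commit/xact_rollback over time (e.g. from Munin) shows whether load spikes line up with bursts of rollbacks or lock waits.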
Cédric, thanks a lot for your answer so far!

On Thu, Jan 27, 2011 at 12:24 PM, Cédric Villemain
<cedric.villemain.debian@gmail.com> wrote:

> you have swap used, IO on the swap partition ?

Memory-wise we are fine.

> can you paste the /proc/meminfo ?

Sure:

# cat /proc/meminfo
MemTotal: 16461012 kB
MemFree: 280440 kB
Buffers: 60984 kB
Cached: 13757080 kB
SwapCached: 6112 kB
Active: 7049744 kB
Inactive: 7716308 kB
Active(anon): 2743696 kB
Inactive(anon): 2498056 kB
Active(file): 4306048 kB
Inactive(file): 5218252 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 999992 kB
SwapFree: 989496 kB
Dirty: 3500 kB
Writeback: 0 kB
AnonPages: 943752 kB
Mapped: 4114916 kB
Shmem: 4293312 kB
Slab: 247036 kB
SReclaimable: 212788 kB
SUnreclaim: 34248 kB
KernelStack: 3144 kB
PageTables: 832768 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 9230496 kB
Committed_AS: 5651528 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 51060 kB
VmallocChunk: 34350787468 kB
HardwareCorrupted: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 7936 kB
DirectMap2M: 16760832 kB

> Also turn on log_checkpoints if it is not already and check the
> duration to write the data.

Will do, thanks!

> You didn't say the DB size (and size of active part of it), it would help here.

=> select pg_size_pretty(pg_database_size('xxx'));
 pg_size_pretty
----------------
 32 GB
(1 row)

> it is too much with 200 connections. you may hit cases where you
> try to use more than the memory available.

So far memory never really was a problem, but I'll keep these
suggestions in mind.

> 16MB should work well

We already thought of increasing that, will do so now.

>> effective_cache_size = 8192MB
>
> 12-14GB looks better

Thank you, I was rather unsure on this one.

> you use full_text_search ?

Not anymore, probably a leftover.

> do you monitor the 'locks' ? and the commit/rollbacks ?

No, but I'll look into doing that.
Thanks a lot for the feedback again,
Michael
2011/1/27 Michael Kohl <michael.kohl@tupalo.com>:
> Cédric, thanks a lot for your answer so far!
>
> On Thu, Jan 27, 2011 at 12:24 PM, Cédric Villemain
> <cedric.villemain.debian@gmail.com> wrote:
>
>> you have swap used, IO on the swap partition ?
>
> Memory-wise we are fine.
>
>> can you paste the /proc/meminfo ?
>
> Sure:
>
> # cat /proc/meminfo
> MemTotal: 16461012 kB
> MemFree: 280440 kB
> Buffers: 60984 kB
> Cached: 13757080 kB
> SwapCached: 6112 kB
> Active: 7049744 kB
> Inactive: 7716308 kB
> Active(anon): 2743696 kB
> Inactive(anon): 2498056 kB
> Active(file): 4306048 kB
> Inactive(file): 5218252 kB
> Unevictable: 0 kB
> Mlocked: 0 kB
> SwapTotal: 999992 kB
> SwapFree: 989496 kB
> Dirty: 3500 kB
> Writeback: 0 kB
> AnonPages: 943752 kB
> Mapped: 4114916 kB
> Shmem: 4293312 kB
> Slab: 247036 kB
> SReclaimable: 212788 kB
> SUnreclaim: 34248 kB
> KernelStack: 3144 kB
> PageTables: 832768 kB
> NFS_Unstable: 0 kB
> Bounce: 0 kB
> WritebackTmp: 0 kB
> CommitLimit: 9230496 kB

The CommitLimit looks too low; that is because your swap partition is small.
You need to either enlarge the swap partition or change vm.overcommit_ratio
if you want to be able to use more of your memory sanely. (See
kernel/Documentation/filesystems/proc.txt for the explanation of the
formula: CommitLimit = ('vm.overcommit_ratio' * Physical RAM) + Swap.)

> Committed_AS: 5651528 kB

This is way under CommitLimit, so you are good. (It is rare to be limited by
that anyway, and your perf issues are not related to that.)

> VmallocTotal: 34359738367 kB
> VmallocUsed: 51060 kB
> VmallocChunk: 34350787468 kB
> HardwareCorrupted: 0 kB
> HugePages_Total: 0
> HugePages_Free: 0
> HugePages_Rsvd: 0
> HugePages_Surp: 0
> Hugepagesize: 2048 kB
> DirectMap4k: 7936 kB
> DirectMap2M: 16760832 kB
>
>> Also turn on log_checkpoints if it is not already and check the
>> duration to write the data.
>
> Will do, thanks!
>
>> You didn't say the DB size (and size of active part of it), it would help here.
> => select pg_size_pretty(pg_database_size('xxx'));
>  pg_size_pretty
> ----------------
>  32 GB
> (1 row)
>
>> it is too much with 200 connections. you may hit cases where you
>> try to use more than the memory available.
>
> So far memory never really was a problem, but I'll keep these
> suggestions in mind.
>
>> 16MB should work well
>
> We already thought of increasing that, will do so now.
>
>>> effective_cache_size = 8192MB
>>
>> 12-14GB looks better
>
> Thank you, I was rather unsure on this one.
>
>> you use full_text_search ?
>
> Not anymore, probably a leftover.
>
>> do you monitor the 'locks' ? and the commit/rollbacks ?
>
> No, but I'll look into doing that.

It may help to find what the issue is.

--
Cédric Villemain
2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
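The proc.txt formula Cédric quotes checks out against the numbers above, with vm.overcommit_ratio at its default of 50:

```
CommitLimit = (vm.overcommit_ratio / 100) * MemTotal + SwapTotal
            = 0.50 * 16461012 kB + 999992 kB
            = 9230498 kB
```

which matches the reported CommitLimit of 9230496 kB up to page-size rounding.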
On Thursday, January 27, 2011 12:24:10 PM Cédric Villemain wrote:
> > maintenance_work_mem = 512MB
> 128MB is usually enough

Uhm, I don't want to be picky, but that's not really my experience. Sorts for
index creation are highly dependent on a high m_w_m. Quite regularly I find the
existing 1GB limit a problem here...

Andres
On Thu, Jan 27, 2011 at 1:30 PM, Justin Pitts <justinpitts@gmail.com> wrote:
> That is a foot-gun waiting to go off.

Thanks, I had already changed this after Cédric's mail.

>> HDD: 2x 120 GB OCZ Vertex 2 SSD; RAID 1
>> random_page_cost = 2.0
> I thought these drives were a lot better at random IO than this gives
> them credit for.

I'll look into that.

Thanks a lot,
Michael
2011/1/27 Andres Freund <andres@anarazel.de>:
> On Thursday, January 27, 2011 12:24:10 PM Cédric Villemain wrote:
>> > maintenance_work_mem = 512MB
>> 128MB is usually enough
> Uhm, I don't want to be picky, but that's not really my experience. Sorts for
> index creation are highly dependent on a high m_w_m. Quite regularly I find
> the existing 1GB limit a problem here...

That is right for index creation, but not for 'pure' maintenance stuff. Once
the database is running as usual, there is really no point in giving
auto-vacuum or auto-analyze much more (it depends on the RAID card memory
too...).

--
Cédric Villemain
2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
On Thursday, January 27, 2011 02:23:48 PM Cédric Villemain wrote:
> 2011/1/27 Andres Freund <andres@anarazel.de>:
> > On Thursday, January 27, 2011 12:24:10 PM Cédric Villemain wrote:
> >> > maintenance_work_mem = 512MB
> >>
> >> 128MB is usually enough
> >
> > Uhm, I don't want to be picky, but that's not really my experience. Sorts
> > for index creation are highly dependent on a high m_w_m. Quite regularly
> > I find the existing 1GB limit a problem here...
>
> That is right for index creation, but not for 'pure' maintenance
> stuff. Once the database is running as usual, there is really no point
> in giving auto-vacuum or auto-analyze much more (depends on the RAID
> card memory too ...)

Even that I cannot agree with, sorry ;-). If you have a database with much
churn, a high m_w_m helps to avoid multiple scans of the database during
vacuum when the amount of dead tuples doesn't fit in m_w_m.

Andres
> Number of logical CPUs: 16 (4x Quadcore Xeon E5520 @ 2.27GHz)
> RAM: 16GB
> Concurrent connections (according to our monitoring tool): 7 (min), 74
> (avg), 197 (max)

Your current issue may be IO wait, but a connection pool isn't far off in
your future either.

> max_connections = 200
> work_mem = 256MB

That is a foot-gun waiting to go off. If 32 queries manage to simultaneously
each need 256MB to sort, your cache is blown out and the server is out of
RAM. If your application is like most, you need a huge work_mem for, maybe,
1% of your queries. You can request it high on a per-connection/per-query
basis for the queries that need it, and set the default to a low, safe
figure.

> HDD: 2x 120 GB OCZ Vertex 2 SSD; RAID 1
> random_page_cost = 2.0

I thought these drives were a lot better at random IO than this gives them
credit for. They are certainly no better at sequential IO than (good)
conventional drives. You might have a lot of room to turn this down even
smaller.
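Justin's per-connection/per-query suggestion can be sketched like this (the query and the sizes are illustrative, not from the thread):

```sql
-- postgresql.conf: work_mem = 16MB   -- low, safe global default
BEGIN;
SET LOCAL work_mem = '256MB';  -- applies only inside this transaction
SELECT user_id, count(*)       -- hypothetical big sort/aggregate
FROM events
GROUP BY user_id
ORDER BY count(*) DESC;
COMMIT;
```

SET LOCAL reverts automatically at COMMIT/ROLLBACK, so the large setting cannot leak into the rest of the connection's lifetime.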
On 1/27/2011 4:31 AM, Michael Kohl wrote:
> Hi all,
>
> we are running a fairly big Ruby on Rails application on Postgres 8.4.
> Our traffic grew quite a bit lately, and since then we are facing DB
> performance issues. System load occasionally explodes (around 170
> yesterday on a 16 core system), which seems to be caused by disk I/O
> (iowait in our Munin graphs goes up significantly during these
> periods). At other times the load stays rather low under pretty much
> the same circumstances.
>
> There are 6 application servers with 18 unicorns each, as well as 12
> beanstalk workers talking to the DB. I know the problem description is
> very vague, but so far we haven't consistently managed to reproduce
> the problem. Turning off the beanstalk workers usually leads to a great
> decrease in writes and system load, but during yesterday's debugging
> session they obviously ran fine (thanks, Murphy).
>
> Below you'll find our system information and Postgres config, maybe
> someone could be so kind as to point out any obvious flaws in our
> current configuration while I'm trying to get a better description of
> the underlying problem.
> <SNIP>

If the suggestions below are not enough, you might have to check some of
your SQL statements and make sure they are all behaving. You may not notice
a table scan when the user count is low, but you will when it gets higher.

Have you run each of your queries through explain analyze lately?

Have you checked for bloat? You are vacuuming/autovacuuming, correct?

-Andy
On Thu, Jan 27, 2011 at 4:06 PM, Andy Colson <andy@squeakycode.net> wrote:
> Have you run each of your queries through explain analyze lately?

A code review including checking of queries is on our agenda.

> You are vacuuming/autovacuuming, correct?

Sure :-)

Thank you,
Michael
On Thu, Jan 27, 2011 at 8:09 AM, Michael Kohl <michael.kohl@tupalo.com> wrote:
> On Thu, Jan 27, 2011 at 4:06 PM, Andy Colson <andy@squeakycode.net> wrote:
>> Have you run each of your queries through explain analyze lately?
>
> A code review including checking of queries is on our agenda.

A good method to start is to log long-running queries and then explain
analyze just them.
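Scott's workflow can be sketched as follows; log_min_duration_statement = 1000 is already in Michael's config, and the query below is a made-up example:

```sql
-- statements slower than 1000ms already land in the log thanks to
-- log_min_duration_statement = 1000; re-run one of them with:
EXPLAIN ANALYZE
SELECT * FROM venues WHERE city = 'Vienna' ORDER BY updated_at DESC LIMIT 20;
-- watch for Seq Scan nodes on big tables, and for row estimates
-- that are far off from the actual row counts
```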
On 1/27/2011 9:09 AM, Michael Kohl wrote:
> On Thu, Jan 27, 2011 at 4:06 PM, Andy Colson <andy@squeakycode.net> wrote:
>> Have you run each of your queries through explain analyze lately?
>
> A code review including checking of queries is on our agenda.
>
>> You are vacuuming/autovacuuming, correct?
>
> Sure :-)
>
> Thank you,
> Michael

Oh, also, when the box is really busy, have you watched vmstat to see if you
start swapping?

-Andy
* Michael Kohl (michael.kohl@tupalo.com) wrote:
> HDD: 2x 120 GB OCZ Vertex 2 SSD; RAID 1

I'm amazed no one else has mentioned this yet, but you should look into
splitting your data and your WALs. Obviously, having another set of SSDs to
put your WALs on would be ideal.

You should probably also be looking into adjustments to the background
writer. It sounds like you're getting hit by large checkpoint I/O (if you
turn on logging of that, as someone else suggested, you'll be able to
correlate the times), which can be helped by increasing the amount of
writing done between checkpoints, so that the checkpoints aren't as big and
painful. That can be done by making the background writer more aggressive.

Thanks,
Stephen
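A hedged sketch of what "more aggressive" could look like in postgresql.conf; these are starting points to test under load, not recommendations, though all of the parameters exist in 8.4:

```
log_checkpoints = on                 # correlate checkpoints with I/O spikes
checkpoint_segments = 64             # checkpoint less often (keeps more WAL)
checkpoint_completion_target = 0.9   # spread checkpoint writes out
bgwriter_delay = 100ms               # wake the background writer more often
bgwriter_lru_maxpages = 500          # let it clean more pages per round
```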
On Thu, Jan 27, 2011 at 10:54 AM, Stephen Frost <sfrost@snowman.net> wrote:
> * Michael Kohl (michael.kohl@tupalo.com) wrote:
>> HDD: 2x 120 GB OCZ Vertex 2 SSD; RAID 1
>
> I'm amazed no one else has mentioned this yet, but you should look into
> splitting your data and your WALs. Obviously, having another set of
> SSDs to put your WALs on would be ideal.

Actually, spinning media would be a better choice. A pair of fast 15krpm
drives in a mirror will almost always outrun an SSD for sequential write
speed. Even meh-grade 7200RPM SATA drives will win.

> You should probably also be looking into adjustments to the background
> writer. It sounds like you're getting hit by large checkpoint i/o
> (if you turn on logging of that, as someone else suggested, you'll be
> able to corrollate the times), which can be helped by increasing the
> amount of writing done between checkpoints, so that the checkpoints
> aren't as big and painful. That can be done by making the background
> writer more aggressive.

This++. Increasing checkpoint segments can make a huge difference. We run 64
segments in production and it's a world of difference from the stock
setting.
On Thu, Jan 27, 2011 at 10:20 AM, Andy Colson <andy@squeakycode.net> wrote:
> On 1/27/2011 9:09 AM, Michael Kohl wrote:
>> On Thu, Jan 27, 2011 at 4:06 PM, Andy Colson <andy@squeakycode.net> wrote:
>>> Have you run each of your queries through explain analyze lately?
>>
>> A code review including checking of queries is on our agenda.
>>
>>> You are vacuuming/autovacuuming, correct?
>>
>> Sure :-)
>>
>> Thank you,
>> Michael
>
> Oh, also, when the box is really busy, have you watched vmstat to see if
> you start swapping?

Setting up the sysstat service so you can see what your disks were doing
over the last 7 days is useful too. It makes it much easier to figure things
out afterwards when you have a history of what has been happening.
On Thursday, January 27, 2011 07:13:17 PM Scott Marlowe wrote:
> On Thu, Jan 27, 2011 at 10:54 AM, Stephen Frost <sfrost@snowman.net> wrote:
> > * Michael Kohl (michael.kohl@tupalo.com) wrote:
> >> HDD: 2x 120 GB OCZ Vertex 2 SSD; RAID 1
> >
> > I'm amazed no one else has mentioned this yet, but you should look into
> > splitting your data and your WALs. Obviously, having another set of
> > SSDs to put your WALs on would be ideal.
>
> Actually spinning media would be a better choice. A pair of fast
> 15krpm drives in a mirror will almost always outrun an SSD for
> sequential write speed. Even meh-grade 7200RPM SATA drives will win.

Unless he is bulk loading or running with synchronous_commit=off, sequential
speed won't be the limit for WAL. The number of syncs will be the limit.

Andres
Another piece of advice is to look at the presentation by Alexander Dymo at
RailsConf 2009 called "Advanced Performance Optimization of Rails
Applications", available at
http://en.oreilly.com/rails2009/public/schedule/detail/8615

This talk is focused on Rails and PostgreSQL, based on the development of
Acunote's project management platform: http://blog.pluron.com

----- Original message -----
From: "Andy Colson" <andy@squeakycode.net>
To: "Michael Kohl" <michael.kohl@tupalo.com>
CC: pgsql-performance@postgresql.org
Sent: Thursday, January 27, 2011 12:20:18 GMT -05:00 US/Canada Eastern
Subject: Re: [PERFORM] High load,

On 1/27/2011 9:09 AM, Michael Kohl wrote:
> On Thu, Jan 27, 2011 at 4:06 PM, Andy Colson <andy@squeakycode.net> wrote:
>> Have you run each of your queries through explain analyze lately?
>
> A code review including checking of queries is on our agenda.
>
>> You are vacuuming/autovacuuming, correct?
>
> Sure :-)
>
> Thank you,
> Michael

Oh, also, when the box is really busy, have you watched vmstat to see if you
start swapping?

-Andy

--
Ing. Marcos Luís Ortíz Valmaseda
System Engineer -- Database Administrator
Centro de Tecnologías de Gestión de Datos (DATEC)
Universidad de las Ciencias Informáticas
http://postgresql.uci.cu
On Thu, Jan 27, 2011 at 6:05 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> A good method to start is to log long running queries and then explain
> analyze just them.

We are already doing the logging part, we are just a bit behind on the
"explain analyze" part of things. One day soon...

Thanks,
Michael
Michael Kohl wrote:
> We are already doing the logging part, we are just a bit behind on the
> "explain analyze" part of things. One day soon...

There is, of course, the auto_explain module which will do that for you.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com
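On 8.4, auto_explain can be tried out per session without touching postgresql.conf; a sketch (the '1s' threshold is arbitrary, and the contrib module must be installed):

```sql
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '1s';
-- from here on, every query in this session that runs longer than 1s
-- gets its execution plan written to the server log
```

For server-wide use it would instead be added to shared_preload_libraries, so that every connection is covered.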
On 27/01/2011 11:31, Michael Kohl wrote:
> Hi all,
>
> we are running a fairly big Ruby on Rails application on Postgres 8.4.
> Our traffic grew quite a bit lately, and since then we are facing DB
> performance issues. System load occasionally explodes (around 170
> yesterday on a 16 core system), which seems to be caused by disk I/O
> (iowait in our Munin graphs goes up significantly during these
> periods). At other times the load stays rather low under pretty much
> the same circumstances.

Is there any way you can moderate the total number of active connections to
the database to approximately match the number of (logical) CPU cores on
your system, i.e. some kind of connection pool or connection limiting? This
should help you in more ways than one (limiting PG lock contention, limiting
parallel disk IO).
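One common way to impose such a limit in front of a Rails stack is a transaction-level pooler such as pgbouncer; a minimal illustrative config (names and numbers are placeholders, not from the thread):

```
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
pool_mode = transaction
default_pool_size = 16   ; roughly one backend per core
max_client_conn = 400    ; the unicorns and workers connect here instead
```

The 108 unicorns plus 12 workers then share 16 actual backend connections, instead of up to 200 competing for the disks at once.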
On Thu, Jan 27, 2011 at 5:31 AM, Michael Kohl <michael.kohl@tupalo.com> wrote: > we are running a fairly big Ruby on Rails application on Postgres 8.4. > Our traffic grew quite a bit lately, and since then we are facing DB > performance issues. System load occasionally explodes (around 170 > yesterday on a 16 core system), which seems to be caused by disk I/O > (iowait in our Munin graphs goes up significantly during these > periods). At other times the laod stays rather low under pretty much > the same circumstances. [...] > [1] http://www.linux.com/learn/tutorials/394523-configuring-postgresql-for-pretty-good-performance > [2] http://www.pgcon.org/2010/schedule/events/210.en.html At the risk of shameless self-promotion, you might also find this helpful: http://rhaas.blogspot.com/2010/12/troubleshooting-database.html It's fairly basic but it might at least get you pointed in the right direction... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Michael Kohl wrote:
> HDD: 2x 120 GB OCZ Vertex 2 SSD; RAID 1

As a general warning here: as far as I know, the regular Vertex 2 SSD
doesn't cache writes properly for database use. It's possible to have a
crash that leaves the database corrupted if the drive has writes queued up
in its cache. The Vertex 2 Pro resolves this issue with a supercap; you may
have a model with concerns here. See
http://wiki.postgresql.org/wiki/Reliable_Writes for more information.

In addition to the log_checkpoints suggestion already made, I'd also
recommend turning on log_lock_waits and log_temp_files on your server. All
three of those--checkpoints, locks, and unexpected temp file use--can cause
the sort of issue you're seeing. Well, not locks so much given you're seeing
heavy disk I/O, but it's good to start logging those issues before they get
bad, too.

--
Greg Smith
2ndQuadrant US
greg@2ndQuadrant.com
Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support
www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
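Greg's logging suggestions amount to three lines in postgresql.conf (log_temp_files takes a size threshold in kB; 0 logs every temp file):

```
log_checkpoints = on
log_lock_waits = on    # logs waits longer than deadlock_timeout (1s default)
log_temp_files = 0     # log every temporary file together with its size
```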
On Thu, Jan 27, 2011 at 6:36 AM, Michael Kohl <michael.kohl@tupalo.com> wrote:
> Cédric, thanks a lot for your answer so far!
>
> On Thu, Jan 27, 2011 at 12:24 PM, Cédric Villemain
> <cedric.villemain.debian@gmail.com> wrote:
>
>> you have swap used, IO on the swap partition ?
>
> Memory-wise we are fine.
>
>> can you paste the /proc/meminfo ?
>
> Sure:
>
> # cat /proc/meminfo
> MemTotal: 16461012 kB
> MemFree: 280440 kB
> Buffers: 60984 kB
> Cached: 13757080 kB
> SwapCached: 6112 kB
> Active: 7049744 kB
> Inactive: 7716308 kB
> Active(anon): 2743696 kB
> Inactive(anon): 2498056 kB
> Active(file): 4306048 kB
> Inactive(file): 5218252 kB
> Unevictable: 0 kB
> Mlocked: 0 kB
> SwapTotal: 999992 kB
> SwapFree: 989496 kB
> Dirty: 3500 kB
> Writeback: 0 kB
> AnonPages: 943752 kB
> Mapped: 4114916 kB
> Shmem: 4293312 kB
> Slab: 247036 kB
> SReclaimable: 212788 kB
> SUnreclaim: 34248 kB
> KernelStack: 3144 kB
> PageTables: 832768 kB
> NFS_Unstable: 0 kB
> Bounce: 0 kB
> WritebackTmp: 0 kB
> CommitLimit: 9230496 kB
> Committed_AS: 5651528 kB
> VmallocTotal: 34359738367 kB
> VmallocUsed: 51060 kB
> VmallocChunk: 34350787468 kB
> HardwareCorrupted: 0 kB
> HugePages_Total: 0
> HugePages_Free: 0
> HugePages_Rsvd: 0
> HugePages_Surp: 0
> Hugepagesize: 2048 kB
> DirectMap4k: 7936 kB
> DirectMap2M: 16760832 kB
>
>> Also turn on log_checkpoints if it is not already and check the
>> duration to write the data.
>
> Will do, thanks!
>
>> You didn't say the DB size (and size of active part of it), it would help here.
>
> => select pg_size_pretty(pg_database_size('xxx'));
>  pg_size_pretty
> ----------------
>  32 GB
> (1 row)

Here I am still a big fan of setting shared_buffers = 8GB for a DB size of
32GB; that is a 25% buffer-pool ratio. effective_cache_size will then be
more like 8GB.
The only time this will hurt is if you have more sequential access than
random, which won't be populated in shared buffers; but the chances of that
being the problem are lowered with your random_page_cost set to 2.0 or
lower.

I am also a big fan of separating WAL and data, which gives two advantages:
you can monitor the I/O of each, so you know whether it is coming from WAL
or data, and further tuning can then be done according to what you see.
Also, SSDs sometimes have trouble with varying sizes of WAL writes, so
response times for WAL writes vary quite a bit and can confuse SSDs.

-Jignesh

>> it is too much with 200 connections. you may hit cases where you
>> try to use more than the memory available.
>
> So far memory never really was a problem, but I'll keep these
> suggestions in mind.
>
>> 16MB should work well
>
> We already thought of increasing that, will do so now.
>
>>> effective_cache_size = 8192MB
>>
>> 12-14GB looks better
>
> Thank you, I was rather unsure on this one.
>
>> you use full_text_search ?
>
> Not anymore, probably a leftover.
>
>> do you monitor the 'locks' ? and the commit/rollbacks ?
>
> No, but I'll look into doing that.
>
> Thanks a lot for the feedback again,
> Michael
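Splitting WAL from data as suggested above is usually done by relocating pg_xlog onto the other device with a symlink; a hedged sketch (the paths are examples, and the server must be stopped first):

```
# with postgres stopped:
mv /var/lib/postgresql/8.4/main/pg_xlog /mnt/wal_disk/pg_xlog
ln -s /mnt/wal_disk/pg_xlog /var/lib/postgresql/8.4/main/pg_xlog
# after restarting, per-device I/O stats (e.g. iostat -x) show
# WAL and data traffic separately
```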