Discussion: bgwriter, checkpoints, curious (seeing delays)
Okay ladies and gents and the rest of you :)

It's time I dig into another issue, and that's a curious 5 second delay on connect, on occasion. Now, I believe the network to be sound and there are zero errors on any servers, no retrans, no runts, nada nada nada. However, I will continue to run tests with/without DNS, tcpdumps, tracking communications, handshakes, and so on.

I've been doing some reading, and what did I find? I found: "Checkpoints are very disrupting to your database performance and can cause connections to stall for up to a few seconds while they occur."

Now I'm reaching here, but I wondered. My issue is what appears to be a random 5 second connection delay. Connection times over a 24 hour period from my application servers, which connect to my query servers (postgres 8.3.4, slony 1.2.15), big 8 way boxes with 16-32gb ram. Conn time in seconds and count (50, kind of a round even number eh!):

'0.0-0.5' => 155,632
'0.5-1.0' => 6
'1.0-2.0' => 4
'5.0-'    => 50

So the 5 seconds makes us instantly think network, but I want to hit this from multiple angles, so I figured I would reach out to the brain trust here for some ideas. Here is what may be interesting, and may point me in a direction of further tuning and adjusting:

postgres=# select * from pg_stat_bgwriter;
 checkpoints_timed | checkpoints_req | buffers_checkpoint | buffers_clean | maxwritten_clean | buffers_backend | buffers_alloc
-------------------+-----------------+--------------------+---------------+------------------+-----------------+---------------
             34820 |             207 |          444406118 |     214634102 |          1274873 |       333375850 |    3370607455
(1 row)

Now I'll be honest, I have nothing special in my configs for bgwriter; in fact, other than enabling log_checkpoints today, I have nothing but defaults for bgwriter_*. But the fact that checkpoints_timed is so high seems to maybe point to some required tuning?

Things of interest:
Memory: 24gb, 8 way processor, postgres 8.3.4

shared_buffers = 1500MB         # min 128kB or max_connections*16kB
max_prepared_transactions = 0   # can be 0 or more
work_mem = 100MB                # min 64kB
maintenance_work_mem = 128MB    # min 1MB
max_fsm_pages = 500000          # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 225         # min 100, ~70 bytes each
fsync = off                     # turns forced synchronization on or off
checkpoint_segments = 100       # in logfile segments, min 1, 16MB each
checkpoint_warning = 3600s      # 0 is off

So nothing exciting in my configs (300 connections). So my guess is, "IF" (a really big if) the delay is actually due to checkpoints (perfect lil storm, timing), then there should be ways for me to tune this, ya?

Thanks "again" for your assistance

Tory
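As a quick sanity check on the pg_stat_bgwriter snapshot quoted above, the three buffers_* counters can be turned into shares of total buffer writes. A minimal sketch, using the counter values from the query output (the interpretation of the columns follows the 8.3 documentation; nothing here is measured beyond those numbers):

```python
# Share of buffer writes done by each writer, from the
# pg_stat_bgwriter snapshot in the message above (PostgreSQL 8.3).
buffers_checkpoint = 444_406_118  # written during checkpoints
buffers_clean      = 214_634_102  # written by the background writer
buffers_backend    = 333_375_850  # written directly by backends

total = buffers_checkpoint + buffers_clean + buffers_backend

for name, n in [("checkpoint", buffers_checkpoint),
                ("bgwriter",   buffers_clean),
                ("backend",    buffers_backend)]:
    print(f"{name:10s} {100.0 * n / total:5.1f}%")
```

This shows roughly 45% of writes coming from checkpoints and a third coming from backends themselves, which is the kind of split that usually suggests the background writer could be made more aggressive.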
On Thu, Feb 25, 2010 at 10:12 PM, Tory M Blue <tmblue@gmail.com> wrote:
> Okay ladies and gents and the rest of you :)
>
> It's time I dig into another issue, and that's a curious 5 second
> delay on connect, on occasion. Now, I believe the network to be sound
> and there are zero errors on any servers, no retrans, no runts, nada
> nada nada. However I will continue to run tests with/without dns,
> tcpdumps, tracking communications, handshakes, however.
>
> I've been doing some reading and what did I find. I found "Checkpoints
> are very disrupting to your database performance and can cause
> connections to stall for up to a few seconds while they occur."

Quick added note, sorry.

"checkpoint_completion_target (floating point)

Specifies the target length of checkpoints, as a fraction of the checkpoint interval. The default is 0.5. This parameter can only be set in the postgresql.conf file or on the server command line."

Interesting that it's a .5 second default setting and I'm seeing exactly that .5 second delay. Again, could be reaching here!

Thanks for putting up with me

Tory
Friday, February 26, 2010, 7:20:38 AM you wrote:

> "checkpoint_completion_target (floating point)

> interesting that it's a .5 second default setting and I'm seeing
> exactly that .5 second delay.

It's not an exact time, but a multiplier to 'checkpoint_timeout'. So a setting of 0.5 with a timeout of 300 seconds means a checkpoint should be completed after 300*0.5 = 150 seconds.

--
Jochen Erwied     | home: jochen@erwied.eu            +49-208-38800-18, FAX: -19
Sauerbruchstr. 17 | work: joe@mbs-software.de        +49-2151-7294-24, FAX: -50
D-45470 Muelheim  | mobile: jochen.erwied@vodafone.de         +49-173-5404164
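The arithmetic above can be sketched directly, using the thread's values (the 8.3 default checkpoint_timeout of 300 s and checkpoint_completion_target of 0.5):

```python
checkpoint_timeout = 300            # seconds (PostgreSQL default, 5 min)
checkpoint_completion_target = 0.5  # a fraction of the interval, not seconds

# Target window over which the checkpoint's writes are spread out:
target_write_window = checkpoint_timeout * checkpoint_completion_target
print(target_write_window)  # 150.0
```

This matches the ~148-150 s write times that show up later in the thread's log_checkpoints output.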
On Thu, 2010-02-25 at 22:12 -0800, Tory M Blue wrote:
> shared_buffers = 1500MB

Some people tend to increase this to 2.2GB (32-bit) or 4-6GB (64-bit), if needed. Please note that more shared_buffers will lead to more pressure on bgwriter, but it also has lots of benefits, too.

> work_mem = 100MB

This is too much. Since you have 300 connections, you will probably swap because of this setting, since each connection may use this much work_mem. The rule of thumb is to set this to a lower general value (say, 1-2 MB), and set it per-query when needed.

> checkpoint_segments = 100
> checkpoint_warning = 3600s

What about checkpoint_timeout? Please note that even if checkpoint_segments = 100, if the timeout value is low (say 5 mins), PostgreSQL will probably checkpoint every checkpoint_timeout minutes (unless PostgreSQL creates $checkpoint_segments xlogs before checkpoint_timeout is reached). Depending on your workload, this may not be intended, and it may cause spikes -- which will lead to the issues you complain about.

I'll stop here, and suggest you read this:

http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm

for details about this subject. As noted there, if you are running 8.3+, pg_stat_bgwriter will help you to tune checkpoint & bgwriter settings.

-HTH.
--
Devrim GÜNDÜZ
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz
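The segments-vs-timeout race described above can be sketched: a checkpoint fires at whichever limit is hit first. A rough model using the thread's settings (checkpoint_segments = 100, checkpoint_timeout = 300 s); the WAL rates passed in are illustrative assumptions, not measurements:

```python
SEGMENT_BYTES = 16 * 1024 * 1024  # one xlog segment is 16MB in 8.3

def checkpoint_trigger(wal_bytes_per_sec, checkpoint_segments=100,
                       checkpoint_timeout=300):
    """Return which limit fires first, given a steady WAL write rate."""
    secs_to_fill = checkpoint_segments * SEGMENT_BYTES / wal_bytes_per_sec
    return "segments" if secs_to_fill < checkpoint_timeout else "timeout"

# ~20 segments recycled per 5-minute checkpoint (as in the logs later in
# this thread) is roughly 1.1 MB/s of WAL -- far below the segments limit:
print(checkpoint_trigger(1.1 * 1024 * 1024))    # "timeout"
print(checkpoint_trigger(100 * 1024 * 1024))    # "segments"
```

This is consistent with the reported counters (checkpoints_timed = 34820 vs. checkpoints_req = 207): nearly every checkpoint is time-driven.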
2010/2/25 Devrim GÜNDÜZ <devrim@gunduz.org>:
> On Thu, 2010-02-25 at 22:12 -0800, Tory M Blue wrote:
>> shared_buffers = 1500MB
>
> Some people tend to increase this to 2.2GB(32-bit) or 4-6 GB (64 bit),
> if needed. Please note that more shared_buffers will lead to more
> pressure on bgwriter, but it also has lots of benefits, too.
>
>> work_mem = 100MB
>
> This is too much. Since you have 300 connections, you will probably swap
> because of this setting, since each connection may use this much
> work_mem. The rule of the thumb is to set this to a lower general value
> (say, 1-2 MB), and set it per-query when needed.
>
>> checkpoint_segments = 100
>> checkpoint_warning = 3600s
>
> What about checkpoint_timeout? Please note that even if
> checkpoint_segments = 100, if timeout value is low (say 5 mins),
> PostgreSQL will probably checkpoint every checkpoint_timeout minutes
> (unless PostgreSQL creates $checkpoint_segments xlogs before
> checkpoint_timeout value). Depending on your workload, this may not be
> intended, and it may cause spikes -- which will lead to the issues you
> complain.
>
> I'll stop here, and suggest you read this:
>
> http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm
>
> for details about this subject. As noted there, if you are running 8.3+,
> pg_stat_bgwriter will help you to tune checkpoint & bgwriter settings.
>
> -HTH.

Checkpoint_timeout is the default, and that looks like 5 mins (300 seconds). That is obviously why I have such a discrepancy between timed and requested.

Thank you sir, that's actually the page that I've spent much of my time on this eve :) I'll continue to read and check my configuration settings.

Tory
2010/2/25 Tory M Blue <tmblue@gmail.com>:
> 2010/2/25 Devrim GÜNDÜZ <devrim@gunduz.org>:
>> On Thu, 2010-02-25 at 22:12 -0800, Tory M Blue wrote:
>>> shared_buffers = 1500MB
>>
>> Some people tend to increase this to 2.2GB(32-bit) or 4-6 GB (64 bit),
>> if needed. Please note that more shared_buffers will lead to more
>> pressure on bgwriter, but it also has lots of benefits, too.
>>
>>> work_mem = 100MB
>>
>> This is too much. Since you have 300 connections, you will probably swap
>> because of this setting, since each connection may use this much
>> work_mem. The rule of the thumb is to set this to a lower general value
>> (say, 1-2 MB), and set it per-query when needed.
>>
>>> checkpoint_segments = 100
>>> checkpoint_warning = 3600s
>>
>> What about checkpoint_timeout? Please note that even if
>> checkpoint_segments = 100, if timeout value is low (say 5 mins),
>> PostgreSQL will probably checkpoint every checkpoint_timeout minutes
>> (unless PostgreSQL creates $checkpoint_segments xlogs before
>> checkpoint_timeout value). Depending on your workload, this may not be
>> intended, and it may cause spikes -- which will lead to the issues you
>> complain.
>>
>> I'll stop here, and suggest you read this:
>>
>> http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm
>>
>> for details about this subject. As noted there, if you are running 8.3+,
>> pg_stat_bgwriter will help you to tune checkpoint & bgwriter settings.
>>
>> -HTH.
>
> Checkpoint_timeout is the default and that looks like 5 mins (300
> seconds). And is obviously why I have such a discrepancy between time
> reached and requested.
>
> Thank you sir, that's actually the page that I've spent much of my
> time on this eve :) I'll continue to read and check my configuration
> settings.
>
> Tory

Also, since I set the log on today, I have some log information regarding the checkpoints:

2010-02-25 22:08:13 PST LOG:  checkpoint starting: time
2010-02-25 22:10:41 PST LOG:  checkpoint complete: wrote 44503 buffers (23.2%); 0 transaction log file(s) added, 0 removed, 20 recycled; write=148.539 s, sync=0.000 s, total=148.540 s
2010-02-25 22:13:13 PST LOG:  checkpoint starting: time
2010-02-25 22:15:37 PST LOG:  checkpoint complete: wrote 38091 buffers (19.8%); 0 transaction log file(s) added, 0 removed, 20 recycled; write=144.713 s, sync=0.000 s, total=144.714 s
2010-02-25 22:18:13 PST LOG:  checkpoint starting: time
2010-02-25 22:20:42 PST LOG:  checkpoint complete: wrote 38613 buffers (20.1%); 0 transaction log file(s) added, 0 removed, 19 recycled; write=149.870 s, sync=0.000 s, total=149.871 s
2010-02-25 22:23:13 PST LOG:  checkpoint starting: time
2010-02-25 22:25:42 PST LOG:  checkpoint complete: wrote 39009 buffers (20.3%); 0 transaction log file(s) added, 0 removed, 19 recycled; write=149.876 s, sync=0.000 s, total=149.877 s
2010-02-25 22:28:13 PST LOG:  checkpoint starting: time
2010-02-25 22:30:43 PST LOG:  checkpoint complete: wrote 30847 buffers (16.1%); 0 transaction log file(s) added, 0 removed, 19 recycled; write=150.000 s, sync=0.000 s, total=150.001 s
2010-02-25 22:33:13 PST LOG:  checkpoint starting: time
2010-02-25 22:35:43 PST LOG:  checkpoint complete: wrote 11917 buffers (6.2%); 0 transaction log file(s) added, 0 removed, 14 recycled; write=150.064 s, sync=0.000 s, total=150.065 s
2010-02-25 22:38:13 PST LOG:  checkpoint starting: time
2010-02-25 22:40:43 PST LOG:  checkpoint complete: wrote 10869 buffers (5.7%); 0 transaction log file(s) added, 0 removed, 5 recycled; write=149.995 s, sync=0.000 s, total=149.996 s
2010-02-25 22:43:13 PST LOG:  checkpoint starting: time
2010-02-25 22:45:41 PST LOG:  checkpoint complete: wrote 31424 buffers (16.4%); 0 transaction log file(s) added, 0 removed, 4 recycled; write=148.597 s, sync=0.000 s, total=148.598 s
2010-02-25 22:48:13 PST LOG:  checkpoint starting: time
2010-02-25 22:50:42 PST LOG:  checkpoint complete: wrote 33895 buffers (17.7%); 0 transaction log file(s) added, 0 removed, 17 recycled; write=149.825 s, sync=0.000 s, total=149.826 s
2010-02-25 22:53:13 PST LOG:  checkpoint starting: time
2010-02-25 22:53:17 PST postgres postgres [local] LOG:  unexpected EOF on client connection
2010-02-25 22:55:43 PST LOG:  checkpoint complete: wrote 34155 buffers (17.8%); 0 transaction log file(s) added, 0 removed, 15 recycled; write=150.045 s, sync=0.000 s, total=150.046 s
2010-02-25 22:58:13 PST LOG:  checkpoint starting: time
2010-02-25 23:00:41 PST LOG:  checkpoint complete: wrote 33873 buffers (17.6%); 0 transaction log file(s) added, 0 removed, 15 recycled; write=148.223 s, sync=0.000 s, total=148.224 s
2010-02-25 23:03:13 PST LOG:  checkpoint starting: time
On Thu, 2010-02-25 at 23:01 -0800, Tory M Blue wrote:
> Checkpoint_timeout is the default and that looks like 5 mins (300
> seconds). And is obviously why I have such a discrepancy between time
> reached and requested.

If you have a high load, you may want to start tuning with 15 minutes, and bump it to 30 mins if needed. Also, you may want to decrease the segments value based on your findings, since increasing only one of them won't help you a lot.

As I wrote before, pg_stat_bgwriter is your friend here.

--
Devrim GÜNDÜZ
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz
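The suggestion above might look something like this as a postgresql.conf fragment. The specific numbers here are illustrative assumptions only, to be validated against pg_stat_bgwriter and the checkpoint log, not a recommendation for this particular workload:

```
# Spread checkpoints further apart and smooth their I/O (8.3 syntax).
checkpoint_timeout = 15min          # up from the 5min default
checkpoint_segments = 64            # revisit if checkpoints_req starts climbing
checkpoint_completion_target = 0.7  # spread writes over more of the interval
log_checkpoints = on                # keep verifying the effect
```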
Tory M Blue wrote:
> 2010-02-25 22:10:41 PST LOG: checkpoint complete: wrote 44503
> buffers (23.2%); 0 transaction log file(s) added, 0 removed, 20
> recycled; write=148.539 s, sync=0.000 s, total=148.540 s

This one is typical for your list, so I'll only comment on it. This is writing out 350MB spread over 148 seconds, which means your background checkpoint I/O is about a 2.4MB/s stream. That's a moderate amount that could be tough for some systems, but note that your "sync" time is nothing. Generally, when someone sees a long pause that's caused by a checkpoint, the sync number is really high. Your disks seem to be keeping up with the checkpoint overhead moderately well.

(Mind you, the zero sync time is because you have 'fsync = off', which will eventually result in your database being corrupted after a badly timed outage one day; you really don't want to do that.)

My guess is your connections are doing some sort of DNS operation that periodically stalls waiting for a 5-second timeout. There's nothing in your checkpoint data suggesting it's a likely cause of a delay that long--and it would be a lot more random if that were the case, too. Bad checkpoint spikes will be seconds sometimes, no time at all others; a heavy grouping at 5 seconds doesn't match the pattern they have at all.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
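The 350MB / 2.4MB/s figures above can be reproduced from the quoted log line, given PostgreSQL's default 8kB block size. A quick sketch of the arithmetic:

```python
BLOCK_SIZE = 8192        # default PostgreSQL block size, in bytes

buffers_written = 44503  # from the checkpoint log line quoted above
write_seconds = 148.539

total_mb = buffers_written * BLOCK_SIZE / (1024 * 1024)
rate_mb_s = total_mb / write_seconds

# ~348 MB written at ~2.3 MB/s, matching the round numbers above.
print(f"{total_mb:.0f} MB written, {rate_mb_s:.1f} MB/s")
```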
Tory M Blue wrote:

> 2010-02-25 22:53:13 PST LOG: checkpoint starting: time
> 2010-02-25 22:53:17 PST postgres postgres [local] LOG: unexpected
> EOF on client connection
> 2010-02-25 22:55:43 PST LOG: checkpoint complete: wrote 34155
> buffers (17.8%); 0 transaction log file(s) added, 0 removed, 15
> recycled; write=150.045 s, sync=0.000 s, total=150.046 s

Did that unexpected EOF correspond to a connection attempt that gave up based on time?

-Kevin
On Fri, Feb 26, 2010 at 5:09 AM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Tory M Blue wrote:
>
>> 2010-02-25 22:53:13 PST LOG: checkpoint starting: time
>> 2010-02-25 22:53:17 PST postgres postgres [local] LOG: unexpected
>> EOF on client connection
>> 2010-02-25 22:55:43 PST LOG: checkpoint complete: wrote 34155
>> buffers (17.8%); 0 transaction log file(s) added, 0 removed, 15
>> recycled; write=150.045 s, sync=0.000 s, total=150.046 s
>
> Did that unexpected EOF correspond to a connection attempt that gave
> up based on time?
>
> -Kevin

Kevin,

Good question; I'm unclear on what that was. Since it's marked [local], it doesn't look like one of my application's client connections, which is what confused me. I don't normally see an EOF on a client connection, and an EOF from a local connection is really, really weird.

Tory
On Feb 26, 2010, at 11:23 AM, Tory M Blue wrote:
> Good question, I'm unclear what that was. I mean it's a LOG, so not a
> client connection, that really kind of confused me. I don't normally
> see EOF of client and an EOF on client from local, that's really
> really weird

We see that from our monitoring software testing port 5432.
2010/2/25 Devrim GÜNDÜZ <devrim@gunduz.org>:
> On Thu, 2010-02-25 at 22:12 -0800, Tory M Blue wrote:
>> shared_buffers = 1500MB
>
> Some people tend to increase this to 2.2GB(32-bit) or 4-6 GB (64 bit),
> if needed. Please note that more shared_buffers will lead to more
> pressure on bgwriter, but it also has lots of benefits, too.
>
>> work_mem = 100MB
>
> This is too much. Since you have 300 connections, you will probably swap
> because of this setting, since each connection may use this much
> work_mem. The rule of the thumb is to set this to a lower general value
> (say, 1-2 MB), and set it per-query when needed.

I'm slightly confused. Most things I've read, including running pgtune for grins, put this around 100MB (98MB for pgtune). 1-2MB just seems really low to me. And ignore the 300 connections; that's an upper limit. I usually run a max of 40-45, but usually around 20 connections per second.

Also, is there a way to log whether any deadlocks are happening? (I'm not seeing them in the logs.) deadlock_timeout = 5s

Tory
2010/2/25 Devrim GÜNDÜZ <devrim@gunduz.org>:
> On Thu, 2010-02-25 at 23:01 -0800, Tory M Blue wrote:
>
>> Checkpoint_timeout is the default and that looks like 5 mins (300
>> seconds). And is obviously why I have such a discrepancy between time
>> reached and requested.
>
> If you have a high load, you may want to start tuning with 15 minutes,
> and bump it to 30 mins if needed. Also you may want to decrease segments
> value based on your findings, since increasing only one of them won't
> help you a lot.
>
> As I wrote before, pg_stat_bgwriter is your friend here.

Actually these servers have almost no load. Really, they run very cool:

Load Average (for the day):   Cur: 0.16   Avg: 0.22

So I don't think it's load. No network issues (that I've found), and while the server will eventually eat all the memory, it's currently sitting with 4GB free:

Mem Total (for the day):      Cur: 25.55 GBytes   Avg: 25.55 GBytes   Max: 25.55 GBytes
Mem Available (for the day):  Cur:  4.72 GBytes   Avg:  5.15 GBytes   Max:  5.71 GBytes
Bytes Used (for the day):     Cur: 20.83 GBytes   Avg: 20.40 GBytes   Max: 21.20 GBytes

Thanks for your pointers; I'm continuing to look and will do some tests today. I also hear you about fsync and will do some testing here to see why this was set. These systems have been running for 4-6 years, and that setting was probably made way back in the day and has survived each upgrade/hardware/storage update.

Tory
>>> Tory M Blue <tmblue@gmail.com> 02/26/10 12:52 PM >>>
>>
>> This is too much. Since you have 300 connections, you will probably swap
>> because of this setting, since each connection may use this much
>> work_mem. The rule of the thumb is to set this to a lower general value
>> (say, 1-2 MB), and set it per-query when needed.
>
> I'm slightly confused. Most things I've read, including running
> pg_tune for grins puts this around 100MB, 98MB for pgtune. 1-2MB just
> seems really low to me. And Ignore the 300 connections, thats an upper
> limit, I usually run a max of 40-45 but usually around 20 connections
> per sec.

It has been said on this list before that pgtune is extremely aggressive when it comes to work_mem. 100MB is just a whole lot of memory for something that is dedicated mostly to sorting. Some of my relatively heavy-duty queries, which end up manipulating hundreds of thousands of rows in subqueries, do just fine with quite a bit less.

1-2MB is good enough for many families of queries, but it's hard to say what the right default should be for you. The right number can be estimated by running EXPLAIN ANALYZE on your most common queries, with parameters that are representative of regular use, and seeing how much memory they actually claim to use. In my case, for example, most of my queries do just fine with 10MB, while the reporting queries that accumulate quite a bit of data request up to 60MB.

If your average query needs 100MB, it'd still mean that 40 connections take 4 gigs' worth of work memory, which might be better spent caching the database. Now, if your system is so over-specced that wasting a few gigs of RAM doesn't impact your performance one bit, then you might not have to worry about this at all.
> -----Original Message-----
> From: Tory M Blue
>
> I'm slightly confused. Most things I've read, including
> running pg_tune for grins puts this around 100MB, 98MB for
> pgtune. 1-2MB just seems really low to me. And Ignore the
> 300 connections, thats an upper limit, I usually run a max of
> 40-45 but usually around 20 connections per sec.

If you have a maximum of 45 users simultaneously running a rather complex query that requires, say, 3 sorts, that's 45 x 100MB x 3 = 13.5 GB of RAM used up because of this particular work_mem setting. That doesn't mean it will happen, just that your settings make this scenario possible.

So, to avoid this scenario, the suggestion is to keep work_mem low and adjust it per query if required. I find 1-2 MB too low for my particular requirements, so I have it at 8 MB. Anyway, given that your server has a lot of RAM, your setting might make sense. But all that memory would probably be better used if it were available for caching.

> Also is there a way to log if there are any deadlocks
> happening (I'm not seeing them in logs) deadlock_timeout = 5s

In postgresql.conf:

log_lock_waits = on    # log lock waits >= deadlock_timeout

Regards,
Fernando.
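The worst-case arithmetic above can be written out with the thread's numbers (45 concurrent queries, work_mem = 100MB, 3 sort operations per query; the 3-sorts figure is Fernando's illustrative assumption):

```python
# Upper bound on memory claimable through work_mem alone.
connections = 45       # observed peak concurrency from the thread
work_mem_mb = 100      # the work_mem setting under discussion
sorts_per_query = 3    # hypothetical complex query with 3 sorts

worst_case_mb = connections * work_mem_mb * sorts_per_query
print(f"{worst_case_mb} MB, about {worst_case_mb / 1024:.1f} GiB")
```

This is a ceiling, not a prediction: each sort or hash node may use up to work_mem, so the bound only binds when every connection hits that many memory-hungry nodes at once.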
On Fri, Feb 26, 2010 at 11:49 AM, Jorge Montero
<jorge_montero@homedecorators.com> wrote:
> 1-2MB is good enough for many families of queries, but it's hard to say
> what the right default should be for you. The right number can be
> estimated by running EXPLAIN ANALYZE on your most common queries, with
> parameters that are representative of regular use, and seeing how much
> memory they actually claim to use.
>
> If your average query needs 100 MB, it'd still mean that 40 connections
> take 4 gigs worth of work memory, which might be better spent caching
> the database.

Ya, my boxes are pretty well stacked, but a question: how does one get the memory usage of a query? You state to look at EXPLAIN ANALYZE, but this gives timing and costs. Is one of the numbers memory, or do I have to take the values and do some math?

 Function Scan on listings_search  (cost=0.00..260.00 rows=1000 width=108) (actual time=904.374..904.383 rows=10 loops=1)
 Total runtime: 904.411 ms

Thanks
Tory

Also, I don't think this 5 second thing is the DB. It's sure not checkpoints.
On Fri, Feb 26, 2010 at 6:24 PM, Tory M Blue <tmblue@gmail.com> wrote:
> Ya my boxes are pretty well stacked, but a question. How does one get
> the memory usage of a query. You state to look at explain analyze but
> this gives timing and costs, but is one of the numbers memory or do I
> have to take values and do some math?

There's really not good instrumentation on memory usage, except for sorts. 9.0 will add a bit of instrumentation for hashes. I might try 'top'.

...Robert