Discussion: autovacuum strategy / parameters
I have a DB with small and large tables that can go up to 15G.
For performance benefits, it appears that analyze has much less cost
than vacuum, but the same benefits?

I can't find any clear recommendations for frequencies and am
considering these parameters:

autovacuum_vacuum_threshold = 50000
autovacuum_analyze_threshold = 10000
autovacuum_vacuum_scale_factor = 0.01
autovacuum_analyze_scale_factor = 0.005

It appears this will result in table analyzes occurring at around
10,000 to 85,000 dead tuples and vacuums occurring at around 50,000 to
200,000, depending on the table sizes.

Can anyone comment on whether this is the right strategy and targets
to use?
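For reference, those ranges fall out of autovacuum's trigger formula (base threshold + scale_factor * live rows). A worked example, using an assumed 15-million-row table for the upper end (arithmetic only; the row count is illustrative):

```sql
-- Trigger points under the proposed settings for a 15M-row table:
--   analyze: 10000 + 0.005 * 15000000 = 85000 dead/changed tuples
--   vacuum:  50000 + 0.01  * 15000000 = 200000 dead tuples
SELECT 10000 + 0.005 * 15000000 AS analyze_trigger,
       50000 + 0.01  * 15000000 AS vacuum_trigger;
```

For small tables, the base thresholds (10,000 and 50,000) dominate, which gives the lower ends of the ranges.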
On Wed, Apr 21, 2010 at 11:06 AM, Rick <richard.branton@ca.com> wrote:
> I have a DB with small and large tables that can go up to 15G.
> For performance benefits, it appears that analyze has much less cost
> than vacuum, but the same benefits?

Err, no. ANALYZE gathers statistics for the query planner; VACUUM
clears out old, dead tuples so that space can be reused by the
database system.

> I can't find any clear recommendations for frequencies and am
> considering these parameters:
>
> autovacuum_vacuum_threshold = 50000
> autovacuum_analyze_threshold = 10000
> autovacuum_vacuum_scale_factor = 0.01
> autovacuum_analyze_scale_factor = 0.005
>
> It appears this will result in table analyzes occurring at around
> 10,000 to 85,000 dead tuples and vacuums occurring at around 50,000
> to 200,000, depending on the table sizes.
>
> Can anyone comment on whether this is the right strategy and targets
> to use?

I'm not that familiar with tuning these parameters but increasing the
default thresholds by a thousand-fold doesn't seem like a good idea.
Small tables will never get vacuumed or analyzed at all.

...Robert
On Apr 22, 2:55 pm, robertmh...@gmail.com (Robert Haas) wrote:
> On Wed, Apr 21, 2010 at 11:06 AM, Rick <richard.bran...@ca.com> wrote:
> > I have a DB with small and large tables that can go up to 15G.
> > For performance benefits, it appears that analyze has much less cost
> > than vacuum, but the same benefits?
>
> Err, no. ANALYZE gathers statistics for the query planner; VACUUM
> clears out old, dead tuples so that space can be reused by the
> database system.
>
> > I can't find any clear recommendations for frequencies and am
> > considering these parameters:
> >
> > autovacuum_vacuum_threshold = 50000
> > autovacuum_analyze_threshold = 10000
> > autovacuum_vacuum_scale_factor = 0.01
> > autovacuum_analyze_scale_factor = 0.005
> >
> > It appears this will result in table analyzes occurring at around
> > 10,000 to 85,000 dead tuples and vacuums occurring at around 50,000
> > to 200,000, depending on the table sizes.
> >
> > Can anyone comment on whether this is the right strategy and targets
> > to use?
>
> I'm not that familiar with tuning these parameters but increasing the
> default thresholds by a thousand-fold doesn't seem like a good idea.
> Small tables will never get vacuumed or analyzed at all.
>
> ...Robert
>
> --
> Sent via pgsql-performance mailing list (pgsql-performa...@postgresql.org)
> To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance

The problem is with the autovacuum formula:

In a loop, autovacuum checks to see if number of dead tuples >
((number of live tuples * autovacuum_vacuum_scale_factor) +
autovacuum_vacuum_threshold), and if so, it runs VACUUM. If not, it
sleeps. It works the same way for ANALYZE.

So, in a large table, the scale_factor is the dominant term. In a
small table, the threshold is the dominant term. But both are taken
into account.

The default values are set for small tables; it is not being run for
large tables.
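That check can be approximated from the statistics views to see which tables are close to their trigger point. This is only a sketch: autovacuum itself uses pg_class.reltuples plus any per-table overrides, and the n_live_tup/n_dead_tup columns need 8.3 or later:

```sql
-- Rough per-table view of dead tuples vs. the autovacuum trigger point.
-- Not exactly autovacuum's internal computation, but close enough to see
-- which tables are due for a VACUUM under the current global settings.
SELECT relname,
       n_dead_tup,
       current_setting('autovacuum_vacuum_threshold')::integer
         + current_setting('autovacuum_vacuum_scale_factor')::numeric
           * n_live_tup AS vacuum_trigger_point
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
```

A table is roughly "due" when n_dead_tup exceeds vacuum_trigger_point; swapping in the analyze settings gives the equivalent view for ANALYZE.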
The question boils down to exactly what is the max number of dead tuples that should be allowed to accumulate before running analyze? Since vacuum just recovers space, that doesn't seem to be nearly as critical for performance? -Rick
Rick wrote:
> So, in a large table, the scale_factor is the dominant term. In a
> small table, the threshold is the dominant term. But both are taken
> into account.

Correct.

> The default values are set for small tables; it is not being run for
> large tables.

So decrease the scale factor and leave threshold alone.

--
Alvaro Herrera  http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Thu, Apr 22, 2010 at 4:42 PM, Rick <richard.branton@ca.com> wrote:
> On Apr 22, 2:55 pm, robertmh...@gmail.com (Robert Haas) wrote:
>> On Wed, Apr 21, 2010 at 11:06 AM, Rick <richard.bran...@ca.com> wrote:
>> > I have a DB with small and large tables that can go up to 15G.
>> > For performance benefits, it appears that analyze has much less cost
>> > than vacuum, but the same benefits?
>>
>> Err, no. ANALYZE gathers statistics for the query planner; VACUUM
>> clears out old, dead tuples so that space can be reused by the
>> database system.
>>
>> > I can't find any clear recommendations for frequencies and am
>> > considering these parameters:
>> >
>> > autovacuum_vacuum_threshold = 50000
>> > autovacuum_analyze_threshold = 10000
>> > autovacuum_vacuum_scale_factor = 0.01
>> > autovacuum_analyze_scale_factor = 0.005
>> >
>> > It appears this will result in table analyzes occurring at around
>> > 10,000 to 85,000 dead tuples and vacuums occurring at around 50,000
>> > to 200,000, depending on the table sizes.
>> >
>> > Can anyone comment on whether this is the right strategy and targets
>> > to use?
>>
>> I'm not that familiar with tuning these parameters but increasing the
>> default thresholds by a thousand-fold doesn't seem like a good idea.
>> Small tables will never get vacuumed or analyzed at all.
>>
>> ...Robert
>
> The problem is with the autovacuum formula:
>
> In a loop, autovacuum checks to see if number of dead tuples >
> ((number of live tuples * autovacuum_vacuum_scale_factor) +
> autovacuum_vacuum_threshold), and if so, it runs VACUUM. If not, it
> sleeps. It works the same way for ANALYZE.
>
> So, in a large table, the scale_factor is the dominant term. In a
> small table, the threshold is the dominant term. But both are taken
> into account.
> The default values are set for small tables; it is not being run for
> large tables.
>
> The question boils down to exactly what is the max number of dead
> tuples that should be allowed to accumulate before running analyze?
> Since vacuum just recovers space, that doesn't seem to be nearly as
> critical for performance?

That doesn't really match my experience. Without regular vacuuming,
tables and indices end up being larger than they ought to be and
contain large amounts of dead space that slows things down. How much
of an impact that ends up having depends on how badly bloated they are
and what you're trying to do, but it can get very ugly.

My guess is that the reason we run ANALYZE more frequently than vacuum
(with the default settings) is that ANALYZE is pretty cheap. In many
cases, if the statistical distribution of the data hasn't changed
much, then it's not really necessary, but it doesn't cost much either.
And for certain types of usage patterns, like time series (where the
maximum value keeps increasing) it's REALLY important to analyze
frequently.

But having said that, on the systems I've worked with, I've only
rarely seen a problem caused by not analyzing frequently enough. On
the other hand, I've seen MANY problems caused by not vacuuming
enough. Someone runs a couple of big queries that rewrite a large
portion of a table several times over and, boom, problems. 8.3 and
higher are better about this because of an optimization called HOT,
but there can still be problems.

Other people's experiences may not match mine, but the bottom line is
that you need to do both of these things, and you need to make sure
they happen regularly. In most cases, the CPU and I/O time they
consume will be amply repaid in improved query performance.

...Robert
Robert Haas <robertmhaas@gmail.com> wrote:
> Rick <richard.branton@ca.com> wrote:
>> Since vacuum just recovers space, that doesn't seem to be nearly
>> as critical for performance?
>
> That doesn't really match my experience. Without regular
> vacuuming, tables and indices end up being larger than they ought
> to be and contain large amounts of dead space that slows things
> down. How much of an impact that ends up having depends on how
> badly bloated they are and what you're trying to do, but it can
> get very ugly.

That has been my experience, too. When we first started using
PostgreSQL, we noticed a performance hit when some small tables which
were updated very frequently were vacuumed. Our knee-jerk reaction
was to tune autovacuum to be less aggressive, so that we didn't get
hit with the pain as often. Of course, things just got worse, because
every access to that table, when vacuum hadn't been run recently, had
to examine all versions of the desired row, and test visibility for
each version, to find the current one. So performance fell off even
worse.

So we went to much more aggressive settings for autovacuum (although
only slightly more aggressive than what has since become the default)
and the problems disappeared. Basically, as long as small tables are
not allowed to bloat, vacuuming them is so fast that you never notice
it.

> 8.3 and higher are better about this because of an optimization
> called HOT, but there can still be problems.

Agreed on both counts.

-Kevin
Rick, 22.04.2010 22:42:
> So, in a large table, the scale_factor is the dominant term. In a
> small table, the threshold is the dominant term. But both are taken
> into account.
>
> The default values are set for small tables; it is not being run for
> large tables.

With 8.4 you can adjust the autovacuum settings per table...
Hi -
don't want to sidetrack the discussion. We have 8.4; which of the autovacuum parameters can be set for an individual table? I ran into bloat with a small table only. Now the issue is being resolved.
Regards
On Wed, Apr 28, 2010 at 10:20 AM, Thomas Kellerer <spam_eater@gmx.net> wrote:
> Rick, 22.04.2010 22:42:
>> So, in a large table, the scale_factor is the dominant term. In a
>> small table, the threshold is the dominant term. But both are taken
>> into account.
>>
>> The default values are set for small tables; it is not being run for
>> large tables.
>
> With 8.4 you can adjust the autovacuum settings per table...
Check out the manual:
http://www.postgresql.org/docs/8.4/static/routine-vacuuming.html#AUTOVACUUM

Cheers,
Ken

On Wed, Apr 28, 2010 at 10:37:35AM -0400, akp geek wrote:
> Hi -
>
> don't want to sidetrack the discussion. We have 8.4; which of the
> autovacuum parameters can be set for an individual table? I ran into
> bloat with a small table only. Now the issue is being resolved.
>
> Regards
>
> On Wed, Apr 28, 2010 at 10:20 AM, Thomas Kellerer <spam_eater@gmx.net> wrote:
>> Rick, 22.04.2010 22:42:
>>> So, in a large table, the scale_factor is the dominant term. In a
>>> small table, the threshold is the dominant term. But both are taken
>>> into account.
>>>
>>> The default values are set for small tables; it is not being run for
>>> large tables.
>>
>> With 8.4 you can adjust the autovacuum settings per table...
akp geek, 28.04.2010 16:37:
> We have 8.4; which of the autovacuum parameters can be set for an
> individual table?

All documented here:
http://www.postgresql.org/docs/current/static/sql-createtable.html
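For example, with 8.4's storage-parameter syntax the overrides look like this (a sketch; "big_table" and the values are made up for illustration):

```sql
-- Per-table autovacuum overrides via storage parameters (8.4+).
-- Make autovacuum more aggressive on this one table than the global
-- defaults (scale_factor 0.2 / threshold 50):
ALTER TABLE big_table SET (
    autovacuum_vacuum_scale_factor  = 0.05,
    autovacuum_vacuum_threshold     = 1000,
    autovacuum_analyze_scale_factor = 0.02
);

-- The current overrides show up in pg_class.reloptions, and can be
-- removed again with, e.g.:
-- ALTER TABLE big_table RESET (autovacuum_vacuum_scale_factor);
```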
> My guess is that the reason we run ANALYZE more frequently than vacuum
> (with the default settings) is that ANALYZE is pretty cheap. In many
> cases, if the statistical distribution of the data hasn't changed
> much, then it's not really necessary, but it doesn't cost much either.
> And for certain types of usage patterns, like time series (where the
> maximum value keeps increasing) it's REALLY important to analyze
> frequently.
>
> But having said that, on the systems I've worked with, I've only
> rarely seen a problem caused by not analyzing frequently enough. On
> the other hand, I've seen MANY problems caused by not vacuuming
> enough.

Which is the opposite of my experience; currently we have several
clients who have issues which required more-frequent analyzes on
specific tables. Before 8.4, vacuuming more frequently, especially on
large tables, was very costly; vacuum takes a lot of I/O and CPU. Even
with 8.4 it's not something you want to increase without thinking
about the tradeoffs.

Since I'm responsible for the current defaults, I thought I'd explain
the reasoning behind them. I developed and tested them while at
Greenplum, so they are *not* designed for small databases.

#autovacuum_vacuum_threshold = 50
#autovacuum_analyze_threshold = 50

These two are set to the minimum threshold to avoid having small
tables get vacuumed/analyzed continuously, but to make sure that small
tables do get vacuumed & analyzed sometimes.

#autovacuum_vacuum_scale_factor = 0.2

This is set because in my experience, 20% bloat is about the level at
which bloat starts affecting performance; thus, we want to vacuum at
that level but not sooner. This does mean that very large tables which
never have more than 10% updates/deletes don't get vacuumed at all
until freeze_age; this is a *good thing*. VACUUM on large tables is
expensive; you don't *want* to vacuum a billion-row table which has
only 100,000 updates.
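To put a number on that last point (arithmetic only; the row count is the billion-row example above):

```sql
-- Default trigger point for a billion-row table:
--   50 + 0.2 * 1000000000 = 200,000,050 dead tuples.
-- 100,000 dead tuples is nowhere near that, so no vacuum fires.
SELECT 50 + 0.2 * 1000000000 AS vacuum_trigger;
```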
#autovacuum_analyze_scale_factor = 0.1

The 10% threshold for analyze is there because (a) analyze is cheap,
and (b) 10% changes to a table can result in very bad plans if the
changes are highly skewed towards a specific range, such as additions
onto the end of a time-based table.

The current postgres defaults were tested on DBT2 as well as pgbench,
and in my last 2 years of consulting I've seldom found reason to touch
them except on *specific* tables. So I still feel that they are good
defaults.

It would be worth doing a DBT2/DBT5 test run with different autovac
settings post-8.4 to see if we should specifically change the vacuum
threshold. Pending that, though, I think the current defaults are good
enough.

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Josh Berkus wrote:
> #autovacuum_vacuum_scale_factor = 0.2
>
> This is set because in my experience, 20% bloat is about the level at
> which bloat starts affecting performance; thus, we want to vacuum at
> that level but not sooner. This does mean that very large tables which
> never have more than 10% updates/deletes don't get vacuumed at all
> until freeze_age; this is a *good thing*. VACUUM on large tables is
> expensive; you don't *want* to vacuum a billion-row table which has
> only 100,000 updates.

Hmm, now that we have partial vacuum, perhaps we should revisit this.

> It would be worth doing a DBT2/DBT5 test run with different autovac
> settings post-8.4 to see if we should specifically change the vacuum
> threshold.

Right.

--
Alvaro Herrera  http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Fri, Apr 30, 2010 at 6:50 PM, Josh Berkus <josh@agliodbs.com> wrote:
> Which is the opposite of my experience; currently we have several
> clients who have issues which required more-frequent analyzes on
> specific tables.

That's all fine, but probably not too relevant to the original
complaint - the OP backed off the default settings by several orders
of magnitude, which might very well cause a problem with both VACUUM
and ANALYZE. I don't have a stake in the ground on what the right
settings are, but I think it's fair to say that if you vacuum OR
analyze much less frequently than what we recommend by default, it
might break.

...Robert
On Wed, Apr 28, 2010 at 8:20 AM, Thomas Kellerer <spam_eater@gmx.net> wrote:
> Rick, 22.04.2010 22:42:
>> So, in a large table, the scale_factor is the dominant term. In a
>> small table, the threshold is the dominant term. But both are taken
>> into account.
>>
>> The default values are set for small tables; it is not being run for
>> large tables.
>
> With 8.4 you can adjust the autovacuum settings per table...

You can as well with 8.3, but it's not done by ALTER TABLE but by
pg_autovacuum table entries.
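On 8.3 that means a row in the pg_autovacuum system catalog, something like the sketch below (the table name and values are made up; the column list is from the 8.3 catalog, and -1 means "fall back to the global setting" - double-check against your server's docs):

```sql
-- Sketch: per-table autovacuum settings on 8.3 via the pg_autovacuum
-- catalog. "big_table" is a hypothetical table name.
INSERT INTO pg_autovacuum
       (vacrelid, enabled,
        vac_base_thresh, vac_scale_factor,
        anl_base_thresh, anl_scale_factor,
        vac_cost_delay, vac_cost_limit,
        freeze_min_age, freeze_max_age)
VALUES ('big_table'::regclass, true,
        1000, 0.05,       -- vacuum threshold / scale factor
        500,  0.02,       -- analyze threshold / scale factor
        -1, -1,           -- cost delay / limit: use global defaults
        -1, -1);          -- freeze ages: use global defaults
```

Unlike the 8.4 storage parameters, these rows are not kept through dump/restore automatically, which is one reason the ALTER TABLE form replaced this catalog.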
On Fri, Apr 30, 2010 at 4:50 PM, Josh Berkus <josh@agliodbs.com> wrote:
> Which is the opposite of my experience; currently we have several
> clients who have issues which required more-frequent analyzes on
> specific tables. Before 8.4, vacuuming more frequently, especially on
> large tables, was very costly; vacuum takes a lot of I/O and CPU. Even
> with 8.4 it's not something you want to increase without thinking about
> the tradeoff

Actually I would think that statement would be that before 8.3 vacuum
was much more expensive. The changes to vacuum for 8.4 mostly had to
do with moving the FSM to disk, making seldom-vacuumed tables easier
to keep track of, and making autovac work better in the presence of
long running transactions. The ability to tune I/O load etc. was
basically unchanged in 8.4.
Robert Haas wrote:
> I don't have a stake in the ground on what the right settings are, but
> I think it's fair to say that if you vacuum OR analyze much less
> frequently than what we recommend by default, it might break.

I think the default settings are essentially minimum recommended
frequencies. They aren't too terrible for the giant data warehouse
case Josh was suggesting they came from--waiting until there's 20%
worth of dead stuff before kicking off an intensive vacuum is OK when
vacuum is expensive and you're mostly running big queries anyway. And
for smaller tables, the threshold helps it kick in a little earlier.
It's unlikely anyone wants to *increase* those, so that autovacuum
runs even less; out of the box it's not tuned to run very often at
all.

If anything, I'd expect people to want to increase how often it runs,
for tables where much less than 20% dead is a problem. The most common
situation I've seen where that's the case is when you have a hotspot
of heavily updated rows in a large table, and this may match some of
the situations that Robert was alluding to seeing. Let's say you have
a big table where 0.5% of the users each update their respective
records heavily, averaging 30 times each. That's only going to result
in 15% dead rows, so no autovacuum. But latency for those users will
suffer greatly, because they might have to do lots of seeking around
to get their little slice of the data.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.us
Greg Smith <greg@2ndquadrant.com> writes:
> If anything, I'd expect people to want to increase how often it runs,
> for tables where much less than 20% dead is a problem. The most common
> situation I've seen where that's the case is when you have a hotspot of
> heavily updated rows in a large table, and this may match some of the
> situations that Robert was alluding to seeing. Let's say you have a big
> table where 0.5% of the users each update their respective records
> heavily, averaging 30 times each. That's only going to result in 15%
> dead rows, so no autovacuum. But latency for those users will suffer
> greatly, because they might have to do lots of seeking around to get
> their little slice of the data.

With a little luck, HOT will alleviate that case, since HOT updates
can be reclaimed without running vacuum per se. I agree there's a risk
there though.

Now that partial vacuum is available, it'd be a real good thing to
revisit these numbers.

regards, tom lane
On Sat, May 1, 2010 at 12:13 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Fri, Apr 30, 2010 at 4:50 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> Which is the opposite of my experience; currently we have several
>> clients who have issues which required more-frequent analyzes on
>> specific tables. Before 8.4, vacuuming more frequently, especially on
>> large tables, was very costly; vacuum takes a lot of I/O and CPU. Even
>> with 8.4 it's not something you want to increase without thinking about
>> the tradeoff
>
> Actually I would think that statement would be that before 8.3
> vacuum was much more expensive. The changes to vacuum for 8.4 mostly
> had to do with moving the FSM to disk, making seldom-vacuumed tables
> easier to keep track of, and making autovac work better in the
> presence of long running transactions. The ability to tune I/O load
> etc. was basically unchanged in 8.4.

What about http://www.postgresql.org/docs/8.4/static/storage-vm.html ?

...Robert
On Sat, May 1, 2010 at 1:08 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, May 1, 2010 at 12:13 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> On Fri, Apr 30, 2010 at 4:50 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>> Which is the opposite of my experience; currently we have several
>>> clients who have issues which required more-frequent analyzes on
>>> specific tables. Before 8.4, vacuuming more frequently, especially on
>>> large tables, was very costly; vacuum takes a lot of I/O and CPU. Even
>>> with 8.4 it's not something you want to increase without thinking about
>>> the tradeoff
>>
>> Actually I would think that statement would be that before 8.3
>> vacuum was much more expensive. The changes to vacuum for 8.4 mostly
>> had to do with moving the FSM to disk, making seldom-vacuumed tables
>> easier to keep track of, and making autovac work better in the
>> presence of long running transactions. The ability to tune I/O load
>> etc. was basically unchanged in 8.4.
>
> What about http://www.postgresql.org/docs/8.4/static/storage-vm.html ?

That really only has an effect no tables that aren't updated very
often. Unless you've got a whole bunch of those, it's not that big of
a deal.
On Sat, May 1, 2010 at 1:17 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Sat, May 1, 2010 at 1:08 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> What about http://www.postgresql.org/docs/8.4/static/storage-vm.html ?
>
> That really only has an effect no tables that aren't updated very
> often. Unless you've got a whole bunch of those, it's not that big of
> a deal.

sigh, s/ no / on /

Anyway, my real point was that the big improvements that made vacuum
so much better came in 8.3, with HOT updates and multi-threaded vacuum
(that might have shown up in 8.2 even). 8.3 was a huge improvement and
compelling upgrade from 8.1 for me.
On Sat, May 1, 2010 at 1:11 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> Robert Haas wrote:
>> I don't have a stake in the ground on what the right settings are, but
>> I think it's fair to say that if you vacuum OR analyze much less
>> frequently than what we recommend by default, it might break.
>
> I think the default settings are essentially minimum recommended
> frequencies. They aren't too terrible for the giant data warehouse case
> Josh was suggesting they came from--waiting until there's 20% worth of
> dead stuff before kicking off an intensive vacuum is OK when vacuum is
> expensive and you're mostly running big queries anyway. And for smaller
> tables, the threshold helps it kick in a little earlier. It's unlikely
> anyone wants to *increase* those, so that autovacuum runs even less;
> out of the box it's not tuned to run very often at all.
>
> If anything, I'd expect people to want to increase how often it runs,
> for tables where much less than 20% dead is a problem. The most common
> situation I've seen where that's the case is when you have a hotspot of
> heavily updated rows in a large table, and this may match some of the
> situations that Robert was alluding to seeing. Let's say you have a big
> table where 0.5% of the users each update their respective records
> heavily, averaging 30 times each. That's only going to result in 15%
> dead rows, so no autovacuum. But latency for those users will suffer
> greatly, because they might have to do lots of seeking around to get
> their little slice of the data.

For me it's more that my applications are typically really fast, and
when they run at half-speed people think "oh, it's slow today" but
they can still work and attribute the problem to their computer, or
the network, or something. When they slow down by like 10x then they
file a bug.
I'm typically dealing with a situation where the whole database can be
easily cached in RAM and the CPU is typically 90% idle, which cushions
the blow quite a bit. A few months ago someone reported that "the
portal was slow" and the problem turned out to be that the database
was bloated by in excess of a factor of 10 due to having blown out the
free space map. I wasn't a regular user of that system at that time so
hadn't had the opportunity to notice myself.

...Robert