Discussion: Potential autovacuum optimization: new tables
Folks,

One chronic problem users encounter is this one:

1. User creates new table
2. User inserts 45 records into new table.
3. Time passes.
4. User creates a query which joins against new table.
5. Planner uses estimate of 1000 rows for the new table.
6. User gets very bad query plan.

Now, I look at this, and ask myself: why didn't autoanalyze kick in at
step 3?  After all, this was a table which had 0 rows, we inserted 45
rows, making the table infinitely larger.  It should have got on the
autoanalyze list, no?

Well, no.  It seems that any table with less than
autovacuum_analyze_threshold rows will NEVER be autoanalyzed.  Ever.

postgres=# create table thirty_rows ( val int );
CREATE TABLE
postgres=# insert into thirty_rows select i from generate_series(1,30) as gs(i);
INSERT 0 30
postgres=# create table onetwenty_rows ( val int );
CREATE TABLE
postgres=# insert into onetwenty_rows select i from generate_series(1,120) as gs(i);
INSERT 0 120
postgres=# create table twocent_rows ( val int );
CREATE TABLE
postgres=# insert into twocent_rows select i from generate_series(1,200) as gs(i);
INSERT 0 200

... wait 5 min ...

postgres=# select relname, last_autoanalyze from pg_stat_user_tables where relname like '%_rows';
    relname     |       last_autoanalyze
----------------+--------------------------------
 thirty_rows    |
 twocent_rows   | 2012-10-12 16:46:45.025647-07
 onetwenty_rows | 2012-10-12 16:46:45.014084-07

postgres=# select * from pg_stats where tablename = 'thirty_rows';
 schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation | most_common_elems | most_common_elem_freqs | elem_count_histogram
(0 rows)

This seems easy to fix.  If a table has no stats and has any write
stats at all, it should automatically go on the autoanalyze list.  Or
if it's easier, one where last_autoanalyze is null.

Objections/complications/alternatives?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
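P.S. A minimal way to see the bogus estimate for yourself (the table
name is invented, and the exact rows= figure varies by version and row
width -- the point is that it comes from a hard-wired default rather
than from real statistics):

    create table brand_new ( val int );
    insert into brand_new select i from generate_series(1,45) as gs(i);
    explain select * from brand_new;
    -- the Seq Scan's rows= estimate is a planner default, because
    -- the table has never been analyzed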
All,

> 1. User creates new table
> 2. User inserts 45 records into new table.
> 3. Time passes.
> 4. User creates a query which joins against new table.
> 5. Planner uses estimate of 1000 rows for the new table.
> 6. User gets very bad query plan.

Because Snowman asked me for an example:

Before ANALYZE on the new table: http://explain.depesz.com/s/36D

After ANALYZE on the new table: http://explain.depesz.com/s/851t

That's a 25X difference in execution time.  This is not the first time
I've seen this issue.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes:
> Now, I look at this, and ask myself: why didn't autoanalyze kick in at
> step 3?  After all, this was a table which had 0 rows, we inserted 45
> rows, making the table infinitely larger.  It should have got on the
> autoanalyze list, no?

> Well, no.  It seems that any table with less than
> autovacuum_analyze_threshold rows will NEVER be autoanalyzed.  Ever.

Yeah ...

> This seems easy to fix.  If a table has no stats and has any write stats
> at all, it should automatically go on the autoanalyze list.  Or if it's
> easier, one where last_autoanalyze is null.

No, it's not that easy.  The question you have to ask is "when has that
initial write burst stopped?".  As an example, if autovacuum happened to
see that table in the instant after CREATE, it might autovacuum it while
it's still empty, and then this rule fails to trigger any further effort.

Personally I've always thought that autovacuum's rules should be based
on a percentage of rows changed, not an absolute threshold (or maybe in
addition to an absolute threshold).  This way, if you create a table and
insert 10 rows, that would make it subject to analyze on-sight, even if
autovac had managed to pass by while it was still empty, because the
percentage-changed is infinite.  Then, if you insert the other 35 rows
you meant to insert, it's *again* subject to autoanalyze on the next
pass, because the percentage-changed is still 350%.

I remember having got voted down on the percentage approach back when
we first put AV into core, but I remain convinced that decision was a
bad one.

			regards, tom lane
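P.S. The check I have in mind is nothing more complicated than this
(illustrative only: the 100% cutoff and the column names are made up,
and we don't currently track rows-changed-since-analyze anywhere
user-visible):

    select case
             when analyzed_rows = 0 then true    -- no stats: percentage is infinite
             when changed_rows::float8 / analyzed_rows > 1.0 then true
             else false
           end as needs_analyze
    from (values (35, 10)) as t(changed_rows, analyzed_rows);
    -- 35 changes against 10-row stats = 350%, so: true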
* Josh Berkus (josh@agliodbs.com) wrote:
> Because Snowman asked me for an example:

Thanks. :)

> That's a 25X difference in execution time.  This is not the first time
> I've seen this issue.

If we can figure out an 'easy' solution to this, I'd definitely vote
for it being back-patched.  Having a table simply never get analyze'd
strikes me as a very bad thing.

	Thanks again,

		Stephen
> No, it's not that easy.  The question you have to ask is "when has that
> initial write burst stopped?".  As an example, if autovacuum happened to
> see that table in the instant after CREATE, it might autovacuum it while
> it's still empty, and then this rule fails to trigger any further effort.

Well, frankly, it would be useful to know it's empty too.  If you
really wanna see that 1000-row default estimate bite you on the tuchas,
try a JOIN against an empty new table.

> Personally I've always thought that autovacuum's rules should be based
> on a percentage of rows changed, not an absolute threshold (or maybe in
> addition to an absolute threshold).  This way, if you create a table and
> insert 10 rows, that would make it subject to analyze on-sight, even if
> autovac had managed to pass by while it was still empty, because the
> percentage-changed is infinite.  Then, if you insert the other 35 rows
> you meant to insert, it's *again* subject to autoanalyze on the next
> pass, because the percentage-changed is still 350%.

> I remember having got voted down on the percentage approach back when
> we first put AV into core, but I remain convinced that decision was a
> bad one.

Yeah, I was one of the ones voting against you.  The reason not to have
percentage-only is for small tables.  Imagine that you have a table
with 18 rows, and analyze_threshold is 0 and analyze_scale_factor is 0.1.

Add two rows --> ANALYZE
UPDATE two rows --> ANALYZE
UPDATE three more rows --> ANALYZE
DELETE three rows --> ANALYZE

Without a threshold, any table under 100 rows which gets UPDATEs would
be continually in the autoanalyze queue, which would not be beneficial;
the churn in pg_statistic alone would be detrimental.

I guess the real issue is that we tried to solve the issue of ANALYZEing
tables of radically different sizes with a simple heuristic of
threshold+scale_factor, and that's proving too simple for actual
production sites.  The other end where autoanalyze often falls down is
the high end (tables with a million rows).

Can anyone think of a new heuristic which doesn't involve adding 2-6
new GUCS knobs?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
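P.S. For anyone following along without the docs handy, the current
trigger rule is:

    analyze threshold = autovacuum_analyze_threshold
                        + autovacuum_analyze_scale_factor * reltuples

so with the hypothetical settings above (threshold = 0, scale_factor =
0.1), the 18-row table re-qualifies after every

    select ceil(0 + 0.1 * 18) as changes_to_trigger;  -- 2 changed rows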
Josh Berkus <josh@agliodbs.com> writes:
>> I remember having got voted down on the percentage approach back when
>> we first put AV into core, but I remain convinced that decision was a
>> bad one.

> Yeah, I was one of the ones voting against you.  The reason not to have
> percentage-only is for small tables.  Imagine that you have a table with
> 18 rows, and analyze_threshold is 0 and analyze_scale_factor is 0.1.

[ shrug... ]  You're attacking a straw man, or more precisely putting
words into my mouth about what the percentage-based thresholds might be.
Notice the examples I gave involved update percentages quite far north
of 100%.  It's possible and maybe likely that we need a sliding scale.

Also, I don't necessarily accept the conclusion you seem to be drawing,
that it's okay to have complete turnover of a small table and not redo
its stats.  If you don't like the current behavior when there's no
stats, why would you like the behavior when there are some stats but
they no longer have the remotest relationship to reality?

> Can anyone think of a new heuristic which doesn't involve adding 2-6 new
> GUCS knobs?

The increased number of knobs may be a problem, but I don't think we can
avoid having more.  Your own complaint is that the current design is too
simplistic.  Replacing it with a different but equally simplistic design
probably won't help much.

			regards, tom lane
> [ shrug... ]  You're attacking a straw man, or more precisely putting
> words into my mouth about what the percentage-based thresholds might be.
> Notice the examples I gave involved update percentages quite far north
> of 100%.  It's possible and maybe likely that we need a sliding scale.

Yes, or a logarithmic one.

> Also, I don't necessarily accept the conclusion you seem to be drawing,
> that it's okay to have complete turnover of a small table and not redo
> its stats.

I'm not drawing that conclusion.  I'm explaining the logic of
autovacuum_analyze_threshold.  That logic actually works pretty well
for tables between 200 rows and 200,000 rows.  It's outside of those
boundaries where it starts to break down.

> The increased number of knobs may be a problem, but I don't think we can
> avoid having more.  Your own complaint is that the current design is too
> simplistic.  Replacing it with a different but equally simplistic design
> probably won't help much.

Well, we could do something which involves no GUCS at all, which would
be my favorite approach.  For example, Frost and I were discussing this
on IRC.  Imagine if autovac threshold were set according to a simple
log function, resulting in very small tables getting analyzed after
100% changes, very large tables getting analyzed after 0.1% changes,
and everyone else in between?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
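P.S. To make that concrete, one possible shape for the curve -- the
constants here are invented, purely to illustrate; this slides the
effective scale factor from 100% at 100 rows down to 0.1% at 100M rows:

    select n as reltuples,
           1.0 / power(10, (log(n) - 2) / 2) as effective_scale_factor
    from unnest(array[1e2, 1e4, 1e6, 1e8]) as n;
    --        100    1.000   (100%)
    --      10000    0.100   (10%)
    --    1000000    0.010   (1%)
    --  100000000    0.001   (0.1%)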
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> [ shrug... ]  You're attacking a straw man, or more precisely putting
> words into my mouth about what the percentage-based thresholds might be.
> Notice the examples I gave involved update percentages quite far north
> of 100%.  It's possible and maybe likely that we need a sliding scale.

I was just discussing such a sliding scale approach w/ Josh on IRC, my
thinking was that we could use a logarithmic approach based on table
size.

> Also, I don't necessarily accept the conclusion you seem to be drawing,
> that it's okay to have complete turnover of a small table and not redo
> its stats.  If you don't like the current behavior when there's no
> stats, why would you like the behavior when there are some stats but
> they no longer have the remotest relationship to reality?

Josh's concern is about autovacuum causing lots of stats churn, which is
understandable, we don't want it constantly rescanning a table, but
perhaps we could use some kind of threshold for preventing autovac from
rescanning a table it just scanned?  Note that I did *not* say 'GUC',
but I don't know what the 'right' answer is for how frequently is
good-but-not-too-frequent.  I'd also like to try and avoid adding GUCs.

	Thanks,

		Stephen
On Oct 12, 2012, at 22:13, Stephen Frost <sfrost@snowman.net> wrote:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> [ shrug... ]  You're attacking a straw man, or more precisely putting
>> words into my mouth about what the percentage-based thresholds might be.
>> Notice the examples I gave involved update percentages quite far north
>> of 100%.  It's possible and maybe likely that we need a sliding scale.
>
> I was just discussing such a sliding scale approach w/ Josh on IRC, my
> thinking was that we could use a logarithmic approach based on table
> size.
>
>> Also, I don't necessarily accept the conclusion you seem to be drawing,
>> that it's okay to have complete turnover of a small table and not redo
>> its stats.  If you don't like the current behavior when there's no
>> stats, why would you like the behavior when there are some stats but
>> they no longer have the remotest relationship to reality?
>
> Josh's concern is about autovacuum causing lots of stats churn, which is
> understandable, we don't want it constantly rescanning a table, but
> perhaps we could use some kind of threshold for preventing autovac from
> rescanning a table it just scanned?  Note that I did *not* say 'GUC',
> but I don't know what the 'right' answer is for how frequently is
> good-but-not-too-frequent.  I'd also like to try and avoid adding GUCs.

Instead of global could you attach an interface function to the table
and have the auto-analyzer call that function to basically ask the
table whether it needs to be analyzed?  Still need to deal with
defaults and provide a decent supply of built-in algorithms, but at
least the system can be made tunable.  The default algorithm could
maybe just hand off to a table-size-specific handler.  The create
table and alter table commands could be used to change the assigned
algorithm if desired, and new ones could be supplied via extensions.

The 1000 row default seems unusual at first glance and contributes to
the problem described.  It is likely that the first INSERT sent
following the create table is going to be a bulk load if the table is
going to have many rows.  In the case where rows are inserted
individually it is likely that the expected row count will be closer
to 1 than 1000.

One useful algorithm to provide the user is analyze-on-insert and,
though maybe less so, analyze-on-update, so that any insert/update
causes the table to be re-analyzed.  Not a good default but, combined
with "delayed analyze" logic to establish a minimum frequency, it is a
possible option for some use cases.

Temporary table creation should have special attention given if
changes are going to be made here.

Another idea is to have system after-[command] trigger(s) that can be
used to call analyze without waiting for the auto-vacuum process.
Provide some way for CREATE/ALTER TABLE and maybe auto-vacuum to
enable and disable the trigger.

David J.
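P.S. Strictly hypothetical syntax -- none of this exists today -- but
the kind of thing I am picturing would look something like:

    -- attach a user-supplied policy function (invented reloption):
    ALTER TABLE queue_table SET (analyze_policy = 'my_schema.queue_policy');

    -- a policy function might look like this (made-up signature:
    -- current tuple count in, "analyze me now?" out):
    CREATE FUNCTION my_schema.queue_policy(reltuples bigint, changed bigint)
    RETURNS boolean
    LANGUAGE sql AS $$ SELECT changed > 0 AND changed * 10 >= reltuples $$;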
* David Johnston (polobo@yahoo.com) wrote:
> Instead of global could you attach an interface function to the table
> and have the auto-analyzer call that function to basically ask the
> table whether it needs to be analyzed?  Still need to deal with
> defaults and provide a decent supply of built-in algorithms, but at
> least the system can be made tunable.  The default algorithm could
> maybe just hand off to a table-size-specific handler.  The create
> table and alter table commands could be used to change the assigned
> algorithm if desired, and new ones could be supplied via extensions.

For my part, while that's certainly an interesting idea, it's far more
complicated than even providing GUCs and the idea is to make PG just
"do it right", not to offer the user more ways to get it wrong...

	Thanks,

		Stephen
> For my part, while that's certainly an interesting idea, it's far more
> complicated than even providing GUCs and the idea is to make PG just
> "do it right", not to offer the user more ways to get it wrong...

Yes, please let's not replace the existing too-simplistic knobs with
giant complicated gadgets nobody, including us, understands.

For my part, over the last 3 years of consulting and dealing with
postgresql.conf settings for more than 140 clients:

* only 10% of them ever touched the autoanalyze settings at all
* of the ~~ 14 who did:
  * 1 improved the tuning of their database
  * 3 of them messed up autoanalyze, causing stats and vacuum issues
  * ~~ 10 had no measurable effect

... so you'll understand when I say that I don't think ease of
knob-twiddling is a priority for autoanalyze design.  In fact, I'd say
that removing the knobs entirely is a design goal.

I've been going over the notes and email archives from the period where
Matt O'Connor and I arrived at the current settings.  All of our
testing was devoted to autovacuum, not autoanalyze.  The
threshold+scale_factor design works pretty well for autovacuum; it
prevents us from constantly vacuuming small tables, or large tables
with less than 20% dead rows.  And I did extensive testing using DBT2
on OSDL to set the current defaults.

Our mistake was assuming that the same formula which worked well for
vacuum would work well for analyze.  And since the DBT2 database has
entirely medium-sized tables full of random data, no shortcomings in
this thinking showed up in the tests.  Since the only counterproposal
at the time was to have a flat percentage without a threshold, we got
the current defaults.

So, problem #1 is coming up with a mathematical formula.  My initial
target values are in terms of # of rows in the table vs. # of writes
before analyze is triggered:

1 : 3
10 : 5
100 : 10
1000 : 100
100000 : 2000
1000000 : 5000
10000000 : 25000
100000000 : 100000

.... etc.  So problem #1 is a mathematical formula which gives this
kind of curve.  I've tried some solution-seeking software, but I don't
know how to use it well enough to get something useful.

Second problem is actually testing the result.  At this point, we
don't have any performance tests which create anything other than
fairly randomly distributed data, which doesn't tend to show up any
issues in analyze.  We really need a performance test where new data
is skewed and unbalanced, including tables of radically different
sizes, and where we're set up to measure the level of inaccuracy in
query statistics.

Hmmm.  Actually, for measuring the inaccuracy, I have some tools thanks
to David Wheeler.  But not to generate the test in the first place.

--Josh Berkus
On Sat, Oct 13, 2012 at 12:49 PM, Joshua Berkus <josh@agliodbs.com> wrote:
>
> So, problem #1 is coming up with a mathematical formula.  My initial
> target values are in terms of # of rows in the table vs. # of writes
> before analyze is triggered:
>
> 1 : 3
> 10 : 5
> 100 : 10
> 1000 : 100
> 100000 : 2000
> 1000000 : 5000
> 10000000 : 25000
> 100000000 : 100000
>
> .... etc.  So problem #1 is a mathematical formula which gives this
> kind of curve.  I've tried some solution-seeking software, but I don't
> know how to use it well enough to get something useful.

That is close to a power law, where best fit is about
"threshold = 1.5 * (rows ** 0.6)"

rows        yours       powerfit
1.00E+00    3.00E+00    1.50E+00
1.00E+01    5.00E+00    5.97E+00
1.00E+02    1.00E+01    2.38E+01
1.00E+03    1.00E+02    9.46E+01
1.00E+05    2.00E+03    1.50E+03
1.00E+06    5.00E+03    5.97E+03
1.00E+07    2.50E+04    2.38E+04
1.00E+08    1.00E+05    9.46E+04

If you want something more natural, reduce the exponent from 0.6 to
0.5 so it becomes the square root.

I have no opinion on the suitability of this, I'm just crunching the
numbers for you.

Cheers,

Jeff
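P.S. The same fit as a query, in case anyone wants to play with the
constants in psql (this is just my arithmetic above, nothing more):

    select n as rows,
           ceil(1.5 * power(n, 0.6)) as powerfit
    from unnest(array[1e0, 1e1, 1e2, 1e3, 1e5, 1e6, 1e7, 1e8]) as n;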
Joshua Berkus <josh@agliodbs.com> writes:
> I've been going over the notes and email archives from the period
> where Matt O'Connor and I arrived at the current settings.  All of our
> testing was devoted to autovacuum, not autoanalyze.

> Our mistake was assuming that the same formula which worked well for
> vacuum would work well for analyze.

Ah.  Okay, maybe we can agree that that wasn't a good idea.

> So, problem #1 is coming up with a mathematical formula.  My initial
> target values are in terms of # of rows in the table vs. # of writes
> before analyze is triggered:

> 1 : 3
> 10 : 5
> 100 : 10
> 1000 : 100
> 100000 : 2000
> 1000000 : 5000
> 10000000 : 25000
> 100000000 : 100000

I don't really see that we need to bend over backwards to exactly match
some data points that you made up out of thin air.  How about
ceil(sqrt(N)) to start with?

			regards, tom lane
> Ah.  Okay, maybe we can agree that that wasn't a good idea.

Oh, I'd say there's no question it was a mistake.  We just didn't have
the data at the time to realize it.

> I don't really see that we need to bend over backwards to exactly
> match some data points that you made up out of thin air.  How about
> ceil(sqrt(N)) to start with?

We can start with anything, including Jeff Janes's equation (for my
part, I think sqrt(N) will result in analyzing very large tables a bit
too often).  The tough part will be coming up with some way to test it.

--Josh
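P.S. To put numbers on "a bit too often", here's the high end of the
two proposals side by side (just me running the arithmetic, assuming
the threshold counts changed rows):

    select n as rows,
           ceil(sqrt(n)) as sqrt_n,
           ceil(1.5 * power(n, 0.6)) as powerfit
    from unnest(array[1e6, 1e8]) as n;
    --    1000000     1000     5972
    --  100000000    10000    94645

With ceil(sqrt(N)), a 100M-row table would be re-analyzed after only
0.01% of it changed.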
On Sat, Oct 13, 2012 at 3:13 AM, Stephen Frost <sfrost@snowman.net> wrote:
> Josh's concern is about autovacuum causing lots of stats churn, which is
> understandable, we don't want it constantly rescanning a table

I don't think rescanning the table is a big concern.  autovacuum will
only scan as often as it feels like in the first place, and these are
by definition small tables anyways.

Josh's stated concern was about the churn in the stats table.  That
could cause extra vacuums on the stats table, which could be a fairly
substantial table.  Hopefully HOT updates and the visibility map would
protect against that being too bad, though.

-- 
greg
On Sat, Oct 13, 2012 at 3:49 PM, Joshua Berkus <josh@agliodbs.com> wrote:
> For my part, over the last 3 years of consulting and dealing with
> postgresql.conf settings for more than 140 clients:
>
> * only 10% of them ever touched the autoanalyze settings at all
> * of the ~~ 14 who did:
>   * 1 improved the tuning of their database
>   * 3 of them messed up autoanalyze, causing stats and vacuum issues
>   * ~~ 10 had no measurable effect
>
> ... so you'll understand when I say that I don't think ease of
> knob-twiddling is a priority for autoanalyze design.  In fact, I'd say
> that removing the knobs entirely is a design goal.

Yeah.  My experience is shorter in time frame, but similar in
composition.

> I've been going over the notes and email archives from the period where
> Matt O'Connor and I arrived at the current settings.  All of our
> testing was devoted to autovacuum, not autoanalyze.  The
> threshold+scale_factor design works pretty well for autovacuum; it
> prevents us from constantly vacuuming small tables, or large tables
> with less than 20% dead rows.  And I did extensive testing using DBT2
> on OSDL to set the current defaults.

However, I disagree with this.  I think that things have changed a lot
in 8.4+, because of the visibility map.  Extra vacuuming is not nearly
so expensive as it used to be, and essentially 100% of the vacuum
problems I see are caused by not vacuuming frequently enough, either
because (1) when vacuum does eventually run it imposes a gigantic
server load for a really long time or (2) too much bloat builds up
between vacuum runs.  If these settings were adjusted in an exactly
middle-of-the-road fashion, I ought to see 50% of the problems from
vacuuming too often and the other 50% from not vacuuming often enough.
The reality is nothing like that; it's all on one side.

As I've said before (and I believe Simon has said similar things), I
think we should be vacuuming the heap much more often but only doing
index vac when we accumulate enough dead tuples to justify the cost of
the index scan.  Pruning the heap is cheap and very effective.

> Our mistake was assuming that the same formula which worked well for
> vacuum would work well for analyze.  And since the DBT2 database has
> entirely medium-sized tables full of random data, no shortcomings in
> this thinking showed up in the tests.  Since the only counterproposal
> at the time was to have a flat percentage without a threshold, we got
> the current defaults.
>
> So, problem #1 is coming up with a mathematical formula.  My initial
> target values are in terms of # of rows in the table vs. # of writes
> before analyze is triggered:
>
> 1 : 3
> 10 : 5
> 100 : 10
> 1000 : 100
> 100000 : 2000
> 1000000 : 5000
> 10000000 : 25000
> 100000000 : 100000
>
> .... etc.  So problem #1 is a mathematical formula which gives this
> kind of curve.  I've tried some solution-seeking software, but I don't
> know how to use it well enough to get something useful.

That's a pretty funny-looking curve, because it doubles between 10 and
100 but then increases 10x between 100 and 1000.  It's similarly
erratic further on.  But I do agree that some kind of log scale might
be appropriate.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert,

> As I've said before (and I believe Simon has said similar things), I
> think we should be vacuuming the heap much more often but only doing
> index vac when we accumulate enough dead tuples to justify the cost of
> the index scan.  Pruning the heap is cheap and very effective.

You are probably correct, especially since the testing which set the
current thresholds was pre-HOT, even.  We don't have a vacuum method
which allows us to vacuum the heap but not the indexes, though, do we?

Note that the only time I encounter chronic autovacuum issues, it's
for "problem" tables like queue tables, and no default is going to
cope with those.

Anyway, big thing is, we need to be able to test this.

> That's a pretty funny-looking curve, because it doubles between 10 and
> 100 but then increases 10x between 100 and 1000.  It's similarly
> erratic further on.  But I do agree that some kind of log scale might
> be appropriate.

Yeah, it's mainly the endpoints I'm sure of based on experience.  The
middle should be a smooth curve between them, if possible.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On Sat, Oct 13, 2012 at 3:49 PM, Joshua Berkus <josh@agliodbs.com> wrote:
> So, problem #1 is coming up with a mathematical formula.  My initial
> target values are in terms of # of rows in the table vs. # of writes
> before analyze is triggered:
>
> 1 : 3
> 10 : 5
> 100 : 10
> 1000 : 100
> 100000 : 2000
> 1000000 : 5000
> 10000000 : 25000
> 100000000 : 100000

Do we necessarily care about smoothness?  If we don't at all, then
this would be fine:

func powerlaw(tuples int) int {
	if tuples < 10 { return 3 }
	if tuples < 100 { return 5 }
	if tuples < 1000 { return 10 }
	if tuples < 100000 { return 100 }
	if tuples < 1000000 { return 2000 }
	if tuples < 10000000 { return 5000 }
	if tuples < 100000000 { return 25000 }
	return 100000
}

If we want smoothness within the ranges, this is a piecewise linear
representation of your table:

func powerlaw2(tuples int) int {
	if tuples < 10 { return 3 }
	if tuples < 100 { return 5 + 5*(tuples-10)/90 }
	if tuples < 1000 { return 10 + 90*(tuples-100)/900 }
	if tuples < 100000 { return 100 + 1900*(tuples-1000)/99000 }
	if tuples < 1000000 { return 2000 + 3000*(tuples-100000)/900000 }
	if tuples < 10000000 { return 5000 + 20000*(tuples-1000000)/9000000 }
	if tuples < 100000000 { return 25000 + 75000*(tuples-10000000)/90000000 }
	return 100000
}

That's in Go, but there shouldn't be anything too unfamiliar looking
about it :-).  It would be nice to have a simpler functional
representation, but the above is by no means heinous, and it's not
verbose beyond reason.

-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"
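P.S. On "a simpler functional representation": interpolating between
just the two endpoints of the table in log-log space collapses the
whole thing into a single power law.  This is my own sketch, fitted
only to (1, 3) and (1e8, 100000), so the middle rows land near but not
exactly on Josh's numbers:

    select n as tuples,
           ceil(3 * power(n, log(100000.0 / 3) / 8)) as threshold
    from unnest(array[1e0, 1e2, 1e4, 1e6, 1e8]) as n;
    -- 1 -> 3, 1e4 -> ~548, 1e8 -> ~100000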
On Mon, Oct 15, 2012 at 12:06:27AM +0100, Greg Stark wrote:
> On Sat, Oct 13, 2012 at 3:13 AM, Stephen Frost <sfrost@snowman.net> wrote:
> > Josh's concern is about autovacuum causing lots of stats churn, which is
> > understandable, we don't want it constantly rescanning a table
>
> I don't think rescanning the table is a big concern.  autovacuum will
> only scan as often as it feels like in the first place, and these are
> by definition small tables anyways.
>
> Josh's stated concern was about the churn in the stats table.  That
> could cause extra vacuums on the stats table, which could be a fairly
> substantial table.  Hopefully HOT updates and the visibility map would
> protect against that being too bad, though.

Added to TODO:

	Improve autovacuum tuning

-- 
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ It's impossible for everything to be true. +