Discussion: Why won't it index scan?
Can someone help me understand why the 8.1.2 query below is using a seq
scan instead of an index scan? All relevant columns appear to be indexed
and all tables vacuum analyzed.

$ psql -c "analyze verbose patient"
INFO:  analyzing "public.patient"
INFO:  "patient": scanned 3000 of 3353 pages, containing 117685 live rows
and 5471 dead rows; 3000 rows in sample, 131533 estimated total rows
ANALYZE

$ psql -c "select count(1) from patient"
 count
--------
 131661
(1 row)

$ psql -c "analyze verbose visit"
INFO:  analyzing "public.visit"
INFO:  "visit": scanned 3000 of 19985 pages, containing 58520 live rows
and 7045 dead rows; 3000 rows in sample, 389841 estimated total rows
ANALYZE

$ psql -c "select count(1) from visit"
 count
--------
 389102
(1 row)

$ psql -c "explain analyze select * from visit inner join patient on patient.key = visit.patient_key where nursestation_key = '40';"
                                                            QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=27724.37..28457.01 rows=4956 width=421) (actual time=1819.993..2004.802 rows=6 loops=1)
   Merge Cond: ("outer".patient_key = "inner"."key")
   ->  Sort  (cost=11859.31..11871.70 rows=4956 width=209) (actual time=0.416..0.426 rows=6 loops=1)
         Sort Key: visit.patient_key
         ->  Bitmap Heap Scan on visit  (cost=69.35..11555.14 rows=4956 width=209) (actual time=0.187..0.245 rows=6 loops=1)
               Recheck Cond: (nursestation_key = 40)
               ->  Bitmap Index Scan on idx_visit_nursestation_key  (cost=0.00..69.35 rows=4956 width=0) (actual time=0.158..0.158 rows=6 loops=1)
                     Index Cond: (nursestation_key = 40)
   ->  Sort  (cost=15865.05..16194.21 rows=131661 width=212) (actual time=1768.501..1856.334 rows=61954 loops=1)
         Sort Key: patient."key"
         ->  Seq Scan on patient  (cost=0.00..4669.61 rows=131661 width=212) (actual time=0.010..355.299 rows=131661 loops=1)
 Total runtime: 2046.323 ms
(12 rows)

                        Table "public.patient"
      Column       |            Type             |                       Modifiers
-------------------+-----------------------------+-------------------------------------------------------
 key               | integer                     | not null default nextval('patient_key_seq'::regclass)
...
Indexes:
    "pk_patient" PRIMARY KEY, btree ("key")
...

                         Table "public.visit"
        Column         |            Type             |      Modifiers
-----------------------+-----------------------------+-----------------
 patient_key           | integer                     | not null
 nursestation_key      | integer                     |
...
Indexes:
    "idx_visit_nursestation_key" btree (nursestation_key)
    "idx_visit_patient_key" btree (patient_key)
Ed L. wrote:
> Can someone help me understand why the 8.1.2 query below is
> using a seq scan instead of an index scan?

Because the planner thinks a sequential scan would be faster than an
index scan - in many situations, this is the case. See the FAQ:

http://www.postgresql.org/docs/faqs.FAQ.html#item4.6

- John Burger
  MITRE
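John's cost argument can be sketched numerically. The page-cost constants
below are the planner's documented defaults and the row/page counts come
from Ed's output, but the two cost functions are deliberately crude toy
formulas, not the planner's real ones:

```python
# Toy illustration of "seq scan is sometimes cheaper": at ~5000 estimated
# matches, thousands of random index probes into patient look costlier on
# paper than one sequential pass; at the true 6 matches, they do not.

SEQ_PAGE_COST = 1.0      # planner default cost of a sequential page read
RANDOM_PAGE_COST = 4.0   # planner default cost of a random page read
CPU_TUPLE_COST = 0.01    # planner default per-row CPU cost

def seq_scan_cost(pages, rows):
    return SEQ_PAGE_COST * pages + CPU_TUPLE_COST * rows

def index_probe_cost(matching_rows):
    # Crude assumption: each probe touches ~2 random pages (index + heap).
    return matching_rows * 2 * RANDOM_PAGE_COST

# patient: 3353 pages, 131661 rows (from Ed's psql output)
seq = seq_scan_cost(3353, 131661)
print(seq)                     # 4669.61 -- the exact cost in Ed's plan
print(index_probe_cost(4956))  # 39648: at the *estimated* 4956 matches,
                               # probing loses, so seq scan looks right
print(index_probe_cost(6))     # 48: at the *actual* 6 matches, it wins
```

The interesting part is that both plans are "correct" given their inputs;
only the 4956-vs-6 row estimate is wrong, which is what Tom points out
downthread.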
On Wednesday May 17 2006 10:37 am, Ed L. wrote:
> Can someone help me understand why the 8.1.2 query below is
> using a seq scan instead of an index scan? All relevant
> columns appear to be indexed and all tables vacuum analyzed.
>
> $ psql -c "explain analyze select * from visit inner join
> patient on patient.key = visit.patient_key where
> nursestation_key = '40';"
>
> [EXPLAIN ANALYZE output quoted in full upthread: Merge Join,
> Seq Scan on patient, Total runtime: 2046.323 ms]

Increasing statistics target yielded index scan.

How can I best find optimal statistics target to ensure 100%
index scan?

Ed
On Wednesday May 17 2006 11:44 am, Ed L. wrote:
> On Wednesday May 17 2006 10:37 am, Ed L. wrote:
> > Can someone help me understand why the 8.1.2 query below is
> > using a seq scan instead of an index scan? All relevant
> > columns appear to be indexed and all tables vacuum analyzed.
> >
> > [EXPLAIN ANALYZE output quoted in full upthread]
>
> Increasing statistics target yielded index scan.
>
> How can I best find optimal statistics target to ensure 100%
> index scan?

I'm trying to understand what happened here, and I have a theory.

There are 389K rows total, and 262K rows with a null indexed value.
There are 15164 non-null rows newer than those null rows. When stats
target is set to 50 or less, analyze scans 15,000 rows or less. If it
scans the newest rows/pages first, then is it possible it never sees any
hint of the 262K null rows, and thus ends up with skewed stats that yield
seq scans when idx scan is in order? If stats target is > 50, analyze
begins to include non-null rows in the stat sample, yielding idx scans.

Also, I see the most_common_vals array is not growing linearly with the
stats target as the docs seem to suggest. I have 34 unique values, so
with stats target >= 34, I'd expect the most_common_vals array to have
34 values, but it has 8.

Ed
"Ed L." <pgsql@bluepolka.net> writes:
> I'm trying to understand what happened here, and I have a theory.

The problem is the horrid misestimation of the selectivity of
"nursestation_key = 40":

   ->  Bitmap Index Scan on idx_visit_nursestation_key
       (cost=0.00..69.35 rows=4956 width=0)
       (actual time=0.158..0.158 rows=6 loops=1)
         Index Cond: (nursestation_key = 40)

When you're off by a factor of 800+ on the number of matching rows,
you're going to arrive at a less than optimal plan. Increasing the stats
target on visit.nursestation_key would be the solution.

> There are 389K rows total, and 262K rows with a null indexed
> value. There are 15164 non-null rows newer than those null
> rows. When stats target is set to 50 or less, analyze scans
> 15,000 rows or less. If it scans the newest rows/pages first,
> then is it possible it never sees any hint of the 262K null
> rows, and thus ends up with skewed stats that yield seq scans
> when idx scan is in order?

ANALYZE goes to some considerable trouble to make sure it gets an
unbiased random sample. With those numbers it would see an expected 500+
of the non-null rows; a sample containing none at all would be highly
improbable.

> Also, I see the most_common_vals array is not growing linearly
> with the stats target as the docs seem to suggest. I have 34
> unique values, so with stats target >= 34, I'd expect
> most_common_vals array to have 34 values, but it has 8.

To get into most_common_vals, a value has to occur more than once in the
sample. Given the situation you have, it's not surprising that not all
the possible values got into the stats.

			regards, tom lane
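Tom's "500+" figure checks out under the assumption of a uniform random
sample. The row counts are from Ed's message, and the 15,000-row sample
matches Ed's "stats target 50 or less" observation; this is only a
back-of-the-envelope sketch:

```python
import math

N = 389841   # estimated total rows in visit (from ANALYZE VERBOSE)
K = 15164    # newer non-null rows Ed suspected the sample never sees
n = 15000    # rows sampled at a statistics target of 50

# Expected number of the K rows landing in a uniform sample of n rows:
expected = n * K / N
print(round(expected))   # 583, i.e. Tom's "expected 500+"

# Probability the sample contains *none* of the K rows (hypergeometric
# draw without replacement), computed in log space to avoid underflow:
log10_p = sum(math.log10((N - K - i) / (N - i)) for i in range(n))
print(log10_p)           # far below -200: "highly improbable" indeed
```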
On Wednesday May 17 2006 1:26 pm, Tom Lane wrote:
> "Ed L." <pgsql@bluepolka.net> writes:
> > I'm trying to understand what happened here, and I have a
> > theory.
>
> The problem is the horrid misestimation of the selectivity of
> "nursestation_key = 40":
>
>    ->  Bitmap Index Scan on idx_visit_nursestation_key
>        (cost=0.00..69.35 rows=4956 width=0)
>        (actual time=0.158..0.158 rows=6 loops=1)
>          Index Cond: (nursestation_key = 40)
>
> When you're off by a factor of 800+ on the number of matching
> rows, you're going to arrive at a less than optimal plan.
> Increasing the stats target on visit.nursestation_key would be
> the solution.

Ok, makes sense.

So, does this sound like we just happened to get repeatedly horribly
unrepresentative random samples with stats target at 10? Are we at the
mercy of randomness here? Or is there a better preventive procedure we
can follow to systematically identify this kind of situation?

Ed
"Ed L." <pgsql@bluepolka.net> writes:
> So, does this sound like we just happened to get repeatedly
> horribly unrepresentative random samples with stats target at
> 10? Are we at the mercy of randomness here? Or is there a
> better preventive procedure we can follow to systematically
> identify this kind of situation?

I think the real issue is that stats target 10 is too small for large
tables: the samples are just not large enough to support a decent
numdistinct estimate, which is the critical stat for cases such as this
(ie, estimating the number of hits on a value that's not in the
most-common-values list).

The reason the default is currently 10 is just conservatism: it was
already an order of magnitude better than what it replaced (a *single*
representative value) and I didn't feel I had the evidence to justify
higher values. It's become clear that the default ought to be higher,
but I've still got no good fix on a more reasonable default. 100 might
be too much, or then again maybe not.

I encourage you to play around with default_statistics_target and see
what you can learn about quality of estimates vs. planning time.

			regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:
> "Ed L." <pgsql@bluepolka.net> writes:
> > So, does this sound like we just happened to get repeatedly
> > horribly unrepresentative random samples with stats target at
> > 10? Are we at the mercy of randomness here? Or is there a
> > better preventive procedure we can follow to systematically
> > identify this kind of situation?
>
> I think the real issue is that stats target 10 is too small for large
> tables: the samples are just not large enough to support a decent
> numdistinct estimate, which is the critical stat for cases such as this
> (ie, estimating the number of hits on a value that's not in the
> most-common-values list).

There's been some discussion on -hackers about this area. Sadly, the idea
of using samples to calculate numdistinct estimates is fundamentally on
pretty shaky ground.

Whereas a fixed sample size works fine for calculating the distribution
of values, in order to generate consistent precision for numdistinct
estimates the samples would have to be a constant fraction of the table
-- and unfortunately a pretty large fraction at that.

So sadly I think "at the mercy of randomness" is pretty accurate. You'll
have to raise the statistics target as the table grows, and I expect
you'll eventually run into some downsides of large stats targets.

Some better algorithms were posted, but they would require full table
scans during analyze, not just samples.

-- 
greg
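Greg's point about numdistinct is easy to reproduce in a toy simulation.
The skew below is made up (34 distinct values, five of them dominant,
loosely echoing Ed's 34-value column), but the effect is general: a
fixed-size sample routinely fails to see the rare values at all, so any
sample-based distinct count comes out low:

```python
import random

random.seed(42)  # fixed seed so the run is repeatable

# Made-up skewed population: 34 distinct values; values 0-4 are very
# common, values 5-33 are carried by only 25 rows each.
population = []
for v in range(34):
    population.extend([v] * (50000 if v < 5 else 25))

sample = random.sample(population, 3000)   # a stats-target-10-sized sample
distinct_seen = len(set(sample))
print(distinct_seen)   # typically far fewer than 34: rare values unseen
```

Each rare value has only about a 26% chance of appearing in the sample at
all, so the sample usually contains roughly a third of the true distinct
count; no clever estimator can recover values it never observed.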
Sorry for the naive question, but: is there a problem with analyze doing
full table scans? Analyze will not lock anything, will it?

Peter

Greg Stark wrote:
> So sadly I think "at the mercy of randomness" is pretty accurate. You'll
> have to raise the statistics target as the table grows and I expect
> you'll eventually run into some downsides of large stats targets.
>
> Some better algorithms were posted, but they would require full table
> scans during analyze, not just samples.
>
> [rest of quoted thread snipped]
Peter Kovacs wrote:
> Sorry for the naive question, but: is there a problem with analyze doing
> full table scans? Analyze will not lock anything, will it?

It used to do that, but the read overhead was too great.

[rest of quoted thread snipped]

-- 
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
On Wed, May 17, 2006 at 10:29:14PM -0400, Tom Lane wrote:
> The reason the default is currently 10 is just conservatism: it was
> already an order of magnitude better than what it replaced (a *single*
> representative value) and I didn't feel I had the evidence to justify
> higher values. It's become clear that the default ought to be higher,
> but I've still got no good fix on a more reasonable default. 100 might
> be too much, or then again maybe not.
>
> I encourage you to play around with default_statistics_target and see
> what you can learn about quality of estimates vs. planning time.

Is the only downside to a large value planning speed? It seems it would
be hard to bloat that too much, except in cases where people are striving
for millisecond response times, and those folks had better know enough
about tuning to be able to adjust the stats target...

I'd wager that upping the target to 100 would eliminate a lot more
support emails than it creates.

-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
"Jim C. Nasby" <jnasby@pervasive.com> writes:
> Is the only downside to a large value planning speed? It seems it would
> be hard to bloat that too much, except in cases where people are
> striving for millisecond response times, and those folks had better know
> enough about tuning to be able to adjust the stats target...

It would be nice to have some *evidence*, not unsupported handwaving.

			regards, tom lane
> The reason the default is currently 10 is just conservatism: it was
> already an order of magnitude better than what it replaced (a *single*
> representative value) and I didn't feel I had the evidence to justify
> higher values. It's become clear that the default ought to be higher,
> but I've still got no good fix on a more reasonable default. 100 might
> be too much, or then again maybe not.

My hands-on experience is that 10 is plenty except for a minority of
tables within a database. Those tables can be accurately represented
using ALTER TABLE without having to adjust the global default.

That being said, 10 is fairly small and I often find myself setting the
value to at least 250 just to keep it out of my way.

Sincerely,

Joshua D. Drake

-- 
   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
            http://www.commandprompt.com/
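The per-column knob Joshua is describing is ALTER TABLE ... SET
STATISTICS. A sketch using the table and column names from Ed's thread
(the 250 is Joshua's habit, not a recommendation):

```sql
-- Raise the statistics target for just the skewed column, then re-analyze.
ALTER TABLE visit ALTER COLUMN nursestation_key SET STATISTICS 250;
ANALYZE visit;

-- Inspect what the planner now believes about the column:
SELECT n_distinct, null_frac, most_common_vals
  FROM pg_stats
 WHERE tablename = 'visit' AND attname = 'nursestation_key';
```

This avoids paying the larger ANALYZE and planning costs on every column
of every table just to fix one badly estimated predicate.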
Tom Lane wrote:
> "Jim C. Nasby" <jnasby@pervasive.com> writes:
> > Is the only downside to a large value planning speed? It seems it
> > would be hard to bloat that too much, except in cases where people
> > are striving for millisecond response times, and those folks had
> > better know enough about tuning to be able to adjust the stats
> > target...
>
> It would be nice to have some *evidence*, not unsupported handwaving.

For that you'd need a large enough sample of the statistics sizes people
use/need. To me it has always been a bit vague under what conditions I'd
need to change statistics sizes for tables, and I imagine I'm not the
only one. That makes it all the harder to determine a good default value.

I suppose it'd be useful to have some kind of measurement toolkit that
people can run on their databases to collect statistics about what
statistics sizes would be "optimal". There must be some mathematical way
to determine this on a given data set? Being able to provide these
numbers to you guys would then help in determining what a good default
statistics size is, and maybe even an algorithm for adjusting statistics
sizes on the fly...

Regards,

-- 
Alban Hertroys
alban@magproductions.nl

magproductions b.v.
T: ++31(0)534346874
F: ++31(0)534346876
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //
On Monday May 22 2006 4:43 pm, Tom Lane wrote:
> "Jim C. Nasby" <jnasby@pervasive.com> writes:
> > Is the only downside to a large value planning speed?
>
> It would be nice to have some *evidence*, not unsupported
> handwaving.

Not exactly related to the topic at hand, but we set the column-specific
target to 50 before it would use the plan. Then we set it to values up
to 500, analyzing after each alter, and each of those fell back into seq
scanning. Of 50, 60, 70, 100, 200, 300, 500, only 50 was foolproof. I
have no idea why. I would have only expected the plan to get better with
higher targets.

Ed
On Mon, May 22, 2006 at 06:43:22PM -0400, Tom Lane wrote:
> It would be nice to have some *evidence*, not unsupported handwaving.

If someone has an idea on how to actually get that evidence, I'm all
ears. In the meantime, the number of people that run into problems with
the default of 10 provides pretty good evidence that that number is far
too low, so it would be better to take a guess at a better number and
miss whatever would be optimal than continue on with something that's
pretty clearly a problem.

Of course this applies to other GUCs as well; the autovacuum settings
come to mind, now that 8.2 does a better job of finding a more realistic
value for shared_buffers.

-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
On Mon, May 22, 2006 at 05:55:16PM -0700, Joshua D. Drake wrote:
> My hands on experience is that 10 is plenty except for the minority of
> tables within a database. Those tables can be accurately represented
> using alter table without having to adjust the global.

Ditto.

> That being said, 10 is fairly small and I often find myself setting the
> value to at least 250 just to keep it out of my way.

And ditto (though I normally go for 100).

Have you ever run into problems from setting this too high?

-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
>> That being said, 10 is fairly small and I often find myself setting the
>> value to at least 250 just to keep it out of my way.
>
> And ditto (though I normally go for 100).
>
> Have you ever run into problems from setting this too high?

No. Except that it does increase the time it takes to analyze, which can
be a performance factor if your IO is limited.

Joshua D. Drake

-- 
   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
            http://www.commandprompt.com/
"Jim C. Nasby" <jnasby@pervasive.com> writes:
> On Mon, May 22, 2006 at 06:43:22PM -0400, Tom Lane wrote:
> > It would be nice to have some *evidence*, not unsupported handwaving.
>
> If someone has an idea on how to actually get that evidence, I'm all
> ears.

Well, for example, actually measuring the planner overhead from larger
pg_statistic entries would be interesting. Plus how much more time
ANALYZE takes to generate the entries. (I'm afraid that ANALYZE is
probably worse-than-linear CPU-wise, but it may be dominated by disk
access.)

			regards, tom lane
On Tue, May 23, 2006 at 06:18:07PM -0400, Tom Lane wrote:
> Well, for example, actually measuring the planner overhead from larger
> pg_statistic entries would be interesting. Plus how much more time
> ANALYZE takes to generate the entries. (I'm afraid that ANALYZE is
> probably worse-than-linear CPU-wise, but it may be dominated by disk
> access.)

How should I go about analyzing planner time? Subtract \timing from
EXPLAIN ANALYZE?

Well, I did find one reason not to go ape with this: the number of pages
analyzed scales with the number of buckets, so doubling the statistics
target will roughly double the ANALYZE time for any table over 6000
pages (though the effect isn't linear, see below). There is a small
increase in time for a small table, but I doubt it's enough for anyone
to care:

(single AMD64, 1G memory, FBSD 6.0, software RAID1)

bench=# set default_statistics_target = 100;
SET
Time: 0.320 ms
bench=# analyze verbose accounts; analyze verbose branches;
INFO:  analyzing "public.accounts"
INFO:  "accounts": scanned 30000 of 5172414 pages, containing 1740000
live rows and 0 dead rows; 30000 rows in sample, 300000012 estimated
total rows
ANALYZE
Time: 198892.080 ms
INFO:  analyzing "public.branches"
INFO:  "branches": scanned 17 of 17 pages, containing 3000 live rows and
0 dead rows; 3000 rows in sample, 3000 estimated total rows
ANALYZE
Time: 25.133 ms
bench=# set default_statistics_target = 10;
SET
Time: 0.212 ms
bench=# analyze verbose accounts; analyze verbose branches;
INFO:  analyzing "public.accounts"
INFO:  "accounts": scanned 3000 of 5172414 pages, containing 174000 live
rows and 0 dead rows; 3000 rows in sample, 300000012 estimated total rows
ANALYZE
Time: 27227.885 ms
INFO:  analyzing "public.branches"
INFO:  "branches": scanned 17 of 17 pages, containing 3000 live rows and
0 dead rows; 3000 rows in sample, 3000 estimated total rows
ANALYZE
Time: 1.973 ms
bench=# analyze branches;
ANALYZE
Time: 2.016 ms
bench=# analyze branches;
ANALYZE
Time: 2.009 ms
bench=# set default_statistics_target = 100;
SET
Time: 0.210 ms
bench=# analyze branches;
ANALYZE
Time: 2.231 ms
bench=# analyze branches;
ANALYZE
Time: 2.346 ms
bench=# analyze branches;
ANALYZE
Time: 9.220 ms
bench=# analyze branches;
ANALYZE
Time: 2.057 ms
bench=#

-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
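Jim's page counts line up with how ANALYZE sizes its sample: in the 8.x
sources the target row count is 300 rows per unit of statistics target
(an implementation detail, so treat the constant as an assumption, but it
matches both the "scanned 3000" and "scanned 30000" lines above):

```python
def analyze_sample_rows(stats_target):
    # ANALYZE's target sample size in 8.x: 300 rows per unit of target.
    return 300 * stats_target

print(analyze_sample_rows(10))    # 3000 -- "3000 rows in sample" at default
print(analyze_sample_rows(100))   # 30000 -- Jim's run at target 100

# Wall-clock scaling from Jim's timings: a 10x larger sample cost ~7.3x
# the time (27227.885 ms -> 198892.080 ms), i.e. roughly linear in the
# sample size, presumably dominated by the extra random page reads.
print(198892.080 / 27227.885)
```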
On Tuesday May 23 2006 4:55 pm, Jim C. Nasby wrote:
> Well, I did find one reason not to go ape with this: the
> number of pages analyzed scales with the number of buckets, so
> doubling the statistics target will roughly double the ANALYZE
> time for any table over 6000 pages (though the effect isn't
> linear, see below). There is a small increase in time for a
> small table, but I doubt it's enough for anyone to care:

Are you accounting for your well-primed OS and DB caches? I'd think a
more realistic test would clear those of the target table between
measurements.

Ed
On Tue, May 23, 2006 at 05:00:13PM -0600, Ed L. wrote:
> Are you accounting for your well-primed OS and DB caches? I'd
> think a more realistic test would clear those of the target
> table between measurements.

That was after a number of other analyze runs that had already happened,
and the table is way larger than my cache, so it's unlikely that caching
played much of a role.

-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461