On Mon, 18 Jun 2001, Zeugswetter Andreas SB wrote:
> First of all thanks for the great effort, it will surely be appreciated :-)
>
> > * On large tables, ANALYZE uses a random sample of rows rather than
> > examining every row, so that it should take a reasonably short time
> > even on very large tables. Possible downside: inaccurate stats.
> > We need to find out if the sample size is large enough.
>
> Imho that is not optimal :-) ** ducks head, to evade flying hammer **
> 1. the random sample approach should be explicitly requested with some
> syntax extension
> 2. the sample size should also be tuneable with some analyze syntax
> extension (the dba chooses the tradeoff between accuracy and runtime)
> 3. if at all, an automatic analyze should do the samples on small tables,
> and accurate stats on large tables
>
> The reasoning behind this is, that when the optimizer does a "mistake"
> on small tables the runtime penalty is small, and probably even beats
> the cost of accurate statistics lookup. (3 page table --> no stats
> except table size needed)
I disagree.
As the Monte Carlo method shows, _as long as you_ sample rows at random,
the result will be sufficiently close to the real statistics. I'm not
sure I can find the math behind this, though...
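The math the reply alludes to is the usual sampling result: the standard
error of an estimate from a random sample shrinks like 1/sqrt(n) in the
sample size n, independent of the table size. A minimal sketch (a toy
simulation, not PostgreSQL's actual ANALYZE code; the column values and
sample size are made up for illustration):

```python
import random

random.seed(42)

SAMPLE_SIZE = 3000  # fixed sample size, regardless of table size

for table_rows in (10_000, 1_000_000):
    # Fake "table": one numeric column with a skewed distribution.
    table = [random.expovariate(1.0) for _ in range(table_rows)]

    true_mean = sum(table) / len(table)          # full scan
    sample = random.sample(table, SAMPLE_SIZE)   # random sample
    est_mean = sum(sample) / len(sample)

    rel_err = abs(est_mean - true_mean) / true_mean
    print(f"rows={table_rows:>9}  true={true_mean:.4f}  "
          f"estimate={est_mean:.4f}  rel.err={rel_err:.2%}")
```

With a fixed 3000-row sample the relative error stays in the low single
digits for both the 10k-row and the 1M-row table, which is why a sampled
ANALYZE can stay fast on huge tables without giving the optimizer wildly
wrong numbers.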
-alex