pgsql@mohawksoft.com wrote:
>
> In this case, the behavior observed could be changed by altering the
> sample size for a table. I submit that an arbitrary fixed sample size is
> not a good base for the analyzer, but that the sample size should be based
> on the size of the table or some calculation of its deviation.
>
I can see your point; however, I wonder if the issue is that the default
stats target of '10' (3000 rows sampled, 10 histogram buckets) is simply
too low, and maybe we should consider making a higher value (say '100')
the default.
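For the archives: the per-column knob for this already exists, so anyone
bitten by it today can raise the target by hand before re-running ANALYZE
(column and table names here are of course just placeholders):

```sql
-- Raise the statistics target for one problematic column, then re-analyze.
ALTER TABLE mytable ALTER COLUMN mycol SET STATISTICS 100;
ANALYZE mytable;
```

That only changes the default for new installs vs. per-column tuning, of
course; the thread is about whether the default itself should move.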
> There is no reason why old stats can't be used to create more accurate
> stats. Using successive analyze operations, we could create better
> statistics for the planner. We can increase the sample size based on the
> table size. We could, I suppose, also calculate some sort of deviation
> statistic so that "n_distinct" can be calculated better with a smaller
> sample set.
The idea of either automatically increasing the sample size for large
tables, or doing a few more samplings with different sizes and examining
the stability of the estimates, is rather nice, provided we can keep the
runtime of ANALYZE within reasonable limits, I guess :-)
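To sketch the "stability of the estimates" idea in Python (this is purely
illustrative; it is not ANALYZE's actual estimator, and the plain
distinct-count of a sample is a lower bound that only converges quickly
for low-cardinality columns):

```python
import random

def stable_n_distinct(table, start=3000, factor=2, tol=0.02):
    """Grow the sample until successive distinct-count estimates agree
    within tol.  Hypothetical sketch of iterative sampling; the names,
    the growth factor, and the naive estimator are all illustrative."""
    prev, size = None, start
    while size < len(table):
        sample = random.sample(table, size)
        est = len(set(sample))  # naive lower-bound estimate of n_distinct
        if prev is not None and abs(est - prev) / prev < tol:
            return est, size  # estimates stabilized; stop early
        prev, size = est, size * factor
    # Never stabilized before reaching the table size: scan everything.
    return len(set(table)), len(table)
```

On, say, a 100,000-row column with roughly 1,000 distinct values this
should stop after a few doublings, well short of a full-table scan, which
is the runtime bound we'd care about.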
>
> The basic problem, though, is that PostgreSQL performed incorrectly on a
> simple query after indexes were created and analyze performed. Yes, it can
> be corrected, that's what led me to my conclusions, but shouldn't we try
> to devise a better system in the future to improve PostgreSQL so it does
> not need this sort of tuning?
>
Thanks for clarifying.
best wishes
Mark