On Fri, 2008-12-12 at 18:01 +0000, Greg Stark wrote:
> I think you need to find two different formulas, one which represents
> a clustered table and one which represents randomly distributed data.
> Then you need a way to measure just how clustered the data is so you
> know how much weight to give each formula. Perhaps comparing the
> number of duplicates in whole-block samples versus overall random
> selections would give that measure.
Please read the Chaudhuri paper.
-- Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support
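
For illustration only, the clustering measure suggested in the quoted text (comparing duplicates in whole-block samples against duplicates in row-level random samples) could be sketched roughly as below. This is a toy model under stated assumptions: the table is a list of blocks, each block a list of column values, and every function name here is hypothetical. It is not PostgreSQL's ANALYZE code and not the estimator from the Chaudhuri paper.

```python
import random


def duplicate_fraction(values):
    """Fraction of sampled values that repeat another value in the sample."""
    values = list(values)
    if not values:
        return 0.0
    return 1.0 - len(set(values)) / len(values)


def sample_blocks(table, n_blocks, rng):
    """Whole-block sample: pick blocks at random, keep every row in them."""
    chosen = rng.sample(range(len(table)), n_blocks)
    return [value for b in chosen for value in table[b]]


def sample_rows(table, n_rows, rng):
    """Row-level sample: pick rows uniformly at random across all blocks."""
    rows = [value for block in table for value in block]
    return rng.sample(rows, n_rows)


def clustering_score(table, n_blocks, seed=0):
    """Hypothetical measure: excess duplicates seen in whole-block samples.

    A score near 0 suggests values are spread randomly over blocks; a
    clearly positive score suggests equal values sit together (clustered).
    """
    rng = random.Random(seed)
    block_sample = sample_blocks(table, n_blocks, rng)
    # Use the same number of rows for both samples so they are comparable.
    row_sample = sample_rows(table, len(block_sample), rng)
    return duplicate_fraction(block_sample) - duplicate_fraction(row_sample)
```

A score like this could then serve as the weight between a "clustered" formula and a "random" formula, though how to map it onto a weight is left open here, as it is in the quoted text.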