On Mon, 2008-10-13 at 08:30 -0400, Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> > No, I was thinking of something along the lines of:
> > INFO: clustering "public.my_c"
> > INFO: complete, was 33%, now 100% clustered
> > The only such measure that we have is the correlation, which isn't very
> > good anyway, so I'm not sure if that's worthwhile.
>
> It'd be possible to count the number of order reversals during the
> indexscan, ie the number of tuples with CTID lower than the previous
> one's. But I'm not sure how useful that number really is. Also it's
> not clear how to preserve such functionality if cluster is
> re-implemented with a sort.
>
I assume here you mean a CTID with a lower page number, as the line
pointer wouldn't make any difference, right?
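A minimal sketch of the counting Tom describes, comparing block numbers only as suggested above. It is written in Python rather than backend C. The (block, line pointer) tuple representation and the names count_order_reversals and clustered_fraction are hypothetical. Turning the count into a "percent clustered" figure like the one in Heikki's INFO example is only one possible normalization.

```python
from typing import Iterable, Tuple

# Hypothetical stand-in for a heap TID: (block_number, line_pointer).
TID = Tuple[int, int]

def count_order_reversals(tids: Iterable[TID]) -> Tuple[int, int]:
    """Return (reversals, steps) for TIDs visited in index-scan order.

    A reversal is a tuple whose block number is lower than the previous
    tuple's. The line pointer is ignored: stepping backwards within the
    same page does not require fetching another page.
    """
    reversals = 0
    steps = 0
    prev_block = None
    for block, _line in tids:
        if prev_block is not None:
            steps += 1
            if block < prev_block:
                reversals += 1
        prev_block = block
    return reversals, steps

def clustered_fraction(tids: Iterable[TID]) -> float:
    """Fraction of steps that did not move backwards (1.0 = fully ordered)."""
    reversals, steps = count_order_reversals(tids)
    return 1.0 if steps == 0 else 1.0 - reversals / steps

# Blocks visited: 0, 0, 1, 0, 2 -> one reversal (1 -> 0) out of four steps.
scan = [(0, 1), (0, 2), (1, 1), (0, 3), (2, 1)]
print(count_order_reversals(scan), clustered_fraction(scan))
```

A CLUSTER pass could feed every TID through such a counter and get an exact figure; an estimate from a sample would need a different approach.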
I think it would be a useful metric to decide whether or not to use an
index scan (I don't know how easy it is to estimate this from a sample,
but a CLUSTER could clearly get an exact number). It would solve the
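problem where synchronized scans used by pg_dump could produce poor
correlation on restore, causing the planner not to choose index scans
(which is what prompted turning off sync scans for pg_dump). A table
dumped starting partway through is still in order apart from a single
wraparound point, so it would show very few reversals even though its
correlation looks bad.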
Regards,
Jeff Davis