Jim Cox wrote:
> On Mon, Oct 13, 2008 at 8:30 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>>> No, I was thinking of something along the lines of:
>>> INFO: clustering "public.my_c"
>>> INFO: complete, was 33%, now 100% clustered
>>> The only such measure that we have is the correlation, which isn't very
>>> good anyway, so I'm not sure if that's worthwhile.
>> It'd be possible to count the number of order reversals during the
>> indexscan, ie the number of tuples with CTID lower than the previous
>> one's. But I'm not sure how useful that number really is.
It will look bad for patterns like:
2
1
4
3
6
5
..
which for all practical purposes is just as good as a perfectly sorted
table. So no, I don't think that's a very useful metric either without
somehow taking caching effects into account.
> Another version of the patch should be attached, this time counting the
> number of "inversions" (pairs of tuples in the table that are in the wrong
> order) as a measure of the "sortedness" of the original data (scanned/live
> numbers still reported as an indication of the extent to which the table was
> vacuumed).
Until we have a better metric for "sortedness", my earlier suggestion to
print it was probably a bad idea. If anything, should probably print the
same correlation metric that ANALYZE calculates, so that it would at
least match what the planner uses for decision-making.
-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com