Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> How's reading a sufficiently large fraction of random rows going to be
>> significantly faster than reading all rows? If you're just going to read
>> the first n rows then that isn't really random, is it?
> Ingres did this too, I thought. You could specify a certain number of
> random rows, perhaps 10%. On a large table, that is often good enough
> and much faster. Often 2% is enough.
Peter's got a good point though. Even 2% is going to mean fetching most
or all of the blocks in the table, for typical-size rows. Furthermore,
fetching (say) every second or third block is likely to be actually
slower than a straight sequential read, because you're now fighting the
kernel's readahead policy instead of working with it.
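A rough back-of-the-envelope sketch of why (the numbers here, ~100 rows
per block and a 2% uniform row sample, are purely illustrative
assumptions, not measurements):

    # Expected fraction of blocks touched when sampling a fraction p of
    # rows uniformly at random, with R rows per block.
    p = 0.02                      # fraction of rows sampled (assumed)
    R = 100                       # rows per block (assumed, typical-size rows)
    frac_blocks = 1 - (1 - p)**R  # chance a block contains >= 1 sampled row
    print(frac_blocks)            # ~0.87, i.e. most blocks get read anyway

So even a 2% row sample ends up pulling in the large majority of the
table's blocks.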
To get a partial VACUUM ANALYZE that was actually usefully faster than
the current code, I think you'd have to read just a few percent of the
blocks, which means much less than a few percent of the rows ... unless
maybe you picked a random subset of blocks and then used all the rows
in those blocks, but it's debatable whether that's really a random
sample.
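For what it's worth, the block-sampling idea would look something like
this sketch (Python pseudocode, not anything in the backend; nblocks
and read_block are hypothetical stand-ins for reading the relation):

    import random

    # Two-stage (block-level) sampling: pick a few percent of the blocks
    # at random, then use every row found in the chosen blocks.
    # 'nblocks' and 'read_block' are hypothetical, not backend functions.
    def sample_rows_by_block(nblocks, read_block, block_fraction=0.02):
        n_chosen = max(1, int(nblocks * block_fraction))
        chosen = random.sample(range(nblocks), n_chosen)
        rows = []
        for blkno in sorted(chosen):        # sorted keeps the I/O mostly sequential
            rows.extend(read_block(blkno))  # take *all* rows in a chosen block
        return rows

The catch is that rows within one block tend to be correlated (loaded
together, or clustered on the same key range), which is exactly why
it's debatable whether the result counts as a random sample.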
I find it hard to believe that VAC ANALYZE is all that much slower than
plain VACUUM anyway; fixing the indexes is the slowest part of VACUUM in
my experience. It would be useful to know exactly what the columns are
in a table where VAC ANALYZE is considered unusably slow.
regards, tom lane