> It sounds like the costing model might need a bit more work before we commit
> this.
I tried the simple SQL tests I posted a while ago again, and I still get the
same ratios.
I've tested the applied patch on a dual Opteron + disk array Solaris machine.
I really don't get how a laptop hard drive can be faster at reading data using
random seeks (required by the original cluster method) than a seq scan + sort
for the 5M row test case.
Same thing for the "cluster vs bloat" test: the seq scan + sort is faster on
my machine.
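For anyone who wants to reproduce the comparison, the test I have in mind is
roughly the following (table and column names are just illustrative, not the
exact ones from my earlier post; \timing is the psql timing toggle):

  -- ~5M row table, inserted in random physical order so CLUSTER has real work
  CREATE TABLE mytable (id integer, filler text);
  INSERT INTO mytable
      SELECT i, repeat('x', 100)
      FROM generate_series(1, 5000000) AS i
      ORDER BY random();
  CREATE INDEX mytable_id_idx ON mytable (id);
  \timing
  CLUSTER mytable USING mytable_id_idx;  -- time this with and without the patch

The unpatched method walks mytable_id_idx and fetches heap rows in index
order (random I/O); the patched method should pick seq scan + sort here.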
I've just noticed that Josh used shared_buffers = 16MB for the "cluster vs
bloat" test. I'm using a much higher shared_buffers setting (something like
200MB), since I thought a larger value would be more appropriate when working
with tables this big.
Maybe that's what makes the difference?
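If someone wants to replicate both configurations, the setting can be checked
from psql and changed in postgresql.conf (a server restart is needed for it to
take effect); the two values below are Josh's and mine:

  SHOW shared_buffers;    -- current value

  -- in postgresql.conf:
  --   shared_buffers = 16MB      # Josh's "cluster vs bloat" test
  --   shared_buffers = 200MB     # roughly what I used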
Can someone else test the patch?
And: I don't have deep knowledge of how PostgreSQL deletes rows, but I thought
that something like:
DELETE FROM mybloat WHERE RANDOM() < 0.9;
would only delete the table data, not the index entries; so the patch should
perform even better in this case (as it does, in fact, on my test machine),
because:
- the original cluster method would read the whole index and fetch only the
  "still alive" rows
- the new method would read the table using a seq scan and sort the few
  surviving rows in memory
But, as I said, maybe I'm getting this part wrong...
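To make the scenario concrete, this is roughly the "mostly dead rows" setup I
have in mind (names are hypothetical, and I'm assuming no VACUUM runs between
the DELETE and the CLUSTER, so the dead tuples are still in both the heap and
the index):

  CREATE TABLE mybloat (id integer, filler text);
  INSERT INTO mybloat
      SELECT i, repeat('x', 100)
      FROM generate_series(1, 5000000) AS i;
  CREATE INDEX mybloat_id_idx ON mybloat (id);

  DELETE FROM mybloat WHERE random() < 0.9;  -- heap tuples marked dead;
                                             -- index entries left in place
  \timing
  CLUSTER mybloat USING mybloat_id_idx;      -- old method: scans the whole
                                             -- (still full) index;
                                             -- patched: seq scan + small sort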