Pius Chan <pchan@contigo.com> writes:
> Thanks for your prompt response. Yeah, I should have provided you with my testing scripts. BTW, during numerous
> tests, I felt that if there is no long holding transaction (the one used for middle-tier service master/slave failover),
> the database server is much quicker to recover the space left by dead-row and it is also hard to make the TOAST area
> grow. Therefore, it is hard for me to reproduce the ERROR if there is no long-holding open transaction. Do you have any
> insight to it?
I think the proximate cause is probably this case mentioned in
GetOldestXmin's comments:
* if allDbs is FALSE and there are no transactions running in the current
* database, GetOldestXmin() returns latestCompletedXid. If a transaction
* begins after that, its xmin will include in-progress transactions in other
* databases that started earlier, so another call will return a lower value.
So the trouble case is where autovacuum on the toast table starts at an
instant where nothing's running in the "test" database, but there are
pre-existing transaction(s) in the other database. Then later CLUSTER
starts at an instant where transactions are running in "test" and their
xmins include the pre-existing transactions.
So you need long-running transactions in a DB other than the one where
the vacuuming/clustering action is happening, as well as some unlucky
timing. Assuming my theory is the correct one, of course.
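To make the sequence concrete, here is a toy Python model of it. This is
only a sketch of the rule in the comment quoted above, not the backend's
code: the ToyProcArray class, its methods, and the xid numbers are all
made up for illustration, and a snapshot is reduced to a single xmin value.

```python
from dataclasses import dataclass


@dataclass
class Xact:
    xid: int   # the transaction's own id
    db: str    # database it runs in
    xmin: int  # oldest xid that was still running when it took its snapshot


class ToyProcArray:
    """Hypothetical stand-in for the shared list of running transactions."""

    def __init__(self, next_xid=100):
        self.next_xid = next_xid
        self.latest_completed_xid = next_xid - 1
        self.running = {}

    def begin(self, db):
        xid = self.next_xid
        self.next_xid += 1
        # The snapshot's xmin looks at running transactions in ALL
        # databases, not just the transaction's own.
        xmin = min([x.xid for x in self.running.values()] + [xid])
        self.running[xid] = Xact(xid, db, xmin)
        return xid

    def finish(self, xid):
        del self.running[xid]
        self.latest_completed_xid = max(self.latest_completed_xid, xid)

    def get_oldest_xmin(self, db):
        # Models the allDbs = FALSE case: only transactions in `db` are
        # considered, and with none running the answer is simply
        # latest_completed_xid.
        result = self.latest_completed_xid
        for x in self.running.values():
            if x.db == db:
                result = min(result, x.xid, x.xmin)
        return result


procs = ToyProcArray()

# Long-running transaction in the other database (xid 100).
long_runner = procs.begin("other")

# Some short transactions come and go in "test" (xids 101..104).
for _ in range(4):
    procs.finish(procs.begin("test"))

# Autovacuum starts while nothing is running in "test".
autovacuum_cutoff = procs.get_oldest_xmin("test")

# A transaction begins in "test"; its xmin reaches back to xid 100.
procs.begin("test")

# CLUSTER starts now and computes its cutoff.
cluster_cutoff = procs.get_oldest_xmin("test")

print(autovacuum_cutoff, cluster_cutoff)
```

In this model the second call comes out lower than the first, which is
the backwards movement described above. If the begin("other") line is
dropped, both calls return the same value, which fits the observation
that the error is hard to reproduce without a long-held open transaction.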
regards, tom lane