Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
> Maybe we could avoid removing it until the next checkpoint? Or is that
> not enough. Maybe it could stay there forever :/
Part of the problem here is that this code has to serve several
purposes. We have different scenarios to worry about:

* crash recovery from the most recent checkpoint
* PITR replay over a long interval (many checkpoints)
* recovery in the face of a partially corrupt filesystem

It's the last one that is mostly bothering me at the moment. I don't
want us to throw away data simply because the filesystem forgot an
inode. Yeah, we might not have enough data in the WAL to completely
reconstruct a table, but we should push out what we do have, *not* toss
it into the bit bucket.
In the first case (straight crash recovery) I think it is true that any
reference to a missing file is a reference to a file that will get
deleted before recovery finishes. But I don't think that holds for PITR
(we might be asked to stop short of where the table gets deleted) nor
for the case where there's been filesystem damage.
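
To make the difference between the scenarios concrete, here is a toy sketch. It is not PostgreSQL code: the `Record` type, the `replay` function, the record kinds "write" and "drop", and the `stop_after` and `recreate_missing` parameters are all hypothetical names invented for illustration. It assumes a log of three records against a file that is already missing on disk when replay starts, and compares "skip references to missing files" with "recreate the file and push out what we have".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    kind: str       # "write" or "drop" (hypothetical record kinds)
    file: str       # name of the file the record refers to
    data: str = ""  # payload for "write" records


def replay(records, files, stop_after=None, recreate_missing=True):
    """Replay records into `files` (dict: file name -> list of payloads).

    stop_after: how many records to replay (None means all of them);
        this stands in for a PITR stop point.
    recreate_missing: if False, a write to a file that does not exist
        is silently skipped; if True, the file is recreated first.
    """
    limit = len(records) if stop_after is None else stop_after
    for rec in records[:limit]:
        if rec.kind == "drop":
            files.pop(rec.file, None)
        elif rec.kind == "write":
            if rec.file not in files:
                if not recreate_missing:
                    continue  # the data goes into the bit bucket
                files[rec.file] = []
            files[rec.file].append(rec.data)
    return files


# The file "t1" is missing on disk when replay starts (empty dict).
log = [
    Record("write", "t1", "row A"),
    Record("write", "t1", "row B"),
    Record("drop", "t1"),
]

full_skip = replay(log, {}, recreate_missing=False)
full_keep = replay(log, {}, recreate_missing=True)
pitr_skip = replay(log, {}, stop_after=2, recreate_missing=False)
pitr_keep = replay(log, {}, stop_after=2, recreate_missing=True)
```

Under these assumptions, replaying to the end gives the same result with either policy, because the drop record removes the file anyway. Stopping after the second record is where they diverge: the skip policy ends with no file at all, while the recreate policy ends with "t1" holding both rows. The filesystem-damage case looks like the second situation, since no drop record is coming to make the skipped writes harmless.
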
regards, tom lane