Hi Tom,
On 12/4/17 3:15 PM, Tom Lane wrote:
> While working through Michael Paquier's patch to clean up inconsistent
> usage of AllocateDir(), I noticed that ResetUnloggedRelations and its
> subroutines are not consistent about whether a directory open failure
> results in erroring out or just emitting a LOG message and continuing.
> ResetUnloggedRelations itself throws a hard error if it fails to open
> pg_tblspc, but all the rest of reinit.c thinks a LOG message is
> sufficient.
By a strange coincidence I spent a while today reading through this code...
> My first thought was to change ResetUnloggedRelations to match the
> rest, but on reflection I'm less sure about that. What we've got
> at the moment is that a possibly-transient directory open failure
> can result in failure to reset an unlogged relation to empty,
> which to me amounts to data corruption.
I'm wondering how this transient directory open failure is going to
happen without a bunch of other things going wrong, but I agree that if
it happens then corruption would be the likely result.
> If the contents of the
> unlogged relation are inconsistent, which is plenty likely after
> a crash, we could end up crashing later because of that; and in
> any case the user would not see what they expect in the tables.
Agreed.
> So now I'm thinking we should do the reverse and change these functions
> to give a hard error on AllocateDir failure. That would result in
> startup-process failure if we are unable to scan the database, which is
> not great, but there's certainly something badly wrong if we can't.
+1. If a tablespace or database directory cannot be opened then I don't
think it makes any sense to continue.
Regards,
--
-David
david@pgmasters.net