On Sat, Dec 2, 2017 at 4:46 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Sat, Dec 2, 2017 at 3:54 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> On Sat, Dec 2, 2017 at 1:55 PM, Andres Freund <andres@anarazel.de> wrote:
>>> - As we don't clean temp files after crash-restarts it isn't impossible
>>> to have a bunch of crash-restarts and end up with pids *and* per-pid
>>> shared file set counters reused. Which'd lead to conflicts. Do we care?
>>
>> We don't care. PathNameCreateTemporaryDir() on a path that already
>> exists will return silently. PathNameCreateTemporaryFile() on a path
>> that already exists, as mentioned in a comment and following an
>> existing convention, will open and truncate it. So either it was
>> really an orphan and that is a continuation of our traditional
>> recycling behaviour, or the calling code has a bug and used the same
>> path twice and it's not going to end well.
>
> On further reflection, there are problems with that higher up. (1)
> Even though PathNameCreateTemporaryFile() will happily truncate and
> reuse an orphaned file when BufFileCreateShared() calls it,
> BufFileOpenShared() could get confused by the orphaned files. It
> could believe that XXX.1 is a continuation of XXX.0, when in fact it
> is junk left over from a crash restart. Perhaps BufFileCreateShared()
> needs to delete XXX.{N+1} if it exists, whenever it creates XXX.{N}.
> That will create a gap in the series of existing files that will cause
> BufFileOpenShared()'s search to terminate. (2) BufFileOpenShared()
> thinks that the absence of an XXX.0 file means there is no BufFile by
> this name, when it could mistakenly open pre-crash junk due to a
> colliding name. I use that condition as information, but I think I
> can fix that easily by using SharedTuplestoreParticipant::npage == 0
> to detect that there should be no file, rather than trying to open it,
> and then I can define this problem away by declaring that
> BufFileOpenShared() on a name that you didn't call
> BufFileCreateShared() on invokes undefined behaviour. I will make it
> so.
Here is a patch to deal with that problem. Thoughts?
I suppose if we wanted to make this type of problem go away, but still
keep files for forensic purposes, we could introduce a "restart
number", and stick it into the top level temporary directory's name.
That way you'd never be able to collide with files created before a
crash-restart, and we could add O_EXCL so it'd become an error to try
to create the same filename again.
I'll post a new SharedTuplestore and Parallel Hash patch set shortly.
--
Thomas Munro
http://www.enterprisedb.com