Antonin Houska <ah@cybertec.at> writes:
> When doing my experiments with bucket split ([1]), I noticed a comment that
> _hash_getnewbuf should not be called concurrently. However, there's no
> synchronization of calls from _hash_splitbucket in HEAD. I could reproduce
> such concurrent calls using gdb (multiple bucket splits in progress at a
> time).
Ugh.
> [ locking the metapage around the _hash_getnewbuf call ]
> I think it'd also be the easiest fix for _hash_splitbucket. Or should a
> separate ("regular") lock be introduced and used and used in both cases?
That doesn't really fix the problem: once we release lock on the metapage,
if a new split starts then it would try to create a bucket at the next
higher page number, so that the _hash_getnewbuf() calls might occur "out
of order", leading to failures of the crosschecks therein.
I think the only simple fix here is to rearrange _hash_splitbucket's API
so that the _hash_getnewbuf() call is done before we drop the metapage
lock at all.  Not sure offhand if the best way is to move that call
into _hash_expandtable(), or to change the API definition so that
_hash_splitbucket is responsible for dropping the metapage lock at the
right time.
Obviously, this whole business of needing the metapage lock to add pages
is nasty.  I wonder if we could use the relation extension lock instead
to serialize those operations, rather than requiring a metapage buffer
lock.  But that would be a much more invasive thing.  In any case, the
current state of affairs is that the relation physical length is tightly
tied to the bucket mapping defined by the metapage contents, and we have
to change both of those together.
Actually ... one of the other things that's not being accounted for here
is the possibility of ENOSPC while adding the new page.  Really, we want
to attempt the extension *before* we modify the bucket mapping.  So that
seems to argue for moving the _hash_getnewbuf call into _hash_expandtable,
where it could be done just before we scribble on the metapage.  OTOH,
we're not any less screwed if we get ENOSPC while trying to add an
overflow page later on :-(, since future readers would assume the bucket
split's been performed.
        regards, tom lane