On 04/13/2014 11:39 PM, Heikki Linnakangas wrote:
> However, I just noticed that there's a race condition between PREPARE
> TRANSACTION and COMMIT/ROLLBACK PREPARED. PostPrepare_Locks runs after
> the prepared transaction is already marked as fully prepared. That means
> that by the time we get to PostPrepare_Locks, another backend might
> already have finished and removed the prepared transaction. That leads
> to a PANIC (put a breakpoint just before PostPrepare_Locks):
>
> postgres=# commit prepared 'foo';
> PANIC: failed to re-find shared proclock object
> PANIC: failed to re-find shared proclock object
> The connection to the server was lost. Attempting reset: Failed.
>
> FinishPrepareTransaction reads the list of locks from the two-phase
> state file, but PANICs when it doesn't find the corresponding locks in
> the lock manager (because PostPrepare_Locks hasn't transfered them to
> the dummy PGPROC yet).
>
> I think we'll need to transfer of the locks earlier, before the
> transaction is marked as fully prepared. I'll take a closer look at this
> tomorrow.
Here's a patch to do that. It's very straightforward, I just moved the
calls to transfer locks earlier, before ProcArrayClearTransaction.
PostPrepare_MultiXact had a similar race - it also transfer state from
the old PGPROC entry to the new, and needs to be done before allowing
another backend to remove the new PGPROC entry. I changed the names of
the functions to distinguish them from the other PostPrepare_* functions
that now happen at a different time.
The patch is simple, but it's a bit scary to change the order of things
like this. Looking at all the calls that now happen after transferring
the locks, I believe this is OK. The change also applies to the
callbacks called by the RegisterXactCallback mechanism, which means that
in theory there might be a 3rd party extension out there that's
affected. All the callbacks in contrib and plpgsql are OK, and it's
questionable to do anything complicated that would depend on
heavy-weight locks to be held in those callbacks, so I think this is OK.
Warrants a note in the release notes, though.
- Heikki