There is a deadlock hazard in parallel hashjoin (pointed out by Thomas
Munro in the past). Workers attached to the stripe_barrier emit tuples
and then wait on that barrier.
I believe that that can be addressed starting with this
relatively unoptimized solution:
- after probing a stripe in a batch, a worker sets the status of that
batch to "tentatively done" and saves the stripe_barrier phase
- if that worker is not the only worker attached to that batch, it
detaches from both stripe and batch barriers and moves on to other
batches
- if that worker is the only worker attached to the batch, it will
proceed to load the next stripe of that batch, and, once it has
finished loading, it will set the status of the batch back to "not
done" for itself
- when the other worker encounters that batch again, if the
stripe_barrier phase has not moved forward, it will mark that batch as
done for itself. if the stripe_barrier phase has moved forward, it can
join in in probing this batch for the current stripe.
Just to follow-up on the stripe barrier deadlock, I've implemented a
solution and attached it.
There are three solutions I've thought about so far:
1) leaders don't participate in fallback batches
2) serial after stripe 0
no worker can join a batch after any worker has left and only one
worker can work on stripes after stripe 0
3) provisionally complete batches
After the end of stripe 0, all workers except the last worker
detach from the stripe barrier, mark the batch as provisionally
done, save the stripe barrier phase, and move on to another batch.
Later, when one of these workers returns to the batch, if it is
not already done, the worker checks to see if the phase of the
stripe barrier has advanced. If the phase has advanced, it means
that no one is waiting for that worker. The worker can join that
batch. If the phase hasn't advanced, the worker won't risk
deadlock and will simply mark the batch as done. The last worker
executes the normal path -- participating in each stripe.
I've attached a patch to implement solution 3
v7-0002-Provisionally-detach-unless-last-worker.patch
This isn't a very optimized version of this solution. It detaches from
the stripe barrier and closes the outer match status bitmap upon
provisional completion by a worker. However, I ran into some problems
keeping outer match status bitmaps open for multiple batches at a time.
I've also attached the original adaptive hashjoin patch with a couple
small tweaks (not quite meriting a patch version bump, but that seemed
like the easiest way).