Re: asynchronous and vectorized execution
| От | Amit Khandekar | 
|---|---|
| Тема | Re: asynchronous and vectorized execution | 
| Дата | |
| Msg-id | CAJ3gD9fZ=rtBZ0i1_pxycbkgxi=OzTgv1n0ojkmK318Mcc921A@mail.gmail.com обсуждение исходный текст | 
| Ответ на | Re: asynchronous and vectorized execution (Robert Haas <robertmhaas@gmail.com>) | 
| Ответы | Re: asynchronous and vectorized execution Re: asynchronous and vectorized execution | 
| Список | pgsql-hackers | 
On 13 September 2016 at 20:20, Robert Haas <robertmhaas@gmail.com> wrote:
 
On Mon, Aug 29, 2016 at 4:08 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote: 
> [ new patches ]
+ /*
+ * We assume that few nodes are async-aware and async-unaware
+ * nodes cannot be revserse-dispatched from lower nodes that is
+ * async-aware. Firing of an async node that is not a descendant
+ * of the planstate will cause such reverse-diaptching to
+ * async-aware nodes, which is unexpected behavior for them.
+ *
+ * For instance, consider an async-unaware Hashjoin(OUTER, INNER)
+ * where the OUTER is running asynchronously but the Hashjoin is
+ * waiting on the async INNER during inner-hash creation. If the
+ * OUTER fires for the case, since anyone is waiting on it,
+ * ExecAsyncWaitForNode finally dispatches to the Hashjoin which
+ * is now in the middle of thing its work.
+ */
+ if (!IsParent(planstate, node))
+ continue;
I'm not entirely sure that I understand this comment, but I don't
think it's going in the right direction. Let's start with the example
in the second paragraph. If the hash join is async-unaware, then it
isn't possible for the hash join to be both running the outer side of
the join asynchronously and at the same time waiting on the inner
side. Once it tries to pull the first tuple from the outer side, it's
waiting for that to finish and can't do anything else. So, the inner
side can't possibly get touched in any way until the outer side
finishes. For anything else to happen, the hash join would have to be
async-aware. Even if we did that, I don't think it would be right to
kick off both sides of the hash join at the same time. Right now, if
the outer side turns out to be empty, we never need to build the hash
table, and that's good.
I feel the !IsParent() condition is actually to prevent the infinite wait
caused by a re-entrant issue in ExecAsuncWaitForNode() that Kyotaro mentioned
earlier. But yes, the comments don't explain exactly how the hash join can
cause the re-entrant issue.
But I attempted to come up with some testcase which might reproduce the
infinite-waiting in ExecAsyncWaitForNode() after removing the !IsParent() check
so that the other subtree nodes are also included, but I couldn't reproduce.
Kyotaro, is it possible for you to give a testcase that consistently hangs if
we revert back the !IsParent() check ?
I was also thinking about another possibility where the same plan state node is
re-entered, as explained below. 
I don't think it's a good idea to wait for only nodes that are in the
current subtree. For example, consider a plan like this:
Append
-> Foreign Scan on a
-> Hash Join
-> Foreign Scan on b
-> Hash
-> Seq Scan on x
Suppose Append and Foreign Scan are parallel-aware but the other nodes
are not. Append kicks off the Foreign Scan on a and then waits for
the hash join to produce a tuple; the hash join kicks off the Foreign
Scan on b and waits for it to return a tuple. If, while we're waiting
for the foreign scan on b, the foreign scan on a needs some attention
- either to produce tuples, or maybe just to call PQconsumeInput() so
that more data can be sent from the other side, I think we need to be
able to do that. There's no real problem here; even if the Append
becomes result-ready before the hash join returns, that is fine.
Yes I agree : we should be able to do this. Sine we have all the waiting events
in a common estate, there's no harm if we start executing nodes of another
sub-tree if we get an event from there.
But I am thinking about what would happen when this node from other sub-tree
returns result_ready, and then it's parents are called, and then the result
gets bubbled up upto the node which had already caused us to call
ExecAsyncWaitForNode() in the first place.
For e.g., in the above plan which you specified, suppose :
1. Hash Join has called ExecProcNode() for the child foreign scan b, and so is
waiting in ExecAsyncWaitForNode(foreign_scan_on_b).
2. The event wait list already has foreign scan on a that is on a different
subtree.
3. This foreign scan a happens to be ready, so in 
ExecAsyncWaitForNode (), ExecDispatchNode(foreign_scan_a) is called,
which returns with result_ready.
4. Since it returns result_ready, it's parent node is now inserted in the
callbacks array, and so it's parent (Append) is executed.
5. But, this Append planstate is already in the middle of executing Hash
join, and is waiting for HashJoin.
Is this safe to execute the same plan state when it is already inside its
execution ? In other words, is the plan state re-entrant ? I suspect, the new
execution may even corrupt the structures with which it was already executing.
In usual cases, a tree can contain multiple plan state nodes belonging to the
same plan node, but in this case, we are having the same plan state node being
executed again while it is already executing.
I suspect this can be one reason why Kyotaro might be getting infinite
recursion issues. May be we need to prevent a plan state node to re-enter,
but allow nodes from any subtree to execute. So propagate the result upwards
until we get a node which is already executing.
В списке pgsql-hackers по дате отправления: