Обсуждение: The case when AsyncAppend exists also in the qual of Async ForeignScan
Hi, I found one bug. Look at the case: CREATE TABLE test (x int) PARTITION BY HASH (x); CREATE TABLE test_1 (x int); CREATE TABLE test_2 (x int); CREATE FOREIGN TABLE ftest_1 PARTITION OF test FOR VALUES WITH (modulus 2, remainder 0) SERVER loopback OPTIONS (table_name 'test_1'); CREATE FOREIGN TABLE ftest_2 PARTITION OF test FOR VALUES WITH (modulus 2, remainder 1) SERVER loopback2 OPTIONS (table_name 'test_2'); INSERT INTO test (SELECT * FROM generate_series(1,10)); ANALYZE test,test_1,test_2,ftest_1,ftest_2; EXPLAIN (ANALYZE) SELECT * FROM test WHERE x NOT IN ( SELECT avg(x) FROM test WHERE x > 10 ); ERROR: InstrEndLoop called on running node When I added ', COSTS OFF, SUMMARY OFF, TIMING OFF' this example fall down into the SEGFAULT. I prepared quick fix for this problem (see patch in attachment). Maybe it is'nt a best solution but it can speedup search of it. -- regards, Andrey Lepikhov Postgres Professional
Вложения
Hi Andrey, On Fri, Jul 2, 2021 at 10:24 PM Andrey Lepikhov <a.lepikhov@postgrespro.ru> wrote: > I found one bug. Look at the case: > > CREATE TABLE test (x int) PARTITION BY HASH (x); > CREATE TABLE test_1 (x int); > CREATE TABLE test_2 (x int); > CREATE FOREIGN TABLE ftest_1 PARTITION OF test > FOR VALUES WITH (modulus 2, remainder 0) > SERVER loopback OPTIONS (table_name 'test_1'); > CREATE FOREIGN TABLE ftest_2 PARTITION OF test > FOR VALUES WITH (modulus 2, remainder 1) > SERVER loopback2 OPTIONS (table_name 'test_2'); > INSERT INTO test (SELECT * FROM generate_series(1,10)); > ANALYZE test,test_1,test_2,ftest_1,ftest_2; > > EXPLAIN (ANALYZE) > SELECT * FROM test WHERE x NOT IN ( > SELECT avg(x) FROM test WHERE x > 10 > ); > ERROR: InstrEndLoop called on running node > > When I added ', COSTS OFF, SUMMARY OFF, TIMING OFF' this example fall > down into the SEGFAULT. > > I prepared quick fix for this problem (see patch in attachment). Maybe > it is'nt a best solution but it can speedup search of it. I’ll looking into this. Thanks for the report and patch! Best regards, Etsuro Fujita
On Sat, Jul 3, 2021 at 8:35 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > On Fri, Jul 2, 2021 at 10:24 PM Andrey Lepikhov > <a.lepikhov@postgrespro.ru> wrote: > > I found one bug. Look at the case: > > > > CREATE TABLE test (x int) PARTITION BY HASH (x); > > CREATE TABLE test_1 (x int); > > CREATE TABLE test_2 (x int); > > CREATE FOREIGN TABLE ftest_1 PARTITION OF test > > FOR VALUES WITH (modulus 2, remainder 0) > > SERVER loopback OPTIONS (table_name 'test_1'); > > CREATE FOREIGN TABLE ftest_2 PARTITION OF test > > FOR VALUES WITH (modulus 2, remainder 1) > > SERVER loopback2 OPTIONS (table_name 'test_2'); > > INSERT INTO test (SELECT * FROM generate_series(1,10)); > > ANALYZE test,test_1,test_2,ftest_1,ftest_2; > > > > EXPLAIN (ANALYZE) > > SELECT * FROM test WHERE x NOT IN ( > > SELECT avg(x) FROM test WHERE x > 10 > > ); > > ERROR: InstrEndLoop called on running node > > > > When I added ', COSTS OFF, SUMMARY OFF, TIMING OFF' this example fall > > down into the SEGFAULT. > > > > I prepared quick fix for this problem (see patch in attachment). Maybe > > it is'nt a best solution but it can speedup search of it. > > I’ll looking into this. Thanks for the report and patch! I tried to reproduce this in my environment using HEAD and PG14, but I couldn't. Could you elaborate a bit more on how to reproduce this? Best regards, Etsuro Fujita
On 8/7/21 10:49, Etsuro Fujita wrote: > On Sat, Jul 3, 2021 at 8:35 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: >> On Fri, Jul 2, 2021 at 10:24 PM Andrey Lepikhov >> <a.lepikhov@postgrespro.ru> wrote: >>> I found one bug. Look at the case: >>> >>> CREATE TABLE test (x int) PARTITION BY HASH (x); >>> CREATE TABLE test_1 (x int); >>> CREATE TABLE test_2 (x int); >>> CREATE FOREIGN TABLE ftest_1 PARTITION OF test >>> FOR VALUES WITH (modulus 2, remainder 0) >>> SERVER loopback OPTIONS (table_name 'test_1'); >>> CREATE FOREIGN TABLE ftest_2 PARTITION OF test >>> FOR VALUES WITH (modulus 2, remainder 1) >>> SERVER loopback2 OPTIONS (table_name 'test_2'); >>> INSERT INTO test (SELECT * FROM generate_series(1,10)); >>> ANALYZE test,test_1,test_2,ftest_1,ftest_2; >>> >>> EXPLAIN (ANALYZE) >>> SELECT * FROM test WHERE x NOT IN ( >>> SELECT avg(x) FROM test WHERE x > 10 >>> ); >>> ERROR: InstrEndLoop called on running node >>> >>> When I added ', COSTS OFF, SUMMARY OFF, TIMING OFF' this example fall >>> down into the SEGFAULT. >>> >>> I prepared quick fix for this problem (see patch in attachment). Maybe >>> it is'nt a best solution but it can speedup search of it. >> >> I’ll looking into this. Thanks for the report and patch! > > I tried to reproduce this in my environment using HEAD and PG14, but I > couldn't. Could you elaborate a bit more on how to reproduce this? > > Best regards, > Etsuro Fujita > Initially I cought this bug during TPC-H benchmarking, query Q16, right after start of execution. Key thing here is using a hashed subplan. -- regards, Andrey Lepikhov Postgres Professional
On Thu, Jul 8, 2021 at 5:06 PM Andrey Lepikhov <a.lepikhov@postgrespro.ru> wrote: > Initially I cought this bug during TPC-H benchmarking, query Q16, right > after start of execution. > Key thing here is using a hashed subplan. The plan I got also uses a hashed SubPlan; moreover, the plan is the same as the one shown in your patch. Best regards, Etsuro Fujita
On 8/7/21 10:49, Etsuro Fujita wrote: > I tried to reproduce this in my environment using HEAD and PG14, but I > couldn't. Could you elaborate a bit more on how to reproduce this? Interesting... I use the only script (see t.sql) right after the instance has started. Output see in t.out All my specific preferences: autovacuum = 'off' max_worker_processes = 64 max_parallel_workers_per_gather = 16 max_parallel_workers = 32 parallel_setup_cost = 100 parallel_tuple_cost = 0.05 min_parallel_table_scan_size = 0 -- regards, Andrey Lepikhov Postgres Professional
Вложения
On 8/7/21 10:49, Etsuro Fujita wrote: Important. If you want to get the InstrEndLoop error, you need to enable timing in the EXPLAIN operation. -- regards, Andrey Lepikhov Postgres Professional
On Thu, Jul 8, 2021 at 7:18 PM Andrey Lepikhov <a.lepikhov@postgrespro.ru> wrote: > On 8/7/21 10:49, Etsuro Fujita wrote: > > I tried to reproduce this in my environment using HEAD and PG14, but I > > couldn't. Could you elaborate a bit more on how to reproduce this? > Interesting... > I use the only script (see t.sql) right after the instance has started. Reproduced here. I’ll continue to investigate this. The reason I couldn’t reproduce this is my setting where partitions were created on the same foreign server. :-( Thanks! Best regards, Etsuro Fujita
On Fri, Jul 2, 2021 at 10:24 PM Andrey Lepikhov <a.lepikhov@postgrespro.ru> wrote: > I found one bug. Look at the case: > > CREATE TABLE test (x int) PARTITION BY HASH (x); > CREATE TABLE test_1 (x int); > CREATE TABLE test_2 (x int); > CREATE FOREIGN TABLE ftest_1 PARTITION OF test > FOR VALUES WITH (modulus 2, remainder 0) > SERVER loopback OPTIONS (table_name 'test_1'); > CREATE FOREIGN TABLE ftest_2 PARTITION OF test > FOR VALUES WITH (modulus 2, remainder 1) > SERVER loopback2 OPTIONS (table_name 'test_2'); > INSERT INTO test (SELECT * FROM generate_series(1,10)); > ANALYZE test,test_1,test_2,ftest_1,ftest_2; > > EXPLAIN (ANALYZE) > SELECT * FROM test WHERE x NOT IN ( > SELECT avg(x) FROM test WHERE x > 10 > ); > ERROR: InstrEndLoop called on running node > > When I added ', COSTS OFF, SUMMARY OFF, TIMING OFF' this example fall > down into the SEGFAULT. > > I prepared quick fix for this problem (see patch in attachment). Maybe > it is'nt a best solution but it can speedup search of it. @@ -7015,6 +7015,21 @@ process_pending_request(AsyncRequest *areq) fetch_more_data(node); + /* + * If the request are made by another append we will only prepare connection + * for the next query and don't take a tuple immediately. It is needed to + * prevent possible recursion into a qual subplan. + */ + if (!fetch) + { + AppendState *node = (AppendState *) areq->requestor; + + ExecAsyncRequestDone(areq, NULL); + node->as_needrequest = bms_add_member(node->as_needrequest, + areq->request_index); + return; + } I don’t think this is a good idea, because it is pretty inconsistent, as doing ExecAsyncRequestDone(areq, NULL) means that there are no more tuples while changing as_needrequest like that means that there is at least one tuple to return. This would happen to work, but for example, if we add to the core more sanity checks on AsyncRequests, this would not work well anymore. I tried to devise a consistent solution for this issue, but I couldn’t. So I feel inclined to disable async execution in cases where async-capable nodes access to subplans (or initplans), for now. Best regards, Etsuro Fujita
Re: The case when AsyncAppend exists also in the qual of Async ForeignScan
От
"Andrey V. Lepikhov"
Дата:
On 7/22/21 4:14 PM, Etsuro Fujita wrote: > On Fri, Jul 2, 2021 at 10:24 PM Andrey Lepikhov > @@ -7015,6 +7015,21 @@ process_pending_request(AsyncRequest *areq) > > fetch_more_data(node); > > + /* > + * If the request are made by another append we will only prepare connection > + * for the next query and don't take a tuple immediately. It is needed to > + * prevent possible recursion into a qual subplan. > + */ > + if (!fetch) > + { > + AppendState *node = (AppendState *) areq->requestor; > + > + ExecAsyncRequestDone(areq, NULL); > + node->as_needrequest = bms_add_member(node->as_needrequest, > + areq->request_index); > + return; > + } > > I don’t think this is a good idea, because it is pretty inconsistent, > as doing ExecAsyncRequestDone(areq, NULL) means that there are no more > tuples while changing as_needrequest like that means that there is at > least one tuple to return. This would happen to work, but for > example, if we add to the core more sanity checks on AsyncRequests, > this would not work well anymore. I agree. > I tried to devise a consistent > solution for this issue, but I couldn’t. So I feel inclined to > disable async execution in cases where async-capable nodes access to > subplans (or initplans), for now. I think it can be done, but only as a temporary solution. InitPlan is a common planning utility. Maybe we can split async logic into: - receiving stage, when we only fetch and store tuples, - evaluating stage, when we form resulting tuple and return by a scan node. I will think about such solution more. Also, may be you tell your opinion about an additional optimization of Async Append [1]? [1] https://www.postgresql.org/message-id/edb1331c-e861-0c53-9fdb-f7ca7dfd884d%40postgrespro.ru -- regards, Andrey Lepikhov Postgres Professional
Re: The case when AsyncAppend exists also in the qual of Async ForeignScan
От
"Andrey V. Lepikhov"
Дата:
On 7/22/21 4:14 PM, Etsuro Fujita wrote: > On Fri, Jul 2, 2021 at 10:24 PM Andrey Lepikhov > @@ -7015,6 +7015,21 @@ process_pending_request(AsyncRequest *areq) > > fetch_more_data(node); > > + /* > + * If the request are made by another append we will only prepare connection > + * for the next query and don't take a tuple immediately. It is needed to > + * prevent possible recursion into a qual subplan. > + */ > + if (!fetch) > + { > + AppendState *node = (AppendState *) areq->requestor; > + > + ExecAsyncRequestDone(areq, NULL); > + node->as_needrequest = bms_add_member(node->as_needrequest, > + areq->request_index); > + return; > + } > > I don’t think this is a good idea, because it is pretty inconsistent, > as doing ExecAsyncRequestDone(areq, NULL) means that there are no more > tuples while changing as_needrequest like that means that there is at > least one tuple to return. This would happen to work, but for > example, if we add to the core more sanity checks on AsyncRequests, > this would not work well anymore. I tried to devise a consistent > solution for this issue, but I couldn’t. So I feel inclined to > disable async execution in cases where async-capable nodes access to > subplans (or initplans), for now. That do you think, if (in addition to the patch) we will give ExecAsyncAppendResponse a chance to fill the result slot? Code: --- a/src/backend/executor/nodeAppend.c +++ b/src/backend/executor/nodeAppend.c @@ -1108,6 +1108,9 @@ ExecAsyncAppendResponse(AsyncRequest *areq) return; } + if (areq->result == NULL) + areq->result = areq->requestee->ExecProcNodeReal(areq->requestee); + /* If the result is NULL or an empty slot, there's nothing more to do. */ if (TupIsNull(slot)) { -- regards, Andrey Lepikhov Postgres Professional
On Fri, Jul 23, 2021 at 3:09 PM Andrey V. Lepikhov <a.lepikhov@postgrespro.ru> wrote: > On 7/22/21 4:14 PM, Etsuro Fujita wrote: > > On Fri, Jul 2, 2021 at 10:24 PM Andrey Lepikhov > > @@ -7015,6 +7015,21 @@ process_pending_request(AsyncRequest *areq) > > > > fetch_more_data(node); > > > > + /* > > + * If the request are made by another append we will only prepare connection > > + * for the next query and don't take a tuple immediately. It is needed to > > + * prevent possible recursion into a qual subplan. > > + */ > > + if (!fetch) > > + { > > + AppendState *node = (AppendState *) areq->requestor; > > + > > + ExecAsyncRequestDone(areq, NULL); > > + node->as_needrequest = bms_add_member(node->as_needrequest, > > + areq->request_index); > > + return; > > + } > > > > I don’t think this is a good idea, because it is pretty inconsistent, > > as doing ExecAsyncRequestDone(areq, NULL) means that there are no more > > tuples while changing as_needrequest like that means that there is at > > least one tuple to return. This would happen to work, but for > > example, if we add to the core more sanity checks on AsyncRequests, > > this would not work well anymore. > > So I feel inclined to > > disable async execution in cases where async-capable nodes access to > > subplans (or initplans), for now. > I think it can be done, but only as a temporary solution. My concern about that is that such an inconsistency would make the code complicated, and thus make the maintenance hard. > Maybe we can split async logic into: > - receiving stage, when we only fetch and store tuples, > - evaluating stage, when we form resulting tuple and return by a scan node. > I will think about such solution more. One simple solution along this line I came up with, which is not the rewrite, is to 1) split process_pending_request() into the two steps, and 2) postpone the second step until we are called from postgresForeignAsyncConfigureWait(), like the attached, which I think would be much consistent with the existing logic. > Also, may be you tell your opinion about an additional optimization of > Async Append [1]? Is the optimization related to this issue? (Sorry, I didn’t have time for reviewing the patch in [1] than expected. I plan to do so next month.) Best regards, Etsuro Fujita
Вложения
Re: The case when AsyncAppend exists also in the qual of Async ForeignScan
От
"Andrey V. Lepikhov"
Дата:
>> I think it can be done, but only as a temporary solution. > My concern about that is that such an inconsistency would make the > code complicated, and thus make the maintenance hard. Agree, but your new patch is quite understandable. >> Maybe we can split async logic into: >> - receiving stage, when we only fetch and store tuples, >> - evaluating stage, when we form resulting tuple and return by a scan >> node. >> I will think about such solution more. > One simple solution along this line I came up with, which is not the > rewrite, is to 1) split process_pending_request() into the two steps, > and 2) postpone the second step until we are called from > postgresForeignAsyncConfigureWait(), like the attached, which I think > would be much consistent with the existing logic. Good idea. Are you planning to commit this patch? >> Also, may be you tell your opinion about an additional optimization >> of Async Append [1]? > Is the optimization related to this issue? (Sorry, I didn’t have time > for reviewing the patch in [1] than expected. I plan to do so next > month.) This optimization tries to postpone choice of async subplans. It allows us to make a decision on async capable subplans after all plan flattening operations. -- regards, Andrey Lepikhov Postgres Professional
On Tue, Jul 27, 2021 at 3:22 PM Andrey V. Lepikhov <a.lepikhov@postgrespro.ru> wrote: > >> Maybe we can split async logic into: > >> - receiving stage, when we only fetch and store tuples, > >> - evaluating stage, when we form resulting tuple and return by a > >> scan node. > >> I will think about such solution more. > > One simple solution along this line I came up with, which is not the > > rewrite, is to 1) split process_pending_request() into the two steps, > > and 2) postpone the second step until we are called from > > postgresForeignAsyncConfigureWait(), like the attached, which I think > > would be much consistent with the existing logic. > Good idea. Are you planning to commit this patch? Cool! Here is an updated version of the patch, in which I added/tweaked comments a bit further. I'm planning to push this version if there are no objections from others. > >> Also, may be you tell your opinion about an additional optimization > >> of Async Append [1]? > > Is the optimization related to this issue? > This optimization tries to postpone choice of async subplans. It allows > us to make a decision on async capable subplans after all plan > flattening operations. Thanks for the explanation! My understanding is that the optimization isn’t related to this issue. Right? Best regards, Etsuro Fujita
Вложения
Re: The case when AsyncAppend exists also in the qual of Async ForeignScan
От
"Andrey V. Lepikhov"
Дата:
On 7/27/21 3:50 PM, Etsuro Fujita wrote: > On Tue, Jul 27, 2021 at 3:22 PM Andrey V. Lepikhov > <a.lepikhov@postgrespro.ru> wrote: >> >> Maybe we can split async logic into: >> >> - receiving stage, when we only fetch and store tuples, >> >> - evaluating stage, when we form resulting tuple and return by a >> >> scan node. >> >> I will think about such solution more. > >> > One simple solution along this line I came up with, which is not the >> > rewrite, is to 1) split process_pending_request() into the two steps, >> > and 2) postpone the second step until we are called from >> > postgresForeignAsyncConfigureWait(), like the attached, which I think >> > would be much consistent with the existing logic. > >> Good idea. Are you planning to commit this patch? > > Cool! > > Here is an updated version of the patch, in which I added/tweaked > comments a bit further. I'm planning to push this version if there > are no objections from others. Thanks. In addition, I will use this patch in TPC-H benchmark tomorrow. > >> >> Also, may be you tell your opinion about an additional optimization >> >> of Async Append [1]? > >> > Is the optimization related to this issue? > >> This optimization tries to postpone choice of async subplans. It allows >> us to make a decision on async capable subplans after all plan >> flattening operations. > > Thanks for the explanation! My understanding is that the optimization > isn’t related to this issue. Right? Right -- regards, Andrey Lepikhov Postgres Professional
On Tue, Jul 27, 2021 at 8:01 PM Andrey V. Lepikhov <a.lepikhov@postgrespro.ru> wrote: > On 7/27/21 3:50 PM, Etsuro Fujita wrote: > > On Tue, Jul 27, 2021 at 3:22 PM Andrey V. Lepikhov > > <a.lepikhov@postgrespro.ru> wrote: > >> >> Also, may be you tell your opinion about an additional optimization > >> >> of Async Append [1]? > > > >> > Is the optimization related to this issue? > > > >> This optimization tries to postpone choice of async subplans. It allows > >> us to make a decision on async capable subplans after all plan > >> flattening operations. > > > > Thanks for the explanation! My understanding is that the optimization > > isn’t related to this issue. Right? > Right OK Best regards, Etsuro Fujita
Re: The case when AsyncAppend exists also in the qual of Async ForeignScan
От
"Andrey V. Lepikhov"
Дата:
On 7/27/21 3:50 PM, Etsuro Fujita wrote: > On Tue, Jul 27, 2021 at 3:22 PM Andrey V. Lepikhov > <a.lepikhov@postgrespro.ru> wrote: > Here is an updated version of the patch, in which I added/tweaked > comments a bit further. I'm planning to push this version if there > are no objections from others. I used this patch in TPC-H benchmark (that caused initial error) and it finished without problems. Queries and their explains are attached. You could look for situations when async append was not used. -- regards, Andrey Lepikhov Postgres Professional
Вложения
On Wed, Jul 28, 2021 at 3:39 PM Andrey V. Lepikhov <a.lepikhov@postgrespro.ru> wrote: > On 7/27/21 3:50 PM, Etsuro Fujita wrote: > > On Tue, Jul 27, 2021 at 3:22 PM Andrey V. Lepikhov > > <a.lepikhov@postgrespro.ru> wrote: > > Here is an updated version of the patch, in which I added/tweaked > > comments a bit further. I'm planning to push this version if there > > are no objections from others. > I used this patch in TPC-H benchmark (that caused initial error) and it > finished without problems. Good to know! > Queries and their explains are attached. You could look for situations > when async append was not used. Interesting! Yeah, async exection doesn't apply to Merge Append, so let's improve to support async-aware Merge Append. Thanks for the testing! Best regards, Etsuro Fujita
On Tue, Jul 27, 2021 at 7:50 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > Here is an updated version of the patch, in which I added/tweaked > comments a bit further. I'm planning to push this version if there > are no objections from others. Done. Best regards, Etsuro Fujita
On Fri, Jul 30, 2021 at 5:11 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > On Tue, Jul 27, 2021 at 7:50 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > > Here is an updated version of the patch, in which I added/tweaked > > comments a bit further. I'm planning to push this version if there > > are no objections from others. > > Done. I noticed that some buildfarm members cause this assertion failure: TRAP: FailedAssertion("fsstate->conn_state->pendingAreq == areq", File: "postgres_fdw.c", Line: 6900, PID: 25840) I will look into this. Best regards, Etsuro Fujita