Discussion: Asynchronous MergeAppend


Asynchronous MergeAppend

From: Alexander Pyhalov
Date:
Hello.

I'd like to make the MergeAppend node async-capable, like the Append node.
Currently, when the planner chooses a MergeAppend plan, asynchronous
execution is not possible. With the attached patches you can see plans like

EXPLAIN (VERBOSE, COSTS OFF)
SELECT * FROM async_pt WHERE b % 100 = 0 ORDER BY b, a;
                                                           QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Merge Append
   Sort Key: async_pt.b, async_pt.a
   ->  Async Foreign Scan on public.async_p1 async_pt_1
         Output: async_pt_1.a, async_pt_1.b, async_pt_1.c
         Remote SQL: SELECT a, b, c FROM public.base_tbl1 WHERE (((b % 100) = 0)) ORDER BY b ASC NULLS LAST, a ASC NULLS LAST
   ->  Async Foreign Scan on public.async_p2 async_pt_2
         Output: async_pt_2.a, async_pt_2.b, async_pt_2.c
         Remote SQL: SELECT a, b, c FROM public.base_tbl2 WHERE (((b % 100) = 0)) ORDER BY b ASC NULLS LAST, a ASC NULLS LAST

This can be quite profitable: in our test cases, asynchronous MergeAppend
execution on remote servers was up to two times faster.

The code for asynchronous execution in MergeAppend was mostly borrowed
from the Append node.

What differs significantly is that ExecMergeAppendAsyncGetNext() must
return a tuple from the specified slot: the subplan number determines the
tuple slot the data should be retrieved into. When a subplan is ready to
provide some data, its result is cached in ms_asyncresults. When we get a
tuple for the subplan specified in ExecMergeAppendAsyncGetNext(),
ExecMergeAppendAsyncRequest() returns true and the loop in
ExecMergeAppendAsyncGetNext() ends. We can fetch data only for subplans
that either don't have a cached result ready or have already returned it
to the upper node; this flag is stored in ms_has_asyncresults. As data for
a subplan can arrive either before or after the loop in
ExecMergeAppendAsyncRequest(), we check this flag twice in that function.
Unlike ExecAppendAsyncEventWait(), it seems ExecMergeAppendAsyncEventWait()
doesn't need a timeout, as there's no need to get a result from a
synchronous subplan when a tuple from an async one was explicitly
requested.
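
To make the described control flow easier to follow, here is a minimal
sketch of what such a loop could look like. This is not the patch code:
the exact signatures, the ms_asyncplans bitmap and the use of ms_slots
here are assumptions made purely for illustration.

/*
 * Hypothetical sketch, not the actual patch code: loop until the
 * requested subplan has a tuple ready, then return it from that
 * subplan's own slot.
 */
static TupleTableSlot *
ExecMergeAppendAsyncGetNext(MergeAppendState *node, int mplan)
{
    /* Only async subplans are handled here (ms_asyncplans is assumed). */
    Assert(bms_is_member(mplan, node->ms_asyncplans));

    /* ExecMergeAppendAsyncRequest() returns true once mplan has a tuple. */
    while (!ExecMergeAppendAsyncRequest(node, mplan))
    {
        CHECK_FOR_INTERRUPTS();

        /* No cached result for mplan yet: wait for async subplans. */
        ExecMergeAppendAsyncEventWait(node);
    }

    /* The tuple (or an empty slot on EOF) is in the subplan's slot. */
    return node->ms_slots[mplan];
}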

We also had to fix postgres_fdw to avoid looking directly at Append
fields. Perhaps the accessors to Append fields look strange, but they
allow us to avoid some code duplication. I suppose duplication could be
reduced even further if we reworked the async Append implementation, but
so far I haven't tried that, to avoid a big diff from master.
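
Purely for illustration, the kind of accessor meant here could look
roughly like the following; the function name and the ms_eventset field
are hypothetical, not the patch's actual API:

/*
 * Hypothetical sketch: hide whether the async requestor is an Append or
 * a MergeAppend, so postgres_fdw doesn't poke at AppendState fields
 * directly.  The ms_eventset field is assumed to be added by the patch.
 */
static inline struct WaitEventSet *
GetRequestorEventSet(PlanState *requestor)
{
    if (IsA(requestor, AppendState))
        return ((AppendState *) requestor)->as_eventset;

    Assert(IsA(requestor, MergeAppendState));
    return ((MergeAppendState *) requestor)->ms_eventset;
}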

Also, mark_async_capable() assumes that the path corresponds to the plan.
This can be untrue when create_[merge_]append_plan() inserts a Sort node;
in that case mark_async_capable() can treat the Sort plan node as some
other node type and crash, so there's a small fix for this.
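
A sketch of the kind of guard described (not the actual patch hunk) could
be an early bail-out when the generated plan node turns out to be a Sort:

/*
 * Sketch only: if create_[merge_]append_plan() had to put a Sort on top
 * of the subplan, the plan no longer corresponds to the path, so don't
 * try to mark it async-capable.
 */
if (IsA(plan, Sort))
    return false;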
-- 
Best regards,
Alexander Pyhalov,
Postgres Professional
Attachments

Re: Asynchronous MergeAppend

From: Alena Rybakina
Date:
Hi! Thank you for your work on this subject! I think this is a very
useful optimization.

While looking through your code, I noticed some points that I think
should be taken into account. First, I noticed only two tests verifying
the functionality of this feature, and I don't think that is enough.
Are you thinking about adding tests with queries involving, for example,
joins between different tables and unusual operators?

In addition, I have a question about testing your feature on a
benchmark. Are you going to do that?

On 17.07.2024 16:24, Alexander Pyhalov wrote:
> Hello.
>
> I'd like to make the MergeAppend node async-capable, like the Append node.
> [...]
> Also, mark_async_capable() assumes that the path corresponds to the plan.
> This can be untrue when create_[merge_]append_plan() inserts a Sort node;
> in that case mark_async_capable() can treat the Sort plan node as some
> other node type and crash, so there's a small fix for this.

I think you should add this explanation to the commit message, because
without it it's hard to understand the full picture of how your code works.

-- 
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Asynchronous MergeAppend

From: Alexander Pyhalov
Date:
Hi.

I've updated the patches for asynchronous MergeAppend. They have allowed
us to significantly improve performance in practice. Previously, a SELECT
from a partitioned (and distributed) table could switch from an
asynchronous Append plan to a synchronous MergeAppend plan. Given that the
table could have 20+ partitions, that plan was cheaper by cost, but much
less efficient because the remote parts executed synchronously.

This version contains a couple of small fixes: earlier ExecMergeAppend()
scanned all async subplans, but it should do this only for valid ones. I
also incorporated the logic from

commit af717317a04f5217728ce296edf4a581eb7e6ea0
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date:   Wed Mar 12 20:53:09 2025 +0200

     Handle interrupts while waiting on Append's async subplans

into ExecMergeAppendAsyncEventWait().
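
For reference, the pattern that commit brings in looks roughly like the
following when applied to the merge-append wait loop. This is a simplified
sketch, not the literal patch code; the ms_eventset field is an assumption:

/*
 * Simplified sketch: the wait set also contains the process latch, so a
 * query cancel wakes us up instead of leaving us stuck waiting on the
 * remote sockets.
 */
AddWaitEventToSet(node->ms_eventset, WL_LATCH_SET, PGINVALID_SOCKET,
                  MyLatch, NULL);

for (;;)
{
    WaitEvent   occurred_event;
    int         noccurred;

    noccurred = WaitEventSetWait(node->ms_eventset, -1 /* no timeout */,
                                 &occurred_event, 1,
                                 WAIT_EVENT_APPEND_READY);

    if (noccurred == 1 && (occurred_event.events & WL_LATCH_SET))
    {
        /* Interrupted (e.g. query cancel): service it and keep waiting. */
        ResetLatch(MyLatch);
        CHECK_FOR_INTERRUPTS();
        continue;
    }

    break;              /* socket readiness: go process async results */
}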

-- 
Best regards,
Alexander Pyhalov,
Postgres Professional
Attachments

Re: Asynchronous MergeAppend

From: Álvaro Herrera
Date:
I noticed that this patch has gone largely unreviewed, but it needs a
rebase due to the GUC changes, so here it is again.

Thanks

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/

Attachments