Asynchronous MergeAppend

Поиск
Список
Период
Сортировка
От Alexander Pyhalov
Тема Asynchronous MergeAppend
Дата
Msg-id 59be194c5a409fb9fc9f2031581b8a44@postgrespro.ru
обсуждение исходный текст
Список pgsql-hackers
Hello.

I'd like to make MergeAppend node Async-capable like Append node. 
Nowadays when planner chooses MergeAppend plan, asynchronous execution 
is not possible. With attached patches you can see plans like

EXPLAIN (VERBOSE, COSTS OFF)
SELECT * FROM async_pt WHERE b % 100 = 0 ORDER BY b, a;
                                                           QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------
  Merge Append
    Sort Key: async_pt.b, async_pt.a
    ->  Async Foreign Scan on public.async_p1 async_pt_1
          Output: async_pt_1.a, async_pt_1.b, async_pt_1.c
          Remote SQL: SELECT a, b, c FROM public.base_tbl1 WHERE (((b % 
100) = 0)) ORDER BY b ASC NULLS LAST, a ASC NULLS LAST
    ->  Async Foreign Scan on public.async_p2 async_pt_2
          Output: async_pt_2.a, async_pt_2.b, async_pt_2.c
          Remote SQL: SELECT a, b, c FROM public.base_tbl2 WHERE (((b % 
100) = 0)) ORDER BY b ASC NULLS LAST, a ASC NULLS LAST

This can be quite profitable (in our test cases you can gain up to two 
times better speed with MergeAppend async execution on remote servers).

Code for asynchronous execution in Merge Append was mostly borrowed from 
Append node.

What significantly differs - in ExecMergeAppendAsyncGetNext() you must 
return tuple from the specified slot.
Subplan number determines tuple slot where data should be retrieved to. 
When subplan is ready to provide some data,
it's cached in ms_asyncresults. When we get tuple for subplan, specified 
in ExecMergeAppendAsyncGetNext(),
ExecMergeAppendAsyncRequest() returns true and loop in 
ExecMergeAppendAsyncGetNext() ends. We can fetch data for
subplans which either don't have cached result ready or have already 
returned them to the upper node. This
flag is stored in ms_has_asyncresults. As we can get data for some 
subplan either earlier or after loop in ExecMergeAppendAsyncRequest(),
we check this flag twice in this function.
Unlike ExecAppendAsyncEventWait(), it seems 
ExecMergeAppendAsyncEventWait() doesn't need a timeout - as there's no 
need to get result
from synchronous subplan if a tuple form async one was explicitly 
requested.

Also we had to fix postgres_fdw to avoid directly looking at Append 
fields. Perhaps, accesors to Append fields look strange, but allows
to avoid some code duplication. I suppose, duplication could be even 
less if we reworked async Append implementation, but so far I haven't
tried to do this to avoid big diff from master.

Also mark_async_capable() believes that path corresponds to plan. This 
can be not true when create_[merge_]append_plan() inserts sort node.
In this case mark_async_capable() can treat Sort plan node as some other 
and crash, so there's a small fix for this.
-- 
Best regards,
Alexander Pyhalov,
Postgres Professional
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: torikoshia
Дата:
Сообщение: Re: Add new COPY option REJECT_LIMIT
Следующее
От: Ashutosh Sharma
Дата:
Сообщение: Re: Addressing SECURITY DEFINER Function Vulnerabilities in PostgreSQL Extensions