Обсуждение: Asynchronous execution on FDW

Поиск
Список
Период
Сортировка

Asynchronous execution on FDW

От
Kyotaro HORIGUCHI
Дата:
Hello. This is the new version of FDW async exection feature.

The status of this feature is as follows, as of the last commitfest.

- Async execution is valuable to have.
- But do the first kick in ExecInit phase is wrong.

So the design outline of this version is as following,

- The patch set consists of three parts. The fist is the infrastracture in core-side, second is the code to enable
asynchronousexecution of Postgres-FDW. The third part is the alternative set of three methods to adapt fetch size,
whichmakes asynchronous execution more effective.
 

- It was a problem when to give the first kick for async exec. It is not in ExecInit phase, and ExecProc phase does not
fit,too. An extra phase ExecPreProc or something is too invasive. So I tried "pre-exec callback".
 
 Any init-node can register callbacks on their turn, then the registerd callbacks are called just before ExecProc phase
inexecutor. The first patch adds functions and structs to enable this.
 

- The second part is not changed from the previous version. Add PgFdwConn as a extended PgConn which have some members
tosupport asynchronous execution.
 
 The asynchronous execution is kicked only for the first ForeignScan node on the same foreign server. And the state
lastsuntil the next scan comes. This behavior is mainly controlled in fetch_more_data(). The behavior limits the number
ofsimultaneous exection for one foreign server to 1. This behavior is decided from the reason that no reasonable method
tolimit multiplicity of execution on *single peer* was found so far.
 

- The third part is three kind of trials of adaptive fetch size feature.
  The first one is duration-based adaptation. The patch  increases the fetch size by every FETCH execution but try to
keepthe duration of every FETCH below 500 ms. But it is not  promising because it looks very unstable, or the behavior
is nearly unforeseeable..
 
  The second one is based on byte-based FETCH feature. This  patch adds to FETCH command an argument to limit the
numberof  bytes (octets) to send. But this might be a over-exposure of  the internals. The size is counted based on
internal representation of a tuple and the client is needed to send the  overhead of its internal tuple representation
inbytes. This  is effective but quite ugly..
 
  The third is the most simple and straight-forward way, that  is, adds a foreign table option to specify the
fetch_size.The  effect of this is also in doubt since the size of tuples for  one foreign table would vary according to
thereturn-columns  list. But it is foreseeable for users and is a necessary knob  for those who want to tune it.
Foreignserver also could have  the same option as the default for that for foreign tables but  this patch have not
addedit.
 


The attached patches are the following,

- 0001-Add-infrastructure-of-pre-execution-callbacks.patch Infrastructure of pre-execution callback

- 0002-Allow-asynchronous-remote-query-of-postgres_fdw.patch FDW asynchronous execution feature

- 0003a-Add-experimental-POC-adaptive-fetch-size-feature.patch Adaptive fetch size alternative 1: duration based
control

- 0003b-POC-Experimental-fetch_by_size-feature.patch Adaptive fetch size alternative 2: FETCH by size

- 0003c-Add-foreign-table-option-to-set-fetch-size.patch Adaptive fetch size alternative 3: Foreign table option.

regards,

Re: Asynchronous execution on FDW

От
Kyotaro HORIGUCHI
Дата:
Ouch! I mistakenly made two CF entries for this patch. Could
someone remove this entry for me?

https://commitfest.postgresql.org/5/290/

The correct entry is "/5/291/"

======
Hello. This is the new version of FDW async exection feature.

The status of this feature is as follows, as of the last commitfest.

- Async execution is valuable to have.
- But do the first kick in ExecInit phase is wrong.

So the design outline of this version is as following,

- The patch set consists of three parts. The fist is the infrastracture in core-side, second is the code to enable
asynchronousexecution of Postgres-FDW. The third part is the alternative set of three methods to adapt fetch size,
whichmakes asynchronous execution more effective.
 

- It was a problem when to give the first kick for async exec. It is not in ExecInit phase, and ExecProc phase does not
fit,too. An extra phase ExecPreProc or something is too invasive. So I tried "pre-exec callback".
 
 Any init-node can register callbacks on their turn, then the registerd callbacks are called just before ExecProc phase
inexecutor. The first patch adds functions and structs to enable this.
 

- The second part is not changed from the previous version. Add PgFdwConn as a extended PgConn which have some members
tosupport asynchronous execution.
 
 The asynchronous execution is kicked only for the first ForeignScan node on the same foreign server. And the state
lastsuntil the next scan comes. This behavior is mainly controlled in fetch_more_data(). The behavior limits the number
ofsimultaneous exection for one foreign server to 1. This behavior is decided from the reason that no reasonable method
tolimit multiplicity of execution on *single peer* was found so far.
 

- The third part is three kind of trials of adaptive fetch size feature.
  The first one is duration-based adaptation. The patch  increases the fetch size by every FETCH execution but try to
keepthe duration of every FETCH below 500 ms. But it is not  promising because it looks very unstable, or the behavior
is nearly unforeseeable..
 
  The second one is based on byte-based FETCH feature. This  patch adds to FETCH command an argument to limit the
numberof  bytes (octets) to send. But this might be a over-exposure of  the internals. The size is counted based on
internal representation of a tuple and the client is needed to send the  overhead of its internal tuple representation
inbytes. This  is effective but quite ugly..
 
  The third is the most simple and straight-forward way, that  is, adds a foreign table option to specify the
fetch_size.The  effect of this is also in doubt since the size of tuples for  one foreign table would vary according to
thereturn-columns  list. But it is foreseeable for users and is a necessary knob  for those who want to tune it.
Foreignserver also could have  the same option as the default for that for foreign tables but  this patch have not
addedit.
 


The attached patches are the following,

- 0001-Add-infrastructure-of-pre-execution-callbacks.patch Infrastructure of pre-execution callback

- 0002-Allow-asynchronous-remote-query-of-postgres_fdw.patch FDW asynchronous execution feature

- 0003a-Add-experimental-POC-adaptive-fetch-size-feature.patch Adaptive fetch size alternative 1: duration based
control

- 0003b-POC-Experimental-fetch_by_size-feature.patch Adaptive fetch size alternative 2: FETCH by size

- 0003c-Add-foreign-table-option-to-set-fetch-size.patch Adaptive fetch size alternative 3: Foreign table option.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Asynchronous execution on FDW

От
Michael Paquier
Дата:
On Thu, Jul 2, 2015 at 3:07 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Ouch! I mistakenly made two CF entries for this patch. Could
> someone remove this entry for me?
>
> https://commitfest.postgresql.org/5/290/
>
> The correct entry is "/5/291/"

Done.
-- 
Michael



Re: Asynchronous execution on FDW

От
Kyotaro HORIGUCHI
Дата:
Thank you.

At Thu, 2 Jul 2015 16:02:27 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in
<CAB7nPqTs0YCwXedt1P=JjxFJeoj9UzLzkLuiX8=JdtPYUtNwwg@mail.gmail.com>
> On Thu, Jul 2, 2015 at 3:07 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > Ouch! I mistakenly made two CF entries for this patch. Could
> > someone remove this entry for me?
> >
> > https://commitfest.postgresql.org/5/290/
> >
> > The correct entry is "/5/291/"
> 
> Done.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Asynchronous execution on FDW

От
Heikki Linnakangas
Дата:
On 07/02/2015 08:48 AM, Kyotaro HORIGUCHI wrote:
> - It was a problem when to give the first kick for async exec. It
>    is not in ExecInit phase, and ExecProc phase does not fit,
>    too. An extra phase ExecPreProc or something is too
>    invasive. So I tried "pre-exec callback".
>
>    Any init-node can register callbacks on their turn, then the
>    registerd callbacks are called just before ExecProc phase in
>    executor. The first patch adds functions and structs to enable
>    this.

At a quick glance, I think this has all the same problems as starting 
the execution at ExecInit phase. The correct way to do this is to kick 
off the queries in the first IterateForeignScan() call. You said that 
"ExecProc phase does not fit" - why not?

- Heikki




Re: Asynchronous execution on FDW

От
Kyotaro HORIGUCHI
Дата:
Hello, thank you for looking this.

If it is acceptable to reconstruct the executor nodes to have
additional return state PREP_RUN or such (which means it needs
one more call for the first tuple) , I'll modify the whole
executor to handle the state in the next patch to do so.

I haven't take the advice I had so far in this sense. But I came
to think that it is the most reasonable way to solve this.


======
> > - It was a problem when to give the first kick for async exec. It
> >    is not in ExecInit phase, and ExecProc phase does not fit,
> >    too. An extra phase ExecPreProc or something is too
> >    invasive. So I tried "pre-exec callback".
> >
> >    Any init-node can register callbacks on their turn, then the
> >    registerd callbacks are called just before ExecProc phase in
> >    executor. The first patch adds functions and structs to enable
> >    this.
> 
> At a quick glance, I think this has all the same problems as starting
> the execution at ExecInit phase. The correct way to do this is to kick
> off the queries in the first IterateForeignScan() call. You said that
> "ExecProc phase does not fit" - why not?

Execution nodes are expected to return the first tuple if
available. But asynchronous execution can not return the first
tuple immediately. Simultaneous execution for the first tuple on
every foreign node is crucial than asynchronous fetching for many
cases, especially for the cases like sort/agg pushdown on FDW.

The reason why ExecProc does not fit is that the first loop
without returning tuple looks impact too large portion in
executor.

It is my mistake that it doesn't address the problem about
parameterized paths. Parameterized paths should be executed
within ExecProc loops so this patch would be like following.

- To gain the advantage of kicking execution before the first ExecProc loop, non-parameterized paths are started using
thecallback feature this patch provides.
 

- Parameterized paths need the upper nodes executed before it starts execution so they should be start in ExecProc
loop,but runs asynchronously if possible.
 

This is rather a makeshift solution for the problem, but
considering current trend of parallelism, it might the time to
make the executor to fit parallel execution.

If it is acceptable to reconstruct the executor nodes to have
additional return state PREP_RUN or such (which means it needs
one more call for the first tuple) , I'll modify the whole
executor to handle the state in the next patch to do so.

I hate my stupidity if you suggested this kind of solution by "do
it in ExecProc":(

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Asynchronous execution on FDW

От
Kyotaro HORIGUCHI
Дата:
Hello, This is the new version of this patch.

At Tue, 07 Jul 2015 10:19:35 +0900, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote
> This is rather a makeshift solution for the problem, but
> considering current trend of parallelism, it might the time to
> make the executor to fit parallel execution.
> 
> If it is acceptable to reconstruct the executor nodes to have
> additional return state PREP_RUN or such (which means it needs
> one more call for the first tuple) , I'll modify the whole
> executor to handle the state in the next patch to do so.

I made a patchset to do this. The details of it and some examples
are shown after the summary below.

- I provided an infrastructure for asynchronous (simultaneous) execution of multiple execnodes belonging one node, like
joins.

- It (should) have addressed the "parameterized plan" problem.

- The infrastructure is a bit intrusive but simple, and it will be usable by any nodes that supports asynchronous
execution(none so far except fdw, needs some modification in core). So the async exec for Postgres-FDW now became an
exapmlefor the infrastructure. It might be nice to start backend worker for promising async resuest for a sort node.
 

- The postgres_fdw part is almost the same as the previous one.


The detailed explanation of the patchset follows.

============

I made a patchset to do this. It consists of five patches (plus
one for debug message).

1. Add infrastructure for run-state of executor node.
Currently executor nodes have binary run-states, one is!TupIsNull(slot) which indicates that the next tuple may
comefromthe node, and the another is TupIsNull(slot) whichindicates that no more tuple will be come.
 
This patch expands it to four-state and have the value inPlanState struct.
  Inited : it is just after initialized.
  Started: it is startd execution but no tuple retrieved. This           could be skipped.
  Running: it is returning tuples.
  Done   : it has no more tuple to return. This is equivalent to           TupIsNull(slot).
The nodes Group, ModifyTable, SetOp and WindowAgg had their ownstate flag replaceable by the new states in their own
*Statepartso they are moved to this new state set in this patch. Thispatch does not change the current behavior.
 


2. Change all tuple-returning execnodes to maintain the new  run-state appropriately.
The rest nodes are modified by this patch to maintain the stateto be consistent with the TupIsNull() state at the
ExecProcNodelevel. This patch does not change the current behavior, too. (Ifeel that the state Done would be no other
thanan encumbrancein maintenance. The state is not referred in nowhere)
 

3. Add a feature to start node asynchronously.
All nodes that have more than one child node can execute thechildren asynchronously by this patch. It tries start
childrenasynchronouslyif the state is "Inited" when entering Exec*functions. Async request for nodes which has just one
childissimply propagated to the child, and leaf nodes such as scanswill decide whether to be async or not. Currently no
leafnodecan be async except postgres_fdw.
 
NestLoop may run parameterized plan so it is specially treatedin StartNestLoop so that parameterized plans will not
beasynchronouslystarted.
 
In StartHashJoin, whether the inner (hash) node is executed ornot is judged by the similar logic with ExecHashJoin.
Even after this patch applied, no leaf node can startasynchronously so the behavior of the executor still beunchanged.


4. Add StartForeignScan to FdwRoutine
Add new entry function to accept the asynchronous executionrequest from the core.


5. Allow asynchronous remote query of postgres_fdw.
This is almost the same as the previous version. Except that itruns on the new infrastructure, and added new
server/foreigntableoption allow_async.
 
The first foreign scan on the same server will be asynchronouslystarted execution if requested. And apart from the
asyncstart,every successive fetches for the same foreign scan will beasynchronously fetched.
 

Currently there's no means to observe what it is doing from
outside, so the additional sixth patch is to output debug
messages about asynchronous execution.

However, currently it is no test code for that but I'm at a loss
what to do as the test..


FWIW I provided two exaples of running asynchronous exexution.

regards,


===== Example
CREATE SERVER sv1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'localhost', dbname 'postgres');
CREATE SERVER sv2 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'localhost', dbname 'postgres');
CREATE USER MAPPING FOR CURRENT_USER SERVER sv1;
CREATE USER MAPPING FOR CURRENT_USER SERVER sv2;
CREATE TABLE lp (a int, b int);
CREATE TABLE lt1 () INHERITS (lp);
CREATE TABLE lt2 () INHERITS (lp);
CREATE TABLE lt3 () INHERITS (lp);
CREATE TABLE lt4 () INHERITS (lp);
CREATE TABLE fp (LIKE lp);
CREATE FOREIGN TABLE ft1 () INHERITS (fp) SERVER sv1 OPTIONS (table_name 'lt1');
CREATE FOREIGN TABLE ft2 () INHERITS (fp) SERVER sv1 OPTIONS (table_name 'lt1');
CREATE FOREIGN TABLE ft3 () INHERITS (fp) SERVER sv2 OPTIONS (table_name 'lt1');
CREATE FOREIGN TABLE ft4 () INHERITS (fp) SERVER sv2 OPTIONS (table_name 'lt1');
INSERT INTO lt1 (SELECT a, a FROM generate_series(0, 999) a);
INSERT INTO lt2 (SELECT a+1000, a FROM generate_series(0, 999) a);
INSERT INTO lt3 (SELECT a+2000, a FROM generate_series(0, 999) a);
INSERT INTO lt4 (SELECT a+3000, a FROM generate_series(0, 999) a);

;; TEST FOR SIMPLE APPEND
=# SELECT * FROM fp;
1  LOG:  pg_fdw: [ft1/sv1/0x293a580] Async exec started.
2  LOG:  pg_fdw: [ft2/sv1/0x293a580] Async exec denied.
3  LOG:  pg_fdw: [ft3/sv2/0x2898c70] Async exec started.
4  LOG:  pg_fdw: [ft4/sv2/0x2898c70] Async exec denied.
5  LOG:  pg_fdw: [ft1/sv1/0x293a580] Async fetch  ....
6  LOG:  pg_fdw: [ft1/sv1/0x293a580] Async fetch
7  LOG:  pg_fdw: [ft2/sv1/0x293a580] Sync fetch.
8  LOG:  pg_fdw: [ft2/sv1/0x293a580] Async fetch  ...
9  LOG:  pg_fdw: [ft2/sv1/0x293a580] Async fetch
10 LOG:  pg_fdw: [ft3/sv2/0x2898c70] Async fetch  ....
11 LOG:  pg_fdw: [ft3/sv2/0x2898c70] Async fetch
12 LOG:  pg_fdw: [ft4/sv2/0x2898c70] Sync fetch.
14 LOG:  pg_fdw: [ft4/sv2/0x2898c70] Async fetch  ...
15 LOG:  pg_fdw: [ft4/sv2/0x2898c70] Async fetch

;;  The notation inside the square bracket is
;;       <table name>/<server name>/<ponter of connection>.
;;
;;  1-4 foreign servers denied async for the second scan for each (ft2/ft4).
;;
;; At 7, reading different table from 6 made it sync fetch but
;;   the successive fetches afterward are async.
;;
;; ft2 and ft3 was on different server so 10 is async fetch for
;;   the query executed asynchronously at 3.
;;
;; At 12 the same thing to 7 occurred.


;; TEST FOR PARAMETERIZED NESTLOOP
=# SET enable_hashjoin TO false;
=# SET enable_mergejoin TO false;
=# SET enable_material TO false;
=# ALTER FOREIGN TABLE ft4 OPTIONS (ADD use_remote_estimate 'true');
=# SELECT ft4.a FROM ft1 JOIN ft4 ON ft1.b = ft4.b WHERE ft1.a BETWEEN 800 AND 1000;
1  LOG:  pg_fdw: [ft1/sv1/0x293a580] Async exec started.
2  LOG:  pg_fdw: [ft1/sv1/0x293a580] Async fetch
3  LOG:  pg_fdw: [ft4/sv2/0x2898c70] Sync fetch.
4  LOG:  pg_fdw: [ft4/sv2/0x2898c70] Sync fetch.  ...
5  LOG:  pg_fdw: [ft4/sv2/0x2898c70] Sync fetch.
6  LOG:  pg_fdw: [ft1/sv1/0x293a580] Async fetch
7  LOG:  pg_fdw: [ft4/sv2/0x2898c70] Sync fetch.  ...
8  LOG:  pg_fdw: [ft4/sv2/0x2898c70] Sync fetch.
9  LOG:  pg_fdw: [ft1/sv1/0x293a580] Async fetch

;; ft4 did not even try to async since the inner(ft4) is parameterized.
;; All fetches for inner(ft4) was executed synchronously.
;;
;; Meanwhile, ft1 was continuously reading asynchronously.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Asynchronous execution on FDW

От
Kyotaro HORIGUCHI
Дата:
Hi,

> Currently there's no means to observe what it is doing from
> outside, so the additional sixth patch is to output debug
> messages about asynchronous execution.

The sixth patch did not contain one message shown in the example.
Attached is the revised version.
Other patches are not changed.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Asynchronous execution on FDW

От
Robert Haas
Дата:
On Fri, Jul 3, 2015 at 4:41 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> At a quick glance, I think this has all the same problems as starting the
> execution at ExecInit phase. The correct way to do this is to kick off the
> queries in the first IterateForeignScan() call. You said that "ExecProc
> phase does not fit" - why not?

What exactly are those problems?

I can think of these:

1. If the scan is parametrized, we probably can't do it for lack of
knowledge of what they will be.  This seems easy; just don't do it in
that case.
2. It's possible that we're down inside some subtree of the plan that
won't actually get executed.  This is trickier.

Consider this:

Append
-> Foreign Scan
-> Foreign Scan
-> Foreign Scan
<repeat 17 more times>

If we don't start each foreign scan until the first tuple is fetched,
we will not get any benefit here, because we won't fetch the first
tuple from query #2 until we finish reading the results of query #1.
If the result of the Append node will be needed in its entirety, we
really, really want to launch of those queries as early as possible.
OTOH, if there's a Limit node with a small limit on top of the Append
node, that could be quite wasteful.  We could decide not to care:
after all, if our limit is satisfied, we can just bang the remote
connections shut, and if they wasted some CPU, well, tough luck for
them.  But it would be nice to be smarter.  I'm not sure how, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Asynchronous execution on FDW

От
Kyotaro HORIGUCHI
Дата:
Hello, thank you for the comment.

At Fri, 17 Jul 2015 14:34:53 -0400, Robert Haas <robertmhaas@gmail.com> wrote in
<CA+TgmoaiJK1svzw_GkFU+zsSxciJKFELqu2AOMVUPhpSFw4BsQ@mail.gmail.com>
> On Fri, Jul 3, 2015 at 4:41 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > At a quick glance, I think this has all the same problems as starting the
> > execution at ExecInit phase. The correct way to do this is to kick off the
> > queries in the first IterateForeignScan() call. You said that "ExecProc
> > phase does not fit" - why not?
> 
> What exactly are those problems?
> 
> I can think of these:
> 
> 1. If the scan is parametrized, we probably can't do it for lack of
> knowledge of what they will be.  This seems easy; just don't do it in
> that case.

We can put an early kick to foreign scans only for the first shot
if we do it outside (before) ExecProc phase.

Nestloop
-> SeqScan
-> Append  -> Foreign (Index) Scan  -> Foreign (Index) Scan  ..

This plan premises precise (even to some extent) estimate for
remote query but async execution within ExecProc phase would be
in effect for this case.


> 2. It's possible that we're down inside some subtree of the plan that
> won't actually get executed.  This is trickier.

As for current postgres_fdw, it is done simply abandoning queued
result then close the cursor.

> Consider this:
> 
> Append
> -> Foreign Scan
> -> Foreign Scan
> -> Foreign Scan
> <repeat 17 more times>
> 
> If we don't start each foreign scan until the first tuple is fetched,
> we will not get any benefit here, because we won't fetch the first
> tuple from query #2 until we finish reading the results of query #1.
> If the result of the Append node will be needed in its entirety, we
> really, really want to launch of those queries as early as possible.
> OTOH, if there's a Limit node with a small limit on top of the Append
> node, that could be quite wasteful.

It's the nature of speculative execution, but the Limit will be
pushed down onto every Foreign Scans near future.

> We could decide not to care: after all, if our limit is
> satisfied, we can just bang the remote connections shut, and if
> they wasted some CPU, well, tough luck for them.  But it would
> be nice to be smarter.  I'm not sure how, though.

Appropriate fetch size will cap the harm and the case will be
handled as I mentioned above as for postgres_fdw.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Asynchronous execution on FDW

От
Kouhei Kaigai
Дата:
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kyotaro HORIGUCHI
> Sent: Wednesday, July 22, 2015 4:10 PM
> To: robertmhaas@gmail.com
> Cc: hlinnaka@iki.fi; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Asynchronous execution on FDW
>
> Hello, thank you for the comment.
>
> At Fri, 17 Jul 2015 14:34:53 -0400, Robert Haas <robertmhaas@gmail.com> wrote
> in <CA+TgmoaiJK1svzw_GkFU+zsSxciJKFELqu2AOMVUPhpSFw4BsQ@mail.gmail.com>
> > On Fri, Jul 3, 2015 at 4:41 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > > At a quick glance, I think this has all the same problems as starting the
> > > execution at ExecInit phase. The correct way to do this is to kick off the
> > > queries in the first IterateForeignScan() call. You said that "ExecProc
> > > phase does not fit" - why not?
> >
> > What exactly are those problems?
> >
> > I can think of these:
> >
> > 1. If the scan is parametrized, we probably can't do it for lack of
> > knowledge of what they will be.  This seems easy; just don't do it in
> > that case.
>
> We can put an early kick to foreign scans only for the first shot
> if we do it outside (before) ExecProc phase.
>
> Nestloop
> -> SeqScan
> -> Append
>    -> Foreign (Index) Scan
>    -> Foreign (Index) Scan
>    ..
>
> This plan premises precise (even to some extent) estimate for
> remote query but async execution within ExecProc phase would be
> in effect for this case.
>
>
> > 2. It's possible that we're down inside some subtree of the plan that
> > won't actually get executed.  This is trickier.
>
> As for current postgres_fdw, it is done simply abandoning queued
> result then close the cursor.
>
> > Consider this:
> >
> > Append
> > -> Foreign Scan
> > -> Foreign Scan
> > -> Foreign Scan
> > <repeat 17 more times>
> >
> > If we don't start each foreign scan until the first tuple is fetched,
> > we will not get any benefit here, because we won't fetch the first
> > tuple from query #2 until we finish reading the results of query #1.
> > If the result of the Append node will be needed in its entirety, we
> > really, really want to launch of those queries as early as possible.
> > OTOH, if there's a Limit node with a small limit on top of the Append
> > node, that could be quite wasteful.
>
> It's the nature of speculative execution, but the Limit will be
> pushed down onto every Foreign Scans near future.
>
> > We could decide not to care: after all, if our limit is
> > satisfied, we can just bang the remote connections shut, and if
> > they wasted some CPU, well, tough luck for them.  But it would
> > be nice to be smarter.  I'm not sure how, though.
>
> Appropriate fetch size will cap the harm and the case will be
> handled as I mentioned above as for postgres_fdw.
>
Horiguchi-san,

Let me ask an elemental question.

If we have ParallelAppend node that kicks a background worker process for
each underlying child node in parallel, does ForeignScan need to do something
special?

Expected waste of CPU or I/O is common problem to be solved, however, it does
not need to add a special case handling to ForeignScan, I think.
How about your opinion?

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>




Re: Asynchronous execution on FDW

От
Kyotaro HORIGUCHI
Дата:
Hello,

> Let me ask an elemental question.
> 
> If we have ParallelAppend node that kicks a background worker process for
> each underlying child node in parallel, does ForeignScan need to do something
> special?

Although I don't see the point of the background worker in your
story but at least for ParalleMergeAppend, it would frequently
discontinues to scan by upper Limit so one more state, say setup
- which mans a worker is allocated but not started- would be
useful and the driver node might need to manage the number of
async execution. Or the driven nodes might do so inversely.

As for ForeignScan, it is merely an API for FDW and does nothing
substantial so it would have nothing special to do. As for
postgres_fdw, current patch restricts one execution per one
foreign server at once by itself. We would have to provide
another execution management if we want to have two or more
simultaneous scans per one foreign server at once.

Sorry for the focusless discussion but does this answer some of
your question?

> Expected waste of CPU or I/O is common problem to be solved, however, it does
> not need to add a special case handling to ForeignScan, I think.
> How about your opinion?

I agree with you that ForeignScan as the wrapper for FDWs don't
need anything special for the case. I suppose for now that
avoiding the penalty from abandoning too many speculatively
executed scans (or other works on bg worker like sorts) would be
a business of the upper node of FDWs, or somewhere else.

However, I haven't dismissed the possibility that some common
works related to resource management could be integrated into
executor (or even into planner), but I see none for now.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Asynchronous execution on FDW

От
Kouhei Kaigai
Дата:
> > If we have ParallelAppend node that kicks a background worker process for
> > each underlying child node in parallel, does ForeignScan need to do something
> > special?
>
> Although I don't see the point of the background worker in your
> story but at least for ParalleMergeAppend, it would frequently
> discontinues to scan by upper Limit so one more state, say setup
> - which mans a worker is allocated but not started- would be
> useful and the driver node might need to manage the number of
> async execution. Or the driven nodes might do so inversely.
>
I expected workloads like single shot scan on a partitioned large
fact table on DWH system. Yep, if workload is expected to rescan
so frequently, its expected cost shall be higher (by the cost to
launch bgworker) than existing Append, then planner will kick out
this path.

Regarding of interaction between Limit and ParallelMergeAppend,
it is probably the best scenario, isn't it? If Limit picks up
the least 1000rows from a partitioned table consists of 20 child
tables, ParallelMergeAppend can launch 20 parallel jobs that
picks up the least 1000rows from the child relations for each.
Probably, it is same job done in pass_down_bound() of nodeLimit.c.

> As for ForeignScan, it is merely an API for FDW and does nothing
> substantial so it would have nothing special to do. As for
> postgres_fdw, current patch restricts one execution per one
> foreign server at once by itself. We would have to provide
> another execution management if we want to have two or more
> simultaneous scans per one foreign server at once.
>
Yep, your 4th patch defines a new callback to FdwRoutines and
5th patch implements postgres_fdw specific portion.
It shall work for distributed / shaded database environment well,
however, its benefit is around ForeignScan only.
Once management node kicks underlying SeqScan, ForeignScan or
others in parallel, it also enables to run local heap scan
asynchronously.

> Sorry for the focusless discussion but does this answer some of
> your question?
>
Hmm... Its advantage is still unclear for me. However, it is not
fair to hijack this thread by my idea.
I'll submit my design proposal about ParallelAppend towards the
next commit-fest. Please comment on.

> > Expected waste of CPU or I/O is common problem to be solved, however, it does
> > not need to add a special case handling to ForeignScan, I think.
> > How about your opinion?
>
> I agree with you that ForeignScan as the wrapper for FDWs don't
> need anything special for the case. I suppose for now that
> avoiding the penalty from abandoning too many speculatively
> executed scans (or other works on bg worker like sorts) would be
> a business of the upper node of FDWs, or somewhere else.
>
> However, I haven't dismissed the possibility that some common
> works related to resource management could be integrated into
> executor (or even into planner), but I see none for now.
>
I also agree with it is "eventually" needed, but may not be supported
in the first version.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>




Re: Asynchronous execution on FDW

От
Kyotaro HORIGUCHI
Дата:
Hello,

At Thu, 23 Jul 2015 09:38:39 +0000, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote in
<9A28C8860F777E439AA12E8AEA7694F80111BCEC@BPXM15GP.gisp.nec.co.jp>
> I expected workloads like single shot scan on a partitioned large
> fact table on DWH system. Yep, if workload is expected to rescan
> so frequently, its expected cost shall be higher (by the cost to
> launch bgworker) than existing Append, then planner will kick out
> this path.
> 
> Regarding of interaction between Limit and ParallelMergeAppend,
> it is probably the best scenario, isn't it? If Limit picks up
> the least 1000rows from a partitioned table consists of 20 child
> tables, ParallelMergeAppend can launch 20 parallel jobs that
> picks up the least 1000rows from the child relations for each.
> Probably, it is same job done in pass_down_bound() of nodeLimit.c.

Yes. I confused a bit. The scenario is one of least problematic
cases.

> > As for ForeignScan, it is merely an API for FDW and does nothing
> > substantial so it would have nothing special to do. As for
> > postgres_fdw, current patch restricts one execution per one
> > foreign server at once by itself. We would have to provide
> > another execution management if we want to have two or more
> > simultaneous scans per one foreign server at once.
> >
> Yep, your 4th patch defines a new callback to FdwRoutines and
> 5th patch implements postgres_fdw specific portion.
> It shall work for distributed / shaded database environment well,
> however, its benefit is around ForeignScan only.
> Once management node kicks underlying SeqScan, ForeignScan or
> others in parallel, it also enables to run local heap scan
> asynchronously.

I suppose SeqScan don't need async kick since its startup cost is
extremely low as nothing. (fetching first several pages would
boost seqscans?) On the other hand sort/hash would be a field
where asynchronous execution is in effect.

> > Sorry for the focusless discussion but does this answer some of
> > your question?
> >
> Hmm... Its advantage is still unclear for me. However, it is not
> fair to hijack this thread by my idea.

It would be more advantageous if join/sort pushdown on fdw comes,
where start-up cost could be extremely high...

> I'll submit my design proposal about ParallelAppend towards the
> next commit-fest. Please comment on.

Ok, I'll come there.

> > > Expected waste of CPU or I/O is common problem to be solved, however, it does
> > > not need to add a special case handling to ForeignScan, I think.
> > > How about your opinion?
> > 
> > I agree with you that ForeignScan as the wrapper for FDWs don't
> > need anything special for the case. I suppose for now that
> > avoiding the penalty from abandoning too many speculatively
> > executed scans (or other works on bg worker like sorts) would be
> > a business of the upper node of FDWs, or somewhere else.
> > 
> > However, I haven't dismissed the possibility that some common
> > works related to resource management could be integrated into
> > executor (or even into planner), but I see none for now.
> >
> I also agree with it is "eventually" needed, but may not be supported
> in the first version.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Asynchronous execution on FDW

От
Kouhei Kaigai
Дата:
Hello Horiguchi-san,

> > > As for ForeignScan, it is merely an API for FDW and does nothing
> > > substantial so it would have nothing special to do. As for
> > > postgres_fdw, current patch restricts one execution per one
> > > foreign server at once by itself. We would have to provide
> > > another execution management if we want to have two or more
> > > simultaneous scans per one foreign server at once.
> > >
> > Yep, your 4th patch defines a new callback to FdwRoutines and
> > 5th patch implements postgres_fdw specific portion.
> > It shall work for distributed / shaded database environment well,
> > however, its benefit is around ForeignScan only.
> > Once management node kicks underlying SeqScan, ForeignScan or
> > others in parallel, it also enables to run local heap scan
> > asynchronously.
>
> I suppose SeqScan don't need async kick since its startup cost is
> extremely low as nothing. (fetching first several pages would
> boost seqscans?) On the other hand sort/hash would be a field
> where asynchronous execution is in effect.
>
Startup cost is not only advantage of asynchronous execution.
If background worker prefetches the records to be read soon, during
other tasks are in progress, its latency to fetch next record is
much faster than usual execution path.
Please assume if next record is on neither shared-buffer nor page
cache of operating system.
First, the upper node calls heap_getnext() to fetch next record,
then it looks up the target block on the shared-buffer, then it
issues read(2) system call, then operating system makes the caller
process slept until this block gets read from the storage.
If asynchronous worker already goes through the above painful code
path and the records to be read are ready on the top of queue, it
will reduce the i/o wait time dramatically.

> > > Sorry for the focusless discussion but does this answer some of
> > > your question?
> > >
> > Hmm... Its advantage is still unclear for me. However, it is not
> > fair to hijack this thread by my idea.
>
> It would be more advantageous if join/sort pushdown on fdw comes,
> where start-up cost could be extremely high...
>
Not only FDW. I intend to combine the ParallelAppend with another idea
I previously post, to run tables join in parallel.
In case of partitioned foreign-tables, planner probably needs to consider
(1) FDW scan + local serial join, (2) FDW scan + local parallel join,
or (3) FDW remote join, according to the cost.

* [idea] table partition + hash join:
http://www.postgresql.org/message-id/9A28C8860F777E439AA12E8AEA7694F8010F672B@BPXM15GP.gisp.nec.co.jp

Anyway, let's have a further discussion in another thread.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>



Re: Asynchronous execution on FDW

От
Heikki Linnakangas
Дата:
I've marked this as rejected in the commitfest, because others are
working on a more general solution with parallel workers. That's still
work-in-progress, and it's not certain if it's going to make it into
9.6, but if it does it will largely render this obsolete. We can revisit
this patch later in the release cycle, if the parallel scan patch hasn't
solved the same use case by then.

- Heikki




Re: Asynchronous execution on FDW

От
Robert Haas
Дата:
On Mon, Aug 10, 2015 at 3:23 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> I've marked this as rejected in the commitfest, because others are
> working on a more general solution with parallel workers. That's still
> work-in-progress, and it's not certain if it's going to make it into
> 9.6, but if it does it will largely render this obsolete. We can revisit
> this patch later in the release cycle, if the parallel scan patch hasn't
> solved the same use case by then.

I think the really important issue for this patch is the one discussed here:

http://www.postgresql.org/message-id/CA+TgmoaiJK1svzw_GkFU+zsSxciJKFELqu2AOMVUPhpSFw4BsQ@mail.gmail.com

You raised an important issue there but never really expressed an
opinion on the points I raised, here or on the other thread.  And
neither did anyone else except the patch author who, perhaps
unsurprisingly, thinks it's OK.  I wish we could get more discussion
about that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Asynchronous execution on FDW

От
Stephen Frost
Дата:
* Robert Haas (robertmhaas@gmail.com) wrote:
> On Mon, Aug 10, 2015 at 3:23 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > I've marked this as rejected in the commitfest, because others are
> > working on a more general solution with parallel workers. That's still
> > work-in-progress, and it's not certain if it's going to make it into
> > 9.6, but if it does it will largely render this obsolete. We can revisit
> > this patch later in the release cycle, if the parallel scan patch hasn't
> > solved the same use case by then.
>
> I think the really important issue for this patch is the one discussed here:
>
> http://www.postgresql.org/message-id/CA+TgmoaiJK1svzw_GkFU+zsSxciJKFELqu2AOMVUPhpSFw4BsQ@mail.gmail.com

I agree that it'd be great to figure out the answer to #2, but I'm also
of the opinion that we can either let the user tell us through the use
of the GUCs proposed in the patch or simply not worry about the
potential for time wastage associated with starting them all at once, as
you suggested there.

> You raised an important issue there but never really expressed an
> opinion on the points I raised, here or on the other thread.  And
> neither did anyone else except the patch author who, perhaps
> unsurprisingly, thinks it's OK.  I wish we could get more discussion
> about that.

When I read the proposal, I had the same reaction that it didn't seem
like quite the right place and it further bothered me that it was
specific to FDWs.

Perhaps not surprisingly, as I authored it, but I'm still a fan of my
proposal #1 here:

http://www.postgresql.org/message-id/20131104032604.GB2706@tamriel.snowman.net

More generally, I completely agree with the position (I believe your's,
but I might be misremembering) that we want to have this async
capability independently and in addition to parallel scan.  I don't
believe one obviates the advantages of the other.
Thanks!
    Stephen