Re: Async execution of postgres_fdw.

Поиск
Список
Период
Сортировка
От Kyotaro HORIGUCHI
Тема Re: Async execution of postgres_fdw.
Дата
Msg-id 20150116.171849.109146500.horiguchi.kyotaro@lab.ntt.co.jp
обсуждение исходный текст
Ответ на Re: Async execution of postgres_fdw.  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Ответы Re: Async execution of postgres_fdw.  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Список pgsql-hackers
I revised the patch so that async scan will be done more
aggressively, and took execution time for two very simple cases.

As the result, simple seq scan gained 5% and hash join of two
foreign tables gained 150%. (2.4 times faster).

While measuring the performance, I noticed that each scan in a
query runs at once rather than alternating with each other in
many cases such as hash join or sorted joins and so. So I
modified the patch so that async fetch is done more
aggressively. The new v4 patch is attached. The following numbers
are taken based on it.

========
Simple seq scan for the first test.

> CREATE TABLE lt1 (a int, b timestamp, c text);
> CREATE SERVER sv1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'localhost');
> CREATE USER MAPPING FOR PUBLIC SERVER sv1;
> CREATE FOREIGN TABLE ft1 () SERVER sv1 OPTIONS (table_name 'lt1');
> INSERT INTO lt1 (SELECT a, now(), repeat('x', 128) FROM generate_series(0, 999999) a);

On this case, I took the the 10 times average of exec time of the
following query for both master head and patched version.  The
fetch size is 100.

> postgres=# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT * FROM ft1;
>                          QUERY PLAN                  
> ------------------------------------------------------------------
>  Foreign Scan on ft1  (actual time=0.79 5..4175.706 rows=1000000 loops=1)
>  Planning time: 0.060 ms
>  Execution time: 4276.043 ms
 master head  : avg = 4256.621,  std dev = 17.099 patched pgfdw: avg = 4036.463,  std dev =  2.608

The patched version is faster by about 5%. This should be pure
result of asynchronous fetching, not including the effect of
early starting of remote execution in ExecInit.

Interestingly, as fetch_count gets larger, the gain raises in
spite of the decrease of the number of query sending.
 master head  : avg = 2622.759,  std dev = 38.379 patched pgfdw: avg = 2277.622,  std dev = 27.269

About 15% gain. And for 10000,
 master head  : avg = 2000.980,  std dev =  6.434 patched pgfdw: avg = 1616.793,  std dev = 13.192

19%.. It is natural that exec time reduces along with increase of
fetch size, but I haven't found the reason why the patch's gain
also increases.

======================

The second case is a simple join of two foreign tables sharing
one connection.

The master head runs this query in about 16 seconds with almost
no fluctuation among multiple tries.

> =# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT x.a, x.c, y.c
>    FROM ft1 AS x JOIN ft1 AS y on x.a = y.a;
>                                       QUERY PLAN
> ----------------------------------------------------------------------------
>  Hash Join (actual time=7541.831..15924.631 rows=1000000 loops=1)
>    Hash Cond: (x.a = y.a)
>   ->  Foreign Scan on ft1 x (actual time=1.176..6553.480 rows=1000000 loops=1)
>   ->  Hash (actual time=7539.761..7539.761 rows=1000000 loops=1)
>        Buckets: 32768  Batches: 64  Memory Usage: 2829kB
>    ->  Foreign Scan on ft1 y (actual time=1.067..6529.165 rows=1000000 loops=1)
>  Planning time: 0.223 ms
>  Execution time: 15973.916 ms

But the v4 patch mysteriously accelerates this query, 6.5 seconds.

> =# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT x.a, x.c, y.c
>    FROM ft1 AS x JOIN ft1 AS y on x.a = y.a;
>                                    QUERY PLAN
> ----------------------------------------------------------------------------
>  Hash Join (actual time=2556.977..5812.937 rows=1000000 loops=1)
>    Hash Cond: (x.a = y.a)
>   ->  Foreign Scan on ft1 x (actual time=32.689..1936.565 rows=1000000 loops=1)
>   ->  Hash (actual time=2523.810..2523.810 rows=1000000 loops=1)
>        Buckets: 32768  Batches: 64  Memory Usage: 2829kB
>    ->  Foreign Scan on ft1 y (actual time=50.345..1928.411 rows=1000000 loops=1)
>  Planning time: 0.220 ms
>  Execution time: 6512.043 ms

The result data seems not broken. I don't know the reason yet but
I'll investigate it.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Oleg Bartunov
Дата:
Сообщение: Re: VODKA?
Следующее
От: Noah Misch
Дата:
Сообщение: hamerkop is stuck