Re: PassDownLimitBound for ForeignScan/CustomScan [take-2]

Поиск
Список
Период
Сортировка
От Kouhei Kaigai
Тема Re: PassDownLimitBound for ForeignScan/CustomScan [take-2]
Дата
Msg-id 9A28C8860F777E439AA12E8AEA7694F80125F306@BPXM15GP.gisp.nec.co.jp
обсуждение исходный текст
Ответ на PassDownLimitBound for ForeignScan/CustomScan [take-2]  (Kouhei Kaigai <kaigai@ak.jp.nec.com>)
Список pgsql-hackers
Hello,

The attached patch is a revised version of pass-down LIMIT to FDW/CSP.

Below is the updates from the last version.

'ps_numTuples' of PlanState was declared as uint64, instead of long
to avoid problems on 32bits machine when a large LIMIT clause is
supplied.

'ps_numTuples' is re-interpreted; 0 means that its upper node wants
to fetch all the tuples. It allows to eliminate a boring initialization
on ExecInit handler for each executor node.

Even though it was not suggested, estimate_path_cost_size() of postgres_fdw
adjusts number of rows if foreign-path is located on top-level of
the base-relations and LIMIT clause takes a constant value.
It will make more adequate plan as follows:

* WITHOUT this patch
--------------------
postgres=# explain verbose select * from t_a, t_b where t_a.id = t_b.id and t_a.x < t_b.x LIMIT 100;
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Limit  (cost=261.17..274.43 rows=100 width=88)
   Output: t_a.id, t_a.x, t_a.y, t_b.id, t_b.x, t_b.y
   ->  Hash Join  (cost=261.17..581.50 rows=2416 width=88)
         Output: t_a.id, t_a.x, t_a.y, t_b.id, t_b.x, t_b.y
         Hash Cond: (t_a.id = t_b.id)
         Join Filter: (t_a.x < t_b.x)
         ->  Foreign Scan on public.t_a  (cost=100.00..146.12 rows=1204 width=44)
               Output: t_a.id, t_a.x, t_a.y
               Remote SQL: SELECT id, x, y FROM public.t
         ->  Hash  (cost=146.12..146.12 rows=1204 width=44)
               Output: t_b.id, t_b.x, t_b.y
               ->  Foreign Scan on public.t_b  (cost=100.00..146.12 rows=1204 width=44)
                     Output: t_b.id, t_b.x, t_b.y
                     Remote SQL: SELECT id, x, y FROM public.t
(14 rows)

* WITH this patch
-----------------
postgres=# explain verbose select * from t_a, t_b where t_a.id = t_b.id and t_a.x < t_b.x LIMIT 100;
                                                                      QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Limit  (cost=100.00..146.58 rows=100 width=88)
   Output: t_a.id, t_a.x, t_a.y, t_b.id, t_b.x, t_b.y
   ->  Foreign Scan  (cost=100.00..146.58 rows=100 width=88)
         Output: t_a.id, t_a.x, t_a.y, t_b.id, t_b.x, t_b.y
         Relations: (public.t_a) INNER JOIN (public.t_b)
         Remote SQL: SELECT r1.id, r1.x, r1.y, r2.id, r2.x, r2.y FROM (public.t r1 INNER JOIN public.t r2 ON (((r1.x <
r2.x))AND ((r1.id = r2.id))))
 
(6 rows)


On the other hands, I noticed it is not safe to attach LIMIT clause at
the planner stage because root->limit_tuples is declared as double.
Even if LIMIT clause takes a constant value, it is potentially larger
than 2^53 which is the limitation we can represent accurately with
float64 data type but LIMIT clause allows up to 2^63-1.
So, postgres_fdw now attaches LIMIT clause on the remote query on
execution time only.

Thanks,
----
PG-Strom Project / NEC OSS Promotion Center
KaiGai Kohei <kaigai@ak.jp.nec.com>


> -----Original Message-----
> From: Robert Haas [mailto:robertmhaas@gmail.com]
> Sent: Thursday, November 10, 2016 3:08 AM
> To: Kaigai Kouhei(海外 浩平) <kaigai@ak.jp.nec.com>
> Cc: pgsql-hackers@postgresql.org; Jeevan Chalke
> <jeevan.chalke@enterprisedb.com>; Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp>; Andres Freund <andres@anarazel.de>
> Subject: ##freemail## Re: PassDownLimitBound for ForeignScan/CustomScan
> [take-2]
> 
> On Mon, Oct 31, 2016 at 10:20 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com>
> wrote:
> > As an example, I enhanced postgres_fdw to understand the ps_numTuples
> > if it is set. If and when remote ORDER BY is pushed down, the latest
> > code tries to sort the entire remote table because it does not know
> > how many rows to be returned. Thus, it took larger execution time.
> > On the other hands, the patched one runs the remote query with LIMIT
> > clause according to the ps_numTuples; which is informed by the Limit
> > node on top of the ForeignScan node.
> 
> So there are two cases here.  If the user says LIMIT 12, we could in theory
> know that at planner time and optimize accordingly.  If the user says LIMIT
> twelve(), however, we will need to wait until execution time unless twelve()
> happens to be capable of being simplified to a constant by the planner.
> 
> Therefore, it's possible to imagine having two mechanisms here. In the
> simple case where the LIMIT and OFFSET values are constants, we could
> implement a system to get hold of that information during planning and
> use it for whatever we like.   In addition, we can have an
> execution-time system that optimizes based on values available at execution
> (regardless of whether those values were also available during planning).
> Those are, basically, two separate things, and this patch has enough to
> do just focusing on one of them.
> 
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL
> Company

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Craig Ringer
Дата:
Сообщение: Re: Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from walsender?
Следующее
От: Craig Ringer
Дата:
Сообщение: Re: Re: BUG #13755: pgwin32_is_service not checking if SECURITY_SERVICE_SID is disabled