PassDownLimitBound for ForeignScan/CustomScan [take-2]

Поиск
Список
Период
Сортировка
От Jeevan Chalke
Тема PassDownLimitBound for ForeignScan/CustomScan [take-2]
Дата
Msg-id CAM2+6=VBkoPE54U_jGr8iSuR7Xe+BnwesiUK1tye3hckcy_AyQ@mail.gmail.com
обсуждение исходный текст
Ответы Re: PassDownLimitBound for ForeignScan/CustomScan [take-2]
Список pgsql-hackers


On Mon, Nov 21, 2016 at 1:59 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
Hello,

The attached patch is a revised version of pass-down LIMIT to FDW/CSP.

Below is the updates from the last version.

'ps_numTuples' of PlanState was declared as uint64, instead of long
to avoid problems on 32bits machine when a large LIMIT clause is
supplied.

'ps_numTuples' is re-interpreted; 0 means that its upper node wants
to fetch all the tuples. It allows to eliminate a boring initialization
on ExecInit handler for each executor node.

Even though it was not suggested, estimate_path_cost_size() of postgres_fdw
adjusts number of rows if foreign-path is located on top-level of
the base-relations and LIMIT clause takes a constant value.
It will make more adequate plan as follows:

* WITHOUT this patch
--------------------
postgres=# explain verbose select * from t_a, t_b where t_a.id = t_b.id and t_a.x < t_b.x LIMIT 100;
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Limit  (cost=261.17..274.43 rows=100 width=88)
   Output: t_a.id, t_a.x, t_a.y, t_b.id, t_b.x, t_b.y
   ->  Hash Join  (cost=261.17..581.50 rows=2416 width=88)
         Output: t_a.id, t_a.x, t_a.y, t_b.id, t_b.x, t_b.y
         Hash Cond: (t_a.id = t_b.id)
         Join Filter: (t_a.x < t_b.x)
         ->  Foreign Scan on public.t_a  (cost=100.00..146.12 rows=1204 width=44)
               Output: t_a.id, t_a.x, t_a.y
               Remote SQL: SELECT id, x, y FROM public.t
         ->  Hash  (cost=146.12..146.12 rows=1204 width=44)
               Output: t_b.id, t_b.x, t_b.y
               ->  Foreign Scan on public.t_b  (cost=100.00..146.12 rows=1204 width=44)
                     Output: t_b.id, t_b.x, t_b.y
                     Remote SQL: SELECT id, x, y FROM public.t
(14 rows)

* WITH this patch
-----------------
postgres=# explain verbose select * from t_a, t_b where t_a.id = t_b.id and t_a.x < t_b.x LIMIT 100;
                                                                      QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Limit  (cost=100.00..146.58 rows=100 width=88)
   Output: t_a.id, t_a.x, t_a.y, t_b.id, t_b.x, t_b.y
   ->  Foreign Scan  (cost=100.00..146.58 rows=100 width=88)
         Output: t_a.id, t_a.x, t_a.y, t_b.id, t_b.x, t_b.y
         Relations: (public.t_a) INNER JOIN (public.t_b)
         Remote SQL: SELECT r1.id, r1.x, r1.y, r2.id, r2.x, r2.y FROM (public.t r1 INNER JOIN public.t r2 ON (((r1.x < r2.x)) AND ((r1.id = r2.id))))
(6 rows)


That's nice.
 
On the other hands, I noticed it is not safe to attach LIMIT clause at
the planner stage because root->limit_tuples is declared as double.
Even if LIMIT clause takes a constant value, it is potentially larger
than 2^53 which is the limitation we can represent accurately with
float64 data type but LIMIT clause allows up to 2^63-1.
So, postgres_fdw now attaches LIMIT clause on the remote query on
execution time only.

I think, it's OK.

Here are few comments on latest patch:

1.
make/make check is fine, however I am getting regression failure in
postgres_fdw contrib module (attached regression.diff).
Please investigate and fix.

2.
+             *
+             * MEMO: root->limit_tuples is not attached when query contains
+             * grouping-clause or aggregate functions. So, we don's adjust
+             * rows even if LIMIT <const> is supplied.


Can you please explain why you are not doing this for grouping-clause or
aggregate functions.

3.
Typo:

don's  => don't

Rest of the changes look good to me.

Thanks


Thanks,
----
PG-Strom Project / NEC OSS Promotion Center
KaiGai Kohei <kaigai@ak.jp.nec.com>


> -----Original Message-----
> From: Robert Haas [mailto:robertmhaas@gmail.com]
> Sent: Thursday, November 10, 2016 3:08 AM
> To: Kaigai Kouhei(海外 浩平) <kaigai@ak.jp.nec.com>
> Cc: pgsql-hackers@postgresql.org; Jeevan Chalke
> <jeevan.chalke@enterprisedb.com>; Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp>; Andres Freund <andres@anarazel.de>
> Subject: ##freemail## Re: PassDownLimitBound for ForeignScan/CustomScan
> [take-2]
>
> On Mon, Oct 31, 2016 at 10:20 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com>
> wrote:
> > As an example, I enhanced postgres_fdw to understand the ps_numTuples
> > if it is set. If and when remote ORDER BY is pushed down, the latest
> > code tries to sort the entire remote table because it does not know
> > how many rows to be returned. Thus, it took larger execution time.
> > On the other hands, the patched one runs the remote query with LIMIT
> > clause according to the ps_numTuples; which is informed by the Limit
> > node on top of the ForeignScan node.
>
> So there are two cases here.  If the user says LIMIT 12, we could in theory
> know that at planner time and optimize accordingly.  If the user says LIMIT
> twelve(), however, we will need to wait until execution time unless twelve()
> happens to be capable of being simplified to a constant by the planner.
>
> Therefore, it's possible to imagine having two mechanisms here. In the
> simple case where the LIMIT and OFFSET values are constants, we could
> implement a system to get hold of that information during planning and
> use it for whatever we like.   In addition, we can have an
> execution-time system that optimizes based on values available at execution
> (regardless of whether those values were also available during planning).
> Those are, basically, two separate things, and this patch has enough to
> do just focusing on one of them.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL
> Company



--
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Etsuro Fujita
Дата:
Сообщение: Re: Push down more full joins in postgres_fdw
Следующее
От: David Rowley
Дата:
Сообщение: Functions Immutable but not parallel safe?