Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

Поиск
Список
Период
Сортировка
От Ashutosh Bapat
Тема Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs
Дата
Msg-id CAFjFpRcTr-A6x_XwQmv+DWEkVs_Ti_v6P3XLLOKCEOZOQmc3ag@mail.gmail.com
обсуждение исходный текст
Ответ на Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs  (Atri Sharma <atri.jiit@gmail.com>)
Ответы Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs  (Atri Sharma <atri.jiit@gmail.com>)
Список pgsql-hackers


On Mon, Jan 5, 2015 at 8:42 PM, Atri Sharma <atri.jiit@gmail.com> wrote:

Hi All,

Please forgive if this is a repost.

Please find attached patch for supporting ORDER BY clause in CREATE FUNCTION for SRFs. Specifically:

CREATE OR REPLACE FUNCTION func1(OUT e int, OUT f int) returns setof record as ' SELECT a,b FROM table1 ORDER BY a; ' language 'sql' ORDER BY e;

This shall allow for capturing information about existing preorder that might be present inherently in the SRF's input or algorithm (the above example and think generate_series).

This allows for eliminating sorts that can be based off the known existing preorder. For eg:

SELECT * FROM correct_order_singlecol() ORDER BY e; # Does not need to sort by e since          existing preorder is known.

Eliminating such sorts can be a huge gain, especially if the expected input to needed Sort node is large.

The obvious question that comes is what happens if specified ORDER BY clause is false. For checking the order, a new node is added which is top node of the plan and is responsible for projecting result rows. It tracks the previous row seen and given a sort order, ensures that the current tuple to be projected is in the required sort order.

So, for above example

EXPLAIN (COSTS OFF) SELECT * FROM correct_order_multicol() ORDER BY e;
                  QUERY PLAN                  
-----------------------------------------------
 OrderCheck
   ->  Function Scan on correct_order_multicol
(2 rows)


If order of result rows is not the same as required, an error is raised:

SELECT * FROM incorrect_order_nulls() ORDER BY e NULLS LAST;
ERROR:  Order not same as specified



Preorder columns are first transformed into SortGroupClauses first and then stored directly in pg_proc.


This functionality is a user case seen functionality, and is especially useful when SRF inputs are large and/or might be pipelined from another function (SRFs are used in pipelines in analytical systems many times, with large data).

The overhead of this patch is small. A new path is added for the preorder keys, and OrderCheck node's additional cost is pretty low, given that it only compares two rows and stores only a single row (previous row seen), hence the memory footprint is minuscule.


We can eliminate the new node and put onus or having the right order on the user like we do with volatile setting of the function.
 
In the inner joins thread, Tom mentioned having a new node which has multiple plans and executor can decide which plan to execute given runtime conditions. I played around with the idea, and am open to experiment having a new node which has a Sort based plan and is executed in case OrderCheck node sees that the inherent order of result tuples is not correct. Feedback here would be very welcome.


I will add the patch to current commitfest.

Thoughts?

Regards,

Atri


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers




--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Langote
Дата:
Сообщение: Re: On partitioning
Следующее
От: Atri Sharma
Дата:
Сообщение: Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs