Обсуждение: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

Поиск
Список
Период
Сортировка

Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

От
Atri Sharma
Дата:

Hi All,

Please forgive if this is a repost.

Please find attached patch for supporting ORDER BY clause in CREATE FUNCTION for SRFs. Specifically:

CREATE OR REPLACE FUNCTION func1(OUT e int, OUT f int) returns setof record as ' SELECT a,b FROM table1 ORDER BY a; ' language 'sql' ORDER BY e;

This shall allow for capturing information about existing preorder that might be present inherently in the SRF's input or algorithm (the above example and think generate_series).

This allows for eliminating sorts that can be based off the known existing preorder. For eg:

SELECT * FROM correct_order_singlecol() ORDER BY e; # Does not need to sort by e since          existing preorder is known.

Eliminating such sorts can be a huge gain, especially if the expected input to needed Sort node is large.

The obvious question that comes is what happens if specified ORDER BY clause is false. For checking the order, a new node is added which is top node of the plan and is responsible for projecting result rows. It tracks the previous row seen and given a sort order, ensures that the current tuple to be projected is in the required sort order.

So, for above example

EXPLAIN (COSTS OFF) SELECT * FROM correct_order_multicol() ORDER BY e;
                  QUERY PLAN                  
-----------------------------------------------
 OrderCheck
   ->  Function Scan on correct_order_multicol
(2 rows)


If order of result rows is not the same as required, an error is raised:

SELECT * FROM incorrect_order_nulls() ORDER BY e NULLS LAST;
ERROR:  Order not same as specified



Preorder columns are first transformed into SortGroupClauses first and then stored directly in pg_proc.


This functionality is a user case seen functionality, and is especially useful when SRF inputs are large and/or might be pipelined from another function (SRFs are used in pipelines in analytical systems many times, with large data).

The overhead of this patch is small. A new path is added for the preorder keys, and OrderCheck node's additional cost is pretty low, given that it only compares two rows and stores only a single row (previous row seen), hence the memory footprint is minuscule.

In the inner joins thread, Tom mentioned having a new node which has multiple plans and executor can decide which plan to execute given runtime conditions. I played around with the idea, and am open to experiment having a new node which has a Sort based plan and is executed in case OrderCheck node sees that the inherent order of result tuples is not correct. Feedback here would be very welcome.


I will add the patch to current commitfest.

Thoughts?

Regards,

Atri
Вложения

Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

От
David G Johnston
Дата:
Atri Sharma wrote
> If order of result rows is not the same as required, an error is raised:
> 
> SELECT * FROM incorrect_order_nulls() ORDER BY e NULLS LAST;
> ERROR:  Order not same as specified

First reaction for the error was unfavorable but (see below) it likely is
the best option and does adequately cover the reason for failure -
programmer error.  It is not data specific (other than by accident) so any
runtime attempt to correct an error is going to be wasted effort and
overhead.


> In the inner joins thread, Tom mentioned having a new node which has
> multiple plans and executor can decide which plan to execute given runtime
> conditions. I played around with the idea, and am open to experiment
> having
> a new node which has a Sort based plan and is executed in case OrderCheck
> node sees that the inherent order of result tuples is not correct.
> Feedback
> here would be very welcome.

Could SQL functions be explored such that if the planner sees an order by it
omits the post-function sort node otherwise it adds an explicit one?

How expensive is sorting an already sorted result?

Runtime conditional sorting seems worse case, depending on implementation,
since a large result with an error in the last bit will end up nearly double
processing.  I'd rather deem unsorted output programmer error and raise the
error so it is likely to be discovered during testing and not have any
meaningful runtime overhead.

David J.




--
View this message in context:
http://postgresql.nabble.com/Patch-to-add-functionality-to-specify-ORDER-BY-in-CREATE-FUNCTION-for-SRFs-tp5832876p5832885.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

От
Tom Lane
Дата:
David G Johnston <david.g.johnston@gmail.com> writes:
> Atri Sharma wrote
>> If order of result rows is not the same as required, an error is raised:
>> 
>> SELECT * FROM incorrect_order_nulls() ORDER BY e NULLS LAST;
>> ERROR:  Order not same as specified

> First reaction for the error was unfavorable but (see below) it likely is
> the best option and does adequately cover the reason for failure -
> programmer error.

TBH, my first reaction to this entire patch is unfavorable: it's a
solution in search of a problem.  It adds substantial complication not
only for users but for PG developers in order to solve a rather narrow
performance issue.

What would make sense to me is to teach the planner about inlining
SQL functions that include ORDER BY clauses, so that the performance
issue of a double sort could be avoided entirely transparently to
the user.
        regards, tom lane



Re: Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

От
Atri Sharma
Дата:


On Mon, Jan 5, 2015 at 11:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David G Johnston <david.g.johnston@gmail.com> writes:
> Atri Sharma wrote
>> If order of result rows is not the same as required, an error is raised:
>>
>> SELECT * FROM incorrect_order_nulls() ORDER BY e NULLS LAST;
>> ERROR:  Order not same as specified

> First reaction for the error was unfavorable but (see below) it likely is
> the best option and does adequately cover the reason for failure -
> programmer error.

TBH, my first reaction to this entire patch is unfavorable: it's a
solution in search of a problem.  It adds substantial complication not
only for users but for PG developers in order to solve a rather narrow
performance issue.

I could agree about the scope of the performance issue, but am not sure about the added complication. It essentially is similar to, say, a combination of how Unique is implemented with a flavour or ORDINALITY implementation. A new path that is added in a certain number of cases plus a low overhead node does not seem too bad to me IMO. This is inline with a lot of real world cases I have seen, where the data is *bubbled* up to SRFs, which does give a possibility of an existing order. Couple it with the power to specify ORDER BY in your SRF function and you could save a lot.

I am not sure how it complicates for hackers.  Could you please elaborate a bit?


What would make sense to me is to teach the planner about inlining
SQL functions that include ORDER BY clauses, so that the performance
issue of a double sort could be avoided entirely transparently to
the user.

It sounds good, but inlining in current way shall restrict the scope of optimization (which is not applicable for current design). For eg, you cannot inline RECORD returning SRFs...


--
Regards,
 
Atri
l'apprenant

Re: Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

От
Jim Nasby
Дата:
On 1/5/15, 12:20 PM, Atri Sharma wrote:
>     What would make sense to me is to teach the planner about inlining
>     SQL functions that include ORDER BY clauses, so that the performance
>     issue of a double sort could be avoided entirely transparently to
>     the user.
>
>
> It sounds good, but inlining in current way shall restrict the scope of optimization (which is not applicable for
currentdesign). For eg, you cannot inline RECORD returning SRFs...
 

Related... I'd like to see a way to inline a function that does something like:

CREATE FUNCTION foo(text) RETURNS int LANGUAGE sql AS $$
SELECT a FROM b WHERE lower(b.c) = lower($1)
$$

and have the performance be comparable to

SELECT ..., (SELECT a FROM b WHERE lower(b.c) = lower(something)) AS foo

I realize that there's a whole question about the function not being an SRF, but the thing is this works great when
manuallyinlined and is fast. The SQL function is significantly slower.
 
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

От
Tom Lane
Дата:
Jim Nasby <Jim.Nasby@bluetreble.com> writes:
> Related... I'd like to see a way to inline a function that does something like:

> CREATE FUNCTION foo(text) RETURNS int LANGUAGE sql AS $$
> SELECT a FROM b WHERE lower(b.c) = lower($1)
> $$

The reason that's not inlined ATM is that the semantics wouldn't be the
same (ie, what happens if the SELECT returns more than one row).  It's
possible they would be the same if we attached a LIMIT 1 to the function's
query, but I'm not 100% sure about that offhand.  I'm also not real sure
that you'd still get good performance if there were an inserted LIMIT;
that would disable at least some optimizations.
        regards, tom lane



Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

От
Ashutosh Bapat
Дата:


On Mon, Jan 5, 2015 at 8:42 PM, Atri Sharma <atri.jiit@gmail.com> wrote:

Hi All,

Please forgive if this is a repost.

Please find attached patch for supporting ORDER BY clause in CREATE FUNCTION for SRFs. Specifically:

CREATE OR REPLACE FUNCTION func1(OUT e int, OUT f int) returns setof record as ' SELECT a,b FROM table1 ORDER BY a; ' language 'sql' ORDER BY e;

This shall allow for capturing information about existing preorder that might be present inherently in the SRF's input or algorithm (the above example and think generate_series).

This allows for eliminating sorts that can be based off the known existing preorder. For eg:

SELECT * FROM correct_order_singlecol() ORDER BY e; # Does not need to sort by e since          existing preorder is known.

Eliminating such sorts can be a huge gain, especially if the expected input to needed Sort node is large.

The obvious question that comes is what happens if specified ORDER BY clause is false. For checking the order, a new node is added which is top node of the plan and is responsible for projecting result rows. It tracks the previous row seen and given a sort order, ensures that the current tuple to be projected is in the required sort order.

So, for above example

EXPLAIN (COSTS OFF) SELECT * FROM correct_order_multicol() ORDER BY e;
                  QUERY PLAN                  
-----------------------------------------------
 OrderCheck
   ->  Function Scan on correct_order_multicol
(2 rows)


If order of result rows is not the same as required, an error is raised:

SELECT * FROM incorrect_order_nulls() ORDER BY e NULLS LAST;
ERROR:  Order not same as specified



Preorder columns are first transformed into SortGroupClauses first and then stored directly in pg_proc.


This functionality is a user case seen functionality, and is especially useful when SRF inputs are large and/or might be pipelined from another function (SRFs are used in pipelines in analytical systems many times, with large data).

The overhead of this patch is small. A new path is added for the preorder keys, and OrderCheck node's additional cost is pretty low, given that it only compares two rows and stores only a single row (previous row seen), hence the memory footprint is minuscule.


We can eliminate the new node and put onus or having the right order on the user like we do with volatile setting of the function.
 
In the inner joins thread, Tom mentioned having a new node which has multiple plans and executor can decide which plan to execute given runtime conditions. I played around with the idea, and am open to experiment having a new node which has a Sort based plan and is executed in case OrderCheck node sees that the inherent order of result tuples is not correct. Feedback here would be very welcome.


I will add the patch to current commitfest.

Thoughts?

Regards,

Atri


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers




--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

От
Atri Sharma
Дата:


The overhead of this patch is small. A new path is added for the preorder keys, and OrderCheck node's additional cost is pretty low, given that it only compares two rows and stores only a single row (previous row seen), hence the memory footprint is minuscule.


We can eliminate the new node and put onus or having the right order on the user like we do with volatile setting of the function.
 

That is exactly what the new node does, since we are not re sorting right now in case the order is wrong. Please see my explanation upthread, OrderCheck node's primary purpose is to check for a user error in the result rows order. The onus right now to give correct order is on user.

Regards,

Atri 
--
Regards,
 
Atri
l'apprenant

Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

От
Ashutosh Bapat
Дата:


On Tue, Jan 6, 2015 at 12:23 PM, Atri Sharma <atri.jiit@gmail.com> wrote:


The overhead of this patch is small. A new path is added for the preorder keys, and OrderCheck node's additional cost is pretty low, given that it only compares two rows and stores only a single row (previous row seen), hence the memory footprint is minuscule.


We can eliminate the new node and put onus or having the right order on the user like we do with volatile setting of the function.
 

That is exactly what the new node does, since we are not re sorting right now in case the order is wrong. Please see my explanation upthread, OrderCheck node's primary purpose is to check for a user error in the result rows order. The onus right now to give correct order is on user.


Even checking whether the output of the function is in the right order or not, has its cost. I am suggesting that we can eliminate this cost as well. For example, PostgreSQL does not check whether a function is really immutable or not.
 
Regards,

Atri 
--
Regards,
 
Atri
l'apprenant



--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

От
Amit Langote
Дата:
On 06-01-2015 PM 04:00, Ashutosh Bapat wrote:
> On Tue, Jan 6, 2015 at 12:23 PM, Atri Sharma <atri.jiit@gmail.com> wrote:
>>> We can eliminate the new node and put onus or having the right order on
>>> the user like we do with volatile setting of the function.
>>>
>>>
>>
>> That is exactly what the new node does, since we are not re sorting right
>> now in case the order is wrong. Please see my explanation upthread,
>> OrderCheck node's primary purpose is to check for a user error in the
>> result rows order. The onus right now to give correct order is on user.
>>
>>
> Even checking whether the output of the function is in the right order or
> not, has its cost. I am suggesting that we can eliminate this cost as well.
> For example, PostgreSQL does not check whether a function is really
> immutable or not.
> 

Sounds something like ORDERED BY x implying output is "known" ordered by
x perhaps enough hint for the planner to make necessary plan choices
though I may be wrong.

Amit




Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

От
Atri Sharma
Дата:


On Tue, Jan 6, 2015 at 12:29 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 06-01-2015 PM 04:00, Ashutosh Bapat wrote:
> On Tue, Jan 6, 2015 at 12:23 PM, Atri Sharma <atri.jiit@gmail.com> wrote:
>>> We can eliminate the new node and put onus or having the right order on
>>> the user like we do with volatile setting of the function.
>>>
>>>
>>
>> That is exactly what the new node does, since we are not re sorting right
>> now in case the order is wrong. Please see my explanation upthread,
>> OrderCheck node's primary purpose is to check for a user error in the
>> result rows order. The onus right now to give correct order is on user.
>>
>>
> Even checking whether the output of the function is in the right order or
> not, has its cost. I am suggesting that we can eliminate this cost as well.
> For example, PostgreSQL does not check whether a function is really
> immutable or not.
>

Sounds something like ORDERED BY x implying output is "known" ordered by
x perhaps enough hint for the planner to make necessary plan choices
though I may be wrong.



I may be missing something, but isnt what you mentioned the exact functionality this patch adds? 

Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

От
Atri Sharma
Дата:


On Tue, Jan 6, 2015 at 12:30 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:


On Tue, Jan 6, 2015 at 12:23 PM, Atri Sharma <atri.jiit@gmail.com> wrote:



Even checking whether the output of the function is in the right order or not, has its cost. I am suggesting that we can eliminate this cost as well. For example, PostgreSQL does not check whether a function is really immutable or not.
 

That implies possibly returning a non ordered result set even when the user explicitly specified an ORDER BY clause. If we are depending on an optimization and it did not work out (even if it is a user error), I think we should error out indicating that the order was incorrect rather than returning non ordered rows, which could be disastrous IMO. 

Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

От
Pavan Deolasee
Дата:


On Tue, Jan 6, 2015 at 12:38 PM, Atri Sharma <atri.jiit@gmail.com> wrote:


On Tue, Jan 6, 2015 at 12:29 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:

>

Sounds something like ORDERED BY x implying output is "known" ordered by
x perhaps enough hint for the planner to make necessary plan choices
though I may be wrong.



I may be missing something, but isnt what you mentioned the exact functionality this patch adds? 

May be what Amit has in mind is that the planner can choose the most optimal sorting algorithm using the hint that the dataset is probably already sorted. Actually why not just do that?  Though AFAIK its against our stated policy of not adding any planner hints.

Thanks,
Pavan

--

Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

От
Amit Langote
Дата:
On 06-01-2015 PM 04:08, Atri Sharma wrote:
> On Tue, Jan 6, 2015 at 12:29 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp
>> wrote:
>>> Even checking whether the output of the function is in the right order or
>>> not, has its cost. I am suggesting that we can eliminate this cost as
>> well.
>>> For example, PostgreSQL does not check whether a function is really
>>> immutable or not.
>>>
>>
>> Sounds something like ORDERED BY x implying output is "known" ordered by
>> x perhaps enough hint for the planner to make necessary plan choices
>> though I may be wrong.
>>
>>
>>
> I may be missing something, but isnt what you mentioned the exact
> functionality this patch adds?
> 

I read what Ashutosh says as that a clause like IMMUTABLE does not
entail a node execution. Reading manual for CREATE FUNCTION:

<manual>
IMMUTABLE
STABLE
VOLATILE

These attributes inform the query optimizer about the behavior of the
function.
</manual>

They declare the shape of the kind of output the function produces and
planner simply trusts the declaration meaning it does not add a node to
check if, say, an IMMUTABLE function did not actually modify the
database or that it is returning the same output for a given input.

Though, I have no strong opinion on whether one thing is good or the
other or whether they cover some particular use case all the same.
Perhaps you can say that better.

Thanks,
Amit




Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

От
Atri Sharma
Дата:


On Tue, Jan 6, 2015 at 12:47 PM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:


On Tue, Jan 6, 2015 at 12:38 PM, Atri Sharma <atri.jiit@gmail.com> wrote:


May be what Amit has in mind is that the planner can choose the most optimal sorting algorithm using the hint that the dataset is probably already sorted. Actually why not just do that?  Though AFAIK its against our stated policy of not adding any planner hints.



I am not sure I totally understand your point. If the dataset is known to have exact order configuration as requested, why would we need the sort at all?

I would rather not mess around with the sort algorithm choosing mechanism anyways. IIUC, the standard way to hint about any existing order is to make a path and populate its pathkeys with the corresponding existing order pathkeys (ORDINALITY is the best example here that I can quote).

This is what the patch does. It does not in anyways force the planner or give it any planner hints, just an optional Path to evalutate. The Path has pathkeys specified in ORDER BY clause given at time of creation.


Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

От
Atri Sharma
Дата:


On Tue, Jan 6, 2015 at 12:43 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 06-01-2015 PM 04:08, Atri Sharma wrote:
> On Tue, Jan 6, 2015 at 12:29 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp


I read what Ashutosh says as that a clause like IMMUTABLE does not
entail a node execution. Reading manual for CREATE FUNCTION:

<manual>
IMMUTABLE
STABLE
VOLATILE

These attributes inform the query optimizer about the behavior of the
function.
</manual>

They declare the shape of the kind of output the function produces and
planner simply trusts the declaration meaning it does not add a node to
check if, say, an IMMUTABLE function did not actually modify the
database or that it is returning the same output for a given input.

Though, I have no strong opinion on whether one thing is good or the
other or whether they cover some particular use case all the same.
Perhaps you can say that better.


Personally, I think returning non ordered rows when ORDER BY clause is specifically specified by user is a gross violation of security and could lead to major user application breakdowns, since the application will trust that postgres will return the rows in order since ORDER BY was specified. Of course, what Ashutosh suggested makes the patch much simpler, but I would rather not go down that road. 



Re: Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

От
David G Johnston
Дата:
Even checking whether the output of the function is in the right order or not, has its cost. I am suggesting that we can eliminate this cost as well. For example, PostgreSQL does not check whether a function is really immutable or not.

Best Wishes,
Ashutosh Bapat


 I imagine if it could be done reliably and cheaply the volatility specifier would be checked.  There are a number of situations where not doing so confuses users and we get complaints.  If we allow a constraint to be specified we should try to error if that constraint is violated - otherwise we probably should not allow it to be defined in the first place.

David J.


View this message in context: Re: [HACKERS] Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

От
Amit Langote
Дата:
On 06-01-2015 PM 04:26, Atri Sharma wrote:
> On Tue, Jan 6, 2015 at 12:43 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp
>> wrote:
>> Though, I have no strong opinion on whether one thing is good or the
>> other or whether they cover some particular use case all the same.
>> Perhaps you can say that better.
>>
>>
> Personally, I think returning non ordered rows when ORDER BY clause is
> specifically specified by user is a gross violation of security and could
> lead to major user application breakdowns, since the application will trust
> that postgres will return the rows in order since ORDER BY was specified.
> Of course, what Ashutosh suggested makes the patch much simpler, but I
> would rather not go down that road.
> 

I think the same thing applies to IMMUTABLE declarations for example.
Planner trusts (or take as a hint) such declarations during, say,
constraint exclusion where quals involving non-immutable functions are
kept out of the exclusion proof. If a miscreant declares a non-immutable
function IMMUTABLE, then constraint violations may ensue simply because
planner trusted the miscreant. That is, such unsafe restrict clauses
would wrongly prove a partition as being unnecessary to scan. I am sure
there are other sites where such bets are made. In that light, I might
as well call them hints than anything.

<manual>
The volatility category is a *promise* to the optimizer about the
behavior of the function
</manual>

Though, as I said ordering behavior *may not be* a good candidate to
make such promises.

On the other hand, what such a thing might help with, are the situations
where a developer is frustrated because planner would ignore (or is
uninformed about) the order that the developer *knows* his function
produces.

But, if the node you propose to enforce the order is good enough, then
it may be worthwhile to go that route, :)

Thanks,
Amit




Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

От
Atri Sharma
Дата:


On Tuesday, January 6, 2015, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 06-01-2015 PM 04:26, Atri Sharma wrote:
> On Tue, Jan 6, 2015 at 12:43 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp
>> wrote:
>> Though, I have no strong opinion on whether one thing is good or the
>> other or whether they cover some particular use case all the same.
>> Perhaps you can say that better.
>>
>>
> Personally, I think returning non ordered rows when ORDER BY clause is
> specifically specified by user is a gross violation of security and could
> lead to major user application breakdowns, since the application will trust
> that postgres will return the rows in order since ORDER BY was specified.
> Of course, what Ashutosh suggested makes the patch much simpler, but I
> would rather not go down that road.
>

I think the same thing applies to IMMUTABLE declarations for example.
Planner trusts (or take as a hint) such declarations during, say,
constraint exclusion where quals involving non-immutable functions are
kept out of the exclusion proof. If a miscreant declares a non-immutable
function IMMUTABLE, then constraint violations may ensue simply because
planner trusted the miscreant. That is, such unsafe restrict clauses
would wrongly prove a partition as being unnecessary to scan. I am sure
there are other sites where such bets are made. In that light, I might
as well call them hints than anything.

<manual>
The volatility category is a *promise* to the optimizer about the
behavior of the function
</manual>

Though, as I said ordering behavior *may not be* a good candidate to
make such promises.

On the other hand, what such a thing might help with, are the situations
where a developer is frustrated because planner would ignore (or is
uninformed about) the order that the developer *knows* his function
produces.

But, if the node you propose to enforce the order is good enough, then
it may be worthwhile to go that route, :)



The purpose of the patch is to give the planner an option to use the preorder that the developer knows will be produced. However, since ensuring against developer induced errors in this case is relatively cheap, I think the new node is worth it. 


--
Regards,
 
Atri
l'apprenant

Re: Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

От
Alvaro Herrera
Дата:
Tom Lane wrote:

> What would make sense to me is to teach the planner about inlining
> SQL functions that include ORDER BY clauses, so that the performance
> issue of a double sort could be avoided entirely transparently to
> the user.

Wouldn't this be applicable to functions in other languages too, not
only SQL?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

От
Jim Nasby
Дата:
On 1/5/15, 3:14 PM, Tom Lane wrote:
> Jim Nasby <Jim.Nasby@bluetreble.com> writes:
>> Related... I'd like to see a way to inline a function that does something like:
>
>> CREATE FUNCTION foo(text) RETURNS int LANGUAGE sql AS $$
>> SELECT a FROM b WHERE lower(b.c) = lower($1)
>> $$
>
> The reason that's not inlined ATM is that the semantics wouldn't be the
> same (ie, what happens if the SELECT returns more than one row).  It's
> possible they would be the same if we attached a LIMIT 1 to the function's
> query, but I'm not 100% sure about that offhand.  I'm also not real sure
> that you'd still get good performance if there were an inserted LIMIT;
> that would disable at least some optimizations.

In this case there's actually a unique index on lower(b.c). I don't know if the planner is smart enough to recognize
thattoday though.
 

Perhaps a good interim solution would be a flag/option you could set on SQL functions to force (or disallow) inlining?
Thatmeans if the option is set it's on the callers head if it doesn't do what's desired. We should throw an error if
there'ssomething about the function that would prevent inlining though.
 
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

От
Jim Nasby
Дата:
On 1/6/15, 10:32 AM, Alvaro Herrera wrote:
> Tom Lane wrote:
>
>> What would make sense to me is to teach the planner about inlining
>> SQL functions that include ORDER BY clauses, so that the performance
>> issue of a double sort could be avoided entirely transparently to
>> the user.
>
> Wouldn't this be applicable to functions in other languages too, not
> only SQL?

Dumb question... we can inline functions from other languages? What chunk of code handles that?
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

От
Jim Nasby
Дата:
On 1/6/15, 1:00 AM, Ashutosh Bapat wrote:
>
> Even checking whether the output of the function is in the right order or not, has its cost. I am suggesting that we
caneliminate this cost as well. For example, PostgreSQL does not check whether a function is really immutable or not.
 

Actually, it does:

select test();
ERROR:  UPDATE is not allowed in a non-volatile function
CONTEXT:  SQL statement "UPDATE i SET i=i+1"PL/pgSQL function test() line 3 at SQL statement
STATEMENT:  select test();
ERROR:  UPDATE is not allowed in a non-volatile function
CONTEXT:  SQL statement "UPDATE i SET i=i+1"
PL/pgSQL function test() line 3 at SQL statement

-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

От
David Johnston
Дата:
On Tue, Jan 6, 2015 at 4:15 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
On 1/6/15, 10:32 AM, Alvaro Herrera wrote:
Tom Lane wrote:

What would make sense to me is to teach the planner about inlining
SQL functions that include ORDER BY clauses, so that the performance
issue of a double sort could be avoided entirely transparently to
the user.

Wouldn't this be applicable to functions in other languages too, not
only SQL?

Dumb question... we can inline functions from other languages? What chunk of code handles that?

​We cannot that I know of.  The point being made here is that suggesting an alternative that requires inlining ​doesn't cover the entire purpose of this feature since the feature can be applied to functions that cannot be inlined.

David J.

Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

От
Amit Langote
Дата:
On 07-01-2015 AM 08:33, Jim Nasby wrote:
> On 1/6/15, 1:00 AM, Ashutosh Bapat wrote:
>>
>> Even checking whether the output of the function is in the right order
>> or not, has its cost. I am suggesting that we can eliminate this cost
>> as well. For example, PostgreSQL does not check whether a function is
>> really immutable or not.
> 
> Actually, it does:
> 
> select test();
> ERROR:  UPDATE is not allowed in a non-volatile function
> CONTEXT:  SQL statement "UPDATE i SET i=i+1"
>     PL/pgSQL function test() line 3 at SQL statement
> STATEMENT:  select test();
> ERROR:  UPDATE is not allowed in a non-volatile function
> CONTEXT:  SQL statement "UPDATE i SET i=i+1"
> PL/pgSQL function test() line 3 at SQL statement
> 

I think Ashutosh's point is that there is no dedicated executor node to
perform this check. ISTM, the above error is raised during planning
itself as part of the initialization of state for a function.

Thanks,
Amit




Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

От
Ashutosh Bapat
Дата:
Not in all cases

postgres=# create function non_im_immutable_function() returns float as $$
begin
return random();
end;
$$ language plpgsql immutable;
CREATE FUNCTION

postgres=# select proname, provolatile from pg_proc where proname = 'random' or proname = 'non_im_immutable_function';
          proname          | provolatile
---------------------------+-------------
 random                    | v
 non_im_immutable_function | i

postgres=# select non_im_immutable_function();
 non_im_immutable_function
---------------------------
         0.963812265079468
(1 row)
postgres=# select non_im_immutable_function();
 non_im_immutable_function
---------------------------
         0.362834882922471
(1 row)

Per definition of immutable functions, the function's output shouldn't depend upon a volatile function e.g. random().

On Wed, Jan 7, 2015 at 5:03 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
On 1/6/15, 1:00 AM, Ashutosh Bapat wrote:

Even checking whether the output of the function is in the right order or not, has its cost. I am suggesting that we can eliminate this cost as well. For example, PostgreSQL does not check whether a function is really immutable or not.

Actually, it does:

select test();
ERROR:  UPDATE is not allowed in a non-volatile function
CONTEXT:  SQL statement "UPDATE i SET i=i+1"
        PL/pgSQL function test() line 3 at SQL statement
STATEMENT:  select test();
ERROR:  UPDATE is not allowed in a non-volatile function
CONTEXT:  SQL statement "UPDATE i SET i=i+1"
PL/pgSQL function test() line 3 at SQL statement


--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

От
Michael Paquier
Дата:
On Tue, Jan 6, 2015 at 12:12 AM, Atri Sharma <atri.jiit@gmail.com> wrote:
> I will add the patch to current commitfest.
It has been indeed added to the commit fest 2014-12. That's a bit
late, moving it to upcoming one 2015-02.
Thanks,
-- 
Michael



Re: Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

От
Nicolas Barbier
Дата:
2015-01-05 Tom Lane <tgl@sss.pgh.pa.us>:

> What would make sense to me is to teach the planner about inlining
> SQL functions that include ORDER BY clauses, so that the performance
> issue of a double sort could be avoided entirely transparently to
> the user.

Another way of getting to the point where the extra check-node is not
needed in obvious cases, would be:

* Apply the current patch in some form.
* Later, add code that analyzes the query inside the function. If it
turns out that the result of the analysis implies the declared order,
don't add the check-node.

The analysis can in principle also be performed for other languages,
but that would most likely be way more complex for the typical "Turing
complete" languages.

Nicolas

-- 
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?



Re: Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

От
Robert Haas
Дата:
On Mon, Jan 5, 2015 at 12:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> TBH, my first reaction to this entire patch is unfavorable: it's a
> solution in search of a problem.  It adds substantial complication not
> only for users but for PG developers in order to solve a rather narrow
> performance issue.
>
> What would make sense to me is to teach the planner about inlining
> SQL functions that include ORDER BY clauses, so that the performance
> issue of a double sort could be avoided entirely transparently to
> the user.

That's not a bad idea, but it only helps for SQL functions.  Actually,
the problem I have run into in the past was not that the planner
didn't know the ordering of the SRF's return value, but that it had no
statistics for it, and therefore made bad optimization decisions.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

От
Jeff Janes
Дата:
On Mon, Jan 5, 2015 at 7:12 AM, Atri Sharma <atri.jiit@gmail.com> wrote:

Hi All,

Please forgive if this is a repost.

Please find attached patch for supporting ORDER BY clause in CREATE FUNCTION for SRFs.

Hi Atri,

From the discussion, I don't know if this patch is still being proposed.  If so, it needs a rebase.

Thanks,

Jeff