Обсуждение: Passing arguments to views

Поиск
Список
Период
Сортировка

Passing arguments to views

От
Chris Campbell
Дата:
I've written some complicated queries that I'd like to save inside  
the server and then call from my clients using a short name. For the  
queries that require no external information, views are perfect. For  
queries that *do* require external information (like a search date  
range), I've used functions (written in the SQL language, because I'm  
just writing plain SQL queries but using $1, $2 tokens for passed-in  
arguments).

When I use these functions, I'm typically joining the results of the  
function with other tables. Since much of the work is being done  
inside the function, the planner can't use both the inside-function  
and outside-function query information when generating a query plan.  
Instead, it has to do Function Scans (planning and executing the SQL  
inside the function at each execution, I'm assuming) and then  
manipulate the output.

Ideally, I'd like to be able to write queries that contain $n tokens  
that will be substituted at execution time, save them on the server,  
and let the query planner plan the whole query before it's executed.

Basically, writing views with arguments.

For example, a "sales_figures" view that requires start_date and  
end_date parameters could be used like this:
   CREATE VIEW sales_figures($1, $2) AS       SELECT ... FROM ... WHERE purchase_date BETWEEN $1 AND $2;
   SELECT * FROM sales_figures('2005-08-22', '2005-09-14') sf JOIN  
customers c ON (sf.customer_id = c.customer_id)

What do you think? Is this an interesting feature? Is this the right  
way to go about it, or should I try to get the planner to see through  
SQL function boundaries (e.g., enhance the function system instead of  
enhancing the view system)? Would this be a good project for a newbie  
to the code?

I can see that the syntax used above would be problematic: how would  
it distinguish that from a call to a sales_figures() function? Any  
comments about alternative syntax would be welcome, too!

Thanks!

- Chris


Re: Passing arguments to views

От
Greg Stark
Дата:
Chris Campbell <chris@bignerdranch.com> writes:

> What do you think? Is this an interesting feature? Is this the right  way to go
> about it, or should I try to get the planner to see through  SQL function
> boundaries

The "right" way to go about this in the original abstract set-theoretic
mindset of SQL is to code the view to retrieve all the rows and then apply
further WHERE clause restrictions to the results of the view. 


So for example this:

>     CREATE VIEW sales_figures($1, $2) AS
>         SELECT ... FROM ... WHERE purchase_date BETWEEN $1 AND $2;

Becomes:

CREATE VIEW sales_figures AS SELECT ... FROM ...

And then you query it with 

SELECT * FROM sales_figures WHERE purchase_date BETWEEN $1 AND $2

sales_figures could have any number of joins and complex where clauses
built-in. It could even be an aggregate grouped by some column (like
purchase_date).

This relies on the SQL optimizer to push the WHERE clause down into the view
to the appropriate depth. Postgres isn't always capable of doing so but it
does a pretty decent job.

-- 
greg



Re: Passing arguments to views

От
Chris Campbell
Дата:
On Feb 2, 2006, at 23:33, Greg Stark wrote:

> The "right" way to go about this in the original abstract set- 
> theoretic
> mindset of SQL is to code the view to retrieve all the rows and  
> then apply
> further WHERE clause restrictions to the results of the view.
>
> So for example this:
>
>>     CREATE VIEW sales_figures($1, $2) AS
>>         SELECT ... FROM ... WHERE purchase_date BETWEEN $1 AND $2;
>
> Becomes:
>
> CREATE VIEW sales_figures AS SELECT ... FROM ...
>
> And then you query it with
>
> SELECT * FROM sales_figures WHERE purchase_date BETWEEN $1 AND $2

That was a very simplistic example and didn't illustrate my point --  
I apologize. I was trying to think of something succinct and  
illustrative for a quick mailing list post but came up short.

Maybe a better example would be a situation where you want to do  
substitutions in places other than the WHERE clause? There's no way  
to "push" that out to the calling query. But even in this simple  
case, the easier-to-grok syntax of making a view look like a function  
(and codifying the options for restricting the results as arguments  
to the view) is a nice win in terms of readability and maintainability.

I was hoping that people would overlook my bad example because  
they've had the need for a "view with arguments" tool in their own  
work, and the conversation would just be about how it could be  
implemented. :)

I'll try to distill a better example from some of the projects I'm  
working on.

Thanks!

- Chris



Re: Passing arguments to views

От
Tino Wildenhain
Дата:
Chris Campbell schrieb:
...
> That was a very simplistic example and didn't illustrate my point --  I
> apologize. I was trying to think of something succinct and  illustrative
> for a quick mailing list post but came up short.
> 
> Maybe a better example would be a situation where you want to do 
> substitutions in places other than the WHERE clause? There's no way  to
> "push" that out to the calling query. But even in this simple  case, the
> easier-to-grok syntax of making a view look like a function  (and
> codifying the options for restricting the results as arguments  to the
> view) is a nice win in terms of readability and maintainability.

Well if the view does not suit your needs, why dont you use an
set returnung function instead? Inside it you can do all the magic
you want and still use it similar to a table or view.

Regards
Tino


Re: Passing arguments to views

От
Chris Campbell
Дата:
On Feb 3, 2006, at 02:09, Tino Wildenhain wrote:

> Well if the view does not suit your needs, why dont you use an
> set returnung function instead? Inside it you can do all the magic
> you want and still use it similar to a table or view.

That's what I'm currently doing (as explained in the first message in  
the thread). But the function is a "black box" to the planner when  
the query is executed -- I'd like the planner to be able to combine  
the query inside the function with the outer calling query and plan  
it as one big query. Like it does with views. Thus, "views with  
arguments."

We're certainly not deficient in this area (set-returning functions  
fill the need quite well), but a feature like this would go even  
further in terms of ease-of-use and performance.

Benefits of "views with arguments" versus functions:

* Better query execution performance because the planner can plan the  
whole query (rewriting the original query to replace references to  
the view with the view's definition -- this is how views work today)

* PostgreSQL-tracked dependancies: views create dependencies on the  
relations they reference -- functions do not

* Don't have to manually maintain a composite type for the return value

Basically, better performance and easier administration.

Thanks!

- Chris



Re: Passing arguments to views

От
Martijn van Oosterhout
Дата:
On Fri, Feb 03, 2006 at 08:33:23AM -0500, Chris Campbell wrote:
> On Feb 3, 2006, at 02:09, Tino Wildenhain wrote:
>
> >Well if the view does not suit your needs, why dont you use an
> >set returnung function instead? Inside it you can do all the magic
> >you want and still use it similar to a table or view.
>
> That's what I'm currently doing (as explained in the first message in
> the thread). But the function is a "black box" to the planner when
> the query is executed -- I'd like the planner to be able to combine
> the query inside the function with the outer calling query and plan
> it as one big query. Like it does with views. Thus, "views with
> arguments."
>
> We're certainly not deficient in this area (set-returning functions
> fill the need quite well), but a feature like this would go even
> further in terms of ease-of-use and performance.

Hmm, we actually do inline SQL functions under certain situations, but
only for "simple" queries (see inline_function in
optimizer/util/clauses.c). One approach would be to expand that
function to inline more complicated things.

> Benefits of "views with arguments" versus functions:
>
> * Better query execution performance because the planner can plan the
> whole query (rewriting the original query to replace references to
> the view with the view's definition -- this is how views work today)

Well, the inlining would acheive the same effect.

> * PostgreSQL-tracked dependancies: views create dependencies on the
> relations they reference -- functions do not

Technically a bug. We should be tracking dependancies for functions
anyway.

> * Don't have to manually maintain a composite type for the return value

This is a good point. Though with syntactic sugar you could work around
this too...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: Passing arguments to views

От
Chris Campbell
Дата:
On Feb 3, 2006, at 08:50, Martijn van Oosterhout wrote:

> Hmm, we actually do inline SQL functions under certain situations, but
> only for "simple" queries (see inline_function in
> optimizer/util/clauses.c). One approach would be to expand that
> function to inline more complicated things.

>> * Better query execution performance because the planner can plan the
>> whole query (rewriting the original query to replace references to
>> the view with the view's definition -- this is how views work today)
>
> Well, the inlining would acheive the same effect.

So you think approaching it from the "beefing up functions" side  
would be better than the "beefing up views" side?

>> * PostgreSQL-tracked dependancies: views create dependencies on the
>> relations they reference -- functions do not
>
> Technically a bug. We should be tracking dependancies for functions
> anyway.

Well, it's impossible to track dependencies for all functions, since  
they're just strings (or compiled code in shared libraries) until  
they're executed. But maybe SQL language functions could be special- 
cased? Do you think it would be easier to add dependancy-tracking for  
functions, or would it be easier to implement this functionality  
using the more-restrictive-language but better-dependency-tracking  
view system? When you add dependencies for things that didn't have  
dependencies before (like non-SQL functions), you create all sorts of  
backwards-compatibility problems due to the ordering that things need  
to be dumped and created, and circular dependancies.

For example, this works:
    CREATE FUNCTION foo(INTEGER) RETURNS INTEGER AS 'BEGIN RETURN bar 
($1-1); END;' LANGUAGE plpgsql;
    CREATE FUNCTION bar(INTEGER) RETURNS INTEGER AS 'BEGIN IF $1 < 0  
THEN RETURN $1; ELSE RETURN foo($1); END IF; END;' LANGUAGE plpgsql;

But it wouldn't work if PostgreSQL tracked and enforced dependancies.  
But it could probably be done with SQL-language functions only. I  
don't know if we'd want to add dependancy tracking for functions if  
it only works for SQL-language functions, though.

> This is a good point. Though with syntactic sugar you could work  
> around
> this too...

Basically, how views do it? :) By auto-creating a table with the  
proper columns (for a function, that would be an auto-created type).

I'm looking for a function/view hybrid, taking features from each. It  
seems to me that views have most of the features I want (only missing  
the ability to pass arguments), so it's a shorter distance to the  
goal than by starting with functions.

Thanks!

- Chris



Re: Passing arguments to views

От
Martijn van Oosterhout
Дата:
On Fri, Feb 03, 2006 at 09:18:51AM -0500, Chris Campbell wrote:
> So you think approaching it from the "beefing up functions" side
> would be better than the "beefing up views" side?

Well yes, I think you're underestimating the issues with trying to
extend views.

> >Technically a bug. We should be tracking dependancies for functions
> >anyway.
>
> Well, it's impossible to track dependencies for all functions, since
> they're just strings (or compiled code in shared libraries) until

<snip>

Sure, for most procedural languages you can't do much. But we do do
syntax checking already and checking that the necessary functions exist
can be considered part of that. It' s not terribly important though.

> Basically, how views do it? :) By auto-creating a table with the
> proper columns (for a function, that would be an auto-created type).
>
> I'm looking for a function/view hybrid, taking features from each. It
> seems to me that views have most of the features I want (only missing
> the ability to pass arguments), so it's a shorter distance to the
> goal than by starting with functions.

Ok, here's the deal. A view is nothing more than a RULE. Creating a
view does this automatically:

CREATE RULE blah AS ON SELECT TO myview DO INSTEAD <select statement>;

Now, say you wanted to add parameters to this, would you restrict it to
SELECT rules, what about UPDATE or DELETE rules?

UPDATE myview(3,4) SET ...;

The other issue is that currently you can tell from looking at a
statement whether something is a function or a table (is it followed by
an open parenthesis?). With the above change you can't anymore, which
might mean you can't have functions and tables with the same names
because they'd be ambiguous.

On the whole, I think allowing the server to inline SRFs would be a far
better way to go...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: Passing arguments to views

От
Chris Campbell
Дата:
On Feb 3, 2006, at 10:25, Martijn van Oosterhout wrote:

> Sure, for most procedural languages you can't do much. But we do do
> syntax checking already and checking that the necessary functions  
> exist
> can be considered part of that. It' s not terribly important though.

Dependancy tracking needs to be more than a "best effort." If you  
can't do it accurately and completely, then I don't think it's worth  
doing at all.

But I agree with you that syntax checking is probably sufficient. We  
don't need true dependancy tracking for functions.

The only failure case (where syntax checking won't help you) is  
deleting (or renaming, or modifying) a table that a function was  
using. If you were to run or re-define the function, you'd learn  
about the missing (or renamed, or modified) table, whereas the  
dependancy-tracking system would prevent you from making the changes  
to the referenced table in the first place.

> Ok, here's the deal. A view is nothing more than a RULE. Creating a
> view does this automatically

Technically, it's a table and a rule, both of which are created by  
the CREATE VIEW command. We were talking about syntactic sugar, and  
CREATE VIEW is syntactic sugar for doing a CREATE TABLE and a CREATE  
RULE. That was my comparison. I'm aware of how views work. Here's the  
deal: I want to beef up rules versus beefing up functions. Maybe  
that's not the way to go; I'm enjoying this discussion and your  
insights.

> CREATE RULE blah AS ON SELECT TO myview DO INSTEAD <select statement>;
>
> Now, say you wanted to add parameters to this, would you restrict  
> it to
> SELECT rules, what about UPDATE or DELETE rules?

I don't see a huge use case for anything but SELECT rules, but I  
think it could be extended to any rule type. Maybe the CREATE RULE  
syntax could be something like:
    CREATE RULE blah AS ON SELECT(INTEGER, INTEGER, DATE) TO myview  
DO INSTEAD SELECT * FROM sale WHERE sale_date = $3;

> The other issue is that currently you can tell from looking at a
> statement whether something is a function or a table (is it  
> followed by
> an open parenthesis?). With the above change you can't anymore, which
> might mean you can't have functions and tables with the same names
> because they'd be ambiguous.

Right. I said that my example syntax was deficient in this regard in  
the first message in this thread. And I solicited ideas for a better  
(unambiguous) syntax. I'm sure we would be able to come up with  
something. Maybe using square brackets instead of parentheses? Curly  
braces? "myview->(1, 2, 3)" notation? Since views are tables (when  
parsing the query, at least) we'd have to allow this syntax for any  
table reference, but throw an error (or silently discard the  
arguments) if the table didn't have a rule matching the argument types?

> On the whole, I think allowing the server to inline SRFs would be a  
> far
> better way to go...

Maybe, but the highly-structured view definition syntax and  
everything that comes with it (dependancy tracking primarily) is so  
tasty. I think a little grammar hacking and a couple extra columns in  
pg_rewrite (nargs and argtypes) would get us most of the way there.

I would much rather put more stringent requirements on the programmer  
when defining his query (like a view), versus letting him submit any  
old string as a function (like a function). The database can do so  
much more when it's able to work with a better representation of the  
computation.

At the core, I want query rewriting with arguments. That sounds like  
a better fit for views/rules than functions, so that's why I keep  
coming back to it.

Thanks!

- Chris



Re: Passing arguments to views

От
Tom Lane
Дата:
Chris Campbell <chris@bignerdranch.com> writes:
> I want to beef up rules versus beefing up functions.

Martijn didn't present a very convincing argument why this is a bad
idea, but I agree with him that it is.  The problem to me is that a
"view with arguments" is fundamentally wrong --- a view is a virtual
table and there is no such thing as a table with arguments.  The whole
notion distorts the relational data model beyond recognition.

The SRF concept captures what you want a whole lot better.  If the
implementation isn't up to snuff, we should improve it, not warp other
pieces of the system.

Martijn mentioned the idea of inlining SQL functions that return sets
--- this is something I've toyed with too, but not got round to looking
at seriously.  AFAICS it would accomplish everything that you could do
with parameters in ON SELECT rules, considering the existing
restrictions on what can be in an ON SELECT rule.  And it wouldn't
require any new concepts at all, just a few(?) pages of code.

As for the dependency issue, one man's bug is another man's feature.
I think the fact that we don't track the internal dependencies of
functions is not all bad.  We've certainly seen plenty of complaints
about how you can't easily change tables that a view is depending on
because the view dependencies block it...
        regards, tom lane


Re: Passing arguments to views

От
Chris Campbell
Дата:
On Feb 3, 2006, at 11:21, Tom Lane wrote:

> The SRF concept captures what you want a whole lot better.  If the
> implementation isn't up to snuff, we should improve it, not warp other
> pieces of the system.

Point taken. The rewriting concept is what I'm after; if that can be  
done pre-planning with SQL functions, I'm all for it. I just thought  
that since rules already do rewriting, that's the best thing to start  
building on.

> Martijn mentioned the idea of inlining SQL functions that return sets
> --- this is something I've toyed with too, but not got round to  
> looking
> at seriously.  AFAICS it would accomplish everything that you could do
> with parameters in ON SELECT rules, considering the existing
> restrictions on what can be in an ON SELECT rule.  And it wouldn't
> require any new concepts at all, just a few(?) pages of code.

True, as long as there's a hook to do the inlining/rewriting before  
the query's planned. I guess we can see function calls at the parse  
stage, check to see if they're SQL functions or not, grab the prosrc,  
do the substitution, then re-parse?

I guess I can live without the dependancy tracking. I can always dump  
and reload my database to re-parse all the functions. Maybe we could  
have a RELOAD FUNCTION command that would just re-parse an existing  
function, so I don't have to dump and reload?

What about auto-creating a composite type for the function's return  
type based on the query definition? (Like how CREATE VIEW creates an  
appropriate table definition.) Do you see a way for CREATE FUNCTION  
to do that? The problem is that you have to specify a return type in  
CREATE FUNCTION.

Maybe an extension to CREATE FUNCTION as a shorthand for set- 
returning SQL functions? Like:
   CREATE SQL FUNCTION sales_figures(DATE) AS SELECT ... FROM ...  
WHERE sale_date <= $1;

It would (1) automatically create a composite type (newtype) for the  
return value, and (2) do a
   CREATE FUNCTION sales_figures(DATE) RETURNS SETOF newtype AS  
'...' LANGUAGE sql.

How much do I have to justify a patch for non-standard "RELOAD  
FUNCTION" and "CREATE SQL FUNCTION" commands (as described) in the  
grammar? :)

Thanks!

- Chris



Re: Passing arguments to views

От
Tom Lane
Дата:
Chris Campbell <chris@bignerdranch.com> writes:
> True, as long as there's a hook to do the inlining/rewriting before  
> the query's planned. I guess we can see function calls at the parse  
> stage, check to see if they're SQL functions or not, grab the prosrc,  
> do the substitution, then re-parse?

pull_up_subqueries in prepjointree.c would be the appropriate place
I think: if it's an RTE_FUNCTION RTE, look to see if function is SQL
and has the other needed properties, if so replace it by an RTE_SUBQUERY
RTE with the correct subquery, then recurse to try to flatten the
subquery.  (Note: I'm in the middle of hacking that code to flatten
UNION subqueries, so you might want to wait till I commit before
starting on a patch ;-))

> I guess I can live without the dependancy tracking. I can always dump  
> and reload my database to re-parse all the functions. Maybe we could  
> have a RELOAD FUNCTION command that would just re-parse an existing  
> function, so I don't have to dump and reload?

Hm?  I don't understand why you think this is needed.

> What about auto-creating a composite type for the function's return  
> type based on the query definition?

Can't get excited about this --- although I don't have any fundamental
objection either.  Choosing a name for such a type might be a bit of
a problem (I don't think you can just use the function name, as that
would preclude overloading).

> Maybe an extension to CREATE FUNCTION as a shorthand for set- 
> returning SQL functions?

It would be surprising if this form of CREATE FUNCTION defaulted to
assuming SETOF when other forms don't, so I don't like the proposal
as written.  Also the syntax you suggest has noplace to put function
attributes like VOLATILE.

Note that you can already do

regression=# create function fooey(int, out k1 int, out k2 int) returns setof record as
regression-# $$ select unique1, unique2 from tenk1 where thousand = $1 $$ language sql;
CREATE FUNCTION
regression=# select * from fooey(44); k1  |  k2
------+------7044 |  5625044 |  6921044 |  7894044 | 18753044 | 36492044 | 40638044 | 61246044 | 64519044 | 6503  44 |
7059
(10 rows)

regression=#

I'm not convinced that the incremental advantage of not having to write
out the function output column types is worth introducing an
inconsistent variant of CREATE FUNCTION.  Some indeed would argue that
that's not an advantage at all ;-)
        regards, tom lane


Re: Passing arguments to views

От
Chris Campbell
Дата:
On Feb 3, 2006, at 12:27, Tom Lane wrote:

>> I guess I can live without the dependancy tracking. I can always dump
>> and reload my database to re-parse all the functions. Maybe we could
>> have a RELOAD FUNCTION command that would just re-parse an existing
>> function, so I don't have to dump and reload?
>
> Hm?  I don't understand why you think this is needed.

Consider function foo() that references table bar. When you CREATE  
FUNCTION foo() ... AS 'SELECT ... FROM bar' you get an error message  
if bar doesn't exist. If it does exist, CREATE FUNCTION succeeds.

If you later DROP bar, you're not informed that function foo() was  
referencing it. You only find that out if you redefine foo() (using  
CREATE OR REPLACE FUNCTION and passing in the same definition, which  
fails) or if you try to run foo() (and the query fails).

If functions had true dependency tracking, then you couldn't DROP bar  
due to foo()'s dependency on it, unless you did a DROP CASCADE and  
were alerted that foo() was dropped as well.

I'm fine with those limitations. I can confirm that all of my  
functions are not referencing tables that don't exist by doing a  
CREATE OR REPLACE FUNCTION to reload each function. A pg_dump/ 
pg_restore would accomplish this, but it would be nice to have a  
"RELOAD FUNCTION" (or "REPARSE"? or "VERIFY"?) command that would  
just re-parse the function's source code (like CREATE FUNCTION does)  
and spit out errors if the function is referencing relations that  
don't exist. Just as a way to confirm that the table modification I  
just performed didn't break any functions. On-demand dependency  
checking, in a way.

> Note that you can already do
>
> regression=# create function fooey(int, out k1 int, out k2 int)  
> returns setof record as
> regression-# $$ select unique1, unique2 from tenk1 where thousand =  
> $1 $$ language sql;
> CREATE FUNCTION
> regression=# select * from fooey(44);
>   k1  |  k2
> ------+------
>  7044 |  562
>  5044 |  692
>  1044 |  789
>  4044 | 1875
>  3044 | 3649
>  2044 | 4063
>  8044 | 6124
>  6044 | 6451
>  9044 | 6503
>    44 | 7059
> (10 rows)
>
> regression=#

Learn something new every day. I'm still using 7.4 for most of my day  
job, and I can't do this without supplying a column definition list:

ERROR:  a column definition list is required for functions returning  
"record"

I hereby withdraw my proposal for "CREATE SQL FUNCTION."

Thanks!

- Chris




Re: Passing arguments to views

От
Tom Lane
Дата:
Chris Campbell <chris@bignerdranch.com> writes:
> I'm fine with those limitations. I can confirm that all of my  
> functions are not referencing tables that don't exist by doing a  
> CREATE OR REPLACE FUNCTION to reload each function. A pg_dump/ 
> pg_restore would accomplish this, but it would be nice to have a  
> "RELOAD FUNCTION" (or "REPARSE"? or "VERIFY"?) command that would  
> just re-parse the function's source code (like CREATE FUNCTION does)  
> and spit out errors if the function is referencing relations that  
> don't exist.

This is putting way too much trust in the behavior of a
PL-language-specific verifier function.  Anyway, you can do what you
want today:select fmgr_sql_validator(oid) from pg_proc where prolang = 14;
(Generalizing this to work for any language is left as an exercise
for the reader...)
        regards, tom lane


Re: Passing arguments to views

От
Josh Berkus
Дата:
Tom,

> As for the dependency issue, one man's bug is another man's feature.
> I think the fact that we don't track the internal dependencies of
> functions is not all bad.  We've certainly seen plenty of complaints
> about how you can't easily change tables that a view is depending on
> because the view dependencies block it...

I'd agree with this.   I write about 150,000 lines of function code a year, 
and if I had to rebuild all of the cascading functions every time I change 
a table they way I have to with views, it would probably add 20% to my 
overall application development time.

BTW, the other thing that we're still TODOing on SRFs (as far as I know) is 
finding ways to change the row estimate for an SRF.  It's still a flat 
1000 in the code, which can cause a lot of bad query plans.  I proposed a 
year ago that, as a first step, we allow the function owner to assign a 
static estimate variable to the function (i.e. "average rows returned = 
5').  This doesn't solve the whole problem of SRF estimates but it would 
be a significant step forwards in being able to use them in queries.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco


Re: Passing arguments to views

От
Mark Dilger
Дата:
Tom Lane wrote:
> Chris Campbell <chris@bignerdranch.com> writes:
> 
>>True, as long as there's a hook to do the inlining/rewriting before  
>>the query's planned. I guess we can see function calls at the parse  
>>stage, check to see if they're SQL functions or not, grab the prosrc,  
>>do the substitution, then re-parse?
> 
> 
> pull_up_subqueries in prepjointree.c would be the appropriate place
> I think: if it's an RTE_FUNCTION RTE, look to see if function is SQL
> and has the other needed properties, if so replace it by an RTE_SUBQUERY
> RTE with the correct subquery, then recurse to try to flatten the
> subquery.  (Note: I'm in the middle of hacking that code to flatten
> UNION subqueries, so you might want to wait till I commit before
> starting on a patch ;-))

If we are talking about inserting the function definition into the query as a 
subquery and then letting the parser treat it as a subquery, then I see no 
reason to use either the existing function or view subsystems.  It sounds more 
like we are discussing a macro language.
  CREATE MACRO foo(bar,baz) AS $$    select a from b where b > bar and b < baz  $$;

Then when you query
  SELECT * FROM foo(1,7) AS f WHERE f % 7 = 3

you get a macro expansion as such:
  SELECT * FROM (a from b where b > bar and b < baz) AS f WHERE f % 7 = 3

Then whatever optimizations the query planner can manage against a subquery will 
work for macros as well.

Thoughts?



Re: Passing arguments to views

От
Tom Lane
Дата:
Josh Berkus <josh@agliodbs.com> writes:
> BTW, the other thing that we're still TODOing on SRFs (as far as I know) is 
> finding ways to change the row estimate for an SRF.  It's still a flat 
> 1000 in the code, which can cause a lot of bad query plans.  I proposed a 
> year ago that, as a first step, we allow the function owner to assign a 
> static estimate variable to the function (i.e. "average rows returned = 
> 5').  This doesn't solve the whole problem of SRF estimates but it would 
> be a significant step forwards in being able to use them in queries.

The inlining thing would solve that much better, at least for the cases
where the function can be inlined.  I'm not sure how we can improve the
situation for things like looping plpgsql functions --- the function
owner probably can't write down a hard estimate for those either, in
most cases.
        regards, tom lane


Re: Passing arguments to views

От
Tom Lane
Дата:
Mark Dilger <pgsql@markdilger.com> writes:
> If we are talking about inserting the function definition into the
> query as a subquery and then letting the parser treat it as a
> subquery, then I see no reason to use either the existing function or
> view subsystems.  It sounds more like we are discussing a macro
> language.

Which is pretty much what a SQL function is already.  I don't see a need
to invent a separate concept.  To the extent that macros have different
semantics than functions (eg, multiple evaluation of arguments) the
differences are generally not improvements IMHO ...
        regards, tom lane


Re: Passing arguments to views

От
Mark Dilger
Дата:
Josh Berkus wrote:
> Tom,
> 
> 
>>As for the dependency issue, one man's bug is another man's feature.
>>I think the fact that we don't track the internal dependencies of
>>functions is not all bad.  We've certainly seen plenty of complaints
>>about how you can't easily change tables that a view is depending on
>>because the view dependencies block it...
> 
> 
> I'd agree with this.   I write about 150,000 lines of function code a year, 
> and if I had to rebuild all of the cascading functions every time I change 
> a table they way I have to with views, it would probably add 20% to my 
> overall application development time.
> 
> BTW, the other thing that we're still TODOing on SRFs (as far as I know) is 
> finding ways to change the row estimate for an SRF.  It's still a flat 
> 1000 in the code, which can cause a lot of bad query plans.  I proposed a 
> year ago that, as a first step, we allow the function owner to assign a 
> static estimate variable to the function (i.e. "average rows returned = 
> 5').  This doesn't solve the whole problem of SRF estimates but it would 
> be a significant step forwards in being able to use them in queries.
> 

This would only seem to work for trivial functions.  Most functions that I write 
are themselves dependent on underlying tables, and without any idea how many 
rows are in the tables, and without any idea of the statistical distribution of 
those rows, I can't really say anything like "average rows returned = 5".

What I have wanted for some time is a function pairing system.  For each set 
returning function F() I create, I would have the option of creating a 
statistics function S() which returns a single integer which represents the 
guess of how many rows will be returned.  S() would be called by the planner, 
and the return value of S() would be used to decide the plan.  S() would need 
access to the table statistics information.  I imagine that the system would 
want to prevent S() from running queries, and only allow it to call certain 
defined table statistics functions and some internal math functions, thereby 
avoiding any infinite recursion in the planner.  (If S() ran any queries, those 
queries would go yet again to the planner, and on down the infinite recursion 
you might go.)

Of course, some (possibly most) people could chose not to write an S() for their 
F(), and the default of 1000 rows would continue to be used.  As such, this new 
extension to the system would be backwards compatible to functions which don't 
have an S() defined.

Thoughts?


Re: Function Stats WAS: Passing arguments to views

От
Josh Berkus
Дата:
Mark,

> This would only seem to work for trivial functions.  Most functions that
> I write are themselves dependent on underlying tables, and without any
> idea how many rows are in the tables, and without any idea of the
> statistical distribution of those rows, I can't really say anything like
> "average rows returned = 5".
>
> What I have wanted for some time is a function pairing system.  For each
> set returning function F() I create, I would have the option of creating
> a statistics function S() which returns a single integer which
> represents the guess of how many rows will be returned.  S() would be
> called by the planner, and the return value of S() would be used to
> decide the plan.  S() would need access to the table statistics
> information.  I imagine that the system would want to prevent S() from
> running queries, and only allow it to call certain defined table
> statistics functions and some internal math functions, thereby avoiding
> any infinite recursion in the planner.  (If S() ran any queries, those
> queries would go yet again to the planner, and on down the infinite
> recursion you might go.)
>
> Of course, some (possibly most) people could chose not to write an S()
> for their F(), and the default of 1000 rows would continue to be used. 
> As such, this new extension to the system would be backwards compatible
> to functions which don't have an S() defined.

I think this is a fine idea, and I think I endorsed it the first time.  
However, even a static "function returns #" would be better than what we 
have now, and I think the S() method could take quite a bit of engineering 
to work out (for example, what if F() is being called in a JOIN or 
correlated subquery?).  So I'm worried that shooting for the S() idea only 
could result in us not doing *anything* for several more versions.

What I'd like to do is implement the constant method for 8.2, and work on 
doing the S() method later on.  Does that make sense?

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco


Re: Passing arguments to views

От
Mark Dilger
Дата:
Tom Lane wrote:
> Mark Dilger <pgsql@markdilger.com> writes:
> 
>>If we are talking about inserting the function definition into the
>>query as a subquery and then letting the parser treat it as a
>>subquery, then I see no reason to use either the existing function or
>>view subsystems.  It sounds more like we are discussing a macro
>>language.
> 
> 
> Which is pretty much what a SQL function is already.  I don't see a need
> to invent a separate concept.  To the extent that macros have different
> semantics than functions (eg, multiple evaluation of arguments) the
> differences are generally not improvements IMHO ...
> 
>             regards, tom lane

I have numerous times run EXPLAIN ANALYZE on my queries with SQL functions 
embedded and gotten different (far worse) results than if I manually inline the 
function following the macro expansion idea above.  That has led me to wish that 
postgres would inline it for me.  That doesn't prove that the macro idea is 
needed; it might be that the SQL function systems needs more work.  (In fact, I 
haven't done this since 8.0.3, so I'm not sure that 8.1 even does a bad job 
anymore.)


Re: Function Stats WAS: Passing arguments to views

От
Mark Dilger
Дата:
Josh Berkus wrote:
> Mark,
> 
> 
>>This would only seem to work for trivial functions.  Most functions that
>>I write are themselves dependent on underlying tables, and without any
>>idea how many rows are in the tables, and without any idea of the
>>statistical distribution of those rows, I can't really say anything like
>>"average rows returned = 5".
>>
>>What I have wanted for some time is a function pairing system.  For each
>>set returning function F() I create, I would have the option of creating
>>a statistics function S() which returns a single integer which
>>represents the guess of how many rows will be returned.  S() would be
>>called by the planner, and the return value of S() would be used to
>>decide the plan.  S() would need access to the table statistics
>>information.  I imagine that the system would want to prevent S() from
>>running queries, and only allow it to call certain defined table
>>statistics functions and some internal math functions, thereby avoiding
>>any infinite recursion in the planner.  (If S() ran any queries, those
>>queries would go yet again to the planner, and on down the infinite
>>recursion you might go.)
>>
>>Of course, some (possibly most) people could chose not to write an S()
>>for their F(), and the default of 1000 rows would continue to be used. 
>>As such, this new extension to the system would be backwards compatible
>>to functions which don't have an S() defined.
> 
> 
> I think this is a fine idea, and I think I endorsed it the first time.  
> However, even a static "function returns #" would be better than what we 
> have now, and I think the S() method could take quite a bit of engineering 
> to work out (for example, what if F() is being called in a JOIN or 
> correlated subquery?).  So I'm worried that shooting for the S() idea only 
> could result in us not doing *anything* for several more versions.
> 
> What I'd like to do is implement the constant method for 8.2, and work on 
> doing the S() method later on.  Does that make sense?
> 

I have no objections to implementing the constant method sooner than the full 
version.  It might be useful to implement it as a subsyntax to the full version 
though in preparation for later expansion.  For instance, if there is a plstats 
language developed, you could limit the parser for it to just functions like:

CREATE FUNCTION S() RETURNS INTEGER PROVIDES FOR F() AS $$RETURN 5;
$$ LANGUAGE PLSTATS;

Then the language could be expanded later to allow calls to the table statistics 
functions.


Re: Function Stats WAS: Passing arguments to views

От
Tom Lane
Дата:
Josh Berkus <josh@agliodbs.com> writes:
> What I'd like to do is implement the constant method for 8.2, and work on 
> doing the S() method later on.  Does that make sense?

I'm not thrilled with putting in a stopgap that we will have to support
forever.  The constant method is *clearly* inadequate for many (probably
most IMHO) practical cases.  Where do you see it being of use?

W.R.T. the estimator function method, the concern about recursion seems
misplaced.  Such an estimator presumably wouldn't invoke the associated
function itself.  I'm more concerned about coming up with a usable API
for such things.  Our existing mechanisms for estimating operator
selectivities require access to internal planner data structures, which
makes it pretty much impossible to write them in anything but C.  We'd
need something cleaner to have a feature I'd want to export for general
use.
        regards, tom lane


Re: Function Stats WAS: Passing arguments to views

От
Josh Berkus
Дата:
Tom,

> > What I'd like to do is implement the constant method for 8.2, and work
> > on doing the S() method later on.  Does that make sense?
>
> I'm not thrilled with putting in a stopgap that we will have to support
> forever.  The constant method is *clearly* inadequate for many (probably
> most IMHO) practical cases.  Where do you see it being of use?

Well, mostly for the real-world use cases where I've run into SRF estimate 
issues, which have mostly been SRFs which return one row.

> W.R.T. the estimator function method, the concern about recursion seems
> misplaced.  Such an estimator presumably wouldn't invoke the associated
> function itself.  

No, but if you're calling the S() estimator in the context of performing a 
join, what do you supply for parameters?

> I'm more concerned about coming up with a usable API 
> for such things.  Our existing mechanisms for estimating operator
> selectivities require access to internal planner data structures, which
> makes it pretty much impossible to write them in anything but C.  We'd
> need something cleaner to have a feature I'd want to export for general
> use.

Yes -- we need to support the simplest case, which is functions that return 
either (a) a fixed number of rows, or (b) a fixed multiple of the number 
of rows passed to the function.  These simple cases should be easy to 
build.  For more complex estimation, I personally don't see a problem with 
forcing people to hack it in C.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco


Re: Function Stats WAS: Passing arguments to views

От
Tom Lane
Дата:
Josh Berkus <josh@agliodbs.com> writes:
>> I'm not thrilled with putting in a stopgap that we will have to support
>> forever.  The constant method is *clearly* inadequate for many (probably
>> most IMHO) practical cases.  Where do you see it being of use?

> Well, mostly for the real-world use cases where I've run into SRF estimate 
> issues, which have mostly been SRFs which return one row.

Well, if they're certain to return one row, you can just declare them as
not SETOF, no?  Since 8.1 we do get that case right:

regression=# explain select * from cos(0);                      QUERY PLAN
--------------------------------------------------------Function Scan on cos  (cost=0.00..0.01 rows=1 width=8)
(1 row)


> No, but if you're calling the S() estimator in the context of performing a 
> join, what do you supply for parameters?

Exactly my point about the API problem.  I'm not sure that joins matter,
but the function parameters sure do, and those might not be simple constants.
        regards, tom lane


Re: Function Stats WAS: Passing arguments to views

От
Mark Dilger
Дата:
Josh Berkus wrote:
> Tom,
> 
> 
>>>What I'd like to do is implement the constant method for 8.2, and work
>>>on doing the S() method later on.  Does that make sense?
>>
>>I'm not thrilled with putting in a stopgap that we will have to support
>>forever.  The constant method is *clearly* inadequate for many (probably
>>most IMHO) practical cases.  Where do you see it being of use?
> 
> 
> Well, mostly for the real-world use cases where I've run into SRF estimate 
> issues, which have mostly been SRFs which return one row.
> 
> 
>>W.R.T. the estimator function method, the concern about recursion seems
>>misplaced.  Such an estimator presumably wouldn't invoke the associated
>>function itself.  
> 
> 
> No, but if you're calling the S() estimator in the context of performing a 
> join, what do you supply for parameters?

I've been thinking about this more, and now I don't see why this is an issue. 
When the planner estimates how many rows will be returned from a subquery that 
is being used within a join, it can't know which "parameters" to use either. 
(Parameters being whatever conditions the subquery will pivot upon which are the 
result of some other part of the execution of the full query.)  So it seems to 
me that function S() is at no more of a disadvantage than the planner.

If I defined a function S(a integer, b integer) which provides an estimate for 
the function F(a integer, b integer), then S(null, null) could be called when 
the planner can't know what a and b are.  S could then still make use of the 
table statistics to provide some sort of estimate.  Of course, this would mean 
that functions S() cannot be defined strict.

>>I'm more concerned about coming up with a usable API 
>>for such things.  Our existing mechanisms for estimating operator
>>selectivities require access to internal planner data structures, which
>>makes it pretty much impossible to write them in anything but C.  We'd
>>need something cleaner to have a feature I'd want to export for general
>>use.
> 
> 
> Yes -- we need to support the simplest case, which is functions that return 
> either (a) a fixed number of rows, or (b) a fixed multiple of the number 
> of rows passed to the function.  These simple cases should be easy to 
> build.  For more complex estimation, I personally don't see a problem with 
> forcing people to hack it in C.

Could we provide table statistics access functions in whatever higher-level 
language S() is written in, or is there something fundamentally squirrelly about 
the statistics that would make this impossible?

Also, since we haven't nailed down a language for S(), if we allowed any of sql, 
plpgsql, plperl, plpython, etc, then we would need access methods for each, 
which would place a burden on all PLs, right?  That argument isn't strong enough 
to make me lean either way; it's just an observation.




Re: Function Stats WAS: Passing arguments to views

От
Mark Dilger
Дата:
Mark Dilger wrote:
> I've been thinking about this more, and now I don't see why this is an 
> issue. When the planner estimates how many rows will be returned from a 
> subquery that is being used within a join, it can't know which 
> "parameters" to use either. (Parameters being whatever conditions the 
> subquery will pivot upon which are the result of some other part of the 
> execution of the full query.)  So it seems to me that function S() is at 
> no more of a disadvantage than the planner.
> 
> If I defined a function S(a integer, b integer) which provides an 
> estimate for the function F(a integer, b integer), then S(null, null) 
> could be called when the planner can't know what a and b are.  S could 
> then still make use of the table statistics to provide some sort of 
> estimate.  Of course, this would mean that functions S() cannot be 
> defined strict.

Ok, null probably isn't a good value.  F(null, null) could be the call being 
made, so S(null, null) would mean "F is being passed nulls" rather than "We 
don't know what F's arguments are yet".  The returned estimate might be quite 
different for these two cases.  You could have:
  F(a integer, b integer)  S(a integer, a_is_known boolean, b integer, b_is_known boolean)

But I'm not fond of the verbosity of doubling the argument list.  Since some 
arguments might be known while others still are not, I don't think a single 
boolean argument all_arguments_are_known is sufficient.