Обсуждение: Re: Function result cacheing - any comments?
OK - I assume from everybody else's silence that they either (a) agree with the idea, or (b) think Tom hit the idea on the head, so they feel they don't need to respond. So what I would like to do is implement a simple version of this to attempt to justify my claims of performance gains. The sort of trivial places where I think gains *may* be had are: create table departments(id integer, name text, manager_id integer); create table people(id integer, department_id, name text); create function get_manager_name(integer) returns text as 'select name from departments d, people p where d.id = $1and p.id = d.manager_id'; select name,get_manager_name(department_id) from people; This is obviously a case where a LOJ or column-select would do the trick, *but* it does represent a class of problems that people frequently write procedures to perform a single (sometimes complex) action. Using a function also encapsulates some knowledge of the data structures, resulting in more maintainable code. eg. even the above simple example becomes a lot less readable and maintainable: select name, (select m.name from departments d, people m where d.id = p.department_id and m.id = d.manager_id)as manager_name from people p; if a function is not used. My theory is that if such a piece of code gets a performance gain, then the code is probably worth including, assuming that the function manager does not need to be butchered to achieve the desired goal. Does that sound reasonable? So the obvious question is - in the opinion of people who know the code, can a function-result-cache be implemented with a lifetime of a single statement, without butchering the function manager? ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner wrote: > So the obvious question is - in the opinion of people who know the code, > can a function-result-cache be implemented with a lifetime of a single > statement, without butchering the function manager? > I don't know if I fully understand what you're proposing, but if I understand it correctly, I think the table function feature in current sources does just what you want already. If you can write your function as a table function, the results are put in a tuplestore for the duration of the statement, and rescanned when needed. Your example ends up looking like this: create table departments(id integer, name text, manager_id integer); insert into departments values(1, 'manufacturing', 1); insert into departments values(2, 'accounting', 2); create table people(id integer, department_id, name text); insert into people values(1, 1, 'mfg boss'); insert into people values(2, 2, 'acct boss'); insert into people values(3, 1, 'mfg emp'); insert into people values(4, 2, 'acct emp'); create type manager_names as (dept_id int, name text); create function get_manager_names() returns setof manager_names as 'select d.id, p.name from departments d, people p where p.id = d.manager_id' language sql; select p.name, m.name as boss from people p, get_manager_names() m where p.department_id = m.dept_id; name | boss -----------+----------- mfg boss | mfg boss mfg emp | mfg boss acct boss | acct boss acct emp | acct boss (4 rows) Is this anything close what you had in mind? Joe
At 22:29 18/08/2002 -0700, Joe Conway wrote: >create function get_manager_names() returns setof manager_names as > 'select d.id, p.name from departments d, people p > where p.id = d.manager_id' language sql; > >select p.name, m.name as boss from people p, get_manager_names() m where >p.department_id = m.dept_id; >... >Is this anything close what you had in mind? Nice thought, and it probably works for the example I gave, but in the case where the secondary table is large potentially large, I think it falls down. To give an example, in my case I have a function 'has_access_to_object' which does access checking up a tree of ownership & inheritance. While the first level access check is always unique, subsequent ones will be executed more than once in a typical tree. As a result, what I would like to implement is a new attribute for functions (eg. 'invariant') which tells the function manager that in the context of one command, if the function is called with the same args, the result will be the same. The idea is for the function manager(?) to maintain a cache of, say, up to 100K of cached function results for functions marked 'invariant'. A hash will be used to check if a function result is in the cache, and the least recently used results will be purged when necessary. While my example is quite specific, the benefits would apply to any simple lookup function, as well as any external function that is expensive to execute. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
On Sun, 18 Aug 2002, Joe Conway wrote: > Philip Warner wrote: > > So the obvious question is - in the opinion of people who know the code, > > can a function-result-cache be implemented with a lifetime of a single > > statement, without butchering the function manager? > > > > I don't know if I fully understand what you're proposing, but if I Hi Joe, What Philip seems to be asking for is a mechanism where by if a function is marked as being mathematically deterministic (given a particular set of parameters the same result is always returned -- eg: cos(), sin(), etc) then the result is cached and next time the function is called with the same argument(s) the result is retrieved from the cache instead of the function being run again. If I have got this correct, there is merit in this request. It is a feature of other databases (such as oracle) and SQL99 provides for a differentiation between deterministic and 'possibly non-deterministic' routines. It does not discuss, to my knowledge, how this information should be used. I do not know if it is worth while implementing a deterministic function result cache in Postgres -- I haven't looked at the complexity. Needless to say, it would not be trivial :-). Gavin
> What Philip seems to be asking for is a mechanism where by if a function > is marked as being mathematically deterministic (given a particular set of > parameters the same result is always returned -- eg: cos(), sin(), > etc) then the result is cached and next time the function is called with > the same argument(s) the result is retrieved from the cache instead of the > function being run again. I was under the impression that the sin, cos, tan and like functions are marked non-volatile in the system catalogs and so are evaluated once per transaction only. Chris
At 17:03 19/08/2002 +1000, Gavin Sherry wrote: >What Philip seems to be asking for is a mechanism where by if a function >is marked as being mathematically deterministic (given a particular set of >parameters the same result is always returned -- eg: cos(), sin(), >etc) then the result is cached and next time the function is called with >the same argument(s) the result is retrieved from the cache instead of the >function being run again. Exactly. But obviously not limited to simple mathematical functions. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
> >What Philip seems to be asking for is a mechanism where by if a function > >is marked as being mathematically deterministic (given a > particular set of > >parameters the same result is always returned -- eg: cos(), sin(), > >etc) then the result is cached and next time the function is called with > >the same argument(s) the result is retrieved from the cache > instead of the > >function being run again. > > Exactly. But obviously not limited to simple mathematical functions. >From 7.3 docs: http://candle.pha.pa.us/main/writings/pgsql/sgml/sql-createfunction.html CREATE [ OR REPLACE ] FUNCTION name ( [ argtype [, ...] ] ) RETURNS rettype { LANGUAGE langname | IMMUTABLE| STABLE | VOLATILE | CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT | STRICT | [EXTERNAL] SECURITYINVOKER | [EXTERNAL] SECURITY DEFINER | AS 'definition' | AS 'obj_file', 'link_symbol' } ... [ WITH ( attribute [, ...] ) ] And: IMMUTABLE STABLE VOLATILE These attributes inform the system whether it is safe to replace multiple evaluations of the function with a single evaluation, for run-time optimization. At most one choice should be specified. If none of these appear, VOLATILE is the default assumption. IMMUTABLE indicates that the function always returns the same result when given the same argument values; that is, it does not do database lookups or otherwise use information not directly present in its parameter list. If this option is given, any call of the function with all-constant arguments can be immediately replaced with the function value. STABLE indicates that within a single table scan the function will consistently return the same result for the same argument values, but that its result could change across SQL statements. This is the appropriate selection for functions whose results depend on database lookups, parameter variables (such as the current time zone), etc. Also note that the CURRENT_TIMESTAMP family of functions qualify as stable, since their values do not change within a transaction. VOLATILE indicates that the function value can change even within a single table scan, so no optimizations can be made. Relatively few database functions are volatile in this sense; some examples are random(), currval(), timeofday(). Note that any function that has side-effects must be classified volatile, even if its result is quite predictable, to prevent calls from being optimized away; an example is setval(). So it seems Philip already has what he wants? Chris
At 15:41 19/08/2002 +0800, Christopher Kings-Lynne wrote: >So it seems Philip already has what he wants? I really hope so, but my understanding is that this information is used during optimization, not execution; I want it to be used in execution. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes: > My theory is that if such a piece of code gets a performance gain, then the > code is probably worth including, assuming that the function manager does > not need to be butchered to achieve the desired goal. Does that sound > reasonable? Some real results would certainly bolster your case. > So the obvious question is - in the opinion of people who know the code, > can a function-result-cache be implemented with a lifetime of a single > statement, without butchering the function manager? I'd suggest trying to make it a function call handler. Look at the way Peter did "SECURITY DEFINER" functions for inspiration. regards, tom lane
Philip Warner <pjw@rhyme.com.au> writes: > At 15:41 19/08/2002 +0800, Christopher Kings-Lynne wrote: >> So it seems Philip already has what he wants? > I really hope so, but my understanding is that this information is used > during optimization, not execution; I want it to be used in execution. Philip is correct that there is no cacheing of the sort he wants. The existing volatility classifications are somewhat relevant in the sense that an IMMUTABLE or STABLE function would be legal to cache the way he wants ... but the system doesn't do so, it only uses these classifications to drive before-the-query constant folding and decisions about whether indexscans are safe. I would resist any attempt to install cacheing by default for immutable or stable functions. Philip was proposing that function authors would have to explicitly ask for cacheing (ie, add another function property) and that seems a necessary component of an acceptable solution IMHO. regards, tom lane