Обсуждение: 2 questions about volatile attribute of pg_proc.

Поиск
Список
Период
Сортировка

2 questions about volatile attribute of pg_proc.

От
Andy Fan
Дата:
Hi:

We know volatile is very harmful for optimizers and it is the default
value (and safest value) if the user doesn't provide that.  Asking user
to set the value is not a good experience,  is it possible to auto-generate
the value for it rather than use the volatile directly for user defined function. I
think it should be possible, we just need to scan the PlpgSQL_stmt to see if there
is a volatile function? 

The second question "It is v for “volatile” functions, whose results might change at any time.
(Use v also for functions with side-effects, so that calls to them cannot get optimized away.)"
I think they are different semantics.  One of the results is volatile functions can't be removed 
by remove_unused_subquery_output even if it doesn't have side effects. for example:
select b from (select an_expensive_random(), b from t);   Is it by design on purpose? 


--
Best Regards

Re: 2 questions about volatile attribute of pg_proc.

От
Pavel Stehule
Дата:


ne 18. 4. 2021 v 17:06 odesílatel Andy Fan <zhihui.fan1213@gmail.com> napsal:
Hi:

We know volatile is very harmful for optimizers and it is the default
value (and safest value) if the user doesn't provide that.  Asking user
to set the value is not a good experience,  is it possible to auto-generate
the value for it rather than use the volatile directly for user defined function. I
think it should be possible, we just need to scan the PlpgSQL_stmt to see if there
is a volatile function? 

plpgsql_check does this check - the performance check check if function can be marked as stable


I don't think so this can be done automatically - plpgsql does not check objects inside in registration time. You can use objects and functions that don't exist in CREATE FUNCTION time. And you need to know this info before optimization time. So if we implement this check automatically, then planning time can be increased a lot.

Regards

Pavel


The second question "It is v for “volatile” functions, whose results might change at any time.
(Use v also for functions with side-effects, so that calls to them cannot get optimized away.)"
I think they are different semantics.  One of the results is volatile functions can't be removed 
by remove_unused_subquery_output even if it doesn't have side effects. for example:
select b from (select an_expensive_random(), b from t);   Is it by design on purpose? 


--
Best Regards

Re: 2 questions about volatile attribute of pg_proc.

От
Tom Lane
Дата:
Andy Fan <zhihui.fan1213@gmail.com> writes:
> We know volatile is very harmful for optimizers and it is the default
> value (and safest value) if the user doesn't provide that.  Asking user
> to set the value is not a good experience,  is it possible to auto-generate
> the value for it rather than use the volatile directly for user defined
> function. I
> think it should be possible, we just need to scan the PlpgSQL_stmt to see
> if there
> is a volatile function?

Are you familiar with the halting problem?  I don't see any meaningful
difference here.

            regards, tom lane



Re: 2 questions about volatile attribute of pg_proc.

От
Isaac Morland
Дата:
On Sun, 18 Apr 2021 at 11:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andy Fan <zhihui.fan1213@gmail.com> writes:
> We know volatile is very harmful for optimizers and it is the default
> value (and safest value) if the user doesn't provide that.  Asking user
> to set the value is not a good experience,  is it possible to auto-generate
> the value for it rather than use the volatile directly for user defined
> function. I
> think it should be possible, we just need to scan the PlpgSQL_stmt to see
> if there
> is a volatile function?

Are you familiar with the halting problem?  I don't see any meaningful
difference here.

I think what is being suggested is akin to type checking, not solving the halting problem. Parse the function text, identify all functions it might call (without solving the equivalent of the halting problem to see if it actually does or could), and apply the most volatile value of called functions to the calling function.

That being said, there are significant difficulties, including but almost certainly not limited to:

- what happens if one modifies a called function after creating the calling function?
- EXECUTE
- a PL/PGSQL function's meaning depends on the search path in effect when it is called, unless it has a SET search_path clause or it fully qualifies all object references, so it isn't actually possible in general to determine what a function calls at definition time

If the Haskell compiler is possible then what is being requested here is conceptually possible even if there are major issues with actually doing it in the Postgres context. The halting problem is not the problem here.

Re: 2 questions about volatile attribute of pg_proc.

От
Tom Lane
Дата:
Isaac Morland <isaac.morland@gmail.com> writes:
> On Sun, 18 Apr 2021 at 11:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Are you familiar with the halting problem?  I don't see any meaningful
>> difference here.

> I think what is being suggested is akin to type checking, not solving the
> halting problem.

Yeah, on further thought we'd be satisfied with a conservative
approximation, so that removes the theoretical-impossibility objection.
Still, there are a lot of remaining problems, as you note.

            regards, tom lane



Re: 2 questions about volatile attribute of pg_proc.

От
"David G. Johnston"
Дата:
On Sun, Apr 18, 2021 at 9:08 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Isaac Morland <isaac.morland@gmail.com> writes:
> On Sun, 18 Apr 2021 at 11:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Are you familiar with the halting problem?  I don't see any meaningful
>> difference here.

> I think what is being suggested is akin to type checking, not solving the
> halting problem.

Yeah, on further thought we'd be satisfied with a conservative
approximation, so that removes the theoretical-impossibility objection.
Still, there are a lot of remaining problems, as you note.


Yeah, the type checking approach seems blocked by the "black box" nature of functions.  A possibly more promising approach is for the top-level call to declare its expectations (which are set by the user) and during execution if that expectation is violated directly, or is reported as violated deeper in the call stack, the execution of the function fails with some kind of invalid state error.  However, as with other suggestions of this nature, the fundamental blocker here is that to be particularly useful this kind of validation needs to happen by default (as opposed to opt-in) which risks breaking existing code.  And so I foresee this request falling into the same category as those others - an interesting idea that could probably be made to work but by itself isn't worthwhile enough to go and introduce a fundamental change to the amount of "parental oversight" PostgreSQL tries to provide.

David J.

Re: 2 questions about volatile attribute of pg_proc.

От
Andy Fan
Дата:


> - a PL/PGSQL function's meaning depends on the search path in effect when it is called, unless it has a SET search_path clause or it fully qualifies all object references, so it isn't actually possible in general to determine what a function calls at definition time


I'd think this one as a blocker issue at the beginning since I have to insist on
any new features should not cause semantic changes for existing ones. Later I
found the new definition. As for this feature request, I think we can define the
features like this:

1. We define a new attribute named VOLATILE_AUTO;  The semantic is PG will auto
   detect the volatile info based on current search_path / existing
   function. If any embedded function can't be found, we can raise an error if
   VOLATILE_AUTO is used. If people change the volatile attribute later, we can:
   a). do nothing. This can be the documented feature. or. b). Maintain the
   dependency tree between functions and if anyone is changed, other functions
   should be recalculated as well.

2. VOLATILE_AUTO should never be the default value. It only works when people
   requires it.

Then what we can get from this?  Thinking a user is migrating lots of UDF from
other databases.  Asking them to check/set each function's attribute might
be bad. However if we tell them about how VOLATILE_AUTO works, and they
accept it (I guess most people would accept), then the migration would be
pretty productive.

I'm listening to any obvious reason to reject it.

--
Best Regards
Andy Fan (https://www.aliyun.com/)

Re: 2 questions about volatile attribute of pg_proc.

От
Andy Fan
Дата:


I'm listening to any obvious reason to reject it.

Any obvious reason to reject it because of it would be a lose battle for sure,
so I would not waste time on it.  Or vote up if you think it is possible and
useful. 
 
--
Best Regards

Re: 2 questions about volatile attribute of pg_proc.

От
Pavel Stehule
Дата:


út 20. 4. 2021 v 4:47 odesílatel Andy Fan <zhihui.fan1213@gmail.com> napsal:


> - a PL/PGSQL function's meaning depends on the search path in effect when it is called, unless it has a SET search_path clause or it fully qualifies all object references, so it isn't actually possible in general to determine what a function calls at definition time


I'd think this one as a blocker issue at the beginning since I have to insist on
any new features should not cause semantic changes for existing ones. Later I
found the new definition. As for this feature request, I think we can define the
features like this:

1. We define a new attribute named VOLATILE_AUTO;  The semantic is PG will auto
   detect the volatile info based on current search_path / existing
   function. If any embedded function can't be found, we can raise an error if
   VOLATILE_AUTO is used. If people change the volatile attribute later, we can:
   a). do nothing. This can be the documented feature. or. b). Maintain the
   dependency tree between functions and if anyone is changed, other functions
   should be recalculated as well.

2. VOLATILE_AUTO should never be the default value. It only works when people
   requires it.

Then what we can get from this?  Thinking a user is migrating lots of UDF from
other databases.  Asking them to check/set each function's attribute might
be bad. However if we tell them about how VOLATILE_AUTO works, and they
accept it (I guess most people would accept), then the migration would be
pretty productive.

I'm listening to any obvious reason to reject it.

a) This analyses can be very slow - PLpgSQL does lazy planning - query plans are planned only when are required - and this feature requires complete planning current function and all nested VOLATILE_AUTO functions - so start of function can be significantly slower

b) When you migrate from Oracle,then you can use the STABLE flag, and it will be mostly correct.

Regards

Pavel



--
Best Regards
Andy Fan (https://www.aliyun.com/)

Re: 2 questions about volatile attribute of pg_proc.

От
Andy Fan
Дата:


On Tue, Apr 20, 2021 at 10:57 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:


út 20. 4. 2021 v 4:47 odesílatel Andy Fan <zhihui.fan1213@gmail.com> napsal:


> - a PL/PGSQL function's meaning depends on the search path in effect when it is called, unless it has a SET search_path clause or it fully qualifies all object references, so it isn't actually possible in general to determine what a function calls at definition time


I'd think this one as a blocker issue at the beginning since I have to insist on
any new features should not cause semantic changes for existing ones. Later I
found the new definition. As for this feature request, I think we can define the
features like this:

1. We define a new attribute named VOLATILE_AUTO;  The semantic is PG will auto
   detect the volatile info based on current search_path / existing
   function. If any embedded function can't be found, we can raise an error if
   VOLATILE_AUTO is used. If people change the volatile attribute later, we can:
   a). do nothing. This can be the documented feature. or. b). Maintain the
   dependency tree between functions and if anyone is changed, other functions
   should be recalculated as well.

2. VOLATILE_AUTO should never be the default value. It only works when people
   requires it.

Then what we can get from this?  Thinking a user is migrating lots of UDF from
other databases.  Asking them to check/set each function's attribute might
be bad. However if we tell them about how VOLATILE_AUTO works, and they
accept it (I guess most people would accept), then the migration would be
pretty productive.

I'm listening to any obvious reason to reject it.

a) This analyses can be very slow - PLpgSQL does lazy planning - query plans are planned only when are required - and this feature requires complete planning current function and all nested VOLATILE_AUTO functions - so start of function can be significantly slower

Actually I am thinking  we can do this when we compile the function, which means that would 
happen on the "CREATE FUNCTION " stage.   this would need some hacks for sure.  Does
this remove your concern? 
 
b) When you migrate from Oracle,then you can use the STABLE flag, and it will be mostly correct.

This was suggested in our team as well, but I don't think it is very strict.  For example:  
SELECT materialize_bills_for(userId) from users;  Any more proof to say "STABLE" flag
is acceptable? 

 
--
Best Regards
Andy Fan (https://www.aliyun.com/)


--
Best Regards

Re: 2 questions about volatile attribute of pg_proc.

От
Pavel Stehule
Дата:


út 20. 4. 2021 v 5:16 odesílatel Andy Fan <zhihui.fan1213@gmail.com> napsal:


On Tue, Apr 20, 2021 at 10:57 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:


út 20. 4. 2021 v 4:47 odesílatel Andy Fan <zhihui.fan1213@gmail.com> napsal:


> - a PL/PGSQL function's meaning depends on the search path in effect when it is called, unless it has a SET search_path clause or it fully qualifies all object references, so it isn't actually possible in general to determine what a function calls at definition time


I'd think this one as a blocker issue at the beginning since I have to insist on
any new features should not cause semantic changes for existing ones. Later I
found the new definition. As for this feature request, I think we can define the
features like this:

1. We define a new attribute named VOLATILE_AUTO;  The semantic is PG will auto
   detect the volatile info based on current search_path / existing
   function. If any embedded function can't be found, we can raise an error if
   VOLATILE_AUTO is used. If people change the volatile attribute later, we can:
   a). do nothing. This can be the documented feature. or. b). Maintain the
   dependency tree between functions and if anyone is changed, other functions
   should be recalculated as well.

2. VOLATILE_AUTO should never be the default value. It only works when people
   requires it.

Then what we can get from this?  Thinking a user is migrating lots of UDF from
other databases.  Asking them to check/set each function's attribute might
be bad. However if we tell them about how VOLATILE_AUTO works, and they
accept it (I guess most people would accept), then the migration would be
pretty productive.

I'm listening to any obvious reason to reject it.

a) This analyses can be very slow - PLpgSQL does lazy planning - query plans are planned only when are required - and this feature requires complete planning current function and all nested VOLATILE_AUTO functions - so start of function can be significantly slower

Actually I am thinking  we can do this when we compile the function, which means that would 
happen on the "CREATE FUNCTION " stage.   this would need some hacks for sure.  Does
this remove your concern? 

you cannot do it - with this you introduce strong dependency on nested objects - and that means a lot of problems - necessity of rechecks when any nested object is changed. There will be new problems with dependency, when you create functions, and until we have global temp tables, then it is blocker for usage of temporary tables. The current behavior is not perfect, but in this direction is very practical, and I would not change it. Can be nice if some functionality of plpgsql_check can be in core, because I think so it is necessary for development, but the structure and integration of SQL in PLpgSQL is very good (and very practical).

 
b) When you migrate from Oracle,then you can use the STABLE flag, and it will be mostly correct.

This was suggested in our team as well, but I don't think it is very strict.  For example:  
SELECT materialize_bills_for(userId) from users;  Any more proof to say "STABLE" flag
is acceptable? 

Oracle doesn't allow write operations in functions. Or didn't allow it - I am not sure what is possible now. So when you migrate data from Oracle, and if the function is not marked as DETERMINISTIC, you can safely mark it as STABLE. Ora2pg does it. Elsewhere - it works 99% well. In special cases, there is some black magic - with fresh snapshots, and with using autonomous transactions, and these cases should be solved manually. Sometimes is good enough just removing autonomous transactions, sometimes the complete rewrite is necessary - or redesign functionality.







 
--
Best Regards
Andy Fan (https://www.aliyun.com/)


--
Best Regards

Re: 2 questions about volatile attribute of pg_proc.

От
Andy Fan
Дата:


On Tue, Apr 20, 2021 at 11:32 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:


út 20. 4. 2021 v 5:16 odesílatel Andy Fan <zhihui.fan1213@gmail.com> napsal:


On Tue, Apr 20, 2021 at 10:57 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:


út 20. 4. 2021 v 4:47 odesílatel Andy Fan <zhihui.fan1213@gmail.com> napsal:


> - a PL/PGSQL function's meaning depends on the search path in effect when it is called, unless it has a SET search_path clause or it fully qualifies all object references, so it isn't actually possible in general to determine what a function calls at definition time


I'd think this one as a blocker issue at the beginning since I have to insist on
any new features should not cause semantic changes for existing ones. Later I
found the new definition. As for this feature request, I think we can define the
features like this:

1. We define a new attribute named VOLATILE_AUTO;  The semantic is PG will auto
   detect the volatile info based on current search_path / existing
   function. If any embedded function can't be found, we can raise an error if
   VOLATILE_AUTO is used. If people change the volatile attribute later, we can:
   a). do nothing. This can be the documented feature. or. b). Maintain the
   dependency tree between functions and if anyone is changed, other functions
   should be recalculated as well.

2. VOLATILE_AUTO should never be the default value. It only works when people
   requires it.

Then what we can get from this?  Thinking a user is migrating lots of UDF from
other databases.  Asking them to check/set each function's attribute might
be bad. However if we tell them about how VOLATILE_AUTO works, and they
accept it (I guess most people would accept), then the migration would be
pretty productive.

I'm listening to any obvious reason to reject it.

a) This analyses can be very slow - PLpgSQL does lazy planning - query plans are planned only when are required - and this feature requires complete planning current function and all nested VOLATILE_AUTO functions - so start of function can be significantly slower

Actually I am thinking  we can do this when we compile the function, which means that would 
happen on the "CREATE FUNCTION " stage.   this would need some hacks for sure.  Does
this remove your concern? 

you cannot do it - with this you introduce strong dependency on nested objects

What does the plpgsql_check do in this area?  I checked the README[1], but can't find
anything about it.  
 
until we have global temp tables, then it is blocker for usage of temporary tables.

Can you explain more about this? 
 
 Can be nice if some functionality of plpgsql_check can be in core, because I think so it is necessary for development, but the structure and integration of SQL in PLpgSQL is very good (and very practical).

 
I'm interested in plpgsql_check.  However I am still confused why we can do it in this way, but
can't do it in the  VOLATILE_AUTO way. 
 
 
b) When you migrate from Oracle,then you can use the STABLE flag, and it will be mostly correct.

This was suggested in our team as well, but I don't think it is very strict.  For example:  
SELECT materialize_bills_for(userId) from users;  Any more proof to say "STABLE" flag
is acceptable? 

Oracle doesn't allow write operations in functions. Or didn't allow it - I am not sure what is possible now. So when you migrate data from Oracle, and if the function is not marked as DETERMINISTIC, you can safely mark it as STABLE.

You are correct.  Good to know the above points. 
 
 Elsewhere - it works 99% well. In special cases, there is some black magic - with fresh snapshots, and with using autonomous transactions, and these cases should be solved manually. Sometimes is good enough just removing autonomous transactions, sometimes the complete rewrite is necessary - or redesign functionality.


is the 1% == "special cases" ?  Do you mind sharing more information about these cases,
either document or code? 


--
Best Regards

Re: 2 questions about volatile attribute of pg_proc.

От
Pavel Stehule
Дата:


you cannot do it - with this you introduce strong dependency on nested objects

What does the plpgsql_check do in this area?  I checked the README[1], but can't find
anything about it.  

When you run plpgsql_check with performance warning (disabled by default), then it does check if all called functions are on the same or lower level than checked functions have.  So when all called operations are stable (read only), then the function can be marked as stable - and if the function is marked as volatile, then plpgsql_check raises an warning.

 
until we have global temp tables, then it is blocker for usage of temporary tables.

All plpgsql expressions are SQL expressions - and anybody can call a function against a temporary table. But local temporary tables don't exist in typical CREATE FUNCTION time (registration time). Typically doesn't exist in plpgsql compile time too. Usually temporary tables are created inside executed plpgsql functions. So you cannot do any semantical (deeper) check in registration, or compile time. Just because one kind of object (temporary tables) doesn't exist. This is a difficult issue for plpgsql_check too.


Can you explain more about this? 
 
 Can be nice if some functionality of plpgsql_check can be in core, because I think so it is necessary for development, but the structure and integration of SQL in PLpgSQL is very good (and very practical).

 
I'm interested in plpgsql_check.  However I am still confused why we can do it in this way, but
can't do it in the  VOLATILE_AUTO way. 

You can do it. But you solve one issue, and introduce new kinds of more terrible issues (related to dependencies between database's objects). The design of plpgsql is pretty different from the design of Oracle's PL/SQL. So proposed change means total conceptual change, and you need to write a lot of new code that will provide necessary infrastructure. I am not sure if the benefit is higher than the cost. It can be usable, if plpgsql can be really compiled to some machine code - but it means ten thousands codes without significant benefits - the bottleneck inside stored procedures is SQL, and the compilation doesn't help with it.

 
 
b) When you migrate from Oracle,then you can use the STABLE flag, and it will be mostly correct.

This was suggested in our team as well, but I don't think it is very strict.  For example:  
SELECT materialize_bills_for(userId) from users;  Any more proof to say "STABLE" flag
is acceptable? 

Oracle doesn't allow write operations in functions. Or didn't allow it - I am not sure what is possible now. So when you migrate data from Oracle, and if the function is not marked as DETERMINISTIC, you can safely mark it as STABLE.

You are correct.  Good to know the above points. 

And DETERMINISTIC functions are IMMUTABLE on Postgres's side

 
 Elsewhere - it works 99% well. In special cases, there is some black magic - with fresh snapshots, and with using autonomous transactions, and these cases should be solved manually. Sometimes is good enough just removing autonomous transactions, sometimes the complete rewrite is necessary - or redesign functionality.


is the 1% == "special cases" ?  Do you mind sharing more information about these cases,
either document or code? 

Unfortunately not. I have not well structured notes from work on ports from Oracle to Postgres. And these 1% cases are very very variable. People are very creative. But usually this code is almost very dirty, and not critical. In Postgres we can use LISTEN, NOTIFY, or possibility to set app_name or we can use RAISE NOTICE.