Discussion: Behavior of PL/pgSQL function following drop and re-create of a table that it uses

Behavior of PL/pgSQL function following drop and re-create of a table that it uses

From
Bryn Llewellyn
Date:
I copied my self-contained testcase, and its output (using PG Version 15.2), at the end.

I read the doc section "43.11. PL/pgSQL under the Hood": www.postgresql.org/docs/15/plpgsql-implementation.html

Is my mental model, described below, right?

1. With function "s.f()" as presented, the attempt to create it when table "s.t" doesn't yet exist fails with the 42601 syntax error: « invalid type name "s.t.v%type" ». This is expected because some tests must succeed at "create time" else the attempt becomes a no-op.

2. After creating table "s.t", then creating "s.f()", executing it, and then dropping "s.t", the pg_proc row for "s.f()" remains intact with the original source text. This reflects the fact that the successful "create" didn't record any dependency info, so PG doesn't know what the human observer knows. (A "pg_depend" probe that confirms this is sketched just after this list.)

3. After "s.t" is re-instated, now with a different data type for "t.v", the SQL query reports the new column data type (and the new content). After all, this is just ordinary query behavior reflecting the current state of the db. However the reported type of the local variable "v_out" is still "text" (as it was at create time) and not "varchar" as it now is. This nominal error reflects the fact that the representation of the to-be-interpreted function, in session memory, was built when "s.f()" was first executed and is now cached. Because there are no dependencies, nothing tells PG to rebuild the representation of the to-be-interpreted function, in session memory.

5. After re-connecting, we have a brand-new session with as yet no cached info. Therefore, the representation of the to-be-interpreted function must be rebuilt when it's first referenced. And now, it sees table "s.t" with a "varchar" column.

6. All this leads to rather obvious practice advice: DDLs on objects that an application uses (at least when the app's database artifacts include PL/pgSQL subprograms) are unsafe while the app is in use. You must stop all client-sessions before doing such DDLs and re-start them only when all DDLs are done successfully.
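
A minimal sketch of the "pg_depend" probe mentioned in point 2 (not part of the original testcase; it assumes the schema "s", table "s.t", and function "s.f(int)" created below, and it uses the catalog function "pg_describe_object()"). Expect rows for things like the function's schema and its language, but none that references table "s.t":

select pg_describe_object(refclassid, refobjid, refobjsubid) as referenced_object,
       deptype
from pg_depend
where classid = 'pg_proc'::regclass
and   objid   = 's.f(integer)'::regprocedure;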

________________________________________________________________________________

-- Connect as an ordinary user to a convenient database.
\c d0 d0$u0

prepare f_prosrc as
select prosrc
from
  pg_proc p
  inner join
  pg_namespace n
  on p.pronamespace = n.oid
  inner join
  pg_roles r
  on p.proowner = r.oid
where p.proname = 'f'
and   n.nspname = 's'
and   r.rolname = $1;

create schema s;
create table s.t(k serial primary key, v text);
insert into s.t(v) values ('cat-1'), ('dog-1'), ('mouse-1');

create function s.f(k_in in int)
  returns text
  security definer
  set search_path = pg_catalog, pg_temp
  language plpgsql
as $body$
declare
  v_type text      not null := '';
  v_out s.t.v%type not null := '';
begin
  select pg_typeof(t.v)::text, t.v
  into strict v_type, v_out
  from s.t
  where t.k = k_in;
  return 'pg_typeof(t.v): '  ||v_type                 ||' / '||
         'pg_typeof(v_out): '||pg_typeof(v_out)::text ||' / '||
         'value: '           ||v_out;
end;
$body$;

select '----------';
select s.f(1);

drop table s.t cascade;
select '----------';
execute f_prosrc('d0$u0');

create table s.t(k serial primary key, v varchar(10));
insert into s.t(v) values ('cat-2'), ('dog-2'), ('mouse-2');

-- "s.f()" still reports "text" for "pg_typeof(v_out)".
select '----------';
select s.f(1);

\c d0 d0$u0
-- Only now have we lost the cached result of "compiling" the function "s.f()".
-- Now reports "character varying” for "pg_typeof(v_out)".
select '----------';
select s.f(1);

RESULTS (using "\t on" mode)

 ----------

 pg_typeof(t.v): text / pg_typeof(v_out): text / value: cat-1

 ----------

 < The expected "language plpgsql" source text — verbatim as it was entered, including "%type" (untranslated). >

 ----------

 pg_typeof(t.v): character varying / pg_typeof(v_out): text / value: cat-2

 ----------

 pg_typeof(t.v): character varying / pg_typeof(v_out): character varying / value: cat-2

Re: Behavior of PL/pgSQL function following drop and re-create of a table that it uses

From
"David G. Johnston"
Date:
On Tue, Mar 7, 2023 at 1:24 PM Bryn Llewellyn <bryn@yugabyte.com> wrote:
create table s.t(k serial primary key, v text);
create function s.f(k_in in int)
select s.f(1);

text, function is now compiled with that type resolution fixed.
I think mostly attributable to:

> The mutable nature of record variables presents another problem in this connection. When fields of a record variable are used in expressions or statements, the data types of the fields must not change from one call of the function to the next, since each expression will be analyzed using the data type that is present when the expression is first reached.

Though possibly it (variable declarations) is considered structural:

> The instruction tree fully translates the PL/pgSQL statement structure,

drop table s.t cascade;
create table s.t(k serial primary key, v varchar(10));
select s.f(1);

still text as the compiled artifact is re-executed
 
\c d0 d0$u0
select s.f(1);

now varchar as the function is recompiled during its first use in this session.


Restarting everything is an approach to dealing with uncertainty.  This particular use case, though, isn't one that I'd be overly worried about.  Actually making DDL changes of this nature should be rare if not forbidden.  Once live, on-the-fly column type changes just shouldn't happen, so having a plan in place that accommodates them is adding cost for no real benefit.

David J.

Re: Behavior of PL/pgSQL function following drop and re-create of a table that it uses

From
Bryn Llewellyn
Date:
> david.g.johnston@gmail.com wrote:
>
>> bryn@yugabyte.com wrote:
>>
>> create table s.t(k serial primary key, v text);
>> create function s.f(k_in in int)
>> select s.f(1);
>
> text, function is now compiled with that type resolution fixed.
> I think mostly attributable to:
>
> > The mutable nature of record variables presents another problem in this connection. When fields of a record
variableare used in expressions or statements, the data types of the fields must not change from one call of the
functionto the next, since each expression will be analyzed using the data type that is present when the expression is
firstreached. 
>
> Though possibly… variable declarations [are] considered structural:
>
> > The instruction tree fully translates the PL/pgSQL statement structure,
>
>> drop table s.t cascade;
>> create table s.t(k serial primary key, v varchar(10));
>> select s.f(1);
>
> still text as the compiled artifact is re-executed
>
>> \c d0 d0$u0
>> select s.f(1);
>
> now varchar as the function is recompiled during its first use in this session.
>
> Restarting everything is an approach to dealing with uncertainty. This particular use case, though, isn't one that
I'dbe overly worried about. Actually making DDL changes of this nature should be rare if not forbidden. Once live
on-the-flycolumn type changes just shouldn't happen so having a plan in place that accommodates them is adding cost for
noreal benefit. 

Thanks. I believe that you and I agree on the proper practice, paraphrased here slightly w.r.t. what I wrote in my point #6 in my email that started this thread:

Regard a DDL on any object that an application uses as unsafe while the app is in use. You must terminate all client-sessions before doing such a DDL and re-start them only when all such DDLs are done successfully.




Re: Behavior of PL/pgSQL function following drop and re-create of a table that it uses

From
"David G. Johnston"
Date:
On Tue, Mar 7, 2023 at 5:52 PM Bryn Llewellyn <bryn@yugabyte.com> wrote:

Regard a DDL on any object that an application uses as unsafe while the app is in use. You must terminate all client-sessions before doing such a DDL and re-start them only when all such DDLs are done successfully.


No.  If you simply "ADD COLUMN" to an existing table the "terminate all client-sessions" action is excessive, IMO.

David J.

Re: Behavior of PL/pgSQL function following drop and re-create of a table that it uses

From
"David G. Johnston"
Date:
(adding back the list)

On Tue, Mar 7, 2023 at 8:24 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Tue, Mar 7, 2023 at 7:54 PM Bryn Llewellyn <bryn@yugabyte.com> wrote:


This is what I expected actually, though I can't point to exactly why.

Where can I read what I need in order to understand the difference here, using %rowtype, and in the first test that I posted, using %type?

I'm not certain there should be.  Given the presence of the bug below and general infrequency of this scenario I wouldn't be totally surprised there is a bug here as well.

So I found where this difference in behavior is at least explicitly noted:


/*
 * If it's a named composite type (or domain over one), find the typcache
 * entry and record the current tupdesc ID, so we can detect changes
 * (including drops).  We don't currently support on-the-fly replacement
 * of non-composite types, else we might want to do this for them too.
 */

If this limitation is documented in a user-facing manner I do not know where.

David J.

Fwd: Behavior of PL/pgSQL function following drop and re-create of a table that it uses

From
"David G. Johnston"
Date:
Forwarding this to the list; note the potential bug found at the end.  My actual follow-on reply notes the lack of documentation regarding the composite cache-checking behavior (relative to the non-composite situation).

---------- Forwarded message ---------
From: David G. Johnston <david.g.johnston@gmail.com>
Date: Tue, Mar 7, 2023 at 8:24 PM
Subject: Re: Behavior of PL/pgSQL function following drop and re-create of a table that it uses
To: Bryn Llewellyn <bryn@yugabyte.com>


On Tue, Mar 7, 2023 at 7:54 PM Bryn Llewellyn <bryn@yugabyte.com> wrote:
 (1,cat)

Now, still in the same session:

alter table s.t add c2 text;
update s.t set c2 = 'dog' where k = 1;
select s.f(1);

This is the new result. It surprised me:

 (1,cat,dog)

This is what I expected actually, though I can't point to exactly why.

Where can I read what I need in order to understand the difference here, using %rowtype, and in the first test that I posted, using %type?

I'm not certain there should be.  Given the presence of the bug below and general infrequency of this scenario I wouldn't be totally surprised there is a bug here as well.

 
Why is the meaning of %type frozen at "create" time

Nothing in the body of a pl/pgsql routine is frozen at "create time".  At the earliest, it freezes at first execution in a session.
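
One way to see that for yourself (a sketch, not from the original exchange; it leans on the standard "check_function_bodies" setting, the same one that pg_dump turns off in its output):

set check_function_bodies = off;
-- With this setting the "create function s.f ..." statement from the first
-- testcase is accepted even before table "s.t" exists: the 42601 error
-- described in point 1 of the opening message isn't raised at "create" time,
-- because the body isn't compiled until the function is first executed in a
-- session.
set check_function_bodies = on;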
 

Why don't I get a runtime error telling me that I have more "select list" items than "into" targets?

That would be a bug so far as I can tell.

postgres=# do $$declare c1 text; c2 text; begin select '1','2','3' into c1, c2; end;$$;
DO

> If a row variable or a variable list is used as target, the command's result columns must exactly match the structure of the target as to number and data types, or else a run-time error occurs.


David J.

Re: Behavior of PL/pgSQL function following drop and re-create of a table that it uses

From
Tom Lane
Date:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> So I found where this difference in behavior is at least explicitly noted:

>/*
> * If it's a named composite type (or domain over one), find the typcache
> * entry and record the current tupdesc ID, so we can detect changes
> * (including drops).  We don't currently support on-the-fly replacement
> * of non-composite types, else we might want to do this for them too.
> */

I'm not quite sure that that's related, really.  That code is concerned
with detecting changes to an already-identified type (that is, type
OID NNN has different details now than it did before).  It seemed to
me that Bryn's question was more about reacting to cases where a given
string of source code would resolve to a different type OID than it
did a moment ago.  We don't have a great story on that, I'll agree.
You can get into that sort of problem without anywhere near the amount
of complexity embodied in this example --- for instance, I'm pretty
sure we don't re-parse type references just because somebody else
executed an ALTER TYPE RENAME somewhere.

            regards, tom lane



Re: Behavior of PL/pgSQL function following drop and re-create of a table that it uses

From
"David G. Johnston"
Date:
On Tue, Mar 7, 2023 at 9:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> So I found where this difference in behavior is at least explicitly noted:

>/*
> * If it's a named composite type (or domain over one), find the typcache
> * entry and record the current tupdesc ID, so we can detect changes
> * (including drops).  We don't currently support on-the-fly replacement
> * of non-composite types, else we might want to do this for them too.
> */

I'm not quite sure that that's related, really.  That code is concerned
with detecting changes to an already-identified type (that is, type
OID NNN has different details now than it did before).  It seemed to
me that Bryn's question was more about reacting to cases where a given
string of source code would resolve to a different type OID than it
did a moment ago.

Sorta, and now I see why %TYPE doesn't work but %ROWTYPE does - a change in the former is necessarily a new OID, while a change to the latter will have the same OID but a different internal structure that the type reference compiled into the function can leverage to resolve its structure at runtime.

Of course, I went from almost clueless to this answer in just over an hour so it is possible I'm wrong.  But it seems to fit so far.
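
A minimal sketch that makes the OID story visible (not from the original message; it assumes the schema "s" and the first testcase's table "s.t" with its original "text" column):

-- The composite type behind "%rowtype" keeps its OID across "alter table".
select 's.t'::regtype::oid;
alter table s.t add column c2 text;
select 's.t'::regtype::oid;   -- same OID, new tuple descriptor

-- The column type behind "s.t.v%type" is different: changing the column's
-- type swaps in a different type OID (text -> character varying).
select atttypid::regtype as v_type, atttypid as v_type_oid
from pg_attribute
where attrelid = 's.t'::regclass and attname = 'v';

alter table s.t alter column v type varchar(10);

select atttypid::regtype as v_type, atttypid as v_type_oid
from pg_attribute
where attrelid = 's.t'::regclass and attname = 'v';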

David J.

Re: Behavior of PL/pgSQL function following drop and re-create of a table that it uses

From
Bryn Llewellyn
Date:
> david.g.johnston@gmail.com wrote:
>
>> bryn@yugabyte.com wrote:
>>
>>> david.g.johnston@gmail.com wrote:
>>>
>>>> bryn@yugabyte.com wrote:
>>>>
>>>> Regard a DDL on any object that an application uses as unsafe while the app is in use. You must terminate all
client-sessionsbefore doing such a DDL and re-start them only when all such DDLs are done successfully. 
>>>
>>> No. If you simply "ADD COLUMN" to an existing table the "terminate all client-sessions" action is excessive, IMO.
>>
>> I tried another test. The results surprised me:
>>
>> create table s.t(k int primary key, c1 text);
>> insert into s.t(k, c1) values (1, 'cat');
>>
>> create function s.f(k_in in int)
>>   returns text
>>   security definer
>>   set search_path = pg_catalog, pg_temp
>>   language plpgsql
>> as $body$
>> declare
>>   r s.t%rowtype;
>> begin
>>   select * from s.t into strict r where t.k = k_in;
>>   return r::text;
>> end;
>> $body$;
>>
>> select s.f(1);
>>
>> This is the result (no surprises yet):
>>
>>  (1,cat)
>>
>> Now, still in the same session:
>>
>> alter table s.t add c2 text;
>> update s.t set c2 = 'dog' where k = 1;
>> select s.f(1);
>>
>> This is the new result. It surprised me:
>>
>>  (1,cat,dog)
>>
>> I had expected that %rowtype would be translated, and frozen, at "create" time into the columns "k" and "c1". So I
expectedthe second execution of "s.f()" give some flavor of wrong answer. 
>>
>> Where can I read what I need in order to understand the difference here, using %rowtype, and in the first test that
Iposted, using %type? Why is the meaning of %type frozen at "create" time while (as it seems) %rowtype is re-evaluated
atruntime—presumably on every execution of the subprogram? 
>>
>> I discovered a new surprise in this general space with this test:
>>
>> create function s.g()
>>   returns text
>>   security definer
>>   set search_path = pg_catalog, pg_temp
>>   language plpgsql
>> as $body$
>> declare
>>   c1 text;
>>   c2 text;
>> begin
>>   select 'cat', 'dog', 'mouse' into c1, c2;
>>   return c1||' '||c2;
>> end;
>> $body$;
>>
>> select s.g();
>>
>> It runs without error and shows this:
>>
>>  cat dog
>>
>> Why don't I get a runtime error telling me that I have more "select list" items than "into" targets?
>
> You may want to send this to the mailing list too, for posterity.

Oops… I somehow slipped up and replied only to David. Here it is, now, for the archive.

I also slipped up by saying « frozen, at "create" time ». Thanks for pointing this out, David. I did indeed mean to write « frozen, in a particular session and for the remainder of that session's duration, when the PL/pgSQL subprogram is first executed. »

I read the replies from David and Tom. But I must confess that I can't work out what the current consensus is on what's intended w.r.t. load-time versus execution-time response to a changed definition of %type and %rowtype.

 (Never mind yet whether, or to what extent, this is currently documented.)

I believe that I'm hearing that there is thought to be a genuine bug, orthogonal to the main thing that I was asking about, thus: an attempt to select N1 items into N2 targets, where N1 and N2 differ, should cause a run-time error. (N1 and N2 might differ, as I demonstrated, simply because of a programmer-authored error. Or they might differ now, in some session, where they earlier didn't, because of changes in the environment with which this session's in-memory representation of the PL/pgSQL program has lost currency).

Returning to David's earlier comment, thus:

> If you simply "ADD COLUMN" to an existing table the "terminate all client-sessions" action is excessive, IMO.


Why not err on the side of caution and (I trust) guaranteed currency of each session's in-memory representation of a PL/pgSQL program with the environment in which it executes?

After all, you add a column in order to use it. And this means that at the very least client-side code must be changed to do this. And this means quiescing use of the application and then re-starting it with new behavior. Is re-starting the connection pool before opening up the new app for use so expensive that it's worth trying to reason when it might be safe to avoid this re-start?










Re: Behavior of PL/pgSQL function following drop and re-create of a table that it uses

From
"David G. Johnston"
Date:
On Tue, Mar 7, 2023 at 10:19 PM Bryn Llewellyn <bryn@yugabyte.com> wrote:
Why not err on the side of caution and (I trust) guaranteed currency of each session's in-memory representation of a PL/pgSQL program with the environment in which it executes?

After all, you add a column in order to use it. And this means that at the very least client-side code must be changed to do this. And this means quiescing use of the application and then re-starting it with new behavior. Is re-starting the connection pool before opening up the new app for use so expensive that it's worth trying to reason when it might be safe to avoid this re-start?

True.  I was a bit hasty in forming an opinion on an operational aspect like that.  That just isn't something I typically think about.

David J.

Re: Behavior of PL/pgSQL function following drop and re-create of a table that it uses

From
Bryn Llewellyn
Date:
tgl@sss.pgh.pa.us wrote:

david.g.johnston@gmail.com wrote:

So I found where this difference in behavior is at least explicitly noted:

/*
* If it's a named composite type (or domain over one), find the typcache
* entry and record the current tupdesc ID, so we can detect changes
* (including drops). We don't currently support on-the-fly replacement
* of non-composite types, else we might want to do this for them too.
*/

I'm not quite sure that that's related, really. That code is concerned with detecting changes to an already-identified type (that is, type OID NNN has different details now than it did before). It seemed to me that Bryn's question was more about reacting to cases where a given string of source code would resolve to a different type OID than it did a moment ago. We don't have a great story on that, I'll agree. You can get into that sort of problem without anywhere near the amount of complexity embodied in this example --- for instance, I'm pretty sure we don't re-parse type references just because somebody else executed an ALTER TYPE RENAME somewhere.

I tried a new test, inspired by what Tom wrote:

create table s.t(k int primary key, c1 int, c2 int, c3 int);
insert into s.t(k, c1, c2, c3) values(1, 17, 42, 57);
create type s.x as (c1 int, c2 int, c3 int);

create function s.f()
  returns text
  security definer
  set search_path = pg_catalog, pg_temp
  language plpgsql
as $body$
declare
  r s.x;
begin
  r  := (select (a.c1, a.c2, a.c3)::s.x from s.t as a where a.k = 1);
  return r::text;
end;
$body$;

select s.f();

It produced the expected result:

 (17,42,57)

Then I did this (still in the same session):

alter type s.x drop attribute c3 cascade;
select s.f();

It produced this new result (with no reported error):

 (17,42)

Then I reconnected as the same user to the same database to force a fresh analysis of the source code of "s.f()" on its first execution:

\c - :the_user
select s.f();

Now I got a 42846 error, "cannot cast type record to s.x", with the detail "Input has too many columns".

Here's my conclusion. It's for the scenario where you have PL/pgSQL subprograms among the objects that your client-side app uses. It's rather obvious. (A server-side sketch of the "exit all of the client sessions" step in point 1 follows after point 3.)

(1) If you do any DDLs that affect any of the objects that an application uses, then you should exit all of the client sessions (presumably this means stopping the connection pool for most apps) before you do the patching. The reasoning is simple. A few spot tests show how things can go wrong if you don't do this. And there's no doc to tell you what, if any, DDLs you might safely do without stopping all but the session(s) that do the patching.

(2) You have to take full responsibility for the impact analysis so that you can make all the changes that are needed to take you from the pre-patch mutually consistent state of all objects to the new post-patch mutually consistent state during the window when only the session(s) doing the patching are active. Native PG doesn't provide much metadata or tooling to help you here. You need your own reliable, human-written external doc of your system.

(3) The same general thinking extends to client-side code. Carefully specified and executed testing, using a dedicated and realistic test env, is critical.
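
A sketch of a server-side way to do the "exit all of the client sessions" step in point 1 (not from the original message; it assumes a role that is allowed to signal the other backends, for example a superuser or a member of pg_signal_backend, and it spares the session that runs it):

select pg_terminate_backend(pid)
from pg_stat_activity
where datname = current_database()
and   pid <> pg_backend_pid();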

On 3/8/23 15:29, Bryn Llewellyn wrote:
[snip]
create table s.t(k int primary key, c1 int, c2 int, c3 int);
insert into s.t(k, c1, c2, c3) values(1, 17, 42, 57);
create type s.x as (c1 int, c2 int, c3 int);

[snip]

This is an excellent analysis.


Native PG doesn't provide much metadata or tooling to help you here. You need your own reliable, human-written external doc of your system.


"pg_dump --schema-only" and then grep for type x?

--
Born in Arizona, moved to Babylonia.

Re: Behavior of PL/pgSQL function following drop and re-create of a table that it uses

From
Pavel Stehule
Date:


On Wed, Mar 8, 2023 at 10:29 PM Bryn Llewellyn <bryn@yugabyte.com> wrote:
[snip]

PL/pgSQL currently doesn't try to synchronize the structure of row variables. A row variable holds its description of the structure until the end of the session. Now, plpgsql code is "recompiled" after a change to its pg_proc record. Theoretically, the same mechanism that the plan cache uses could be applied, so that recompilation is forced after a change to the related data types. Or only the declarations of row variables could be recompiled. This is just about the possibility of invalidating some local cache.

Now, when using the record type, the code should be tolerant of these changes. I think this behaviour will be identical, and plpgsql_check can work with the record's type fairly well.
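
Following up on the "recompiled after change of pg_proc record" point: a hedged sketch (it re-uses the first testcase's objects and its source text verbatim, in the same still-connected session). Re-issuing the unchanged source with "create or replace" after the table DDL touches the pg_proc row, so the next call should be recompiled without reconnecting:

drop table s.t cascade;
create table s.t(k serial primary key, v varchar(10));
insert into s.t(v) values ('cat-2'), ('dog-2'), ('mouse-2');

-- Same source text as before; only the version of the pg_proc row changes.
create or replace function s.f(k_in in int)
  returns text
  security definer
  set search_path = pg_catalog, pg_temp
  language plpgsql
as $body$
declare
  v_type text      not null := '';
  v_out s.t.v%type not null := '';
begin
  select pg_typeof(t.v)::text, t.v
  into strict v_type, v_out
  from s.t
  where t.k = k_in;
  return 'pg_typeof(t.v): '  ||v_type                 ||' / '||
         'pg_typeof(v_out): '||pg_typeof(v_out)::text ||' / '||
         'value: '           ||v_out;
end;
$body$;

-- Expected, if the above holds: "character varying" for "pg_typeof(v_out)".
select s.f(1);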

Regards

Pavel