Обсуждение: Should we document how column DEFAULT expressions work?

Поиск
Список
Период
Сортировка

Should we document how column DEFAULT expressions work?

От
James Coleman
Дата:
Hello,

It's possible I'm the only one who's been in this situation, but I've
multiple times found myself explaining to a user how column DEFAULT
expressions work: namely how the quoting on an expression following
the keyword DEFAULT controls whether or not the expression is
evaluated at the time of the DDL statement or at the time of an
insertion.

In my experience this is non-obvious to users, and the quoting makes a
big difference.

Is this something that we should document explicitly? I don't see it
called out in the CREATE TABLE reference page, but it's possible I'm
missing something.

Thanks,
James Coleman



Re: Should we document how column DEFAULT expressions work?

От
Tom Lane
Дата:
James Coleman <jtc331@gmail.com> writes:
> It's possible I'm the only one who's been in this situation, but I've
> multiple times found myself explaining to a user how column DEFAULT
> expressions work: namely how the quoting on an expression following
> the keyword DEFAULT controls whether or not the expression is
> evaluated at the time of the DDL statement or at the time of an
> insertion.

Uh ... what?  I recall something about that with respect to certain
features such as nextval(), but you're making it sound like there
is something generic going on with DEFAULT.

            regards, tom lane



Re: Should we document how column DEFAULT expressions work?

От
James Coleman
Дата:
On Tue, Jun 25, 2024 at 4:59 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> James Coleman <jtc331@gmail.com> writes:
> > It's possible I'm the only one who's been in this situation, but I've
> > multiple times found myself explaining to a user how column DEFAULT
> > expressions work: namely how the quoting on an expression following
> > the keyword DEFAULT controls whether or not the expression is
> > evaluated at the time of the DDL statement or at the time of an
> > insertion.
>
> Uh ... what?  I recall something about that with respect to certain
> features such as nextval(), but you're making it sound like there
> is something generic going on with DEFAULT.

Hmm, I guess I'd never considered anything besides cases like
nextval() and now(), but I see now that now() must also be special
cased (when quoted) since 'date_trunc(day, now())'::timestamp doesn't
work but 'now()'::timestamp does.

So I guess what I'm asking about would be limited to those cases (I
assume there are a few others...but I haven't gone digging through the
source yet).

Regards,
James Coleman



Re: Should we document how column DEFAULT expressions work?

От
Tom Lane
Дата:
James Coleman <jtc331@gmail.com> writes:
> On Tue, Jun 25, 2024 at 4:59 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Uh ... what?  I recall something about that with respect to certain
>> features such as nextval(), but you're making it sound like there
>> is something generic going on with DEFAULT.

> Hmm, I guess I'd never considered anything besides cases like
> nextval() and now(), but I see now that now() must also be special
> cased (when quoted) since 'date_trunc(day, now())'::timestamp doesn't
> work but 'now()'::timestamp does.

Hmm, both of those behaviors are documented, but not in the same place
and possibly not anywhere near where you looked for info about
DEFAULT.  For instance, the Tip at the bottom of section 9.9.5

https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-CURRENT

explains about how 'now'::timestamp isn't what to use in DEFAULT.

            regards, tom lane



Re: Should we document how column DEFAULT expressions work?

От
"David G. Johnston"
Дата:
On Tue, Jun 25, 2024 at 4:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
James Coleman <jtc331@gmail.com> writes:
> On Tue, Jun 25, 2024 at 4:59 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Uh ... what?  I recall something about that with respect to certain
>> features such as nextval(), but you're making it sound like there
>> is something generic going on with DEFAULT.

> Hmm, I guess I'd never considered anything besides cases like
> nextval() and now(), but I see now that now() must also be special
> cased (when quoted) since 'date_trunc(day, now())'::timestamp doesn't
> work but 'now()'::timestamp does.

Hmm, both of those behaviors are documented, but not in the same place
and possibly not anywhere near where you looked for info about
DEFAULT.  For instance, the Tip at the bottom of section 9.9.5

https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-CURRENT

explains about how 'now'::timestamp isn't what to use in DEFAULT.


I'd suggest adding to:

DEFAULT default_expr
The DEFAULT clause assigns a default data value for the column whose column definition it appears within. The value is any variable-free expression (in particular, cross-references to other columns in the current table are not allowed). Subqueries are not allowed either. The data type of the default expression must match the data type of the column.

The default expression will be used in any insert operation that does not specify a value for the column. If there is no default for a column, then the default is null.

+ Be aware that the [special timestamp values 1] are resolved immediately, not upon insert.  Use the [date/time constructor functions 2] to produce a time relative to the future insertion.


David J.


Re: Should we document how column DEFAULT expressions work?

От
David Rowley
Дата:
On Wed, 26 Jun 2024 at 13:31, David G. Johnston
<david.g.johnston@gmail.com> wrote:
> I'd suggest adding to:
>
> DEFAULT default_expr
> The DEFAULT clause assigns a default data value for the column whose column definition it appears within. The value
isany variable-free expression (in particular, cross-references to other columns in the current table are not allowed).
Subqueriesare not allowed either. The data type of the default expression must match the data type of the column. 
>
> The default expression will be used in any insert operation that does not specify a value for the column. If there is
nodefault for a column, then the default is null. 
>
> + Be aware that the [special timestamp values 1] are resolved immediately, not upon insert.  Use the [date/time
constructorfunctions 2] to produce a time relative to the future insertion. 

FWIW, I disagree that we need to write anything about that in this
part of the documentation.  I think any argument for doing this could
equally be applied to something like re-iterating what the operator
precedence rules for arithmetic are, and I don't think that should be
mentioned. Also, what about all the other places where someone could
use one of the special timestamp input values? Should CREATE VIEW get
a memo too?  How about PREPARE?

If people don't properly understand these special timestamp input
values, then maybe the documentation in [1] needs to be improved.  At
the moment the details are within parentheses. Namely "(In particular,
now and related strings are converted to a specific time value as soon
as they are read.)".  Maybe it would be better to be more explicit
there and mention that these are special values that the input
function understands which are translated to actual timestamp values
when the type's input function is called.  That could maybe be tied
into the DEFAULT clause documentation to mention that the input
function for constant values is called at DML time rather than DDL
time.  That way, we're not adding these (unsustainable) special cases
to the documentation.

David

[1] https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-DATETIME-SPECIAL-VALUES



Re: Should we document how column DEFAULT expressions work?

От
Tom Lane
Дата:
David Rowley <dgrowleyml@gmail.com> writes:
> If people don't properly understand these special timestamp input
> values, then maybe the documentation in [1] needs to be improved.  At
> the moment the details are within parentheses. Namely "(In particular,
> now and related strings are converted to a specific time value as soon
> as they are read.)".  Maybe it would be better to be more explicit
> there and mention that these are special values that the input
> function understands which are translated to actual timestamp values
> when the type's input function is called.  That could maybe be tied
> into the DEFAULT clause documentation to mention that the input
> function for constant values is called at DML time rather than DDL
> time.  That way, we're not adding these (unsustainable) special cases
> to the documentation.

This sounds like a reasonable approach to me for the
magic-input-values issue.  Do we want to do anything about
nextval()?  I guess if you hold your head at the correct
angle, that's also a magic-input-value issue, in the sense
that the question is when does regclass input get resolved.

            regards, tom lane



Re: Should we document how column DEFAULT expressions work?

От
"David G. Johnston"
Дата:
On Tue, Jun 25, 2024 at 9:50 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Wed, 26 Jun 2024 at 13:31, David G. Johnston
<david.g.johnston@gmail.com> wrote:
> I'd suggest adding to:
>
> DEFAULT default_expr
> The DEFAULT clause assigns a default data value for the column whose column definition it appears within. The value is any variable-free expression (in particular, cross-references to other columns in the current table are not allowed). Subqueries are not allowed either. The data type of the default expression must match the data type of the column.
>
> The default expression will be used in any insert operation that does not specify a value for the column. If there is no default for a column, then the default is null.
>
> + Be aware that the [special timestamp values 1] are resolved immediately, not upon insert.  Use the [date/time constructor functions 2] to produce a time relative to the future insertion.

Annoyingly even this advice isn't correct:

postgres=# create table tdts2 (ts timestamptz default 'now()');
CREATE TABLE
postgres=# \d tdts2
                                                 Table "public.tdts2"
 Column |           Type           | Collation | Nullable |                          Default          
               
--------+--------------------------+-----------+----------+-------------------------------------------
----------------
 ts     | timestamp with time zone |           |          | '2024-06-25 18:05:33.055377-07'::timestamp
 with time zone

I expected writing what looked like the function now() to be delayed evaluated but since I put it into quotes, the OPs complaint, it got read as the literal with ignored extra bits.


FWIW, I disagree that we need to write anything about that in this
part of the documentation.  I think any argument for doing this could
equally be applied to something like re-iterating what the operator
precedence rules for arithmetic are, and I don't think that should be
mentioned.

I disagree on this equivalence.  The time literals are clear deviations from expected behavior.  Knowing operator precedence rules, they apply everywhere equally.  And we should document the deviations directly where they happen.  Even if it's just a short link back to the source that describes the deviation.  I'm fine with something less verbose pointing only to the data types page, but not with nothing.

Also, what about all the other places where someone could
use one of the special timestamp input values? Should CREATE VIEW get
a memo too?  How about PREPARE?

Yes.


If people don't properly understand these special timestamp input
values, then maybe the documentation in [1] needs to be improved.

Recall, and awareness, is the greater issue, not comprehension.  This intends to increase the former.  I don't believe the latter is an issue, though I haven't deep dived into it.

And the whole type casting happening right away just seems misleading.

postgres=# create table testboold2 (expr boolean default boolean 'false');
CREATE TABLE
postgres=# \d testboold2
             Table "public.testboold2"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 expr   | boolean |           |          | false

I would expect 'f' in the default column if the boolean casting of the literal happened sooner.  Or I'd expect to see "boolean 'false'" as the default expression if it is captured as-is.

So yes, saving an expression into the default column has nuances that should be documented where default is defined.

Maybe the wording needs to be:

"If the default expression contains any constants [1] they are converted into their typed value during create table execution.  Thus time constants [1] save into the default expression the time the command was executed."


I'd be happy to be pointed to other constants that resolve to an execution-time specific environment in a similar manner.  If there is another one I'll rethink the wisdom of trying to document all of them in each place.  But reminding people that time is special and we have these special values seems to provide meaningful reader benefit for the cost of a couple of sentences repeated in a few places.  That were valid a decade ago no more or less than they are valid now.

David J.

Re: Should we document how column DEFAULT expressions work?

От
"David G. Johnston"
Дата:
On Tue, Jun 25, 2024 at 10:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <dgrowleyml@gmail.com> writes:
> If people don't properly understand these special timestamp input
> values, then maybe the documentation in [1] needs to be improved.  At
> the moment the details are within parentheses. Namely "(In particular,
> now and related strings are converted to a specific time value as soon
> as they are read.)".  Maybe it would be better to be more explicit
> there and mention that these are special values that the input
> function understands which are translated to actual timestamp values
> when the type's input function is called.  That could maybe be tied
> into the DEFAULT clause documentation to mention that the input
> function for constant values is called at DML time rather than DDL
> time.  That way, we're not adding these (unsustainable) special cases
> to the documentation.

This sounds like a reasonable approach to me for the
magic-input-values issue.  Do we want to do anything about
nextval()?  I guess if you hold your head at the correct
angle, that's also a magic-input-value issue, in the sense
that the question is when does regclass input get resolved.


From observations we transform constants into the: " 'value'::type " syntax which then makes it an operator resolved at execution time.  For every type except time types the transformation leaves the constant as-is.  The special time values are the exception whereby they get evaluated to a specific time during the transformation.

postgres=# create table tser3 (id integer not null default nextval(regclass 'tser2_id_seq'));
CREATE TABLE
postgres=# \d tser3
                            Table "public.tser3"
 Column |  Type   | Collation | Nullable |              Default              
--------+---------+-----------+----------+-----------------------------------
 id     | integer |           | not null | nextval('tser2_id_seq'::regclass)

I cannot figure out how to get "early binding" into the default. I.e., nextval(9000)

Since early binding is similar to the special timestamp behavior I'd say nextval is behaving just as expected - literal transform, no evaluation.  We need only document the transforms that also evaluate.

David J.

Re: Should we document how column DEFAULT expressions work?

От
Alvaro Herrera
Дата:
On 2024-Jun-25, David G. Johnston wrote:

> On Tue, Jun 25, 2024 at 9:50 PM David Rowley <dgrowleyml@gmail.com> wrote:

> > FWIW, I disagree that we need to write anything about that in this
> > part of the documentation.  I think any argument for doing this could
> > equally be applied to something like re-iterating what the operator
> > precedence rules for arithmetic are, and I don't think that should be
> > mentioned.
> 
> I disagree on this equivalence.  The time literals are clear deviations
> from expected behavior.  Knowing operator precedence rules, they apply
> everywhere equally.  And we should document the deviations directly where
> they happen.  Even if it's just a short link back to the source that
> describes the deviation.  I'm fine with something less verbose pointing
> only to the data types page, but not with nothing.

I agree that it'd be good to have _something_ -- the other stance seems
super unhelpful.  "We're not going to spend two lines to explain some
funny rules that determine surprising behavior here, because we assume
you have read all of our other 3000 pages of almost impenetrably dense
documentation" is not great from a user's point of view.  The behavior
of 'now' in DEFAULT clauses is something that has been asked about for
decades.

Arithmetic precedence is a terrible straw man argument.  Let's put that
aside.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"If it is not right, do not do it.
If it is not true, do not say it." (Marcus Aurelius, Meditations)



Re: Should we document how column DEFAULT expressions work?

От
David Rowley
Дата:
On Wed, 26 Jun 2024 at 11:05, James Coleman <jtc331@gmail.com> wrote:
> Hmm, I guess I'd never considered anything besides cases like
> nextval() and now(), but I see now that now() must also be special
> cased (when quoted) since 'date_trunc(day, now())'::timestamp doesn't
> work but 'now()'::timestamp does.

'now()'::timestamp only works because we ignore trailing punctuation
in ParseDateTime() during timestamp_in(). 'now!!'::timestamp works
equally as well.

David



Re: Should we document how column DEFAULT expressions work?

От
"David G. Johnston"
Дата:
On Tuesday, June 25, 2024, James Coleman <jtc331@gmail.com> wrote:
Hello,

It's possible I'm the only one who's been in this situation, but I've
multiple times found myself explaining to a user how column DEFAULT
expressions work: namely how the quoting on an expression following
the keyword DEFAULT controls whether or not the expression is
evaluated at the time of the DDL statement or at the time of an
insertion.

I don’t know if it’s worth documenting but the following sentence is implied by the syntax:

“Do not single quote the expression as a whole.  Write the expression as you would in a select query.”

David J.

Re: Should we document how column DEFAULT expressions work?

От
David Rowley
Дата:
On Wed, 26 Jun 2024 at 17:12, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Do we want to do anything about
> nextval()?  I guess if you hold your head at the correct
> angle, that's also a magic-input-value issue, in the sense
> that the question is when does regclass input get resolved.

I think I'm not understanding what's special about that.  Aren't
'now'::timestamp and 'seq_name'::regclass are just casts that are
evaluated during parse time in transformExpr()?

David



Re: Should we document how column DEFAULT expressions work?

От
Tom Lane
Дата:
David Rowley <dgrowleyml@gmail.com> writes:
> On Wed, 26 Jun 2024 at 17:12, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Do we want to do anything about
>> nextval()?  I guess if you hold your head at the correct
>> angle, that's also a magic-input-value issue, in the sense
>> that the question is when does regclass input get resolved.

> I think I'm not understanding what's special about that.  Aren't
> 'now'::timestamp and 'seq_name'::regclass are just casts that are
> evaluated during parse time in transformExpr()?

Right.  But there is an example in the manual explaining how
these two things act differently:

    'seq_name'::regclass
    'seq_name'::text::regclass

The latter produces a constant of type text with a run-time
cast to regclass (and hence a run-time pg_class lookup).
IIRC, we document that mainly because the latter provides a way
to duplicate nextval()'s old behavior of run-time lookup.

Now that I think about it, there's a very parallel difference in
the behavior of

    'now'::timestamp
    'now'::text::timestamp

but I doubt that that example is shown anywhere.

            regards, tom lane



Re: Should we document how column DEFAULT expressions work?

От
David Rowley
Дата:
On Wed, 26 Jun 2024 at 17:36, David G. Johnston
<david.g.johnston@gmail.com> wrote:
>
> On Tue, Jun 25, 2024 at 9:50 PM David Rowley <dgrowleyml@gmail.com> wrote:
>> FWIW, I disagree that we need to write anything about that in this
>> part of the documentation.  I think any argument for doing this could
>> equally be applied to something like re-iterating what the operator
>> precedence rules for arithmetic are, and I don't think that should be
>> mentioned.
>
>
> I disagree on this equivalence.  The time literals are clear deviations from expected behavior.  Knowing operator
precedencerules, they apply everywhere equally.  And we should document the deviations directly where they happen.
Evenif it's just a short link back to the source that describes the deviation.  I'm fine with something less verbose
pointingonly to the data types page, but not with nothing. 

Are you able to share what the special behaviour is with DEFAULT
constraints and time literals that does not apply everywhere equally?

Maybe I'm slow on the uptake, but I've yet to see anything here where
time literals act in a special way DEFAULT constraints. This is why I
couldn't understand why we should be adding documentation about this
under CREATE TABLE.

I'd be happy to reconsider or retract my argument if you can show me
what I'm missing.

David



Re: Should we document how column DEFAULT expressions work?

От
Tom Lane
Дата:
David Rowley <dgrowleyml@gmail.com> writes:
> Maybe I'm slow on the uptake, but I've yet to see anything here where
> time literals act in a special way DEFAULT constraints. This is why I
> couldn't understand why we should be adding documentation about this
> under CREATE TABLE.

It's not that the parsing rules are any different: it's that in
ordinary DML queries, it seldom matters very much whether a
subexpression is evaluated at parse time versus run time.
In CREATE TABLE that difference is very in-your-face, so people
who haven't understood the rules clearly can get burnt.

However, there are certainly other places where it matters,
such as queries in plpgsql functions.  So I understand your
reluctance to go on about it in CREATE TABLE.  At the same
time, I see where David J. is coming from.

Maybe we could have a discussion of this in some single spot,
and link to it from CREATE TABLE and other relevant places?
ISTR there is something about it in the plpgsql doco already.

            regards, tom lane



Re: Should we document how column DEFAULT expressions work?

От
David Rowley
Дата:
On Thu, 27 Jun 2024 at 12:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <dgrowleyml@gmail.com> writes:
> > Maybe I'm slow on the uptake, but I've yet to see anything here where
> > time literals act in a special way DEFAULT constraints. This is why I
> > couldn't understand why we should be adding documentation about this
> > under CREATE TABLE.
>
> It's not that the parsing rules are any different: it's that in
> ordinary DML queries, it seldom matters very much whether a
> subexpression is evaluated at parse time versus run time.
> In CREATE TABLE that difference is very in-your-face, so people
> who haven't understood the rules clearly can get burnt.

Aha, now I understand. Thanks. So, seems like CREATE TABLE is being
targeted or maybe victimised here as it's probably the most common
place people learn about their misuse of the timestamp special input
values.

> However, there are certainly other places where it matters,
> such as queries in plpgsql functions.  So I understand your
> reluctance to go on about it in CREATE TABLE.  At the same
> time, I see where David J. is coming from.
>
> Maybe we could have a discussion of this in some single spot,
> and link to it from CREATE TABLE and other relevant places?
> ISTR there is something about it in the plpgsql doco already.

For the special timestamp stuff, that place is probably the special
timestamp table in [1].  It looks like the large caution you added in
540849814 might not be enough or perhaps wasn't done soon enough to
catch the people who read that part of the manual before the caution
was added. Hard to fix if it's the latter without a time machine. :-(

I'm open to having some section that fleshes this stuff out a bit more
with a few examples with CREATE TABLE and maybe CREATE VIEW that we
can link to.  Linking seems like a much more sustainable practice than
adding special case documentation for non-special case behaviour.

David

[1] https://www.postgresql.org/docs/devel/datatype-datetime.html



Re: Should we document how column DEFAULT expressions work?

От
Peter Eisentraut
Дата:
On 27.06.24 02:34, David Rowley wrote:
> For the special timestamp stuff, that place is probably the special
> timestamp table in [1].  It looks like the large caution you added in
> 540849814 might not be enough or perhaps wasn't done soon enough to
> catch the people who read that part of the manual before the caution
> was added. Hard to fix if it's the latter without a time machine. :-(

Maybe we should really be thinking about deprecating these special 
values and steering users more urgently toward more robust alternatives.

Imagine if 'random' were a valid input value for numeric types.