Обсуждение: Should we document how column DEFAULT expressions work?
Hello, It's possible I'm the only one who's been in this situation, but I've multiple times found myself explaining to a user how column DEFAULT expressions work: namely how the quoting on an expression following the keyword DEFAULT controls whether or not the expression is evaluated at the time of the DDL statement or at the time of an insertion. In my experience this is non-obvious to users, and the quoting makes a big difference. Is this something that we should document explicitly? I don't see it called out in the CREATE TABLE reference page, but it's possible I'm missing something. Thanks, James Coleman
James Coleman <jtc331@gmail.com> writes: > It's possible I'm the only one who's been in this situation, but I've > multiple times found myself explaining to a user how column DEFAULT > expressions work: namely how the quoting on an expression following > the keyword DEFAULT controls whether or not the expression is > evaluated at the time of the DDL statement or at the time of an > insertion. Uh ... what? I recall something about that with respect to certain features such as nextval(), but you're making it sound like there is something generic going on with DEFAULT. regards, tom lane
On Tue, Jun 25, 2024 at 4:59 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > James Coleman <jtc331@gmail.com> writes: > > It's possible I'm the only one who's been in this situation, but I've > > multiple times found myself explaining to a user how column DEFAULT > > expressions work: namely how the quoting on an expression following > > the keyword DEFAULT controls whether or not the expression is > > evaluated at the time of the DDL statement or at the time of an > > insertion. > > Uh ... what? I recall something about that with respect to certain > features such as nextval(), but you're making it sound like there > is something generic going on with DEFAULT. Hmm, I guess I'd never considered anything besides cases like nextval() and now(), but I see now that now() must also be special cased (when quoted) since 'date_trunc(day, now())'::timestamp doesn't work but 'now()'::timestamp does. So I guess what I'm asking about would be limited to those cases (I assume there are a few others...but I haven't gone digging through the source yet). Regards, James Coleman
James Coleman <jtc331@gmail.com> writes: > On Tue, Jun 25, 2024 at 4:59 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Uh ... what? I recall something about that with respect to certain >> features such as nextval(), but you're making it sound like there >> is something generic going on with DEFAULT. > Hmm, I guess I'd never considered anything besides cases like > nextval() and now(), but I see now that now() must also be special > cased (when quoted) since 'date_trunc(day, now())'::timestamp doesn't > work but 'now()'::timestamp does. Hmm, both of those behaviors are documented, but not in the same place and possibly not anywhere near where you looked for info about DEFAULT. For instance, the Tip at the bottom of section 9.9.5 https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-CURRENT explains about how 'now'::timestamp isn't what to use in DEFAULT. regards, tom lane
On Tue, Jun 25, 2024 at 4:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
James Coleman <jtc331@gmail.com> writes:
> On Tue, Jun 25, 2024 at 4:59 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Uh ... what? I recall something about that with respect to certain
>> features such as nextval(), but you're making it sound like there
>> is something generic going on with DEFAULT.
> Hmm, I guess I'd never considered anything besides cases like
> nextval() and now(), but I see now that now() must also be special
> cased (when quoted) since 'date_trunc(day, now())'::timestamp doesn't
> work but 'now()'::timestamp does.
Hmm, both of those behaviors are documented, but not in the same place
and possibly not anywhere near where you looked for info about
DEFAULT. For instance, the Tip at the bottom of section 9.9.5
https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-CURRENT
explains about how 'now'::timestamp isn't what to use in DEFAULT.
I'd suggest adding to:
DEFAULT default_expr
The DEFAULT clause assigns a default data value for the column whose column definition it appears within. The value is any variable-free expression (in particular, cross-references to other columns in the current table are not allowed). Subqueries are not allowed either. The data type of the default expression must match the data type of the column.
The default expression will be used in any insert operation that does not specify a value for the column. If there is no default for a column, then the default is null.
The DEFAULT clause assigns a default data value for the column whose column definition it appears within. The value is any variable-free expression (in particular, cross-references to other columns in the current table are not allowed). Subqueries are not allowed either. The data type of the default expression must match the data type of the column.
The default expression will be used in any insert operation that does not specify a value for the column. If there is no default for a column, then the default is null.
+ Be aware that the [special timestamp values 1] are resolved immediately, not upon insert. Use the [date/time constructor functions 2] to produce a time relative to the future insertion.
David J.
On Wed, 26 Jun 2024 at 13:31, David G. Johnston <david.g.johnston@gmail.com> wrote: > I'd suggest adding to: > > DEFAULT default_expr > The DEFAULT clause assigns a default data value for the column whose column definition it appears within. The value isany variable-free expression (in particular, cross-references to other columns in the current table are not allowed). Subqueriesare not allowed either. The data type of the default expression must match the data type of the column. > > The default expression will be used in any insert operation that does not specify a value for the column. If there is nodefault for a column, then the default is null. > > + Be aware that the [special timestamp values 1] are resolved immediately, not upon insert. Use the [date/time constructorfunctions 2] to produce a time relative to the future insertion. FWIW, I disagree that we need to write anything about that in this part of the documentation. I think any argument for doing this could equally be applied to something like re-iterating what the operator precedence rules for arithmetic are, and I don't think that should be mentioned. Also, what about all the other places where someone could use one of the special timestamp input values? Should CREATE VIEW get a memo too? How about PREPARE? If people don't properly understand these special timestamp input values, then maybe the documentation in [1] needs to be improved. At the moment the details are within parentheses. Namely "(In particular, now and related strings are converted to a specific time value as soon as they are read.)". Maybe it would be better to be more explicit there and mention that these are special values that the input function understands which are translated to actual timestamp values when the type's input function is called. That could maybe be tied into the DEFAULT clause documentation to mention that the input function for constant values is called at DML time rather than DDL time. That way, we're not adding these (unsustainable) special cases to the documentation. David [1] https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-DATETIME-SPECIAL-VALUES
David Rowley <dgrowleyml@gmail.com> writes: > If people don't properly understand these special timestamp input > values, then maybe the documentation in [1] needs to be improved. At > the moment the details are within parentheses. Namely "(In particular, > now and related strings are converted to a specific time value as soon > as they are read.)". Maybe it would be better to be more explicit > there and mention that these are special values that the input > function understands which are translated to actual timestamp values > when the type's input function is called. That could maybe be tied > into the DEFAULT clause documentation to mention that the input > function for constant values is called at DML time rather than DDL > time. That way, we're not adding these (unsustainable) special cases > to the documentation. This sounds like a reasonable approach to me for the magic-input-values issue. Do we want to do anything about nextval()? I guess if you hold your head at the correct angle, that's also a magic-input-value issue, in the sense that the question is when does regclass input get resolved. regards, tom lane
On Tue, Jun 25, 2024 at 9:50 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Wed, 26 Jun 2024 at 13:31, David G. Johnston
<david.g.johnston@gmail.com> wrote:
> I'd suggest adding to:
>
> DEFAULT default_expr
> The DEFAULT clause assigns a default data value for the column whose column definition it appears within. The value is any variable-free expression (in particular, cross-references to other columns in the current table are not allowed). Subqueries are not allowed either. The data type of the default expression must match the data type of the column.
>
> The default expression will be used in any insert operation that does not specify a value for the column. If there is no default for a column, then the default is null.
>
> + Be aware that the [special timestamp values 1] are resolved immediately, not upon insert. Use the [date/time constructor functions 2] to produce a time relative to the future insertion.
Annoyingly even this advice isn't correct:
postgres=# create table tdts2 (ts timestamptz default 'now()');
CREATE TABLE
postgres=# \d tdts2
Table "public.tdts2"
Column | Type | Collation | Nullable | Default
--------+--------------------------+-----------+----------+-------------------------------------------
----------------
ts | timestamp with time zone | | | '2024-06-25 18:05:33.055377-07'::timestamp
with time zone
CREATE TABLE
postgres=# \d tdts2
Table "public.tdts2"
Column | Type | Collation | Nullable | Default
--------+--------------------------+-----------+----------+-------------------------------------------
----------------
ts | timestamp with time zone | | | '2024-06-25 18:05:33.055377-07'::timestamp
with time zone
I expected writing what looked like the function now() to be delayed evaluated but since I put it into quotes, the OPs complaint, it got read as the literal with ignored extra bits.
FWIW, I disagree that we need to write anything about that in this
part of the documentation. I think any argument for doing this could
equally be applied to something like re-iterating what the operator
precedence rules for arithmetic are, and I don't think that should be
mentioned.
I disagree on this equivalence. The time literals are clear deviations from expected behavior. Knowing operator precedence rules, they apply everywhere equally. And we should document the deviations directly where they happen. Even if it's just a short link back to the source that describes the deviation. I'm fine with something less verbose pointing only to the data types page, but not with nothing.
Also, what about all the other places where someone could
use one of the special timestamp input values? Should CREATE VIEW get
a memo too? How about PREPARE?
Yes.
If people don't properly understand these special timestamp input
values, then maybe the documentation in [1] needs to be improved.
Recall, and awareness, is the greater issue, not comprehension. This intends to increase the former. I don't believe the latter is an issue, though I haven't deep dived into it.
And the whole type casting happening right away just seems misleading.
postgres=# create table testboold2 (expr boolean default boolean 'false');
CREATE TABLE
postgres=# \d testboold2
Table "public.testboold2"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
expr | boolean | | | false
CREATE TABLE
postgres=# \d testboold2
Table "public.testboold2"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
expr | boolean | | | false
I would expect 'f' in the default column if the boolean casting of the literal happened sooner. Or I'd expect to see "boolean 'false'" as the default expression if it is captured as-is.
So yes, saving an expression into the default column has nuances that should be documented where default is defined.
Maybe the wording needs to be:
"If the default expression contains any constants [1] they are converted into their typed value during create table execution. Thus time constants [1] save into the default expression the time the command was executed."
I'd be happy to be pointed to other constants that resolve to an execution-time specific environment in a similar manner. If there is another one I'll rethink the wisdom of trying to document all of them in each place. But reminding people that time is special and we have these special values seems to provide meaningful reader benefit for the cost of a couple of sentences repeated in a few places. That were valid a decade ago no more or less than they are valid now.
David J.
On Tue, Jun 25, 2024 at 10:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <dgrowleyml@gmail.com> writes:
> If people don't properly understand these special timestamp input
> values, then maybe the documentation in [1] needs to be improved. At
> the moment the details are within parentheses. Namely "(In particular,
> now and related strings are converted to a specific time value as soon
> as they are read.)". Maybe it would be better to be more explicit
> there and mention that these are special values that the input
> function understands which are translated to actual timestamp values
> when the type's input function is called. That could maybe be tied
> into the DEFAULT clause documentation to mention that the input
> function for constant values is called at DML time rather than DDL
> time. That way, we're not adding these (unsustainable) special cases
> to the documentation.
This sounds like a reasonable approach to me for the
magic-input-values issue. Do we want to do anything about
nextval()? I guess if you hold your head at the correct
angle, that's also a magic-input-value issue, in the sense
that the question is when does regclass input get resolved.
From observations we transform constants into the: " 'value'::type " syntax which then makes it an operator resolved at execution time. For every type except time types the transformation leaves the constant as-is. The special time values are the exception whereby they get evaluated to a specific time during the transformation.
postgres=# create table tser3 (id integer not null default nextval(regclass 'tser2_id_seq'));
CREATE TABLEpostgres=# \d tser3
Table "public.tser3"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+-----------------------------------
id | integer | | not null | nextval('tser2_id_seq'::regclass)
I cannot figure out how to get "early binding" into the default. I.e., nextval(9000)
Since early binding is similar to the special timestamp behavior I'd say nextval is behaving just as expected - literal transform, no evaluation. We need only document the transforms that also evaluate.
David J.
On 2024-Jun-25, David G. Johnston wrote: > On Tue, Jun 25, 2024 at 9:50 PM David Rowley <dgrowleyml@gmail.com> wrote: > > FWIW, I disagree that we need to write anything about that in this > > part of the documentation. I think any argument for doing this could > > equally be applied to something like re-iterating what the operator > > precedence rules for arithmetic are, and I don't think that should be > > mentioned. > > I disagree on this equivalence. The time literals are clear deviations > from expected behavior. Knowing operator precedence rules, they apply > everywhere equally. And we should document the deviations directly where > they happen. Even if it's just a short link back to the source that > describes the deviation. I'm fine with something less verbose pointing > only to the data types page, but not with nothing. I agree that it'd be good to have _something_ -- the other stance seems super unhelpful. "We're not going to spend two lines to explain some funny rules that determine surprising behavior here, because we assume you have read all of our other 3000 pages of almost impenetrably dense documentation" is not great from a user's point of view. The behavior of 'now' in DEFAULT clauses is something that has been asked about for decades. Arithmetic precedence is a terrible straw man argument. Let's put that aside. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ "If it is not right, do not do it. If it is not true, do not say it." (Marcus Aurelius, Meditations)
On Wed, 26 Jun 2024 at 11:05, James Coleman <jtc331@gmail.com> wrote: > Hmm, I guess I'd never considered anything besides cases like > nextval() and now(), but I see now that now() must also be special > cased (when quoted) since 'date_trunc(day, now())'::timestamp doesn't > work but 'now()'::timestamp does. 'now()'::timestamp only works because we ignore trailing punctuation in ParseDateTime() during timestamp_in(). 'now!!'::timestamp works equally as well. David
On Tuesday, June 25, 2024, James Coleman <jtc331@gmail.com> wrote:
Hello,
It's possible I'm the only one who's been in this situation, but I've
multiple times found myself explaining to a user how column DEFAULT
expressions work: namely how the quoting on an expression following
the keyword DEFAULT controls whether or not the expression is
evaluated at the time of the DDL statement or at the time of an
insertion.
I don’t know if it’s worth documenting but the following sentence is implied by the syntax:
“Do not single quote the expression as a whole. Write the expression as you would in a select query.”
David J.
On Wed, 26 Jun 2024 at 17:12, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Do we want to do anything about > nextval()? I guess if you hold your head at the correct > angle, that's also a magic-input-value issue, in the sense > that the question is when does regclass input get resolved. I think I'm not understanding what's special about that. Aren't 'now'::timestamp and 'seq_name'::regclass are just casts that are evaluated during parse time in transformExpr()? David
David Rowley <dgrowleyml@gmail.com> writes: > On Wed, 26 Jun 2024 at 17:12, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Do we want to do anything about >> nextval()? I guess if you hold your head at the correct >> angle, that's also a magic-input-value issue, in the sense >> that the question is when does regclass input get resolved. > I think I'm not understanding what's special about that. Aren't > 'now'::timestamp and 'seq_name'::regclass are just casts that are > evaluated during parse time in transformExpr()? Right. But there is an example in the manual explaining how these two things act differently: 'seq_name'::regclass 'seq_name'::text::regclass The latter produces a constant of type text with a run-time cast to regclass (and hence a run-time pg_class lookup). IIRC, we document that mainly because the latter provides a way to duplicate nextval()'s old behavior of run-time lookup. Now that I think about it, there's a very parallel difference in the behavior of 'now'::timestamp 'now'::text::timestamp but I doubt that that example is shown anywhere. regards, tom lane
On Wed, 26 Jun 2024 at 17:36, David G. Johnston <david.g.johnston@gmail.com> wrote: > > On Tue, Jun 25, 2024 at 9:50 PM David Rowley <dgrowleyml@gmail.com> wrote: >> FWIW, I disagree that we need to write anything about that in this >> part of the documentation. I think any argument for doing this could >> equally be applied to something like re-iterating what the operator >> precedence rules for arithmetic are, and I don't think that should be >> mentioned. > > > I disagree on this equivalence. The time literals are clear deviations from expected behavior. Knowing operator precedencerules, they apply everywhere equally. And we should document the deviations directly where they happen. Evenif it's just a short link back to the source that describes the deviation. I'm fine with something less verbose pointingonly to the data types page, but not with nothing. Are you able to share what the special behaviour is with DEFAULT constraints and time literals that does not apply everywhere equally? Maybe I'm slow on the uptake, but I've yet to see anything here where time literals act in a special way DEFAULT constraints. This is why I couldn't understand why we should be adding documentation about this under CREATE TABLE. I'd be happy to reconsider or retract my argument if you can show me what I'm missing. David
David Rowley <dgrowleyml@gmail.com> writes: > Maybe I'm slow on the uptake, but I've yet to see anything here where > time literals act in a special way DEFAULT constraints. This is why I > couldn't understand why we should be adding documentation about this > under CREATE TABLE. It's not that the parsing rules are any different: it's that in ordinary DML queries, it seldom matters very much whether a subexpression is evaluated at parse time versus run time. In CREATE TABLE that difference is very in-your-face, so people who haven't understood the rules clearly can get burnt. However, there are certainly other places where it matters, such as queries in plpgsql functions. So I understand your reluctance to go on about it in CREATE TABLE. At the same time, I see where David J. is coming from. Maybe we could have a discussion of this in some single spot, and link to it from CREATE TABLE and other relevant places? ISTR there is something about it in the plpgsql doco already. regards, tom lane
On Thu, 27 Jun 2024 at 12:11, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > David Rowley <dgrowleyml@gmail.com> writes: > > Maybe I'm slow on the uptake, but I've yet to see anything here where > > time literals act in a special way DEFAULT constraints. This is why I > > couldn't understand why we should be adding documentation about this > > under CREATE TABLE. > > It's not that the parsing rules are any different: it's that in > ordinary DML queries, it seldom matters very much whether a > subexpression is evaluated at parse time versus run time. > In CREATE TABLE that difference is very in-your-face, so people > who haven't understood the rules clearly can get burnt. Aha, now I understand. Thanks. So, seems like CREATE TABLE is being targeted or maybe victimised here as it's probably the most common place people learn about their misuse of the timestamp special input values. > However, there are certainly other places where it matters, > such as queries in plpgsql functions. So I understand your > reluctance to go on about it in CREATE TABLE. At the same > time, I see where David J. is coming from. > > Maybe we could have a discussion of this in some single spot, > and link to it from CREATE TABLE and other relevant places? > ISTR there is something about it in the plpgsql doco already. For the special timestamp stuff, that place is probably the special timestamp table in [1]. It looks like the large caution you added in 540849814 might not be enough or perhaps wasn't done soon enough to catch the people who read that part of the manual before the caution was added. Hard to fix if it's the latter without a time machine. :-( I'm open to having some section that fleshes this stuff out a bit more with a few examples with CREATE TABLE and maybe CREATE VIEW that we can link to. Linking seems like a much more sustainable practice than adding special case documentation for non-special case behaviour. David [1] https://www.postgresql.org/docs/devel/datatype-datetime.html
On 27.06.24 02:34, David Rowley wrote: > For the special timestamp stuff, that place is probably the special > timestamp table in [1]. It looks like the large caution you added in > 540849814 might not be enough or perhaps wasn't done soon enough to > catch the people who read that part of the manual before the caution > was added. Hard to fix if it's the latter without a time machine. :-( Maybe we should really be thinking about deprecating these special values and steering users more urgently toward more robust alternatives. Imagine if 'random' were a valid input value for numeric types.