Обсуждение: Show expression of virtual columns in error messages
Hi, When a constraint violation occurs on a table with virtual generated columns, the "Failing row contains" error message currently displays the literal string "virtual" as a placeholder: CREATE TABLE t (a int, b int, c int GENERATED ALWAYS AS (a * 2) VIRTUAL CHECK (c < 10)); INSERT INTO t VALUES (5, 10); ERROR: new row for relation "t" violates check constraint "t_c_check" DETAIL: Failing row contains (5, 10, virtual). This isn't very helpful for debugging, especially when the user needs to understand why the check constraint failed or if multiple virtual generated columns are involved on the original statement. The attached patch changes this behavior to show the virtual column expression instead: ERROR: new row for relation "t" violates check constraint "t_c_check" DETAIL: Failing row contains (5, 10, a * 2). An alternative approach would be to compute and display the actual value of the virtual column (e.g., "10" instead of "a * 2"). However, I chose to show the expression because: 1. It's simpler to implement (no need to evaluate the expression) 2. It shows the user *how* the value is computed, which may be more useful for understanding why a constraint failed 3. It avoids potential issues with expression evaluation in error paths Thoughts? -- Matheus Alcantara EDB: https://www.enterprisedb.com
Вложения
On 03.02.26 15:06, Matheus Alcantara wrote: > The attached patch changes this behavior to show the virtual column > expression instead: > > ERROR: new row for relation "t" violates check constraint "t_c_check" > DETAIL: Failing row contains (5, 10, a * 2). Could be useful, but in this context you don't know which column is "a"?
On Wed Feb 4, 2026 at 11:57 AM -03, Peter Eisentraut wrote:
> On 03.02.26 15:06, Matheus Alcantara wrote:
>> The attached patch changes this behavior to show the virtual column
>> expression instead:
>>
>> ERROR: new row for relation "t" violates check constraint "t_c_check"
>> DETAIL: Failing row contains (5, 10, a * 2).
>
> Could be useful, but in this context you don't know which column is "a"?
I agree but we at least show more information to the user. The user may
still need to check the table definition to understand the constraint or
to get the column ordering of the table to understand what the value of
"a" is being used to compute the expression.
If the statement declare the columns that the values are referecing this
could be easier to know (e.g INSERT INTO t(a, b) VALUES (5, 10)) but the
user may still need to check the table definition to understand the
constraint.
I see this patch as a way to at least give more information to user
about the error.
Another possibility would be to get the actual values of "a" for example
and show it on the error message, e.g:
ERROR: new row for relation "t" violates check constraint "t_c_check"
DETAIL: Failing row contains (5, 10, 5 * 2).
--
Matheus Alcantara
EDB: https://www.enterprisedb.com
On Wed, 04 Feb 2026 14:54:11 -0300 "Matheus Alcantara" <matheusssilv97@gmail.com> wrote: > On Wed Feb 4, 2026 at 11:57 AM -03, Peter Eisentraut wrote: > > On 03.02.26 15:06, Matheus Alcantara wrote: > >> The attached patch changes this behavior to show the virtual column > >> expression instead: > >> > >> ERROR: new row for relation "t" violates check constraint "t_c_check" > >> DETAIL: Failing row contains (5, 10, a * 2). > > > > Could be useful, but in this context you don't know which column is "a"? > > I agree but we at least show more information to the user. The user may > still need to check the table definition to understand the constraint or > to get the column ordering of the table to understand what the value of > "a" is being used to compute the expression. > > If the statement declare the columns that the values are referecing this > could be easier to know (e.g INSERT INTO t(a, b) VALUES (5, 10)) but the > user may still need to check the table definition to understand the > constraint. > > I see this patch as a way to at least give more information to user > about the error. > > Another possibility would be to get the actual values of "a" for example > and show it on the error message, e.g: > > ERROR: new row for relation "t" violates check constraint "t_c_check" > DETAIL: Failing row contains (5, 10, 5 * 2). That would indeed be more useful. One way to achieve this might be to modify deparse_context and get_variable() so that a Var is displayed as its actual value. Another possibility would be to include column names in the DETAIL message, for example: ERROR: new row for relation "t" violates check constraint "t_c_check" DETAIL: Failing row contains (a, b, c)=(5, 10, a * 2). Although this would change the existing message format, including column names could generally provide users with more information about the error. Regards, Yugo Nagata -- Yugo Nagata <nagata@sraoss.co.jp>
On Wed Feb 4, 2026 at 11:45 PM -03, Yugo Nagata wrote: >> Another possibility would be to get the actual values of "a" for example >> and show it on the error message, e.g: >> >> ERROR: new row for relation "t" violates check constraint "t_c_check" >> DETAIL: Failing row contains (5, 10, 5 * 2). > > That would indeed be more useful. One way to achieve this might be to > modify deparse_context and get_variable() so that a Var is displayed as its > actual value. > I'm not sure if I understand how modifying deparse_context_for() could help on this. What I did was to use the expression_tree_mutator API to mutate the virtual column expression to replace any Var reference with the value into the TupleTableSlot. Please see the attached v2 version. > Another possibility would be to include column names in the DETAIL message, > for example: > > ERROR: new row for relation "t" violates check constraint "t_c_check" > DETAIL: Failing row contains (a, b, c)=(5, 10, a * 2). > > Although this would change the existing message format, including column > names could generally provide users with more information about the error. > I think that this could make the error output too verbose when there is a lot of columns involved on the statement. -- Matheus Alcantara EDB: https://www.enterprisedb.com
Вложения
On Thu, 05 Feb 2026 15:51:54 -0300 "Matheus Alcantara" <matheusssilv97@gmail.com> wrote: > On Wed Feb 4, 2026 at 11:45 PM -03, Yugo Nagata wrote: > >> Another possibility would be to get the actual values of "a" for example > >> and show it on the error message, e.g: > >> > >> ERROR: new row for relation "t" violates check constraint "t_c_check" > >> DETAIL: Failing row contains (5, 10, 5 * 2). > > > > That would indeed be more useful. One way to achieve this might be to > > modify deparse_context and get_variable() so that a Var is displayed as its > > actual value. > > > I'm not sure if I understand how modifying deparse_context_for() could > help on this. > > What I did was to use the expression_tree_mutator API to mutate the > virtual column expression to replace any Var reference with the value > into the TupleTableSlot. Please see the attached v2 version. I initially thought about having deparse_context_for() directly output actual values for Var references, but that does not seem like a good approach. Replacing Vars with Consts beforehand using expression_tree_mutator, as you did, looks like a better solution. One thing I noticed is that some expressions in virtual columns (e.g., NULL and negative integers) differ from those in normal columns, for example in the following (null vs. NULL): +DETAIL: Failing row contains (null, NULLIF(NULL::integer, 0)). However, this seems acceptable. Overall, this patch looks good to me. Regards, Yugo Nagata > > Another possibility would be to include column names in the DETAIL message, > > for example: > > > > ERROR: new row for relation "t" violates check constraint "t_c_check" > > DETAIL: Failing row contains (a, b, c)=(5, 10, a * 2). > > > > Although this would change the existing message format, including column > > names could generally provide users with more information about the error. > > > I think that this could make the error output too verbose when there is > a lot of columns involved on the statement. > > -- > Matheus Alcantara > EDB: https://www.enterprisedb.com > -- Yugo Nagata <nagata@sraoss.co.jp>
On Fri Feb 6, 2026 at 12:54 AM -03, Yugo Nagata wrote: >> >> Another possibility would be to get the actual values of "a" for example >> >> and show it on the error message, e.g: >> >> >> >> ERROR: new row for relation "t" violates check constraint "t_c_check" >> >> DETAIL: Failing row contains (5, 10, 5 * 2). >> > >> > That would indeed be more useful. One way to achieve this might be to >> > modify deparse_context and get_variable() so that a Var is displayed as its >> > actual value. >> > >> I'm not sure if I understand how modifying deparse_context_for() could >> help on this. >> >> What I did was to use the expression_tree_mutator API to mutate the >> virtual column expression to replace any Var reference with the value >> into the TupleTableSlot. Please see the attached v2 version. > > I initially thought about having deparse_context_for() directly output > actual values for Var references, but that does not seem like a good > approach. Replacing Vars with Consts beforehand using > expression_tree_mutator, as you did, looks like a better solution. > > One thing I noticed is that some expressions in virtual columns (e.g., > NULL and negative integers) differ from those in normal columns, for > example in the following (null vs. NULL): > > +DETAIL: Failing row contains (null, NULLIF(NULL::integer, 0)). > > However, this seems acceptable. > Yeah, after parsing we don't have the original string representation of the parse node, so at deparse_expression() it has it's own style to deparse nodes. I don't think that it's an issue and I also think that it's accepttable. > Overall, this patch looks good to me. > Thank you for reviewing the patch! -- Matheus Alcantara EDB: https://www.enterprisedb.com
Hi, Attached rebased v3 due to f80bedd52b1. No additional changes compared to v2. -- Matheus Alcantara EDB: https://www.enterprisedb.com
Вложения
Matheus Alcantara <matheusssilv97@gmail.com> writes:
> Attached rebased v3 due to f80bedd52b1. No additional changes compared
> to v2.
I looked at this and frankly I think it's going in the wrong
direction. I agree that showing "virtual" is unhelpful, but
I don't like this approach for a couple of reasons:
1. Inserting a column expression could easily make the error
message annoyingly long.
2. Adding this much complexity in an error code path doesn't
seem like a good idea. Such paths are typically not well
tested, and if there is any bug lurking, it will result in
an error message quite removed from the user's problem.
3. It's making virtual generated columns behave (even more)
differently from stored generated columns. I think the
general plan has been to make them act as alike as possible.
So what would comport better with the behavior of stored columns
is to show the expression's value. I agree with you that
calculating that in the error path is a no-go, but haven't we
computed it earlier? Or could we do so, if there are constraints
to be checked?
regards, tom lane
On Thu Feb 26, 2026 at 3:47 PM -03, Tom Lane wrote: > Matheus Alcantara <matheusssilv97@gmail.com> writes: >> Attached rebased v3 due to f80bedd52b1. No additional changes compared >> to v2. > > I looked at this and frankly I think it's going in the wrong > direction. I agree that showing "virtual" is unhelpful, but > I don't like this approach for a couple of reasons: > > 1. Inserting a column expression could easily make the error > message annoyingly long. > > 2. Adding this much complexity in an error code path doesn't > seem like a good idea. Such paths are typically not well > tested, and if there is any bug lurking, it will result in > an error message quite removed from the user's problem. > > 3. It's making virtual generated columns behave (even more) > differently from stored generated columns. I think the > general plan has been to make them act as alike as possible. > > So what would comport better with the behavior of stored columns > is to show the expression's value. I agree with you that > calculating that in the error path is a no-go, but haven't we > computed it earlier? Or could we do so, if there are constraints > to be checked? Thanks for the feedback. After investigating the code a bit, I found that IIUC virtual column values are actually never computed and stored separately, they're computed by expanding the expression wherever the column is referenced. According to my understanding, here's how it works during constraint checking in ExecRelCheck(): 1. The constraint expression (e.g., c > 10) is loaded from the catalog 2. Virtual column references are expanded via expand_generated_columns_in_expr(), transforming it to (a + b) > 10 3. ExecCheck() evaluates the entire expanded expression, reading base column values (a, b) directly from the slot and returns the if the check was true or false. The virtual column value (a + b) is computed as an intermediate result during expression evaluation of a check constraint but is never stored anywhere accessible. So, if all of this make sense and it's correct it seems to me that we indeed need to compute the virtual expression value on error path to show on error message although it would not be desirable, or I'm missing something? -- Matheus Alcantara EDB: https://www.enterprisedb.com
"Matheus Alcantara" <matheusssilv97@gmail.com> writes:
> On Thu Feb 26, 2026 at 3:47 PM -03, Tom Lane wrote:
>> 3. It's making virtual generated columns behave (even more)
>> differently from stored generated columns. I think the
>> general plan has been to make them act as alike as possible.
>>
>> So what would comport better with the behavior of stored columns
>> is to show the expression's value. I agree with you that
>> calculating that in the error path is a no-go, but haven't we
>> computed it earlier? Or could we do so, if there are constraints
>> to be checked?
> Thanks for the feedback. After investigating the code a bit, I found
> that IIUC virtual column values are actually never computed and stored
> separately, they're computed by expanding the expression wherever the
> column is referenced.
Correct: rather than storing them, we recalculate the expression
whenever the column's value is demanded. But what I'm suggesting is
that we ought to calculate the value during INSERT/UPDATE as well,
even though it will not get written to disk. It'd be useful to do
that for the purposes of this error message. But I think we ought to
do it even when there are no constraints, because that ensures that
it's *possible* to calculate the value, and that there are not for
instance overflow problems. As things stand, it's possible to create
a row that cannot be fetched:
regression=# create table foo (f1 int, f2 int generated always as (f1 * 1000000));
CREATE TABLE
regression=# insert into foo values(1);
INSERT 0 1
regression=# table foo;
f1 | f2
----+---------
1 | 1000000
(1 row)
regression=# insert into foo values(1000000);
INSERT 0 1
regression=# table foo;
ERROR: integer out of range
That may or may not be per spec, but I submit that it's extremely
unhelpful and unfriendly, as well as being an undesirable discrepancy
from the behavior of a stored generated column.
Of course, calculating the value at insertion time won't completely
prevent such problems: if the expression is less immutable than
advertised, we could still fail at readout time. But that's pretty
clearly a case of user misfeasance. Index expressions that aren't
really immutable cause worse problems than this, so I don't have a
problem with saying "you broke it, you get to keep both pieces".
Perhaps someone will argue that their expression is too expensive for
it to be okay to calculate at insertion. But if it's that expensive,
why in the world did they make the column virtual rather than stored?
In short, I think we ought to compute these values during
INSERT/UPDATE, even though we're not going to write them to disk.
And then they'd be available to display in this error message.
regards, tom lane