Обсуждение: misleading error message in ProcessUtilitySlow T_CreateStatsStmt
hi.
while reviewing other work, some error messages in src/backend/tcop/utility.c
seem not accurate.
static void
ProcessUtilitySlow(ParseState *pstate,
                   PlannedStmt *pstmt,
                   const char *queryString,
                   ProcessUtilityContext context,
                   ParamListInfo params,
                   QueryEnvironment *queryEnv,
                   DestReceiver *dest,
                   QueryCompletion *qc)
            case T_CreateStatsStmt:
                {
                    Oid            relid;
                    CreateStatsStmt *stmt = (CreateStatsStmt *) parsetree;
                    RangeVar   *rel = (RangeVar *) linitial(stmt->relations);
                    if (!IsA(rel, RangeVar))
                        ereport(ERROR,
                                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                                 errmsg("only a single relation is
allowed in CREATE STATISTICS")));
                    relid = RangeVarGetRelid(rel,
ShareUpdateExclusiveLock, false);
                    /* Run parse analysis ... */
                    stmt = transformStatsStmt(relid, stmt, queryString);
                    address = CreateStatistics(stmt);
                }
for example:
create or replace function tftest(int) returns table(a int, b int) as $$
begin
  return query select $1, $1+i from generate_series(1,5) g(i);
end;
$$ language plpgsql immutable strict;
CREATE STATISTICS alt_stat2 ON a, b FROM tftest(1);
ERROR:  only a single relation is allowed in CREATE STATISTICS
this error message seem misleading?
also this error check seems duplicated in CreateStatistics?
			
		On Thu, 21 Aug 2025 at 17:00, jian he <jian.universality@gmail.com> wrote: > > hi. Hi! > RangeVar *rel = (RangeVar *) linitial(stmt->relations); > if (!IsA(rel, RangeVar)) These two lines are weird. Looks like linitial(stmt->relations) should be assigned to variable with type Node* first? > > for example: > > create or replace function tftest(int) returns table(a int, b int) as $$ > begin > return query select $1, $1+i from generate_series(1,5) g(i); > end; > $$ language plpgsql immutable strict; > > CREATE STATISTICS alt_stat2 ON a, b FROM tftest(1); > ERROR: only a single relation is allowed in CREATE STATISTICS > > this error message seem misleading? I wouldn’t say this is misleading, but " a single relation" is indeed not precise enough. IMO we need a more precise term to distinguish regular relation and table func. > also this error check seems duplicated in CreateStatistics? > Indeed. -- Best regards, Kirill Reshke
Kirill Reshke <reshkekirill@gmail.com> writes:
> On Thu, 21 Aug 2025 at 17:00, jian he <jian.universality@gmail.com> wrote:
>> RangeVar   *rel = (RangeVar *) linitial(stmt->relations);
>> if (!IsA(rel, RangeVar))
> These two lines are weird. Looks like  linitial(stmt->relations)
> should be assigned to variable with type Node* first?
We take that sort of shortcut in many places.  If there's not any need
for the code to deal with more than one node type, an extra variable
that's used just for the IsA test seems like a lot of notational
overhead.
            regards, tom lane
			
		On 2025-Aug-21, Kirill Reshke wrote: > I wouldn’t say this is misleading, but " a single relation" is indeed > not precise enough. IMO we need a more precise term to distinguish > regular relation and table func. I'm not sure. See the definition of relation in the glossary: https://www.postgresql.org/docs/18/glossary.html#GLOSSARY-RELATION The generic term for all objects in a database that have a name and a list of attributes defined in a specific order. Tables, sequences, views, foreign tables, materialized views, composite types, and indexes are all relations. More generically, a relation is a set of tuples; for example, the result of a query is also a relation. In PostgreSQL, Class is an archaic synonym for relation. (I wonder why this says "generically" rather than "generally". Is that word choice a mistake?) Maybe in the "For example" clause we can also mention table functions. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
On Fri, 22 Aug 2025 at 14:46, Álvaro Herrera <alvherre@kurilemu.de> wrote: > > On 2025-Aug-21, Kirill Reshke wrote: > > > I wouldn’t say this is misleading, but " a single relation" is indeed > > not precise enough. IMO we need a more precise term to distinguish > > regular relation and table func. > > I'm not sure. See the definition of relation in the glossary: > https://www.postgresql.org/docs/18/glossary.html#GLOSSARY-RELATION > > The generic term for all objects in a database that have a name and a > list of attributes defined in a specific order. Tables, sequences, > views, foreign tables, materialized views, composite types, and > indexes are all relations. > > More generically, a relation is a set of tuples; for example, the > result of a query is also a relation. > > In PostgreSQL, Class is an archaic synonym for relation. > > (I wonder why this says "generically" rather than "generally". Is that > word choice a mistake?) Maybe in the "For example" clause we can also > mention table functions. > > -- > Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ I am sorry: I am not following. CREATE STATISTIC will work only for single HEAP (or other AM) relations. So, for "simple regular tables" as one can say (not tablefunc). You say: relation is a term for both HEAP relation, tablefunc relation and much more. I say: "a single relation" in the error message is not precise enough. Where do we disagree? Anyway, I would say correct error message here should be: ``` db=# CREATE STATISTICS alt_stat2 ON a, b FROM tftest(1); ERROR: cannot define statistics for relation "alt_stat2" DETAIL: This operation is not supported for query result. ``` -- Best regards, Kirill Reshke
hi.
will
+               if (!IsA(rln, RangeVar))
+                       ereport(ERROR,
+                                       errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                                       errmsg("CREATE STATISTICS only
supports regular tables, materialized views, foreign tables, and
partitioned tables"));
solve the problem?
			
		On Fri, Aug 22, 2025 at 6:46 PM Álvaro Herrera <alvherre@kurilemu.de> wrote: > I'm not sure. See the definition of relation in the glossary: > https://www.postgresql.org/docs/18/glossary.html#GLOSSARY-RELATION > > The generic term for all objects in a database that have a name and a > list of attributes defined in a specific order. Tables, sequences, > views, foreign tables, materialized views, composite types, and > indexes are all relations. > > More generically, a relation is a set of tuples; for example, the > result of a query is also a relation. > > In PostgreSQL, Class is an archaic synonym for relation. > > (I wonder why this says "generically" rather than "generally". Is that > word choice a mistake?) "More generally" feels more natural to me than "more generically" in this sentence, but I'm not a native English speaker, so I could be wrong. Thanks Richard
=?utf-8?Q?=C3=81lvaro?= Herrera <alvherre@kurilemu.de> writes: > I'm not sure. See the definition of relation in the glossary: > https://www.postgresql.org/docs/18/glossary.html#GLOSSARY-RELATION > The generic term for all objects in a database that have a name and a > list of attributes defined in a specific order. Tables, sequences, > views, foreign tables, materialized views, composite types, and > indexes are all relations. > More generically, a relation is a set of tuples; for example, the > result of a query is also a relation. > In PostgreSQL, Class is an archaic synonym for relation. > (I wonder why this says "generically" rather than "generally". Is that > word choice a mistake?) Maybe in the "For example" clause we can also > mention table functions. Yeah, I think s/generically/generally/ would be an improvement. I'm not certain, but I think that our use of "relation" to mean "an object with a pg_class entry" is a Postgres-ism. I do know that the meaning of "a set of tuples" is widely used, as that's where the term "relational database" comes from. Maybe whoever wrote this was trying to get at that point? But this text is hardly clear about that. regards, tom lane
On 2025-Aug-22, Kirill Reshke wrote: > I am sorry: I am not following. CREATE STATISTIC will work only for > single HEAP (or other AM) relations. So, for "simple regular tables" > as one can say (not tablefunc). Ah yeah, you're right, I think we should say "a single table", or maybe "a single table or materialized view" if the latter case is supported (I don't remember offhand if it is). The error message is mostly concerned with rejecting the case of multiple relations being given in the FROM clause, because we said at the time that we needed to make the grammar flexible enough to support that case, but make it clear that it wasn't implemented yet. (It's a pity that we haven't implemented that thus far ... I suppose it must be a difficult problem.) > DETAIL: This operation is not supported for query result. I would say "This operation is only supported for tables [and matviews]." We don't need to list all the things that aren't supported, I think. But your example here is wrong: > db=# CREATE STATISTICS alt_stat2 ON a, b FROM tftest(1); > ERROR: cannot define statistics for relation "alt_stat2" > DETAIL: This operation is not supported for query result. It's not alt_stat2 that's the problem (that's the extstats object being created), but tftest(1). I don't know if we need to specify that the problem is in the FROM clause here -- seems obvious enough to me -- but maybe someone opines differently? ERROR: cannot define statistics for relation "tftest" DETAIL: The FROM clause may only contain a single table. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ "Si quieres ser creativo, aprende el arte de perder el tiempo"
On 2025-Aug-22, Tom Lane wrote: > I'm not certain, but I think that our use of "relation" to mean > "an object with a pg_class entry" is a Postgres-ism. I do know > that the meaning of "a set of tuples" is widely used, as that's > where the term "relational database" comes from. Maybe whoever > wrote this was trying to get at that point? But this text is > hardly clear about that. Yeah, AFAIR I wrote it (or at least heavily edited the original to get to that), and I was trying to convey exactly those two ideas. If you want to propose improvements, they're very welcome. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ "Puedes vivir sólo una vez, pero si lo haces bien, una vez es suficiente"
=?utf-8?Q?=C3=81lvaro?= Herrera <alvherre@kurilemu.de> writes:
> Yeah, AFAIR I wrote it (or at least heavily edited the original to get
> to that), and I was trying to convey exactly those two ideas.  If you
> want to propose improvements, they're very welcome.
Hmmm ... maybe something like
    Mathematically, a "relation" is a set of tuples; this is the sense
    meant in the term "relational database".
    In Postgres, "relation" is commonly used to mean a database object
    that has a name and a list of attributes defined in a specific
    order. Tables, sequences, views, foreign tables, materialized
    views, composite types, and indexes are all relations. A relation
    in this sense is a container or descriptor for a set of tuples.
    "Class" is an alternative but archaic term.  The system catalog
    pg_class holds an entry for each Postgres relation.
            regards, tom lane
			
		On 2025-Aug-22, Tom Lane wrote: > Hmmm ... maybe something like > > Mathematically, a "relation" is a set of tuples; this is the sense > meant in the term "relational database". > > In Postgres, "relation" is commonly used to mean a database object > that has a name and a list of attributes defined in a specific > order. Tables, sequences, views, foreign tables, materialized > views, composite types, and indexes are all relations. A relation > in this sense is a container or descriptor for a set of tuples. > > "Class" is an alternative but archaic term. The system catalog > pg_class holds an entry for each Postgres relation. Thanks, pushed like that. I changed "a database object" to "an SQL object", because that's a term we have a definition for. (I also wrote "PostgreSQL" where you had "Postgres". I think it might be okay now to change the product name in various places here, but it seems better to do it consistently across the whole page.) -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ "¿Qué importan los años? Lo que realmente importa es comprobar que a fin de cuentas la mejor edad de la vida es estar vivo" (Mafalda)
=?utf-8?Q?=C3=81lvaro?= Herrera <alvherre@kurilemu.de> writes:
> Thanks, pushed like that.  I changed "a database object" to "an SQL
> object", because that's a term we have a definition for.
> (I also wrote "PostgreSQL" where you had "Postgres".  I think it might
> be okay now to change the product name in various places here, but it
> seems better to do it consistently across the whole page.)
LGTM, thanks.  (My wording was only meant as a draft ...)
            regards, tom lane
			
		On 2025-Aug-21, jian he wrote:
>             case T_CreateStatsStmt:
>                 {
>                     Oid            relid;
>                     CreateStatsStmt *stmt = (CreateStatsStmt *) parsetree;
>                     RangeVar   *rel = (RangeVar *) linitial(stmt->relations);
> 
>                     if (!IsA(rel, RangeVar))
>                         ereport(ERROR,
>                                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>                                  errmsg("only a single relation is
> allowed in CREATE STATISTICS")));
>                     relid = RangeVarGetRelid(rel,
> ShareUpdateExclusiveLock, false);
>                     /* Run parse analysis ... */
>                     stmt = transformStatsStmt(relid, stmt, queryString);
>                     address = CreateStatistics(stmt);
>                 }
Yeah, I think there's room to argue that this ereport() is just wrongly
copy-and-pasted from other nearby errors, and that it should have
completely different wording.  BTW another way to cause this is
CREATE STATISTICS alt_stat2 ON a, b FROM xmltable('foo' passing 'bar' columns a text);
I propose to use this report:
ERROR:  cannot create statistics on specified relation
DETAIL:  CREATE STATISTICS only supports tables, materialized views, foreign tables, and partitioned tables.
(Not sold on saying just "tables" vs. "regular tables" or "plain tables";
thoughts?)
While looking at this whole thing, I noticed that this code is somewhat
bogus in a different way: we're resolving the relation name twice, both
here in ProcessUtilitySlow() (to pass to transformStatsStmt), and later
again inside CreateStatistics().  It's really strange that we decided
not to pass the relation Oids as a list argument to CreateStatistics.  I
suggest to change that as the first attached patch.
Now, such a change would be appropriate only for branch master; it seems too
invasive for stable ones.  For them I propose we only change the error message,
as in the other attachment.
We should add a couple of test cases for this particular error message
also.  It's bad that these changes don't break any tests.
-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Once again, thank you and all of the developers for your hard work on
PostgreSQL.  This is by far the most pleasant management experience of
any database I've worked on."                             (Dan Harris)
http://archives.postgresql.org/pgsql-performance/2006-04/msg00247.php
			
		Вложения
=?utf-8?Q?=C3=81lvaro?= Herrera <alvherre@kurilemu.de> writes:
> I propose to use this report:
> ERROR:  cannot create statistics on specified relation
> DETAIL:  CREATE STATISTICS only supports tables, materialized views, foreign tables, and partitioned tables.
WFM, although I think you could shorten it to "tables, materialized
views, and foreign tables".  We generally expect that partitioned
tables are included when saying "tables", no?  I'm not dead set on
that either way, though.
> While looking at this whole thing, I noticed that this code is somewhat
> bogus in a different way: we're resolving the relation name twice, both
> here in ProcessUtilitySlow() (to pass to transformStatsStmt), and later
> again inside CreateStatistics().  It's really strange that we decided
> not to pass the relation Oids as a list argument to CreateStatistics.  I
> suggest to change that as the first attached patch.
I think this is a good idea, but I'm not sure that the locking
considerations are straight in this version.  Once we translate a
relation name to OID, we'd better hold lock on the rel continuously to
the end of usage of that OID.  It looks like you do, but then couldn't
the
+        rel = relation_open(relid, ShareUpdateExclusiveLock);
in CreateStatistics use NoLock to signify that we expect to have
the lock already?
Alternatively, maybe the right fix is to move the parse-analysis
work into CreateStatistics altogether.  I think there is not a
very good argument for ProcessUtilitySlow doing all that stuff.
It's supposed to mainly just be a dispatching switch(), IMO.
            regards, tom lane
			
		On Fri, Aug 29, 2025 at 5:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > =?utf-8?Q?=C3=81lvaro?= Herrera <alvherre@kurilemu.de> writes: > > I propose to use this report: > > > ERROR: cannot create statistics on specified relation > > DETAIL: CREATE STATISTICS only supports tables, materialized views, foreign tables, and partitioned tables. > > WFM, although I think you could shorten it to "tables, materialized > views, and foreign tables". We generally expect that partitioned > tables are included when saying "tables", no? I'm not dead set on > that either way, though. > https://www.postgresql.org/docs/current/sql-copy.html use "COPY TO can be used only with plain tables, not views, and does not copy rows from child tables or child partitions" > > Alternatively, maybe the right fix is to move the parse-analysis > work into CreateStatistics altogether. I think there is not a > very good argument for ProcessUtilitySlow doing all that stuff. > It's supposed to mainly just be a dispatching switch(), IMO. > seems doable. transformStatsStmt, CreateStatistics both used only twice, refactoring arguments should be fine. please check the attached POC, regress tests also added. main idea is first check CreateStatsStmt->relations, then call transformStatsStmt, transformStatsStmt only to transform CreateStatsStmt->exprs. also the above complaint about the relation lock issue will be resolved.
Вложения
On 2025-Aug-29, jian he wrote: > On Fri, Aug 29, 2025 at 5:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > WFM, although I think you could shorten it to "tables, materialized > > views, and foreign tables". We generally expect that partitioned > > tables are included when saying "tables", no? I'm not dead set on > > that either way, though. > > https://www.postgresql.org/docs/current/sql-copy.html > use "COPY TO can be used only with plain tables, not views, and does > not copy rows from child tables or child partitions" I'm inclined to think that we should only mention partitioned tables specifically when they for some reason deviate from what we do for regular tables, i.e., what Tom is saying. I don't think we've had an explicit, consistent rule for that thus far, so there may be places where we fail to follow it. Anyway, I have pushed the error message change. > > Alternatively, maybe the right fix is to move the parse-analysis > > work into CreateStatistics altogether. I think there is not a > > very good argument for ProcessUtilitySlow doing all that stuff. > > It's supposed to mainly just be a dispatching switch(), IMO. > > seems doable. > transformStatsStmt, CreateStatistics both used only twice, refactoring > arguments should be fine. > please check the attached POC, regress tests also added. Yeah, I like how this turned out. I found out this was introduced in commit a4d75c86bf15. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
On Fri, Aug 29, 2025 at 8:48 PM Álvaro Herrera <alvherre@kurilemu.de> wrote: > > > > Alternatively, maybe the right fix is to move the parse-analysis > > > work into CreateStatistics altogether. I think there is not a > > > very good argument for ProcessUtilitySlow doing all that stuff. > > > It's supposed to mainly just be a dispatching switch(), IMO. > > > > seems doable. > > transformStatsStmt, CreateStatistics both used only twice, refactoring > > arguments should be fine. > > please check the attached POC, regress tests also added. > > Yeah, I like how this turned out. I found out this was introduced in > commit a4d75c86bf15. Previously, CreateStatistics and ProcessUtilitySlow opened the same relation twice with a ShareUpdateExclusiveLock. This refactor ensures the relation is opened with ShareUpdateExclusiveLock only once and also reduces redundant error checks. The logic is now more intuitive: we first error checking CreateStatsStmt->relations and then call transformStatsStmt to parse analysis CreateStatsStmt->exprs. please check the attached refactor CreateStatsStmt.
Вложения
On 29.08.25 14:48, Álvaro Herrera wrote: > On 2025-Aug-29, jian he wrote: > >> On Fri, Aug 29, 2025 at 5:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >>> WFM, although I think you could shorten it to "tables, materialized >>> views, and foreign tables". We generally expect that partitioned >>> tables are included when saying "tables", no? I'm not dead set on >>> that either way, though. >> >> https://www.postgresql.org/docs/current/sql-copy.html >> use "COPY TO can be used only with plain tables, not views, and does >> not copy rows from child tables or child partitions" > > I'm inclined to think that we should only mention partitioned tables > specifically when they for some reason deviate from what we do for > regular tables, i.e., what Tom is saying. I don't think we've had an > explicit, consistent rule for that thus far, so there may be places > where we fail to follow it. > > Anyway, I have pushed the error message change. I think this message is still wrong. The check doesn't even look at the relation kind, which is what the message is implying. (If the message were about relkinds, then it should use errdetail_relkind_not_supported().) It checks that the from list entry is a table name instead of some other thing like VALUES or JOIN. So it should be something like CREATE STATISTICS only supports plain table names in the FROM clause The same could also be accomplished by changing the grammar from FROM from_list to FROM qualified_name_list
On 05.09.25 14:30, Peter Eisentraut wrote: > On 29.08.25 14:48, Álvaro Herrera wrote: >> On 2025-Aug-29, jian he wrote: >> >>> On Fri, Aug 29, 2025 at 5:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> >>>> WFM, although I think you could shorten it to "tables, materialized >>>> views, and foreign tables". We generally expect that partitioned >>>> tables are included when saying "tables", no? I'm not dead set on >>>> that either way, though. >>> >>> https://www.postgresql.org/docs/current/sql-copy.html >>> use "COPY TO can be used only with plain tables, not views, and does >>> not copy rows from child tables or child partitions" >> >> I'm inclined to think that we should only mention partitioned tables >> specifically when they for some reason deviate from what we do for >> regular tables, i.e., what Tom is saying. I don't think we've had an >> explicit, consistent rule for that thus far, so there may be places >> where we fail to follow it. >> >> Anyway, I have pushed the error message change. > > I think this message is still wrong. The check doesn't even look at the > relation kind, which is what the message is implying. (If the message > were about relkinds, then it should use > errdetail_relkind_not_supported().) It checks that the from list entry > is a table name instead of some other thing like VALUES or JOIN. So it > should be something like > > CREATE STATISTICS only supports plain table names in the FROM clause I propose the attached patch to fix this. I think this restores the original meaning better.
Вложения
Peter Eisentraut <peter@eisentraut.org> writes:
> I propose the attached patch to fix this.  I think this restores the 
> original meaning better.
I'm okay with this wording change, but I would stay with
ERRCODE_FEATURE_NOT_SUPPORTED rather than calling this
a "syntax error".  It's not a syntax error IMV, but a
potential feature that we have deliberately left syntax
space for, even though we don't yet have ideas about
a workable implementation.
            regards, tom lane
			
		On 12.09.25 15:49, Tom Lane wrote: > Peter Eisentraut <peter@eisentraut.org> writes: >> I propose the attached patch to fix this. I think this restores the >> original meaning better. > > I'm okay with this wording change, but I would stay with > ERRCODE_FEATURE_NOT_SUPPORTED rather than calling this > a "syntax error". It's not a syntax error IMV, but a > potential feature that we have deliberately left syntax > space for, even though we don't yet have ideas about > a workable implementation. Ok, done that way.
hi. moving all code in ProcessUtilitySlow (case T_CreateStatsStmt:) to CreateStatistic seems more intuitive to me. so I rebased v3.
Вложения
On 2025-Oct-02, jian he wrote:
> hi.
> 
> moving all code in ProcessUtilitySlow (case T_CreateStatsStmt:)
> to CreateStatistic seems more intuitive to me.
> 
> so I rebased v3.
Yeah, this looks good.  But I think we can go a little further.  Because
CreateStatistics() now calls transformStatsStmt(), we don't need to do
the same in ATPostAlterTypeParse(), which allows us to simplify the code
there.  In turn this means we can pass the whole Relation rather than
merely the OID, so transformStatsStmt doesn't need to open it.  I attach
v4 with those changes.  Comments?
For a moment it looked weird to me to pass a Relation to the parse
analysis routine, but I see that other functions declared in
parse_utilcmd.h are already doing that.
One more thing I noticed is that we're scribbling on the parser node,
which we can easily avoid by creating a copy of the node in
transformStatsStmt() before doing any changes.  I'm not too clear
on whether this is really necessary or not.  We do it in other places,
but we haven't done it here for a long while, and nothing seems to break
because of that.
diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index 89c8315117b..7797c707f73 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -3150,6 +3150,9 @@ transformStatsStmt(Relation rel, CreateStatsStmt *stmt, const char *queryString)
     if (stmt->transformed)
         return stmt;
 
+    /* create a copy to scribble on */
+    stmt = copyObject(stmt);
+
     /* Set up pstate */
     pstate = make_parsestate(NULL);
     pstate->p_sourcetext = queryString;
-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Uno puede defenderse de los ataques; contra los elogios se esta indefenso"
			
		Вложения
On Fri, Oct 17, 2025 at 8:13 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
>
> Yeah, this looks good.  But I think we can go a little further.  Because
> CreateStatistics() now calls transformStatsStmt(), we don't need to do
> the same in ATPostAlterTypeParse(), which allows us to simplify the code
> there.  In turn this means we can pass the whole Relation rather than
> merely the OID, so transformStatsStmt doesn't need to open it.  I attach
> v4 with those changes.  Comments?
>
/*
 * transformStatsStmt - parse analysis for CREATE STATISTICS
 *
 * To avoid race conditions, it's important that this function relies only on
 * the passed-in rel (and not on stmt->relation) as the target relation.
 */
CreateStatsStmt *
transformStatsStmt(Relation rel, CreateStatsStmt *stmt, const char *queryString)
{
......
}
the (Relation rel) effectively comes from "stmt->relations", which
conflict with the above
comments.
> For a moment it looked weird to me to pass a Relation to the parse
> analysis routine, but I see that other functions declared in
> parse_utilcmd.h are already doing that.
>
>
> One more thing I noticed is that we're scribbling on the parser node,
> which we can easily avoid by creating a copy of the node in
> transformStatsStmt() before doing any changes.  I'm not too clear
> on whether this is really necessary or not.  We do it in other places,
> but we haven't done it here for a long while, and nothing seems to break
> because of that.
>
> diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
> index 89c8315117b..7797c707f73 100644
> --- a/src/backend/parser/parse_utilcmd.c
> +++ b/src/backend/parser/parse_utilcmd.c
> @@ -3150,6 +3150,9 @@ transformStatsStmt(Relation rel, CreateStatsStmt *stmt, const char *queryString)
>         if (stmt->transformed)
>                 return stmt;
>
> +       /* create a copy to scribble on */
> +       stmt = copyObject(stmt);
> +
>         /* Set up pstate */
>         pstate = make_parsestate(NULL);
>         pstate->p_sourcetext = queryString;
>
in src/backend/parser/parse_utilcmd.c, we have only two occurences of
copyObject.
			
		On 2025-Oct-20, jian he wrote: > On Fri, Oct 17, 2025 at 8:13 PM Álvaro Herrera <alvherre@kurilemu.de> wrote: > /* > * transformStatsStmt - parse analysis for CREATE STATISTICS > * > * To avoid race conditions, it's important that this function relies only on > * the passed-in rel (and not on stmt->relation) as the target relation. > */ > CreateStatsStmt * > transformStatsStmt(Relation rel, CreateStatsStmt *stmt, const char *queryString) > the (Relation rel) effectively comes from "stmt->relations", which > conflict with the above comments. Hmm? It does not. The whole point is that the relation name (RangeVar stmt->relations) has already been resolved to an OID and locked, which is what we pass as 'Relation rel'. Trying to resolve the name to an OID again opens us up for race conditions. This is alleviated if the relation has already been locked, so maybe we can relax this comment; but it's not outright contradictory AFAICS. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ "Oh, great altar of passive entertainment, bestow upon me thy discordant images at such speed as to render linear thought impossible" (Calvin a la TV)
On Mon, Oct 20, 2025 at 3:31 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
>
> On 2025-Oct-20, jian he wrote:
>
> > On Fri, Oct 17, 2025 at 8:13 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
>
> > /*
> >  * transformStatsStmt - parse analysis for CREATE STATISTICS
> >  *
> >  * To avoid race conditions, it's important that this function relies only on
> >  * the passed-in rel (and not on stmt->relation) as the target relation.
> >  */
> > CreateStatsStmt *
> > transformStatsStmt(Relation rel, CreateStatsStmt *stmt, const char *queryString)
>
> > the (Relation rel) effectively comes from "stmt->relations", which
> > conflict with the above comments.
>
> Hmm?  It does not.  The whole point is that the relation name (RangeVar
> stmt->relations) has already been resolved to an OID and locked, which
> is what we pass as 'Relation rel'.  Trying to resolve the name to an OID
> again opens us up for race conditions.  This is alleviated if the
> relation has already been locked, so maybe we can relax this comment;
> but it's not outright contradictory AFAICS.
>
I think I understand your point.
When ALTER TABLE SET DATA TYPE invokes CreateStatistics, if the Relation (rel)
returned by CreateStatistics->relation_openrv is not the same as
ATPostAlterTypeParse.oldRelId, the regression test would already fail.
@@ -15632,11 +15623,6 @@ ATPostAlterTypeParse(Oid oldId, Oid oldRelId,
Oid refRelId, char *cmd,
  querytree_list = lappend(querytree_list, stmt);
  querytree_list = list_concat(querytree_list, afterStmts);
  }
- else if (IsA(stmt, CreateStatsStmt))
- querytree_list = lappend(querytree_list,
- transformStatsStmt(oldRelId,
- (CreateStatsStmt *) stmt,
- cmd));
The above "cmd" is the queryString, which is useful for error reporting.
create table t(a  int);
create statistics xxx on (a + 1 is not null) from t;
alter table t alter column a set data type text;
with patch, v4-0001-Restructure-CreateStatsStmt-parse-analysis-proces.patch
ERROR:  operator does not exist: text + integer
DETAIL:  No operator of that name accepts the given argument types.
HINT:  You might need to add explicit type casts.
In the master branch, the error message also includes the error position.
ERROR:  operator does not exist: text + integer
LINE 1: alter table t alter column a set data type text;
                                            ^
DETAIL:  No operator of that name accepts the given argument types.
HINT:  You might need to add explicit type casts.
			
		On 2025-Oct-20, jian he wrote: > @@ -15632,11 +15623,6 @@ ATPostAlterTypeParse(Oid oldId, Oid oldRelId, > Oid refRelId, char *cmd, > querytree_list = lappend(querytree_list, stmt); > querytree_list = list_concat(querytree_list, afterStmts); > } > - else if (IsA(stmt, CreateStatsStmt)) > - querytree_list = lappend(querytree_list, > - transformStatsStmt(oldRelId, > - (CreateStatsStmt *) stmt, > - cmd)); > > The above "cmd" is the queryString, which is useful for error reporting. > > create table t(a int); > create statistics xxx on (a + 1 is not null) from t; > alter table t alter column a set data type text; > > with patch, v4-0001-Restructure-CreateStatsStmt-parse-analysis-proces.patch > ERROR: operator does not exist: text + integer > DETAIL: No operator of that name accepts the given argument types. > HINT: You might need to add explicit type casts. > > In the master branch, the error message also includes the error position. > ERROR: operator does not exist: text + integer > LINE 1: alter table t alter column a set data type text; > ^ > DETAIL: No operator of that name accepts the given argument types. > HINT: You might need to add explicit type casts. Interesting, thanks for the example. I think this example illustrates very well why the 'cmd' is rather useless here -- note how the error message refers to an operator that appears nowhere in the query string. The user says "SET DATA TYPE text" and then we complain about the "text+integer" operator? That makes no sense and I don't think it's in any way useful. I think a user-friendly thing we could do is add an errcontext callback that shows that we're trying to reapply a statistics object. We should discuss that in a separate thread for a separate patch though, and also I don't think that thread should prevent this rework from being applied now. We could add this test scenario to the regression tests though, if only to show how it would change later. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/