Обсуждение: some grammar refactoring

Поиск
Список
Период
Сортировка

some grammar refactoring

От
Peter Eisentraut
Дата:
Here is a series of patches to do some refactoring in the grammar around 
the commands COMMENT, DROP, SECURITY LABEL, and ALTER EXTENSION ... 
ADD/DROP.  In the grammar, these commands (with some exceptions) 
basically just take a reference to an object and later look it up in C 
code.  Some of that was already generalized individually for each 
command (drop_type_any_name, drop_type_name, etc.).  This patch combines 
it into common lists for all these commands.

Advantages:

- Avoids having to list each object type at least four times.

- Object types not supported by security labels or extensions are now 
explicitly listed and give a proper error message.  Previously, this was 
just encoded in the grammar itself and specifying a non-supported object 
type would just give a parse error.

- Reduces lines of code in gram.y.

- Removes some old cruft.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: some grammar refactoring

От
Robert Haas
Дата:
On Tue, May 19, 2020 at 2:43 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> Here is a series of patches to do some refactoring in the grammar around
> the commands COMMENT, DROP, SECURITY LABEL, and ALTER EXTENSION ...
> ADD/DROP.  In the grammar, these commands (with some exceptions)
> basically just take a reference to an object and later look it up in C
> code.  Some of that was already generalized individually for each
> command (drop_type_any_name, drop_type_name, etc.).  This patch combines
> it into common lists for all these commands.

I haven't reviewed the code, but +1 for the idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: some grammar refactoring

От
Rushabh Lathia
Дата:


On Tue, May 19, 2020 at 12:13 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
Here is a series of patches to do some refactoring in the grammar around
the commands COMMENT, DROP, SECURITY LABEL, and ALTER EXTENSION ...
ADD/DROP.  In the grammar, these commands (with some exceptions)
basically just take a reference to an object and later look it up in C
code.  Some of that was already generalized individually for each
command (drop_type_any_name, drop_type_name, etc.).  This patch combines
it into common lists for all these commands.

Advantages:

- Avoids having to list each object type at least four times.

- Object types not supported by security labels or extensions are now
explicitly listed and give a proper error message.  Previously, this was
just encoded in the grammar itself and specifying a non-supported object
type would just give a parse error.

- Reduces lines of code in gram.y.

- Removes some old cruft.


I liked the idea.

I had quick glance through the patches and also did quick review and testing.
I haven't found any issue with the patch.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Rushabh Lathia

Re: some grammar refactoring

От
Peter Eisentraut
Дата:
On 2020-05-19 08:43, Peter Eisentraut wrote:
> Here is a series of patches to do some refactoring in the grammar around
> the commands COMMENT, DROP, SECURITY LABEL, and ALTER EXTENSION ...
> ADD/DROP.  In the grammar, these commands (with some exceptions)
> basically just take a reference to an object and later look it up in C
> code.  Some of that was already generalized individually for each
> command (drop_type_any_name, drop_type_name, etc.).  This patch combines
> it into common lists for all these commands.

While most of this patch set makes no behavior changes by design, I 
should point out this little change hidden in the middle:

     Remove deprecated syntax from CREATE/DROP LANGUAGE

     Remove the option to specify the language name as a single-quoted
     string.  This has been obsolete since ee8ed85da3b.  Removing it allows
     better grammar refactoring.

     The syntax of the CREATE FUNCTION LANGUAGE clause is not changed.

(ee8ed85da3b is in PG 7.2.)

I expect this to be uncontroversial, but it should be pointed out.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: some grammar refactoring

От
Mark Dilger
Дата:

> On May 18, 2020, at 11:43 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
>
> Here is a series of patches to do some refactoring in the grammar around the commands COMMENT, DROP, SECURITY LABEL,
andALTER EXTENSION ... ADD/DROP.  In the grammar, these commands (with some exceptions) basically just take a reference
toan object and later look it up in C code.  Some of that was already generalized individually for each command
(drop_type_any_name,drop_type_name, etc.).  This patch combines it into common lists for all these commands. 
>
> Advantages:
>
> - Avoids having to list each object type at least four times.
>
> - Object types not supported by security labels or extensions are now explicitly listed and give a proper error
message. Previously, this was just encoded in the grammar itself and specifying a non-supported object type would just
givea parse error. 
>
> - Reduces lines of code in gram.y.
>
> - Removes some old cruft.

I like the general direction you are going with this, but the decision in v1-0006 to move the error for invalid object
typesout of gram.y and into extension.c raises an organizational question.   At some places in gram.y, there is C code
thatchecks parsed tokens and ereports if they are invalid, in some sense extending the grammar right within gram.y.  In
manyother places, including what you are doing in this patch, the token is merely stored in a Stmt object with the
errorchecking delayed until command processing.  For tokens which need to be checked against the catalogs, that
decisionmakes perfect sense.  But for ones where all the information necessary to validate the token exists in the
parser,it is not clear to me why it gets delayed until command processing.  Is there a design principle behind when
thesechecks are done in gram.y vs. when they are delayed to the command processing?  I'm guessing in v1-0006 that you
aredoing it this way because there are multiple places in gram.y where tokens would need to be checked, and by delaying
thecheck until ExecAlterExtensionContentsStmt, you can put the check all in one place.  Is that all it is? 

I have had reason in the past to want to reorganize gram.y to have all these types of checks in a single, consistent
formatand location, rather than scattered through gram.y and backend/commands/.  Does anybody else have an interest in
this?

My interest in this stems from the fact that bison can be run to generate data files that can then be used in reverse
togenerate random SQL.  The more the parsing logic is visible to bison, the more useful the generated data files are.
Buta single, consistent design for extra-grammatical error checks could help augment those files fairly well, too. 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: some grammar refactoring

От
Peter Eisentraut
Дата:
On 2020-05-22 18:53, Mark Dilger wrote:
> I like the general direction you are going with this, but the decision in v1-0006 to move the error for invalid
objecttypes out of gram.y and into extension.c raises an organizational question.   At some places in gram.y, there is
Ccode that checks parsed tokens and ereports if they are invalid, in some sense extending the grammar right within
gram.y. In many other places, including what you are doing in this patch, the token is merely stored in a Stmt object
withthe error checking delayed until command processing.  For tokens which need to be checked against the catalogs,
thatdecision makes perfect sense.  But for ones where all the information necessary to validate the token exists in the
parser,it is not clear to me why it gets delayed until command processing.  Is there a design principle behind when
thesechecks are done in gram.y vs. when they are delayed to the command processing?  I'm guessing in v1-0006 that you
aredoing it this way because there are multiple places in gram.y where tokens would need to be checked, and by delaying
thecheck until ExecAlterExtensionContentsStmt, you can put the check all in one place.  Is that all it is?
 

We have been for some time moving to a style where we rely on switch 
statements around OBJECT_* constants to (a) decide what is allowed with 
certain object types, and (b) make sure we have an explicit decision on 
each object type and don't forget any.  This has worked well, I think.

This is more of that.  Before this patch, it would have been pretty hard 
to find out which object types are supported with extensions or security 
labels, except by very carefully reading the grammar.

Moreover, you now get a proper error message for unsupported object 
types rather than just a generic parse error.

> I have had reason in the past to want to reorganize gram.y to have all these types of checks in a single, consistent
formatand location, rather than scattered through gram.y and backend/commands/.  Does anybody else have an interest in
this?
> 
> My interest in this stems from the fact that bison can be run to generate data files that can then be used in reverse
togenerate random SQL.  The more the parsing logic is visible to bison, the more useful the generated data files are.
Buta single, consistent design for extra-grammatical error checks could help augment those files fairly well, too.
 

It's certainly already the case that the grammar accepts statements that 
end up being invalid, even if you ignore catalog lookup.  I don't think 
my patch moves the needle on this in a significant way.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: some grammar refactoring

От
Mark Dilger
Дата:

> On May 25, 2020, at 2:55 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
>
> On 2020-05-22 18:53, Mark Dilger wrote:
>> I like the general direction you are going with this, but the decision in v1-0006 to move the error for invalid
objecttypes out of gram.y and into extension.c raises an organizational question.   At some places in gram.y, there is
Ccode that checks parsed tokens and ereports if they are invalid, in some sense extending the grammar right within
gram.y. In many other places, including what you are doing in this patch, the token is merely stored in a Stmt object
withthe error checking delayed until command processing.  For tokens which need to be checked against the catalogs,
thatdecision makes perfect sense.  But for ones where all the information necessary to validate the token exists in the
parser,it is not clear to me why it gets delayed until command processing.  Is there a design principle behind when
thesechecks are done in gram.y vs. when they are delayed to the command processing?  I'm guessing in v1-0006 that you
aredoing it this way because there are multiple places in gram.y where tokens would need to be checked, and by delaying
thecheck until ExecAlterExtensionContentsStmt, you can put the check all in one place.  Is that all it is? 
>
> We have been for some time moving to a style where we rely on switch statements around OBJECT_* constants to (a)
decidewhat is allowed with certain object types, and (b) make sure we have an explicit decision on each object type and
don'tforget any.  This has worked well, I think. 

Yes, I think so, too.  I like that overall design.

> This is more of that.

Yes, it is.

> Before this patch, it would have been pretty hard to find out which object types are supported with extensions or
securitylabels, except by very carefully reading the grammar. 

Fair enough.

> Moreover, you now get a proper error message for unsupported object types rather than just a generic parse error.

Sounds great.

>> I have had reason in the past to want to reorganize gram.y to have all these types of checks in a single, consistent
formatand location, rather than scattered through gram.y and backend/commands/.  Does anybody else have an interest in
this?
>> My interest in this stems from the fact that bison can be run to generate data files that can then be used in
reverseto generate random SQL.  The more the parsing logic is visible to bison, the more useful the generated data
filesare.  But a single, consistent design for extra-grammatical error checks could help augment those files fairly
well,too. 
>
> It's certainly already the case that the grammar accepts statements that end up being invalid, even if you ignore
cataloglookup.  I don't think my patch moves the needle on this in a significant way. 

I don't think it moves the needle too much, either.  But since your patch is entirely a refactoring patch and not a
featurepatch, I thought it would be fair to ask larger questions about how the code should be structured.  I like using
enumsand switch statements and getting better error messages, but there doesn't seem to be any fundamental reason why
thatshould be in the command execution step.  It feels like a layering violation to me. 

I don't object to this patch getting committed.  A subsequent patch to consolidate all the grammar checks into
src/backend/parserand out of src/backend/commands won't be blocked by this. 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: some grammar refactoring

От
Peter Eisentraut
Дата:
On 2020-05-25 21:09, Mark Dilger wrote:
> I don't think it moves the needle too much, either.  But since your patch is entirely a refactoring patch and not a
featurepatch, I thought it would be fair to ask larger questions about how the code should be structured.  I like using
enumsand switch statements and getting better error messages, but there doesn't seem to be any fundamental reason why
thatshould be in the command execution step.  It feels like a layering violation to me.
 

Most utility commands don't have an intermediate parse analysis pass. 
They just go straight from the grammar to the execution.  Maybe that 
could be rethought, but that's the way it is now.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: some grammar refactoring

От
Robert Haas
Дата:
On Tue, May 26, 2020 at 4:28 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 2020-05-25 21:09, Mark Dilger wrote:
> > I don't think it moves the needle too much, either.  But since your patch is entirely a refactoring patch and not a
featurepatch, I thought it would be fair to ask larger questions about how the code should be structured.  I like using
enumsand switch statements and getting better error messages, but there doesn't seem to be any fundamental reason why
thatshould be in the command execution step.  It feels like a layering violation to me. 
>
> Most utility commands don't have an intermediate parse analysis pass.
> They just go straight from the grammar to the execution.  Maybe that
> could be rethought, but that's the way it is now.

I think it can and should be rethought at some point. The present
split leads to a lot of weird coding. We've had security
vulnerabilities that were due to things like passing the same RangeVar
to two different places, leading to two different lookups for the name
that could be induced to return different OIDs. It also leads to a lot
of fuzzy thinking about where locks are taken, in which order, and how
many times, and with what strength. The code for queries seems to have
been thought through a lot more carefully, because the existence of
prepared queries makes mistakes a lot more noticeable. I hope some day
someone will be motivated to improve the situation for DDL as well,
though it will probably be a thankless task.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: some grammar refactoring

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, May 26, 2020 at 4:28 AM Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>> Most utility commands don't have an intermediate parse analysis pass.
>> They just go straight from the grammar to the execution.  Maybe that
>> could be rethought, but that's the way it is now.

> I think it can and should be rethought at some point.

The other problem is that the ones that do have explicit parse analysis
tend to be doing it at the wrong time.  I've fixed some ALTER TABLE
problems by rearranging that, but we still have open bugs that are due
to this type of mistake, eg [1].  I agree that we need a rethink, and
we need it badly.

If this patch is changing when any parse-analysis-like actions happen,
then I would say that it needs very careful review --- much more than
the "refactoring" label would suggest.  Maybe it's making things better,
or maybe it doesn't matter; but this area is a minefield.

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/16272-6e32da020e9a9381%40postgresql.org



Re: some grammar refactoring

От
Peter Eisentraut
Дата:
On 2020-05-19 08:43, Peter Eisentraut wrote:
> Here is a series of patches to do some refactoring in the grammar around
> the commands COMMENT, DROP, SECURITY LABEL, and ALTER EXTENSION ...
> ADD/DROP.

These patches have been committed.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services