Обсуждение: some grammar refactoring
Here is a series of patches to do some refactoring in the grammar around the commands COMMENT, DROP, SECURITY LABEL, and ALTER EXTENSION ... ADD/DROP. In the grammar, these commands (with some exceptions) basically just take a reference to an object and later look it up in C code. Some of that was already generalized individually for each command (drop_type_any_name, drop_type_name, etc.). This patch combines it into common lists for all these commands. Advantages: - Avoids having to list each object type at least four times. - Object types not supported by security labels or extensions are now explicitly listed and give a proper error message. Previously, this was just encoded in the grammar itself and specifying a non-supported object type would just give a parse error. - Reduces lines of code in gram.y. - Removes some old cruft. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
- v1-0001-Remove-redundant-grammar-symbols.patch
- v1-0002-Fold-AlterForeignTableStmt-into-AlterTableStmt.patch
- v1-0003-Remove-deprecated-syntax-from-CREATE-DROP-LANGUAG.patch
- v1-0004-Refactor-DROP-LANGUAGE-grammar.patch
- v1-0005-Grammar-object-type-refactoring.patch
- v1-0006-Refactor-AlterExtensionContentsStmt-grammar.patch
On Tue, May 19, 2020 at 2:43 AM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > Here is a series of patches to do some refactoring in the grammar around > the commands COMMENT, DROP, SECURITY LABEL, and ALTER EXTENSION ... > ADD/DROP. In the grammar, these commands (with some exceptions) > basically just take a reference to an object and later look it up in C > code. Some of that was already generalized individually for each > command (drop_type_any_name, drop_type_name, etc.). This patch combines > it into common lists for all these commands. I haven't reviewed the code, but +1 for the idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, May 19, 2020 at 12:13 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
Here is a series of patches to do some refactoring in the grammar around
the commands COMMENT, DROP, SECURITY LABEL, and ALTER EXTENSION ...
ADD/DROP. In the grammar, these commands (with some exceptions)
basically just take a reference to an object and later look it up in C
code. Some of that was already generalized individually for each
command (drop_type_any_name, drop_type_name, etc.). This patch combines
it into common lists for all these commands.
Advantages:
- Avoids having to list each object type at least four times.
- Object types not supported by security labels or extensions are now
explicitly listed and give a proper error message. Previously, this was
just encoded in the grammar itself and specifying a non-supported object
type would just give a parse error.
- Reduces lines of code in gram.y.
- Removes some old cruft.
I liked the idea.
I had quick glance through the patches and also did quick review and testing.
I haven't found any issue with the patch.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Rushabh Lathia
On 2020-05-19 08:43, Peter Eisentraut wrote: > Here is a series of patches to do some refactoring in the grammar around > the commands COMMENT, DROP, SECURITY LABEL, and ALTER EXTENSION ... > ADD/DROP. In the grammar, these commands (with some exceptions) > basically just take a reference to an object and later look it up in C > code. Some of that was already generalized individually for each > command (drop_type_any_name, drop_type_name, etc.). This patch combines > it into common lists for all these commands. While most of this patch set makes no behavior changes by design, I should point out this little change hidden in the middle: Remove deprecated syntax from CREATE/DROP LANGUAGE Remove the option to specify the language name as a single-quoted string. This has been obsolete since ee8ed85da3b. Removing it allows better grammar refactoring. The syntax of the CREATE FUNCTION LANGUAGE clause is not changed. (ee8ed85da3b is in PG 7.2.) I expect this to be uncontroversial, but it should be pointed out. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> On May 18, 2020, at 11:43 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > > Here is a series of patches to do some refactoring in the grammar around the commands COMMENT, DROP, SECURITY LABEL, andALTER EXTENSION ... ADD/DROP. In the grammar, these commands (with some exceptions) basically just take a reference toan object and later look it up in C code. Some of that was already generalized individually for each command (drop_type_any_name,drop_type_name, etc.). This patch combines it into common lists for all these commands. > > Advantages: > > - Avoids having to list each object type at least four times. > > - Object types not supported by security labels or extensions are now explicitly listed and give a proper error message. Previously, this was just encoded in the grammar itself and specifying a non-supported object type would just givea parse error. > > - Reduces lines of code in gram.y. > > - Removes some old cruft. I like the general direction you are going with this, but the decision in v1-0006 to move the error for invalid object typesout of gram.y and into extension.c raises an organizational question. At some places in gram.y, there is C code thatchecks parsed tokens and ereports if they are invalid, in some sense extending the grammar right within gram.y. In manyother places, including what you are doing in this patch, the token is merely stored in a Stmt object with the errorchecking delayed until command processing. For tokens which need to be checked against the catalogs, that decisionmakes perfect sense. But for ones where all the information necessary to validate the token exists in the parser,it is not clear to me why it gets delayed until command processing. Is there a design principle behind when thesechecks are done in gram.y vs. when they are delayed to the command processing? I'm guessing in v1-0006 that you aredoing it this way because there are multiple places in gram.y where tokens would need to be checked, and by delaying thecheck until ExecAlterExtensionContentsStmt, you can put the check all in one place. Is that all it is? I have had reason in the past to want to reorganize gram.y to have all these types of checks in a single, consistent formatand location, rather than scattered through gram.y and backend/commands/. Does anybody else have an interest in this? My interest in this stems from the fact that bison can be run to generate data files that can then be used in reverse togenerate random SQL. The more the parsing logic is visible to bison, the more useful the generated data files are. Buta single, consistent design for extra-grammatical error checks could help augment those files fairly well, too. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2020-05-22 18:53, Mark Dilger wrote: > I like the general direction you are going with this, but the decision in v1-0006 to move the error for invalid objecttypes out of gram.y and into extension.c raises an organizational question. At some places in gram.y, there is Ccode that checks parsed tokens and ereports if they are invalid, in some sense extending the grammar right within gram.y. In many other places, including what you are doing in this patch, the token is merely stored in a Stmt object withthe error checking delayed until command processing. For tokens which need to be checked against the catalogs, thatdecision makes perfect sense. But for ones where all the information necessary to validate the token exists in the parser,it is not clear to me why it gets delayed until command processing. Is there a design principle behind when thesechecks are done in gram.y vs. when they are delayed to the command processing? I'm guessing in v1-0006 that you aredoing it this way because there are multiple places in gram.y where tokens would need to be checked, and by delaying thecheck until ExecAlterExtensionContentsStmt, you can put the check all in one place. Is that all it is? We have been for some time moving to a style where we rely on switch statements around OBJECT_* constants to (a) decide what is allowed with certain object types, and (b) make sure we have an explicit decision on each object type and don't forget any. This has worked well, I think. This is more of that. Before this patch, it would have been pretty hard to find out which object types are supported with extensions or security labels, except by very carefully reading the grammar. Moreover, you now get a proper error message for unsupported object types rather than just a generic parse error. > I have had reason in the past to want to reorganize gram.y to have all these types of checks in a single, consistent formatand location, rather than scattered through gram.y and backend/commands/. Does anybody else have an interest in this? > > My interest in this stems from the fact that bison can be run to generate data files that can then be used in reverse togenerate random SQL. The more the parsing logic is visible to bison, the more useful the generated data files are. Buta single, consistent design for extra-grammatical error checks could help augment those files fairly well, too. It's certainly already the case that the grammar accepts statements that end up being invalid, even if you ignore catalog lookup. I don't think my patch moves the needle on this in a significant way. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> On May 25, 2020, at 2:55 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > > On 2020-05-22 18:53, Mark Dilger wrote: >> I like the general direction you are going with this, but the decision in v1-0006 to move the error for invalid objecttypes out of gram.y and into extension.c raises an organizational question. At some places in gram.y, there is Ccode that checks parsed tokens and ereports if they are invalid, in some sense extending the grammar right within gram.y. In many other places, including what you are doing in this patch, the token is merely stored in a Stmt object withthe error checking delayed until command processing. For tokens which need to be checked against the catalogs, thatdecision makes perfect sense. But for ones where all the information necessary to validate the token exists in the parser,it is not clear to me why it gets delayed until command processing. Is there a design principle behind when thesechecks are done in gram.y vs. when they are delayed to the command processing? I'm guessing in v1-0006 that you aredoing it this way because there are multiple places in gram.y where tokens would need to be checked, and by delaying thecheck until ExecAlterExtensionContentsStmt, you can put the check all in one place. Is that all it is? > > We have been for some time moving to a style where we rely on switch statements around OBJECT_* constants to (a) decidewhat is allowed with certain object types, and (b) make sure we have an explicit decision on each object type and don'tforget any. This has worked well, I think. Yes, I think so, too. I like that overall design. > This is more of that. Yes, it is. > Before this patch, it would have been pretty hard to find out which object types are supported with extensions or securitylabels, except by very carefully reading the grammar. Fair enough. > Moreover, you now get a proper error message for unsupported object types rather than just a generic parse error. Sounds great. >> I have had reason in the past to want to reorganize gram.y to have all these types of checks in a single, consistent formatand location, rather than scattered through gram.y and backend/commands/. Does anybody else have an interest in this? >> My interest in this stems from the fact that bison can be run to generate data files that can then be used in reverseto generate random SQL. The more the parsing logic is visible to bison, the more useful the generated data filesare. But a single, consistent design for extra-grammatical error checks could help augment those files fairly well,too. > > It's certainly already the case that the grammar accepts statements that end up being invalid, even if you ignore cataloglookup. I don't think my patch moves the needle on this in a significant way. I don't think it moves the needle too much, either. But since your patch is entirely a refactoring patch and not a featurepatch, I thought it would be fair to ask larger questions about how the code should be structured. I like using enumsand switch statements and getting better error messages, but there doesn't seem to be any fundamental reason why thatshould be in the command execution step. It feels like a layering violation to me. I don't object to this patch getting committed. A subsequent patch to consolidate all the grammar checks into src/backend/parserand out of src/backend/commands won't be blocked by this. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2020-05-25 21:09, Mark Dilger wrote: > I don't think it moves the needle too much, either. But since your patch is entirely a refactoring patch and not a featurepatch, I thought it would be fair to ask larger questions about how the code should be structured. I like using enumsand switch statements and getting better error messages, but there doesn't seem to be any fundamental reason why thatshould be in the command execution step. It feels like a layering violation to me. Most utility commands don't have an intermediate parse analysis pass. They just go straight from the grammar to the execution. Maybe that could be rethought, but that's the way it is now. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, May 26, 2020 at 4:28 AM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 2020-05-25 21:09, Mark Dilger wrote: > > I don't think it moves the needle too much, either. But since your patch is entirely a refactoring patch and not a featurepatch, I thought it would be fair to ask larger questions about how the code should be structured. I like using enumsand switch statements and getting better error messages, but there doesn't seem to be any fundamental reason why thatshould be in the command execution step. It feels like a layering violation to me. > > Most utility commands don't have an intermediate parse analysis pass. > They just go straight from the grammar to the execution. Maybe that > could be rethought, but that's the way it is now. I think it can and should be rethought at some point. The present split leads to a lot of weird coding. We've had security vulnerabilities that were due to things like passing the same RangeVar to two different places, leading to two different lookups for the name that could be induced to return different OIDs. It also leads to a lot of fuzzy thinking about where locks are taken, in which order, and how many times, and with what strength. The code for queries seems to have been thought through a lot more carefully, because the existence of prepared queries makes mistakes a lot more noticeable. I hope some day someone will be motivated to improve the situation for DDL as well, though it will probably be a thankless task. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, May 26, 2020 at 4:28 AM Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: >> Most utility commands don't have an intermediate parse analysis pass. >> They just go straight from the grammar to the execution. Maybe that >> could be rethought, but that's the way it is now. > I think it can and should be rethought at some point. The other problem is that the ones that do have explicit parse analysis tend to be doing it at the wrong time. I've fixed some ALTER TABLE problems by rearranging that, but we still have open bugs that are due to this type of mistake, eg [1]. I agree that we need a rethink, and we need it badly. If this patch is changing when any parse-analysis-like actions happen, then I would say that it needs very careful review --- much more than the "refactoring" label would suggest. Maybe it's making things better, or maybe it doesn't matter; but this area is a minefield. regards, tom lane [1] https://www.postgresql.org/message-id/flat/16272-6e32da020e9a9381%40postgresql.org
On 2020-05-19 08:43, Peter Eisentraut wrote: > Here is a series of patches to do some refactoring in the grammar around > the commands COMMENT, DROP, SECURITY LABEL, and ALTER EXTENSION ... > ADD/DROP. These patches have been committed. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services