Обсуждение: gen_guc_tables improvements
Hello While reviewing a patch, I noticed a typo in guc_params.dat. The code compiled and seemingly worked, and I was very surprised that the generator script didn't catch the mistake. I looked into it, and I found several missing checks in gen_guc_tables. I attached fixes for 4 that I think would definitely improve the script (for now as separate patches, so it is easy to select only some of them): * 0001 fixes the issue that started this, it validates the allowed field names, preventing typos in their names * 0002 goes a step further and validates that fields specific to some types can only appear for those types * 0003 just improves the error reported by duplicate names, previously this was confusing (it referred to incorrect ordering) * 0004 adds basic checks about allowed characters in GUC names I was also thinking about adding validations for the enum/define values (config group, flags, guc context), but that requires a somewhat fragile extraction code, and I decided to leave that out for now. What do you think about these changes?
Вложения
On Sun, Mar 15, 2026 at 12:18:53PM +0000, Zsolt Parragi wrote: > While reviewing a patch, I noticed a typo in guc_params.dat. The code > compiled and seemingly worked, and I was very surprised that the > generator script didn't catch the mistake. Nice. > I was also thinking about adding validations for the enum/define > values (config group, flags, guc context), but that requires a > somewhat fragile extraction code, and I decided to leave that out for > now. I'd say no on this one, which would mean going back-and-forth with the input. If these fail due to a compilation failure, it's not that bad either, it would be easy to pin-point that the failures are related to the values of the GUC, not the overall shape of the data in the input file. Saying that, it would depend on how much complexity this adds and what kind of validation we'd get out of it. If the additions to the script are simple like the ones you are proposing here, that would be probably OK if these improvements catch a bunch of ground-breaking inconsistencies. If the whole script would need to be refactored, probably not. > What do you think about these changes? Such sanity checks can save from stupid mistakes, for new or even seasoned developers of the tree. When I worked on the wait event parsing scripts or even the query jumbling for DDLs, I have noticed that checking ahead the shape of the input files to make sure that correct C code is produced may not catch all the mistakes, but catching most of them is more productive than catching none of them. The GUC script is new as of HEAD, so it seems more useful to do such adjustments now. -- Michael
Вложения
> On Mar 15, 2026, at 20:18, Zsolt Parragi <zsolt.parragi@percona.com> wrote:
>
> Hello
>
> While reviewing a patch, I noticed a typo in guc_params.dat. The code
> compiled and seemingly worked, and I was very surprised that the
> generator script didn't catch the mistake.
>
> I looked into it, and I found several missing checks in
> gen_guc_tables. I attached fixes for 4 that I think would definitely
> improve the script (for now as separate patches, so it is easy to
> select only some of them):
>
> * 0001 fixes the issue that started this, it validates the allowed
> field names, preventing typos in their names
> * 0002 goes a step further and validates that fields specific to some
> types can only appear for those types
> * 0003 just improves the error reported by duplicate names, previously
> this was confusing (it referred to incorrect ordering)
> * 0004 adds basic checks about allowed characters in GUC names
>
> I was also thinking about adding validations for the enum/define
> values (config group, flags, guc context), but that requires a
> somewhat fragile extraction code, and I decided to leave that out for
> now.
>
> What do you think about these changes?
>
<0001-gen_guc_tables-reject-unrecognized-field-names-in-gu.patch><0002-gen_guc_tables-reject-type-inappropriate-fields-in-g.patch><0003-gen_guc_tables-report-duplicate-entry-names-distinct.patch><0004-gen_guc_tables-validate-GUC-name-format.patch>
I reviewed the patch and played a little bit. Overall looks good to me.
My only comment is on 0004:
```
+ unless ($entry->{name} =~ /^[a-zA-Z][a-zA-Z0-9_]*$/)
+ {
+ die sprintf(
+ qq{%s:%d: error: entry name "%s" is not a valid GUC name (must start with a letter, contain only letters,
digits,and underscores)\n},
+ $input_fname, $entry->{line_number}, $entry->{name});
+ }
+
```
I think dot is allowed in a GUC name. Looking at the C code:
```
/*
* Decide whether a proposed custom variable name is allowed.
*
* It must be two or more identifiers separated by dots, where the rules
* for what is an identifier agree with scan.l. (If you change this rule,
* adjust the errdetail in assignable_custom_variable_name().)
*/
static bool
valid_custom_variable_name(const char *name)
```
A GUC name can be dot separated multiple identifiers, the rule this patch used applies to identifiers.
Thought current guc_parameters.dat doesn’t have any GUC whose name contains a dot, but dot should be allowed. For
example,I tried on master branch with adding “.test” to an existing GUC name:
```
{ name => 'debug_print_raw_parse.test', type => 'bool', context => 'PGC_USERSET', group => 'LOGGING_WHAT',
short_desc => 'Logs each query\'s raw parse tree.',
variable => 'Debug_print_raw_parse',
boot_val => 'false',
},
```
And it works:
```
evantest=# set debug_print_raw_parse.test = 1;
SET
```
BTW, I have similar patch [1] that verify duplicate GUC enum values.
[1] https://www.postgresql.org/message-id/CAEoWx2nAKPx1N1VzKVHjtwqP%2Bs%3DyMR%2BFdmrh44uVNODg--T03w%40mail.gmail.com
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Sun, Mar 15, 2026 at 10:27 PM Michael Paquier <michael@paquier.xyz> wrote: > I'd say no on this one, which would mean going back-and-forth with the > input. If these fail due to a compilation failure, it's not that bad > either, it would be easy to pin-point that the failures are related to > the values of the GUC, not the overall shape of the data in the input > file. Saying that, it would depend on how much complexity this adds > and what kind of validation we'd get out of it. If the additions to > the script are simple like the ones you are proposing here, that would > be probably OK if these improvements catch a bunch of ground-breaking > inconsistencies. If the whole script would need to be refactored, > probably not. We can have two kinds of errors: 1. A typo in an enum name - this will result in a compilation failure most likely, and the compiler will provide nice error messages with the fix suggestion (e.g. GUC_NOT_IN_SAMPLE -> GUC_NOT_NI_SAMPLE, or PGC_SUSET -> PGC_SUSTE) 2. a mismatched enum because of a copy paste mistake (group says PGC_SUSET instead of WAL_ARCHIVING, or boot_val says WAL_ARCHIVING instead of ARCHIVE_MODE_OFF). The examples I provided there aren't likely copy-paste errors, that one is when you copy-paste an existing enum guc and forget to change its boot value. (2) currently compiles without warnings in both cases, but it is also usually easier to spot during review. The patch itself would be well contained: one or two utility functions (GUC_NOT_IN_SAMPLE and other flags are defines, not enums), and then calling them at a few places. So it's not a significant refactoring. But I'm also not a fan of parsing C source code in perl, that's what I meant by fragile. It seems to work, but I wouldn't guarantee that it can properly parse all edge cases. Thinking about this more, these checks would better be implemented as a custom clang-tidy check for example, which I already started to look into for another issue. On Mon, Mar 16, 2026 at 3:13 AM Chao Li <li.evan.chao@gmail.com> wrote: > I think dot is allowed in a GUC name. Looking at the C code: Yes, but by convention dot is only used by extensions to mark the prefix/namespace of the extension (<extension_name>.<guc_name>). None of the core server GUCs use it. The point of this script isn't to ensure that the generated C code will compile, but to prevent hidden errors and ensure we follow existing proper coding conventions. > BTW, I have similar patch [1] that verify duplicate GUC enum values. I forgotten about that, thanks for the reminder! I didn't like the runtime check approach of it, and I think there are more exceptions than the whitelisted items in the patch, but adding the ability to mark a guc value list unique could be another good candidate for a custom clang-tidy check. (assuming people like my clang-tidy idea, which I didn't even start a thread about yet)
On Mon, Mar 16, 2026 at 05:57:39AM +0000, Zsolt Parragi wrote: > Yes, but by convention dot is only used by extensions to mark the > prefix/namespace of the extension (<extension_name>.<guc_name>). None > of the core server GUCs use it. The point of this script isn't to > ensure that the generated C code will compile, but to prevent hidden > errors and ensure we follow existing proper coding conventions. I am not sure that it is a good idea to enforce that in the script this way, to be honest. There may be a point about this new rule being annoying for forks of the core code, at least, where they would like to add their own parameters with dots in the names. I have done that in the past, as one example, and I am sure that there are projects out there that do so, meaning the requirement to live with one more custom patch to make things work if this rule is enforced in gen_guc_tables.pl. Saying that, as far as I can see 0001 and 0003 will save some time when inserting some incorrect data. The case of duplicated names was indeed confusing, and 0003 can save from typos when specifying non-mandatory fields. So I have applied these two. 0004 has some value here, to save the .dat from copy-paste bloat. I am wondering what others think about it. -- Michael