Обсуждение: gen_guc_tables improvements

Поиск
Список
Период
Сортировка

gen_guc_tables improvements

От
Zsolt Parragi
Дата:
Hello

While reviewing a patch, I noticed a typo in guc_params.dat. The code
compiled and seemingly worked, and I was very surprised that the
generator script didn't catch the mistake.

I looked into it, and I found several missing checks in
gen_guc_tables. I attached fixes for 4 that I think would definitely
improve the script (for now as separate patches, so it is easy to
select only some of them):

* 0001 fixes the issue that started this, it validates the allowed
field names, preventing typos in their names
* 0002 goes a step further and validates that fields specific to some
types can only appear for those types
* 0003 just improves the error reported by duplicate names, previously
this was confusing (it referred to incorrect ordering)
* 0004 adds basic checks about allowed characters in GUC names

I was also thinking about adding validations for the enum/define
values (config group, flags, guc context), but that requires a
somewhat fragile extraction code, and I decided to leave that out for
now.

What do you think about these changes?

Вложения

Re: gen_guc_tables improvements

От
Michael Paquier
Дата:
On Sun, Mar 15, 2026 at 12:18:53PM +0000, Zsolt Parragi wrote:
> While reviewing a patch, I noticed a typo in guc_params.dat. The code
> compiled and seemingly worked, and I was very surprised that the
> generator script didn't catch the mistake.

Nice.

> I was also thinking about adding validations for the enum/define
> values (config group, flags, guc context), but that requires a
> somewhat fragile extraction code, and I decided to leave that out for
> now.

I'd say no on this one, which would mean going back-and-forth with the
input.  If these fail due to a compilation failure, it's not that bad
either, it would be easy to pin-point that the failures are related to
the values of the GUC, not the overall shape of the data in the input
file.  Saying that, it would depend on how much complexity this adds
and what kind of validation we'd get out of it.  If the additions to
the script are simple like the ones you are proposing here, that would
be probably OK if these improvements catch a bunch of ground-breaking
inconsistencies.  If the whole script would need to be refactored,
probably not.

> What do you think about these changes?

Such sanity checks can save from stupid mistakes, for new or even
seasoned developers of the tree.  When I worked on the wait event
parsing scripts or even the query jumbling for DDLs, I have noticed
that checking ahead the shape of the input files to make sure that
correct C code is produced may not catch all the mistakes, but
catching most of them is more productive than catching none of them.
The GUC script is new as of HEAD, so it seems more useful to do such
adjustments now.
--
Michael

Вложения

Re: gen_guc_tables improvements

От
Chao Li
Дата:

> On Mar 15, 2026, at 20:18, Zsolt Parragi <zsolt.parragi@percona.com> wrote:
>
> Hello
>
> While reviewing a patch, I noticed a typo in guc_params.dat. The code
> compiled and seemingly worked, and I was very surprised that the
> generator script didn't catch the mistake.
>
> I looked into it, and I found several missing checks in
> gen_guc_tables. I attached fixes for 4 that I think would definitely
> improve the script (for now as separate patches, so it is easy to
> select only some of them):
>
> * 0001 fixes the issue that started this, it validates the allowed
> field names, preventing typos in their names
> * 0002 goes a step further and validates that fields specific to some
> types can only appear for those types
> * 0003 just improves the error reported by duplicate names, previously
> this was confusing (it referred to incorrect ordering)
> * 0004 adds basic checks about allowed characters in GUC names
>
> I was also thinking about adding validations for the enum/define
> values (config group, flags, guc context), but that requires a
> somewhat fragile extraction code, and I decided to leave that out for
> now.
>
> What do you think about these changes?
>
<0001-gen_guc_tables-reject-unrecognized-field-names-in-gu.patch><0002-gen_guc_tables-reject-type-inappropriate-fields-in-g.patch><0003-gen_guc_tables-report-duplicate-entry-names-distinct.patch><0004-gen_guc_tables-validate-GUC-name-format.patch>

I reviewed the patch and played a little bit. Overall looks good to me.

My only comment is on 0004:
```
+    unless ($entry->{name} =~ /^[a-zA-Z][a-zA-Z0-9_]*$/)
+    {
+        die sprintf(
+            qq{%s:%d: error: entry name "%s" is not a valid GUC name (must start with a letter, contain only letters,
digits,and underscores)\n}, 
+            $input_fname, $entry->{line_number}, $entry->{name});
+    }
+
```

I think dot is allowed in a GUC name. Looking at the C code:
```
/*
* Decide whether a proposed custom variable name is allowed.
*
* It must be two or more identifiers separated by dots, where the rules
* for what is an identifier agree with scan.l. (If you change this rule,
* adjust the errdetail in assignable_custom_variable_name().)
*/
static bool
valid_custom_variable_name(const char *name)
```

A GUC name can be dot separated multiple identifiers, the rule this patch used applies to identifiers.

Thought current guc_parameters.dat doesn’t have any GUC whose name contains a dot, but dot should be allowed. For
example,I tried on master branch with adding “.test” to an existing GUC name: 
```
{ name => 'debug_print_raw_parse.test', type => 'bool', context => 'PGC_USERSET', group => 'LOGGING_WHAT',
  short_desc => 'Logs each query\'s raw parse tree.',
  variable => 'Debug_print_raw_parse',
  boot_val => 'false',
},
```

And it works:
```
evantest=# set debug_print_raw_parse.test = 1;
SET
```

BTW, I have similar patch [1] that verify duplicate GUC enum values.

[1] https://www.postgresql.org/message-id/CAEoWx2nAKPx1N1VzKVHjtwqP%2Bs%3DyMR%2BFdmrh44uVNODg--T03w%40mail.gmail.com


Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/






Re: gen_guc_tables improvements

От
Zsolt Parragi
Дата:
On Sun, Mar 15, 2026 at 10:27 PM Michael Paquier <michael@paquier.xyz> wrote:

> I'd say no on this one, which would mean going back-and-forth with the
> input.  If these fail due to a compilation failure, it's not that bad
> either, it would be easy to pin-point that the failures are related to
> the values of the GUC, not the overall shape of the data in the input
> file.  Saying that, it would depend on how much complexity this adds
> and what kind of validation we'd get out of it.  If the additions to
> the script are simple like the ones you are proposing here, that would
> be probably OK if these improvements catch a bunch of ground-breaking
> inconsistencies.  If the whole script would need to be refactored,
> probably not.

We can have two kinds of errors:

1. A typo in an enum name - this will result in a compilation failure
most likely, and the compiler will provide nice error messages with
the fix suggestion (e.g. GUC_NOT_IN_SAMPLE -> GUC_NOT_NI_SAMPLE, or
PGC_SUSET -> PGC_SUSTE)
2. a mismatched enum because of a copy paste mistake (group says
PGC_SUSET instead of WAL_ARCHIVING, or boot_val says WAL_ARCHIVING
instead of ARCHIVE_MODE_OFF). The examples I provided there aren't
likely copy-paste errors, that one is when you copy-paste an existing
enum guc and forget to change its boot value.

(2) currently compiles without warnings in both cases, but it is also
usually easier to spot during review.

The patch itself would be well contained: one or two utility functions
(GUC_NOT_IN_SAMPLE and other flags are defines, not enums), and then
calling them at a few places. So it's not a significant refactoring.

But I'm also not a fan of parsing C source code in perl, that's what I
meant by fragile. It seems to work, but I wouldn't guarantee that it
can properly parse all edge cases. Thinking about this more, these
checks would better be implemented as a custom clang-tidy check for
example, which I already started to look into for another issue.



On Mon, Mar 16, 2026 at 3:13 AM Chao Li <li.evan.chao@gmail.com> wrote:

> I think dot is allowed in a GUC name. Looking at the C code:

Yes, but by convention dot is only used by extensions to mark the
prefix/namespace of the extension (<extension_name>.<guc_name>). None
of the core server GUCs use it. The point of this script isn't to
ensure that the generated C code will compile, but to prevent hidden
errors and ensure we follow existing proper coding conventions.

> BTW, I have similar patch [1] that verify duplicate GUC enum values.

I forgotten about that, thanks for the reminder! I didn't like the
runtime check approach of it, and I think there are more exceptions
than the whitelisted items in the patch, but adding the ability to
mark a guc value list unique could be another good candidate for a
custom clang-tidy check. (assuming people like my clang-tidy idea,
which I didn't even start a thread about yet)



Re: gen_guc_tables improvements

От
Michael Paquier
Дата:
On Mon, Mar 16, 2026 at 05:57:39AM +0000, Zsolt Parragi wrote:
> Yes, but by convention dot is only used by extensions to mark the
> prefix/namespace of the extension (<extension_name>.<guc_name>). None
> of the core server GUCs use it. The point of this script isn't to
> ensure that the generated C code will compile, but to prevent hidden
> errors and ensure we follow existing proper coding conventions.

I am not sure that it is a good idea to enforce that in the script
this way, to be honest.  There may be a point about this new rule
being annoying for forks of the core code, at least, where they would
like to add their own parameters with dots in the names.  I have done
that in the past, as one example, and I am sure that there are
projects out there that do so, meaning the requirement to live with
one more custom patch to make things work if this rule is enforced in
gen_guc_tables.pl.

Saying that, as far as I can see 0001 and 0003 will save some time
when inserting some incorrect data.  The case of duplicated names was
indeed confusing, and 0003 can save from typos when specifying
non-mandatory fields.  So I have applied these two.

0004 has some value here, to save the .dat from copy-paste bloat.  I
am wondering what others think about it.
--
Michael

Вложения