Обсуждение: Re: Coccinelle for PostgreSQL development [1/N]: coccicheck.py
On 2025-01-07 Tu 2:44 PM, Mats Kindahl wrote: > I got some time over during the holidays, so I spent some of it > doing something I've been thinking about for a while. > > For those of you that are not aware of it: Coccinelle is a tool for > pattern matching and text transformation for C code and can be used > for detection of problematic programming patterns and to make complex, > tree-wide patches easy. It is aware of the structure of C code and is > better suited to make complicated changes than what is possible using > normal text substitution tools like Sed and Perl. > > Coccinelle have been successfully been used in the Linux project since > 2008 and is now an established tool for Linux development and a large > number of semantic patches have been added to the source tree to > capture everything from generic issues (like eliminating the redundant > A in expressions like "!A || (A && B)") to more Linux-specific > problems like adding a missing call to kfree(). > > Although PostgreSQL is nowhere the size of the Linux kernel, it is > nevertheless of a significant size and would benefit from > incorporating Coccinelle into the development. I noticed it's been > used in a few cases way back (like 10 years back) to fix issues in the > PostgreSQL code, but I thought it might be useful to make it part of > normal development practice to, among other things: > > - Identify and correct bugs in the source code both during development > and review. > - Make large-scale changes to the source tree to improve the code > based on new insights. > - Encode and enforce APIs by ensuring that function calls are used > correctly. > - Use improved coding patterns for more efficient code. > - Allow extensions to automatically update code for later PostgreSQL > versions. > > To that end, I created a series of patches to show how it could be > used in the PostgreSQL tree. It is a lot easier to discuss concrete > code and I split it up into separate messages since that makes it > easier to discuss each individual patch. The series contains code to > make it easy to work with Coccinelle during development and reviews, > as well as examples of semantic patches that capture problems, > demonstrate how to make large-scale changes, how to enforce APIs, and > also improve some coding patterns. > > This first patch contains the coccicheck.py script, which is a > re-implementation of the coccicheck script that the Linux kernel uses. > We cannot immediately use the coccicheck script since it is quite > closely tied to the Linux source code tree and we need to have > something that both supports autoconf and Meson. Since Python seems to > be used more and more in the tree, it seems to be the most natural > choice. (I have no strong opinion on what language to use, but think > it would be good to have something that is as platform-independent as > possible.) > > The intention is that we should be able to use the Linux semantic > patches directly, so it supports the "Requires" and "Options" > keywords, which can be used to require a specific version of spatch(1) > and add options to the execution of that semantic patch, respectively. > Please don't start multiple threads like this. If you want to submit a set of patches for a single feature, send them all as attachments in a single email. Otherwise this just makes life hard for threading email readers. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On 2025-01-07 Tu 2:44 PM, Mats Kindahl wrote:
> I got some time over during the holidays, so I spent some of it
> doing something I've been thinking about for a while.
>
> For those of you that are not aware of it: Coccinelle is a tool for
> pattern matching and text transformation for C code and can be used
> for detection of problematic programming patterns and to make complex,
> tree-wide patches easy. It is aware of the structure of C code and is
> better suited to make complicated changes than what is possible using
> normal text substitution tools like Sed and Perl.
>
> Coccinelle have been successfully been used in the Linux project since
> 2008 and is now an established tool for Linux development and a large
> number of semantic patches have been added to the source tree to
> capture everything from generic issues (like eliminating the redundant
> A in expressions like "!A || (A && B)") to more Linux-specific
> problems like adding a missing call to kfree().
>
> Although PostgreSQL is nowhere the size of the Linux kernel, it is
> nevertheless of a significant size and would benefit from
> incorporating Coccinelle into the development. I noticed it's been
> used in a few cases way back (like 10 years back) to fix issues in the
> PostgreSQL code, but I thought it might be useful to make it part of
> normal development practice to, among other things:
>
> - Identify and correct bugs in the source code both during development
> and review.
> - Make large-scale changes to the source tree to improve the code
> based on new insights.
> - Encode and enforce APIs by ensuring that function calls are used
> correctly.
> - Use improved coding patterns for more efficient code.
> - Allow extensions to automatically update code for later PostgreSQL
> versions.
>
> To that end, I created a series of patches to show how it could be
> used in the PostgreSQL tree. It is a lot easier to discuss concrete
> code and I split it up into separate messages since that makes it
> easier to discuss each individual patch. The series contains code to
> make it easy to work with Coccinelle during development and reviews,
> as well as examples of semantic patches that capture problems,
> demonstrate how to make large-scale changes, how to enforce APIs, and
> also improve some coding patterns.
>
> This first patch contains the coccicheck.py script, which is a
> re-implementation of the coccicheck script that the Linux kernel uses.
> We cannot immediately use the coccicheck script since it is quite
> closely tied to the Linux source code tree and we need to have
> something that both supports autoconf and Meson. Since Python seems to
> be used more and more in the tree, it seems to be the most natural
> choice. (I have no strong opinion on what language to use, but think
> it would be good to have something that is as platform-independent as
> possible.)
>
> The intention is that we should be able to use the Linux semantic
> patches directly, so it supports the "Requires" and "Options"
> keywords, which can be used to require a specific version of spatch(1)
> and add options to the execution of that semantic patch, respectively.
>
Please don't start multiple threads like this. If you want to submit a
set of patches for a single feature, send them all as attachments in a
single email. Otherwise this just makes life hard for threading email
readers.
Hi, > My apologies, I thought this would make it easier to discuss and review the code. I will send a single email in the future. > > Should I resend this as a single email with all the patches? IMO the best solution would be re-submitting all the patches to this thread. Also please make sure the patchset is registered on the nearest open CF [1] This will ensure that the patchset is built on our CI (aka cfbot [2]) and will not be lost. [1]: https://commitfest.postgresql.org/ [2]: http://cfbot.cputube.org/ -- Best regards, Aleksander Alekseev
On Sat, Jan 18, 2025 at 08:44:00PM +0100, Mats Kindahl wrote:
> For PostgreSQL 16, Peter extended the palloc()/pg_malloc() interface in
> commit 2016055a92f to provide more type-safety, but these functions are not
> widely used. This semantic patch captures and replaces all uses of palloc()
> where palloc_array() or palloc_object() could be used instead. It
> deliberately does not touch cases where it is not clear that the
> replacement can be done.
I am not sure how much a dependency to coccicheck would cost (usually
such changes should require a case-by-case analysis rather than a
blind automation), but palloc_array() and palloc_object() are
available down to v13.
Based on this argument, it would be tempting to apply this rule
across the stable branches to reduce conflict churn. However this is
an improvement in readability, like the talloc() things as Peter has
mentioned, hence it should be a HEAD-only thing.
I do like the idea
of forcing more the object-palloc style on HEAD in the tree in some
areas of the code, even if it would come with some backpatching cost
for existing code.
Thoughts? Perhaps this has been discussed previously?
On Tue, Jan 14, 2025 at 4:19 PM Aleksander Alekseev <aleksander@timescale.com> wrote:IMO the best solution would be re-submitting all the patches to this
thread. Also please make sure the patchset is registered on the
nearest open CF [1] This will ensure that the patchset is built on our
CI (aka cfbot [2]) and will not be lost.
[1]: https://commitfest.postgresql.org/
[2]: http://cfbot.cputube.org/
ninja coccicheck-patch | patch -d .. -p1 && ninja
For those of you that are not aware of it: Coccinelle is a tool for pattern matching and text transformation for C code and can be used for detection of problematic programming patterns and to make complex, tree-wide patches easy. It is aware of the structure of C code and is better suited to make complicated changes than what is possible using normal text substitution tools like Sed and Perl. I've noticed it's been used at a few cases way back to fix issues.[1]
Coccinelle have been successfully been used in the Linux project since 2008 and is now an established tool for Linux development and a large number of semantic patches have been added to the source tree to capture everything from generic issues (like eliminating the redundant A in expressions like "!A || (A && B)") to more Linux-specific problems like adding a missing call to kfree().
Although PostgreSQL is nowhere the size of the Linux kernel, it is nevertheless of a significant size and would benefit from incorporating Coccinelle into the development. I noticed it's been used in a few cases way back (like 10 years back) to fix issues in the PostgreSQL code, but I thought it might be useful to make it part of normal development practice to, among other things:
- Identify and correct bugs in the source code both during development and review.
- Make large-scale changes to the source tree to improve the code based on new insights.
- Encode and enforce APIs by ensuring that function calls are used correctly.
- Use improved coding patterns for more efficient code.
- Allow extensions to automatically update code for later PostgreSQL versions.
To that end, I created a series of patches to show how it could be used in the PostgreSQL tree. It is a lot easier to discuss concrete code and I split it up into separate messages since that makes it easier to discuss each individual patch. The series contains code to make it easy to work with Coccinelle during development and reviews, as well as examples of semantic patches that capture problems, demonstrate how to make large-scale changes, how to enforce APIs, and also improve some coding patterns.
The first three patches contain the coccicheck.py script and the integration with the build system (both Meson and Autoconf).
# Coccicheck Script
It is a re-implementation of the coccicheck script that the Linux kernel uses. We cannot immediately use the coccicheck script since it is quite closely tied to the Linux source code tree and we need to have something that both supports Autoconf and Meson. Since Python seems to be used more and more in the tree, it seems to be the most natural choice. (I have no strong opinion on what language to use, but think it would be good to have something that is as platform-independent as possible.)
The intention is that we should be able to use the Linux semantic patches directly, so it supports the "Requires" and "Options" keywords, which can be used to require a specific version of spatch(1) and add options to the execution of that semantic patch, respectively.
# Autoconf support
The changes to Autoconf modifies the configure.ac and related files (in particular Makefile.global.in). At this point, I have deliberately not added support for pgxs so extensions cannot use coccicheck through the PostgreSQL installation. This is something that we can add later though.
The semantic patches are expected to live in cocci/ directory under the root and the patch uses the pattern cocci/**/*.cocci to find all semantic patches. Right now there are no subdirectories for the semantic patches, but this might be something we want to add to create different categories of scripts.
The coccicheck target is used in the same way as for the Linux kernel, that is, to generate and apply all patches suggested by the semantic patches, you type:
make coccicheck MODE=patch | patch -p1
Linux as support for a few more variables: V to set the verbosity, J to use multiple jobs for processing the semantic patches, M to select a different directory to apply the semantic patches to, and COCCI to use a single specific semantic patch rather than all available. I have not added support for this right now, but if you think this is valuable, it should be straightforward to add.
I used autoconf 2.69, as mentioned in configure.ac, but that generate a bigger diff than I expected. Any advice here is welcome.
make coccicheck MODE=patch JOBS=4 | patch -p1
# Meson Support
The support for Meson is done by adding three coccicheck targets: one for each mode. To apply all patches suggested by the semantic patches using ninja (as is done in [2]), you type the following in the build directory generated by Meson (e.g., the "build/" subdirectory).
ninja coccicheck-patch | patch -p1 -d ..
If you want to pass other flags you have to set the SPFLAGS environment variable when calling ninja:
SPFLAGS=--debug ninja coccicheck-report
JOBS=4 ninja coccicheck-patch | patch -d .. -p1
# Semantic Patch: Wrong type for palloc()
This is the first example of a semantic patch and shows how to capture and fix a common problem.
If you use an palloc() to allocate memory for an object (or an array of objects) and by mistake type something like:
StringInfoData *info = palloc(sizeof(StringInfoData*));
You will not allocate enough memory for storing the object. This semantic patch catches any cases where you are either allocating an array of objects or a single object that do not have corret types in this sense, more precisely, it captures assignments to a variable of type T* where palloc() uses sizeof(T) either alone or with a single expression (assuming this is an array count).
The semantic patch is overzealous in the sense that using the wrong typedef will suggest a change (this can be seen in the patch). Although the sizes of these are the same, it is probably be better to just follow the convention of always using the type "T*" for any "palloc(sizeof(T))" since the typedef can change at any point and would then introduce a bug. Coccicheck can easily fix this for you, so it is straightforward to enforce this. It also simplifies other automated checking to follow this convention.
We don't really have any real bugs as a result from this, but we have one case where an allocation of "sizeof(LLVMBasicBlockRef*)" is allocated to an "LLVMBasicBlockRef*", which strictly speaking is not correct (it should be "sizeof(LLVMBasicBlockRef)"). However, since they are both pointers, there is no risk of incorrect allocation size. One typedef usage that does not match.
# Semantic Patch: Introduce palloc_array() and palloc_object() where possible
This is an example of a large-scale refactoring to improve the code.
For PostgreSQL 16, Peter extended the palloc()/pg_malloc() interface in commit 2016055a92f to provide more type-safety, but these functions are not widely used. This semantic patch captures and replaces all uses of palloc() where palloc_array() or palloc_object() could be used instead. It deliberately does not touch cases where it is not clear that the replacement can be done.
For example, this code:
StringInfo info = makeStringInfo();
...
appendStringInfo(info, ...);
...
return do_stuff(..., info->data, ...);
Can be replaced with:
StringInfoData info;
initStringInfo(&info);
...
appendStringInfo(&info, ...);
...
return do_stuff(..., info.data, ...);
It does not do a replacement in these cases:
- If the variable is assigned to an expression. In this case, the pointer can "leak" outside the function either through a global variable or a parameter assignment.
 - If an assignment is done to the expression. This cannot leak the data, but could mean a value-assignment of a structure, so we avoid this case.
 - If the pointer is returned.
 
Вложения
- 0005-Semantic-patch-for-palloc_array-and-palloc_object.v3.patch
 - 0003-Add-meson-build-for-coccicheck.v3.patch
 - 0004-Semantic-patch-for-sizeof-using-palloc.v3.patch
 - 0006-Semantic-patch-for-pg_cmp_-functions.v3.patch
 - 0007-Semantic-patch-to-use-stack-allocated-StringInfoData.v3.patch
 - 0001-Add-initial-coccicheck-script.v3.patch
 - 0002-Create-coccicheck-target-for-autoconf.v3.patch
 
On Sat, Jan 18, 2025 at 8:44 PM Mats Kindahl <mats@timescale.com> wrote:On Tue, Jan 14, 2025 at 4:19 PM Aleksander Alekseev <aleksander@timescale.com> wrote:IMO the best solution would be re-submitting all the patches to this
thread. Also please make sure the patchset is registered on the
nearest open CF [1] This will ensure that the patchset is built on our
CI (aka cfbot [2]) and will not be lost.
[1]: https://commitfest.postgresql.org/
[2]: http://cfbot.cputube.org/Hi all,Here is a new set of patches rebased on the latest version of the Postgres. I decided to just include the semantic patches in each patch since it is trivial to generate the patch and build using:ninja coccicheck-patch | patch -d .. -p1 && ninjaI repeat the description from the previous patch set and add comments where things have changed, but I have also added two semantic patches, which are described last.For those of you that are not aware of it: Coccinelle is a tool for pattern matching and text transformation for C code and can be used for detection of problematic programming patterns and to make complex, tree-wide patches easy. It is aware of the structure of C code and is better suited to make complicated changes than what is possible using normal text substitution tools like Sed and Perl. I've noticed it's been used at a few cases way back to fix issues.[1]
Coccinelle have been successfully been used in the Linux project since 2008 and is now an established tool for Linux development and a large number of semantic patches have been added to the source tree to capture everything from generic issues (like eliminating the redundant A in expressions like "!A || (A && B)") to more Linux-specific problems like adding a missing call to kfree().
Although PostgreSQL is nowhere the size of the Linux kernel, it is nevertheless of a significant size and would benefit from incorporating Coccinelle into the development. I noticed it's been used in a few cases way back (like 10 years back) to fix issues in the PostgreSQL code, but I thought it might be useful to make it part of normal development practice to, among other things:
- Identify and correct bugs in the source code both during development and review.
- Make large-scale changes to the source tree to improve the code based on new insights.
- Encode and enforce APIs by ensuring that function calls are used correctly.
- Use improved coding patterns for more efficient code.
- Allow extensions to automatically update code for later PostgreSQL versions.
To that end, I created a series of patches to show how it could be used in the PostgreSQL tree. It is a lot easier to discuss concrete code and I split it up into separate messages since that makes it easier to discuss each individual patch. The series contains code to make it easy to work with Coccinelle during development and reviews, as well as examples of semantic patches that capture problems, demonstrate how to make large-scale changes, how to enforce APIs, and also improve some coding patterns.
The first three patches contain the coccicheck.py script and the integration with the build system (both Meson and Autoconf).
# Coccicheck Script
It is a re-implementation of the coccicheck script that the Linux kernel uses. We cannot immediately use the coccicheck script since it is quite closely tied to the Linux source code tree and we need to have something that both supports Autoconf and Meson. Since Python seems to be used more and more in the tree, it seems to be the most natural choice. (I have no strong opinion on what language to use, but think it would be good to have something that is as platform-independent as possible.)
The intention is that we should be able to use the Linux semantic patches directly, so it supports the "Requires" and "Options" keywords, which can be used to require a specific version of spatch(1) and add options to the execution of that semantic patch, respectively.I have added support for using multiple jobs similar to how "make -jN" works. This is also supported by the autoconf and ninja builds# Autoconf support
The changes to Autoconf modifies the configure.ac and related files (in particular Makefile.global.in). At this point, I have deliberately not added support for pgxs so extensions cannot use coccicheck through the PostgreSQL installation. This is something that we can add later though.
The semantic patches are expected to live in cocci/ directory under the root and the patch uses the pattern cocci/**/*.cocci to find all semantic patches. Right now there are no subdirectories for the semantic patches, but this might be something we want to add to create different categories of scripts.
The coccicheck target is used in the same way as for the Linux kernel, that is, to generate and apply all patches suggested by the semantic patches, you type:
make coccicheck MODE=patch | patch -p1
Linux as support for a few more variables: V to set the verbosity, J to use multiple jobs for processing the semantic patches, M to select a different directory to apply the semantic patches to, and COCCI to use a single specific semantic patch rather than all available. I have not added support for this right now, but if you think this is valuable, it should be straightforward to add.
I used autoconf 2.69, as mentioned in configure.ac, but that generate a bigger diff than I expected. Any advice here is welcome.Using the parameter "JOBS" allow you to use multiple jobs, e.g.:make coccicheck MODE=patch JOBS=4 | patch -p1
# Meson Support
The support for Meson is done by adding three coccicheck targets: one for each mode. To apply all patches suggested by the semantic patches using ninja (as is done in [2]), you type the following in the build directory generated by Meson (e.g., the "build/" subdirectory).
ninja coccicheck-patch | patch -p1 -d ..
If you want to pass other flags you have to set the SPFLAGS environment variable when calling ninja:
SPFLAGS=--debug ninja coccicheck-reportIf you want to use multiple jobs, you use something like this:JOBS=4 ninja coccicheck-patch | patch -d .. -p1--# Semantic Patch: Wrong type for palloc()
This is the first example of a semantic patch and shows how to capture and fix a common problem.
If you use an palloc() to allocate memory for an object (or an array of objects) and by mistake type something like:
StringInfoData *info = palloc(sizeof(StringInfoData*));
You will not allocate enough memory for storing the object. This semantic patch catches any cases where you are either allocating an array of objects or a single object that do not have corret types in this sense, more precisely, it captures assignments to a variable of type T* where palloc() uses sizeof(T) either alone or with a single expression (assuming this is an array count).
The semantic patch is overzealous in the sense that using the wrong typedef will suggest a change (this can be seen in the patch). Although the sizes of these are the same, it is probably be better to just follow the convention of always using the type "T*" for any "palloc(sizeof(T))" since the typedef can change at any point and would then introduce a bug. Coccicheck can easily fix this for you, so it is straightforward to enforce this. It also simplifies other automated checking to follow this convention.
We don't really have any real bugs as a result from this, but we have one case where an allocation of "sizeof(LLVMBasicBlockRef*)" is allocated to an "LLVMBasicBlockRef*", which strictly speaking is not correct (it should be "sizeof(LLVMBasicBlockRef)"). However, since they are both pointers, there is no risk of incorrect allocation size. One typedef usage that does not match.
# Semantic Patch: Introduce palloc_array() and palloc_object() where possible
This is an example of a large-scale refactoring to improve the code.
For PostgreSQL 16, Peter extended the palloc()/pg_malloc() interface in commit 2016055a92f to provide more type-safety, but these functions are not widely used. This semantic patch captures and replaces all uses of palloc() where palloc_array() or palloc_object() could be used instead. It deliberately does not touch cases where it is not clear that the replacement can be done.# Semantic Patch: replace code with pg_cmp_*This is an example of a large-scale refactoring to improve the code.In commit 3b42bdb4716 and 6b80394781c overflow-safe comparison functions were introduced, but they are not widely used. This semantic patch identifies some of the more common cases and replaces them with calls to the corresponding pg_cmp_* function.The patches give a few instructions of improvement in performance when checking with Godbolt. It's not much, but since it is so easy to apply them, it might still be worthwhile.# Semantic Patch: Replace dynamic allocation of StringInfo with StringInfoDataUse improved coding patterns for more efficient code.This semantic patch replaces uses of StringInfo with StringInfoData where the info is dynamically allocated but (optionally) freed at the end of the block. This will avoid one dynamic allocation that otherwise has to be dealt with.
For example, this code:
StringInfo info = makeStringInfo();
...
appendStringInfo(info, ...);
...
return do_stuff(..., info->data, ...);
Can be replaced with:
StringInfoData info;
initStringInfo(&info);
...
appendStringInfo(&info, ...);
...
return do_stuff(..., info.data, ...);
It does not do a replacement in these cases:
- If the variable is assigned to an expression. In this case, the pointer can "leak" outside the function either through a global variable or a parameter assignment.
 - If an assignment is done to the expression. This cannot leak the data, but could mean a value-assignment of a structure, so we avoid this case.
 - If the pointer is returned.
 The cases that this semantic patch fixed when I uploaded the first version of the other patches seems to have been dealt with, but having it as part of the code base prevents such cases from surfacing again.Best wishes,Mats Kindahl, Timescale
Вложения
- 0001-Add-initial-coccicheck-script.v4.patch
 - 0003-Add-meson-build-for-coccicheck.v4.patch
 - 0004-Semantic-patch-for-sizeof-using-palloc.v4.patch
 - 0002-Create-coccicheck-target-for-autoconf.v4.patch
 - 0005-Semantic-patch-for-palloc_array-and-palloc_object.v4.patch
 - 0007-Semantic-patch-to-use-stack-allocated-StringInfoData.v4.patch
 - 0006-Semantic-patch-for-pg_cmp_-functions.v4.patch
 
Hi all, Here is an updated set of patches based on the latest HEAD of PostgreSQL. Best wishes, Mats Kindahl
Вложения
- 0001-Add-initial-coccicheck-script.v5.patch
 - 0002-Create-coccicheck-target-for-autoconf.v5.patch
 - 0003-Add-meson-build-for-coccicheck.v5.patch
 - 0004-Semantic-patch-for-sizeof-using-palloc.v5.patch
 - 0005-Semantic-patch-for-palloc_array-and-palloc_object.v5.patch
 - 0006-Semantic-patch-for-pg_cmp_-functions.v5.patch
 - 0007-Semantic-patch-to-use-stack-allocated-StringInfoData.v5.patch
 
Hi, On Thu, Oct 30, 2025 at 02:32:45PM +0100, Mats Kindahl wrote: > Hi all, > > Here is an updated set of patches based on the latest HEAD of PostgreSQL. Thanks for those patches and the initiative! Sorry to be late, I started to look at Coccinelle "for PostgreSQL" in [1] (to ensure some macros are used) and saw this thread. I did not look at the patches in detail, but sharing some thoughts. I agree that Coccinelle could be/is useful in relation to PostgreSQL development, but I think that we'd need to determine why it would useful to add the coccicheck.py script and the new dependencies in autoconf/meson. Coming back to your points and thinking if it could be used as a tool or integrated into the build system: 1) Identify and correct bugs in the source code both during development and review I agree that makes sense here. Having some bugs detected "automatically" would be great. 2) Make large-scale changes to the source tree to improve the code based on new insights I'm not sure we need to introduce coccicheck.py and add dependencies in autoconf/meson for this. The developer would need to know that he could use the Coccinelle and if he already knows then nothing prevents him from using it in his development tool box. 3) Encode and enforce APIs by ensuring that function calls are used correctly Same as 2) That said we could also imagine running yearly checks automatically too using coccicheck.py. 4) Use improved coding patterns for more efficient code Same as 3) from my point of view. 5) Allow extensions to automatically update code for later PostgreSQL versions Same as 2) from my point of view. So, I think that the current proposal (i.e build system integration) is a good fit for 1), less so for 3) and 4) and not necessarily needed for 2) and 5). The proposal will add new dependencies (as Michael stated up-thread) and introduce a new language (SmPL) that folks would need to be comfortable with to review the .cocci scripts. I don't have an answer to it but I think that the main question is: Should we integrate this into the build system, or just document it as an optional developer tool (wiki or such and provide .cocci scripts example)? [1]: https://www.postgresql.org/message-id/aQMtR/m4kW4Rkul%2B%40ip-10-97-1-34.eu-west-3.compute.internal Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Thu, Oct 30, 2025 at 02:32:45PM +0100, Mats Kindahl wrote:Hi all, Here is an updated set of patches based on the latest HEAD of PostgreSQL.Thanks for those patches and the initiative! Sorry to be late, I started to look at Coccinelle "for PostgreSQL" in [1] (to ensure some macros are used) and saw this thread.
Hi Bertrand and thank you for the comments.
I did not look at the patches in detail, but sharing some thoughts. I agree that Coccinelle could be/is useful in relation to PostgreSQL development, but I think that we'd need to determine why it would useful to add the coccicheck.py script and the new dependencies in autoconf/meson. Coming back to your points and thinking if it could be used as a tool or integrated into the build system: 1) Identify and correct bugs in the source code both during development and review I agree that makes sense here. Having some bugs detected "automatically" would be great. 2) Make large-scale changes to the source tree to improve the code based on new insights I'm not sure we need to introduce coccicheck.py and add dependencies in autoconf/meson for this. The developer would need to know that he could use the Coccinelle and if he already knows then nothing prevents him from using it in his development tool box. 3) Encode and enforce APIs by ensuring that function calls are used correctly Same as 2) That said we could also imagine running yearly checks automatically too using coccicheck.py. 4) Use improved coding patterns for more efficient code Same as 3) from my point of view. 5) Allow extensions to automatically update code for later PostgreSQL versions Same as 2) from my point of view. So, I think that the current proposal (i.e build system integration) is a good fit for 1), less so for 3) and 4) and not necessarily needed for 2) and 5). The proposal will add new dependencies (as Michael stated up-thread) and introduce a new language (SmPL) that folks would need to be comfortable with to review the .cocci scripts. I don't have an answer to it but I think that the main question is: Should we integrate this into the build system, or just document it as an optional developer tool (wiki or such and provide .cocci scripts example)?
All of these are good points. My main reason for proposing this is that I think that using Coccinelle more systematically would improve the quality of the code and make it easier to maintain. The build system changes and the coccicheck.py script is a proposal for how to make this happen, but I realize there are alternative ways as well as tradeoffs to be made (e.g., is the extra maintenance burden worth the improvement in quality?)
I think there are three issues here:
- Shall the semantic patches be in the source tree?
 - Shall we use the coccicheck.py script?
 - Shall we support this in the build system, that is, meson and automake?
 
Putting the coccicheck.py script aside for a short while, I think there is value in keeping the semantic patches in the source tree rather than adding them to a wiki for the following reasons:
It makes it easy for reviewers to check that the code reviewed follows "best practices" (of course, assuming that the .cocci scripts represent "best practices") instead of having to download and run these scripts from another source. Having them on a separate page would reduce the value since it makes it more difficult to use.
If they are part of the repository the .cocci scripts will be maintained and updated to match the code in the source tree. If they are on a separate wiki page, or a different repository, they are likely to be version dependent and code might change so that they are not relevant any more, which makes it more difficult to use them for reviewing and checking code since that would also require updating them to match the new code before applying them.
A drawback is that it is an extra maintenance burden (probably small, but extra work nevertheless) and would require the semantic patches to be executed regularly, by the build system or by reviewers, and check that they do not report anything. If these checks are not done regularly, it is possible that the scripts would become obsolete after a while. By making it easy to run the scripts, we are more likely to run them and discover issues in the scripts as well as in the code.
About integrating it with the build system. Note that there are three modes: report, context, and (generate) patch. The important modes are report and patch.
The main advantage of integrating it with the build system is is that it is easy to run, which makes it easy to ensure that the semantic patches do not report anything for both reviewers, developers, and build systems. This will improve the quality of both the semantic patches and the source code.
As you say, it does add an extra dependency, but this should optional and only be used if spatch is installed. For parties not interested in using spatch, hence does not have spatch installed, it will not be used so it should not interfere with them.
Integrating it with the build system would also allow builders and reviewers to use the same method for checking the code using report mode without regard to what version is being used, you would run the checks the same way for each minor release, for example.
As you say, coccicheck.py script is strictly speaking not needed to use Coccinelle and is more intended as a convenience. The advantages of using something like the coccicheck.py script is that:
The coccicheck.py script reads options from the semantic patches and uses them when running spatch. If using spatch directly, you would have to check the scripts, extract the options, and then use them when running spatch. Not impossible, but an extra step that makes it more inconvenient to use Coccinelle, which in turn makes it less likely to be used and maintained.
You could automate the extraction of options from the semantic patches, but this script provides a platform-independent method to run the checks for reviewers, developers, and build systems. Note that different semantic patches can use different options, so if you want to automate this, you would need to write something like coccicheck.py anyway.
Best wishes,
 Mats Kindahl
[1]: https://www.postgresql.org/message-id/aQMtR/m4kW4Rkul%2B%40ip-10-97-1-34.eu-west-3.compute.internal Regards,