Обсуждение: Use stack-allocated StringInfoData
Hi all,
While working on other things I noted that we have a lot of cases where a StringInfo instance is allocated dynamically even when it is either thrown away or destroyed at the end, which seems unnecessary, that is, instead of using:
StringInfo info = makeStringInfo();
...
appendStringInfo(info, ...);
...
return info->data;
We can use
StringInfoData info;
initStringInfo(&info);
...
appendStringInfo(&info, ...);
...
return info.data;
It was corrected in an earlier commit, but that seems to have been removed so we still have a lot of these cases.
I created a semantic patch to capture most of these cases, which is present in [1], but this is a slightly modified version that might be interesting to include regardless of other changes. The patch is applied and one case that couldn't be matched is manually fixed.
[1]: https://www.postgresql.org/message-id/8895cba9-48cf-40fe-9c47-9b43ec6b2ab3%40gmail.com
Best wishes,
Mats Kindahl
Вложения
On Mon, 3 Nov 2025 at 20:27, Mats Kindahl <mats.kindahl@gmail.com> wrote: > We can use > > StringInfoData info; > initStringInfo(&info); > ... > appendStringInfo(&info, ...); > ... > return info.data; > > It was corrected in an earlier commit, but that seems to have been removed so we still have a lot of these cases. > > I created a semantic patch to capture most of these cases, which is present in [1], but this is a slightly modified versionthat might be interesting to include regardless of other changes. The patch is applied and one case that couldn'tbe matched is manually fixed. I think this is a worthwhile conversion. Are you able to create a more complete version of this? The draft version does introduce quite a few whitespace changes that aren't wanted. You've also introduced a memory leak in check_publications(), fetch_relation_list(), jsonb_send() and xml_errorHandler() (NB: destroyStringInfo() doesn't just pfree the memory for the struct, it pfree's the data too). The patch shouldn't be leaving any memory around that the current master is careful to pfree. Do you have any semi-automated method to find these? Or is it a case of manually reviewing code with a makeStringInfo() call? David
On Mon, 3 Nov 2025 at 20:27, Mats Kindahl <mats.kindahl@gmail.com> wrote:We can use StringInfoData info; initStringInfo(&info); ... appendStringInfo(&info, ...); ... return info.data; It was corrected in an earlier commit, but that seems to have been removed so we still have a lot of these cases. I created a semantic patch to capture most of these cases, which is present in [1], but this is a slightly modified version that might be interesting to include regardless of other changes. The patch is applied and one case that couldn't be matched is manually fixed.
Hi David, and thanks for the review.
I think this is a worthwhile conversion. Are you able to create a more complete version of this? The draft version does introduce quite a few whitespace changes that aren't wanted.
Absolutely, I'll look into it.
You've also introduced a memory leak in check_publications(), fetch_relation_list(), jsonb_send() and xml_errorHandler() (NB: destroyStringInfo() doesn't just pfree the memory for the struct, it pfree's the data too).
Ah, that was a mistake from my side. Will fix.
The patch shouldn't be leaving any memory around that the current master is careful to pfree. Do you have any semi-automated method to find these? Or is it a case of manually reviewing code with a makeStringInfo() call?
This is using Coccinelle to transform the text based on a semantic patch (which is included in the patch). Unfortunately it mess with the whitespace a bit so it is necessary to run pg_intent after. I'm surprised that there are whitespace changes that should not be there, but I'll take a look. Thanks again, and best wishes,
Mats Kindahl
On Tue, 4 Nov 2025 at 09:20, Mats Kindahl <mats.kindahl@gmail.com> wrote: > This is using Coccinelle to transform the text based on a semantic patch (which is included in the patch). Unfortunatelyit mess with the whitespace a bit so it is necessary to run pg_intent after. I'm surprised that there are whitespacechanges that should not be there, but I'll take a look. pgindent likely won't put back the types of changes that were made here. I think you're going to have to manually review anything that comes out of any automated tool, at least certainly until there's some confidence that the scripts are configured correctly. Loading extra work onto reviewers or committers that you could have easily done yourself isn't really a sustainable thing. We just lack bandwidth for that, plus it's bad etiquette. A few tips: It's fine to post rough patches to demonstrate ideas to the list, but make that clear when doing so. If your intention is that what you're sending gets committed, then please aim for as high a quality as you can. There just won't be any long-term patience for series of unvetted-by-human transformation patches from tools such as Coccinelle around here, especially so if you expect the community to vet them for you. I'm not saying that's what you're doing, but if you are, please take this advice. David
On 11/3/25 22:32, David Rowley wrote: > On Tue, 4 Nov 2025 at 09:20, Mats Kindahl <mats.kindahl@gmail.com> wrote: >> This is using Coccinelle to transform the text based on a semantic patch (which is included in the patch). Unfortunatelyit mess with the whitespace a bit so it is necessary to run pg_intent after. I'm surprised that there are whitespacechanges that should not be there, but I'll take a look. > pgindent likely won't put back the types of changes that were made > here. I think you're going to have to manually review anything that > comes out of any automated tool, at least certainly until there's some > confidence that the scripts are configured correctly. Loading extra > work onto reviewers or committers that you could have easily done > yourself isn't really a sustainable thing. We just lack bandwidth for > that, plus it's bad etiquette. > > A few tips: It's fine to post rough patches to demonstrate ideas to > the list, but make that clear when doing so. If your intention is that > what you're sending gets committed, then please aim for as high a > quality as you can. There just won't be any long-term patience for > series of unvetted-by-human transformation patches from tools such as > Coccinelle around here, especially so if you expect the community to > vet them for you. I'm not saying that's what you're doing, but if you > are, please take this advice. Hi David, My apologies for not being clear. I understand the situation and thank you for the advice. I'll keep that in mind in the future. Here is an updated patch that I checked manually for unnecessary whitespace changes. I also checked that "destroyStringInfo(info)" was handled correctly, which would be replacing it with a call to "pfree(info.data)" since the StringInfoData structure is now on the stack but the data buffer needs to be released since this is what a call to destroyStringInfo() would do. Removing any calls of "pfree(info)" when "info" was changed to "StringInfoData" since it is no longer a dynamically allocated variable. Note that in this case it is not necessary to free "info.data" since the original code does not do that. There is one such case affected by this patch, and there the buffer pointer is copied to another variable and then returned. This can probably be improved by just returning the buffer pointer directly without intermediate assignment to a variable, but I avoided this to make reviewing the code easier. It is easy to add this change either now or later and should not affect the generated code except at very low optimization levels. I also added fixes manually inside check_publications_origin_tables, check_publications, and fetch_relation_list. In check_publication_origin_table three StringInfo was dynamically allocated but not released after. In check_publications and fetch_relation_list there were extra cases of using a dynamically allocated StringInfo that was not necessary. Best wishes, Mats Kindahl
Вложения
On Tue, 4 Nov 2025 at 21:25, Mats Kindahl <mats.kindahl@gmail.com> wrote: > Here is an updated patch that I checked manually for unnecessary > whitespace changes. Thanks for updating the patch. > There is one such case affected by this patch, and there the buffer > pointer is copied to another variable and then returned. This can > probably be improved by just returning the buffer pointer directly > without intermediate assignment to a variable, but I avoided this to > make reviewing the code easier. It is easy to add this change either now > or later and should not affect the generated code except at very low > optimization levels. Yeah, I think you must mean in build_backup_content(). I noticed that too. > I also added fixes manually inside check_publications_origin_tables, > check_publications, and fetch_relation_list. In > check_publication_origin_table three StringInfo was dynamically > allocated but not released after. In check_publications and > fetch_relation_list there were extra cases of using a dynamically > allocated StringInfo that was not necessary. It's not your fault, but check_publications_origin_tables() looks like a mess. It seems like excessive use of additional code just to avoid two separate ereport() calls. Might be worth a follow-up patch to clean that up. The patch looks good to me. The only thing I'd adjust is to get rid of the "data" variable from build_backup_content(), which I think you mentioned above. I can take care of that one. I can push this tomorrow morning UTC+13. I'll leave it til then in case anyone else has any comments. David
> On Nov 5, 2025, at 19:01, David Rowley <dgrowleyml@gmail.com> wrote:
>
> It's not your fault, but check_publications_origin_tables() looks like
> a mess. It seems like excessive use of additional code just to avoid
> two separate ereport() calls. Might be worth a follow-up patch to
> clean that up.
>
> The patch looks good to me. The only thing I'd adjust is to get rid of
> the "data" variable from build_backup_content(), which I think you
> mentioned above. I can take care of that one.
>
> I can push this tomorrow morning UTC+13. I'll leave it til then in
> case anyone else has any comments.
No objection. But I just find that the patch missed one place in check_publications():
```
if (list_length(publicationsCopy))
{
/* Prepare the list of non-existent publication(s) for error message. */
StringInfo pubnames = makeStringInfo();
GetPublicationsStr(publicationsCopy, pubnames, false);
ereport(WARNING,
errcode(ERRCODE_UNDEFINED_OBJECT),
errmsg_plural("publication %s does not exist on the publisher",
"publications %s do not exist on the publisher",
list_length(publicationsCopy),
pubnames->data));
}
```
This “pubnames” can be replaced as well, and “pfree(pubnames.data)” should be added after “ereport”. As one place in
thefunction has been replaced in this patch, I guess we should not leave this place to a separate patch.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Thu, 6 Nov 2025 at 00:01, David Rowley <dgrowleyml@gmail.com> wrote: > I can push this tomorrow morning UTC+13. I'll leave it til then in > case anyone else has any comments. I found a few more cases that could get the same treatment and also included the check_publications() one mentioned by Chao. I did end up removing the subscriptioncmds.c. I thought those should likely just be fixed to use errmsg() and errhint() instead of their "internal" counterpart. I left out the Coccinelle script. I'm not currently qualified to review that, and not aware of the policy about that. I'm aware there's another thread with some discussion about Coccinelle, but I've not read it. David
On Tue, 4 Nov 2025 at 21:25, Mats Kindahl <mats.kindahl@gmail.com> wrote:Here is an updated patch that I checked manually for unnecessary whitespace changes.Thanks for updating the patch.There is one such case affected by this patch, and there the buffer pointer is copied to another variable and then returned. This can probably be improved by just returning the buffer pointer directly without intermediate assignment to a variable, but I avoided this to make reviewing the code easier. It is easy to add this change either now or later and should not affect the generated code except at very low optimization levels.Yeah, I think you must mean in build_backup_content(). I noticed that too.
Yes, that is the one I meant.
I also added fixes manually inside check_publications_origin_tables, check_publications, and fetch_relation_list. In check_publication_origin_table three StringInfo was dynamically allocated but not released after. In check_publications and fetch_relation_list there were extra cases of using a dynamically allocated StringInfo that was not necessary.It's not your fault, but check_publications_origin_tables() looks like a mess. It seems like excessive use of additional code just to avoid two separate ereport() calls. Might be worth a follow-up patch to clean that up.
Yeah, I have noted that too, but I wanted to avoid making reviews more complicated than necessary. I thought it might be better to do as a separate effort.
I'll take a look at it.
The patch looks good to me. The only thing I'd adjust is to get rid of the "data" variable from build_backup_content(), which I think you mentioned above. I can take care of that one.
Thanks David.
Best wishes,
Mats Kindahl
On 2025-Nov-06, Mats Kindahl wrote:
> On 11/5/25 12:01, David Rowley wrote:
> > It's not your fault, but check_publications_origin_tables() looks like
> > a mess. It seems like excessive use of additional code just to avoid
> > two separate ereport() calls. Might be worth a follow-up patch to
> > clean that up.
>
> Yeah, I have noted that too, but I wanted to avoid making reviews more
> complicated than necessary. I thought it might be better to do as a separate
> effort.
Yeah, I came across this a few weeks ago as well. I was thinking maybe
we can expand the ereport() macro instead of duplicating things. It
looks a bit ugly and I don't think we do it anywhere else ... I mean
something like
errstart(WARNING, NULL);
if (cond)
errmsg( ... );
else
errmsg( ... );
// and so on
errfinish(__FILE__, __LINE__, __func__);
Is this too ugly to live? (I swear I had this on a branch already, but
can't find it right this instant.)
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"No necesitamos banderas
No reconocemos fronteras" (Jorge González)
On Fri, 7 Nov 2025 at 00:24, Álvaro Herrera <alvherre@kurilemu.de> wrote: > Yeah, I came across this a few weeks ago as well. I was thinking maybe > we can expand the ereport() macro instead of duplicating things. It > looks a bit ugly and I don't think we do it anywhere else ... I mean > something like > > errstart(WARNING, NULL); > if (cond) > errmsg( ... ); > else > errmsg( ... ); > // and so on > errfinish(__FILE__, __LINE__, __func__); > > > Is this too ugly to live? (I swear I had this on a branch already, but > can't find it right this instant.) I thought it could just become 2 separate ereport calls. The duplicate string consts for the plural stuff are going to get de-duplicated anyway. I didn't try it to see how it'd look, however. David
=?utf-8?Q?=C3=81lvaro?= Herrera <alvherre@kurilemu.de> writes:
> Yeah, I came across this a few weeks ago as well. I was thinking maybe
> we can expand the ereport() macro instead of duplicating things. It
> looks a bit ugly and I don't think we do it anywhere else ... I mean
> something like
> errstart(WARNING, NULL);
> if (cond)
> errmsg( ... );
> else
> errmsg( ... );
> // and so on
> errfinish(__FILE__, __LINE__, __func__);
> Is this too ugly to live? (I swear I had this on a branch already, but
> can't find it right this instant.)
I really dislike exposing errstart/errfinish to calling code like
that. I agree that check_publications_origin_tables is not pretty
as-is, but it's better than breaking open the ereport abstraction.
In the particular case at hand, it seems best to just write
two separate ereport calls for the check_table_sync and
not-check_table_sync cases, as David mentioned. Duplicating
the errdetail_plural call is slightly annoying but it's net less
code and less surprise than what's there now.
In general, in most other places where we have conditional message
text, it's done via ?: operators within the ereport macro,
for example this decades-old bit in sysv_shmem.c:
ereport(FATAL,
(errmsg("could not create shared memory segment: %m"),
errdetail("Failed system call was shmget(key=%lu, size=%zu, 0%o).",
(unsigned long) memKey, size,
IPC_CREAT | IPC_EXCL | IPCProtection),
(shmget_errno == EINVAL) ?
errhint("This error usually means that PostgreSQL's request for a shared memory "
"segment exceeded your kernel's SHMMAX parameter, or possibly that "
"it is less than "
"your kernel's SHMMIN parameter.\n"
"The PostgreSQL documentation contains more information about shared "
"memory configuration.") : 0,
(shmget_errno == ENOMEM) ?
errhint("This error usually means that PostgreSQL's request for a shared "
"memory segment exceeded your kernel's SHMALL parameter. You might need "
"to reconfigure the kernel with larger SHMALL.\n"
"The PostgreSQL documentation contains more information about shared "
"memory configuration.") : 0,
(shmget_errno == ENOSPC) ?
errhint("This error does *not* mean that you have run out of disk space. "
"It occurs either if all available shared memory IDs have been taken, "
"in which case you need to raise the SHMMNI parameter in your kernel, "
"or because the system's overall limit for shared memory has been "
"reached.\n"
"The PostgreSQL documentation contains more information about shared "
"memory configuration.") : 0));
We could do something similar here, but I'm not convinced
it's better than just writing two ereport's.
regards, tom lane