Обсуждение: Use stack-allocated StringInfoData

Поиск
Список
Период
Сортировка

Use stack-allocated StringInfoData

От
Mats Kindahl
Дата:

Hi all,

While working on other things I noted that we have a lot of cases where a StringInfo instance is allocated dynamically even when it is either thrown away or destroyed at the end, which seems unnecessary, that is, instead of using:

       StringInfo info = makeStringInfo();
       ...
       appendStringInfo(info, ...);
       ...
       return info->data;

We can use

       StringInfoData info;
       initStringInfo(&info);
       ...
       appendStringInfo(&info, ...);
       ...
       return info.data;

It was corrected in an earlier commit, but that seems to have been removed so we still have a lot of these cases.

I created a semantic patch to capture most of these cases, which is present in [1], but this is a slightly modified version that might be interesting to include regardless of other changes. The patch is applied and one case that couldn't be matched is manually fixed.

[1]: https://www.postgresql.org/message-id/8895cba9-48cf-40fe-9c47-9b43ec6b2ab3%40gmail.com

Best wishes,
Mats Kindahl

Вложения

Re: Use stack-allocated StringInfoData

От
David Rowley
Дата:
On Mon, 3 Nov 2025 at 20:27, Mats Kindahl <mats.kindahl@gmail.com> wrote:
> We can use
>
>        StringInfoData info;
>        initStringInfo(&info);
>        ...
>        appendStringInfo(&info, ...);
>        ...
>        return info.data;
>
> It was corrected in an earlier commit, but that seems to have been removed so we still have a lot of these cases.
>
> I created a semantic patch to capture most of these cases, which is present in [1], but this is a slightly modified
versionthat might be interesting to include regardless of other changes. The patch is applied and one case that
couldn'tbe matched is manually fixed. 

I think this is a worthwhile conversion. Are you able to create a more
complete version of this? The draft version does introduce quite a few
whitespace changes that aren't wanted. You've also introduced a memory
leak in check_publications(), fetch_relation_list(), jsonb_send() and
xml_errorHandler() (NB: destroyStringInfo() doesn't just pfree the
memory for the struct, it pfree's the data too). The patch shouldn't
be leaving any memory around that the current master is careful to
pfree.

Do you have any semi-automated method to find these? Or is it a case
of manually reviewing code with a makeStringInfo() call?

David



Re: Use stack-allocated StringInfoData

От
Mats Kindahl
Дата:


On 11/3/25 11:27, David Rowley wrote:
On Mon, 3 Nov 2025 at 20:27, Mats Kindahl <mats.kindahl@gmail.com> wrote:
We can use
       StringInfoData info;       initStringInfo(&info);       ...       appendStringInfo(&info, ...);       ...       return info.data;

It was corrected in an earlier commit, but that seems to have been removed so we still have a lot of these cases.

I created a semantic patch to capture most of these cases, which is present in [1], but this is a slightly modified version that might be interesting to include regardless of other changes. The patch is applied and one case that couldn't be matched is manually fixed.

Hi David, and thanks for the review.

I think this is a worthwhile conversion. Are you able to create a more
complete version of this? The draft version does introduce quite a few
whitespace changes that aren't wanted.

Absolutely, I'll look into it.

You've also introduced a memory
leak in check_publications(), fetch_relation_list(), jsonb_send() and
xml_errorHandler() (NB: destroyStringInfo() doesn't just pfree the
memory for the struct, it pfree's the data too).

Ah, that was a mistake from my side. Will fix.

The patch shouldn't
be leaving any memory around that the current master is careful to
pfree.

Do you have any semi-automated method to find these? Or is it a case
of manually reviewing code with a makeStringInfo() call?

This is using Coccinelle to transform the text based on a semantic patch (which is included in the patch). Unfortunately it mess with the whitespace a bit so it is necessary to run pg_intent after.  I'm surprised that there are whitespace changes that should not be there, but I'll take a look. Thanks again, and best wishes,
Mats Kindahl

Re: Use stack-allocated StringInfoData

От
David Rowley
Дата:
On Tue, 4 Nov 2025 at 09:20, Mats Kindahl <mats.kindahl@gmail.com> wrote:
> This is using Coccinelle to transform the text based on a semantic patch (which is included in the patch).
Unfortunatelyit mess with the whitespace a bit so it is necessary to run pg_intent after.  I'm surprised that there are
whitespacechanges that should not be there, but I'll take a look. 

pgindent likely won't put back the types of changes that were made
here. I think you're going to have to manually review anything that
comes out of any automated tool, at least certainly until there's some
confidence that the scripts are configured correctly. Loading extra
work onto reviewers or committers that you could have easily done
yourself isn't really a sustainable thing. We just lack bandwidth for
that, plus it's bad etiquette.

A few tips: It's fine to post rough patches to demonstrate ideas to
the list, but make that clear when doing so. If your intention is that
what you're sending gets committed, then please aim for as high a
quality as you can. There just won't be any long-term patience for
series of unvetted-by-human transformation patches from tools such as
Coccinelle around here, especially so if you expect the community to
vet them for you. I'm not saying that's what you're doing, but if you
are, please take this advice.

David



Re: Use stack-allocated StringInfoData

От
Mats Kindahl
Дата:
On 11/3/25 22:32, David Rowley wrote:
> On Tue, 4 Nov 2025 at 09:20, Mats Kindahl <mats.kindahl@gmail.com> wrote:
>> This is using Coccinelle to transform the text based on a semantic patch (which is included in the patch).
Unfortunatelyit mess with the whitespace a bit so it is necessary to run pg_intent after.  I'm surprised that there are
whitespacechanges that should not be there, but I'll take a look.
 
> pgindent likely won't put back the types of changes that were made
> here. I think you're going to have to manually review anything that
> comes out of any automated tool, at least certainly until there's some
> confidence that the scripts are configured correctly. Loading extra
> work onto reviewers or committers that you could have easily done
> yourself isn't really a sustainable thing. We just lack bandwidth for
> that, plus it's bad etiquette.
>
> A few tips: It's fine to post rough patches to demonstrate ideas to
> the list, but make that clear when doing so. If your intention is that
> what you're sending gets committed, then please aim for as high a
> quality as you can. There just won't be any long-term patience for
> series of unvetted-by-human transformation patches from tools such as
> Coccinelle around here, especially so if you expect the community to
> vet them for you. I'm not saying that's what you're doing, but if you
> are, please take this advice.

Hi David,

My apologies for not being clear. I understand the situation and thank 
you for the advice. I'll keep that in mind in the future.

Here is an updated patch that I checked manually for unnecessary 
whitespace changes.

I also checked that "destroyStringInfo(info)" was handled correctly, 
which would be replacing it with a call to "pfree(info.data)" since the 
StringInfoData structure is now on the stack but the data buffer needs 
to be released since this is what a call to destroyStringInfo() would do.

Removing any calls of "pfree(info)" when "info" was changed to 
"StringInfoData" since it is no longer a dynamically allocated variable. 
Note that in this case it is not necessary to free "info.data" since the 
original code does not do that.

There is one such case affected by this patch, and there the buffer 
pointer is copied to another variable and then returned. This can 
probably be improved by just returning the buffer pointer directly 
without intermediate assignment to a variable, but I avoided this to 
make reviewing the code easier. It is easy to add this change either now 
or later and should not affect the generated code except at very low 
optimization levels.

I also added fixes manually inside check_publications_origin_tables, 
check_publications, and fetch_relation_list. In 
check_publication_origin_table three StringInfo was dynamically 
allocated but not released after. In check_publications and 
fetch_relation_list there were extra cases of using a dynamically 
allocated StringInfo that was not necessary.

Best wishes,
Mats Kindahl

Вложения

Re: Use stack-allocated StringInfoData

От
David Rowley
Дата:
On Tue, 4 Nov 2025 at 21:25, Mats Kindahl <mats.kindahl@gmail.com> wrote:
> Here is an updated patch that I checked manually for unnecessary
> whitespace changes.

Thanks for updating the patch.

> There is one such case affected by this patch, and there the buffer
> pointer is copied to another variable and then returned. This can
> probably be improved by just returning the buffer pointer directly
> without intermediate assignment to a variable, but I avoided this to
> make reviewing the code easier. It is easy to add this change either now
> or later and should not affect the generated code except at very low
> optimization levels.

Yeah, I think you must mean in build_backup_content(). I noticed that too.

> I also added fixes manually inside check_publications_origin_tables,
> check_publications, and fetch_relation_list. In
> check_publication_origin_table three StringInfo was dynamically
> allocated but not released after. In check_publications and
> fetch_relation_list there were extra cases of using a dynamically
> allocated StringInfo that was not necessary.

It's not your fault, but check_publications_origin_tables() looks like
a mess. It seems like excessive use of additional code just to avoid
two separate ereport() calls. Might be worth a follow-up patch to
clean that up.

The patch looks good to me. The only thing I'd adjust is to get rid of
the "data" variable from build_backup_content(), which I think you
mentioned above. I can take care of that one.

I can push this tomorrow morning UTC+13. I'll leave it til then in
case anyone else has any comments.

David



Re: Use stack-allocated StringInfoData

От
Chao Li
Дата:

> On Nov 5, 2025, at 19:01, David Rowley <dgrowleyml@gmail.com> wrote:
>
> It's not your fault, but check_publications_origin_tables() looks like
> a mess. It seems like excessive use of additional code just to avoid
> two separate ereport() calls. Might be worth a follow-up patch to
> clean that up.
>
> The patch looks good to me. The only thing I'd adjust is to get rid of
> the "data" variable from build_backup_content(), which I think you
> mentioned above. I can take care of that one.
>
> I can push this tomorrow morning UTC+13. I'll leave it til then in
> case anyone else has any comments.

No objection. But I just find that the patch missed one place in check_publications():

```
    if (list_length(publicationsCopy))
    {
        /* Prepare the list of non-existent publication(s) for error message. */
        StringInfo    pubnames = makeStringInfo();

        GetPublicationsStr(publicationsCopy, pubnames, false);
        ereport(WARNING,
                errcode(ERRCODE_UNDEFINED_OBJECT),
                errmsg_plural("publication %s does not exist on the publisher",
                              "publications %s do not exist on the publisher",
                              list_length(publicationsCopy),
                              pubnames->data));
    }
```

This “pubnames” can be replaced as well, and “pfree(pubnames.data)” should be added after “ereport”. As one place in
thefunction has been replaced in this patch, I guess we should not leave this place to a separate patch. 

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/







Re: Use stack-allocated StringInfoData

От
David Rowley
Дата:
On Thu, 6 Nov 2025 at 00:01, David Rowley <dgrowleyml@gmail.com> wrote:
> I can push this tomorrow morning UTC+13. I'll leave it til then in
> case anyone else has any comments.

I found a few more cases that could get the same treatment and also
included the check_publications() one mentioned by Chao.

I did end up removing the subscriptioncmds.c. I thought those should
likely just be fixed to use errmsg() and errhint() instead of their
"internal" counterpart.

I left out the Coccinelle script. I'm not currently qualified to
review that, and not aware of the policy about that. I'm aware there's
another thread with some discussion about Coccinelle, but I've not
read it.

David



Re: Use stack-allocated StringInfoData

От
Mats Kindahl
Дата:
On 11/5/25 12:01, David Rowley wrote:
On Tue, 4 Nov 2025 at 21:25, Mats Kindahl <mats.kindahl@gmail.com> wrote:
Here is an updated patch that I checked manually for unnecessary
whitespace changes.
Thanks for updating the patch.

There is one such case affected by this patch, and there the buffer
pointer is copied to another variable and then returned. This can
probably be improved by just returning the buffer pointer directly
without intermediate assignment to a variable, but I avoided this to
make reviewing the code easier. It is easy to add this change either now
or later and should not affect the generated code except at very low
optimization levels.
Yeah, I think you must mean in build_backup_content(). I noticed that too.


Yes, that is the one I meant. 

I also added fixes manually inside check_publications_origin_tables,
check_publications, and fetch_relation_list. In
check_publication_origin_table three StringInfo was dynamically
allocated but not released after. In check_publications and
fetch_relation_list there were extra cases of using a dynamically
allocated StringInfo that was not necessary.
It's not your fault, but check_publications_origin_tables() looks like
a mess. It seems like excessive use of additional code just to avoid
two separate ereport() calls. Might be worth a follow-up patch to
clean that up.


Yeah, I have noted that too, but I wanted to avoid making reviews more complicated than necessary. I thought it might be better to do as a separate effort.

I'll take a look at it.


The patch looks good to me. The only thing I'd adjust is to get rid of
the "data" variable from build_backup_content(), which I think you
mentioned above. I can take care of that one.

Thanks David.

Best wishes,
Mats Kindahl

Re: Use stack-allocated StringInfoData

От
Álvaro Herrera
Дата:
On 2025-Nov-06, Mats Kindahl wrote:

> On 11/5/25 12:01, David Rowley wrote:

> > It's not your fault, but check_publications_origin_tables() looks like
> > a mess. It seems like excessive use of additional code just to avoid
> > two separate ereport() calls. Might be worth a follow-up patch to
> > clean that up.
> 
> Yeah, I have noted that too, but I wanted to avoid making reviews more
> complicated than necessary. I thought it might be better to do as a separate
> effort.

Yeah, I came across this a few weeks ago as well.  I was thinking maybe
we can expand the ereport() macro instead of duplicating things.  It
looks a bit ugly and I don't think we do it anywhere else ...  I mean
something like

errstart(WARNING, NULL);
if (cond)
    errmsg( ... );
else
    errmsg( ... );
// and so on
errfinish(__FILE__, __LINE__, __func__);


Is this too ugly to live?  (I swear I had this on a branch already, but
can't find it right this instant.)

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"No necesitamos banderas
 No reconocemos fronteras"                  (Jorge González)



Re: Use stack-allocated StringInfoData

От
David Rowley
Дата:
On Fri, 7 Nov 2025 at 00:24, Álvaro Herrera <alvherre@kurilemu.de> wrote:
> Yeah, I came across this a few weeks ago as well.  I was thinking maybe
> we can expand the ereport() macro instead of duplicating things.  It
> looks a bit ugly and I don't think we do it anywhere else ...  I mean
> something like
>
> errstart(WARNING, NULL);
> if (cond)
>         errmsg( ... );
> else
>         errmsg( ... );
> // and so on
> errfinish(__FILE__, __LINE__, __func__);
>
>
> Is this too ugly to live?  (I swear I had this on a branch already, but
> can't find it right this instant.)

I thought it could just become 2 separate ereport calls. The duplicate
string consts for the plural stuff are going to get de-duplicated
anyway. I didn't try it to see how it'd look, however.

David



Re: Use stack-allocated StringInfoData

От
Tom Lane
Дата:
=?utf-8?Q?=C3=81lvaro?= Herrera <alvherre@kurilemu.de> writes:
> Yeah, I came across this a few weeks ago as well.  I was thinking maybe
> we can expand the ereport() macro instead of duplicating things.  It
> looks a bit ugly and I don't think we do it anywhere else ...  I mean
> something like

> errstart(WARNING, NULL);
> if (cond)
>     errmsg( ... );
> else
>     errmsg( ... );
> // and so on
> errfinish(__FILE__, __LINE__, __func__);

> Is this too ugly to live?  (I swear I had this on a branch already, but
> can't find it right this instant.)

I really dislike exposing errstart/errfinish to calling code like
that.  I agree that check_publications_origin_tables is not pretty
as-is, but it's better than breaking open the ereport abstraction.

In the particular case at hand, it seems best to just write
two separate ereport calls for the check_table_sync and
not-check_table_sync cases, as David mentioned.  Duplicating
the errdetail_plural call is slightly annoying but it's net less
code and less surprise than what's there now.

In general, in most other places where we have conditional message
text, it's done via ?: operators within the ereport macro,
for example this decades-old bit in sysv_shmem.c:

        ereport(FATAL,
                (errmsg("could not create shared memory segment: %m"),
                 errdetail("Failed system call was shmget(key=%lu, size=%zu, 0%o).",
                           (unsigned long) memKey, size,
                           IPC_CREAT | IPC_EXCL | IPCProtection),
                 (shmget_errno == EINVAL) ?
                 errhint("This error usually means that PostgreSQL's request for a shared memory "
                         "segment exceeded your kernel's SHMMAX parameter, or possibly that "
                         "it is less than "
                         "your kernel's SHMMIN parameter.\n"
                         "The PostgreSQL documentation contains more information about shared "
                         "memory configuration.") : 0,
                 (shmget_errno == ENOMEM) ?
                 errhint("This error usually means that PostgreSQL's request for a shared "
                         "memory segment exceeded your kernel's SHMALL parameter.  You might need "
                         "to reconfigure the kernel with larger SHMALL.\n"
                         "The PostgreSQL documentation contains more information about shared "
                         "memory configuration.") : 0,
                 (shmget_errno == ENOSPC) ?
                 errhint("This error does *not* mean that you have run out of disk space.  "
                         "It occurs either if all available shared memory IDs have been taken, "
                         "in which case you need to raise the SHMMNI parameter in your kernel, "
                         "or because the system's overall limit for shared memory has been "
                         "reached.\n"
                         "The PostgreSQL documentation contains more information about shared "
                         "memory configuration.") : 0));

We could do something similar here, but I'm not convinced
it's better than just writing two ereport's.

            regards, tom lane