Обсуждение: appendStringInfo vs appendStringInfoString

Поиск
Список
Период
Сортировка

appendStringInfo vs appendStringInfoString

От
David Rowley
Дата:
I did some benchmarking earlier in the week for the new patch which was just commited to allow formatting in the log_line_prefix string. In version 0.4 of the patch there was some performance regression as I was doing appendStringInfo(buf, "%*s", padding, variable); instead of appendStringInfoString(buf, variable); This regression was fixed in a later version of the patch by only using appendStringInfo when the padding was 0.


Today I started looking through the entire source tree to look for places where appendStringInfo() could be replaced by appendStringInfoString(), I now have a patch which is around 100 KB in size which changes a large number of these instances.

Example:

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 3107f9c..d0dea4f 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -6174,7 +6174,7 @@ flatten_set_variable_args(const char *name, List *args)
  A_Const    *con;
 
  if (l != list_head(args))
- appendStringInfo(&buf, ", ");
+ appendStringInfoString(&buf, ", ");
 

I know lots of these places are not really in performance critical parts of the code, so it's likely not too big a performance gain in most places, though to be honest I didn't pay a great deal of attention to how hot the code path might be when I was editing.

I thought I would post here just to test the water on this type of change. I personally don't think it makes the code any less readable, but if performance is not going to be greatly improved then perhaps people would have objections to code churn.

I didn't run any benchmarks on the core code, but I did pull out all the stringinfo functions and write my own test application. I also did a trial on a new macro which could further improve performance when the string being appended in a string constant, although I just wrote this to gather opinions about the idea. It is not currently a part of the patch.

In my benchmarks I was just appending a 8 char string constant 1000 times onto a stringinfo, I did this 3000 times in my tests. The results were as follows:

Results:
1. appendStringInfoString in 0.404000 sec
2. appendStringInfo with %s in 1.118000 sec
3 .appendStringInfo in 1.300000 sec
4. appendStringInfoStringConst with in 0.221000 sec


You can see that appendStringInfoString() is far faster than appendStringInfo() this will be down to appendStringInfo using va_args whereas appendStringInfoString() just does a strlen() then memcpy(). Test 4 bypasses the strlen() by using sizeof() which of course can only be used with string constants.

Actual code:
1. appendStringInfoString(str, "test1234");
2. appendStringInfo(str, "%s", "test1234");
3. appendStringInfo(str, "test1234");
4. appendStringInfoStringConst(str, "test1234");


The macro for test 4 was as follows:
#define appendStringInfoStringConst(buf, s) appendBinaryStringInfo(buf, (s), sizeof(s)-1)

I should say at this point that I ran these benchmarks on windows 7 64bit, though my tests for the log_line_prefix patch were all on Linux and saw a similar slowdown on appendStringInfo() vs appendStringInfoString().

So let this be the thread to gather opinions on if my 100kb patch which replaces appendStringInfo with appendStringInfoString where possible would be welcomed by the community. Also would using appendStringInfoStringConst() be going 1 step too far?

Regards

David Rowley

Re: appendStringInfo vs appendStringInfoString

От
David Rowley
Дата:
On Sat, Sep 28, 2013 at 9:44 PM, David Rowley <dgrowleyml@gmail.com> wrote:
I did some benchmarking earlier in the week for the new patch which was just commited to allow formatting in the log_line_prefix string. In version 0.4 of the patch there was some performance regression as I was doing appendStringInfo(buf, "%*s", padding, variable); instead of appendStringInfoString(buf, variable); This regression was fixed in a later version of the patch by only using appendStringInfo when the padding was 0.


Today I started looking through the entire source tree to look for places where appendStringInfo() could be replaced by appendStringInfoString(), I now have a patch which is around 100 KB in size which changes a large number of these instances.



...

Also on making the changes I noticed a possible small bug in the code that could cause a crash if for some reason a translation contained a %s. I know it is an unlikely scenario, never-the-less here is a patch to fix it.

diff --git a/src/backend/utils/fmgr/dfmgr.c b/src/backend/utils/fmgr/dfmgr.c
index 562a7c9..91da50b 100644
--- a/src/backend/utils/fmgr/dfmgr.c
+++ b/src/backend/utils/fmgr/dfmgr.c
@@ -372,7 +372,7 @@ incompatible_module_error(const char *libname,
  }
 
  if (details.len == 0)
- appendStringInfo(&details,
+ appendStringInfoString(&details,
   _("Magic block has unexpected length or padding difference.")); 


David

Re: appendStringInfo vs appendStringInfoString

От
Heikki Linnakangas
Дата:
On 28.09.2013 12:44, David Rowley wrote:
> The macro for test 4 was as follows:
> #define appendStringInfoStringConst(buf, s) appendBinaryStringInfo(buf,
> (s), sizeof(s)-1)

If that makes any difference in practice, I wonder if we should just do:

#define appendStringInfoString(buf, s) appendBinaryStringInfo(buf, (s), 
strlen(s))

With a compiler worth its salt, the strlen(s) will be optimized into a 
constant, if s is a constant. If it's not a constant, we'll just end up 
calling strlen(), like appendStringInfoString would anyway. That would 
turn a single function call into two in all of the non-constant 
callsites, though, bloating the code, so it might not be a win overall.

- Heikki



Re: appendStringInfo vs appendStringInfoString

От
David Rowley
Дата:
On Sat, Sep 28, 2013 at 11:11 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 28.09.2013 12:44, David Rowley wrote:
The macro for test 4 was as follows:
#define appendStringInfoStringConst(buf, s) appendBinaryStringInfo(buf,
(s), sizeof(s)-1)

If that makes any difference in practice, I wonder if we should just do:

#define appendStringInfoString(buf, s) appendBinaryStringInfo(buf, (s), strlen(s))

With a compiler worth its salt, the strlen(s) will be optimized into a constant, if s is a constant. If it's not a constant, we'll just end up calling strlen(), like appendStringInfoString would anyway. That would turn a single function call into two in all of the non-constant callsites, though, bloating the code, so it might not be a win overall.


Nice idea.
I quick test shows that this works with the MS compiler I'm using on windows.

appendStringInfoString in 0.249000 sec <---
appendStringInfo with %s in 1.135000 sec
appendStringInfo in 1.295000 sec
appendStringInfoStringConst with in 0.245000 sec

Regards

David
 
- Heikki

Re: appendStringInfo vs appendStringInfoString

От
Andres Freund
Дата:
On 2013-09-28 14:11:29 +0300, Heikki Linnakangas wrote:
> On 28.09.2013 12:44, David Rowley wrote:
> >The macro for test 4 was as follows:
> >#define appendStringInfoStringConst(buf, s) appendBinaryStringInfo(buf,
> >(s), sizeof(s)-1)
> 
> If that makes any difference in practice, I wonder if we should just do:
> 
> #define appendStringInfoString(buf, s) appendBinaryStringInfo(buf, (s),
> strlen(s))

Doesn't that have a bit too much of an multiple evaluation danger? Maybe
make it a static inline in the header instead for the platforms that
support it?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: appendStringInfo vs appendStringInfoString

От
David Rowley
Дата:
On Sat, Sep 28, 2013 at 11:11 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 28.09.2013 12:44, David Rowley wrote:
The macro for test 4 was as follows:
#define appendStringInfoStringConst(buf, s) appendBinaryStringInfo(buf,
(s), sizeof(s)-1)

If that makes any difference in practice, I wonder if we should just do:

#define appendStringInfoString(buf, s) appendBinaryStringInfo(buf, (s), strlen(s))

With a compiler worth its salt, the strlen(s) will be optimized into a constant, if s is a constant. If it's not a constant, we'll just end up calling strlen(), like appendStringInfoString would anyway. That would turn a single function call into two in all of the non-constant callsites, though, bloating the code, so it might not be a win overall.


I'm just looking at this again wondering how much impact changing appendStringInfoString into a macro would have.

If I search the entire source for the regular expression appendStringInfoString\(.*\,\s["].*["].*
which I think should match any usage with a string constant like appendStringInfoString(buf, "some text here"); but perhaps may miss instances that spread over more than 1 line. I'm getting 611 matches.

If I search for appendStringInfoString\(.*\,\s\w.* which should get the instances that are not appending string constants I get 161 matches.
So it looks like with the macro idea it would only add the extra function call in around 161 places and it would speed up 611 cases.

Note that I did these checks on my patched version of the source, the current head will have less matches of each.

I just checked how much the binary increased in size after making this change.

Before changing the macro the binary was 4,473,856 bytes
After changing the macro the binary is 4,476,928 bytes.


Regards

David


 
- Heikki

Re: appendStringInfo vs appendStringInfoString

От
Peter Eisentraut
Дата:
On Sat, 2013-09-28 at 21:50 +1200, David Rowley wrote:
> Also on making the changes I noticed a possible small bug in the code
> that could cause a crash if for some reason a translation contained a
> %s. I know it is an unlikely scenario, never-the-less here is a patch
> to fix it.

There are mechanisms in place that prevent a translation from having
different format specifiers than the original.  So this is nothing you
have to be concerned about.  (If it were, we'd have a much much bigger
problem.)




Re: appendStringInfo vs appendStringInfoString

От
David Rowley
Дата:
On Sat, Sep 28, 2013 at 9:44 PM, David Rowley <dgrowleyml@gmail.com> wrote:
I did some benchmarking earlier in the week for the new patch which was just commited to allow formatting in the log_line_prefix string. In version 0.4 of the patch there was some performance regression as I was doing appendStringInfo(buf, "%*s", padding, variable); instead of appendStringInfoString(buf, variable); This regression was fixed in a later version of the patch by only using appendStringInfo when the padding was 0.


Today I started looking through the entire source tree to look for places where appendStringInfo() could be replaced by appendStringInfoString(), I now have a patch which is around 100 KB in size which changes a large number of these instances.

Example:

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 3107f9c..d0dea4f 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -6174,7 +6174,7 @@ flatten_set_variable_args(const char *name, List *args)
  A_Const    *con;
 
  if (l != list_head(args))
- appendStringInfo(&buf, ", ");
+ appendStringInfoString(&buf, ", ");
 

I know lots of these places are not really in performance critical parts of the code, so it's likely not too big a performance gain in most places, though to be honest I didn't pay a great deal of attention to how hot the code path might be when I was editing.

I thought I would post here just to test the water on this type of change. I personally don't think it makes the code any less readable, but if performance is not going to be greatly improved then perhaps people would have objections to code churn.

I didn't run any benchmarks on the core code, but I did pull out all the stringinfo functions and write my own test application. I also did a trial on a new macro which could further improve performance when the string being appended in a string constant, although I just wrote this to gather opinions about the idea. It is not currently a part of the patch.

In my benchmarks I was just appending a 8 char string constant 1000 times onto a stringinfo, I did this 3000 times in my tests. The results were as follows:

Results:
1. appendStringInfoString in 0.404000 sec
2. appendStringInfo with %s in 1.118000 sec
3 .appendStringInfo in 1.300000 sec
4. appendStringInfoStringConst with in 0.221000 sec


You can see that appendStringInfoString() is far faster than appendStringInfo() this will be down to appendStringInfo using va_args whereas appendStringInfoString() just does a strlen() then memcpy(). Test 4 bypasses the strlen() by using sizeof() which of course can only be used with string constants.

Actual code:
1. appendStringInfoString(str, "test1234");
2. appendStringInfo(str, "%s", "test1234");
3. appendStringInfo(str, "test1234");
4. appendStringInfoStringConst(str, "test1234");


The macro for test 4 was as follows:
#define appendStringInfoStringConst(buf, s) appendBinaryStringInfo(buf, (s), sizeof(s)-1)

I should say at this point that I ran these benchmarks on windows 7 64bit, though my tests for the log_line_prefix patch were all on Linux and saw a similar slowdown on appendStringInfo() vs appendStringInfoString().

So let this be the thread to gather opinions on if my 100kb patch which replaces appendStringInfo with appendStringInfoString where possible would be welcomed by the community. Also would using appendStringInfoStringConst() be going 1 step too far?

Regards


I've attached a the cleanup patch for this. This one just converts instances of appendStringInfo into appendStringInfoString where appendStringInfo does no formatting or just has the format "%s".

 
David Rowley

Вложения

Re: appendStringInfo vs appendStringInfoString

От
David Rowley
Дата:
On Mon, Sep 30, 2013 at 10:10 PM, David Rowley <dgrowleyml@gmail.com> wrote:
On Sat, Sep 28, 2013 at 9:44 PM, David Rowley <dgrowleyml@gmail.com> wrote:
I did some benchmarking earlier in the week for the new patch which was just commited to allow formatting in the log_line_prefix string. In version 0.4 of the patch there was some performance regression as I was doing appendStringInfo(buf, "%*s", padding, variable); instead of appendStringInfoString(buf, variable); This regression was fixed in a later version of the patch by only using appendStringInfo when the padding was 0.


I've attached a the cleanup patch for this. This one just converts instances of appendStringInfo into appendStringInfoString where appendStringInfo does no formatting or just has the format "%s".


I've attached a re-based version of this.
 
 
David Rowley


Вложения

Re: appendStringInfo vs appendStringInfoString

От
Robert Haas
Дата:
On Wed, Oct 30, 2013 at 4:51 AM, David Rowley <dgrowleyml@gmail.com> wrote:
> I've attached a re-based version of this.

I don't see any compelling reason not to commit this.  Does anyone
wish to object?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: appendStringInfo vs appendStringInfoString

От
Alvaro Herrera
Дата:
Robert Haas escribió:
> On Wed, Oct 30, 2013 at 4:51 AM, David Rowley <dgrowleyml@gmail.com> wrote:
> > I've attached a re-based version of this.
> 
> I don't see any compelling reason not to commit this.  Does anyone
> wish to object?

I think a blanket substitution of places that currently have %s might
cause bugs, particularly if the string is user-supplied.  It might be
right for many cases, but did you/someone review each such callsite?

No objection to the other half, that substitute calls that don't have
%s.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: appendStringInfo vs appendStringInfoString

От
Andres Freund
Дата:
On 2013-10-30 10:52:05 -0300, Alvaro Herrera wrote:
> Robert Haas escribió:
> > On Wed, Oct 30, 2013 at 4:51 AM, David Rowley <dgrowleyml@gmail.com> wrote:
> > > I've attached a re-based version of this.
> >
> > I don't see any compelling reason not to commit this.  Does anyone
> > wish to object?
>
> I think a blanket substitution of places that currently have %s might
> cause bugs, particularly if the string is user-supplied.  It might be
> right for many cases, but did you/someone review each such callsite?
>
> No objection to the other half, that substitute calls that don't have
> %s.

appendStringInfoString() doesn't expand format strings, so I am not sure
what you're worried about?


Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: appendStringInfo vs appendStringInfoString

От
Alvaro Herrera
Дата:
Andres Freund escribió:
> On 2013-10-30 10:52:05 -0300, Alvaro Herrera wrote:
> > Robert Haas escribió:
> > > On Wed, Oct 30, 2013 at 4:51 AM, David Rowley <dgrowleyml@gmail.com> wrote:
> > > > I've attached a re-based version of this.
> > > 
> > > I don't see any compelling reason not to commit this.  Does anyone
> > > wish to object?
> > 
> > I think a blanket substitution of places that currently have %s might
> > cause bugs, particularly if the string is user-supplied.  It might be
> > right for many cases, but did you/someone review each such callsite?
> > 
> > No objection to the other half, that substitute calls that don't have
> > %s.
> 
> appendStringInfoString() doesn't expand format strings, so I am not sure
> what you're worried about?

Um.  Blame my lack of decent breakfast this morning.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: appendStringInfo vs appendStringInfoString

От
Robert Haas
Дата:
On Wed, Oct 30, 2013 at 12:12 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Andres Freund escribió:
>> On 2013-10-30 10:52:05 -0300, Alvaro Herrera wrote:
>> > Robert Haas escribió:
>> > > On Wed, Oct 30, 2013 at 4:51 AM, David Rowley <dgrowleyml@gmail.com> wrote:
>> > > > I've attached a re-based version of this.
>> > >
>> > > I don't see any compelling reason not to commit this.  Does anyone
>> > > wish to object?
>> >
>> > I think a blanket substitution of places that currently have %s might
>> > cause bugs, particularly if the string is user-supplied.  It might be
>> > right for many cases, but did you/someone review each such callsite?
>> >
>> > No objection to the other half, that substitute calls that don't have
>> > %s.
>>
>> appendStringInfoString() doesn't expand format strings, so I am not sure
>> what you're worried about?
>
> Um.  Blame my lack of decent breakfast this morning.

So, does that mean we're good to go?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: appendStringInfo vs appendStringInfoString

От
Alvaro Herrera
Дата:
Robert Haas escribió:

> So, does that mean we're good to go?

Looks fine to me ...

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services