Обсуждение: appendStringInfo vs appendStringInfoString
I did some benchmarking earlier in the week for the new patch which was just commited to allow formatting in the log_line_prefix string. In version 0.4 of the patch there was some performance regression as I was doing appendStringInfo(buf, "%*s", padding, variable); instead of appendStringInfoString(buf, variable); This regression was fixed in a later version of the patch by only using appendStringInfo when the padding was 0.
More details here: http://www.postgresql.org/message-id/CAApHDvreSGYvtXJvqHcXZL8_tXiKKiFXhQyXgqtnQ5Yo=MEfMg@mail.gmail.com
Today I started looking through the entire source tree to look for places where appendStringInfo() could be replaced by appendStringInfoString(), I now have a patch which is around 100 KB in size which changes a large number of these instances.
Example:
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 3107f9c..d0dea4f 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -6174,7 +6174,7 @@ flatten_set_variable_args(const char *name, List *args)
A_Const *con;
if (l != list_head(args))
- appendStringInfo(&buf, ", ");
+ appendStringInfoString(&buf, ", ");
I know lots of these places are not really in performance critical parts of the code, so it's likely not too big a performance gain in most places, though to be honest I didn't pay a great deal of attention to how hot the code path might be when I was editing.
I thought I would post here just to test the water on this type of change. I personally don't think it makes the code any less readable, but if performance is not going to be greatly improved then perhaps people would have objections to code churn.
I didn't run any benchmarks on the core code, but I did pull out all the stringinfo functions and write my own test application. I also did a trial on a new macro which could further improve performance when the string being appended in a string constant, although I just wrote this to gather opinions about the idea. It is not currently a part of the patch.
In my benchmarks I was just appending a 8 char string constant 1000 times onto a stringinfo, I did this 3000 times in my tests. The results were as follows:
Results:
1. appendStringInfoString in 0.404000 sec
2. appendStringInfo with %s in 1.118000 sec
3 .appendStringInfo in 1.300000 sec
4. appendStringInfoStringConst with in 0.221000 sec
You can see that appendStringInfoString() is far faster than appendStringInfo() this will be down to appendStringInfo using va_args whereas appendStringInfoString() just does a strlen() then memcpy(). Test 4 bypasses the strlen() by using sizeof() which of course can only be used with string constants.
Actual code:
1. appendStringInfoString(str, "test1234");
2. appendStringInfo(str, "%s", "test1234");
3. appendStringInfo(str, "test1234");
4. appendStringInfoStringConst(str, "test1234");
The macro for test 4 was as follows:
#define appendStringInfoStringConst(buf, s) appendBinaryStringInfo(buf, (s), sizeof(s)-1)
I should say at this point that I ran these benchmarks on windows 7 64bit, though my tests for the log_line_prefix patch were all on Linux and saw a similar slowdown on appendStringInfo() vs appendStringInfoString().
So let this be the thread to gather opinions on if my 100kb patch which replaces appendStringInfo with appendStringInfoString where possible would be welcomed by the community. Also would using appendStringInfoStringConst() be going 1 step too far?
Regards
David Rowley
On Sat, Sep 28, 2013 at 9:44 PM, David Rowley <dgrowleyml@gmail.com> wrote:
I did some benchmarking earlier in the week for the new patch which was just commited to allow formatting in the log_line_prefix string. In version 0.4 of the patch there was some performance regression as I was doing appendStringInfo(buf, "%*s", padding, variable); instead of appendStringInfoString(buf, variable); This regression was fixed in a later version of the patch by only using appendStringInfo when the padding was 0.More details here: http://www.postgresql.org/message-id/CAApHDvreSGYvtXJvqHcXZL8_tXiKKiFXhQyXgqtnQ5Yo=MEfMg@mail.gmail.comToday I started looking through the entire source tree to look for places where appendStringInfo() could be replaced by appendStringInfoString(), I now have a patch which is around 100 KB in size which changes a large number of these instances.
...
Also on making the changes I noticed a possible small bug in the code that could cause a crash if for some reason a translation contained a %s. I know it is an unlikely scenario, never-the-less here is a patch to fix it.
diff --git a/src/backend/utils/fmgr/dfmgr.c b/src/backend/utils/fmgr/dfmgr.c
index 562a7c9..91da50b 100644
--- a/src/backend/utils/fmgr/dfmgr.c
+++ b/src/backend/utils/fmgr/dfmgr.c
@@ -372,7 +372,7 @@ incompatible_module_error(const char *libname,
}
if (details.len == 0)
- appendStringInfo(&details,
+ appendStringInfoString(&details,
_("Magic block has unexpected length or padding difference."));
David
On 28.09.2013 12:44, David Rowley wrote: > The macro for test 4 was as follows: > #define appendStringInfoStringConst(buf, s) appendBinaryStringInfo(buf, > (s), sizeof(s)-1) If that makes any difference in practice, I wonder if we should just do: #define appendStringInfoString(buf, s) appendBinaryStringInfo(buf, (s), strlen(s)) With a compiler worth its salt, the strlen(s) will be optimized into a constant, if s is a constant. If it's not a constant, we'll just end up calling strlen(), like appendStringInfoString would anyway. That would turn a single function call into two in all of the non-constant callsites, though, bloating the code, so it might not be a win overall. - Heikki
On Sat, Sep 28, 2013 at 11:11 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 28.09.2013 12:44, David Rowley wrote:If that makes any difference in practice, I wonder if we should just do:The macro for test 4 was as follows:
#define appendStringInfoStringConst(buf, s) appendBinaryStringInfo(buf,
(s), sizeof(s)-1)
#define appendStringInfoString(buf, s) appendBinaryStringInfo(buf, (s), strlen(s))
With a compiler worth its salt, the strlen(s) will be optimized into a constant, if s is a constant. If it's not a constant, we'll just end up calling strlen(), like appendStringInfoString would anyway. That would turn a single function call into two in all of the non-constant callsites, though, bloating the code, so it might not be a win overall.
Nice idea.
I quick test shows that this works with the MS compiler I'm using on windows.
appendStringInfoString in 0.249000 sec <---
appendStringInfo with %s in 1.135000 sec
appendStringInfo in 1.295000 sec
appendStringInfoStringConst with in 0.245000 sec
Regards
David
- Heikki
On 2013-09-28 14:11:29 +0300, Heikki Linnakangas wrote: > On 28.09.2013 12:44, David Rowley wrote: > >The macro for test 4 was as follows: > >#define appendStringInfoStringConst(buf, s) appendBinaryStringInfo(buf, > >(s), sizeof(s)-1) > > If that makes any difference in practice, I wonder if we should just do: > > #define appendStringInfoString(buf, s) appendBinaryStringInfo(buf, (s), > strlen(s)) Doesn't that have a bit too much of an multiple evaluation danger? Maybe make it a static inline in the header instead for the platforms that support it? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Sep 28, 2013 at 11:11 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 28.09.2013 12:44, David Rowley wrote:If that makes any difference in practice, I wonder if we should just do:The macro for test 4 was as follows:
#define appendStringInfoStringConst(buf, s) appendBinaryStringInfo(buf,
(s), sizeof(s)-1)
#define appendStringInfoString(buf, s) appendBinaryStringInfo(buf, (s), strlen(s))
With a compiler worth its salt, the strlen(s) will be optimized into a constant, if s is a constant. If it's not a constant, we'll just end up calling strlen(), like appendStringInfoString would anyway. That would turn a single function call into two in all of the non-constant callsites, though, bloating the code, so it might not be a win overall.
I'm just looking at this again wondering how much impact changing appendStringInfoString into a macro would have.
If I search the entire source for the regular expression appendStringInfoString\(.*\,\s["].*["].*
which I think should match any usage with a string constant like appendStringInfoString(buf, "some text here"); but perhaps may miss instances that spread over more than 1 line. I'm getting 611 matches.
If I search for appendStringInfoString\(.*\,\s\w.* which should get the instances that are not appending string constants I get 161 matches.
So it looks like with the macro idea it would only add the extra function call in around 161 places and it would speed up 611 cases.
Note that I did these checks on my patched version of the source, the current head will have less matches of each.
I just checked how much the binary increased in size after making this change.
Before changing the macro the binary was 4,473,856 bytes
After changing the macro the binary is 4,476,928 bytes.
Regards
David
- Heikki
On Sat, 2013-09-28 at 21:50 +1200, David Rowley wrote: > Also on making the changes I noticed a possible small bug in the code > that could cause a crash if for some reason a translation contained a > %s. I know it is an unlikely scenario, never-the-less here is a patch > to fix it. There are mechanisms in place that prevent a translation from having different format specifiers than the original. So this is nothing you have to be concerned about. (If it were, we'd have a much much bigger problem.)
On Sat, Sep 28, 2013 at 9:44 PM, David Rowley <dgrowleyml@gmail.com> wrote:
I did some benchmarking earlier in the week for the new patch which was just commited to allow formatting in the log_line_prefix string. In version 0.4 of the patch there was some performance regression as I was doing appendStringInfo(buf, "%*s", padding, variable); instead of appendStringInfoString(buf, variable); This regression was fixed in a later version of the patch by only using appendStringInfo when the padding was 0.More details here: http://www.postgresql.org/message-id/CAApHDvreSGYvtXJvqHcXZL8_tXiKKiFXhQyXgqtnQ5Yo=MEfMg@mail.gmail.comToday I started looking through the entire source tree to look for places where appendStringInfo() could be replaced by appendStringInfoString(), I now have a patch which is around 100 KB in size which changes a large number of these instances.Example:diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.cindex 3107f9c..d0dea4f 100644--- a/src/backend/utils/misc/guc.c+++ b/src/backend/utils/misc/guc.c@@ -6174,7 +6174,7 @@ flatten_set_variable_args(const char *name, List *args)A_Const *con;if (l != list_head(args))- appendStringInfo(&buf, ", ");+ appendStringInfoString(&buf, ", ");I know lots of these places are not really in performance critical parts of the code, so it's likely not too big a performance gain in most places, though to be honest I didn't pay a great deal of attention to how hot the code path might be when I was editing.I thought I would post here just to test the water on this type of change. I personally don't think it makes the code any less readable, but if performance is not going to be greatly improved then perhaps people would have objections to code churn.I didn't run any benchmarks on the core code, but I did pull out all the stringinfo functions and write my own test application. I also did a trial on a new macro which could further improve performance when the string being appended in a string constant, although I just wrote this to gather opinions about the idea. It is not currently a part of the patch.In my benchmarks I was just appending a 8 char string constant 1000 times onto a stringinfo, I did this 3000 times in my tests. The results were as follows:Results:1. appendStringInfoString in 0.404000 sec2. appendStringInfo with %s in 1.118000 sec3 .appendStringInfo in 1.300000 sec4. appendStringInfoStringConst with in 0.221000 secYou can see that appendStringInfoString() is far faster than appendStringInfo() this will be down to appendStringInfo using va_args whereas appendStringInfoString() just does a strlen() then memcpy(). Test 4 bypasses the strlen() by using sizeof() which of course can only be used with string constants.Actual code:1. appendStringInfoString(str, "test1234");2. appendStringInfo(str, "%s", "test1234");3. appendStringInfo(str, "test1234");4. appendStringInfoStringConst(str, "test1234");The macro for test 4 was as follows:#define appendStringInfoStringConst(buf, s) appendBinaryStringInfo(buf, (s), sizeof(s)-1)I should say at this point that I ran these benchmarks on windows 7 64bit, though my tests for the log_line_prefix patch were all on Linux and saw a similar slowdown on appendStringInfo() vs appendStringInfoString().So let this be the thread to gather opinions on if my 100kb patch which replaces appendStringInfo with appendStringInfoString where possible would be welcomed by the community. Also would using appendStringInfoStringConst() be going 1 step too far?Regards
I've attached a the cleanup patch for this. This one just converts instances of appendStringInfo into appendStringInfoString where appendStringInfo does no formatting or just has the format "%s".
David Rowley
Вложения
On Mon, Sep 30, 2013 at 10:10 PM, David Rowley <dgrowleyml@gmail.com> wrote:
On Sat, Sep 28, 2013 at 9:44 PM, David Rowley <dgrowleyml@gmail.com> wrote:I did some benchmarking earlier in the week for the new patch which was just commited to allow formatting in the log_line_prefix string. In version 0.4 of the patch there was some performance regression as I was doing appendStringInfo(buf, "%*s", padding, variable); instead of appendStringInfoString(buf, variable); This regression was fixed in a later version of the patch by only using appendStringInfo when the padding was 0.More details here: http://www.postgresql.org/message-id/CAApHDvreSGYvtXJvqHcXZL8_tXiKKiFXhQyXgqtnQ5Yo=MEfMg@mail.gmail.comI've attached a the cleanup patch for this. This one just converts instances of appendStringInfo into appendStringInfoString where appendStringInfo does no formatting or just has the format "%s".
I've attached a re-based version of this.
David Rowley
Вложения
On Wed, Oct 30, 2013 at 4:51 AM, David Rowley <dgrowleyml@gmail.com> wrote: > I've attached a re-based version of this. I don't see any compelling reason not to commit this. Does anyone wish to object? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas escribió: > On Wed, Oct 30, 2013 at 4:51 AM, David Rowley <dgrowleyml@gmail.com> wrote: > > I've attached a re-based version of this. > > I don't see any compelling reason not to commit this. Does anyone > wish to object? I think a blanket substitution of places that currently have %s might cause bugs, particularly if the string is user-supplied. It might be right for many cases, but did you/someone review each such callsite? No objection to the other half, that substitute calls that don't have %s. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2013-10-30 10:52:05 -0300, Alvaro Herrera wrote: > Robert Haas escribió: > > On Wed, Oct 30, 2013 at 4:51 AM, David Rowley <dgrowleyml@gmail.com> wrote: > > > I've attached a re-based version of this. > > > > I don't see any compelling reason not to commit this. Does anyone > > wish to object? > > I think a blanket substitution of places that currently have %s might > cause bugs, particularly if the string is user-supplied. It might be > right for many cases, but did you/someone review each such callsite? > > No objection to the other half, that substitute calls that don't have > %s. appendStringInfoString() doesn't expand format strings, so I am not sure what you're worried about? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund escribió: > On 2013-10-30 10:52:05 -0300, Alvaro Herrera wrote: > > Robert Haas escribió: > > > On Wed, Oct 30, 2013 at 4:51 AM, David Rowley <dgrowleyml@gmail.com> wrote: > > > > I've attached a re-based version of this. > > > > > > I don't see any compelling reason not to commit this. Does anyone > > > wish to object? > > > > I think a blanket substitution of places that currently have %s might > > cause bugs, particularly if the string is user-supplied. It might be > > right for many cases, but did you/someone review each such callsite? > > > > No objection to the other half, that substitute calls that don't have > > %s. > > appendStringInfoString() doesn't expand format strings, so I am not sure > what you're worried about? Um. Blame my lack of decent breakfast this morning. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Oct 30, 2013 at 12:12 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Andres Freund escribió: >> On 2013-10-30 10:52:05 -0300, Alvaro Herrera wrote: >> > Robert Haas escribió: >> > > On Wed, Oct 30, 2013 at 4:51 AM, David Rowley <dgrowleyml@gmail.com> wrote: >> > > > I've attached a re-based version of this. >> > > >> > > I don't see any compelling reason not to commit this. Does anyone >> > > wish to object? >> > >> > I think a blanket substitution of places that currently have %s might >> > cause bugs, particularly if the string is user-supplied. It might be >> > right for many cases, but did you/someone review each such callsite? >> > >> > No objection to the other half, that substitute calls that don't have >> > %s. >> >> appendStringInfoString() doesn't expand format strings, so I am not sure >> what you're worried about? > > Um. Blame my lack of decent breakfast this morning. So, does that mean we're good to go? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas escribió: > So, does that mean we're good to go? Looks fine to me ... -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services