Обсуждение: [PATCH] Use strchr() to search for a single character
Code in src/port/pgmkdirp.c uses strstr() to find a single character in a string, but strstr() seems to be too generic for this job. Another function, strchr(), might be better suited for this purpose, because it is optimized to search for exactly one character in a string. In addition, if strchr() is used, the compiler doesn't have to generate a terminating \0 byte for the substring, producing slightly smaller code. I'm attaching the patch. Regards, Dmitry
Вложения
On Sun, Jul 20, 2025 at 6:21 PM Dmitry Mityugov <d.mityugov@postgrespro.ru> wrote:
Code in src/port/pgmkdirp.c uses strstr() to find a single character in
a string, but strstr() seems to be too generic for this job. Another
function, strchr(), might be better suited for this purpose, because it
is optimized to search for exactly one character in a string. In
addition, if strchr() is used, the compiler doesn't have to generate a
terminating \0 byte for the substring, producing slightly smaller code.
I'm attaching the patch.
Regards,
Dmitry
Seems like a simple-enough change, not a huge win but probably worth doing.
Using ripgrep to search for 'strstr(.*".")' turns up two similar situations in contrib/fuzzystrmatch/dmetaphone.c, so perhaps we include those.
There's also a match in src/bin/pg_rewind/filemap.c, but that one is a false positive.
Corey Huinker писал(а) 2025-07-22 22:42: > On Sun, Jul 20, 2025 at 6:21 PM Dmitry Mityugov > <d.mityugov@postgrespro.ru> wrote: > >> Code in src/port/pgmkdirp.c uses strstr() to find a single character >> in >> a string, but strstr() seems to be too generic for this job. Another >> >> function, strchr(), might be better suited for this purpose, because >> it >> is optimized to search for exactly one character in a string. In >> addition, if strchr() is used, the compiler doesn't have to generate >> a >> terminating \0 byte for the substring, producing slightly smaller >> code. >> I'm attaching the patch. >> >> Regards, >> Dmitry > > Seems like a simple-enough change, not a huge win but probably worth > doing. > > Using ripgrep to search for 'strstr(.*".")' turns up two similar > situations in contrib/fuzzystrmatch/dmetaphone.c, so perhaps we > include those. > > There's also a match in src/bin/pg_rewind/filemap.c, but that one is a > false positive. Thank you for your attention to this problem. The code in contrib/fuzzystrmatch/dmetaphone.c indeed uses several calls to strstr() to search for a single character, but it also uses strstr() to search for strings that consist of more than a single character on adjacent lines, and replacing half of those strstr()s with strchr()s would make the code less consistent in my opinion. What's more important, it seems that this code in contrib/fuzzystrmatch/dmetaphone.c contains a bug. Statement `else if (strstr(s->str, "WITZ"))` at line 317 will never be executed, because if the string contains substring “W”, it will be handled at line 311, `if (strstr(s->str, "W"))`. Probably this bug should be fixed in a separate commit.
On Wed, 23 Jul 2025 at 09:34, Dmitry Mityugov <d.mityugov@postgrespro.ru> wrote: > Thank you for your attention to this problem. The code in > contrib/fuzzystrmatch/dmetaphone.c indeed uses several calls to strstr() > to search for a single character, but it also uses strstr() to search > for strings that consist of more than a single character on adjacent > lines, and replacing half of those strstr()s with strchr()s would make > the code less consistent in my opinion. That depends on what you're making consistent. If the consistency is that we always use strchr() when the search is for a single char, then it's not consistent to ignore that one. Looking at [1], it seems even ancient versions of gcc and clang rewrite the strstr() into a strchr() call when the search term is a single char string. So it might not be worth doing to any trouble here. [1] https://godbolt.org/z/q1xcKdzd7 David
David Rowley <dgrowleyml@gmail.com> writes:
> Looking at [1], it seems even ancient versions of gcc and clang
> rewrite the strstr() into a strchr() call when the search term is a
> single char string. So it might not be worth doing to any trouble
> here.
I was wondering if that might be true. However, your godbolt results
show that MSVC doesn't do this optimization, and the usage in
pgmkdirp.c is inside "#ifdef WIN32", so maybe it's worth fixing there.
I can't get excited about dmetaphone --- that's about as old and
crufty as anything in our tree. If we were going to touch it the
first thing I'd want to do is get rid of its non-multibyte-safe
upcasing logic (MakeUpper). The "WITZ" business is kind of
silly-looking, but the question it brings to my mind is whether
the algorithm was mistranscribed; so rather than just delete that
line we should do some research. If we care, that is.
regards, tom lane
On Wed, 23 Jul 2025 at 10:36, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > David Rowley <dgrowleyml@gmail.com> writes: > > Looking at [1], it seems even ancient versions of gcc and clang > > rewrite the strstr() into a strchr() call when the search term is a > > single char string. So it might not be worth doing to any trouble > > here. > > I was wondering if that might be true. However, your godbolt results > show that MSVC doesn't do this optimization, and the usage in > pgmkdirp.c is inside "#ifdef WIN32", so maybe it's worth fixing there. Yeah, I noticed MSVC not doing the rewrite. I didn't notice the mentioned use case was within an #ifdef WIN32. I'm currently thinking we should just fix the pgmkdirp.c instance and call it good. David
David Rowley <dgrowleyml@gmail.com> writes:
> I'm currently thinking we should just fix the pgmkdirp.c instance and
> call it good.
+1
regards, tom lane
On Wed, 23 Jul 2025 at 11:06, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > David Rowley <dgrowleyml@gmail.com> writes: > > I'm currently thinking we should just fix the pgmkdirp.c instance and > > call it good. > > +1 Done. David