Обсуждение: Some efforts to get rid of "long" in our codebase
I was playing around with the following imperfect regex to try and get an idea of how many "long" usages we have. git grep -E "^\s*(static)?\s*(unsigned)?\s*long\b" -- '*.c' '*.h' | wc -l REL_13_STABLE -> 486 REL_14_STABLE -> 482 REL_15_STABLE -> 476 REL_16_STABLE -> 485 REL_17_STABLE -> 449 REL_18_STABLE -> 439 master -> 386 (Generally trending downwards with quite a good reduction since v18) Many of the "long" usages are genuine, e.g for compatibility with library functions, or in snprintf.c for the %ld and %lu format specifiers; those we need. However, many of these we don't need. Many are a possible source of bugs due to not all platforms we support having the same idea about how wide longs are. We've had and fixed bugs around this before. I've attached a couple of patches to get the ball rolling. 0001: CATCACHE_STATS is using signed longs as counters for cache hits/misses etc. I've changed this to uint64 rather than int64 as I didn't see the need to have a signed type to count the occurrences of an event. Maybe there's an anti-universe where negative occurrences are a thing, but they're not in this one. If something goes awry with the counters and that causes the subtraction that's being done to wrap, then we're more likely to notice the bug via the excessively large number the wrap would end up displaying. 0002: MemSet / MemSetAligned macros. It's probably about time someone made these disappear, but that's likely for another thread with more research than I'd like to do here. I replaced "long" with "Size". I also considered "uintptr_t", but after some reading of the C standard, I settled on Size as it seems it's possible for platforms to exist where the pointer width is smaller than the processor's width. I suspect it might not matter for the platforms we support? Size could also be smaller than the processor's width, but I feel that's less likely. Which yields a drop in the Ocean fewer longs... 0001+0002 -> 367 David
Вложения
On 06/11/2025 13:46, David Rowley wrote: > I've attached a couple of patches to get the ball rolling. Thanks! > 0001: CATCACHE_STATS is using signed longs as counters for cache > hits/misses etc. I've changed this to uint64 rather than int64 as I > didn't see the need to have a signed type to count the occurrences of > an event. Maybe there's an anti-universe where negative occurrences > are a thing, but they're not in this one. If something goes awry with > the counters and that causes the subtraction that's being done to > wrap, then we're more likely to notice the bug via the excessively > large number the wrap would end up displaying. Unsigned makes sense for these. > @@ -476,7 +476,7 @@ CatCachePrintStats(int code, Datum arg) > > if (cache->cc_ntup == 0 && cache->cc_searches == 0) > continue; /* don't print unused caches */ > - elog(DEBUG2, "catcache %s/%u: %d tup, %ld srch, %ld+%ld=%ld hits, %ld+%ld=%ld loads, %ld invals, %d lists, %ldlsrch, %ld lhits", > + elog(DEBUG2, "catcache %s/%u: %d tup, %" PRIu64 " srch, %" PRIu64 "+%" PRIu64 "=%" PRIu64 " hits, %" PRIu64 "+%"PRIu64 "=%" PRIu64 " loads, %" PRIu64 " invals, %d lists, %" PRIu64 " lsrch, %" PRIu64 " lhits", > cache->cc_relname, > cache->cc_indexoid, > cache->cc_ntup, Unfortunately PRIu64 makes these much longer and less readable. I don't think there's much we can do about that though. Perhaps split the format string to multiple lines? > 0002: MemSet / MemSetAligned macros. It's probably about time someone > made these disappear, but that's likely for another thread with more > research than I'd like to do here. I replaced "long" with "Size". I > also considered "uintptr_t", but after some reading of the C standard, > I settled on Size as it seems it's possible for platforms to exist > where the pointer width is smaller than the processor's width. I > suspect it might not matter for the platforms we support? Size could > also be smaller than the processor's width, but I feel that's less > likely. Let's use size_t instead of Size in new code, per previous discussions (https://www.postgresql.org/message-id/flat/CA%2BRLCQyVqb9Xxni6x2iJYTawMbJv5gZY2fzNaw39%3D_yOtu_QKw%40mail.gmail.com#74391f75a649df13920f87360e361183) - Heikki
On 06.11.25 13:17, Heikki Linnakangas wrote: >> @@ -476,7 +476,7 @@ CatCachePrintStats(int code, Datum arg) >> >> if (cache->cc_ntup == 0 && cache->cc_searches == 0) >> continue; /* don't print unused caches */ >> - elog(DEBUG2, "catcache %s/%u: %d tup, %ld srch, %ld+%ld=%ld >> hits, %ld+%ld=%ld loads, %ld invals, %d lists, %ld lsrch, %ld lhits", >> + elog(DEBUG2, "catcache %s/%u: %d tup, %" PRIu64 " srch, %" >> PRIu64 "+%" PRIu64 "=%" PRIu64 " hits, %" PRIu64 "+%" PRIu64 "=%" >> PRIu64 " loads, %" PRIu64 " invals, %d lists, %" PRIu64 " lsrch, %" >> PRIu64 " lhits", >> cache->cc_relname, >> cache->cc_indexoid, >> cache->cc_ntup, > > Unfortunately PRIu64 makes these much longer and less readable. I don't > think there's much we can do about that though. Perhaps split the format > string to multiple lines? You could also use unsigned long long int for these, to make the format strings more readable.
On 06.11.25 12:46, David Rowley wrote: > 0002: MemSet / MemSetAligned macros. It's probably about time someone > made these disappear, but that's likely for another thread with more > research than I'd like to do here. I replaced "long" with "Size". I > also considered "uintptr_t", but after some reading of the C standard, > I settled on Size as it seems it's possible for platforms to exist > where the pointer width is smaller than the processor's width. I > suspect it might not matter for the platforms we support? Size could > also be smaller than the processor's width, but I feel that's less > likely. I think size_t/Size could be misleading here. You're not measuring any size, you're just chunking up the bytes to zero into something that we thing the compiler or CPU can handle very efficiently. So in a sense, using long isn't wrong here. It might well be the best for this. If there is an aversion to using any long at all, why not long long.
On Fri, 7 Nov 2025 at 07:33, Peter Eisentraut <peter@eisentraut.org> wrote: > > On 06.11.25 12:46, David Rowley wrote: > > 0002: MemSet / MemSetAligned macros. It's probably about time someone > > made these disappear, but that's likely for another thread with more > > research than I'd like to do here. I replaced "long" with "Size". I > > also considered "uintptr_t", but after some reading of the C standard, > > I settled on Size as it seems it's possible for platforms to exist > > where the pointer width is smaller than the processor's width. I > > suspect it might not matter for the platforms we support? Size could > > also be smaller than the processor's width, but I feel that's less > > likely. > > I think size_t/Size could be misleading here. You're not measuring any > size, you're just chunking up the bytes to zero into something that we > thing the compiler or CPU can handle very efficiently. > > So in a sense, using long isn't wrong here. It might well be the best > for this. If there is an aversion to using any long at all, why not > long long. The point in changing this was that long isn't a good fit as it's not 64-bit on 64-bit Windows. My aim is to find a type with a width the same as the processor's word size. long long tends to be 64-bit on 32-bit platforms, so doesn't seem a very good fit. Someone might argue that doesn't matter now since we no longer have 4-byte Datums, but IMO, this isn't any reason to make things even worse for 32-bit builds. David
On 06.11.25 21:06, David Rowley wrote: > On Fri, 7 Nov 2025 at 07:33, Peter Eisentraut <peter@eisentraut.org> wrote: >> >> On 06.11.25 12:46, David Rowley wrote: >>> 0002: MemSet / MemSetAligned macros. It's probably about time someone >>> made these disappear, but that's likely for another thread with more >>> research than I'd like to do here. I replaced "long" with "Size". I >>> also considered "uintptr_t", but after some reading of the C standard, >>> I settled on Size as it seems it's possible for platforms to exist >>> where the pointer width is smaller than the processor's width. I >>> suspect it might not matter for the platforms we support? Size could >>> also be smaller than the processor's width, but I feel that's less >>> likely. >> >> I think size_t/Size could be misleading here. You're not measuring any >> size, you're just chunking up the bytes to zero into something that we >> thing the compiler or CPU can handle very efficiently. >> >> So in a sense, using long isn't wrong here. It might well be the best >> for this. If there is an aversion to using any long at all, why not >> long long. > > The point in changing this was that long isn't a good fit as it's not > 64-bit on 64-bit Windows. My aim is to find a type with a width the > same as the processor's word size. long long tends to be 64-bit on > 32-bit platforms, so doesn't seem a very good fit. Someone might argue > that doesn't matter now since we no longer have 4-byte Datums, but > IMO, this isn't any reason to make things even worse for 32-bit > builds. Yeah you're right. If we cared a lot about MemSet's longevity, I would suggest making a new type for this, but that would leak into all users of c.h, so maybe not. I do suggest some kind of comment, that we're using size_t as a convenient proxy for the most suitable chunk/step size for the platform, not to actually measure a size.
Peter Eisentraut <peter@eisentraut.org> writes:
> I do suggest some kind of comment, that we're using size_t as a
> convenient proxy for the most suitable chunk/step size for the platform,
> not to actually measure a size.
Yeah. There's not actually anything wrong with using "long" here,
except that we believe that on Win64 it's probably not the processor's
native word width. Size/size_t is more likely to match that.
([u]intptr_t is even more likely to match, but I don't think that's a
good choice because it invites confusion with the usage of uintptr_t
for address arithmetic elsewhere in the macro. That's quite unrelated
to the choice of copy step size.)
regards, tom lane