Обсуждение: Some efforts to get rid of "long" in our codebase

Поиск
Список
Период
Сортировка

Some efforts to get rid of "long" in our codebase

От
David Rowley
Дата:
I was playing around with the following imperfect regex to try and get
an idea of how many "long" usages we have.

git grep -E "^\s*(static)?\s*(unsigned)?\s*long\b" -- '*.c' '*.h' | wc -l

REL_13_STABLE -> 486
REL_14_STABLE -> 482
REL_15_STABLE -> 476
REL_16_STABLE -> 485
REL_17_STABLE -> 449
REL_18_STABLE -> 439
master -> 386

(Generally trending downwards with quite a good reduction since v18)

Many of the "long" usages are genuine, e.g for compatibility with
library functions, or in snprintf.c for the %ld and %lu format
specifiers; those we need. However, many of these we don't need. Many
are a possible source of bugs due to not all platforms we support
having the same idea about how wide longs are. We've had and fixed
bugs around this before.

I've attached a couple of patches to get the ball rolling.

0001: CATCACHE_STATS is using signed longs as counters for cache
hits/misses etc. I've changed this to uint64 rather than int64 as I
didn't see the need to have a signed type to count the occurrences of
an event. Maybe there's an anti-universe where negative occurrences
are a thing, but they're not in this one. If something goes awry with
the counters and that causes the subtraction that's being done to
wrap, then we're more likely to notice the bug via the excessively
large number the wrap would end up displaying.

0002: MemSet / MemSetAligned macros. It's probably about time someone
made these disappear, but that's likely for another thread with more
research than I'd like to do here. I replaced "long" with "Size". I
also considered "uintptr_t", but after some reading of the C standard,
I settled on Size as it seems it's possible for platforms to exist
where the pointer width is smaller than the processor's width. I
suspect it might not matter for the platforms we support? Size could
also be smaller than the processor's width, but I feel that's less
likely.

Which yields a drop in the Ocean fewer longs...

0001+0002 -> 367

David

Вложения

Re: Some efforts to get rid of "long" in our codebase

От
Heikki Linnakangas
Дата:
On 06/11/2025 13:46, David Rowley wrote:
> I've attached a couple of patches to get the ball rolling.

Thanks!

> 0001: CATCACHE_STATS is using signed longs as counters for cache
> hits/misses etc. I've changed this to uint64 rather than int64 as I
> didn't see the need to have a signed type to count the occurrences of
> an event. Maybe there's an anti-universe where negative occurrences
> are a thing, but they're not in this one. If something goes awry with
> the counters and that causes the subtraction that's being done to
> wrap, then we're more likely to notice the bug via the excessively
> large number the wrap would end up displaying.

Unsigned makes sense for these.

> @@ -476,7 +476,7 @@ CatCachePrintStats(int code, Datum arg)
>  
>          if (cache->cc_ntup == 0 && cache->cc_searches == 0)
>              continue;            /* don't print unused caches */
> -        elog(DEBUG2, "catcache %s/%u: %d tup, %ld srch, %ld+%ld=%ld hits, %ld+%ld=%ld loads, %ld invals, %d lists,
%ldlsrch, %ld lhits",
 
> +        elog(DEBUG2, "catcache %s/%u: %d tup, %" PRIu64 " srch, %" PRIu64 "+%" PRIu64 "=%" PRIu64 " hits, %" PRIu64
"+%"PRIu64 "=%" PRIu64 " loads, %" PRIu64 " invals, %d lists, %" PRIu64 " lsrch, %" PRIu64 " lhits",
 
>               cache->cc_relname,
>               cache->cc_indexoid,
>               cache->cc_ntup,

Unfortunately PRIu64 makes these much longer and less readable. I don't 
think there's much we can do about that though. Perhaps split the format 
string to multiple lines?

> 0002: MemSet / MemSetAligned macros. It's probably about time someone
> made these disappear, but that's likely for another thread with more
> research than I'd like to do here. I replaced "long" with "Size". I
> also considered "uintptr_t", but after some reading of the C standard,
> I settled on Size as it seems it's possible for platforms to exist
> where the pointer width is smaller than the processor's width. I
> suspect it might not matter for the platforms we support? Size could
> also be smaller than the processor's width, but I feel that's less
> likely.

Let's use size_t instead of Size in new code, per previous discussions 

(https://www.postgresql.org/message-id/flat/CA%2BRLCQyVqb9Xxni6x2iJYTawMbJv5gZY2fzNaw39%3D_yOtu_QKw%40mail.gmail.com#74391f75a649df13920f87360e361183)

- Heikki




Re: Some efforts to get rid of "long" in our codebase

От
Peter Eisentraut
Дата:
On 06.11.25 13:17, Heikki Linnakangas wrote:
>> @@ -476,7 +476,7 @@ CatCachePrintStats(int code, Datum arg)
>>
>>          if (cache->cc_ntup == 0 && cache->cc_searches == 0)
>>              continue;            /* don't print unused caches */
>> -        elog(DEBUG2, "catcache %s/%u: %d tup, %ld srch, %ld+%ld=%ld 
>> hits, %ld+%ld=%ld loads, %ld invals, %d lists, %ld lsrch, %ld lhits",
>> +        elog(DEBUG2, "catcache %s/%u: %d tup, %" PRIu64 " srch, %" 
>> PRIu64 "+%" PRIu64 "=%" PRIu64 " hits, %" PRIu64 "+%" PRIu64 "=%" 
>> PRIu64 " loads, %" PRIu64 " invals, %d lists, %" PRIu64 " lsrch, %" 
>> PRIu64 " lhits",
>>               cache->cc_relname,
>>               cache->cc_indexoid,
>>               cache->cc_ntup,
> 
> Unfortunately PRIu64 makes these much longer and less readable. I don't 
> think there's much we can do about that though. Perhaps split the format 
> string to multiple lines?

You could also use unsigned long long int for these, to make the format 
strings more readable.




Re: Some efforts to get rid of "long" in our codebase

От
Peter Eisentraut
Дата:
On 06.11.25 12:46, David Rowley wrote:
> 0002: MemSet / MemSetAligned macros. It's probably about time someone
> made these disappear, but that's likely for another thread with more
> research than I'd like to do here. I replaced "long" with "Size". I
> also considered "uintptr_t", but after some reading of the C standard,
> I settled on Size as it seems it's possible for platforms to exist
> where the pointer width is smaller than the processor's width. I
> suspect it might not matter for the platforms we support? Size could
> also be smaller than the processor's width, but I feel that's less
> likely.

I think size_t/Size could be misleading here.  You're not measuring any 
size, you're just chunking up the bytes to zero into something that we 
thing the compiler or CPU can handle very efficiently.

So in a sense, using long isn't wrong here.  It might well be the best 
for this.  If there is an aversion to using any long at all, why not 
long long.



Re: Some efforts to get rid of "long" in our codebase

От
David Rowley
Дата:
On Fri, 7 Nov 2025 at 07:33, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 06.11.25 12:46, David Rowley wrote:
> > 0002: MemSet / MemSetAligned macros. It's probably about time someone
> > made these disappear, but that's likely for another thread with more
> > research than I'd like to do here. I replaced "long" with "Size". I
> > also considered "uintptr_t", but after some reading of the C standard,
> > I settled on Size as it seems it's possible for platforms to exist
> > where the pointer width is smaller than the processor's width. I
> > suspect it might not matter for the platforms we support? Size could
> > also be smaller than the processor's width, but I feel that's less
> > likely.
>
> I think size_t/Size could be misleading here.  You're not measuring any
> size, you're just chunking up the bytes to zero into something that we
> thing the compiler or CPU can handle very efficiently.
>
> So in a sense, using long isn't wrong here.  It might well be the best
> for this.  If there is an aversion to using any long at all, why not
> long long.

The point in changing this was that long isn't a good fit as it's not
64-bit on 64-bit Windows. My aim is to find a type with a width the
same as the processor's word size. long long tends to be 64-bit on
32-bit platforms, so doesn't seem a very good fit. Someone might argue
that doesn't matter now since we no longer have 4-byte Datums, but
IMO, this isn't any reason to make things even worse for 32-bit
builds.

David



Re: Some efforts to get rid of "long" in our codebase

От
Peter Eisentraut
Дата:
On 06.11.25 21:06, David Rowley wrote:
> On Fri, 7 Nov 2025 at 07:33, Peter Eisentraut <peter@eisentraut.org> wrote:
>>
>> On 06.11.25 12:46, David Rowley wrote:
>>> 0002: MemSet / MemSetAligned macros. It's probably about time someone
>>> made these disappear, but that's likely for another thread with more
>>> research than I'd like to do here. I replaced "long" with "Size". I
>>> also considered "uintptr_t", but after some reading of the C standard,
>>> I settled on Size as it seems it's possible for platforms to exist
>>> where the pointer width is smaller than the processor's width. I
>>> suspect it might not matter for the platforms we support? Size could
>>> also be smaller than the processor's width, but I feel that's less
>>> likely.
>>
>> I think size_t/Size could be misleading here.  You're not measuring any
>> size, you're just chunking up the bytes to zero into something that we
>> thing the compiler or CPU can handle very efficiently.
>>
>> So in a sense, using long isn't wrong here.  It might well be the best
>> for this.  If there is an aversion to using any long at all, why not
>> long long.
> 
> The point in changing this was that long isn't a good fit as it's not
> 64-bit on 64-bit Windows. My aim is to find a type with a width the
> same as the processor's word size. long long tends to be 64-bit on
> 32-bit platforms, so doesn't seem a very good fit. Someone might argue
> that doesn't matter now since we no longer have 4-byte Datums, but
> IMO, this isn't any reason to make things even worse for 32-bit
> builds.

Yeah you're right.  If we cared a lot about MemSet's longevity, I would 
suggest making a new type for this, but that would leak into all users 
of c.h, so maybe not.

I do suggest some kind of comment, that we're using size_t as a 
convenient proxy for the most suitable chunk/step size for the platform, 
not to actually measure a size.




Re: Some efforts to get rid of "long" in our codebase

От
Tom Lane
Дата:
Peter Eisentraut <peter@eisentraut.org> writes:
> I do suggest some kind of comment, that we're using size_t as a 
> convenient proxy for the most suitable chunk/step size for the platform, 
> not to actually measure a size.

Yeah.  There's not actually anything wrong with using "long" here,
except that we believe that on Win64 it's probably not the processor's
native word width.  Size/size_t is more likely to match that.

([u]intptr_t is even more likely to match, but I don't think that's a
good choice because it invites confusion with the usage of uintptr_t
for address arithmetic elsewhere in the macro.  That's quite unrelated
to the choice of copy step size.)

            regards, tom lane