Обсуждение: Safer hash table initialization macro

Поиск

Список

Период

Сортировка

Safer hash table initialization macro

От

Bertrand Drouvot

Дата:

01 декабря 2025 г., 16:45:00

Hi hackers,

Currently to create a hash table we do things like:

A) create a struct, say:

 typedef struct SeenRelsEntry
 {
    Oid   rel_id;
    int   list_index;
 } SeenRelsEntry;

where the first member is the hash key, and then later:

B)

 ctl.keysize = sizeof(Oid);
 ctl.entrysize = sizeof(SeenRelsEntry);
 ctl.hcxt = CurrentMemoryContext;

 seen_rels = hash_create("find_all_inheritors temporary table",
                         32, /* start small and extend */
                         &ctl,

I can see 2 possible issues:

1)

We manually specify the type for keysize, which could become incorrect (from the
start) or if the key member's type changes.

2) 

It may be possible to remove the key member without the compiler noticing it.

Take this example and remove:

diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index 929bb53b620..eb11976afef 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -36,7 +36,6 @@
  */
 typedef struct SeenRelsEntry
 {
-       Oid                     rel_id;                 /* relation oid */
        int                     list_index;             /* its position in output list(s) */
 } SeenRelsEntry;

That would compile without any issues because this rel_id member is not
referenced in the code (for this particular example). That's rare but possible.

But then, on my machine, during make check:

TRAP: failed Assert("!found"), File: "nodeModifyTable.c", Line: 5157, PID: 140430

The reason is that the struct member access is done only for bytes level
operations (within the hash related macros). So it's easy to think that this
member is unused (because it is not referenced in the code).

I'm thinking about what kind of safety we could put in place to better deal with
1) and 2).

What about adding a macro that:

- requests the key member name
- ensures that it is at offset 0
- computes the key size based on the member

Something like:

"
#define HASH_ELEM_INIT(ctl, entrytype, keymember) \
    do { \
        StaticAssertStmt(offsetof(entrytype, keymember) == 0, \
                        #keymember " must be first member in " #entrytype); \
        (ctl).keysize = sizeof(((entrytype *)0)->keymember); \
        (ctl).entrysize = sizeof(entrytype); \
    } while (0)
"

That way:

- The key member is explicitly referenced in the code (preventing "unused"
false positives)
- The key size is automatically computed from the actual member type (preventing
type mismatches)
- We enforce that the key is at offset 0

An additional benefit: it avoids repeating the "keysize ="  followed by "entrysize ="
in a lot of places in the code (currently about 100 times).

If that sounds like a good idea, I could work on a patch doing so.

Thoughts?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Safer hash table initialization macro

От

Jelte Fennema-Nio

Дата:

01 декабря 2025 г., 17:44:41

On Mon, 1 Dec 2025 at 14:45, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
> Thoughts?

I think the hashtable creation API in postgres is so terrible that it
actively discourages usage. At Citus we definitely had the problem
that we would use Lists for cases where a hashtable was preferable
perf wise, just because the API was so terrible.
That's why I eventually implemented some wrapper helper functions to
make it less verbose and error prone to use in by far the most common
patterns we had[1].

So I'm definitely in favor of improving this API (probably by adding a
few new functions). I have a few main thoughts on what could be
improved:

1. Automatically determine keysize and entrysize given a keymember and
entrytype (like you suggested).
2. Autodect most of the flags.
a. HASH_PARTITION, HASH_SEGMENT, HASH_FUNCTION, HASH_DIRSIZE,
HASH_COMPARE, HASH_KEYCOPY, HASH_ALLOC can all be simply be detected
based on the relevant fields from HASHCTL. Passing them in explicitly
is just duplication that causes code noise and is easy to forget
accidentally.
b. HASH_ELEM is useless noise because it is required
c. HASH_BLOBS could be the default (and maybe use HASH_STRINGS by
default if the keytype is char*)
3. Default to CurrentMemoryContext instead of TopMemoryContext. Like
all of the other functions that allocate, because right now it's way
too easy to accidentally use TopMemoryContext when you did not intend
to.
4. Have a creation function that doesn't require HASHCTL at all (just
takes entrytype and keymember and maybe a version of this that takes a
memorycontext).

[1]:
https://github.com/citusdata/citus/blob/ae2eb65be082d52db646b68a644474e24bc6cea1/src/include/distributed/hash_helpers.h#L74

Re: Safer hash table initialization macro

От

Bertrand Drouvot

Дата:

03 декабря 2025 г., 14:17:43

Hi,

On Mon, Dec 01, 2025 at 03:44:41PM +0100, Jelte Fennema-Nio wrote:
> On Mon, 1 Dec 2025 at 14:45, Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> > Thoughts?
> 
> I think the hashtable creation API in postgres is so terrible that it
> actively discourages usage.

Thanks for sharing your thoughts!

> So I'm definitely in favor of improving this API (probably by adding a
> few new functions). I have a few main thoughts on what could be
> improved:
> 
> 1. Automatically determine keysize and entrysize given a keymember and
> entrytype (like you suggested).

PFA a patch introducing and using the new macro. Note that it also introduces
HASH_ELEM_INIT_FULL for the rare cases where the whole struct is the
key.

Also one case remains untouched:

$ git grep "entrysize = sizeof" "*.c"
src/backend/replication/logical/relation.c:     ctl.entrysize = sizeof(LogicalRepRelMapEntry);

That's because the key is a member of a nested struct so that the new
macro can not be used. As there is only one occurrence of it, I think we can keep
it as it is. But we could create a dedicated macro for those cases if we feel
the need. Now that I'm writing this, that might be a better idea: that way we'd
avoid any "entrysize/keysize = " in the .c files.

Also a nice side effect of using the macros:

138 insertions(+), 203 deletions(-)

> 2. Autodect most of the flags.
>     a. HASH_PARTITION, HASH_SEGMENT, HASH_FUNCTION, HASH_DIRSIZE,
> HASH_COMPARE, HASH_KEYCOPY, HASH_ALLOC can all be simply be detected
> based on the relevant fields from HASHCTL. Passing them in explicitly
> is just duplication that causes code noise and is easy to forget
> accidentally.
>     b. HASH_ELEM is useless noise because it is required
>     c. HASH_BLOBS could be the default (and maybe use HASH_STRINGS by
> default if the keytype is char*)
> 3. Default to CurrentMemoryContext instead of TopMemoryContext. Like
> all of the other functions that allocate, because right now it's way
> too easy to accidentally use TopMemoryContext when you did not intend
> to.
> 4. Have a creation function that doesn't require HASHCTL at all (just
> takes entrytype and keymember and maybe a version of this that takes a
> memorycontext).

Thanks for the above suggestions! I did not think so deep as you did during
your Citus time, but will think about those too. I suggest we move forward one
step at a time, first step being the new macros. Does that make sense to you?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Вложения

v1-0001-Safer-hash-table-initialization-macro.patch

Re: Safer hash table initialization macro

От

"Jelte Fennema-Nio"

Дата:

04 декабря 2025 г., 18:29:55

On Wed Dec 3, 2025 at 12:17 PM CET, Bertrand Drouvot wrote:
> I suggest we move forward one
> step at a time, first step being the new macros. Does that make sense to you?

Normally I would agree, but in this case I think the new macros you
proposing would become obsolete once we have the better hash table
creation functions I have in mind. And if we're going to update all
places where we create hash tables, I'd rather update them to something
really nice than a small improvement.

I couldn't let it go (nerd-sniped). So here's a patchset that adds some
macros that I think are pretty nice. Including a foreach_hash macro.

I'm a bit on the fence about the C11 _Generic code to determine whether
we should use HASH_BLOBS or HASH_STRINGS based on the type of the key.
It works really nicely in practice, but I'm worried it's a bit too much
magic. Probably we should at least have an override to allow using
HASH_BLOBS anyway for a char array (in case it's not null terminated).

Вложения

Re: Safer hash table initialization macro

От

Thomas Munro

Дата:

05 декабря 2025 г., 04:29:48

On Fri, Dec 5, 2025 at 4:30 AM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
> I'm a bit on the fence about the C11 _Generic code to determine whether
> we should use HASH_BLOBS or HASH_STRINGS based on the type of the key.
> It works really nicely in practice, but I'm worried it's a bit too much
> magic. Probably we should at least have an override to allow using
> HASH_BLOBS anyway for a char array (in case it's not null terminated).

How much of our header stuff is supposed to work from C++ too?  I
assume that if this is passing cpluspluscheck then it's only because
those _Generic macros aren't being expanded.  I suppose you could
write the typeof-based version you already hinted at, but only use it
for __cplusplus__ (where typeof exists as decltype).

What I mean is, if something like DuckDB (C++ IIRC) wants to actually
use these APIs with the new interfaces, they won't work if the macros
expant to _Generic (not legal C++), but a typeof/decltype-based
version should work just fine, but at the same time I don't think
we'd want to use typeof just because it is available, or we'd never
test the _Generic version: all 3 C compilers have something we can use
for typeof, it's just not standard C before C23, and we want
PostgreSQL to be a valid C11 program and actually have coverage of
those code branches.

Another consideration is what impact we have on the Rust world, and
potentially other languages used for extensions that call C via FFI
etc, if we start using _Generic in our public interfaces, as opposed
to just using it in smaller ways as part of our internal assertions
and C implementation code that doesn't affect extensions.  Of course
people using C APIs have to deal with the complexities of C evolution
too, I'm not saying that's a blocker, I'm just trying to think through
the impact of these sorts of API changes on the ecosystem.

Re: Safer hash table initialization macro

От

Jelte Fennema-Nio

Дата:

05 декабря 2025 г., 17:32:32

On Fri, 5 Dec 2025 at 02:30, Thomas Munro <thomas.munro@gmail.com> wrote:
> How much of our header stuff is supposed to work from C++ too?

I think it's nice if it works, but it doesn't seem the most important.
Especially since C++ has its own hashmaps. And if it really needs to
create a hashmap it's still possible to call the.

> I suppose you could
> write the typeof-based version you already hinted at, but only use it
> for __cplusplus__ (where typeof exists as decltype).

I tried to figure something out that would work in C++ (with help of
Claude), but I wasn't able to create a version of the macros without
also needing to add:

#ifdef __cplusplus
}
#include <type_traits>
extern "C" {
#endif

It seems quite ugly to escape the extern "C" from the parent like that
and then re-enter it. Overall it doesn't seem worth the hassle to me
to make these macros work in C++.

> Another consideration is what impact we have on the Rust world, and
> potentially other languages used for extensions that call C via FFI
> etc

FFI generally cannot call macros anyway, only actual symbols.

Re: Safer hash table initialization macro

От

Thomas Munro

Дата:

06 декабря 2025 г., 03:56:32

On Sat, Dec 6, 2025 at 3:32 AM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
> On Fri, 5 Dec 2025 at 02:30, Thomas Munro <thomas.munro@gmail.com> wrote:
> > How much of our header stuff is supposed to work from C++ too?
>
> I think it's nice if it works, but it doesn't seem the most important.
> Especially since C++ has its own hashmaps. And if it really needs to
> create a hashmap it's still possible to call the.

... C functions without the helper macros.  Yeah.  That seems OK to me.

> > I suppose you could
> > write the typeof-based version you already hinted at, but only use it
> > for __cplusplus__ (where typeof exists as decltype).
>
> I tried to figure something out that would work in C++ (with help of
> Claude), but I wasn't able to create a version of the macros without
> also needing to add:
>
> #ifdef __cplusplus
> }
> #include <type_traits>
> extern "C" {
> #endif
>
> It seems quite ugly to escape the extern "C" from the parent like that
> and then re-enter it. Overall it doesn't seem worth the hassle to me
> to make these macros work in C++.

Yeah.  I don't think we want that sort of thing all over the place.
We could eventually come up with a small set of tools in a central
place though, so people can work with this stuff without also known
C++ meta-programming voodoo.  For example something like (untested, I
didn't think about char[size], just spitballing here...):

(pg_expr_has_type_p(ptr, char *) || pg_expr_has_type_p(ptr, NameData *))

... given the definition I posted recently[1].

I take your point that it's not really important for this case though.

> > Another consideration is what impact we have on the Rust world, and
> > potentially other languages used for extensions that call C via FFI
> > etc
>
> FFI generally cannot call macros anyway, only actual symbols.

Sure, I was just thinking about how such cross-language usage would be
forced to unpick our macrology and call the underlying C functions
without them.  Doesn't seem like the end of the world anyway, I was
just thinking out loud about the consequences of this phenomenon in
headers.

[1] https://www.postgresql.org/message-id/CA+hUKGL7trhWiJ4qxpksBztMMTWDyPnP1QN+Lq341V7QL775DA@mail.gmail.com

Re: Safer hash table initialization macro

От

"Jelte Fennema-Nio"

Дата:

08 декабря 2025 г., 13:53:02

On Sat Dec 6, 2025 at 1:56 AM CET, Thomas Munro wrote:
> On Sat, Dec 6, 2025 at 3:32 AM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>> On Fri, 5 Dec 2025 at 02:30, Thomas Munro <thomas.munro@gmail.com> wrote:
>> create a hashmap it's still possible to call the.
>
> ... C functions without the helper macros.

Oops, forgot to finish that sentence.
>
> Yeah.  I don't think we want that sort of thing all over the place.
> We could eventually come up with a small set of tools in a central
> place though, so people can work with this stuff without also known
> C++ meta-programming voodoo.  For example something like (untested, I
> didn't think about char[size], just spitballing here...):
>
> (pg_expr_has_type_p(ptr, char *) || pg_expr_has_type_p(ptr, NameData *))
>
> ... given the definition I posted recently[1].

Ugh... It would have saved me some time if I'd seen that email before. I
also had no clue that 'extern "C++"' was a thing. Attached is a new
patchset where your proposed macro is used. I also needed a
pg_nullptr_of macro, because the trick that worked in C didn't in C++.

Also, it's probably worth checking out this thread[2]. Especially
because it overlaps significantly with[1].

> Sure, I was just thinking about how such cross-language usage would be
> forced to unpick our macrology and call the underlying C functions
> without them.  Doesn't seem like the end of the world anyway, I was
> just thinking out loud about the consequences of this phenomenon in
> headers.

It's not great, but yeah that's the situation. Stuff like PG_TRY and
PG_CATCH are especially painful to reimplement. Luckily those things
can usually be re-implemented similarly in the target language, so the
macros only need to be disected once.

> [1] https://www.postgresql.org/message-id/CA+hUKGL7trhWiJ4qxpksBztMMTWDyPnP1QN+Lq341V7QL775DA@mail.gmail.com
[2]: https://www.postgresql.org/message-id/flat/CAGECzQR21OnnKiZO_1rLWO0-16kg1JBxnVq-wymYW0-_1cUNtg%40mail.gmail.com

Вложения

Re: Safer hash table initialization macro

От

Bertrand Drouvot

Дата:

09 декабря 2025 г., 10:27:03

Hi,

On Mon, Dec 08, 2025 at 11:53:02AM +0100, Jelte Fennema-Nio wrote:
> On Sat Dec 6, 2025 at 1:56 AM CET, Thomas Munro wrote:
> > On Sat, Dec 6, 2025 at 3:32 AM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
> > > On Fri, 5 Dec 2025 at 02:30, Thomas Munro <thomas.munro@gmail.com> wrote:
> > > create a hashmap it's still possible to call the.
> > 
> > ... C functions without the helper macros.
> 
> Oops, forgot to finish that sentence.

Thanks for this patch series!

> > Yeah.  I don't think we want that sort of thing all over the place.
> > We could eventually come up with a small set of tools in a central
> > place though, so people can work with this stuff without also known
> > C++ meta-programming voodoo.  For example something like (untested, I
> > didn't think about char[size], just spitballing here...):
> > 
> > (pg_expr_has_type_p(ptr, char *) || pg_expr_has_type_p(ptr, NameData *))
> > 
> > ... given the definition I posted recently[1].

+#if defined(__cplusplus)
+#define pg_expr_has_type_p(expr, type) (std::is_same<decltype(expr), type>::value)
+#else
+#define pg_expr_has_type_p(expr, type) \
+       _Generic((expr), type: 1, default: 0)
+#endif

What about relying on the existing __builtin_types_compatible_p() instead of
_Generic() here? 

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Safer hash table initialization macro

От

Thomas Munro

Дата:

09 декабря 2025 г., 12:10:39

On Tue, Dec 9, 2025 at 8:27 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
> On Mon, Dec 08, 2025 at 11:53:02AM +0100, Jelte Fennema-Nio wrote:
> > On Sat Dec 6, 2025 at 1:56 AM CET, Thomas Munro wrote:
> +#if defined(__cplusplus)
> +#define pg_expr_has_type_p(expr, type) (std::is_same<decltype(expr), type>::value)
> +#else
> +#define pg_expr_has_type_p(expr, type) \
> +       _Generic((expr), type: 1, default: 0)
> +#endif
>
> What about relying on the existing __builtin_types_compatible_p() instead of
> _Generic() here?

If we used standard C/C++ it'd work on MSVC too.

Re: Safer hash table initialization macro

От

Jelte Fennema-Nio

Дата:

09 декабря 2025 г., 13:43:01

On Tue, 9 Dec 2025 at 10:11, Thomas Munro <thomas.munro@gmail.com> wrote:
> > What about relying on the existing __builtin_types_compatible_p() instead of
> > _Generic() here?
>
> If we used standard C/C++ it'd work on MSVC too.

And to be clear, that's important because the result of
pg_expr_has_type_p fundamentally impacts the meaning of the code (i.e.
it determines the hash function). Our existing usage of
__builtin_types_compatible_p only adds some *optional* type checking,
so for that it's not critical that it works on MSVC too.

Re: Safer hash table initialization macro

От

Jelte Fennema-Nio

Дата:

09 декабря 2025 г., 13:45:11

On Tue, 9 Dec 2025 at 08:27, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
> Thanks for this patch series!

To clarify: Does that mean +1 from you on the proposed API?

Re: Safer hash table initialization macro

От

Bertrand Drouvot

Дата:

09 декабря 2025 г., 17:15:30

Hi,

On Tue, Dec 09, 2025 at 11:45:11AM +0100, Jelte Fennema-Nio wrote:
> On Tue, 9 Dec 2025 at 08:27, Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> > Thanks for this patch series!
> 
> To clarify: Does that mean +1 from you on the proposed API?

Yeah, I think the hash table API needs improvement. I did not look at all the
details of your patches, but from what I have seen I think your API proposal
make sense.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Safer hash table initialization macro

От

"Jelte Fennema-Nio"

Дата:

10 января, 14:32:39

On Tue Dec 9, 2025 at 8:27 AM CET, Bertrand Drouvot wrote:
> Thanks for this patch series!

v4 attached, which fixes some rebase conflicts. Also I moved the patches
that start using the new API to the end of the series, so the
introduction of the new APIs is now done in the first two patches.

Вложения

Re: Safer hash table initialization macro

От

Bertrand Drouvot

Дата:

13 января, 10:27:11

Hi,

On Sat, Jan 10, 2026 at 12:32:39PM +0100, Jelte Fennema-Nio wrote:
> On Tue Dec 9, 2025 at 8:27 AM CET, Bertrand Drouvot wrote:
> > Thanks for this patch series!
> 
> v4 attached, which fixes some rebase conflicts. Also I moved the patches
> that start using the new API to the end of the series, so the
> introduction of the new APIs is now done in the first two patches.

Thanks for the new version!

A few random comments:

=== 1

+#define HASH_PTR_AS_STRING(ptr, size) \
+       (pg_expr_has_type_p((ptr), char(*)[size]) || pg_expr_has_type_p((ptr), NameData *))
+#define HASH_KEY_AS_STRING(entrytype, keymember) \
+       HASH_PTR_AS_STRING(&((entrytype *)0)->keymember, \
+                                          sizeof(((entrytype *)0)->keymember))
+#define HASH_TYPE_AS_STRING(type) \
+       HASH_PTR_AS_STRING(pg_nullptr_of(type), sizeof(type))

I've probably a too paranoid concern: what if someone use char[N] to store
binary stuff with embedded null? That would detect it as string and then
make use of strncmp() and then would provide incorrect result.

While the risk is probably very low, I think it is there.

What about using:

hash_make_str()
hash_make_blob()

or force a flag:

hash_make(..., HASH_STRINGS)
hash_make(..., HASH_BLOBS)

instead? That breaks some of the patch's intend, so I'm not sure it's
worth it given the low probability risk...

=== 2

The patch introduces foreach_hash() but the foreach_hash_with_hash_value()
counterpart seems missing. It could be used in TypeCacheTypCallback() and
InvalidateAttoptCacheCallback().

I think that we could add it or make the new hash_seq_new() accept an extra
hashvalue parameter? (when set to 0 would call hash_seq_init() and
hash_seq_init_with_hash_value() otherwise?

Some minor comments:

=== 3

+extern "C++" {

should be "+extern "C++"
+{"

as per pgindent.

=== 4

+#define pg_nullptr_of(type) ((type *)NULL)

s/(type *)NULL/(type *) NULL/ ? (and in the comment in top of it)

=== 5

+#define foreach_hash(type, var, htab) \
+       for (type *var = 0, *var##__outerloop = (type *) 1; \

s/type *var = 0/type *var = NULL/? (see ec782f56b0c)

=== 6

+           * serialized++ = *value;

s/* serialized/*serialized

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Safer hash table initialization macro

От

"Jelte Fennema-Nio"

Дата:

13 января, 12:31:18

On Tue Jan 13, 2026 at 8:27 AM CET, Bertrand Drouvot wrote:
> I've probably a too paranoid concern: what if someone use char[N] to store
> binary stuff with embedded null? That would detect it as string and then
> make use of strncmp() and then would provide incorrect result.
>
> While the risk is probably very low, I think it is there.

Added a warning in the comment for these macros. For non of our
usages this was the case (i.e. the char arrays were all storing null
terminated strings). So I'm not too worried about this being a problem
in practice. Especially because in most cases there will be no null byte
in the key, and instead you'll start reading out of bounds, which will
cause problems pretty much immediately during development.

> === 2
>
> The patch introduces foreach_hash() but the foreach_hash_with_hash_value()
> counterpart seems missing. It could be used in TypeCacheTypCallback() and
> InvalidateAttoptCacheCallback().
>
> I think that we could add it or make the new hash_seq_new() accept an extra
> hashvalue parameter? (when set to 0 would call hash_seq_init() and
> hash_seq_init_with_hash_value() otherwise?

I'm not sure it's worth adding that macro. Given there's only two usages
of this function inside our codebase, and I don't expect extensions to
use this low level one. Especially, because to make this macro nice to
use in the two cases that it would apply to we'd have to make it treat 0
as a special value.

> Some minor comments:

fixed all of these.

Finally, I converted the last couple of hash_seq_init stragglers (some
I had missed/were added) and others needed the now newly added
foreach_hash_reset macro to be converted.

Вложения

Re: Safer hash table initialization macro

От

Bertrand Drouvot

Дата:

14 января, 10:57:29

Hi,

On Tue, Jan 13, 2026 at 10:31:18AM +0100, Jelte Fennema-Nio wrote:
> On Tue Jan 13, 2026 at 8:27 AM CET, Bertrand Drouvot wrote:
> > I've probably a too paranoid concern: what if someone use char[N] to store
> > binary stuff with embedded null? That would detect it as string and then
> > make use of strncmp() and then would provide incorrect result.
> > 
> > While the risk is probably very low, I think it is there.
> 
> Added a warning in the comment for these macros. For non of our
> usages this was the case (i.e. the char arrays were all storing null
> terminated strings).

Agreed, I did check that too before doing the initial comment.

> So I'm not too worried about this being a problem
> in practice.

I agree, it's very low risk that one adds a new "bad" one in the future. Adding
a comment looks enough then.

+ * WARNING: If you use char[N] to store binary data that is not null-terminated,
+ * the automatic detection will incorrectly treat it as a string and use string
+ * comparison. In such cases, use hash_make_ext() with .force_blobs = true to
+ * override the automatic detection

maybe s/is not null-terminated/may contain null bytes/?

Also, nit, "Note or NOTE" looks more commonly used that "WARNING". We might want
to use that instead.

> Especially because in most cases there will be no null byte
> in the key, and instead you'll start reading out of bounds, which wil
> cause problems pretty much immediately during development.

Agreed.

> Especially, because to make this macro nice to
> use in the two cases that it would apply to we'd have to make it treat 0
> as a special value.

Not necessary, we could also just add a foreach_hash_with_hash_value() to make
things more consistent?

> Finally, I converted the last couple of hash_seq_init stragglers (some
> I had missed/were added) and others needed the now newly added
> foreach_hash_reset macro to be converted.

I see that you added foreach_hash_restart(), I think that makes sense
(even if it's used only in 3 places).

Two more comments:

=== 1

-static void
-cfunc_hashtable_init(void)
-{
-       /* don't allow double-initialization */
-       Assert(cfunc_hashtable == NULL);

Most of the hash_make_fn_cxt() callers check that the destination is already
NULL so that removing the Assert() is not that worrying for them. There are 2
places where it's not the case: InitializeAttoptCache() and build_guc_variables()
, should we add an Assert (or an if check) in them? I think that an Assert is
more appropriate for those 2.

=== 2

"    At the very least we should choose a few places where we use the new
    macros to make sure they have coverage."

I do agree that the refactoring is quite large. I do think there is no rush
to do all the changes at once. We could update a subset of files at a time per
month so that, that would:

- keep changes consistent within each file
- ease the review(s)
- avoid "large" rebases for patches waiting in the commitfest

Thoughts?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Safer hash table initialization macro

От

"Jelte Fennema-Nio"

Дата:

14 января, 11:46:41

On Wed, 14 Jan 2026 at 08:57, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:
> + * WARNING: If you use char[N] to store binary data that is not null-terminated,
> + * the automatic detection will incorrectly treat it as a string and use string
> + * comparison. In such cases, use hash_make_ext() with .force_blobs = true to
> + * override the automatic detection
>
> maybe s/is not null-terminated/may contain null bytes/?

Changed wording and changed to NOTE.

> > Especially, because to make this macro nice to
> > use in the two cases that it would apply to we'd have to make it treat 0
> > as a special value.
>
> Not necessary, we could also just add a foreach_hash_with_hash_value() to make
> things more consistent?

It would be more consistent, but the calling code would actually be more
verbose in the two places where we use seq_init_hash_with_hash_value.
Because in both places the code calls it conditionally like this:

    if (hashvalue == 0)
        hash_seq_init(&status, TypeCacheHash);
    else
        hash_seq_init_with_hash_value(&status, TypeCacheHash, hashvalue);

    while ((typentry = (TypeCacheEntry *) hash_seq_search(&status)) != NULL)
    {

So if we don't have the foreach_hash_with_hash_value macro treat 0
special, then we'd need to replace this single while with two for
foreach loops. A foreach_hash for the 0 case and a
foreach_hash_with_hash_value for the non-zero one.

> -static void
> -cfunc_hashtable_init(void)
> -{
> -       /* don't allow double-initialization */
> -       Assert(cfunc_hashtable == NULL);
>
> Most of the hash_make_fn_cxt() callers check that the destination is already
> NULL so that removing the Assert() is not that worrying for them. There are 2
> places where it's not the case: InitializeAttoptCache() and build_guc_variables()
> , should we add an Assert (or an if check) in them? I think that an Assert is
> more appropriate for those 2.

I'm a bit confused about this comment. I didn't change anything for
those two places about their checks for NULL. Did you mean this as a
separate but related improvement to existing code?

(To be clear, the removed Assert that you quoted doesn't matter when
inlining, because it's the only item in an if (!cfunc_hashtable) block)

> === 2
>
> "    At the very least we should choose a few places where we use the new
>     macros to make sure they have coverage."
>
> I do agree that the refactoring is quite large. I do think there is no rush
> to do all the changes at once. We could update a subset of files at a time per
> month so that, that would:
>
> - keep changes consistent within each file
> - ease the review(s)
> - avoid "large" rebases for patches waiting in the commitfest
>
> Thoughts?

As long as the new macro definitions get in, any way a committer thinks
that's sensible is fine by me. e.g. The List foreach macros also haven't
been replaced in bulk, but I'm still very happy that I can use them for
new code.

Вложения

Re: Safer hash table initialization macro

От

Bertrand Drouvot

Дата:

14 января, 12:28:37

Hi,

On Wed, Jan 14, 2026 at 09:46:41AM +0100, Jelte Fennema-Nio wrote:
> On Wed, 14 Jan 2026 at 08:57, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:
> > + * WARNING: If you use char[N] to store binary data that is not null-terminated,
> > + * the automatic detection will incorrectly treat it as a string and use string
> > + * comparison. In such cases, use hash_make_ext() with .force_blobs = true to
> > + * override the automatic detection
> > 
> > maybe s/is not null-terminated/may contain null bytes/?
> 
> Changed wording and changed to NOTE.

Thanks, looks good.

> So if we don't have the foreach_hash_with_hash_value macro treat 0
> special, then we'd need to replace this single while with two for
> foreach loops. A foreach_hash for the 0 case and a
> foreach_hash_with_hash_value for the non-zero one.

Okay, let's forget about introducing foreach_hash_with_hash_value() then.

> > -static void
> > -cfunc_hashtable_init(void)
> > -{
> > -       /* don't allow double-initialization */
> > -       Assert(cfunc_hashtable == NULL);
> > 
> > Most of the hash_make_fn_cxt() callers check that the destination is already
> > NULL so that removing the Assert() is not that worrying for them. There are 2
> > places where it's not the case: InitializeAttoptCache() and build_guc_variables()
> > , should we add an Assert (or an if check) in them? I think that an Assert is
> > more appropriate for those 2.
> 
> I'm a bit confused about this comment. I didn't change anything for
> those two places about their checks for NULL. Did you mean this as a
> separate but related improvement to existing code?

Agree that you didn't change for NULL check in those places.
I meant that 0003 introduced calling hash_make_fn_cxt() in InitializeAttoptCache()
and build_guc_variables(), which made me think if we could add an Assert while
at it.

> (To be clear, the removed Assert that you quoted doesn't matter when
> inlining, because it's the only item in an if (!cfunc_hashtable) block)

Right, the removed Assert is fine.

> 
> As long as the new macro definitions get in, any way a committer thinks
> that's sensible is fine by me. e.g. The List foreach macros also haven't
> been replaced in bulk, but I'm still very happy that I can use them for
> new code.

+1

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Safer hash table initialization macro

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения