Thread: Safer hash table initialization macro
Hi hackers,
Currently to create a hash table we do things like:
A) create a struct, say:
typedef struct SeenRelsEntry
{
Oid rel_id;
int list_index;
} SeenRelsEntry;
where the first member is the hash key, and then later:
B)
ctl.keysize = sizeof(Oid);
ctl.entrysize = sizeof(SeenRelsEntry);
ctl.hcxt = CurrentMemoryContext;
seen_rels = hash_create("find_all_inheritors temporary table",
                        32, /* start small and extend */
                        &ctl,
                        HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
I can see 2 possible issues:
1)
We manually specify the type for keysize, which could be incorrect from the
start, or become incorrect later if the key member's type changes.
2)
It may be possible to remove the key member without the compiler noticing it.
Take this example, where we remove the key member:
diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index 929bb53b620..eb11976afef 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -36,7 +36,6 @@
*/
typedef struct SeenRelsEntry
{
- Oid rel_id; /* relation oid */
int list_index; /* its position in output list(s) */
} SeenRelsEntry;
That would compile without any issues because this rel_id member is not
referenced in the code (for this particular example). That's rare but possible.
But then, on my machine, during make check:
TRAP: failed Assert("!found"), File: "nodeModifyTable.c", Line: 5157, PID: 140430
The reason is that the struct member is accessed only through byte-level
operations (within the hash-related macros). So it's easy to think that this
member is unused (because it is never referenced by name in the code).
I'm thinking about what kind of safety we could put in place to better deal with
1) and 2).
What about adding a macro that:
- requests the key member name
- ensures that it is at offset 0
- computes the key size based on the member
Something like:
"
#define HASH_ELEM_INIT(ctl, entrytype, keymember) \
do { \
StaticAssertStmt(offsetof(entrytype, keymember) == 0, \
#keymember " must be first member in " #entrytype); \
(ctl).keysize = sizeof(((entrytype *)0)->keymember); \
(ctl).entrysize = sizeof(entrytype); \
} while (0)
"
That way:
- The key member is explicitly referenced in the code (preventing "unused"
false positives)
- The key size is automatically computed from the actual member type (preventing
type mismatches)
- We enforce that the key is at offset 0
An additional benefit: it avoids repeating the "keysize =" followed by "entrysize ="
in a lot of places in the code (currently about 100 times).
If that sounds like a good idea, I could work on a patch doing so.
Thoughts?
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Mon, 1 Dec 2025 at 14:45, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
> Thoughts?
I think the hashtable creation API in postgres is so terrible that it
actively discourages usage. At Citus we definitely had the problem
that we would use Lists for cases where a hashtable was preferable
perf wise, just because the API was so terrible.
That's why I eventually implemented some wrapper helper functions to
make it less verbose and error-prone to use for by far the most common
patterns we had[1].
So I'm definitely in favor of improving this API (probably by adding a
few new functions). I have a few main thoughts on what could be
improved:
1. Automatically determine keysize and entrysize given a keymember and
entrytype (like you suggested).
2. Autodetect most of the flags.
a. HASH_PARTITION, HASH_SEGMENT, HASH_FUNCTION, HASH_DIRSIZE,
HASH_COMPARE, HASH_KEYCOPY, HASH_ALLOC can all simply be detected
based on the relevant fields from HASHCTL. Passing them in explicitly
is just duplication that causes code noise and is easy to forget
accidentally.
b. HASH_ELEM is useless noise because it is always required.
c. HASH_BLOBS could be the default (and maybe use HASH_STRINGS by
default if the keytype is char*).
3. Default to CurrentMemoryContext instead of TopMemoryContext, like
all of the other functions that allocate, because right now it's way
too easy to accidentally use TopMemoryContext when you did not intend
to.
4. Have a creation function that doesn't require HASHCTL at all (just
takes entrytype and keymember), and maybe a version of this that takes
a memorycontext.
[1]:
https://github.com/citusdata/citus/blob/ae2eb65be082d52db646b68a644474e24bc6cea1/src/include/distributed/hash_helpers.h#L74
Hi,

On Mon, Dec 01, 2025 at 03:44:41PM +0100, Jelte Fennema-Nio wrote:
> On Mon, 1 Dec 2025 at 14:45, Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> > Thoughts?
>
> I think the hashtable creation API in postgres is so terrible that it
> actively discourages usage.

Thanks for sharing your thoughts!

> So I'm definitely in favor of improving this API (probably by adding a
> few new functions). I have a few main thoughts on what could be
> improved:
>
> 1. Automatically determine keysize and entrysize given a keymember and
> entrytype (like you suggested).

PFA a patch introducing and using the new macro. Note that it also
introduces HASH_ELEM_INIT_FULL for the rare cases where the whole struct
is the key.

Also, one case remains untouched:

$ git grep "entrysize = sizeof" "*.c"
src/backend/replication/logical/relation.c:     ctl.entrysize = sizeof(LogicalRepRelMapEntry);

That's because the key is a member of a nested struct, so the new macro
cannot be used there. As there is only one occurrence of it, I think we
can keep it as it is. But we could create a dedicated macro for those
cases if we feel the need. Now that I'm writing this, that might be a
better idea: that way we'd avoid any "entrysize/keysize =" in the .c
files.

Also a nice side effect of using the macros:

 138 insertions(+), 203 deletions(-)

> 2. Autodetect most of the flags.
> a. HASH_PARTITION, HASH_SEGMENT, HASH_FUNCTION, HASH_DIRSIZE,
> HASH_COMPARE, HASH_KEYCOPY, HASH_ALLOC can all simply be detected
> based on the relevant fields from HASHCTL. Passing them in explicitly
> is just duplication that causes code noise and is easy to forget
> accidentally.
> b. HASH_ELEM is useless noise because it is required
> c. HASH_BLOBS could be the default (and maybe use HASH_STRINGS by
> default if the keytype is char*)
> 3. Default to CurrentMemoryContext instead of TopMemoryContext. Like
> all of the other functions that allocate, because right now it's way
> too easy to accidentally use TopMemoryContext when you did not intend
> to.
> 4. Have a creation function that doesn't require HASHCTL at all (just
> takes entrytype and keymember and maybe a version of this that takes a
> memorycontext).

Thanks for the above suggestions! I did not think as deeply as you did
during your Citus time, but I will think about those too. I suggest we
move forward one step at a time, the first step being the new macros.
Does that make sense to you?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments
On Wed Dec 3, 2025 at 12:17 PM CET, Bertrand Drouvot wrote:
> I suggest we move forward one step at a time, first step being the new
> macros. Does that make sense to you?

Normally I would agree, but in this case I think the new macros you are
proposing would become obsolete once we have the better hash table
creation functions I have in mind. And if we're going to update all
places where we create hash tables, I'd rather update them to something
really nice than to a small improvement.

I couldn't let it go (nerd-sniped). So here's a patchset that adds some
macros that I think are pretty nice, including a foreach_hash macro.

I'm a bit on the fence about the C11 _Generic code to determine whether
we should use HASH_BLOBS or HASH_STRINGS based on the type of the key.
It works really nicely in practice, but I'm worried it's a bit too much
magic. Probably we should at least have an override to allow using
HASH_BLOBS anyway for a char array (in case it's not null-terminated).
Attachments
On Fri, Dec 5, 2025 at 4:30 AM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
> I'm a bit on the fence about the C11 _Generic code to determine whether
> we should use HASH_BLOBS or HASH_STRINGS based on the type of the key.
> It works really nicely in practice, but I'm worried it's a bit too much
> magic. Probably we should at least have an override to allow using
> HASH_BLOBS anyway for a char array (in case it's not null terminated).

How much of our header stuff is supposed to work from C++ too? I assume
that if this is passing cpluspluscheck then it's only because those
_Generic macros aren't being expanded. I suppose you could write the
typeof-based version you already hinted at, but only use it for
__cplusplus (where typeof exists as decltype). What I mean is, if
something like DuckDB (C++ IIRC) wants to actually use these APIs with
the new interfaces, they won't work if the macros expand to _Generic
(not legal C++), but a typeof/decltype-based version should work just
fine. At the same time, I don't think we'd want to use typeof just
because it is available, or we'd never test the _Generic version: all 3
C compilers have something we can use for typeof, it's just not standard
C before C23, and we want PostgreSQL to be a valid C11 program and
actually have coverage of those code branches.

Another consideration is what impact we have on the Rust world, and
potentially other languages used for extensions that call C via FFI etc,
if we start using _Generic in our public interfaces, as opposed to just
using it in smaller ways as part of our internal assertions and C
implementation code that doesn't affect extensions. Of course people
using C APIs have to deal with the complexities of C evolution too; I'm
not saying that's a blocker, I'm just trying to think through the impact
of these sorts of API changes on the ecosystem.
On Fri, 5 Dec 2025 at 02:30, Thomas Munro <thomas.munro@gmail.com> wrote:
> How much of our header stuff is supposed to work from C++ too?
I think it's nice if it works, but it doesn't seem the most important.
Especially since C++ has its own hashmaps. And if it really needs to
create a hashmap it's still possible to call the.
> I suppose you could
> write the typeof-based version you already hinted at, but only use it
> for __cplusplus (where typeof exists as decltype).
I tried to figure something out that would work in C++ (with help of
Claude), but I wasn't able to create a version of the macros without
also needing to add:
#ifdef __cplusplus
}
#include <type_traits>
extern "C" {
#endif
It seems quite ugly to escape the extern "C" from the parent like that
and then re-enter it. Overall it doesn't seem worth the hassle to me
to make these macros work in C++.
> Another consideration is what impact we have on the Rust world, and
> potentially other languages used for extensions that call C via FFI
> etc
FFI generally cannot call macros anyway, only actual symbols.
On Sat, Dec 6, 2025 at 3:32 AM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
> On Fri, 5 Dec 2025 at 02:30, Thomas Munro <thomas.munro@gmail.com> wrote:
> > How much of our header stuff is supposed to work from C++ too?
>
> I think it's nice if it works, but it doesn't seem the most important.
> Especially since C++ has its own hashmaps. And if it really needs to
> create a hashmap it's still possible to call the.
... C functions without the helper macros. Yeah. That seems OK to me.
> > I suppose you could
> > write the typeof-based version you already hinted at, but only use it
> > for __cplusplus (where typeof exists as decltype).
>
> I tried to figure something out that would work in C++ (with help of
> Claude), but I wasn't able to create a version of the macros without
> also needing to add:
>
> #ifdef __cplusplus
> }
> #include <type_traits>
> extern "C" {
> #endif
>
> It seems quite ugly to escape the extern "C" from the parent like that
> and then re-enter it. Overall it doesn't seem worth the hassle to me
> to make these macros work in C++.
Yeah. I don't think we want that sort of thing all over the place.
We could eventually come up with a small set of tools in a central
place though, so people can work with this stuff without also knowing
C++ meta-programming voodoo. For example something like (untested, I
didn't think about char[size], just spitballing here...):
(pg_expr_has_type_p(ptr, char *) || pg_expr_has_type_p(ptr, NameData *))
... given the definition I posted recently[1].
I take your point that it's not really important for this case though.
> > Another consideration is what impact we have on the Rust world, and
> > potentially other languages used for extensions that call C via FFI
> > etc
>
> FFI generally cannot call macros anyway, only actual symbols.
Sure, I was just thinking about how such cross-language usage would be
forced to unpick our macrology and call the underlying C functions
without them. Doesn't seem like the end of the world anyway, I was
just thinking out loud about the consequences of this phenomenon in
headers.
[1] https://www.postgresql.org/message-id/CA+hUKGL7trhWiJ4qxpksBztMMTWDyPnP1QN+Lq341V7QL775DA@mail.gmail.com
On Sat Dec 6, 2025 at 1:56 AM CET, Thomas Munro wrote:
> On Sat, Dec 6, 2025 at 3:32 AM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
> > On Fri, 5 Dec 2025 at 02:30, Thomas Munro <thomas.munro@gmail.com> wrote:
> > create a hashmap it's still possible to call the.
>
> ... C functions without the helper macros.

Oops, forgot to finish that sentence.

> Yeah. I don't think we want that sort of thing all over the place.
> We could eventually come up with a small set of tools in a central
> place though, so people can work with this stuff without also knowing
> C++ meta-programming voodoo. For example something like (untested, I
> didn't think about char[size], just spitballing here...):
>
> (pg_expr_has_type_p(ptr, char *) || pg_expr_has_type_p(ptr, NameData *))
>
> ... given the definition I posted recently[1].

Ugh... It would have saved me some time if I'd seen that email before. I
also had no clue that 'extern "C++"' was a thing. Attached is a new
patchset where your proposed macro is used. I also needed a
pg_nullptr_of macro, because the trick that worked in C didn't in C++.

Also, it's probably worth checking out this thread[2], especially
because it overlaps significantly with [1].

> Sure, I was just thinking about how such cross-language usage would be
> forced to unpick our macrology and call the underlying C functions
> without them. Doesn't seem like the end of the world anyway, I was
> just thinking out loud about the consequences of this phenomenon in
> headers.

It's not great, but yeah that's the situation. Stuff like PG_TRY and
PG_CATCH are especially painful to reimplement. Luckily those things can
usually be re-implemented similarly in the target language, so the
macros only need to be dissected once.

> [1] https://www.postgresql.org/message-id/CA+hUKGL7trhWiJ4qxpksBztMMTWDyPnP1QN+Lq341V7QL775DA@mail.gmail.com

[2]: https://www.postgresql.org/message-id/flat/CAGECzQR21OnnKiZO_1rLWO0-16kg1JBxnVq-wymYW0-_1cUNtg%40mail.gmail.com
Attachments
Hi,

On Mon, Dec 08, 2025 at 11:53:02AM +0100, Jelte Fennema-Nio wrote:
> On Sat Dec 6, 2025 at 1:56 AM CET, Thomas Munro wrote:
> > On Sat, Dec 6, 2025 at 3:32 AM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
> > > On Fri, 5 Dec 2025 at 02:30, Thomas Munro <thomas.munro@gmail.com> wrote:
> > > create a hashmap it's still possible to call the.
> >
> > ... C functions without the helper macros.
>
> Oops, forgot to finish that sentence.

Thanks for this patch series!

> > Yeah. I don't think we want that sort of thing all over the place.
> > We could eventually come up with a small set of tools in a central
> > place though, so people can work with this stuff without also knowing
> > C++ meta-programming voodoo. For example something like (untested, I
> > didn't think about char[size], just spitballing here...):
> >
> > (pg_expr_has_type_p(ptr, char *) || pg_expr_has_type_p(ptr, NameData *))
> >
> > ... given the definition I posted recently[1].

+#if defined(__cplusplus)
+#define pg_expr_has_type_p(expr, type) (std::is_same<decltype(expr), type>::value)
+#else
+#define pg_expr_has_type_p(expr, type) \
+	_Generic((expr), type: 1, default: 0)
+#endif

What about relying on the existing __builtin_types_compatible_p()
instead of _Generic() here?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Tue, Dec 9, 2025 at 8:27 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:
> On Mon, Dec 08, 2025 at 11:53:02AM +0100, Jelte Fennema-Nio wrote:
> > On Sat Dec 6, 2025 at 1:56 AM CET, Thomas Munro wrote:
>
> +#if defined(__cplusplus)
> +#define pg_expr_has_type_p(expr, type) (std::is_same<decltype(expr), type>::value)
> +#else
> +#define pg_expr_has_type_p(expr, type) \
> +	_Generic((expr), type: 1, default: 0)
> +#endif
>
> What about relying on the existing __builtin_types_compatible_p() instead of
> _Generic() here?

If we used standard C/C++ it'd work on MSVC too.
On Tue, 9 Dec 2025 at 10:11, Thomas Munro <thomas.munro@gmail.com> wrote:
> > What about relying on the existing __builtin_types_compatible_p() instead of
> > _Generic() here?
>
> If we used standard C/C++ it'd work on MSVC too.

And to be clear, that's important because the result of
pg_expr_has_type_p fundamentally impacts the meaning of the code (i.e.
it determines the hash function). Our existing usage of
__builtin_types_compatible_p only adds some *optional* type checking, so
for that it's not critical that it works on MSVC too.
On Tue, 9 Dec 2025 at 08:27, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
> Thanks for this patch series!

To clarify: does that mean +1 from you on the proposed API?
Hi,

On Tue, Dec 09, 2025 at 11:45:11AM +0100, Jelte Fennema-Nio wrote:
> On Tue, 9 Dec 2025 at 08:27, Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> > Thanks for this patch series!
>
> To clarify: Does that mean +1 from you on the proposed API?

Yeah, I think the hash table API needs improvement. I did not look at
all the details of your patches, but from what I have seen I think your
API proposal makes sense.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Tue Dec 9, 2025 at 8:27 AM CET, Bertrand Drouvot wrote:
> Thanks for this patch series!

v4 attached, which fixes some rebase conflicts. Also, I moved the
patches that start using the new API to the end of the series, so the
introduction of the new APIs is now done in the first two patches.
Attachments
Hi,
On Sat, Jan 10, 2026 at 12:32:39PM +0100, Jelte Fennema-Nio wrote:
> On Tue Dec 9, 2025 at 8:27 AM CET, Bertrand Drouvot wrote:
> > Thanks for this patch series!
>
> v4 attached, which fixes some rebase conflicts. Also I moved the patches
> that start using the new API to the end of the series, so the
> introduction of the new APIs is now done in the first two patches.
Thanks for the new version!
A few random comments:
=== 1
+#define HASH_PTR_AS_STRING(ptr, size) \
+ (pg_expr_has_type_p((ptr), char(*)[size]) || pg_expr_has_type_p((ptr), NameData *))
+#define HASH_KEY_AS_STRING(entrytype, keymember) \
+ HASH_PTR_AS_STRING(&((entrytype *)0)->keymember, \
+ sizeof(((entrytype *)0)->keymember))
+#define HASH_TYPE_AS_STRING(type) \
+ HASH_PTR_AS_STRING(pg_nullptr_of(type), sizeof(type))
I probably have a too-paranoid concern: what if someone uses char[N] to store
binary stuff with embedded null bytes? That would be detected as a string,
and the strncmp() comparison would then give an incorrect result.
While the risk is probably very low, I think it is there.
What about using:
hash_make_str()
hash_make_blob()
or force a flag:
hash_make(..., HASH_STRINGS)
hash_make(..., HASH_BLOBS)
instead? That breaks some of the patch's intent, so I'm not sure it's
worth it given the low-probability risk...
=== 2
The patch introduces foreach_hash() but the foreach_hash_with_hash_value()
counterpart seems missing. It could be used in TypeCacheTypCallback() and
InvalidateAttoptCacheCallback().
I think that we could add it, or make the new hash_seq_new() accept an extra
hashvalue parameter (when set to 0 it would call hash_seq_init(), and
hash_seq_init_with_hash_value() otherwise)?
Some minor comments:
=== 3
+extern "C++" {

should be

+extern "C++"
+{

as per pgindent.
=== 4
+#define pg_nullptr_of(type) ((type *)NULL)
s/(type *)NULL/(type *) NULL/ ? (and in the comment in top of it)
=== 5
+#define foreach_hash(type, var, htab) \
+ for (type *var = 0, *var##__outerloop = (type *) 1; \
s/type *var = 0/type *var = NULL/? (see ec782f56b0c)
=== 6
+ * serialized++ = *value;
s/* serialized/*serialized
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Tue Jan 13, 2026 at 8:27 AM CET, Bertrand Drouvot wrote:
> I've probably a too paranoid concern: what if someone use char[N] to store
> binary stuff with embedded null? That would detect it as string and then
> make use of strncmp() and then would provide incorrect result.
>
> While the risk is probably very low, I think it is there.

Added a warning in the comment for these macros. For none of our usages
this was the case (i.e. the char arrays were all storing null-terminated
strings). So I'm not too worried about this being a problem in practice.
Especially because in most cases there will be no null byte in the key,
and instead you'll start reading out of bounds, which will cause
problems pretty much immediately during development.

> === 2
>
> The patch introduces foreach_hash() but the foreach_hash_with_hash_value()
> counterpart seems missing. It could be used in TypeCacheTypCallback() and
> InvalidateAttoptCacheCallback().
>
> I think that we could add it or make the new hash_seq_new() accept an extra
> hashvalue parameter? (when set to 0 would call hash_seq_init() and
> hash_seq_init_with_hash_value() otherwise?)

I'm not sure it's worth adding that macro, given there are only two
usages of this function inside our codebase, and I don't expect
extensions to use this low-level one. Especially because, to make this
macro nice to use in the two cases it would apply to, we'd have to make
it treat 0 as a special value.

> Some minor comments:

Fixed all of these.

Finally, I converted the last couple of hash_seq_init stragglers (some I
had missed or were newly added), and others needed the now newly added
foreach_hash_reset macro to be converted.
Attachments
Hi,
On Tue, Jan 13, 2026 at 10:31:18AM +0100, Jelte Fennema-Nio wrote:
> On Tue Jan 13, 2026 at 8:27 AM CET, Bertrand Drouvot wrote:
> > I've probably a too paranoid concern: what if someone use char[N] to store
> > binary stuff with embedded null? That would detect it as string and then
> > make use of strncmp() and then would provide incorrect result.
> >
> > While the risk is probably very low, I think it is there.
>
> Added a warning in the comment for these macros. For none of our
> usages this was the case (i.e. the char arrays were all storing null
> terminated strings).
Agreed, I did check that too before doing the initial comment.
> So I'm not too worried about this being a problem
> in practice.
I agree, there's a very low risk that one adds a new "bad" one in the future.
Adding a comment seems enough then.
+ * WARNING: If you use char[N] to store binary data that is not null-terminated,
+ * the automatic detection will incorrectly treat it as a string and use string
+ * comparison. In such cases, use hash_make_ext() with .force_blobs = true to
+ * override the automatic detection
maybe s/is not null-terminated/may contain null bytes/?
Also, nit: "Note" or "NOTE" looks more commonly used than "WARNING". We might
want to use that instead.
> Especially because in most cases there will be no null byte
> in the key, and instead you'll start reading out of bounds, which will
> cause problems pretty much immediately during development.
Agreed.
> Especially, because to make this macro nice to
> use in the two cases that it would apply to we'd have to make it treat 0
> as a special value.
Not necessarily; we could also just add a foreach_hash_with_hash_value() to make
things more consistent?
> Finally, I converted the last couple of hash_seq_init stragglers (some
> I had missed/were added) and others needed the now newly added
> foreach_hash_reset macro to be converted.
I see that you added foreach_hash_restart(), I think that makes sense
(even if it's used only in 3 places).
Two more comments:
=== 1
-static void
-cfunc_hashtable_init(void)
-{
- /* don't allow double-initialization */
- Assert(cfunc_hashtable == NULL);
Most of the hash_make_fn_cxt() callers check that the destination is still
NULL, so removing the Assert() is not that worrying for them. There are 2
places where it's not the case: InitializeAttoptCache() and
build_guc_variables(); should we add an Assert (or an if check) in them? I
think that an Assert is more appropriate for those 2.
=== 2
" At the very least we should choose a few places where we use the new
macros to make sure they have coverage."
I do agree that the refactoring is quite large, and I think there is no rush
to do all the changes at once. We could update a subset of files per month;
that would:
- keep changes consistent within each file
- ease the review(s)
- avoid "large" rebases for patches waiting in the commitfest
Thoughts?
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Wed, 14 Jan 2026 at 08:57, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:
> + * WARNING: If you use char[N] to store binary data that is not null-terminated,
> + * the automatic detection will incorrectly treat it as a string and use string
> + * comparison. In such cases, use hash_make_ext() with .force_blobs = true to
> + * override the automatic detection
>
> maybe s/is not null-terminated/may contain null bytes/?
Changed wording and changed to NOTE.
> > Especially, because to make this macro nice to
> > use in the two cases that it would apply to we'd have to make it treat 0
> > as a special value.
>
> Not necessary, we could also just add a foreach_hash_with_hash_value() to make
> things more consistent?
It would be more consistent, but the calling code would actually be more
verbose in the two places where we use hash_seq_init_with_hash_value.
Because in both places the code calls it conditionally like this:
if (hashvalue == 0)
hash_seq_init(&status, TypeCacheHash);
else
hash_seq_init_with_hash_value(&status, TypeCacheHash, hashvalue);
while ((typentry = (TypeCacheEntry *) hash_seq_search(&status)) != NULL)
{
So if we don't have the foreach_hash_with_hash_value macro treat 0
specially, then we'd need to replace this single while loop with two
foreach loops: a foreach_hash for the 0 case and a
foreach_hash_with_hash_value for the non-zero one.
> -static void
> -cfunc_hashtable_init(void)
> -{
> - /* don't allow double-initialization */
> - Assert(cfunc_hashtable == NULL);
>
> Most of the hash_make_fn_cxt() callers check that the destination is already
> NULL so that removing the Assert() is not that worrying for them. There are 2
> > places where it's not the case: InitializeAttoptCache() and build_guc_variables();
> > should we add an Assert (or an if check) in them? I think that an Assert is
> more appropriate for those 2.
I'm a bit confused about this comment. I didn't change anything about
the NULL checks in those two places. Did you mean this as a
separate but related improvement to existing code?
(To be clear, the removed Assert that you quoted doesn't matter when
inlining, because it's the only item in an if (!cfunc_hashtable) block)
> === 2
>
> " At the very least we should choose a few places where we use the new
> macros to make sure they have coverage."
>
> I do agree that the refactoring is quite large. I do think there is no rush
> to do all the changes at once. We could update a subset of files at a time per
> month so that, that would:
>
> - keep changes consistent within each file
> - ease the review(s)
> - avoid "large" rebases for patches waiting in the commitfest
>
> Thoughts?
As long as the new macro definitions get in, any way a committer thinks
that's sensible is fine by me. e.g. The List foreach macros also haven't
been replaced in bulk, but I'm still very happy that I can use them for
new code.
Attachments
Hi,
On Wed, Jan 14, 2026 at 09:46:41AM +0100, Jelte Fennema-Nio wrote:
> On Wed, 14 Jan 2026 at 08:57, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:
> > + * WARNING: If you use char[N] to store binary data that is not null-terminated,
> > + * the automatic detection will incorrectly treat it as a string and use string
> > + * comparison. In such cases, use hash_make_ext() with .force_blobs = true to
> > + * override the automatic detection
> >
> > maybe s/is not null-terminated/may contain null bytes/?
>
> Changed wording and changed to NOTE.
Thanks, looks good.
> So if we don't have the foreach_hash_with_hash_value macro treat 0
> special, then we'd need to replace this single while with two for
> foreach loops. A foreach_hash for the 0 case and a
> foreach_hash_with_hash_value for the non-zero one.
Okay, let's forget about introducing foreach_hash_with_hash_value() then.
> > -static void
> > -cfunc_hashtable_init(void)
> > -{
> > - /* don't allow double-initialization */
> > - Assert(cfunc_hashtable == NULL);
> >
> > Most of the hash_make_fn_cxt() callers check that the destination is already
> > NULL so that removing the Assert() is not that worrying for them. There are 2
> > places where it's not the case: InitializeAttoptCache() and build_guc_variables()
> > , should we add an Assert (or an if check) in them? I think that an Assert is
> > more appropriate for those 2.
>
> I'm a bit confused about this comment. I didn't change anything for
> those two places about their checks for NULL. Did you mean this as a
> separate but related improvement to existing code?
Agreed, you didn't change the NULL checks in those places.
I meant that 0003 introduced calls to hash_make_fn_cxt() in InitializeAttoptCache()
and build_guc_variables(), which made me wonder if we could add an Assert while
at it.
> (To be clear, the removed Assert that you quoted doesn't matter when
> inlining, because it's the only item in an if (!cfunc_hashtable) block)
Right, the removed Assert is fine.
>
> As long as the new macro definitions get in, any way a committer thinks
> that's sensible is fine by me. e.g. The List foreach macros also haven't
> been replaced in bulk, but I'm still very happy that I can use them for
> new code.
+1
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com