Re: Reduce TupleHashEntryData struct size by half
От | Heikki Linnakangas |
---|---|
Тема | Re: Reduce TupleHashEntryData struct size by half |
Дата | |
Msg-id | 0f3e7407-fb19-42c0-ad10-46eace6c1399@iki.fi обсуждение исходный текст |
Ответы |
Re: Reduce TupleHashEntryData struct size by half
|
Список | pgsql-hackers |
On 18/11/2024 02:01, Jeff Davis wrote: > > TupleHashEntryData is the size of a hash bucket, and it's currently 24 > bytes. The size of the structure is important for HashAgg and other > callers that can build up large hashtables of small tuples. Waste in > this structure is even worse when the hash table is sparse, because the > space is consumed whether the bucket is occupied or not. > > The attached patch series brings it down to 12. There are several > unappealing hacks, so ideally we'd find something cleaner, but halving > the bucket size seems quite desirable if it can be done in an > acceptable way. Makes sense. > 0001: This patch makes the structure private, so that it's easier to > control the way the structure is accessed. Seems pretty uncontroversial. You removed the typedef for struct TupleHashEntryData, which is a bit unusual for our usual source style. Was there a reason for that? > 0002: Removes the "additional" pointer, instead storing it right after > the tuple, which is already stored in a separate chunk. Hack: this > crams the "additional" pointer after the MinimalTuple in the same chunk > of memory to avoid adding additional palloc()s. Hmm, it would seem more straightforward to store it in the beginning, i.e. have something like this: struct { void *additional; MinimalTupleData mtup; } ; Come to think of it, how important is it that we use MinimalTuple here at all? Some other representation could be faster to deal with in TupleHashTableMatch() anyway. > 0003: simplehash: allow the caller to decide which entries are empty vs > in-use rather than requiring a separate "status" field. This may limit > other possible status values in the future (i.e. adding on to the > enum), but I'm not sure what those other stutus values might be. +1. I've wanted to have this in the past. > 0004: Removes the "status" field from TupleHashEntryData, using > firstTuple==NULL to mean "empty", otherwise "in use". Hack: need an > additional "special" pointer value to mean "input slot" now that NULL > means "empty". +1 > 0005: Pack TupleHashEntryData. IIUC, this is fine even on machines that > can't do unaligned access, so long as we are accessing the fields > through the struct, and not taking the address of individual members. Seems OK. I wonder if the compiler understands that the elements are still 4-byte aligned, or if it forces byte-per-byte access? Playing with godbolt a little, it seems like GCC at least understands it, but clang does not. On architectures with non-strict alignment, it doesn't matter as a simple load/store instruction is the fastest option anyway. -- Heikki Linnakangas Neon (https://neon.tech)
В списке pgsql-hackers по дате отправления: