Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
От | Ashutosh Bapat |
---|---|
Тема | Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning |
Дата | |
Msg-id | CAExHW5tOd00=66=fASn=0ntGBnanQuYe5v7rBHymBn7MT6MbcQ@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning (David Rowley <dgrowleyml@gmail.com>) |
Ответы |
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
|
Список | pgsql-hackers |
On Mon, Mar 31, 2025 at 8:27 AM David Rowley <dgrowleyml@gmail.com> wrote: > > On Sat, 29 Mar 2025 at 05:46, Ashutosh Bapat > <ashutosh.bapat.oss@gmail.com> wrote: > > PFA patches. 0001 and 0002 are the same as the previous set. 0003 > > changes the initial hash table size to the length of ec_derives. > > I'm just not following the logic in making it the length of the > ec_derives List. If you have 32 buckets and try to insert 32 elements, > you're guaranteed to need a resize after inserting 28 elements. See > the grow_threshold logic SH_UPDATE_PARAMETERS(). The point of making > it 64 was to ensure the table is never unnecessarily sparse and to > also ensure we make it at least big enough for the minimum number of > ec_derives that we're about to insert. If I am reading SH_CREATE correctly, it will initially set the size to 32/.9 = 36 and in SH_UPDATE_PARAMETERS will set grow_threshold to 36 * .9 = 32. So it will leave some room even after inserting the initial elements. But looking at SH_INSERT_HASH_INTERNAL(), it will soon expand the table even if there's space. We certainly need a size much more than 32. 32 is an arbitrary/empirical threshold to create a hash table. Using that threshold as initial hash table size means the table size would be arbitrary too. Using twice the length of ec_derives_list seems more reasonable since the length will decide the initial number of entries. > > Looking more closely at the patch's ec_add_clause_to_derives_hash() > function, I see you're actually making two hash table entries for each > RestrictInfo, so without any em_is_const members, you'll insert 64 > entries into the hash table with a ec_derives list of 32, in which > case 64 buckets isn't enough and the table will end up growing to 128 > elements. > Yes, that's right. > I think you'd be better off coming up with some logic like putting the > lowest pointer value'd EM first in the key and ensure that all lookups > do that too by wrapping the lookups in some helper function. That'll > half the number of hash table entries at the cost of some very cheap > comparisons. That's a good suggestion. I searched for C standard documentation which specifies that the pointer comparison, especially inequality, is stable and safe to use. But I didn't find any. While according to the C standard, the result of comparison between pointers within the same array or a struct is specified, that between pointers from two different objects is unspecified. The existing code relies on the EM pointers being stable and also relies on equality between them to be stable. It has withstood the test of time and a variety of compilers. Hence I think it should be safe to rely on pointer comparisons being stable. But since I didn't find any documentation which confirms it, I have left those changes as a separate patch. Some internet sources discussing pointer comparison can be found at [1], [2] (which mentions the C standard but doesn't provide a link). PFA the next patchset 0001, 0002 are same as in the previous set 0003 changes the initial hash table size to length(ec->ec_derives_list) * 4 to accomodate for commuted entries. 0004 uses canonical keys per your suggestion and also reduces the initial hash table size to length(ec->ec_derives_list) * 2. When the number of initial elements to be added to the hash table is known, I see users of simplehash.h using that as an estimate rather than the nearest power of two. Hence following the logic here. [1] https://stackoverflow.com/questions/59516346/how-does-pointer-comparison-work-in-c-is-it-ok-to-compare-pointers-that-dont-p [2] https://www.gnu.org/software/c-intro-and-ref/manual/html_node/Pointer-Comparison.html -- Best Wishes, Ashutosh Bapat
Вложения
В списке pgsql-hackers по дате отправления: