Обсуждение: Add bms_offset_members() function for bitshifting Bitmapsets
(v20 material) While working on some new code which required offsetting the members of a Bitmapset, I decided to go and write a function to do this rather than copy the various other places where we manually construct a new set with a bms_next_member() -> bms_add_member() loop. The new use case I have is from pulling varattnos from a scan's targetlist, which there could be several hundred Vars in. I considered this might be noticeably expensive. The current manual way we have of doing this is a bit laborious since we're only doing 1 member per bms_next_member() loop, and also, if the set has multiple words, we may end up doing several repallocs to expand the set, perhaps as little as 1 word at a time. That's not to mention the rather repetitive code that we have to do this in various places that might be nice to consolidate. I've attached a patch which adds bms_offset_members(), which does bitshifting to move the members up or down by the given offset. While working on this I made a few choices which might be worth a revisit: 1) The function modifies the given set in-place rather than making a new set. 2) The function will ERROR if any member would go above INT_MAX. These would be inaccessible, and that seems weird/wrong. 3) When offsetting by a negative value, members that would go below zero fall out of the set silently. For #1; my original use-case that made me write this didn't need a copy, so I wrote the function to modify the set in-place. After hunting down and replacing the relevant existing bms_next_member() loops with the new function, I noticed all these seem to need a copy. Currently, I have coded the patch to do bms_offset_members(bms_copy(set), ...), but that's a little inefficient as it *could* result in a palloc for the copy then a repalloc in the offset. If bms_offset_members() just created a new set, then it could palloc() a set to the exact required size. The counterargument to that is that I don't really want to copy the set for my intended use case. I thought about two versions, but thought that might be overkill. There could be a boolean parameter to define that, but we don't do that anywhere else in bitmapset.c, or we could avoid a copy-paste of the code with an always-inlined helper function. I couldn't decide, so left it as-is. For #2, I could have equally made these fall off the top of the set, but I thought we might want to know about it in the unlikely event that this ever happens. #3 We commonly want to get rid of system columns from a pull_varattnos() set which is offset by FirstLowInvalidHeapAttributeNumber, so those disappearing silently is what most use cases seem to want. I expect that's not for revisiting, but I listed this one anyway as it flies in the face of #2. Patch attached. Comments welcome. David
Вложения
> On Apr 13, 2026, at 12:35, David Rowley <dgrowleyml@gmail.com> wrote:
>
> (v20 material)
>
> While working on some new code which required offsetting the members
> of a Bitmapset, I decided to go and write a function to do this rather
> than copy the various other places where we manually construct a new
> set with a bms_next_member() -> bms_add_member() loop. The new use
> case I have is from pulling varattnos from a scan's targetlist, which
> there could be several hundred Vars in. I considered this might be
> noticeably expensive.
>
> The current manual way we have of doing this is a bit laborious since
> we're only doing 1 member per bms_next_member() loop, and also, if the
> set has multiple words, we may end up doing several repallocs to
> expand the set, perhaps as little as 1 word at a time. That's not to
> mention the rather repetitive code that we have to do this in various
> places that might be nice to consolidate.
>
> I've attached a patch which adds bms_offset_members(), which does
> bitshifting to move the members up or down by the given offset. While
> working on this I made a few choices which might be worth a revisit:
>
> 1) The function modifies the given set in-place rather than making a new set.
> 2) The function will ERROR if any member would go above INT_MAX. These
> would be inaccessible, and that seems weird/wrong.
> 3) When offsetting by a negative value, members that would go below
> zero fall out of the set silently.
>
> For #1; my original use-case that made me write this didn't need a
> copy, so I wrote the function to modify the set in-place. After
> hunting down and replacing the relevant existing bms_next_member()
> loops with the new function, I noticed all these seem to need a copy.
> Currently, I have coded the patch to do
> bms_offset_members(bms_copy(set), ...), but that's a little
> inefficient as it *could* result in a palloc for the copy then a
> repalloc in the offset. If bms_offset_members() just created a new
> set, then it could palloc() a set to the exact required size. The
> counterargument to that is that I don't really want to copy the set
> for my intended use case. I thought about two versions, but thought
> that might be overkill. There could be a boolean parameter to define
> that, but we don't do that anywhere else in bitmapset.c, or we could
> avoid a copy-paste of the code with an always-inlined helper function.
> I couldn't decide, so left it as-is.
>
> For #2, I could have equally made these fall off the top of the set,
> but I thought we might want to know about it in the unlikely event
> that this ever happens.
>
> #3 We commonly want to get rid of system columns from a
> pull_varattnos() set which is offset by
> FirstLowInvalidHeapAttributeNumber, so those disappearing silently is
> what most use cases seem to want. I expect that's not for revisiting,
> but I listed this one anyway as it flies in the face of #2.
>
> Patch attached. Comments welcome.
>
> David
> <v1-0001-Introduce-bms_offset_members-function.patch>
I basically agree with design choices 1/2/3. And the implementation of v1 overall looks good to me.
The only issue I found is this overflow check:
```
+ /* Handle a positive offset (bitshift left) */
+ if (offset > 0)
+ {
+ /*
+ * An unlikely scenario, but check we're not going to create a member
+ * greater than PG_INT32_MAX.
+ */
+ if (((uint64) new_nwords - 1) * BITS_PER_BITMAPWORD + high_bit + offset_bits > PG_INT32_MAX)
+ elog(ERROR, "bitmapset overflow");
```
This overflow check seems wrong. Because when high_bit + offset_bits > BITS_PER_BITMAPWORD, new_nwords has been
increasedby 1, so there high_bit + offset_bits are double counted.
To verify that, I added a new test:
```
-- 2147483583 is PG_INT32_MAX - 64, so offsetting by 1 should succeed,
-- but offsetting it by 65 should fail with overflow error
SELECT test_random_offset_operations_check_highest(2147483583, 1) AS result;
SELECT test_random_offset_operations_check_highest(2147483583, 65) AS result;
```
With v1, test_random_offset_operations_check_highest(2147483583, 1) reports an overflow error, but it should not.
Please see the attached diff for the test I added. In that diff, I also included a fix, and with that fix, the tests
pass.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
Вложения
Thanks for looking. On Tue, 14 Apr 2026 at 20:46, Chao Li <li.evan.chao@gmail.com> wrote: > + if (((uint64) new_nwords - 1) * BITS_PER_BITMAPWORD + high_bit + offset_bits > PG_INT32_MAX) > + elog(ERROR, "bitmapset overflow"); > This overflow check seems wrong. Because when high_bit + offset_bits > BITS_PER_BITMAPWORD, new_nwords has been increasedby 1, so there high_bit + offset_bits are double counted. Your idea of checking the old highest member plus the offset seems a more robust method, so I've adjusted the patch to use that. David
Вложения
David Rowley <dgrowleyml@gmail.com> writes:
> Your idea of checking the old highest member plus the offset seems a
> more robust method, so I've adjusted the patch to use that.
I question the decision to make this change the set in-place.
Wouldn't it be cheaper and less surprise-prone to always make
a copy?
regards, tom lane
On Wed, 15 Apr 2026 at 12:29, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I question the decision to make this change the set in-place. > Wouldn't it be cheaper and less surprise-prone to always make > a copy? I'd not considered surprise-prone as an aspect. I understand we have bms_join and bms_union, which do the same thing if you only care about the value of the result and not what happens to the inputs. So I didn't think I was introducing anything too surpising given we've got various other Bitmapset functions that modify the input in-place. My expectation there was that people are used to checking what the behaviour of the bitmapset function is. For the current use cases of the function in the patch, I agree that it would likely be better for performance if the new function allocated a new set. It was more a question of whether we want to throw away performance for other cases which are fine with an in-place update and have a positive offset. Those will never repalloc(). I didn't really know the answer. I suspect with the current patch that each of the existing cases will be faster than today's bms_next_member loops, regardless. When I wrote the function, I was mainly thinking of the new use-case that I was working on, which won't require any palloc() or repalloc() with the current design. Since I was adding that to a fairly common code path, I thought I had more of a chance of not having to jump through too many hoops to ensure I don't cause any performance regressions. In short, I don't really know what's best. I'm certainly open to changing it if someone comes up with a good reason to do it the other way. Maybe catering for the majority of callers is a good enough reason to change it. David
David Rowley <dgrowleyml@gmail.com> writes:
> On Wed, 15 Apr 2026 at 12:29, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I question the decision to make this change the set in-place.
>> Wouldn't it be cheaper and less surprise-prone to always make
>> a copy?
> I'd not considered surprise-prone as an aspect. I understand we have
> bms_join and bms_union, which do the same thing if you only care about
> the value of the result and not what happens to the inputs.
Sure, but bms_join is an optional optimization of the far safer
bms_union operation. It bothers me to create the optimized case
but not the base case.
regards, tom lane
On Wed, 15 Apr 2026 at 14:30, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > David Rowley <dgrowleyml@gmail.com> writes: > > I'd not considered surprise-prone as an aspect. I understand we have > > bms_join and bms_union, which do the same thing if you only care about > > the value of the result and not what happens to the inputs. > > Sure, but bms_join is an optional optimization of the far safer > bms_union operation. It bothers me to create the optimized case > but not the base case. Hmm, yeah. That seems like a good argument for making a new set. I'll go make it so. David
On 15.04.26 04:33, David Rowley wrote: > On Wed, 15 Apr 2026 at 14:30, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> >> David Rowley <dgrowleyml@gmail.com> writes: >>> I'd not considered surprise-prone as an aspect. I understand we have >>> bms_join and bms_union, which do the same thing if you only care about >>> the value of the result and not what happens to the inputs. >> >> Sure, but bms_join is an optional optimization of the far safer >> bms_union operation. It bothers me to create the optimized case >> but not the base case. > > Hmm, yeah. That seems like a good argument for making a new set. I'll > go make it so. Depending on what you end up doing, maybe a sprinkling of pg_nodiscard could be appropriate.
On Wed, 15 Apr 2026 at 14:33, David Rowley <dgrowleyml@gmail.com> wrote: > > On Wed, 15 Apr 2026 at 14:30, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > David Rowley <dgrowleyml@gmail.com> writes: > > > I'd not considered surprise-prone as an aspect. I understand we have > > > bms_join and bms_union, which do the same thing if you only care about > > > the value of the result and not what happens to the inputs. > > > > Sure, but bms_join is an optional optimization of the far safer > > bms_union operation. It bothers me to create the optimized case > > but not the base case. > > Hmm, yeah. That seems like a good argument for making a new set. I'll > go make it so. Patch attached for the version that creates a new set rather than modifying the input set in-place. David
Вложения
On Thu, 16 Apr 2026 at 07:17, Peter Eisentraut <peter@eisentraut.org> wrote: > Depending on what you end up doing, maybe a sprinkling of pg_nodiscard > could be appropriate. Yeah maybe. It wouldn't do any harm, at least. I didn't do that in the patch I just sent as it felt like something we should do or not do for all the bitmapset functions it's relevant for. REALLOCATE_BITMAPSETS is meant to give us a stronger guarantee of people forgetting to do this, as it would cause a breakage if there were multiple pointers to the same set and only one of them got updated. David
David Rowley <dgrowleyml@gmail.com> writes:
> On Thu, 16 Apr 2026 at 07:17, Peter Eisentraut <peter@eisentraut.org> wrote:
>> Depending on what you end up doing, maybe a sprinkling of pg_nodiscard
>> could be appropriate.
> Yeah maybe. It wouldn't do any harm, at least.
> I didn't do that in the patch I just sent as it felt like something we
> should do or not do for all the bitmapset functions it's relevant for.
Agreed, seems like it should be a separate patch.
regards, tom lane
On Sat, Apr 18, 2026, at 3:49 AM, David Rowley wrote:
> On Wed, 15 Apr 2026 at 14:33, David Rowley <dgrowleyml@gmail.com> wrote:
>>
>> On Wed, 15 Apr 2026 at 14:30, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> >
>> > David Rowley <dgrowleyml@gmail.com> writes:
>> > > I'd not considered surprise-prone as an aspect. I understand we have
>> > > bms_join and bms_union, which do the same thing if you only care about
>> > > the value of the result and not what happens to the inputs.
>> >
>> > Sure, but bms_join is an optional optimization of the far safer
>> > bms_union operation. It bothers me to create the optimized case
>> > but not the base case.
>>
>> Hmm, yeah. That seems like a good argument for making a new set. I'll
>> go make it so.
>
> Patch attached for the version that creates a new set rather than
> modifying the input set in-place.
>
> David
Hey David,
> Attachments:
> * v2-0001-Introduce-bms_offset_members-function.patch
I applied, tested, and reviewed these changes. Thanks for doing this, only a few small things jumped out.
nit: in bitmapset.c there is a new line added above bms_add_range()
+ * Arguments:
+ * arg1: optional random seed, or < 0 to use a random seed.
+ * arg2: the number of operations to perform.
+ * arg3: the maximum bitmapset member number to use in the random set.
+ * arg4: the minimum bitmapset member number to use in the random set.
nit: whitespace ahead of arg1, also should be "NULL" not "< 0"
in test_bitmapset.sql
+-- perform some random test on bms_offset_members()
nit: "tests"
Also, I think the random testing will likely cover these, but here are a few more explicit tests for odd corner cases.
-- shift that shrinks nwords
SELECT test_bms_offset_members('(b 64 65 66)', -64); -- drops into word 0
-- shift that drops some low members and keeps others
SELECT test_bms_offset_members('(b 0 1 2 10)', -2); -- expect (b 0 8)
-- entire set shifts below zero -> empty
SELECT test_bms_offset_members('(b 1 2 3)', -10); -- expect empty
-- word-aligned positive and negative shifts
SELECT test_bms_offset_members('(b 1 2 3)', 64);
SELECT test_bms_offset_members('(b 65 66 67)', -64);
-- INT_MIN boundary
SELECT test_bms_offset_members('(b 1)', -2147483648);
I like the functionality and the reduction of repeated code that you've identified and fixed.
best.
-greg
On Mon, 20 Apr 2026 at 07:22, Greg Burd <greg@burd.me> wrote:
> I applied, tested, and reviewed these changes. Thanks for doing this, only a few small things jumped out.
Many thanks. I took all of those suggestions.
> SELECT test_bms_offset_members('(b 1)', -2147483648);
I made that one use member 0 instead of 1. That'll mean "new_highest"
goes to INT_MIN rather than INT_MIN + 1.
David
Вложения
On Sun, Apr 19, 2026, at 7:52 PM, David Rowley wrote:
> On Mon, 20 Apr 2026 at 07:22, Greg Burd <greg@burd.me> wrote:
>> I applied, tested, and reviewed these changes. Thanks for doing this, only a few small things jumped out.
>
> Many thanks. I took all of those suggestions.
Happy to help.
>> SELECT test_bms_offset_members('(b 1)', -2147483648);
>
> I made that one use member 0 instead of 1. That'll mean "new_highest"
> goes to INT_MIN rather than INT_MIN + 1.
Perfect, that covers the gap nicely.
Were you planning on writing the optimized non-copy version as well? I don't think it is strictly necessary, more a
curiosity.
bms_offset_members() -> new bms, might repalloc() replaces existing loops you've found
bms_shift_members() -> bms is modified in place and fits your new use case a bit better
best.
-greg
> David
>
> Attachments:
> * v3-0001-Introduce-bms_offset_members-function.patch
On Tue, 21 Apr 2026 at 02:55, Greg Burd <greg@burd.me> wrote: > Were you planning on writing the optimized non-copy version as well? I don't think it is strictly necessary, more a curiosity. > > bms_offset_members() -> new bms, might repalloc() replaces existing loops you've found > bms_shift_members() -> bms is modified in place and fits your new use case a bit better Not at this stage. The v1 patch did modify the set in-place, so the code is there if we ever need it. I didn't find any need for it in our current code. The selective tuple deforming patch I'm working on could use it, but I doubt it's worth the trouble for 1 caller. It's just for something that happens during create_plan(), so 1 more allocation in that code likely isn't going to be noticed. David