Re: Small and unlikely overflow hazard in bms_next_member()
| От | Chao Li |
|---|---|
| Тема | Re: Small and unlikely overflow hazard in bms_next_member() |
| Дата | |
| Msg-id | 5E98992A-62D3-4961-B2F1-61246DB9D814@gmail.com обсуждение исходный текст |
| Ответ на | Re: Small and unlikely overflow hazard in bms_next_member() (David Rowley <dgrowleyml@gmail.com>) |
| Ответы |
Re: Small and unlikely overflow hazard in bms_next_member()
|
| Список | pgsql-hackers |
> On Apr 12, 2026, at 21:17, David Rowley <dgrowleyml@gmail.com> wrote: > > On Fri, 3 Apr 2026 at 15:08, David Rowley <dgrowleyml@gmail.com> wrote: >> IMO, if we can make bitmapset.c work with INT_MAX members and get a >> performance increase, then we should do it. > > Re-thinking this after a week's holiday, it seems fine to use an > unsigned 32-bit int rather than a 64-bit int to fix this bug. I'd > previously been uncertain if there were any guarantees in C to what > (unsigned int) -1 would return, but going by [1] at 6.3.1.3, it says: > > "Otherwise, if the new type is unsigned, the value is converted by > repeatedly adding or subtracting one more than the maximum value that > can be represented in the new type until the value is in the range of > the new type." > > So, it seems even on one's complement that -1 as an unsigned int will > be UINT_MAX. When we add 1 to UINT_MAX, we're guaranteed to get 0, as > it's unsigned maths and overflows are going to result in a value > modulus the max value for the type. > > That leads me to the attached v2 patch. Compiler Explorer link showing > the assembly at [2]. > > Testing the performance, it's better than master, so I got rid of the > size_t wordnum stuff. We're post-freeze now, so fiddling with other > optimisations seems a bit off the table as there appears to be no > performance regression to compensate for now, per: > > drowley@amd3990x:~$ gcc test_bms_next3.c -O2 -o test_bms_next3 && > ./test_bms_next3 > Benchmarking 100000000 bms_next_member iterations... > master: 1.18330 seconds > Patched: 1.05493 seconds > > Benchmarking 100000000 bms_prev_member iterations... > master: 2.94522 seconds > Patched: 1.86130 seconds > > drowley@amd3990x:~$ clang test_bms_next3.c -O2 -o test_bms_next3 && > ./test_bms_next3 > Benchmarking 100000000 bms_next_member iterations... > master: 1.07860 seconds > Patched: 1.07896 seconds > > Benchmarking 100000000 bms_prev_member iterations... > master: 2.76550 seconds > Patched: 2.12369 seconds > > David > > [1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf > [2] https://godbolt.org/z/xW96rxd3P > <test_bms_next3.c><v2-0001-Fix-unlikely-overflow-bug-in-bms_next_member.patch> I just tried test_bms_next3 on my MacBook, looks like patched bms_prev_member is much faster there. I ran about 10 times,the results were consistent. ``` chaol@ChaodeMacBook-Air test % gcc test_bms_next3.c -O2 -o test_bms_next3 test_bms_next3.c:281:18: warning: format specifies type 'long' but the argument has type 'int64' (aka 'long long') [-Wformat] 281 | printf("%ld\n", count); | ~~~ ^~~~~ | %lld 1 warning generated. chaol@ChaodeMacBook-Air test % ./test_bms_next3 Benchmarking 100000000 bms_next_member iterations... master: 0.49287 seconds Patched: 0.47315 seconds Benchmarking 100000000 bms_prev_member iterations... master: 1.04644 seconds Patched: 0.58053 seconds 2800000000 ``` Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/
В списке pgsql-hackers по дате отправления: