Обсуждение: Undefined behavior detected by new clang's ubsan
Hello hackers. When trying to run `make check` for a build made by clang-21 with sanitizers enabled: CFLAGS="-Og -fsanitize=address -fsanitize=undefined -fno-sanitize-recover -fno-sanitize=function" LDFLAGS="-static-libsan" ... I hit into: ../../src/include/lib/sort_template.h:314:15: runtime error: applying non-zero offset 8 to null pointer SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../../src/include/lib/sort_template.h:314:15 I could workaround that with: --- a/src/include/lib/sort_template.h +++ b/src/include/lib/sort_template.h @@ -307,6 +307,9 @@ ST_SORT(ST_ELEMENT_TYPE * data, size_t n int r, presorted; +if (!data && n == 0) + return; + loop: But then there was: heaptoast.c:770:26: runtime error: addition of unsigned offset to 0x7395fbd3d204 overflowed to 0x7395fbd3d142 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior heaptoast.c:770:26 sharedtuplestore.c:326:30: runtime error: applying non-zero offset 24 to null pointer SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior sharedtuplestore.c:326:30 and trgm_gist.c:702:40: runtime error: applying non-zero offset 16 to null pointer SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior trgm_gist.c:702:40 With the attached patch applied, `make check-world` passes for me. Reproduced with clang 20.1, but not reproduced with clang 20.0, so maybe this note is relevant here: https://releases.llvm.org/20.1.0/tools/clang/docs/ReleaseNotes.html#sanitizers Changed -fsanitize=pointer-overflow to no longer report NULL + 0 as undefined behavior in C, in line with N3322, and matching the previous behavior for C++. NULL + non_zero continues to be reported as undefined behavior. Best regards, Alexander
Вложения
On Tue, Jan 20, 2026 at 12:00 PM Alexander Lakhin <exclusion@gmail.com> wrote: > ../../src/include/lib/sort_template.h:314:15: runtime error: applying non-zero offset 8 to null pointer > SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../../src/include/lib/sort_template.h:314:15 Where does it think a NULL pointer is coming from? -- John Naylor Amazon Web Services
Hello John, 20.01.2026 08:13, John Naylor wrote: > On Tue, Jan 20, 2026 at 12:00 PM Alexander Lakhin <exclusion@gmail.com> wrote: >> ../../src/include/lib/sort_template.h:314:15: runtime error: applying non-zero offset 8 to null pointer >> SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../../src/include/lib/sort_template.h:314:15 > Where does it think a NULL pointer is coming from? Thank you for paying attention to this! With UBSAN_OPTIONS=print_stacktrace=1, I can see: #0 0x607efd762a61 in qsort_arg .../src/port/../../src/include/lib/sort_template.h:314:15 #1 0x607efd3fa268 in multirange_canonicalize .../src/backend/utils/adt/multirangetypes.c:488:2 #2 0x607efd3fa268 in make_multirange .../src/backend/utils/adt/multirangetypes.c:655:16 #3 0x607efcab820f in ExecInterpExpr .../src/backend/executor/execExprInterp.c:926:8 #4 0x607efceef4b6 in ExecEvalExprSwitchContext .../src/backend/optimizer/util/../../../../src/include/executor/executor.h:444:13 #5 0x607efceef4b6 in evaluate_expr .../src/backend/optimizer/util/clauses.c:5323:14 #6 0x607efcef282f in evaluate_function .../src/backend/optimizer/util/clauses.c:4830:9 #7 0x607efcef282f in simplify_function .../src/backend/optimizer/util/clauses.c:4179:12 #8 0x607efcee84d9 in eval_const_expressions_mutator .../src/backend/optimizer/util/clauses.c:2608:14 #9 0x607efccdda63 in expression_tree_mutator_impl .../src/backend/nodes/nodeFuncs.c:3485:5 #10 0x607efcee60ba in eval_const_expressions_mutator .../src/backend/optimizer/util/clauses.c:3798:9 #11 0x607efccde200 in expression_tree_mutator_impl .../src/backend/nodes/nodeFuncs.c:3571:12 #12 0x607efcee60ba in eval_const_expressions_mutator .../src/backend/optimizer/util/clauses.c:3798:9 #13 0x607efcee5d97 in eval_const_expressions .../src/backend/optimizer/util/clauses.c:2282:9 #14 0x607efce76a8f in preprocess_expression .../src/backend/optimizer/plan/planner.c:1331:10 #15 0x607efce73c32 in subquery_planner .../src/backend/optimizer/plan/planner.c:934:3 #16 0x607efce6fbee in standard_planner .../src/backend/optimizer/plan/planner.c:470:9 #17 0x607efce6f49a in planner .../src/backend/optimizer/plan/planner.c:324:12 #18 0x607efd1ff680 in pg_plan_query .../src/backend/tcop/postgres.c:905:9 #19 0x607efd1ff968 in pg_plan_queries .../src/backend/tcop/postgres.c:1000:11 #20 0x607efd20a87b in exec_simple_query .../src/backend/tcop/postgres.c:1198:19 #21 0x607efd204dc0 in PostgresMain .../src/backend/tcop/postgres.c #22 0x607efd1f804b in BackendMain .../src/backend/tcop/backend_startup.c:124:2 #23 0x607efcf9e59a in postmaster_child_launch .../src/backend/postmaster/launch_backend.c:268:3 #24 0x607efcfa8136 in BackendStartup .../src/backend/postmaster/postmaster.c:3606:8 #25 0x607efcfa8136 in ServerLoop .../src/backend/postmaster/postmaster.c:1713:6 #26 0x607efcfa46cc in PostmasterMain .../src/backend/postmaster/postmaster.c:1403:11 #27 0x607efcc4ca92 in main .../src/backend/main/main.c:231:4 ... SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../../src/include/lib/sort_template.h:314:15 ... 2026-01-20 08:20:18.256 EET postmaster[3914440] LOG: client backend (PID 3914618) was terminated by signal 6: Aborted 2026-01-20 08:20:18.256 EET postmaster[3914440] DETAIL: Failed process was running: select textmultirange(); Best regards, Alexander
On Tue, Jan 20, 2026 at 2:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
> With UBSAN_OPTIONS=print_stacktrace=1, I can see:
> #0 0x607efd762a61 in qsort_arg .../src/port/../../src/include/lib/sort_template.h:314:15
> #1 0x607efd3fa268 in multirange_canonicalize .../src/backend/utils/adt/multirangetypes.c:488:2
> #2 0x607efd3fa268 in make_multirange .../src/backend/utils/adt/multirangetypes.c:655:16
Indeed, there are calls like "make_multirange(mltrngtypoid, rangetyp,
0, NULL);", where 0 is the count and NULL is the ranges. Then
multirange_canonicalize() has
qsort_arg(ranges, input_range_count, sizeof(RangeType *),
range_compare, rangetyp);
I haven't dug further, but I wonder if multirange_canonicalize() does
anything useful at all with "0, NULL" input from make_multirange().
Anyway, the complaint is about this place:
if (n < 7)
{
for (pm = a + ST_POINTER_STEP; pm < a + n * ST_POINTER_STEP;
pm += ST_POINTER_STEP)
...
I don't think it's great to pass a NULL pointer to a sort, but the
length could conceivably be zero for future degenerate cases, so we
could silence the warning by adding "if (n < 2) return;" before the
for-loop. The advantage of doing that anyway is it allows us to remove
all four of the "if (d_ > ST_POINTER_STEP)" branches in the recursion
part. That's better for readability.
--
John Naylor
Amazon Web Services
John Naylor <johncnaylorls@gmail.com> writes:
> I don't think it's great to pass a NULL pointer to a sort, but the
> length could conceivably be zero for future degenerate cases, so we
> could silence the warning by adding "if (n < 2) return;" before the
> for-loop. The advantage of doing that anyway is it allows us to remove
> all four of the "if (d_ > ST_POINTER_STEP)" branches in the recursion
> part. That's better for readability.
+1
regards, tom lane
On Wed, Jan 21, 2026 at 1:52 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> John Naylor <johncnaylorls@gmail.com> writes:
> > I don't think it's great to pass a NULL pointer to a sort, but the
> > length could conceivably be zero for future degenerate cases, so we
> > could silence the warning by adding "if (n < 2) return;" before the
> > for-loop. The advantage of doing that anyway is it allows us to remove
> > all four of the "if (d_ > ST_POINTER_STEP)" branches in the recursion
> > part. That's better for readability.
>
> +1
Okay, I've written that approach. Since it requires a bit more
explanation, I've kept it separate for now.
> With the attached patch applied, `make check-world` passes for me.
As for the rest of the proposed fixes, most seem okay, but I have some nits:
trgm_gist.c:
- TRGM *cachedVal = (TRGM *) (cache + MAXALIGN(siglen));
+ TRGM *cachedVal = cache ? ((TRGM *) (cache + MAXALIGN(siglen))) : NULL;
This is getting a bit unwieldy for a declaration. How about this?
char *cache = (char *) fcinfo->flinfo->fn_extra;
- TRGM *cachedVal = (TRGM *) (cache + MAXALIGN(siglen));
+ TRGM *cachedVal = NULL;
[...]
+ if (cache != NULL)
+ cachedVal = (TRGM *) (cache + MAXALIGN(siglen));
heaptoast.c
memcpy(VARDATA(result) +
- (curchunk * TOAST_MAX_CHUNK_SIZE - sliceoffset) + chcpystrt,
+ (int)(curchunk * TOAST_MAX_CHUNK_SIZE - sliceoffset) + chcpystrt,
Not sure about this one. It would be better if we reversing the
operands allowed us to avoid overflow in the first place:
- (curchunk * TOAST_MAX_CHUNK_SIZE - sliceoffset) + chcpystrt,
+ chcpystrt + (curchunk * TOAST_MAX_CHUNK_SIZE - sliceoffset)
Does that silence the warning?
sharedtuplestore.c
- if (accessor->write_pointer + size > accessor->write_end)
+ if ((accessor->write_pointer == NULL && accessor->write_end == NULL
&& size > 0) || (accessor->write_pointer + size >
accessor->write_end))
{
if (accessor->write_chunk == NULL)
{
/* First time through. Allocate chunk. */
I don't see why we have to check so many conditions. The last line
above is where write_pointer is set in a new allocation, so it seems
we could just have
if (accessor->write_pointer == NULL ||
accessor->write_pointer + size > accessor->write_end)
Вложения
Hello John,
21.01.2026 12:05, John Naylor wrote:
21.01.2026 12:05, John Naylor wrote:
As for the rest of the proposed fixes, most seem okay, but I have some nits:
Thank you for spending time on this!
I agree with all of your changes (except for one noted below) -- didn't
mean to propose committable changes, just wanted to show the fixes that
allowed check-world to pass.
heaptoast.c memcpy(VARDATA(result) + - (curchunk * TOAST_MAX_CHUNK_SIZE - sliceoffset) + chcpystrt, + (int)(curchunk * TOAST_MAX_CHUNK_SIZE - sliceoffset) + chcpystrt, Not sure about this one. It would be better if we reversing the operands allowed us to avoid overflow in the first place: - (curchunk * TOAST_MAX_CHUNK_SIZE - sliceoffset) + chcpystrt, + chcpystrt + (curchunk * TOAST_MAX_CHUNK_SIZE - sliceoffset) Does that silence the warning?
Unfortunately, no -- I still got:
heaptoast.c:771:17: runtime error: addition of unsigned offset to 0x78120673fac6 overflowed to 0x78120673fa04
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior heaptoast.c:771:17
Best regards,
Alexander
On Thu, Jan 22, 2026 at 3:00 AM Alexander Lakhin <exclusion@gmail.com> wrote: > heaptoast.c > memcpy(VARDATA(result) + > - (curchunk * TOAST_MAX_CHUNK_SIZE - sliceoffset) + chcpystrt, > + (int)(curchunk * TOAST_MAX_CHUNK_SIZE - sliceoffset) + chcpystrt, > > Not sure about this one. It would be better if we reversing the > operands allowed us to avoid overflow in the first place: > > - (curchunk * TOAST_MAX_CHUNK_SIZE - sliceoffset) + chcpystrt, > + chcpystrt + (curchunk * TOAST_MAX_CHUNK_SIZE - sliceoffset) > > Does that silence the warning? > > > Unfortunately, no -- I still got: > heaptoast.c:771:17: runtime error: addition of unsigned offset to 0x78120673fac6 overflowed to 0x78120673fa04 > SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior heaptoast.c:771:17 Okay, I'll refrain from guessing more then. I'm due for an OS upgrade anyway, and that'll have clang 21, so I'll come back to this one unless someone beats me to it. -- John Naylor Amazon Web Services
On Thu, Jan 22, 2026 at 12:14 PM John Naylor <johncnaylorls@gmail.com> wrote: > > On Thu, Jan 22, 2026 at 3:00 AM Alexander Lakhin <exclusion@gmail.com> wrote: > > Unfortunately, no -- I still got: > > heaptoast.c:771:17: runtime error: addition of unsigned offset to 0x78120673fac6 overflowed to 0x78120673fa04 > > SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior heaptoast.c:771:17 > > Okay, I'll refrain from guessing more then. I'm due for an OS upgrade > anyway, and that'll have clang 21, so I'll come back to this one > unless someone beats me to it. Small update: With clang 21 I've confirmed the behavior Alexander found, and confirmed my v1 patches fix the warnings except for the one in heap_fetch_toast_slice() that I skipped over. I'd like to understand that one better, in case a more principled fix can be found. -- John Naylor Amazon Web Services
On Wed, Jan 21, 2026 at 5:05 PM John Naylor <johncnaylorls@gmail.com> wrote: > heaptoast.c > memcpy(VARDATA(result) + > - (curchunk * TOAST_MAX_CHUNK_SIZE - sliceoffset) + chcpystrt, > + (int)(curchunk * TOAST_MAX_CHUNK_SIZE - sliceoffset) + chcpystrt, Recall, the error was "runtime error: addition of unsigned offset to 0x7395fbd3d204 overflowed to 0x7395fbd3d142" It looks like "- 194" got turned into "+ (SIZE_MAX - 193)". Curiously, just removing the parentheses is enough to pass make check for me.: - (curchunk * TOAST_MAX_CHUNK_SIZE - sliceoffset) + chcpystrt, + curchunk * TOAST_MAX_CHUNK_SIZE - sliceoffset + chcpystrt, That's obviously equivalent in math, and IIUC in C precedence, so I'm not sure what to think of this. For v2 I've just done the above, but I'm curious if this raises anyone else's eyebrow. 0001 is backpatchable to v14, and doesn't change the sort template, and just guards NULL at the call site. The sort template change 0002 is a master-only patch. I don't think it would make any difference for performance, but to remove any doubt we could bump the insertion sort threshold, which is a good idea anyway. -- John Naylor Amazon Web Services
Вложения
On Mon, Feb 2, 2026 at 6:30 PM John Naylor <johncnaylorls@gmail.com> wrote: > > On Wed, Jan 21, 2026 at 5:05 PM John Naylor <johncnaylorls@gmail.com> wrote: > > heaptoast.c > > memcpy(VARDATA(result) + > > - (curchunk * TOAST_MAX_CHUNK_SIZE - sliceoffset) + chcpystrt, > > + (int)(curchunk * TOAST_MAX_CHUNK_SIZE - sliceoffset) + chcpystrt, > > Recall, the error was "runtime error: addition of unsigned offset to > 0x7395fbd3d204 overflowed to 0x7395fbd3d142" > > It looks like "- 194" got turned into "+ (SIZE_MAX - 193)". > > Curiously, just removing the parentheses is enough to pass make check for me.: > > - (curchunk * TOAST_MAX_CHUNK_SIZE - sliceoffset) + chcpystrt, > + curchunk * TOAST_MAX_CHUNK_SIZE - sliceoffset + chcpystrt, > > That's obviously equivalent in math, and IIUC in C precedence, so I'm > not sure what to think of this. For v2 I've just done the above, but > I'm curious if this raises anyone else's eyebrow. I logged the offending statement and found curchunk: 50 TOAST_MAX_CHUNK_SIZE: 1996 sliceoffset: 99994 chcpystrt 194 The expression for the offset to the pointer: (curchunk * TOAST_MAX_CHUNK_SIZE - sliceoffset) + chcpystrt is zero by way of (-194) + 194 ...so the "+ (-194)" must get cast to "+ (size_t) (-194)" causing the overflow. So the parentheses are indeed at fault by forcing precedence. The entire offset expression is never negative for our regression tests (nor should it ever be), so it's kind of pointless to cast any subset of this to a signed type. I went ahead and pushed 0001 to all versions. Thanks Alexander, for the report and the patch! I'll push 0002 in the near future. -- John Naylor Amazon Web Services
On Wed, Feb 04, 2026 at 06:15:11PM +0700, John Naylor wrote: > I logged the offending statement and found > > curchunk: 50 > TOAST_MAX_CHUNK_SIZE: 1996 > sliceoffset: 99994 > chcpystrt 194 > > The expression for the offset to the pointer: > > (curchunk * TOAST_MAX_CHUNK_SIZE - sliceoffset) + chcpystrt > > is zero by way of > > (-194) + 194 > > ...so the "+ (-194)" must get cast to "+ (size_t) (-194)" causing the > overflow. So the parentheses are indeed at fault by forcing > precedence. The entire offset expression is never negative for our > regression tests (nor should it ever be), so it's kind of pointless to > cast any subset of this to a signed type. I went ahead and pushed 0001 > to all versions. Thanks Alexander, for the report and the patch! Ah. I've seen this one, actually, while playing with the TOAST patch set for 64-bit values. The max chunk size varies as this code would need to switch between two possible sizes, and I've been super puzzled by the fact that I had to switch the code to use an unsigned integer for the declaration of my max_chunk_size, and a signed integer quickly failed. Thanks for the fix! -- Michael