Обсуждение: Improving asan/ubsan support

Поиск
Список
Период
Сортировка

Improving asan/ubsan support

От
Alexander Lakhin
Дата:
[ this thread separated from [1] as the discussion focus shifted ]

H Andres,

29.11.2023 22:39, Andres Freund wrote:
>> I use the following:
>> ASAN_OPTIONS=detect_leaks=0:abort_on_error=1:print_stacktrace=1:\
>> disable_coredump=0:strict_string_checks=1:check_initialization_order=1:\
>> strict_init_order=1:detect_stack_use_after_return=0
> I wonder if we should add some of these options by default ourselves. We could
> e.g. add something like the __ubsan_default_options() in
> src/backend/main/main.c to src/port/... instead, and return a combination of
> "our" options (like detect_leaks=0) and the ones from the environment.

I think that such explicit expression of the project policy regarding
sanitizer checks is for good, but I see some obstacles on this way.

First, I'm not sure what to do with new useful options/maybe new option
values, that will appear in sanitizers eventually. Should the only options,
that are supported by all sanitizers' versions, be specified, or we may
expect that unsupported options/values would be ignored by old versions?

Second, what to do with other binaries, that need detect_leaks=0, for
example, that same ecpg?

> ISTM that, if it actually works as I theorize it should, using
> __attribute__((no_sanitize("address"))) would be the easiest approach
> here. Something like
>
> #if defined(__has_feature) && __has_feature(address_sanitizer)
> #define pg_attribute_no_asan __attribute__((no_sanitize("address")))
> #else
> #define pg_attribute_no_asan
> #endif
>
> or such should work.

I've tried adding:
  bool
+__attribute__((no_sanitize("address")))
  stack_is_too_deep(void)

and it does work got me with clang 15, 18: `make check-world` passes with
ASAN_OPTIONS=detect_leaks=0:abort_on_error=1:print_stacktrace=1:\
disable_coredump=0:strict_string_checks=1:check_initialization_order=1:\
strict_init_order=1:detect_stack_use_after_return=1
UBSAN_OPTIONS=abort_on_error=1:print_stacktrace=1

(with a fix for pg_bsd_indent applied [2])

But with gcc 11, 12, 13 I get an assertion failure during `make check`:
#4  0x00007fabadcd67f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x0000557f35260382 in ExceptionalCondition (conditionName=0x557f35ca51a0 "(uintptr_t) buffer == 
TYPEALIGN(PG_IO_ALIGN_SIZE, buffer)", fileName=0x557f35ca4fc0 "md.c", lineNumber=471) at assert.c:66
#6  0x0000557f34a3b2bc in mdextend (reln=0x6250000375c8, forknum=MAIN_FORKNUM, blocknum=18, buffer=0x7fabaa800020, 
skipFsync=true) at md.c:471
#7  0x0000557f34a45a6f in smgrextend (reln=0x6250000375c8, forknum=MAIN_FORKNUM, blocknum=18, buffer=0x7fabaa800020, 
skipFsync=true) at smgr.c:501
#8  0x0000557f349139ed in RelationCopyStorageUsingBuffer (srclocator=..., dstlocator=..., forkNum=MAIN_FORKNUM, 
permanent=true) at bufmgr.c:4386

The buffer (buf) declared as follows:
static void
RelationCopyStorageUsingBuffer(RelFileLocator srclocator,
                                RelFileLocator dstlocator,
                                ForkNumber forkNum, bool permanent)
{
...
     PGIOAlignedBlock buf;
...

But as we can see, the buffer address is really not 4k-aligned, and that
offset 0x20 added in run-time only when the server started with
detect_stack_use_after_return=1.
So it looks like the asan feature detect_stack_use_after_return implemented
in gcc allows itself to add some data on stack, that breaks our alignment
expectations. With all three such Asserts in md.c removed,
`make check-world` passes for me.

> One thing that's been holding me back on trying to do something around this is
> the basically non-existing documentation around all of this. I haven't even
> found documentation referencing the fact that there are headers like
> sanitizer/asan_interface.h, you just have to figure that out yourself. Compare
> that to something like valgrind, which has documented this at least somewhat.

Yes, so maybe it's reasonable to support only basic/common features (such
as detect_leaks), leaving advanced ones for ad-hoc usage till they prove
their worthiness.

Best regards,
Alexander

[1] https://www.postgresql.org/message-id/flat/CWTLB2WWVJJ2.2YV6ERNOL1WVF%40neon.tech
[2] https://www.postgresql.org/message-id/591971ce-25c1-90f3-0526-5f54e3ebb32e%40gmail.com



Re: Improving asan/ubsan support

От
"Tristan Partin"
Дата:
On Fri Dec 1, 2023 at 3:00 AM CST, Alexander Lakhin wrote:
> [ this thread separated from [1] as the discussion focus shifted ]
>
> H Andres,
>
> 29.11.2023 22:39, Andres Freund wrote:
> >> I use the following:
> >> ASAN_OPTIONS=detect_leaks=0:abort_on_error=1:print_stacktrace=1:\
> >> disable_coredump=0:strict_string_checks=1:check_initialization_order=1:\
> >> strict_init_order=1:detect_stack_use_after_return=0
> > I wonder if we should add some of these options by default ourselves. We could
> > e.g. add something like the __ubsan_default_options() in
> > src/backend/main/main.c to src/port/... instead, and return a combination of
> > "our" options (like detect_leaks=0) and the ones from the environment.
>
> I think that such explicit expression of the project policy regarding
> sanitizer checks is for good, but I see some obstacles on this way.
>
> First, I'm not sure what to do with new useful options/maybe new option
> values, that will appear in sanitizers eventually. Should the only options,
> that are supported by all sanitizers' versions, be specified, or we may
> expect that unsupported options/values would be ignored by old versions?
>
> Second, what to do with other binaries, that need detect_leaks=0, for
> example, that same ecpg?
>
> > ISTM that, if it actually works as I theorize it should, using
> > __attribute__((no_sanitize("address"))) would be the easiest approach
> > here. Something like
> >
> > #if defined(__has_feature) && __has_feature(address_sanitizer)
> > #define pg_attribute_no_asan __attribute__((no_sanitize("address")))
> > #else
> > #define pg_attribute_no_asan
> > #endif
> >
> > or such should work.
>
> I've tried adding:
>   bool
> +__attribute__((no_sanitize("address")))
>   stack_is_too_deep(void)
>
> and it does work got me with clang 15, 18: `make check-world` passes with
> ASAN_OPTIONS=detect_leaks=0:abort_on_error=1:print_stacktrace=1:\
> disable_coredump=0:strict_string_checks=1:check_initialization_order=1:\
> strict_init_order=1:detect_stack_use_after_return=1
> UBSAN_OPTIONS=abort_on_error=1:print_stacktrace=1
>
> (with a fix for pg_bsd_indent applied [2])
>
> But with gcc 11, 12, 13 I get an assertion failure during `make check`:
> #4  0x00007fabadcd67f3 in __GI_abort () at ./stdlib/abort.c:79
> #5  0x0000557f35260382 in ExceptionalCondition (conditionName=0x557f35ca51a0 "(uintptr_t) buffer ==
> TYPEALIGN(PG_IO_ALIGN_SIZE, buffer)", fileName=0x557f35ca4fc0 "md.c", lineNumber=471) at assert.c:66
> #6  0x0000557f34a3b2bc in mdextend (reln=0x6250000375c8, forknum=MAIN_FORKNUM, blocknum=18, buffer=0x7fabaa800020,
> skipFsync=true) at md.c:471
> #7  0x0000557f34a45a6f in smgrextend (reln=0x6250000375c8, forknum=MAIN_FORKNUM, blocknum=18, buffer=0x7fabaa800020,
> skipFsync=true) at smgr.c:501
> #8  0x0000557f349139ed in RelationCopyStorageUsingBuffer (srclocator=..., dstlocator=..., forkNum=MAIN_FORKNUM,
> permanent=true) at bufmgr.c:4386
>
> The buffer (buf) declared as follows:
> static void
> RelationCopyStorageUsingBuffer(RelFileLocator srclocator,
>                                 RelFileLocator dstlocator,
>                                 ForkNumber forkNum, bool permanent)
> {
> ...
>      PGIOAlignedBlock buf;
> ...
>
> But as we can see, the buffer address is really not 4k-aligned, and that
> offset 0x20 added in run-time only when the server started with
> detect_stack_use_after_return=1.
> So it looks like the asan feature detect_stack_use_after_return implemented
> in gcc allows itself to add some data on stack, that breaks our alignment
> expectations. With all three such Asserts in md.c removed,
> `make check-world` passes for me.

Decided to do some digging into this, and Google actually documents[0]
how it works. After reading the algorithm, it is obvious why this fails.
What happens if you throw an __attribute__((no_sanitize("address")) on
the function? I assume the Asserts would then pass. The commit[1] which
added pg_attribute_aligned() provides insight as to why the Asserts
exist.

> /* If this build supports direct I/O, the buffer must be I/O aligned. */

Disabling instrumentation in functions which use this specific type when
the build supports direct IO seems like the best solution.

> > One thing that's been holding me back on trying to do something around this is
> > the basically non-existing documentation around all of this. I haven't even
> > found documentation referencing the fact that there are headers like
> > sanitizer/asan_interface.h, you just have to figure that out yourself. Compare
> > that to something like valgrind, which has documented this at least somewhat.
>
> Yes, so maybe it's reasonable to support only basic/common features (such
> as detect_leaks), leaving advanced ones for ad-hoc usage till they prove
> their worthiness.

Possibly, but I think I would rather see upstream support running with
all features with instrumentation turned off in various sections of
code. Even some assistance from AddressSanitizer is better than none.
Here[1][2] are all the AddressSanitizer flags for those curious.

> Best regards,
> Alexander
>
> [1] https://www.postgresql.org/message-id/flat/CWTLB2WWVJJ2.2YV6ERNOL1WVF%40neon.tech
> [2] https://www.postgresql.org/message-id/591971ce-25c1-90f3-0526-5f54e3ebb32e%40gmail.com

I personally would like to see Postgres have support for
AddressSanitizer. I think it already supports UndefinedBehaviorSanitizer
if I am remembering the buildfarm properly. AddressSanitizer has been so
helpful in past experiences writing C.

[0]: https://github.com/google/sanitizers/wiki/AddressSanitizerUseAfterReturn#algorithm
[1]: https://github.com/postgres/postgres/commit/faeedbcefd40bfdf314e048c425b6d9208896d90
[2]: https://github.com/google/sanitizers/wiki/AddressSanitizerFlags
[3]: https://github.com/google/sanitizers/wiki/SanitizerCommonFlags

--
Tristan Partin
Neon (https://neon.tech)



Re: Improving asan/ubsan support

От
Alexander Lakhin
Дата:
Hello Tristan,

02.12.2023 00:48, Tristan Partin wrote:
>
>> So it looks like the asan feature detect_stack_use_after_return implemented
>> in gcc allows itself to add some data on stack, that breaks our alignment
>> expectations. With all three such Asserts in md.c removed,
>> `make check-world` passes for me.
>
> Decided to do some digging into this, and Google actually documents[0] how it works. After reading the algorithm, it

> is obvious why this fails. What happens if you throw an __attribute__((no_sanitize("address")) on the function? I 
> assume the Asserts would then pass. The commit[1] which added pg_attribute_aligned() provides insight as to why the 
> Asserts exist.

Thank you for spending your time on this!

Yes, I understand what those Asserts were added for, I removed them just
to check what else is on the way.
And I can confirm that marking that function with the no_sanitize attribute
fixes that exact failure. Then the same attribute has to be added to
_hash_alloc_buckets(), to prevent:
TRAP: failed Assert("(uintptr_t) buffer == TYPEALIGN(PG_IO_ALIGN_SIZE, buffer)"), File: "md.c", Line: 471, PID:
1766976

#5  0x00005594a6a0f0d0 in ExceptionalCondition (conditionName=0x5594a7454a60 "(uintptr_t) buffer == 
TYPEALIGN(PG_IO_ALIGN_SIZE, buffer)", fileName=0x5594a7454880 "md.c", lineNumber=471) at assert.c:66
#6  0x00005594a61ce133 in mdextend (reln=0x625000037e48, forknum=MAIN_FORKNUM, blocknum=9, buffer=0x7fc3b3947020, 
skipFsync=false) at md.c:471
#7  0x00005594a61d89ab in smgrextend (reln=0x625000037e48, forknum=MAIN_FORKNUM, blocknum=9, buffer=0x7fc3b3947020, 
skipFsync=false) at smgr.c:501
#8  0x00005594a4a0c43d in _hash_alloc_buckets (rel=0x7fc3a89714f8, firstblock=6, nblocks=4) at hashpage.c:1033

And to RelationCopyStorage(), to prevent:
TRAP: failed Assert("(uintptr_t) buffer == TYPEALIGN(PG_IO_ALIGN_SIZE, buffer)"), File: "md.c", Line: 752, PID:
1787855

#5  0x000056081d5688bc in ExceptionalCondition (conditionName=0x56081dfaea40 "(uintptr_t) buffer == 
TYPEALIGN(PG_IO_ALIGN_SIZE, buffer)", fileName=0x56081dfae860 "md.c", lineNumber=752) at assert.c:66
#6  0x000056081cd29415 in mdread (reln=0x629000043158, forknum=MAIN_FORKNUM, blocknum=0, buffer=0x7fe480633020) at
md.c:752
#7  0x000056081cd32cb3 in smgrread (reln=0x629000043158, forknum=MAIN_FORKNUM, blocknum=0, buffer=0x7fe480633020) at 
smgr.c:565
#8  0x000056081b9ed5f2 in RelationCopyStorage (src=0x629000043158, dst=0x629000041248, forkNum=MAIN_FORKNUM, 
relpersistence=112 'p') at storage.c:487

Probably, it has to be added for all the functions where PGIOAlignedBlock
located on stack...

But I still wonder, how it works with clang, why that extra attribute is
not required?
In other words, such implementation specifics discourage me...

>
> Possibly, but I think I would rather see upstream support running with all features with instrumentation turned off
in
 
> various sections of code. Even some assistance from AddressSanitizer is better than none. Here[1][2] are all the 
> AddressSanitizer flags for those curious.

Yeah, and you might also need to specify extra flags to successfully run
postgres with newer sanitizers' versions. Say, for clang-18 you need to
specify -fno-sanitize=function (which is not recognized by gcc 13.2), to
avoid errors like this:
running bootstrap script ... dynahash.c:1120:4: runtime error: call to function strlcpy through pointer to incorrect 
function type 'void *(*)(void *, const void *, unsigned long)'
.../src/port/strlcpy.c:46: note: strlcpy defined here
     #0 0x556af5e0b0a9 in hash_search_with_hash_value .../src/backend/utils/hash/dynahash.c:1120:4
     #1 0x556af5e08f4f in hash_search .../src/backend/utils/hash/dynahash.c:958:9

> I personally would like to see Postgres have support for  AddressSanitizer. I think it already supports 
> UndefinedBehaviorSanitizer if I am remembering the buildfarm properly. AddressSanitizer has been so helpful in past 
> experiences writing C.

Me too. I find it very valuable for my personal usage but I'm afraid it's
still not very stable/mature.
One more example. Just adding -fsanitize=undefined for gcc 12, 13 (I tried
12.1, 13.0, 13.2) produces new warnings like:
In function 'PageGetItemId',
     inlined from 'heap_xlog_update' at heapam.c:9569:9:
../../../../src/include/storage/bufpage.h:243:16: warning: array subscript -1 is below array bounds of 'ItemIdData[]' 
[-Warray-bounds=]
   243 |         return &((PageHeader) page)->pd_linp[offsetNumber - 1];
       | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

But I don't get such warnings when I use gcc 11.3 (though it generates
other ones) or clang (15, 18). They also aren't produced with -O0, -O1...
Maybe it's another gcc bug, I'm not sure how to deal with it.
(I can research this issue, if it makes any sense.)

So I would say that cost of providing/maintaining full support for asan
(hwasan), ubsan is not near zero, unfortunately. I would estimate it to
10-20 discussions/commits on start/5-10 per year later (not including fixes
for bugs that would be found). If it's affordable for the project, I'd like
to have such support out-of-the-box.

Best regards,
Alexander