Обсуждение: Improving asan/ubsan support
[ this thread separated from [1] as the discussion focus shifted ] H Andres, 29.11.2023 22:39, Andres Freund wrote: >> I use the following: >> ASAN_OPTIONS=detect_leaks=0:abort_on_error=1:print_stacktrace=1:\ >> disable_coredump=0:strict_string_checks=1:check_initialization_order=1:\ >> strict_init_order=1:detect_stack_use_after_return=0 > I wonder if we should add some of these options by default ourselves. We could > e.g. add something like the __ubsan_default_options() in > src/backend/main/main.c to src/port/... instead, and return a combination of > "our" options (like detect_leaks=0) and the ones from the environment. I think that such explicit expression of the project policy regarding sanitizer checks is for good, but I see some obstacles on this way. First, I'm not sure what to do with new useful options/maybe new option values, that will appear in sanitizers eventually. Should the only options, that are supported by all sanitizers' versions, be specified, or we may expect that unsupported options/values would be ignored by old versions? Second, what to do with other binaries, that need detect_leaks=0, for example, that same ecpg? > ISTM that, if it actually works as I theorize it should, using > __attribute__((no_sanitize("address"))) would be the easiest approach > here. Something like > > #if defined(__has_feature) && __has_feature(address_sanitizer) > #define pg_attribute_no_asan __attribute__((no_sanitize("address"))) > #else > #define pg_attribute_no_asan > #endif > > or such should work. I've tried adding: bool +__attribute__((no_sanitize("address"))) stack_is_too_deep(void) and it does work got me with clang 15, 18: `make check-world` passes with ASAN_OPTIONS=detect_leaks=0:abort_on_error=1:print_stacktrace=1:\ disable_coredump=0:strict_string_checks=1:check_initialization_order=1:\ strict_init_order=1:detect_stack_use_after_return=1 UBSAN_OPTIONS=abort_on_error=1:print_stacktrace=1 (with a fix for pg_bsd_indent applied [2]) But with gcc 11, 12, 13 I get an assertion failure during `make check`: #4 0x00007fabadcd67f3 in __GI_abort () at ./stdlib/abort.c:79 #5 0x0000557f35260382 in ExceptionalCondition (conditionName=0x557f35ca51a0 "(uintptr_t) buffer == TYPEALIGN(PG_IO_ALIGN_SIZE, buffer)", fileName=0x557f35ca4fc0 "md.c", lineNumber=471) at assert.c:66 #6 0x0000557f34a3b2bc in mdextend (reln=0x6250000375c8, forknum=MAIN_FORKNUM, blocknum=18, buffer=0x7fabaa800020, skipFsync=true) at md.c:471 #7 0x0000557f34a45a6f in smgrextend (reln=0x6250000375c8, forknum=MAIN_FORKNUM, blocknum=18, buffer=0x7fabaa800020, skipFsync=true) at smgr.c:501 #8 0x0000557f349139ed in RelationCopyStorageUsingBuffer (srclocator=..., dstlocator=..., forkNum=MAIN_FORKNUM, permanent=true) at bufmgr.c:4386 The buffer (buf) declared as follows: static void RelationCopyStorageUsingBuffer(RelFileLocator srclocator, RelFileLocator dstlocator, ForkNumber forkNum, bool permanent) { ... PGIOAlignedBlock buf; ... But as we can see, the buffer address is really not 4k-aligned, and that offset 0x20 added in run-time only when the server started with detect_stack_use_after_return=1. So it looks like the asan feature detect_stack_use_after_return implemented in gcc allows itself to add some data on stack, that breaks our alignment expectations. With all three such Asserts in md.c removed, `make check-world` passes for me. > One thing that's been holding me back on trying to do something around this is > the basically non-existing documentation around all of this. I haven't even > found documentation referencing the fact that there are headers like > sanitizer/asan_interface.h, you just have to figure that out yourself. Compare > that to something like valgrind, which has documented this at least somewhat. Yes, so maybe it's reasonable to support only basic/common features (such as detect_leaks), leaving advanced ones for ad-hoc usage till they prove their worthiness. Best regards, Alexander [1] https://www.postgresql.org/message-id/flat/CWTLB2WWVJJ2.2YV6ERNOL1WVF%40neon.tech [2] https://www.postgresql.org/message-id/591971ce-25c1-90f3-0526-5f54e3ebb32e%40gmail.com
On Fri Dec 1, 2023 at 3:00 AM CST, Alexander Lakhin wrote: > [ this thread separated from [1] as the discussion focus shifted ] > > H Andres, > > 29.11.2023 22:39, Andres Freund wrote: > >> I use the following: > >> ASAN_OPTIONS=detect_leaks=0:abort_on_error=1:print_stacktrace=1:\ > >> disable_coredump=0:strict_string_checks=1:check_initialization_order=1:\ > >> strict_init_order=1:detect_stack_use_after_return=0 > > I wonder if we should add some of these options by default ourselves. We could > > e.g. add something like the __ubsan_default_options() in > > src/backend/main/main.c to src/port/... instead, and return a combination of > > "our" options (like detect_leaks=0) and the ones from the environment. > > I think that such explicit expression of the project policy regarding > sanitizer checks is for good, but I see some obstacles on this way. > > First, I'm not sure what to do with new useful options/maybe new option > values, that will appear in sanitizers eventually. Should the only options, > that are supported by all sanitizers' versions, be specified, or we may > expect that unsupported options/values would be ignored by old versions? > > Second, what to do with other binaries, that need detect_leaks=0, for > example, that same ecpg? > > > ISTM that, if it actually works as I theorize it should, using > > __attribute__((no_sanitize("address"))) would be the easiest approach > > here. Something like > > > > #if defined(__has_feature) && __has_feature(address_sanitizer) > > #define pg_attribute_no_asan __attribute__((no_sanitize("address"))) > > #else > > #define pg_attribute_no_asan > > #endif > > > > or such should work. > > I've tried adding: > bool > +__attribute__((no_sanitize("address"))) > stack_is_too_deep(void) > > and it does work got me with clang 15, 18: `make check-world` passes with > ASAN_OPTIONS=detect_leaks=0:abort_on_error=1:print_stacktrace=1:\ > disable_coredump=0:strict_string_checks=1:check_initialization_order=1:\ > strict_init_order=1:detect_stack_use_after_return=1 > UBSAN_OPTIONS=abort_on_error=1:print_stacktrace=1 > > (with a fix for pg_bsd_indent applied [2]) > > But with gcc 11, 12, 13 I get an assertion failure during `make check`: > #4 0x00007fabadcd67f3 in __GI_abort () at ./stdlib/abort.c:79 > #5 0x0000557f35260382 in ExceptionalCondition (conditionName=0x557f35ca51a0 "(uintptr_t) buffer == > TYPEALIGN(PG_IO_ALIGN_SIZE, buffer)", fileName=0x557f35ca4fc0 "md.c", lineNumber=471) at assert.c:66 > #6 0x0000557f34a3b2bc in mdextend (reln=0x6250000375c8, forknum=MAIN_FORKNUM, blocknum=18, buffer=0x7fabaa800020, > skipFsync=true) at md.c:471 > #7 0x0000557f34a45a6f in smgrextend (reln=0x6250000375c8, forknum=MAIN_FORKNUM, blocknum=18, buffer=0x7fabaa800020, > skipFsync=true) at smgr.c:501 > #8 0x0000557f349139ed in RelationCopyStorageUsingBuffer (srclocator=..., dstlocator=..., forkNum=MAIN_FORKNUM, > permanent=true) at bufmgr.c:4386 > > The buffer (buf) declared as follows: > static void > RelationCopyStorageUsingBuffer(RelFileLocator srclocator, > RelFileLocator dstlocator, > ForkNumber forkNum, bool permanent) > { > ... > PGIOAlignedBlock buf; > ... > > But as we can see, the buffer address is really not 4k-aligned, and that > offset 0x20 added in run-time only when the server started with > detect_stack_use_after_return=1. > So it looks like the asan feature detect_stack_use_after_return implemented > in gcc allows itself to add some data on stack, that breaks our alignment > expectations. With all three such Asserts in md.c removed, > `make check-world` passes for me. Decided to do some digging into this, and Google actually documents[0] how it works. After reading the algorithm, it is obvious why this fails. What happens if you throw an __attribute__((no_sanitize("address")) on the function? I assume the Asserts would then pass. The commit[1] which added pg_attribute_aligned() provides insight as to why the Asserts exist. > /* If this build supports direct I/O, the buffer must be I/O aligned. */ Disabling instrumentation in functions which use this specific type when the build supports direct IO seems like the best solution. > > One thing that's been holding me back on trying to do something around this is > > the basically non-existing documentation around all of this. I haven't even > > found documentation referencing the fact that there are headers like > > sanitizer/asan_interface.h, you just have to figure that out yourself. Compare > > that to something like valgrind, which has documented this at least somewhat. > > Yes, so maybe it's reasonable to support only basic/common features (such > as detect_leaks), leaving advanced ones for ad-hoc usage till they prove > their worthiness. Possibly, but I think I would rather see upstream support running with all features with instrumentation turned off in various sections of code. Even some assistance from AddressSanitizer is better than none. Here[1][2] are all the AddressSanitizer flags for those curious. > Best regards, > Alexander > > [1] https://www.postgresql.org/message-id/flat/CWTLB2WWVJJ2.2YV6ERNOL1WVF%40neon.tech > [2] https://www.postgresql.org/message-id/591971ce-25c1-90f3-0526-5f54e3ebb32e%40gmail.com I personally would like to see Postgres have support for AddressSanitizer. I think it already supports UndefinedBehaviorSanitizer if I am remembering the buildfarm properly. AddressSanitizer has been so helpful in past experiences writing C. [0]: https://github.com/google/sanitizers/wiki/AddressSanitizerUseAfterReturn#algorithm [1]: https://github.com/postgres/postgres/commit/faeedbcefd40bfdf314e048c425b6d9208896d90 [2]: https://github.com/google/sanitizers/wiki/AddressSanitizerFlags [3]: https://github.com/google/sanitizers/wiki/SanitizerCommonFlags -- Tristan Partin Neon (https://neon.tech)
Hello Tristan, 02.12.2023 00:48, Tristan Partin wrote: > >> So it looks like the asan feature detect_stack_use_after_return implemented >> in gcc allows itself to add some data on stack, that breaks our alignment >> expectations. With all three such Asserts in md.c removed, >> `make check-world` passes for me. > > Decided to do some digging into this, and Google actually documents[0] how it works. After reading the algorithm, it > is obvious why this fails. What happens if you throw an __attribute__((no_sanitize("address")) on the function? I > assume the Asserts would then pass. The commit[1] which added pg_attribute_aligned() provides insight as to why the > Asserts exist. Thank you for spending your time on this! Yes, I understand what those Asserts were added for, I removed them just to check what else is on the way. And I can confirm that marking that function with the no_sanitize attribute fixes that exact failure. Then the same attribute has to be added to _hash_alloc_buckets(), to prevent: TRAP: failed Assert("(uintptr_t) buffer == TYPEALIGN(PG_IO_ALIGN_SIZE, buffer)"), File: "md.c", Line: 471, PID: 1766976 #5 0x00005594a6a0f0d0 in ExceptionalCondition (conditionName=0x5594a7454a60 "(uintptr_t) buffer == TYPEALIGN(PG_IO_ALIGN_SIZE, buffer)", fileName=0x5594a7454880 "md.c", lineNumber=471) at assert.c:66 #6 0x00005594a61ce133 in mdextend (reln=0x625000037e48, forknum=MAIN_FORKNUM, blocknum=9, buffer=0x7fc3b3947020, skipFsync=false) at md.c:471 #7 0x00005594a61d89ab in smgrextend (reln=0x625000037e48, forknum=MAIN_FORKNUM, blocknum=9, buffer=0x7fc3b3947020, skipFsync=false) at smgr.c:501 #8 0x00005594a4a0c43d in _hash_alloc_buckets (rel=0x7fc3a89714f8, firstblock=6, nblocks=4) at hashpage.c:1033 And to RelationCopyStorage(), to prevent: TRAP: failed Assert("(uintptr_t) buffer == TYPEALIGN(PG_IO_ALIGN_SIZE, buffer)"), File: "md.c", Line: 752, PID: 1787855 #5 0x000056081d5688bc in ExceptionalCondition (conditionName=0x56081dfaea40 "(uintptr_t) buffer == TYPEALIGN(PG_IO_ALIGN_SIZE, buffer)", fileName=0x56081dfae860 "md.c", lineNumber=752) at assert.c:66 #6 0x000056081cd29415 in mdread (reln=0x629000043158, forknum=MAIN_FORKNUM, blocknum=0, buffer=0x7fe480633020) at md.c:752 #7 0x000056081cd32cb3 in smgrread (reln=0x629000043158, forknum=MAIN_FORKNUM, blocknum=0, buffer=0x7fe480633020) at smgr.c:565 #8 0x000056081b9ed5f2 in RelationCopyStorage (src=0x629000043158, dst=0x629000041248, forkNum=MAIN_FORKNUM, relpersistence=112 'p') at storage.c:487 Probably, it has to be added for all the functions where PGIOAlignedBlock located on stack... But I still wonder, how it works with clang, why that extra attribute is not required? In other words, such implementation specifics discourage me... > > Possibly, but I think I would rather see upstream support running with all features with instrumentation turned off in > various sections of code. Even some assistance from AddressSanitizer is better than none. Here[1][2] are all the > AddressSanitizer flags for those curious. Yeah, and you might also need to specify extra flags to successfully run postgres with newer sanitizers' versions. Say, for clang-18 you need to specify -fno-sanitize=function (which is not recognized by gcc 13.2), to avoid errors like this: running bootstrap script ... dynahash.c:1120:4: runtime error: call to function strlcpy through pointer to incorrect function type 'void *(*)(void *, const void *, unsigned long)' .../src/port/strlcpy.c:46: note: strlcpy defined here #0 0x556af5e0b0a9 in hash_search_with_hash_value .../src/backend/utils/hash/dynahash.c:1120:4 #1 0x556af5e08f4f in hash_search .../src/backend/utils/hash/dynahash.c:958:9 > I personally would like to see Postgres have support for AddressSanitizer. I think it already supports > UndefinedBehaviorSanitizer if I am remembering the buildfarm properly. AddressSanitizer has been so helpful in past > experiences writing C. Me too. I find it very valuable for my personal usage but I'm afraid it's still not very stable/mature. One more example. Just adding -fsanitize=undefined for gcc 12, 13 (I tried 12.1, 13.0, 13.2) produces new warnings like: In function 'PageGetItemId', inlined from 'heap_xlog_update' at heapam.c:9569:9: ../../../../src/include/storage/bufpage.h:243:16: warning: array subscript -1 is below array bounds of 'ItemIdData[]' [-Warray-bounds=] 243 | return &((PageHeader) page)->pd_linp[offsetNumber - 1]; | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ But I don't get such warnings when I use gcc 11.3 (though it generates other ones) or clang (15, 18). They also aren't produced with -O0, -O1... Maybe it's another gcc bug, I'm not sure how to deal with it. (I can research this issue, if it makes any sense.) So I would say that cost of providing/maintaining full support for asan (hwasan), ubsan is not near zero, unfortunately. I would estimate it to 10-20 discussions/commits on start/5-10 per year later (not including fixes for bugs that would be found). If it's affordable for the project, I'd like to have such support out-of-the-box. Best regards, Alexander