Обсуждение: 012_subtransactions.pl vs clang -fsanitize=undefined

Поиск
Список
Период
Сортировка

012_subtransactions.pl vs clang -fsanitize=undefined

От
Justin Pryzby
Дата:
I found that -fsanitize causes the test to fail, going back to REL_10_STABLE,
for any clang in:

1:11.1.0-6
1:12.0.1-19ubuntu3
1:13.0.1-2ubuntu2
1:14.0.0-1ubuntu1

| time ./configure --enable-cassert --enable-debug --enable-tap-tests --with-CC=clang-13 CFLAGS='-fsanitize=undefined'
| time { make -j4 clean; make -j4; } >/dev/null
| time PROVE_TESTS=t/012_subtransactions.pl make -C ./src/test/recovery check
| 
| t/012_subtransactions.pl .. 2/? 
| #   Failed test 'Visible'
| #   at t/012_subtransactions.pl line 111.
| #          got: '-1'
| #     expected: '8128'
| ...
| # Looks like you failed 6 tests of 12.

I haven't found any combination of options which cause it to fail differently,
so I'm not even sure if the problem is in postgres, the test case, clang or
libubsan.  Note that optimization seems to avoid the problem, which is why
"kestrel" shows no issue.
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kestrel&dt=2022-07-23%2022%3A17%3A48

-- 
Justin



Re: 012_subtransactions.pl vs clang -fsanitize=undefined

От
Thomas Munro
Дата:
On Mon, Jul 25, 2022 at 8:59 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> I found that -fsanitize causes the test to fail, going back to REL_10_STABLE,
> for any clang in:
>
> 1:11.1.0-6
> 1:12.0.1-19ubuntu3
> 1:13.0.1-2ubuntu2
> 1:14.0.0-1ubuntu1
>
> | time ./configure --enable-cassert --enable-debug --enable-tap-tests --with-CC=clang-13
CFLAGS='-fsanitize=undefined'
> | time { make -j4 clean; make -j4; } >/dev/null
> | time PROVE_TESTS=t/012_subtransactions.pl make -C ./src/test/recovery check
> |
> | t/012_subtransactions.pl .. 2/?
> | #   Failed test 'Visible'
> | #   at t/012_subtransactions.pl line 111.
> | #          got: '-1'
> | #     expected: '8128'
> | ...
> | # Looks like you failed 6 tests of 12.
>
> I haven't found any combination of options which cause it to fail differently,
> so I'm not even sure if the problem is in postgres, the test case, clang or
> libubsan.  Note that optimization seems to avoid the problem, which is why
> "kestrel" shows no issue.
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kestrel&dt=2022-07-23%2022%3A17%3A48

Yeah I've seen this too... it'd be good to figure out how to fix it:

https://www.postgresql.org/message-id/CA%2BhUKGLDA-GuQKRvDF3abHadDrrYZ33N9e4DEOGwKH3JqdYSCQ%40mail.gmail.com



Re: 012_subtransactions.pl vs clang -fsanitize=undefined

От
Tom Lane
Дата:
Thomas Munro <thomas.munro@gmail.com> writes:
> On Mon, Jul 25, 2022 at 8:59 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
>> I found that -fsanitize causes the test to fail, going back to REL_10_STABLE,
>> for any clang in:

> Yeah I've seen this too... it'd be good to figure out how to fix it:
> https://www.postgresql.org/message-id/CA%2BhUKGLDA-GuQKRvDF3abHadDrrYZ33N9e4DEOGwKH3JqdYSCQ%40mail.gmail.com

Yeah, reproduces here too with RHEL8's clang 13.0.1.  I also see
that the failures are due to "stack depth exceeded" errors from that
recursive hs_subxids() function.  As best I can tell, the stack depth
failure is entirely honest:

(gdb) p stack_base_ptr
$1 = 0x7ffd92032100 "\360[\\\006"
(gdb) p $sp
$2 = (void *) 0x7ffd91e305a0
(gdb) p 0x7ffd92032100 - 0x7ffd91e305a0
$3 = 2104160

I can get at most 82 recursion levels without failure.
With my normal build it can get to 678 levels before dying.

I think what's happening is just that this build configuration
eats stack extravagantly.  Maybe it keeps all locals on the stack
instead of in registers?  I'm too lazy to check out the assembly
code.

I thought for a moment of rewriting hs_subxids() to iterate instead
of recurse, but it doesn't look like that's tremendously feasible in
plpgsql --- the only way to make nested subtransactions is to recurse.
It could probably be done from the client by issuing a series of
SAVEPOINT commands, but not nearly as elegantly.

            regards, tom lane



Re: 012_subtransactions.pl vs clang -fsanitize=undefined

От
Tom Lane
Дата:
I wrote:
> I think what's happening is just that this build configuration
> eats stack extravagantly.

That's definitely it, but I don't entirely see why.  Here are a
couple of major offenders though:

(gdb) x/8i ExecInterpExpr
   0x11a5530 <ExecInterpExpr>:  push   %rbp
   0x11a5531 <ExecInterpExpr+1>:        mov    %rsp,%rbp
   0x11a5534 <ExecInterpExpr+4>:        sub    $0x2f40,%rsp
   0x11a553b <ExecInterpExpr+11>:       mov    %rdi,-0x10(%rbp)
   0x11a553f <ExecInterpExpr+15>:       mov    %rsi,-0x18(%rbp)
   0x11a5543 <ExecInterpExpr+19>:       mov    %rdx,-0x20(%rbp)
   0x11a5547 <ExecInterpExpr+23>:       jmpq   0x11a554c <ExecInterpExpr+28>
   0x11a554c <ExecInterpExpr+28>:       cmpq   $0x0,-0x10(%rbp)

(gdb) p 0x2f40
$51 = 12096

(gdb) x/8i ExecInitExprRec
   0x11672e0 <ExecInitExprRec>: push   %rbp
   0x11672e1 <ExecInitExprRec+1>:       mov    %rsp,%rbp
   0x11672e4 <ExecInitExprRec+4>:       sub    $0x3c80,%rsp
   0x11672eb <ExecInitExprRec+11>:      mov    %rdi,-0x8(%rbp)
   0x11672ef <ExecInitExprRec+15>:      mov    %rsi,-0x10(%rbp)
   0x11672f3 <ExecInitExprRec+19>:      mov    %rdx,-0x18(%rbp)
   0x11672f7 <ExecInitExprRec+23>:      mov    %rcx,-0x20(%rbp)
   0x11672fb <ExecInitExprRec+27>:      lea    -0x60(%rbp),%rdi

(gdb) p 0x3c80
$52 = 15488

It looks like this build eats about 24K of stack per plpgsql recursion
level, of which ExecInterpExpr alone accounts for half.  Why is that?
It has no large local variables, mostly just ints and pointers.
There are a lot of them, but even if you presume that each one gets
its own dedicated bit of the stack frame, it's hard to arrive at 12K.

I'd almost call this a compiler bug.

            regards, tom lane



Re: 012_subtransactions.pl vs clang -fsanitize=undefined

От
Thomas Munro
Дата:
On Mon, Jul 25, 2022 at 10:18 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
> > I think what's happening is just that this build configuration
> > eats stack extravagantly.
>
> That's definitely it, but I don't entirely see why.  Here are a
> couple of major offenders though:

Interesting.  I wonder where we can read about what stuff clang puts
on the stack to implement the undefined behaviour checker (and what
GCC does differently here), but today I will resist the urge to go
looking.

As for workarounds (and as a note for my future self next time I'm
testing with UBSan), this is enough for the test to pass on my dev box
(4MB is not enough):

--- a/src/test/recovery/t/012_subtransactions.pl
+++ b/src/test/recovery/t/012_subtransactions.pl
@@ -16,6 +16,7 @@ $node_primary->append_conf(
        'postgresql.conf', qq(
        max_prepared_transactions = 10
        log_checkpoints = true
+       max_stack_depth = 5MB
 ));

It's also possible to tell it to keep out of certain functions:


https://github.com/llvm/llvm-project/blob/main/clang/docs/UndefinedBehaviorSanitizer.rst#disabling-instrumentation-with-attribute-no-sanitize-undefined