Re: BUG #16696: Backend crash in llvmjit
| From | Dmitry Marakasov |
|---|---|
| Subject | Re: BUG #16696: Backend crash in llvmjit |
| Date | |
| Msg-id | 20201104235054.GB30304@hades.panopticon |
| In reply to | Re: BUG #16696: Backend crash in llvmjit (Dmitry Marakasov <amdmi3@amdmi3.ru>) |
| List | pgsql-bugs |
* Dmitry Marakasov (amdmi3@amdmi3.ru) wrote:
> > > > Environment details:
> > > > - FreeBSD 12.1 amd64
> > > > - PostgreSQL 13.0 (built from FreeBSD ports)
> > > > - llvm-10.0.1 (built from FreeBSD ports)
> > >
> > > My bad, it's actually llvm-9.0.1. Multiple llvm versions are installed on
> > > the system, and PostgreSQL uses llvm9:
> > >
> > > ldd /usr/local/lib/postgresql/llvmjit.so | grep LLVM
> > > libLLVM-9.so => /usr/local/llvm90/lib/libLLVM-9.so (0x800e00000)
> >
> > Could you try generating a backtrace after turning jit_debugging_support on? That might give a bit more
> > information.
> >
> > I'll check once I'm home whether I can reproduce in my environment.
>
> I did some digging. First of all, I've discovered that the problem
> goes away if llvm bitcode optimization is disabled (by commenting out
> llvm_optimize_module call).
>
> I've dumped the opcode and tried compiling it back to match the gdb
> disassembly of the failing function. It didn't match perfectly,
> but this place looked similar:
>
> # %bb.84: # %op.32.inputcall
> movq %rax, 5267(%r13)
> movb %bl, 5275(%r13)
> movb $0, 5263(%r13)
> movzbl (%rax), %esi
> movl __mb_sb_limit(%rip), %edi
> movq _ThreadRuneLocale@GOTTPOFF(%rip), %rcx
> movq %fs:0, %rdx
> movq (%rdx,%rcx), %rcx
> cmpl %esi, %edi
> movq %rax, -96(%rbp) # 8-byte Spill
> movl %edi, -72(%rbp) # 4-byte Spill
> movq %rcx, -64(%rbp) # 8-byte Spill
> jle .LBB1_85
>
> Here's my hypothesis:
>
> The problem happens when boolin() function is inlined by LLVM.
> The named function calls isspace() internally, which on FreeBSD is
> locale-specific and involves caching some locale parameters in
> thread-local variable defined as
>
> extern _Thread_local const _RuneLocale *_ThreadRuneLocale;
>
> The execution crashes on trying to access the named thread-local variable,
> probably because something related to TLS is not set up properly in/for
> LLVM.
>
> I've confirmed this hypothesis by disabling isspace() calls in boolin()
> which has also fixed the problem.
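To illustrate the hypothesis above, here is a minimal C sketch of a ctype-style isspace() that consults a thread-local locale pointer. It is not FreeBSD's actual implementation: DemoRuneLocale, demo_thread_locale, demo_isspace and demo_skip_leading_space are hypothetical names invented for this example. It only shows why inlining such a function into the caller places a TLS access (like the %fs-relative load in the disassembly above) directly into the JIT-compiled code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a rune-locale table: one flag word per byte value. */
typedef struct {
    unsigned long runetype[256];
} DemoRuneLocale;

#define DEMO_CTYPE_SPACE 0x1UL

/* Default "C" locale table: only the six standard whitespace characters are flagged. */
static const DemoRuneLocale demo_c_locale = {
    .runetype = {
        [' ']  = DEMO_CTYPE_SPACE,
        ['\t'] = DEMO_CTYPE_SPACE,
        ['\n'] = DEMO_CTYPE_SPACE,
        ['\v'] = DEMO_CTYPE_SPACE,
        ['\f'] = DEMO_CTYPE_SPACE,
        ['\r'] = DEMO_CTYPE_SPACE,
    }
};

/*
 * Per-thread cached locale pointer (assumption: similar in spirit to
 * _ThreadRuneLocale). NULL means "use the default table".
 */
static _Thread_local const DemoRuneLocale *demo_thread_locale;

/*
 * isspace()-like check. The read of demo_thread_locale is a TLS access;
 * once this function is inlined, that access lives in the caller's code.
 */
static inline int demo_isspace(int c)
{
    const DemoRuneLocale *rl =
        demo_thread_locale ? demo_thread_locale : &demo_c_locale;

    if (c < 0 || c >= 256)
        return 0;
    return (rl->runetype[c] & DEMO_CTYPE_SPACE) != 0;
}

/* Caller shaped like the whitespace-skipping step of an input function. */
static const char *demo_skip_leading_space(const char *s)
{
    assert(s != NULL);
    while (*s != '\0' && demo_isspace((unsigned char) *s))
        s++;
    return s;
}
```

In a normally linked program the thread-local slot is set up by the runtime linker and thread library. The guess in this thread is that code emitted by the JIT does not get equivalent setup, so the same access faults.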
Long story short, I was able to mitigate the crash with the following patch:
--- disable-inlining-tls-using-functions.patch begins here ---
commit f703544edc406293e39b7a59a245e798d18f458e
Author: Dmitry Marakasov <amdmi3@amdmi3.ru>
Date:   Thu Nov 5 02:56:00 2020 +0300

    Do not inline functions accessing TLS in LLVM JIT

diff --git src/backend/jit/llvm/llvmjit_inline.cpp src/backend/jit/llvm/llvmjit_inline.cpp
index 2617a46..a063edb 100644
--- src/backend/jit/llvm/llvmjit_inline.cpp
+++ src/backend/jit/llvm/llvmjit_inline.cpp
@@ -608,6 +608,16 @@ function_inlinable(llvm::Function &F,
 			if (rv->materialize())
 				elog(FATAL, "failed to materialize metadata");
 
+			/*
+			 * Don't inline functions with thread-local variables until
+			 * related crashes are investigated (see BUG #16696)
+			 */
+			if (rv->isThreadLocal()) {
+				ilog(DEBUG1, "cannot inline %s due to thread-local variable %s",
+					 F.getName().data(), rv->getName().data());
+				return false;
+			}
+
 			/*
 			 * Never want to inline externally visible vars, cheap enough to
 			 * reference.
--- disable-inlining-tls-using-functions.patch ends here ---
I don't know LLVM well enough to investigate this further, but my guess
is that something TLS-related is not initialized properly.
--
Dmitry Marakasov . 55B5 0596 FF1E 8D84 5F56 9510 D35A 80DD F9D2 F77D
amdmi3@amdmi3.ru ..: https://github.com/AMDmi3