Valgrind failures in Apply Launcher's bgworker_quickdie() exit

From: Thomas Munro
Subject: Valgrind failures in Apply Launcher's bgworker_quickdie() exit
Msg-id: CAEepm=3C+K-1-AJyzNV_76ZEcku3wN-3UQ6vzdn37PNK_HFOQw@mail.gmail.com
Responses: Re: Valgrind failures in Apply Launcher's bgworker_quickdie() exit  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers

Hello,

Since libcrypto.so is implicated, Andres asked me off-list if my
changes to random number state initialisation might be linked to
skink's failures beginning 12 or 15 days ago.  It appears not, as it
was green for several runs after that commit.  Looking at the report:

==2802== VALGRINDERROR-BEGIN
==2802== Invalid read of size 8
==2802==    at 0x4DED5A5: check_free (dlerror.c:188)
==2802==    by 0x4DEDAB1: free_key_mem (dlerror.c:221)
==2802==    by 0x4DEDAB1: __dlerror_main_freeres (dlerror.c:239)
==2802==    by 0x55DCF81: __libc_freeres (in /lib/x86_64-linux-gnu/libc-2.28.so)
==2802==    by 0x482D19E: _vgnU_freeres (vg_preloaded.c:77)
==2802==    by 0x478AD3: bgworker_quickdie (bgworker.c:661)
==2802==    by 0x48626AF: ??? (in /lib/x86_64-linux-gnu/libpthread-2.28.so)
==2802==    by 0x556DB76: epoll_wait (epoll_wait.c:30)
==2802==    by 0x4E25B9: WaitEventSetWaitBlock (latch.c:1078)
==2802==    by 0x4E25B9: WaitEventSetWait (latch.c:1030)
==2802==    by 0x4E28C1: WaitLatchOrSocket (latch.c:407)
==2802==    by 0x4E29A6: WaitLatch (latch.c:347)
==2802==    by 0x49E03E: ApplyLauncherMain (launcher.c:1062)
==2802==    by 0x479831: StartBackgroundWorker (bgworker.c:834)
==2802==  Address 0x7fd7e28 is 12 bytes after a block of size 12 alloc'd
==2802==    at 0x483577F: malloc (vg_replace_malloc.c:299)
==2802==    by 0x4C3BD38: CRYPTO_zalloc (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1)
==2802==    by 0x4C37F8D: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1)
==2802==    by 0x4C615B9: RAND_DRBG_get0_public (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1)
==2802==    by 0x4C615EF: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1)
==2802==    by 0x675D75: pg_strong_random (pg_strong_random.c:135)
==2802==    by 0x4848EB: RandomCancelKey (postmaster.c:5251)
==2802==    by 0x484909: assign_backendlist_entry (postmaster.c:5822)
==2802==    by 0x4873BA: do_start_bgworker (postmaster.c:5692)
==2802==    by 0x487701: maybe_start_bgworkers (postmaster.c:5955)
==2802==    by 0x4878C2: reaper (postmaster.c:2940)
==2802==    by 0x48626AF: ??? (in /lib/x86_64-linux-gnu/libpthread-2.28.so)
==2802==
==2802== VALGRINDERROR-END

The function __libc_freeres is a special glibc entry point provided
for leak checkers to call explicitly if they want glibc to clean up
after itself (normally it doesn't bother).  The specific thing being
cleaned up here is a piece of thread local storage that belongs to the
dynamic linker support code:

https://github.com/lattera/glibc/blob/master/dlfcn/dlerror.c#L228

Since we don't see strcmp() or free() at the top of the stack (and
assuming they aren't inlined), I think the line numbers must line up
with current glibc HEAD as of today, and it must be failing on
accessing rec->errstring at line 188, meaning that rec (the value
stored as a thread specific key) is a bad pointer.  That's quite
strange and I don't have an explanation; if libcrypto overran its
buffer, for example, that would perhaps trash rec->errstring but we'd
still be able to read the pointer itself.  So I wonder if libcrypto.so
is a red herring here.
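
To make that reasoning concrete, here is a minimal sketch of the two memory accesses involved. It is a simplified model, not the glibc source: the struct layout, the `slot` variable and the `model_*` function names are all hypothetical, chosen only to mirror the rec / errstring distinction discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the per-thread error record. */
struct action_result {
    bool malloced;      /* true if errstring came from malloc() */
    char *errstring;    /* last error message, or NULL */
};

/* Stand-in for the thread-specific slot; the real code looks it up by key. */
static _Thread_local struct action_result *slot;

/* Model of the cleanup step.  Returns 1 if a message was freed, else 0. */
static int model_check_free(struct action_result *rec)
{
    assert(rec != NULL);

    /* Access 1: load the pointer stored inside rec.  If rec itself is a
     * bad pointer, the invalid read is reported here, with this function
     * at the top of the stack. */
    char *msg = rec->errstring;
    if (msg == NULL)
        return 0;

    /* Access 2: dereference msg.  If only msg were bad, the invalid read
     * would be reported inside strcmp() (or later in free()) instead. */
    if (strcmp(msg, "out of memory") == 0)
        return 0;

    rec->errstring = NULL;
    if (!rec->malloced)
        return 0;
    free(msg);
    return 1;
}

/* Model of the exit-time hook: clean up whatever the slot points at. */
static int model_freeres(void)
{
    struct action_result *rec = slot;
    if (rec == NULL)
        return 0;
    int freed = model_check_free(rec);
    free(rec);
    slot = NULL;
    return freed;
}
```

Under this model, a heap overrun that only clobbers the bytes of errstring would surface at the second access, whereas the report above points at the first one.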

It's Debian unstable, which could be a factor.  Bugs in glibc?
That's 2.28, out for 3 months now, but then why only in Apply
Launcher? Did we trash 'key', or the thread specific pointer table, or
is my assessment above wrong, and somehow it's really errstring that
is a bad pointer (which would allow for a more mundane explanation,
like someone trashed a bit of heap memory by overrunning a buffer)?

-- 
Thomas Munro
http://www.enterprisedb.com

