Thread: occasional valgrind reports for handle_sig_alarm on 32-bit ARM

occasional valgrind reports for handle_sig_alarm on 32-bit ARM

From
Tomas Vondra
Date:
Hi,

I've been running a lot of valgrind tests on 32-bit ARM recently, and
from time to time I get a failure in handle_sig_alarm like this:

    ==13605== Use of uninitialised value of size 4
    ==13605==    at 0x88DA98: handle_sig_alarm (timeout.c:457)
    ==13605==    by 0xFFFFFFFF: ???
    ==13605==  Uninitialised value was created by a heap allocation
    ==13605==    at 0x8A0374: MemoryContextAllocExtended (mcxt.c:1149)
    ==13605==    by 0x86A187: DynaHashAlloc (dynahash.c:292)
    ==13605==    by 0x86CB07: element_alloc (dynahash.c:1715)
    ==13605==    by 0x86A9E7: hash_create (dynahash.c:611)
    ==13605==    by 0x8A1CE3: EnablePortalManager (portalmem.c:122)
    ==13605==    by 0x8716CF: InitPostgres (postinit.c:806)
    ==13605==    by 0x653F63: PostgresMain (postgres.c:4141)
    ==13605==    by 0x5651CB: BackendRun (postmaster.c:4461)
    ==13605==    by 0x564A43: BackendStartup (postmaster.c:4189)
    ==13605==    by 0x560663: ServerLoop (postmaster.c:1779)
    ==13605==    by 0x55FE27: PostmasterMain (postmaster.c:1463)
    ==13605==    by 0x4107F3: main (main.c:200)
    ==13605==
    {
       <insert_a_suppression_name_here>
       Memcheck:Value4
       fun:handle_sig_alarm
       obj:*
    }

or (somewhat weird)

    ==23734== Use of uninitialised value of size 4
    ==23734==    at 0x88DDC8: handle_sig_alarm (timeout.c:457)
    ==23734==    by 0xFFFFFFFF: ???
    ==23734==  Uninitialised value was created by a stack allocation
    ==23734==    at 0x64CE2C: EndCommand (dest.c:167)
    ==23734==
    {
       <insert_a_suppression_name_here>
       Memcheck:Value4
       fun:handle_sig_alarm
       obj:*
    }

It might be a valgrind issue and/or false positive, but I don't think
I've seen such failures before, so I'm wondering if this might be due to
some recent changes?

It's pretty rare, as it depends on the timing of the signal being just
"right" (I wonder if there's a way to increase the frequency).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: occasional valgrind reports for handle_sig_alarm on 32-bit ARM

From
Andres Freund
Date:
Hi,

On 2023-02-18 13:56:38 +0100, Tomas Vondra wrote:
> or (somewhat weird)
> 
>     ==23734== Use of uninitialised value of size 4
>     ==23734==    at 0x88DDC8: handle_sig_alarm (timeout.c:457)
>     ==23734==    by 0xFFFFFFFF: ???
>     ==23734==  Uninitialised value was created by a stack allocation
>     ==23734==    at 0x64CE2C: EndCommand (dest.c:167)
>     ==23734==
>     {
>        <insert_a_suppression_name_here>
>        Memcheck:Value4
>        fun:handle_sig_alarm
>        obj:*
>     }

I'd try using valgrind's --vgdb-error=1, and inspecting the state.

I assume this is without specifying --read-var-info=yes? Might be worth
trying, sometimes the increased detail can be really helpful.
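
For readers who want to try the two options mentioned above, here is a minimal sketch of a command line that combines them. It only prints the command so it can be reviewed before running. The data directory and the location of the suppression file are placeholders, not taken from this thread. `--track-origins=yes` is included because the reports above already show origin information. The gdb steps in the comments are the usual way to attach to valgrind's gdbserver, and `<pid>` and `<address>` stay placeholders.

```shell
#!/bin/sh
# Sketch only: both paths are assumptions, adjust them to the local setup.
PGDATA="${PGDATA:-/tmp/pgdata}"            # hypothetical data directory
SUPP="${SUPP:-src/tools/valgrind.supp}"    # suppression file from the PostgreSQL source tree

# Print the valgrind command line instead of running it.
valgrind_cmd() {
    echo valgrind \
        --trace-children=yes \
        --track-origins=yes \
        --read-var-info=yes \
        --vgdb-error=1 \
        --suppressions="$SUPP" \
        postgres -D "$PGDATA"
}

valgrind_cmd

# With --vgdb-error=1, memcheck stops at the first error and waits for gdb.
# From a second shell, roughly:
#   gdb postgres
#   (gdb) target remote | vgdb --pid=<pid>
#   (gdb) disassemble                      # function around the current pc
#   (gdb) info registers
#   (gdb) monitor get_vbits <address> 4    # definedness bits of 4 bytes of memory
```

Note that `--read-var-info=yes` makes startup noticeably slower, so it is probably only worth enabling while chasing this particular report.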


It's certainly interesting that the error happens in timeout.c:457 - currently
that's the end of the function. And dest.c:167 is the entry of EndCommand().

Perhaps there's some confusion around the state of the stack? The fact that it
looks like the function epilogue of handle_sig_alarm() uses an uninitialized
variable created by the function prologue of EndCommand() does seem to suggest
something like that.

It'd be interesting to see the exact instruction triggering the failure +
surroundings.
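
One way to look at the instruction and its surroundings without a live session is to disassemble a small window around the reported address. This is a rough sketch that only prints the objdump command. It assumes the address in the valgrind report matches the address in the on-disk binary (no relocation), that `postgres` is the same build that produced the report, and that the local objdump can decode 32-bit ARM. The window size is an arbitrary choice.

```shell
#!/bin/sh
# Address of the failing instruction, copied from the second report.
ADDR=0x88DDC8
WINDOW=32                       # bytes shown before and after ADDR (arbitrary)
BINARY="${BINARY:-postgres}"    # hypothetical path to the binary that was run

# Print an objdump command covering [ADDR - WINDOW, ADDR + WINDOW).
objdump_cmd() {
    start=$(printf '0x%x' $(($ADDR - $WINDOW)))
    stop=$(printf '0x%x' $(($ADDR + $WINDOW)))
    echo objdump -d --start-address="$start" --stop-address="$stop" "$BINARY"
}

objdump_cmd
```

If the build uses position-independent code, the runtime addresses will differ from the ones in the file and the window has to be shifted accordingly.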


> It might be a valgrind issue and/or false positive, but I don't think
> I've seen such failures before, so I'm wondering if this might be due to
> some recent changes?

Have you run valgrind on 32-bit ARM before? It wouldn't surprise me if there
are some 32-bit ARM issues in valgrind, libc, or similar.

Greetings,

Andres Freund