Discussion: confusing valgrind report about tuplestore+wrapper_handler (?) on 32-bit arm


From: Tomas Vondra
Date:

Hi,

While running valgrind on 32-bit ARM (rpi5 with debian), I got this
really strange report:


==25520== Use of uninitialised value of size 4
==25520==    at 0x94A550: wrapper_handler (pqsignal.c:108)
==25520==    by 0x4D7826F: ??? (sigrestorer.S:64)
==25520==  Uninitialised value was created by a heap allocation
==25520==    at 0x8FB780: palloc (mcxt.c:1340)
==25520==    by 0x913067: tuplestore_begin_common (tuplestore.c:289)
==25520==    by 0x91310B: tuplestore_begin_heap (tuplestore.c:331)
==25520==    by 0x3EA717: ExecMaterial (nodeMaterial.c:64)
==25520==    by 0x3B2FF7: ExecProcNodeFirst (execProcnode.c:464)
==25520==    by 0x3EF73F: ExecProcNode (executor.h:274)
==25520==    by 0x3F0637: ExecMergeJoin (nodeMergejoin.c:703)
==25520==    by 0x3B2FF7: ExecProcNodeFirst (execProcnode.c:464)
==25520==    by 0x3C47DB: ExecProcNode (executor.h:274)
==25520==    by 0x3C4D4F: fetch_input_tuple (nodeAgg.c:561)
==25520==    by 0x3C8233: agg_retrieve_direct (nodeAgg.c:2364)
==25520==    by 0x3C7E07: ExecAgg (nodeAgg.c:2179)
==25520==    by 0x3B2FF7: ExecProcNodeFirst (execProcnode.c:464)
==25520==    by 0x3A5EC3: ExecProcNode (executor.h:274)
==25520==    by 0x3A8FBF: ExecutePlan (execMain.c:1646)
==25520==    by 0x3A6677: standard_ExecutorRun (execMain.c:363)
==25520==    by 0x3A644B: ExecutorRun (execMain.c:304)
==25520==    by 0x6976D3: PortalRunSelect (pquery.c:924)
==25520==    by 0x6972F7: PortalRun (pquery.c:768)
==25520==    by 0x68FA1F: exec_simple_query (postgres.c:1274)
==25520==
{
   <insert_a_suppression_name_here>
   Memcheck:Value4
   fun:wrapper_handler
   obj:/usr/lib/arm-linux-gnueabihf/libc.so.6
}
**25520** Valgrind detected 1 error(s) during execution of "select
count(*) from
**25520**   (select * from tenk1 x order by x.thousand, x.twothousand,
x.fivethous) x
**25520**   left join
**25520**   (select * from tenk1 y order by y.unique2) y
**25520**   on x.thousand = y.unique2 and x.twothousand = y.hundred and
x.fivethous = y.unique2;"


I'm mostly used to weird valgrind stuff on this platform, but it's
usually about libarmmem and valgrind (possibly) thinking it might access
undefined memory when calculating checksums etc.

This seems somewhat different, so I wonder if it's something real? But
also, at the same time, it's rather weird, because the report says it's
this bit in pqsignal.c

    (*pqsignal_handlers[postgres_signal_arg]) (postgres_signal_arg);

but it also says the memory was allocated in tuplestore, and that's
obviously very unlikely, because it does not do anything with signals.
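
For context, the line valgrind points at is an indirect call through a per-signal table of function pointers. Below is a minimal sketch of that pattern under stated assumptions: it is not the actual pqsignal.c code, and every `sketch_*` name, as well as the table size, is made up for illustration. The wrapper reads a table slot and jumps to whatever it finds there, so, as far as I understand memcheck, a complaint on that line would mean it considered either the index or the loaded pointer undefined.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for sigaction() under strict -std modes */
#endif

#include <errno.h>
#include <signal.h>
#include <stddef.h>

#define SKETCH_NUM_SIGNALS 64     /* hypothetical table size, not PG_NSIG */

typedef void (*sketch_handler) (int);

/* One slot per signal number; static storage, so zero-initialised. */
static volatile sketch_handler sketch_handlers[SKETCH_NUM_SIGNALS];

/* Set by the demo handler below so a caller can observe the dispatch. */
static volatile sig_atomic_t sketch_last_signal = 0;

/* The only function registered with the kernel: it forwards to the table. */
static void
sketch_wrapper(int signo)
{
    int            save_errno = errno;
    sketch_handler handler = sketch_handlers[signo];

    if (handler != NULL)
        handler(signo);           /* the indirect call, analogous to pqsignal.c:108 */

    errno = save_errno;           /* handlers must not clobber errno */
}

/* Record the real handler in the table, then install the wrapper. */
static int
sketch_install(int signo, sketch_handler handler)
{
    struct sigaction act;

    if (signo <= 0 || signo >= SKETCH_NUM_SIGNALS)
        return -1;

    sketch_handlers[signo] = handler;

    act.sa_handler = sketch_wrapper;
    sigemptyset(&act.sa_mask);
    act.sa_flags = SA_RESTART;
    return sigaction(signo, &act, NULL);
}

/* Demo handler: only touches a sig_atomic_t, which is async-signal-safe. */
static void
sketch_on_usr1(int signo)
{
    sketch_last_signal = signo;
}
```

Calling `sketch_install(SIGUSR1, sketch_on_usr1)` and then `raise(SIGUSR1)` runs the demo handler through the wrapper. Nothing in this pattern touches heap memory, which is what makes the "created by a heap allocation" part of the report so odd.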

I've only seen this once, but if it's related to signals, that's not
surprising - the window may be pretty narrow.

Has anyone seen or investigated a report like this?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Thu, Jun 20, 2024 at 07:28, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> This seems somewhat different, so I wonder if it's something real?

It seems like a false positive to me.

According to valgrind's documentation:
https://valgrind.org/docs/manual/mc-manual.html#mc-manual.value

"This can lead to false positive errors, as the shared memory can be initialised via a first mapping, and accessed via another mapping. The access via this other mapping will have its own V bits, which have not been changed when the memory was initialised via the first mapping. The bypass for these false positives is to use Memcheck's client requests VALGRIND_MAKE_MEM_DEFINED and VALGRIND_MAKE_MEM_UNDEFINED to inform Memcheck about what your program does (or what another process does) to these shared memory mappings."

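To make the quoted passage concrete, here is a minimal sketch of how those two client requests are typically wrapped. The `sketch_*` helpers are hypothetical, and the `USE_VALGRIND` build switch is an assumption of this sketch: when it is not defined, the requests compile to no-ops so the code builds without the Valgrind headers.

```c
#include <stddef.h>

#ifdef USE_VALGRIND
#include <valgrind/memcheck.h>
#else
/* Fallbacks so the sketch compiles without Valgrind installed. */
#define VALGRIND_MAKE_MEM_DEFINED(addr, size)   ((void) (addr), (void) (size))
#define VALGRIND_MAKE_MEM_UNDEFINED(addr, size) ((void) (addr), (void) (size))
#endif

/*
 * Hypothetical helper: the bytes in buf were initialised through some
 * path Memcheck cannot see (another mapping, another process), so tell
 * it to treat them as defined.
 */
static void
sketch_mark_initialised_elsewhere(void *buf, size_t len)
{
    (void) VALGRIND_MAKE_MEM_DEFINED(buf, len);
}

/*
 * Hypothetical helper: the buffer is being recycled, so any later read
 * before a fresh write should be reported again.
 */
static void
sketch_mark_recycled(void *buf, size_t len)
{
    (void) VALGRIND_MAKE_MEM_UNDEFINED(buf, len);
}
```

Neither request changes the bytes themselves; they only update Memcheck's shadow state, so outside Valgrind the helpers have no observable effect.
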
best regards,
Ranier Vilela

On 6/20/24 13:32, Ranier Vilela wrote:
>> This seems somewhat different, so I wonder if it's something real?
> 
> It seems like a false positive to me.
> 
> According to valgrind's documentation:
> https://valgrind.org/docs/manual/mc-manual.html#mc-manual.value
> 
> " This can lead to false positive errors, as the shared memory can be
> initialised via a first mapping, and accessed via another mapping. The
> access via this other mapping will have its own V bits, which have not been
> changed when the memory was initialised via the first mapping. The bypass
> for these false positives is to use Memcheck's client requests
> VALGRIND_MAKE_MEM_DEFINED and VALGRIND_MAKE_MEM_UNDEFINED to inform
> Memcheck about what your program does (or what another process does) to
> these shared memory mappings. "
> 

But that's about shared memory, and the report has nothing to do with
shared memory AFAICS.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Thu, Jun 20, 2024 at 08:54, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:


> But that's about shared memory, and the report has nothing to do with
> shared memory AFAICS.

You could try this once. From the valgrind documentation:

"Selecting --expensive-definedness-checks=yes causes Memcheck to use the most accurate analysis possible. This minimises false error rates but can cause up to 30% performance degradation."

I searched through my own reports and none of them refers to this particular source location.

best regards,
Ranier Vilela