Обсуждение: valgrind issues on Fedora 28

Поиск
Список
Период
Сортировка

valgrind issues on Fedora 28

От
Tomas Vondra
Дата:
Hi,

I've recently updated to Fedora 28, and in that environment I get quite 
a few new valgrind issues (see the attached log).

Essentially, all the reports start with either

==5971== Invalid read of size 32
==5971==    at 0x5957EB1: __wcsnlen_avx2 (in /usr/lib64/libc-2.27.so)
==5971==    by 0x589E871: wcsrtombs (in /usr/lib64/libc-2.27.so)
==5971==    by 0x5834000: wcstombs (in /usr/lib64/libc-2.27.so)
==5971==    by 0x97DD82: wchar2char (pg_locale.c:1641)

or

==5971== Conditional jump or move depends on uninitialised value(s)
==5971==    at 0x5822123: __gconv_transform_internal_utf8 (in 
/usr/lib64/libc-2.27.so)
==5971==    by 0x589E8A4: wcsrtombs (in /usr/lib64/libc-2.27.so)
==5971==    by 0x5834000: wcstombs (in /usr/lib64/libc-2.27.so)
==5971==    by 0x97DD82: wchar2char (pg_locale.c:1641)

or some other combination of that. In all cases the call stack is

   wchar2char > wcstombs > wcsrtombs > something

I don't see these issues on the old (Fedora 26) environment, so it seems 
to be caused by updating either gcc or glibc:

old system:

* glibc-2.25-13.fc26.x86_64
* gcc version 7.3.1 20180130 (Red Hat 7.3.1-2) (GCC)

new system:

* glibc-2.27-32.fc28.x86_64
* gcc version 8.2.1 20181011 (Red Hat 8.2.1-4) (GCC)

Valgrind is the same (valgrind-3.13.0) in both cases. I'm not sure if 
it's due to some glibc changes, gcc generating different code, or 
something else.

I've tried to look at wcsrtombs in glibc, but it's not particularly 
readable piece of code. Also, my eyes bleed ... Thank god for pgindent!

I suspect it's either due to some glibc bug and/or gcc optimization that 
ends up touching memory that was not actually initialized (judging by 
__wcsnlen_avx2), or something like that. Not sure.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: valgrind issues on Fedora 28

От
Andres Freund
Дата:
On 2018-11-06 18:24:55 +0100, Tomas Vondra wrote:
> I've recently updated to Fedora 28, and in that environment I get quite a
> few new valgrind issues (see the attached log).
> 
> Essentially, all the reports start with either
> 
> ==5971== Invalid read of size 32
> ==5971==    at 0x5957EB1: __wcsnlen_avx2 (in /usr/lib64/libc-2.27.so)
> ==5971==    by 0x589E871: wcsrtombs (in /usr/lib64/libc-2.27.so)
> ==5971==    by 0x5834000: wcstombs (in /usr/lib64/libc-2.27.so)
> ==5971==    by 0x97DD82: wchar2char (pg_locale.c:1641)


I think this isn't actually a bug, just a missing suppression. The avx2
code uses instructions to scan for 0 bytes in multiple bytes at the same
time. Therefore it can encounter a byte marked as undefined, even if it
never actually uses that value.

> or
> 
> ==5971== Conditional jump or move depends on uninitialised value(s)
> ==5971==    at 0x5822123: __gconv_transform_internal_utf8 (in
> /usr/lib64/libc-2.27.so)
> ==5971==    by 0x589E8A4: wcsrtombs (in /usr/lib64/libc-2.27.so)
> ==5971==    by 0x5834000: wcstombs (in /usr/lib64/libc-2.27.so)
> ==5971==    by 0x97DD82: wchar2char (pg_locale.c:1641)
> 
> or some other combination of that. In all cases the call stack is
> 
>   wchar2char > wcstombs > wcsrtombs > something

I think I came to the same conclusion here, but I'm not quite sure.

FWIW, I've supressed these on my valgrind animal a while ago.

Greetings,

Andres Freund


Re: valgrind issues on Fedora 28

От
Alvaro Herrera
Дата:
On 2018-Nov-06, Tomas Vondra wrote:

> Hi,
> 
> I've recently updated to Fedora 28, and in that environment I get quite a
> few new valgrind issues (see the attached log).

Hmm, related to https://postgr.es/m/20180220150838.GD18315@e733.localdomain ?
Apparently the right answer is to add local suppressions ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: valgrind issues on Fedora 28

От
Tomas Vondra
Дата:
On 11/6/18 6:35 PM, Andres Freund wrote:
> On 2018-11-06 18:24:55 +0100, Tomas Vondra wrote:
>> I've recently updated to Fedora 28, and in that environment I get quite a
>> few new valgrind issues (see the attached log).
>>
>> Essentially, all the reports start with either
>>
>> ==5971== Invalid read of size 32
>> ==5971==    at 0x5957EB1: __wcsnlen_avx2 (in /usr/lib64/libc-2.27.so)
>> ==5971==    by 0x589E871: wcsrtombs (in /usr/lib64/libc-2.27.so)
>> ==5971==    by 0x5834000: wcstombs (in /usr/lib64/libc-2.27.so)
>> ==5971==    by 0x97DD82: wchar2char (pg_locale.c:1641)
> 
> 
> I think this isn't actually a bug, just a missing suppression. The avx2
> code uses instructions to scan for 0 bytes in multiple bytes at the same
> time. Therefore it can encounter a byte marked as undefined, even if it
> never actually uses that value.
> 

OK, my thoughts exactly.

>> or
>>
>> ==5971== Conditional jump or move depends on uninitialised value(s)
>> ==5971==    at 0x5822123: __gconv_transform_internal_utf8 (in
>> /usr/lib64/libc-2.27.so)
>> ==5971==    by 0x589E8A4: wcsrtombs (in /usr/lib64/libc-2.27.so)
>> ==5971==    by 0x5834000: wcstombs (in /usr/lib64/libc-2.27.so)
>> ==5971==    by 0x97DD82: wchar2char (pg_locale.c:1641)
>>
>> or some other combination of that. In all cases the call stack is
>>
>>    wchar2char > wcstombs > wcsrtombs > something
> 
> I think I came to the same conclusion here, but I'm not quite sure.
> 

Looking at gconv code at [1], it seems it's reading the data as int32 
values and using shifts to extract individual bytes. I'm pretty sure 
this confuses valgrind so it thinks it's accessing all the bytes.

[1] 

https://sourceware.org/git/?p=glibc.git;a=blob;f=iconv/gconv_simple.c;h=506c92caf228d61f92986c39a2ddf9c0a134b4c0;hb=HEAD

> FWIW, I've supressed these on my valgrind animal a while ago.
> 

OK, I propose to add these suppressions into the current list.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: valgrind issues on Fedora 28

От
Tomas Vondra
Дата:
On 11/6/18 6:35 PM, Alvaro Herrera wrote:
> On 2018-Nov-06, Tomas Vondra wrote:
> 
>> Hi,
>>
>> I've recently updated to Fedora 28, and in that environment I get quite a
>> few new valgrind issues (see the attached log).
> 
> Hmm, related to https://postgr.es/m/20180220150838.GD18315@e733.localdomain ?
> Apparently the right answer is to add local suppressions ...
> 

Yep, seems like that. I'll check if the patch proposed by Aleksander 
resolves all the issues reported by valgrind.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: valgrind issues on Fedora 28

От
Tomas Vondra
Дата:
On 11/6/18 6:51 PM, Tomas Vondra wrote:
> On 11/6/18 6:35 PM, Alvaro Herrera wrote:
>> On 2018-Nov-06, Tomas Vondra wrote:
>>
>>> Hi,
>>>
>>> I've recently updated to Fedora 28, and in that environment I get
>>> quite a
>>> few new valgrind issues (see the attached log).
>>
>> Hmm, related to
>> https://postgr.es/m/20180220150838.GD18315@e733.localdomain ?
>> Apparently the right answer is to add local suppressions ...
>>
> 
> Yep, seems like that. I'll check if the patch proposed by Aleksander
> resolves all the issues reported by valgrind.
> 

OK, I propose we add suppressions per the attached patch.

The rules are clearly incomplete, due to dependence on optimization
used. For example now the stack includes __wcsnlen_avx2, but I see it
was __wcsnlen_sse4_1 in Aleksander's report in February.

Not sure what to do about this - maybe we should wildcard this first
frame, to accept all wcsnlen variants. Opinions?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: valgrind issues on Fedora 28

От
Tomas Vondra
Дата:
On 11/8/18 1:07 PM, Tomas Vondra wrote:
> On 11/6/18 6:51 PM, Tomas Vondra wrote:
>> On 11/6/18 6:35 PM, Alvaro Herrera wrote:
>>> On 2018-Nov-06, Tomas Vondra wrote:
>>>
>>>> Hi,
>>>>
>>>> I've recently updated to Fedora 28, and in that environment I get
>>>> quite a
>>>> few new valgrind issues (see the attached log).
>>>
>>> Hmm, related to
>>> https://postgr.es/m/20180220150838.GD18315@e733.localdomain ?
>>> Apparently the right answer is to add local suppressions ...
>>>
>>
>> Yep, seems like that. I'll check if the patch proposed by Aleksander
>> resolves all the issues reported by valgrind.
>>
> 
> OK, I propose we add suppressions per the attached patch.
> 
> The rules are clearly incomplete, due to dependence on optimization
> used. For example now the stack includes __wcsnlen_avx2, but I see it
> was __wcsnlen_sse4_1 in Aleksander's report in February.
> 
> Not sure what to do about this - maybe we should wildcard this first
> frame, to accept all wcsnlen variants. Opinions?
> 

I've committed this with the first frame wirldcarded, and backpatched it
all the way back to 9.4.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: valgrind issues on Fedora 28

От
Tom Lane
Дата:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On 11/8/18 1:07 PM, Tomas Vondra wrote:
>> Not sure what to do about this - maybe we should wildcard this first
>> frame, to accept all wcsnlen variants. Opinions?

> I've committed this with the first frame wirldcarded, and backpatched it
> all the way back to 9.4.

So I just got around to trying to do some valgrind testing for the first
time in a couple of months, and what I find is that this patch completely
breaks valgrind for me: it exits immediately with complaints like

==00:00:00:00.222 14821== FATAL: in suppressions file "/home/postgres/pgsql/src/tools/valgrind.supp" near line 227:
==00:00:00:00.222 14821==    unknown tool suppression type
==00:00:00:00.222 14821== exiting now.

After a bit of hair-pulling I found that the line number report is
a lie, and what it's really unhappy about is the "Memcheck:Addr32"
specification on line 236.  This is not terribly surprising perhaps
considering that "Addr32" isn't even documented on what should be
the authoritative page:
http://valgrind.org/docs/manual/mc-manual.html#mc-manual.suppfiles

This is valgrind 3.8.1 (RHEL6's version), so not bleeding edge exactly,
but we have very little evidence about how far back the committed patch
works.  I'm inclined to think we can't use this solution.

            regards, tom lane


Re: valgrind issues on Fedora 28

От
Tomas Vondra
Дата:
On 12/14/18 5:10 AM, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> On 11/8/18 1:07 PM, Tomas Vondra wrote:
>>> Not sure what to do about this - maybe we should wildcard this first
>>> frame, to accept all wcsnlen variants. Opinions?
> 
>> I've committed this with the first frame wirldcarded, and backpatched it
>> all the way back to 9.4.
> 
> So I just got around to trying to do some valgrind testing for the first
> time in a couple of months, and what I find is that this patch completely
> breaks valgrind for me: it exits immediately with complaints like
> 
> ==00:00:00:00.222 14821== FATAL: in suppressions file "/home/postgres/pgsql/src/tools/valgrind.supp" near line 227:
> ==00:00:00:00.222 14821==    unknown tool suppression type
> ==00:00:00:00.222 14821== exiting now.
> 
> After a bit of hair-pulling I found that the line number report is
> a lie, and what it's really unhappy about is the "Memcheck:Addr32"
> specification on line 236.  This is not terribly surprising perhaps
> considering that "Addr32" isn't even documented on what should be
> the authoritative page:
> http://valgrind.org/docs/manual/mc-manual.html#mc-manual.suppfiles
> 

Hmm :-( FWIW the Addr32 comes directly from valgrind itself, which
explicitly suggested to add suppression like this:

    {
       <insert_a_suppression_name_here>
       Memcheck:Addr32
       fun:__wcsnlen_avx2
       fun:wcsrtombs
       fun:wcstombs
       fun:wchar2char
       fun:str_tolower
       fun:lower
       fun:DirectFunctionCall1Coll
       fun:Generic_Text_IC_like
       ...
    }

My guess is that documentation is either somewhat obsolete, or is meant
more like a general description than an exhaustive and authoritative
source of truth regarding suppressions.

> This is valgrind 3.8.1 (RHEL6's version), so not bleeding edge exactly,
> but we have very little evidence about how far back the committed patch
> works.  I'm inclined to think we can't use this solution.
> 

Fedora 28 has 3.14.0, so this seems to be due to some improvements since
3.8.1. I'm not sure how to deal with this - surely we don't want to just
revert the change because that would start generating noise again. For a
moment I thought we might wildcard the error type somehow, so that it
would accept Memcheck:* but AFAICs that is not supported.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: valgrind issues on Fedora 28

От
Tom Lane
Дата:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On 12/14/18 5:10 AM, Tom Lane wrote:
>> So I just got around to trying to do some valgrind testing for the first
>> time in a couple of months, and what I find is that this patch completely
>> breaks valgrind for me ...
>> This is valgrind 3.8.1 (RHEL6's version), so not bleeding edge exactly,
>> but we have very little evidence about how far back the committed patch
>> works.  I'm inclined to think we can't use this solution.

> Fedora 28 has 3.14.0, so this seems to be due to some improvements since
> 3.8.1. I'm not sure how to deal with this - surely we don't want to just
> revert the change because that would start generating noise again. For a
> moment I thought we might wildcard the error type somehow, so that it
> would accept Memcheck:* but AFAICs that is not supported.

Meh.  There are lots of situations where it's necessary to carry some
platform-specific valgrind suppressions; I have half a dozen on my
RHEL6 box because of various weirdness in its old version of perl, for
example.  I think this is one where people running code new enough to
have this problem need to carry private suppressions of their own.

In general, I'm not particularly on board with our valgrind.supp
carrying suppressions for code outside our own code base: I think
that's assuming WAY too much about which version of what is installed
on a particular box.

Maybe we could do something to make it simpler to have custom
suppressions?  Not sure what, though.

            regards, tom lane


Re: valgrind issues on Fedora 28

От
Tomas Vondra
Дата:
On 12/14/18 4:58 PM, Tom Lane wrote:
> ...
> 
> In general, I'm not particularly on board with our valgrind.supp
> carrying suppressions for code outside our own code base: I think
> that's assuming WAY too much about which version of what is installed
> on a particular box.
> 

Fair point.

> Maybe we could do something to make it simpler to have custom
> suppressions?  Not sure what, though.
> 

I was thinking that perhaps we could allows specifying path to extra
suppressions and pass that to valgrind.

But we don't actually invoke valgrind, that's something people do on
their own anyway - so we don't have anywhere to pass the path to. And
whoever invokes valgrind can simply stick it directly into the command
they're using (as it allows specifying multiple --suppressions=<file>
options). Or perhaps just put it into ~/.valgrindrc.

So perhaps we should simply revert that commit and be done with it.

One place that will need to solve it is buildfarm client, but it could
pick either of the options I mentioned.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: valgrind issues on Fedora 28

От
Andrew Dunstan
Дата:
On 12/14/18 4:35 PM, Tomas Vondra wrote:
> On 12/14/18 4:58 PM, Tom Lane wrote:
>> ...
>>
>> In general, I'm not particularly on board with our valgrind.supp
>> carrying suppressions for code outside our own code base: I think
>> that's assuming WAY too much about which version of what is installed
>> on a particular box.
>>
> Fair point.
>
>> Maybe we could do something to make it simpler to have custom
>> suppressions?  Not sure what, though.
>>
> I was thinking that perhaps we could allows specifying path to extra
> suppressions and pass that to valgrind.
>
> But we don't actually invoke valgrind, that's something people do on
> their own anyway - so we don't have anywhere to pass the path to. And
> whoever invokes valgrind can simply stick it directly into the command
> they're using (as it allows specifying multiple --suppressions=<file>
> options). Or perhaps just put it into ~/.valgrindrc.
>
> So perhaps we should simply revert that commit and be done with it.
>
> One place that will need to solve it is buildfarm client, but it could
> pick either of the options I mentioned.
>
>

The buildfarm client has a parameter in the config file for valgrind
options. All you would have to do is an an extra --suppressions setting
in there.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: valgrind issues on Fedora 28

От
Tomas Vondra
Дата:
On 12/15/18 12:32 AM, Andrew Dunstan wrote:
> 
> On 12/14/18 4:35 PM, Tomas Vondra wrote:
>> On 12/14/18 4:58 PM, Tom Lane wrote:
>>> ...
>>>
>>> In general, I'm not particularly on board with our valgrind.supp
>>> carrying suppressions for code outside our own code base: I think
>>> that's assuming WAY too much about which version of what is installed
>>> on a particular box.
>>>
>> Fair point.
>>
>>> Maybe we could do something to make it simpler to have custom
>>> suppressions?  Not sure what, though.
>>>
>> I was thinking that perhaps we could allows specifying path to extra
>> suppressions and pass that to valgrind.
>>
>> But we don't actually invoke valgrind, that's something people do on
>> their own anyway - so we don't have anywhere to pass the path to. And
>> whoever invokes valgrind can simply stick it directly into the command
>> they're using (as it allows specifying multiple --suppressions=<file>
>> options). Or perhaps just put it into ~/.valgrindrc.
>>
>> So perhaps we should simply revert that commit and be done with it.
>>
>> One place that will need to solve it is buildfarm client, but it could
>> pick either of the options I mentioned.
>>
>>
> 
> The buildfarm client has a parameter in the config file for valgrind
> options. All you would have to do is an an extra --suppressions setting
> in there.
> 

OK, makes sense. I'll revert the commit tomorrow.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services