Thread: Huge amount of memory errors with libpq

Huge amount of memory errors with libpq

From
Casey Jones
Date:
I'm writing a server application in C that needs to interact with a PostgreSQL
database, but on my development server I'm getting tons of memory errors from
valgrind.  There are enough of them that it's causing problems, such as data
stored in a char* magically changing after calling PQexec().
I'm not having any of these issues on my production server, which is odd.
Maybe there is some configuration difference.

My development server was initially running 8.4.4 on Gentoo.  I downgraded to
8.1.21 (still on Gentoo) to match my CentOS production server to see if the
problems would go away, but they didn't.

I set up a simple test program that links to libpq to see if it was a problem
in libpq or with my program.

#include <stdio.h>
#include <stdlib.h>
#include "libpq-fe.h"

int main(int argc, char **argv)
{
    PGconn *conn;
    conn = PQconnectdb("dbname=mydb");
    if(PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s\n", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    const char *q = "SELECT * from mytable;";
    PGresult *res = PQexec(conn, q);
    if(PQresultStatus(res) != PGRES_TUPLES_OK)
    {
        fprintf(stderr, "command failed: %s\n", PQerrorMessage(conn));
        PQclear(res);   /* free the result on the error path too */
        PQfinish(conn);
        return 1;
    }

    PQclear(res);
    PQfinish(conn);

    return 0;
}

I compiled it using this command: gcc test.c -lpq
Then I ran it through valgrind 3.5.0 using:
valgrind --tool=memcheck --leak-check=full ./a.out

This is the summary from valgrind on the problem server.
==22234== HEAP SUMMARY:
==22234==     in use at exit: 292 bytes in 11 blocks
==22234==   total heap usage: 122 allocs, 112 frees, 51,747 bytes allocated
==22234==
==22234== 292 (52 direct, 240 indirect) bytes in 1 blocks are definitely lost
in loss record 11 of 11
==22234==    at 0x4C260AE: malloc (vg_replace_malloc.c:195)
==22234==    by 0x5130C4C: nss_parse_service_list (in /lib64/libc-2.12.1.so)
==22234==    by 0x5131425: __nss_database_lookup (in /lib64/libc-2.12.1.so)
==22234==    by 0x641536F: ???
==22234==    by 0x6415FB4: ???
==22234==    by 0x50EFF4C: getpwuid_r@@GLIBC_2.2.5 (in /lib64/libc-2.12.1.so)
==22234==    by 0x50EF83E: getpwuid (in /lib64/libc-2.12.1.so)
==22234==    by 0x4E46708: pqGetpwuid (in
/usr/lib64/postgresql-8.1/lib64/libpq.so.4.1)
==22234==    by 0x4E354D6: pg_fe_getauthname (in
/usr/lib64/postgresql-8.1/lib64/libpq.so.4.1)
==22234==    by 0x4E37CF9: conninfo_parse (in
/usr/lib64/postgresql-8.1/lib64/libpq.so.4.1)
==22234==    by 0x4E37DD7: connectOptions1 (in
/usr/lib64/postgresql-8.1/lib64/libpq.so.4.1)
==22234==    by 0x4E389C0: PQconnectStart (in
/usr/lib64/postgresql-8.1/lib64/libpq.so.4.1)
==22234==
==22234== LEAK SUMMARY:
==22234==    definitely lost: 52 bytes in 1 blocks
==22234==    indirectly lost: 240 bytes in 10 blocks
==22234==      possibly lost: 0 bytes in 0 blocks
==22234==    still reachable: 0 bytes in 0 blocks
==22234==         suppressed: 0 bytes in 0 blocks
==22234==
==22234== For counts of detected and suppressed errors, rerun with: -v
==22234== Use --track-origins=yes to see where uninitialised values come from
==22234== ERROR SUMMARY: 255 errors from 76 contexts (suppressed: 6 from 6)

These are the first two errors from valgrind.  If more are needed, I can send
them.
==22234== Invalid read of size 8
==22234==    at 0x515EE03: __strcmp_ssse3 (in /lib64/libc-2.12.1.so)
==22234==    by 0x4E377D4: conninfo_parse (in
/usr/lib64/postgresql-8.1/lib64/libpq.so.4.1)
==22234==    by 0x4E37DD7: connectOptions1 (in
/usr/lib64/postgresql-8.1/lib64/libpq.so.4.1)
==22234==    by 0x4E389C0: PQconnectStart (in
/usr/lib64/postgresql-8.1/lib64/libpq.so.4.1)
==22234==    by 0x4E38A05: PQconnectdb (in
/usr/lib64/postgresql-8.1/lib64/libpq.so.4.1)
==22234==    by 0x40097C: main (in /home/casey/a.out)
==22234==  Address 0x601baa8 is 8 bytes inside a block of size 12 alloc'd
==22234==    at 0x4C260AE: malloc (vg_replace_malloc.c:195)
==22234==    by 0x50CBB41: strdup (in /lib64/libc-2.12.1.so)
==22234==    by 0x4E3766B: conninfo_parse (in
/usr/lib64/postgresql-8.1/lib64/libpq.so.4.1)
==22234==    by 0x4E37DD7: connectOptions1 (in
/usr/lib64/postgresql-8.1/lib64/libpq.so.4.1)
==22234==    by 0x4E389C0: PQconnectStart (in
/usr/lib64/postgresql-8.1/lib64/libpq.so.4.1)
==22234==    by 0x4E38A05: PQconnectdb (in
/usr/lib64/postgresql-8.1/lib64/libpq.so.4.1)
==22234==    by 0x40097C: main (in /home/casey/a.out)
==22234==
==22234== Invalid read of size 8
==22234==    at 0x515FA54: __strcmp_ssse3 (in /lib64/libc-2.12.1.so)
==22234==    by 0x4E377D4: conninfo_parse (in
/usr/lib64/postgresql-8.1/lib64/libpq.so.4.1)
==22234==    by 0x4E37DD7: connectOptions1 (in
/usr/lib64/postgresql-8.1/lib64/libpq.so.4.1)
==22234==    by 0x4E389C0: PQconnectStart (in
/usr/lib64/postgresql-8.1/lib64/libpq.so.4.1)
==22234==    by 0x4E38A05: PQconnectdb (in
/usr/lib64/postgresql-8.1/lib64/libpq.so.4.1)
==22234==    by 0x40097C: main (in /home/casey/a.out)
==22234==  Address 0x601baa8 is 8 bytes inside a block of size 12 alloc'd
==22234==    at 0x4C260AE: malloc (vg_replace_malloc.c:195)
==22234==    by 0x50CBB41: strdup (in /lib64/libc-2.12.1.so)
==22234==    by 0x4E3766B: conninfo_parse (in
/usr/lib64/postgresql-8.1/lib64/libpq.so.4.1)
==22234==    by 0x4E37DD7: connectOptions1 (in
/usr/lib64/postgresql-8.1/lib64/libpq.so.4.1)
==22234==    by 0x4E389C0: PQconnectStart (in
/usr/lib64/postgresql-8.1/lib64/libpq.so.4.1)
==22234==    by 0x4E38A05: PQconnectdb (in
/usr/lib64/postgresql-8.1/lib64/libpq.so.4.1)
==22234==    by 0x40097C: main (in /home/casey/a.out)

This is what I get from valgrind on my production server (CentOS 5.4 with
postgresql 8.1.21)
==21941== HEAP SUMMARY:
==21941==     in use at exit: 0 bytes in 0 blocks
==21941==   total heap usage: 240 allocs, 240 frees, 65,725 bytes allocated
==21941==
==21941== All heap blocks were freed -- no leaks are possible
==21941==
==21941== For counts of detected and suppressed errors, rerun with: -v
==21941== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 49 from 10)


My Gentoo machine's setup
AMD64
Linux 2.6.35.4
gcc 4.4.4
glibc 2.12.1
postgresql 8.1.21

Thanks

--
Casey Jones

Re: Huge amount of memory errors with libpq

From
Craig Ringer
Date:
On 09/12/2010 02:53 PM, Casey Jones wrote:

> My development server was initially running 8.4.4 on Gentoo.  I downgraded to
> 8.1.21 (still on Gentoo) to match my CentOS production server to see if the
> problems would go away, but they didn't.

Thanks for the test case. It's rare - and delightful - to see a neat,
minimal test case for any kind of question or issue report posted here.

The test case helps eliminate any outside code as a problem. In this
case, though, it's almost certainly that valgrind doesn't know about
glibc's sse3 code, for which it needs additional suppressions to handle
apparent memory errors that are in fact OK.

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=583856

Anyway, since you've provided a test program, I can at least run it here
on a modern PostgreSQL and see what results I get to provide some more
info. In this case, it runs fine and no issues are detected. I'm on a
64-bit Fedora 13 install with glibc 2.12.3 and postgresql 9.0rc1 , so
it's not exactly a close match for your system. It is a Core 2 Duo, so
it's SSE3 capable hardware as confirmed by /proc/cpuinfo. I'm using
valgrind 3.5.0 .

$ valgrind --tool=memcheck --leak-check=full ./a.out
==26001== Memcheck, a memory error detector
==26001== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==26001== Using Valgrind-3.5.0 and LibVEX; rerun with -h for copyright info
==26001== Command: ./a.out
==26001==
==26001==
==26001== HEAP SUMMARY:
==26001==     in use at exit: 0 bytes in 0 blocks
==26001==   total heap usage: 102 allocs, 102 frees, 47,606 bytes allocated
==26001==
==26001== All heap blocks were freed -- no leaks are possible
==26001==
==26001== For counts of detected and suppressed errors, rerun with: -v
==26001== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 6 from 6)


Looking at the trace from yours, it appears to me that it's trying to
use an operation with an 8 byte input on the last four bytes of a
12-byte string. That string is certainly going to be "dbname=mydb\0", as
"dbname=mydb" is 11 bytes long and is the conninfo string being supplied
to libpq.

It's hard to see how strcmp could perform an incorrect read on that due
to bad input from libpq, so long as the null-terminator is present on at
least the shorter of the inputs if not both. In this case it's present
on the string the error report complains about, excluding a missing
terminator as a problem cause. There's no length argument to be wrong,
nothing much else at all to be wrong in what libpq supplies to libc.

I strongly suspect that glibc is doing funky magic with sse3 string
operations that cause apparently invalid reads that are actually safe,
*or* there's an issue with valgrind itself.

It'd be interesting to test the following program:


#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
     int cmpresult = strcmp("user", strdup("dbname=classads"));
     printf("Comparison: %i\n", cmpresult);
}



... to see if it, too, reports errors from valgrind. It doesn't here, of
course (though interestingly strcmp returns 1 under valgrind and 17
outside it); I'd like to see what your results are.

--
Craig Ringer

Re: Huge amount of memory errors with libpq

From
Casey Jones
Date:


On Sun, Sep 12, 2010 at 7:54 AM, Craig Ringer <craig@postnewspapers.com.au> wrote:
> Anyway, since you've provided a test program, I can at least run it here on a modern PostgreSQL and see what results I get to provide some more info. In this case, it runs fine and no issues are detected. I'm on a 64-bit Fedora 13 install with glibc 2.12.3 and postgresql 9.0rc1 , so it's not exactly a close match for your system. It is a Core 2 Duo, so it's SSE3 capable hardware as confirmed by /proc/cpuinfo. I'm using valgrind 3.5.0 .

I use an AMD Athlon II X4.  It's based on the new Phenom IIs, so it certainly supports SSE3, and SSE4a as well.

> ... to see if it, too, reports errors from valgrind. It doesn't here, of course (though interestingly strcmp returns 1 under valgrind and 17 outside it); I'd like to see what your results are.

I get 17 as a result with or without valgrind.  And I don't get any memory errors.

==23894== Memcheck, a memory error detector
==23894== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==23894== Using Valgrind-3.5.0 and LibVEX; rerun with -h for copyright info
==23894== Command: ./a.out
==23894== 
Comparison: 17
==23894== 
==23894== HEAP SUMMARY:
==23894==     in use at exit: 16 bytes in 1 blocks
==23894==   total heap usage: 1 allocs, 0 frees, 16 bytes allocated
==23894== 
==23894== 16 bytes in 1 blocks are definitely lost in loss record 1 of 1
==23894==    at 0x4C260AE: malloc (vg_replace_malloc.c:195)
==23894==    by 0x4EA8B41: strdup (in /lib64/libc-2.12.1.so)
==23894==    by 0x40061C: main (in /home/casey/kwooty/Download/a.out)
==23894== 
==23894== LEAK SUMMARY:
==23894==    definitely lost: 16 bytes in 1 blocks
==23894==    indirectly lost: 0 bytes in 0 blocks
==23894==      possibly lost: 0 bytes in 0 blocks
==23894==    still reachable: 0 bytes in 0 blocks
==23894==         suppressed: 0 bytes in 0 blocks
==23894== 
==23894== For counts of detected and suppressed errors, rerun with: -v
==23894== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 6 from 6)

This bug from Gentoo may be related, but I thought I had worked around it.
It says to compile glibc with splitdebug, which I have and it got me past a fatal error in valgrind.  But it does mention sse-optimized strlen().
I just checked an older program I had written, and I'm getting tons of errors on that too.  Just a few months ago I had it down to just a couple of errors.  Now I'm seeing lots of errors ending at __strncmp_ssse3.

I don't think valgrind is the only issue here, because outside valgrind my data is getting magically overwritten.  In the function causing that problem I set all the fields I wanted to set by hand instead of using PQgetvalue().  If I leave PQexec() uncommented, my data in a totally unrelated area changes, but when I comment it out I get the expected results.  There might be an error I'm making that's causing this, but I can't find it in valgrind because of the huge number of errors.

Re: Huge amount of memory errors with libpq

From
Tom Lane
Date:
Casey Jones <jonescaseyb@gmail.com> writes:
> I don't think valgrind is the only issue here because outside valgrind my
> data is getting magically overwritten.  In the function causing that problem
> I set all the fields I wanted to set by hand instead of using PQgetvalue().
>  If I leave PQexec() uncommented, my data in a totally unrelated area would
> change, but when I comment it out I get the expected results.  There might
> be an error I'm making thats causing this, but I can't find it in valgrind
> because of the huge number of errors.

FWIW, that test case shows no errors at all for me, on an x86_64 running
Fedora 13.  I'd suggest trying it on something other than Gentoo.

            regards, tom lane

Re: Huge amount of memory errors with libpq

From
Casey Jones
Date:
On Sunday 12 September 2010 5:44:26 pm you wrote:
> Casey Jones <jonescaseyb@gmail.com> writes:
> > I don't think valgrind is the only issue here because outside valgrind my
> > data is getting magically overwritten.  In the function causing that
> > problem I set all the fields I wanted to set by hand instead of using
> > PQgetvalue().
> >
> >  If I leave PQexec() uncommented, my data in a totally unrelated area
> >  would
> >
> > change, but when I comment it out I get the expected results.  There
> > might be an error I'm making thats causing this, but I can't find it in
> > valgrind because of the huge number of errors.
>
> FWIW, that test case shows no errors at all for me, on an x86_64 running
> Fedora 13.  I'd suggest trying it on something other than Gentoo.
>
>             regards, tom lane

I set up Fedora 13 and ran the test case, and I didn't get any errors.  I also
tested my project and it had significantly fewer errors.  So yeah, it looks
like a glibc problem on Gentoo.  Thanks for the help everyone.

--
Casey Jones