Thread: leak in libpq, getpwuid

leak in libpq, getpwuid

From
Grzegorz Jaśkiewicz
Date:
Hey folks,

I am getting leaks on my machine; valgrind points to getpwuid_r, called
by libpq's PQconnectdb():

==11784== 32,772 bytes in 1 blocks are indirectly lost in loss record 31 of 31
==11784==    at 0x4004BA2: calloc (vg_replace_malloc.c:397)
==11784==    by 0x63D9FCB: ???
==11784==    by 0x63C10FD: ???
==11784==    by 0x63AF5BE: ???
==11784==    by 0x63AFB4D: ???
==11784==    by 0x63A0C0E: ???
==11784==    by 0x63A3520: ???
==11784==    by 0x63A42BD: ???
==11784==    by 0x63A4A86: ???
==11784==    by 0x63A513F: ???
==11784==    by 0x4949C1: getpwuid_r@@GLIBC_2.1.2 (in /lib/libc-2.8.so)
==11784==    by 0x297F2D: (within /usr/lib/libpq.so.5.1)
==11784==    by 0x283050: (within /usr/lib/libpq.so.5.1)
==11784==    by 0x287118: (within /usr/lib/libpq.so.5.1)
==11784==    by 0x287269: (within /usr/lib/libpq.so.5.1)
==11784==    by 0x2875AE: PQconnectStart (in /usr/lib/libpq.so.5.1)
==11784==    by 0x287601: PQconnectdb (in /usr/lib/libpq.so.5.1)


The application fires up a few threads once in a while, and they connect
to databases on different hosts.
What matters is that after a few days it started to fail in
pthread_create() because of memory problems.

And it looks like this is causing the problem. Please note that it
happens on both 8.3 and 8.4 in the same way (the backtrace above is from
8.4).
In that test I call PQconnectdb and then PQfinish; it can't connect
because the db is down.
I am very sure that PQfinish is called.
Since no connection exists during that test, nothing else is called in libpq.

Any ideas?

--
GJ

Re: leak in libpq, getpwuid

From
Grzegorz Jaśkiewicz
Date:
oh, and note that I kind of ruled out a linux libc/distro problem: the same
happens on both centos 4.7 and fedora 9.
strangely.

Re: leak in libpq, getpwuid

From
Bruce Momjian
Date:
Grzegorz Jaśkiewicz wrote:
> Hey folks,
>
> I am getting leaks on my machine; valgrind points to getpwuid_r, called
> by libpq's PQconnectdb():
>
> ==11784== 32,772 bytes in 1 blocks are indirectly lost in loss record 31 of 31
> ==11784==    at 0x4004BA2: calloc (vg_replace_malloc.c:397)
> ==11784==    by 0x63D9FCB: ???
> ==11784==    by 0x63C10FD: ???
> ==11784==    by 0x63AF5BE: ???
> ==11784==    by 0x63AFB4D: ???
> ==11784==    by 0x63A0C0E: ???
> ==11784==    by 0x63A3520: ???
> ==11784==    by 0x63A42BD: ???
> ==11784==    by 0x63A4A86: ???
> ==11784==    by 0x63A513F: ???
> ==11784==    by 0x4949C1: getpwuid_r@@GLIBC_2.1.2 (in /lib/libc-2.8.so)
> ==11784==    by 0x297F2D: (within /usr/lib/libpq.so.5.1)
> ==11784==    by 0x283050: (within /usr/lib/libpq.so.5.1)
> ==11784==    by 0x287118: (within /usr/lib/libpq.so.5.1)
> ==11784==    by 0x287269: (within /usr/lib/libpq.so.5.1)
> ==11784==    by 0x2875AE: PQconnectStart (in /usr/lib/libpq.so.5.1)
> ==11784==    by 0x287601: PQconnectdb (in /usr/lib/libpq.so.5.1)
>
>
> The application fires up a few threads once in a while, and they connect
> to databases on different hosts.
> What matters is that after a few days it started to fail in
> pthread_create() because of memory problems.

That is kind of odd, considering that getpwuid_r() shouldn't be
allocating any memory at all --- in fact, the reason we use it is
because we pass the memory it uses for storage.

The only thing I can suggest is posting the application source code
somewhere, in hopes we can spot the problem.

I have never heard of a similar report, so odds are there is something
weird happening in your application.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: leak in libpq, getpwuid

From
Grzegorz Jaśkiewicz
Date:
thanks Bruce,
In fact, the weirder part is that it still happens on
fedora9+8.4, but I can't reproduce it anymore on centos+8.3.5

Re: leak in libpq, getpwuid

From
Tom Lane
Date:
Grzegorz Jaśkiewicz <gryzman@gmail.com> writes:
> oh, and note that I kind of ruled out a linux libc/distro problem: the same
> happens on both centos 4.7 and fedora 9.

That hardly constitutes a wide sample of linux distros ...

            regards, tom lane

Re: leak in libpq, getpwuid

From
Grzegorz Jaśkiewicz
Date:
On Tue, Feb 17, 2009 at 4:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Grzegorz Jaśkiewicz <gryzman@gmail.com> writes:
>> oh, and note that I kind of ruled out a linux libc/distro problem: the same
>> happens on both centos 4.7 and fedora 9.
>
> That hardly constitutes a wide sample of linux distros ...
>

same thing on debian, well - almost:

==8261==    at 0x4023D6E: malloc (vg_replace_malloc.c:207)
==8261==    by 0x43B1930: (within /lib/i686/cmov/libc-2.7.so)
==8261==    by 0x43B222B: __nss_database_lookup (in /lib/i686/cmov/libc-2.7.so)
==8261==    by 0x6C98F5B: ???
==8261==    by 0x6C9B0B4: ???
==8261==    by 0x4358ED1: getpwuid_r (in /lib/i686/cmov/libc-2.7.so)
==8261==    by 0x417ED3D: (within /usr/lib/libpq.so.5.1)
==8261==    by 0x416A7A8: (within /usr/lib/libpq.so.5.1)
==8261==    by 0x416E5CA: (within /usr/lib/libpq.so.5.1)
==8261==    by 0x416E709: (within /usr/lib/libpq.so.5.1)
==8261==    by 0x416EA4E: PQconnectStart (in /usr/lib/libpq.so.5.1)
==8261==    by 0x416EAA1: PQconnectdb (in /usr/lib/libpq.so.5.1)

--
GJ

Re: leak in libpq, getpwuid

From
Tom Lane
Date:
Grzegorz Jaśkiewicz <gryzman@gmail.com> writes:
> same thing on debian, well - almost:

> ==8261==    at 0x4023D6E: malloc (vg_replace_malloc.c:207)
> ==8261==    by 0x43B1930: (within /lib/i686/cmov/libc-2.7.so)
> ==8261==    by 0x43B222B: __nss_database_lookup (in /lib/i686/cmov/libc-2.7.so)
> ==8261==    by 0x6C98F5B: ???
> ==8261==    by 0x6C9B0B4: ???
> ==8261==    by 0x4358ED1: getpwuid_r (in /lib/i686/cmov/libc-2.7.so)
> ==8261==    by 0x417ED3D: (within /usr/lib/libpq.so.5.1)
> ==8261==    by 0x416A7A8: (within /usr/lib/libpq.so.5.1)
> ==8261==    by 0x416E5CA: (within /usr/lib/libpq.so.5.1)
> ==8261==    by 0x416E709: (within /usr/lib/libpq.so.5.1)
> ==8261==    by 0x416EA4E: PQconnectStart (in /usr/lib/libpq.so.5.1)
> ==8261==    by 0x416EAA1: PQconnectdb (in /usr/lib/libpq.so.5.1)

[ shrug... ]  You're bugging the wrong people about this.  A leak
inside getpwuid_r is a glibc bug, not our bug.

            regards, tom lane

Re: leak in libpq, getpwuid

From
Grzegorz Jaśkiewicz
Date:
On Wed, Feb 18, 2009 at 5:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Grzegorz Jaśkiewicz <gryzman@gmail.com> writes:
>> same thing on debian, well - almost:
>
>> ==8261==    at 0x4023D6E: malloc (vg_replace_malloc.c:207)
>> ==8261==    by 0x43B1930: (within /lib/i686/cmov/libc-2.7.so)
>> ==8261==    by 0x43B222B: __nss_database_lookup (in /lib/i686/cmov/libc-2.7.so)
>> ==8261==    by 0x6C98F5B: ???
>> ==8261==    by 0x6C9B0B4: ???
>> ==8261==    by 0x4358ED1: getpwuid_r (in /lib/i686/cmov/libc-2.7.so)
>> ==8261==    by 0x417ED3D: (within /usr/lib/libpq.so.5.1)
>> ==8261==    by 0x416A7A8: (within /usr/lib/libpq.so.5.1)
>> ==8261==    by 0x416E5CA: (within /usr/lib/libpq.so.5.1)
>> ==8261==    by 0x416E709: (within /usr/lib/libpq.so.5.1)
>> ==8261==    by 0x416EA4E: PQconnectStart (in /usr/lib/libpq.so.5.1)
>> ==8261==    by 0x416EAA1: PQconnectdb (in /usr/lib/libpq.so.5.1)
>
> [ shrug... ]  You're bugging the wrong people about this.  A leak
> inside getpwuid_r is a glibc bug, not our bug.
>

thought so, but it is good to check on both sides of the fence.
thanks

--
GJ

Re: leak in libpq, getpwuid

From
Michael Nacos
Date:
just to say I have run into related problems on debian lenny amd64 (postgres 8.3.5, libc-2.7) and centos 5.2 (postgres 8.4.1, libc-2.5)

code as simple as this:

#include <libpq-fe.h>

int main()
{
       PGconn *connection = PQconnectdb("user=postgres");
       PQfinish(connection);
       return 0;
}

gives (run through valgrind --leak-check=full):

==13832==
==13832== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 19 from 2)
==13832== malloc/free: in use at exit: 292 bytes in 11 blocks.
==13832== malloc/free: 131 allocs, 120 frees, 51,532 bytes allocated.
==13832== For counts of detected errors, rerun with: -v
==13832== searching for pointers to 11 not-freed blocks.
==13832== checked 703,248 bytes.
==13832==
==13832== 292 (52 direct, 240 indirect) bytes in 1 blocks are definitely lost in loss record 1 of 3
==13832==    at 0x4C2260E: malloc (vg_replace_malloc.c:207)
==13832==    by 0x512852F: (within /lib/libc-2.7.so)
==13832==    by 0x5128D06: __nss_database_lookup (in /lib/libc-2.7.so)
==13832==    by 0x82A931F: ???
==13832==    by 0x82AA02C: ???
==13832==    by 0x50E7101: getpwuid_r (in /lib/libc-2.7.so)
==13832==    by 0x4E41D38: (within /usr/lib/libpq.so.5.1)
==13832==    by 0x4E2E50C: (within /usr/lib/libpq.so.5.1)
==13832==    by 0x4E3258F: (within /usr/lib/libpq.so.5.1)
==13832==    by 0x4E3260B: (within /usr/lib/libpq.so.5.1)
==13832==    by 0x4E32F98: PQconnectStart (in /usr/lib/libpq.so.5.1)
==13832==    by 0x4E32FE5: PQconnectdb (in /usr/lib/libpq.so.5.1)
==13832==
==13832== LEAK SUMMARY:
==13832==    definitely lost: 52 bytes in 1 blocks.
==13832==    indirectly lost: 240 bytes in 10 blocks.
==13832==      possibly lost: 0 bytes in 0 blocks.
==13832==    still reachable: 0 bytes in 0 blocks.
==13832==         suppressed: 0 bytes in 0 blocks.

and

==9466== Invalid free() / delete / delete[]
==9466==    at 0x4020FDA: free (vg_replace_malloc.c:233)
==9466==    by 0x4158A2D: free_mem (in /lib/libc-2.5.so)
==9466==    by 0x41585A6: __libc_freeres (in /lib/libc-2.5.so)
==9466==    by 0x401D1E6: _vgnU_freeres (vg_preloaded.c:60)
==9466==    by 0x40D9C63: _Exit (in /lib/libc-2.5.so)
==9466==    by 0x405EDF3: (below main) (in /lib/libc-2.5.so)
==9466==  Address 0x401C8F8 is not stack'd, malloc'd or (recently) free'd
==9466==
==9466== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 45 from 1)
==9466== malloc/free: in use at exit: 0 bytes in 0 blocks.
==9466== malloc/free: 136 allocs, 137 frees, 49,272 bytes allocated.

cheers, Michael

Re: leak in libpq, getpwuid

From
Tom Lane
Date:
Michael Nacos <m.nacos@gmail.com> writes:
> just to say I have run into related problems on debian lenny amd64 (postgres
> 8.3.5, libc-2.7) and centos 5.2 (postgres 8.4.1, libc-2.5)

This is not a Postgres bug.  You can try filing it against glibc, but
I wouldn't be too surprised if they tell you it's not worth fixing.

            regards, tom lane

Re: leak in libpq, getpwuid

From
Grzegorz Jaśkiewicz
Date:


On Thu, Oct 22, 2009 at 3:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Nacos <m.nacos@gmail.com> writes:
>> just to say I have run into related problems on debian lenny amd64 (postgres
>> 8.3.5, libc-2.7) and centos 5.2 (postgres 8.4.1, libc-2.5)
>
> This is not a Postgres bug.  You can try filing it against glibc, but
> I wouldn't be too surprised if they tell you it's not worth fixing.

I tried to fight the same battle, and apparently this is by design.

--
GJ

Re: leak in libpq, getpwuid

From
Michael Nacos
Date:
I have just run some tests, the number of lost bytes is always 292, no matter how many connections are opened and closed.
I guess it's ok, then.

M.

Re: leak in libpq, getpwuid

From
Tom Lane
Date:
Michael Nacos <m.nacos@gmail.com> writes:
> I have just run some tests, the number of lost bytes is always 292, no
> matter how many connections are opened and closed.
> I guess it's ok, then.

Yeah.  I suspect the memory is in fact not "leaked", but valgrind is
somehow missing the link that points to it.  You'd have to dig into
the glibc sources to find out for sure though.

            regards, tom lane

Re: leak in libpq, getpwuid

From
Craig Ringer
Date:
Michael Nacos wrote:
> I have just run some tests, the number of lost bytes is always 292, no
> matter how many connections are opened and closed.
> I guess it's ok, then.

Search the archives for a detailed explanation of this issue. The
earlier discussion was about a supposed leak in ecpg.

See:
 Message-ID: <022e01ca06e8$898255c0$aa1c10ac@RKC.local>
 Message-Id: <1247858675.9349.240.camel@ayaki>
on the -general list.

In brief: while technically a leak, it doesn't matter. Freeing that
memory would only ever be done immediately before a program exits.
Trying to free it introduces finalization ordering issues (what if
someone calls getpwnam(), getpwuid() etc after the cache is freed?) and
wastes CPU cycles. There's no point freeing memory when the whole
program is about to exit and its memory will be more efficiently
released by the OS.

The right answer to this is an addition to the default valgrind
suppressions file, not any change to glibc.
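
As a rough sketch of what such a suppression could look like, the entry below is modelled on the Debian backtrace posted earlier in this thread. It is not taken from any shipped suppressions file. The entry name is made up, and the "..." frame wildcards assume a valgrind version that supports them.

```
{
   glibc_nss_getpwuid_r_one_time_allocation
   Memcheck:Leak
   fun:malloc
   ...
   fun:__nss_database_lookup
   ...
   fun:getpwuid_r*
}
```

Saved under a name such as glibc-nss.supp (a hypothetical file name), it would be passed to valgrind with --suppressions=glibc-nss.supp alongside --leak-check=full. A trace that allocates through calloc instead, like the Fedora one at the top of the thread, would need its own entry.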

--
Craig Ringer

Re: leak in libpq, getpwuid

From
Michael Nacos
Date:
thanks... I guess if it really mattered it would have come up by now
(since so many interfaces are based on libpq)

toying with the idea of yet another one :-)