Обсуждение: Probable memory leak with ECPG and AIX

Поиск
Список
Период
Сортировка

Probable memory leak with ECPG and AIX

От
Guillaume Lelarge
Дата:
Hello,

Our customer thinks he has found a memory leak on ECPG and AIX.

The code is quite simple. It declares a cursor, opens it, and fetches the only line available in the table many times. After some time, the client crashes with a segfault error. According to him, it consumed around 256MB. What's weird is that it works great on Linux, but crashed on AIX. One coworker thought it could be the compiler. Our customer used cc, but he also tried with gcc, and got the same error.

The test case is attached (testcase.pgc is the ECPG code, testcase.sh is what our customer used to precompile and compile his code). Do you have any idea why that happens on AIX?

Two queries to create the table and populate it with a single record:

CREATE TABLE foo(
   key integer PRIMARY KEY,
   value character varying(20)
);
INSERT INTO foo values (1, 'one');

Thanks.

Regards.


--
Guillaume.
Вложения

Re: Probable memory leak with ECPG and AIX

От
Justin Pryzby
Дата:
On Fri, Dec 10, 2021 at 03:40:50PM +0100, Guillaume Lelarge wrote:
> Hello,
> 
> Our customer thinks he has found a memory leak on ECPG and AIX.
> 
> The code is quite simple. It declares a cursor, opens it, and fetches the
> only line available in the table many times. After some time, the client
> crashes with a segfault error. According to him, it consumed around 256MB.
> What's weird is that it works great on Linux, but crashed on AIX. One
> coworker thought it could be the compiler. Our customer used cc, but he
> also tried with gcc, and got the same error.

A memory leak isn't the same as a segfault (although I don't know how AIX
responds to OOM).

Can you show that it's a memory leak ?  Show RAM use increasing continuously
and linearly with loop count.

How many loops does it take to crash ?

Could you obtain a backtrace ?

-- 
Justin



Re: Probable memory leak with ECPG and AIX

От
Guillaume Lelarge
Дата:
Le sam. 11 déc. 2021 à 07:52, Justin Pryzby <pryzby@telsasoft.com> a écrit :
On Fri, Dec 10, 2021 at 03:40:50PM +0100, Guillaume Lelarge wrote:
> Hello,
>
> Our customer thinks he has found a memory leak on ECPG and AIX.
>
> The code is quite simple. It declares a cursor, opens it, and fetches the
> only line available in the table many times. After some time, the client
> crashes with a segfault error. According to him, it consumed around 256MB.
> What's weird is that it works great on Linux, but crashed on AIX. One
> coworker thought it could be the compiler. Our customer used cc, but he
> also tried with gcc, and got the same error.

A memory leak isn't the same as a segfault (although I don't know how AIX
responds to OOM).

Can you show that it's a memory leak ?  Show RAM use increasing continuously
and linearly with loop count.

How many loops does it take to crash ?

Could you obtain a backtrace ?


Thanks. I'll try to get all these informations, but it won't be before monday.


--
Guillaume.

Re: Probable memory leak with ECPG and AIX

От
Noah Misch
Дата:
On Fri, Dec 10, 2021 at 03:40:50PM +0100, Guillaume Lelarge wrote:
> After some time, the client
> crashes with a segfault error. According to him, it consumed around 256MB.
> What's weird is that it works great on Linux, but crashed on AIX.

That almost certainly means he's using a 32-bit binary with the default heap
size.  To use more heap on AIX, build 64-bit or override the heap size.  For
example, "env LDR_CNTRL=MAXDATA=0x80000000 ./a.out" gives 2GiB of heap.  See
https://www.postgresql.org/docs/devel/installation-platform-notes.html#INSTALLATION-NOTES-AIX
for more ways to control heap size.  While that documentation focuses on the
server, the same techniques apply to clients like your test program.

That said, I don't know why your test program reaches 256MB on AIX.  On
GNU/Linux, it uses a lot less.  What version of PostgreSQL provided your
client libraries?



Re: Probable memory leak with ECPG and AIX

От
talk to ben
Дата:
Hi,

(I work with Guillaume on this case.)

On Sun, Dec 12, 2021 at 8:34 AM Noah Misch <noah@leadboat.com> wrote:
That almost certainly means he's using a 32-bit binary with the default heap
size.  To use more heap on AIX, build 64-bit or override the heap size.  For
example, "env LDR_CNTRL=MAXDATA=0x80000000 ./a.out" gives 2GiB of heap.  See
https://www.postgresql.org/docs/devel/installation-platform-notes.html#INSTALLATION-NOTES-AIX
for more ways to control heap size.  While that documentation focuses on the
server, the same techniques apply to clients like your test program.

That said, I don't know why your test program reaches 256MB on AIX.  On
GNU/Linux, it uses a lot less.  What version of PostgreSQL provided your
client libraries?

They use a 12.3 in production but have also tested on a  12.9 with the same result.
We relayed your suggestion and will get back to you with the results.

Thanks for the input !

Re: Probable memory leak with ECPG and AIX

От
Benoit Lobréau
Дата:
Our client confirmed that the more he fetches the more memory is consumed.
The segfault was indeed caused by the absence of LDR_CNTRL.

The tests show that:

* without LDR_CNTRL, we reach 256Mb and segfault ;
* with LDR_CNTRL=MAXDATA=0x10000000, we reach 256Mo but there is no segfault, the program just continues running ;
* with LDR_CNTRL=MAXDATA=0x80000000, we reach 2Go and there is no segfault either, the program just continues running.

Re: Probable memory leak with ECPG and AIX

От
Noah Misch
Дата:
On Wed, Dec 15, 2021 at 04:20:42PM +0100, Benoit Lobréau wrote:
> * with LDR_CNTRL=MAXDATA=0x10000000, we reach 256Mo but there is no
> segfault, the program just continues running ;
> * with LDR_CNTRL=MAXDATA=0x80000000, we reach 2Go and there is no segfault
> either, the program just continues running.

I get the same results.  The leak arises because AIX freelocale() doesn't free
all memory allocated in newlocale().  The following program uses trivial
memory on GNU/Linux, but it leaks like you're seeing on AIX:

#include <locale.h>
int main(int argc, char **argv)
{
    while (1)
        freelocale(newlocale(LC_NUMERIC_MASK, "C", (locale_t) 0));
    return 0;
}

If you have access to file an AIX bug, I recommend doing so.  If we want
PostgreSQL to work around this, one idea is to have ECPG do this newlocale()
less often.  For example, do it once per process or once per connection
instead of once per ecpg_do_prologue().



Re: Probable memory leak with ECPG and AIX

От
Tom Lane
Дата:
Noah Misch <noah@leadboat.com> writes:
> I get the same results.  The leak arises because AIX freelocale() doesn't free
> all memory allocated in newlocale().  The following program uses trivial
> memory on GNU/Linux, but it leaks like you're seeing on AIX:

Bleah.

> If you have access to file an AIX bug, I recommend doing so.  If we want
> PostgreSQL to work around this, one idea is to have ECPG do this newlocale()
> less often.  For example, do it once per process or once per connection
> instead of once per ecpg_do_prologue().

It's worse than that: see also ECPGget_desc().  Seems like a case
could be made for doing something about this just on the basis
of cycles expended, never mind freelocale() bugs.

            regards, tom lane



Re: Probable memory leak with ECPG and AIX

От
Noah Misch
Дата:
On Sat, Jan 01, 2022 at 11:35:02AM -0500, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > I get the same results.  The leak arises because AIX freelocale() doesn't free
> > all memory allocated in newlocale().  The following program uses trivial
> > memory on GNU/Linux, but it leaks like you're seeing on AIX:
> 
> Bleah.
> 
> > If you have access to file an AIX bug, I recommend doing so.  If we want
> > PostgreSQL to work around this, one idea is to have ECPG do this newlocale()
> > less often.  For example, do it once per process or once per connection
> > instead of once per ecpg_do_prologue().
> 
> It's worse than that: see also ECPGget_desc().  Seems like a case
> could be made for doing something about this just on the basis
> of cycles expended, never mind freelocale() bugs.

Agreed.  Once per process seems best.  I only hesitated before since it means
nothing will free this storage, which could be annoying in the context of
Valgrind and similar.  However, ECPG already has bits of never-freed memory in
the form of pthread_key_create() calls having no pthread_key_delete(), so I
don't mind adding a bit more.



Re: Probable memory leak with ECPG and AIX

От
Guillaume Lelarge
Дата:
Le dim. 2 janv. 2022 à 01:07, Noah Misch <noah@leadboat.com> a écrit :
On Sat, Jan 01, 2022 at 11:35:02AM -0500, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > I get the same results.  The leak arises because AIX freelocale() doesn't free
> > all memory allocated in newlocale().  The following program uses trivial
> > memory on GNU/Linux, but it leaks like you're seeing on AIX:
>
> Bleah.
>
> > If you have access to file an AIX bug, I recommend doing so.  If we want
> > PostgreSQL to work around this, one idea is to have ECPG do this newlocale()
> > less often.  For example, do it once per process or once per connection
> > instead of once per ecpg_do_prologue().
>
> It's worse than that: see also ECPGget_desc().  Seems like a case
> could be made for doing something about this just on the basis
> of cycles expended, never mind freelocale() bugs.

Agreed.  Once per process seems best.  I only hesitated before since it means
nothing will free this storage, which could be annoying in the context of
Valgrind and similar.  However, ECPG already has bits of never-freed memory in
the form of pthread_key_create() calls having no pthread_key_delete(), so I
don't mind adding a bit more.

Did this get anywhere? Is there something we could do to make this move forward?


--
Guillaume.

Re: Probable memory leak with ECPG and AIX

От
Tom Lane
Дата:
Guillaume Lelarge <guillaume@lelarge.info> writes:
> Did this get anywhere? Is there something we could do to make this move
> forward?

No.  Write a patch?

            regards, tom lane



Re: Probable memory leak with ECPG and AIX

От
Guillaume Lelarge
Дата:
Le ven. 25 mars 2022, 14:25, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
Guillaume Lelarge <guillaume@lelarge.info> writes:
> Did this get anywhere? Is there something we could do to make this move
> forward?

No.  Write a patch?

I wouldn't have asked if I could write such a patch :-)

Re: Probable memory leak with ECPG and AIX

От
Noah Misch
Дата:
On Sat, Jan 01, 2022 at 04:07:50PM -0800, Noah Misch wrote:
> On Sat, Jan 01, 2022 at 11:35:02AM -0500, Tom Lane wrote:
> > Noah Misch <noah@leadboat.com> writes:
> > > I get the same results.  The leak arises because AIX freelocale() doesn't free
> > > all memory allocated in newlocale().  The following program uses trivial
> > > memory on GNU/Linux, but it leaks like you're seeing on AIX:
> > 
> > Bleah.
> > 
> > > If you have access to file an AIX bug, I recommend doing so.  If we want
> > > PostgreSQL to work around this, one idea is to have ECPG do this newlocale()
> > > less often.  For example, do it once per process or once per connection
> > > instead of once per ecpg_do_prologue().
> > 
> > It's worse than that: see also ECPGget_desc().  Seems like a case
> > could be made for doing something about this just on the basis
> > of cycles expended, never mind freelocale() bugs.
> 
> Agreed.  Once per process seems best.  I only hesitated before since it means
> nothing will free this storage, which could be annoying in the context of
> Valgrind and similar.  However, ECPG already has bits of never-freed memory in
> the form of pthread_key_create() calls having no pthread_key_delete(), so I
> don't mind adding a bit more.

The comparison to pthread_key_create() wasn't completely fair.  While POSIX
welcomes pthread_key_create() to fail with ENOMEM, the glibc implementation
appears not to allocate memory.  Even so, I'm okay leaking one newlocale() per
process lifetime.

I had expected to use pthread_once() for the newlocale() call, but there would
be no useful way to report failure and try again later.  Instead, I called
newlocale() while ECPGconnect() holds connections_mutex.  See log message and
comments for details.  I tested "./configure ac_cv_func_uselocale=no ..." and
tested the scenario of newlocale() failing every time.

Вложения

Re: Probable memory leak with ECPG and AIX

От
Tom Lane
Дата:
Noah Misch <noah@leadboat.com> writes:
> I had expected to use pthread_once() for the newlocale() call, but there would
> be no useful way to report failure and try again later.  Instead, I called
> newlocale() while ECPGconnect() holds connections_mutex.  See log message and
> comments for details.  I tested "./configure ac_cv_func_uselocale=no ..." and
> tested the scenario of newlocale() failing every time.

This looks solid to me.  The only nit I can find to pick is that I'd
have added one more comment, along the lines of

diff --git a/src/interfaces/ecpg/ecpglib/connect.c b/src/interfaces/ecpg/ecpglib/connect.c
index 9f958b822c..96f99ae072 100644
--- a/src/interfaces/ecpg/ecpglib/connect.c
+++ b/src/interfaces/ecpg/ecpglib/connect.c
@@ -508,6 +508,11 @@ ECPGconnect(int lineno, int c, const char *name, const char *user, const char *p
 #ifdef ENABLE_THREAD_SAFETY
     pthread_mutex_lock(&connections_mutex);
 #endif
+
+    /*
+     * ... but first, make certain we have created ecpg_clocale.  Rely on
+     * holding connections_mutex to ensure this is done by only one thread.
+     */
 #ifdef HAVE_USELOCALE
     if (!ecpg_clocale)
     {

I've marked it RFC.

            regards, tom lane



Re: Probable memory leak with ECPG and AIX

От
Noah Misch
Дата:
On Sat, Jul 02, 2022 at 02:53:34PM -0400, Tom Lane wrote:
> This looks solid to me.  The only nit I can find to pick is that I'd
> have added one more comment, along the lines of
> 
> diff --git a/src/interfaces/ecpg/ecpglib/connect.c b/src/interfaces/ecpg/ecpglib/connect.c
> index 9f958b822c..96f99ae072 100644
> --- a/src/interfaces/ecpg/ecpglib/connect.c
> +++ b/src/interfaces/ecpg/ecpglib/connect.c
> @@ -508,6 +508,11 @@ ECPGconnect(int lineno, int c, const char *name, const char *user, const char *p
>  #ifdef ENABLE_THREAD_SAFETY
>      pthread_mutex_lock(&connections_mutex);
>  #endif
> +
> +    /*
> +     * ... but first, make certain we have created ecpg_clocale.  Rely on
> +     * holding connections_mutex to ensure this is done by only one thread.
> +     */
>  #ifdef HAVE_USELOCALE
>      if (!ecpg_clocale)
>      {
> 
> I've marked it RFC.

Thanks for reviewing.  Pushed with that comment.  prairiedog complains[1]:

  ld: common symbols not allowed with MH_DYLIB output format with the -multi_module option
  connect.o definition of common _ecpg_clocale (size 4)

I bet this would fix it:

--- a/src/interfaces/ecpg/ecpglib/connect.c
+++ b/src/interfaces/ecpg/ecpglib/connect.c
@@ -11,7 +11,7 @@
 #include "sqlca.h"
 
 #ifdef HAVE_USELOCALE
-locale_t    ecpg_clocale;
+locale_t    ecpg_clocale = (locale_t) 0;
 #endif
 
 #ifdef ENABLE_THREAD_SAFETY

I hear[1] adding -fno-common to compiler options would also fix that.  Still,
in the absence of other opinions, I'll just add the no-op initialization.

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2022-07-03%2001%3A14%3A19
[2] https://gcc.gnu.org/legacy-ml/gcc/2005-06/msg00378.html



Re: Probable memory leak with ECPG and AIX

От
Tom Lane
Дата:
Noah Misch <noah@leadboat.com> writes:
> Thanks for reviewing.  Pushed with that comment.  prairiedog complains[1]:
>   ld: common symbols not allowed with MH_DYLIB output format with the -multi_module option
>   connect.o definition of common _ecpg_clocale (size 4)

Blah.

> I bet this would fix it:

> -locale_t    ecpg_clocale;
> +locale_t    ecpg_clocale = (locale_t) 0;

Hmm, I was considering suggesting that just on stylistic grounds,
but decided it was too nitpicky even for me.
Do you want me to test it on prairiedog?

> I hear[1] adding -fno-common to compiler options would also fix that.

I've got -fno-common turned on on my other macOS animals, but in
those cases I did it to detect bugs not fix them.  I'm not sure
whether prairiedog's ancient toolchain has that switch at all,
or whether it behaves the same as in more recent platforms.
Still, that gcc.gnu.org message you cite is of the right era.

            regards, tom lane



Re: Probable memory leak with ECPG and AIX

От
Noah Misch
Дата:
On Sat, Jul 02, 2022 at 11:37:08PM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > Thanks for reviewing.  Pushed with that comment.  prairiedog complains[1]:
> >   ld: common symbols not allowed with MH_DYLIB output format with the -multi_module option
> >   connect.o definition of common _ecpg_clocale (size 4)
> 
> Blah.
> 
> > I bet this would fix it:
> 
> > -locale_t    ecpg_clocale;
> > +locale_t    ecpg_clocale = (locale_t) 0;
> 
> Hmm, I was considering suggesting that just on stylistic grounds,
> but decided it was too nitpicky even for me.
> Do you want me to test it on prairiedog?

Sure, if it's easy enough.  If not, I'm 87% sure it will suffice.



Re: Probable memory leak with ECPG and AIX

От
Tom Lane
Дата:
Noah Misch <noah@leadboat.com> writes:
> On Sat, Jul 02, 2022 at 11:37:08PM -0400, Tom Lane wrote:
>> Do you want me to test it on prairiedog?

> Sure, if it's easy enough.  If not, I'm 87% sure it will suffice.

On it now, but it'll take a few minutes :-(

            regards, tom lane



Re: Probable memory leak with ECPG and AIX

От
Tom Lane
Дата:
I wrote:
> On it now, but it'll take a few minutes :-(

Confirmed that either initializing ecpg_clocale or adding -fno-common
allows the ecpglib build to succeed.  (Testing it beyond that would
require another hour or so to build the rest of the system, so I won't.)

As I said, I was already leaning to the idea that initializing the
variable explicitly is better style, so I recommend we do that.

            regards, tom lane



Re: Probable memory leak with ECPG and AIX

От
Noah Misch
Дата:
On Sat, Jul 02, 2022 at 11:59:58PM -0400, Tom Lane wrote:
> Confirmed that either initializing ecpg_clocale or adding -fno-common
> allows the ecpglib build to succeed.  (Testing it beyond that would
> require another hour or so to build the rest of the system, so I won't.)
> 
> As I said, I was already leaning to the idea that initializing the
> variable explicitly is better style, so I recommend we do that.

Works for me.  Pushed that way, and things have been clean.