More than 1024 connections from the same c-client

From: Andreas Muck
Subject: More than 1024 connections from the same c-client
Msg-id: 3F56EF3E.5040501@blitztrade.de
List: pgsql-general
Hi!

We have an application running on Linux (SuSE 7.2, kernel 2.4.16) that
opens lots of connections to a Postgres database and occasionally dies
with a segfault. Trying to reproduce the crash, I came up with the
following test code:

--------------------- pgsql-test.c ---------------------
#include <stdio.h>
#include <unistd.h>   /* sleep() */
#include <libpq-fe.h>

int main(int argc, char **argv)
{
     PGconn *conn;
     int i;

     for (i = 0; i < 2048; i++)
     {
         conn = PQsetdbLogin("localhost", "5432", NULL, NULL,
                             "template1", "postgres", NULL);

         if (PQstatus(conn) == CONNECTION_BAD)
             printf("%5d: Connection to database FAILED\n", i+1);
         else
             printf("%5d: Connection to database OK\n", i+1);

         if (i > 1010)
         {
             sleep(10);
         }

         // PQfinish(conn);
     }

     // sleep(300);
     return 0;
}
--------------------------------------------------------

The test program segfaults after it opens 1020 connections. At that point
it has exactly 1024 open file descriptors, including stdin, stdout, stderr
and a file descriptor on /proc/sys/kernel/shmmax.

The system limit on open file descriptors is set to 65535 (both ulimit
and /proc/sys/fs/file-max). It's not related to the max-backends limit
in postmaster either. The test program crashes even if postmaster is not
running at all.
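
To rule out the per-process limit from inside the program itself, something
like the following could be used. This is only a sketch: the helper names
get_nofile_limit() and limit_exceeds_fd_set() are made up for illustration,
and the comparison simply shows that a limit of 65535 is far above what a
select()-style fd_set is declared to hold.

```c
#include <sys/resource.h>
#include <sys/select.h>

/* Hypothetical helper: fetch the per-process limit on open descriptors.
 * Returns 0 on success and -1 on failure, like getrlimit() itself. */
int get_nofile_limit(struct rlimit *out)
{
    return getrlimit(RLIMIT_NOFILE, out);
}

/* Hypothetical helper: nonzero when the soft limit lets the process open
 * more descriptors than FD_SETSIZE, the declared capacity of an fd_set. */
int limit_exceeds_fd_set(const struct rlimit *rl)
{
    return rl->rlim_cur > (rlim_t)FD_SETSIZE;
}
```

If limit_exceeds_fd_set() reports nonzero, the kernel will happily hand out
descriptor numbers that code built around select() was never sized for.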

The program seems to crash when it returns from pqWaitTimed(). As
pqWaitTimed uses select() to poll the file descriptors, I suppose the
crash is related to the limit of 1024 file descriptors that fd_set can hold.
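
A minimal sketch of that suspicion, not taken from libpq: fits_in_fd_set()
checks whether a descriptor is within the range select() is specified for,
and wait_readable() (a hypothetical helper) shows the same kind of wait done
with poll(), which takes an array of descriptors and has no FD_SETSIZE
ceiling.

```c
#include <poll.h>
#include <sys/select.h>

/* Returns 1 if fd may be passed to FD_SET()/select(), 0 otherwise. */
int fits_in_fd_set(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Hypothetical helper: wait until fd is readable, using poll().
 * Returns 1 if readable, 0 on timeout, -1 on error or if only an
 * error/hangup condition was reported. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;
    return (pfd.revents & POLLIN) ? 1 : -1;
}
```

A caller that has to stay with select() could at least refuse any socket for
which fits_in_fd_set() returns 0 instead of passing it to FD_SET().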

The weird thing is that it's not the select() that segfaults. The
segfault occurs on return from pqWaitTimed(). It is 100% reproducible
on one machine, but it doesn't crash on another one. GDB can't show a
backtrace from the core file:

(gdb) bt
#0  0x08049ab3 in connectDBComplete ()
Cannot access memory at address 0x1

When stepping through the program in gdb, I can see the "conn" pointer
getting lost on the 1021st connect when pqWaitTimed() returns. So it
looks like the return stack gets corrupted or something like that.
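
The numbers are at least consistent with that. With four descriptors already
open, the 1021st connection would get descriptor 1024, assuming descriptors
are handed out sequentially. The sketch below assumes the usual plain-bitmap
layout of fd_set (one bit per descriptor) and computes where the bit for a
given descriptor would land. Both function names are made up for
illustration. If the fd_set is a local variable, a write past its end lands
in neighbouring stack memory.

```c
#include <stddef.h>
#include <sys/select.h>

/* Byte offset, from the start of an fd_set, of the bit that FD_SET(fd, ...)
 * would touch, assuming a plain bitmap with one bit per descriptor. */
size_t fd_bit_byte_offset(int fd)
{
    return (size_t)fd / 8;
}

/* Nonzero when that byte lies past the end of the fd_set object, i.e.
 * when setting the bit would write outside the structure. */
int fd_write_overruns(int fd)
{
    return fd_bit_byte_offset(fd) >= sizeof(fd_set);
}
```

On a system where FD_SETSIZE is 1024, sizeof(fd_set) is 128 bytes, so
descriptor 1023 still fits in the last byte while descriptor 1024 falls on
the first byte after the structure.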

Can anyone confirm this? Am I missing anything here?

Any idea how to get more than 1024 connections from one client process?

Andi

