BUG #6342: libpq blocks forever in "poll" function

From andreagrassi@sogeasoft.com
Subject BUG #6342: libpq blocks forever in "poll" function
Msg-id E1RbSUA-0003kd-Tb@wrigleys.postgresql.org
Replies Re: BUG #6342: libpq blocks forever in "poll" function  (Craig Ringer <ringerc@ringerc.id.au>)
Re: BUG #6342: libpq blocks forever in "poll" function  (Craig Ringer <ringerc@ringerc.id.au>)
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      6342
Logged by:          Andrea Grassi
Email address:      andreagrassi@sogeasoft.com
PostgreSQL version: 8.4.8
Operating system:   SUSE SLES 10 SP4 64 BIT
Description:

Hi,
I have a serious and strange problem. Sometimes libpq stays blocked in the
"poll" function even though the server has already answered the query. If I
attach to the process using kdbg, I find this stack:

__kernel_vsyscall()
poll()                          from /lib/libc.so.6
pqSocketCheck()  from /home/pg/pgsql/lib-32/libpq.so.5
pqWaitTimed()      from /home/pg/pgsql/lib-32/libpq.so.5
pqWait()                  from /home/pg/pgsql/lib-32/libpq.so.5
PQgetResult()       from /home/pg/pgsql/lib-32/libpq.so.5
PQexecFinish()     from /home/pg/pgsql/lib-32/libpq.so.5
…


To simplify the context and reproduce the bug, I wrote a test program
(included below) that uses only the libpq interface (no other libraries)
to read my database at localhost.
It loops over a table of 64000 rows and, for each row, reads another table.
It generally takes about 1 minute to run. I run this program in a loop, so
once it finishes, it restarts.
Usually it works fine, but sometimes (with no apparent pattern) it blocks.
It always blocks (with the stack above) while executing the PQexec function
("CLOSE CURSOR xx" or "FETCH ALL IN xx").
If I press "continue" in kdbg after attaching to the process, the program
continues its execution and exits successfully.

Here are the specifics of the platform (SLES 10 SP4 64-bit, WITHOUT any
VMware):

Server
HP DL 580 G7
4 CPU INTEL XEON X7550
64 GB RAM
8 HD 600GB SAS DP 6G 2.5" RAID 1 and RAID 5

OS
SUSE SLES 10 SP4 64 BIT

Kernel
Linux linuxspanesi 2.6.16.60-0.85.1-smp #1 SMP Thu Mar 17 11:45:06 UTC 2011
x86_64 x86_64 x86_64 GNU/Linux

PostgreSQL server
8.4.8 - 64-bit

Libpq
8.4.8 - 32-bit

I tried recompiling libpq:
-    in debug mode
-    on a 64-bit machine with the -m32 option
-    on a 32-bit machine
-    with HAVE_POLL set to false at line 1053 in fe-misc.c, to force the
     other branch of the "#ifdef/else" so that the "select()" function is
     used instead of "poll()"
but none of these fixes the bug. I got the same stack as above, except in the
last case, where I had "___newselect_nocancel()" instead of "poll()".

If I check the state of the connection using the "netstat" command, I get
this output:

tcp         24      0    127.0.0.1:49007        127.0.0.1:5432         ESTABLISHED        17415/pq_example.e

where the second field (Recv-Q) always stays at a non-zero value.
It seems that the server has already answered, but libpq or the poll
function doesn't notice it.
Note that the machine is very powerful and very fast.
It seems that the server's answer arrives before libpq starts waiting for it
(by calling poll). Could that be the case?

I also installed the same version of Linux, with the same kernel version, in
a VMware virtual machine on a much less powerful host: there my program
works fine and never blocks.

Here is the code of the example program:

/*
 * testlibpq.c
 *
 *      Test the C version of libpq, the PostgreSQL frontend library.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
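#include "libpq-fe.h"

/* Forward declaration: read_rigpia() is defined after main(). */
int read_rigpia(PGconn* conn, char* cdart);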

static void
exit_nicely(PGconn *conn)
{
    PQfinish(conn);
    exit(1);
}

int
main(int argc, char **argv)
{
    const char *conninfo;
    PGconn     *conn;
    PGresult   *res;
    int         i,
                j;
    /*
     * Connect to the OSA database. With CASE1 defined, the host is taken
     * from the SQLSERVER environment variable; otherwise libpq uses
     * environment variables or defaults for all other connection
     * parameters.
     */

    /* Make a connection to the database */
#ifdef CASE1
       conn = PQsetdbLogin( getenv("SQLSERVER"),             // pghost
                            0,                               // pgport
                            0,                               // pgoptions
                            0,                               // pgtty
                            "OSA",                           // dbName
                            0,                               // login
                            0                                // pwd
                           );
#else
      conn = PQconnectdb("dbname = OSA");
#endif

    /* Check to see that the backend connection was successfully made */
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "Connection to database failed: %s",
                PQerrorMessage(conn));
        exit_nicely(conn);
    }

    res = PQexec (conn, "SET datestyle='ISO'");
    switch (PQresultStatus (res))
     {
      case PGRES_BAD_RESPONSE:
      case PGRES_NONFATAL_ERROR:
      case PGRES_FATAL_ERROR:
         fprintf(stderr, "SET DATESTYLE command failed: %s",
                 PQresultErrorMessage(res));
         break;
     }
    PQclear(res);


    /*
     * Our test case here involves using a cursor, for which we must be
     * inside a transaction block.  We could do the whole thing with a single
     * PQexec() of "select * from pg_database", but that's too trivial to
     * make a good example.
     */

    /* Start a transaction block */
    res = PQexec(conn, "BEGIN");
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
    {
        fprintf(stderr, "BEGIN command failed: %s", PQerrorMessage(conn));
        PQclear(res);
        exit_nicely(conn);
    }

    /*
     * Should PQclear PGresult whenever it is no longer needed to avoid
     * memory leaks
     */
    PQclear(res);

    /*
     * Fetch the article codes from base_a_artico through a cursor
     */
    res = PQexec(conn, "DECLARE articoli CURSOR FOR select cdart from "
                       "base_a_artico ORDER BY cdart");
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
    {
        fprintf(stderr, "DECLARE CURSOR failed: %s", PQerrorMessage(conn));
        PQclear(res);
        exit_nicely(conn);
    }
    PQclear(res);

    res = PQexec(conn, "FETCH ALL in articoli");
    if (PQresultStatus(res) != PGRES_TUPLES_OK)
    {
        fprintf(stderr, "FETCH ALL failed: %s", PQerrorMessage(conn));
        PQclear(res);
        exit_nicely(conn);
    }

    /* next, run the per-article query for each row */
    for (i = 0; i < PQntuples(res); i++)
    {
        read_rigpia(conn, PQgetvalue(res, i, 0));
    }

    PQclear(res);

    /* close the portal ... we don't bother to check for errors ... */
    res = PQexec(conn, "CLOSE articoli");
    PQclear(res);

    /* end the transaction */
    res = PQexec(conn, "END");
    PQclear(res);

    /* close the connection to the database and cleanup */
    PQfinish(conn);

    return 0;
}

/* Reads the adp_d_rigpia rows for one article; returns 1 on success, 0 on failure. */
int read_rigpia(PGconn* conn, char* cdart)
{
    PGresult   *res; char sql[1024]; int i;
    char* dtfab;
    char* sum;

    memset(sql,0,sizeof(sql));
    /* snprintf keeps the query within the bounds of sql[] */
    snprintf(sql, sizeof(sql),
        "DECLARE rigpia CURSOR FOR select dtfab,sum(qtfan-qtpro) "
        "from adp_d_rigpia where flsta='' and cdart='%s' and qtfan>qtpro "
        "and cddpu not in ('04','05','06','07','08','09', "
        "'91','92','93','94','95','96','97','98','A0','B8','C2','LF','SC') "
        "group by dtfab", cdart);

    res = PQexec(conn, sql);
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
    {
        fprintf(stderr, "DECLARE CURSOR rigpia failed: %s *** %s",
                PQerrorMessage(conn),sql);
        PQclear(res);
        return 0;
    }

    PQclear(res);
    res = PQexec(conn, "FETCH ALL in rigpia");
    if (PQresultStatus(res) != PGRES_TUPLES_OK)
    {
        fprintf(stderr, "FETCH ALL failed in rigpia: %s",
                PQerrorMessage(conn));
        PQclear(res);
        return 0;
    }

    /* next, read the values of each row (they are not used further) */
    for (i = 0; i < PQntuples(res); i++)
    {
        dtfab = PQgetvalue(res, i, 0);
        sum   = PQgetvalue(res, i, 1);
    }

    PQclear(res);
    res = PQexec(conn, "CLOSE rigpia");
    PQclear(res);
    return 1;
}

Regards,
Andrea
