BUG #6342: libpq blocks forever in "poll" function
| From | andreagrassi@sogeasoft.com |
|---|---|
| Subject | BUG #6342: libpq blocks forever in "poll" function |
| Date | |
| Msg-id | E1RbSUA-0003kd-Tb@wrigleys.postgresql.org |
| Replies | Re: BUG #6342: libpq blocks forever in "poll" function; Re: BUG #6342: libpq blocks forever in "poll" function |
| List | pgsql-bugs |
The following bug has been logged on the website:
Bug reference: 6342
Logged by: Andrea Grassi
Email address: andreagrassi@sogeasoft.com
PostgreSQL version: 8.4.8
Operating system: SUSE SLES 10 SP4 64 BIT
Description:
Hi,
I have a big and strange problem. Sometimes libpq remains blocked in the "poll" function even though the server has already answered the query. If I attach to the process using kdbg, I find this stack:
__kernel_vsyscall()
poll() from /lib/libc.so.6
pqSocketCheck() from /home/pg/pgsql/lib-32/libpq.so.5
pqWaitTimed() from /home/pg/pgsql/lib-32/libpq.so.5
pqWait() from /home/pg/pgsql/lib-32/libpq.so.5
PQgetResult() from /home/pg/pgsql/lib-32/libpq.so.5
PQexecFinish() from /home/pg/pgsql/lib-32/libpq.so.5
…
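The bottom frames suggest that PQexec() ends up waiting in poll() for the connection socket to become readable, with no timeout. As a simplified illustration of that pattern (a sketch, not libpq's actual source; the name wait_readable_poll is hypothetical), a readability wait that retries when a signal interrupts it might look like this:

```c
#include <errno.h>
#include <poll.h>

/*
 * Wait until `sock` is readable (or reports an error or hangup).
 * timeout_ms = -1 waits forever; 0 just checks the current state.
 * Returns > 0 when ready, 0 on timeout, -1 on error (errno is set).
 */
int
wait_readable_poll(int sock, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = sock;          /* a negative fd is ignored by poll() */
    pfd.events = POLLIN;
    pfd.revents = 0;

    /*
     * Retry when a signal interrupts the wait. Each retry starts the full
     * timeout again; with timeout_ms = -1 that makes no difference.
     */
    do
        rc = poll(&pfd, 1, timeout_ms);
    while (rc < 0 && errno == EINTR);

    return rc;
}
```

With timeout_ms = -1, a loop like this only returns once the descriptor becomes ready or poll() fails with an error other than EINTR.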
To simplify the context and reproduce the bug, I wrote a test program (attached below) that uses only the libpq interface (no other unusual libraries) to read my database on localhost.
It loops over a table of 64,000 rows and, for each row, reads another table. It generally takes about 1 minute to run. I put this program in a loop, so once it finishes, it restarts.
Usually it works fine, but sometimes (with no apparent pattern) it blocks. It always blocks (with the stack above) while executing PQexec ("CLOSE CURSOR xx" or "FETCH ALL IN xx").
If I press "continue" in kdbg after attaching to the process, the program resumes and exits successfully.
Here are the platform specifics (SLES 10 SP4 64-bit, WITHOUT any VMware):
Server: HP DL 580 G7
CPU: 4 × Intel Xeon X7550
RAM: 64 GB
Disks: 8 × 600 GB SAS DP 6G 2.5" (RAID 1 and RAID 5)
OS: SUSE SLES 10 SP4 64-bit
Kernel: Linux linuxspanesi 2.6.16.60-0.85.1-smp #1 SMP Thu Mar 17 11:45:06 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
PostgreSQL server: 8.4.8, 64-bit
libpq: 8.4.8, 32-bit
I tried recompiling libpq:
- in debug mode
- on a 64-bit machine with the -m32 option
- on a 32-bit machine
- with HAVE_POLL set to false at line 1053 of fe-misc.c, to force execution of the other branch of the "#ifdef/else", which uses select() instead of poll()
but none of these fixes the bug. I got the same stack as above, except in the last case, where "___newselect_nocancel()" appeared instead of "poll()".
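For comparison with the select() branch mentioned above, here is a minimal sketch of the same readability wait built on select() instead of poll(). The names wait_readable_select and demo_select_wait are hypothetical, and this is not the code in fe-misc.c. Unlike poll(), select() overwrites its fd_set (and on Linux its timeout), so both are rebuilt before every retry:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * select()-based wait for `sock` to become readable.
 * timeout_ms < 0 waits forever (NULL timeval).
 * Returns > 0 when ready, 0 on timeout, -1 on error (errno is set).
 */
int
wait_readable_select(int sock, int timeout_ms)
{
    fd_set readfds;
    struct timeval tv;
    struct timeval *ptv = NULL;
    int rc;

    /* FD_SET is only defined for 0 <= sock < FD_SETSIZE. */
    if (sock < 0 || sock >= FD_SETSIZE)
    {
        errno = EBADF;
        return -1;
    }

    do
    {
        /* Rebuild the arguments on each attempt, since select() changes them. */
        FD_ZERO(&readfds);
        FD_SET(sock, &readfds);
        if (timeout_ms >= 0)
        {
            tv.tv_sec = timeout_ms / 1000;
            tv.tv_usec = (timeout_ms % 1000) * 1000;
            ptv = &tv;
        }
        rc = select(sock + 1, &readfds, NULL, NULL, ptv);
    } while (rc < 0 && errno == EINTR);

    return rc;
}

/*
 * Usage sketch: wait 50 ms on a fresh pipe, optionally writing one byte
 * into it first. Returns the result of wait_readable_select().
 */
int
demo_select_wait(int write_first)
{
    int fds[2];
    int rc = -1;

    if (pipe(fds) != 0)
        return -1;
    if (!write_first || write(fds[1], "x", 1) == 1)
        rc = wait_readable_select(fds[0], 50);
    close(fds[0]);
    close(fds[1]);
    return rc;
}
```

demo_select_wait(0) times out on the empty pipe and returns 0; demo_select_wait(1) finds the byte already queued and returns 1.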
If I check the state of the connection using the "netstat" command, I get this output:
tcp 24 0 127.0.0.1:49007 127.0.0.1:5432 ESTABLISHED 17415/pq_example.e
where the second field (Recv-Q) stays stuck at a non-zero value.
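For an established TCP socket, Recv-Q counts bytes the kernel has received but the process has not yet read. A process can query roughly the same number for one of its own descriptors (for example the one returned by PQsocket()) with the FIONREAD ioctl, which Linux supports on TCP and Unix-domain stream sockets. A minimal sketch, with hypothetical helper names pending_bytes and demo_pending_bytes; the socket pair only stands in for the real libpq connection:

```c
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Return the number of bytes queued for reading on `sock`
 * (roughly what netstat reports as Recv-Q), or -1 on error.
 */
int
pending_bytes(int sock)
{
    int n = 0;

    if (ioctl(sock, FIONREAD, &n) < 0)
        return -1;
    return n;
}

/*
 * Usage sketch: queue 24 unread bytes on one end of a socket pair and
 * report how many are pending on the other end.
 */
int
demo_pending_bytes(void)
{
    int sv[2];
    char buf[24] = {0};
    int n = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    if (write(sv[1], buf, sizeof(buf)) == (ssize_t) sizeof(buf))
        n = pending_bytes(sv[0]);
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

Logging such a count just before a blocking call can show whether unread data was already sitting on the socket at that moment.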
It seems as if the server has already answered but libpq or the poll function doesn't notice.
Note that the machine is very powerful and very fast.
It seems that the server's answer arrives before libpq starts waiting for it (calling poll). Could that be?
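poll() reports whether a descriptor is readable at the time of the call, not whether new data arrived during the call, so data that reached the socket before poll() was entered should still make it return at once. A minimal, self-contained check of that behavior, assuming a Unix-domain socket pair as a stand-in for the TCP connection (poll_sees_earlier_data is a hypothetical name):

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Simulate "the reply arrives before the client starts waiting":
 * write to one end of a connected socket pair first, then poll the other
 * end with a zero timeout (report the current state, never block).
 * Returns 1 if poll() reports the earlier data as readable, 0 if not,
 * -1 if the setup or poll() itself fails.
 */
int
poll_sees_earlier_data(void)
{
    int sv[2];
    struct pollfd pfd;
    int rc;
    int result = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    /* The "server" answers first ... */
    if (write(sv[1], "reply", 5) == 5)
    {
        /* ... and only afterwards does the "client" call poll(). */
        pfd.fd = sv[0];
        pfd.events = POLLIN;
        pfd.revents = 0;
        rc = poll(&pfd, 1, 0);
        if (rc >= 0)
            result = (rc == 1 && (pfd.revents & POLLIN)) ? 1 : 0;
    }

    close(sv[0]);
    close(sv[1]);
    return result;
}
```

On a conforming system this should return 1, which suggests that a reply arriving early is not, by itself, enough to leave poll() waiting on a socket that still has unread data.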
I tried installing a VMware virtual machine with the same Linux version and the same kernel version on a much less powerful machine: there my program works fine and never blocks.
Here is the code of the example program:
```c
/*
 * testlibpq.c
 *
 * Test the C version of libpq, the PostgreSQL frontend library.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "libpq-fe.h"

/* Defined after main(); declared here so main() can call it. */
static int read_rigpia(PGconn *conn, const char *cdart);

static void
exit_nicely(PGconn *conn)
{
    PQfinish(conn);
    exit(1);
}

int
main(void)
{
    PGconn     *conn;
    PGresult   *res;
    int         i;

    /*
     * Make a connection to the database. With CASE1 defined, the host is
     * taken from the SQLSERVER environment variable; otherwise the conninfo
     * string "dbname = OSA" is used. All other connection parameters come
     * from environment variables or defaults.
     */
#ifdef CASE1
    conn = PQsetdbLogin(getenv("SQLSERVER"),  /* pghost */
                        NULL,                 /* pgport */
                        NULL,                 /* pgoptions */
                        NULL,                 /* pgtty */
                        "OSA",                /* dbName */
                        NULL,                 /* login */
                        NULL);                /* pwd */
#else
    conn = PQconnectdb("dbname = OSA");
#endif

    /* Check to see that the backend connection was successfully made */
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "Connection to database failed: %s",
                PQerrorMessage(conn));
        exit_nicely(conn);
    }

    res = PQexec(conn, "SET datestyle='ISO'");
    switch (PQresultStatus(res))
    {
        case PGRES_BAD_RESPONSE:
        case PGRES_NONFATAL_ERROR:
        case PGRES_FATAL_ERROR:
            fprintf(stderr, "SET DATESTYLE command failed: %s",
                    PQresultErrorMessage(res));
            break;
        default:
            break;
    }
    PQclear(res);

    /*
     * Our test case here involves using a cursor, for which we must be inside
     * a transaction block. We could do the whole thing with a single
     * PQexec() of "select * from pg_database", but that's too trivial to make
     * a good example.
     */

    /* Start a transaction block */
    res = PQexec(conn, "BEGIN");
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
    {
        fprintf(stderr, "BEGIN command failed: %s", PQerrorMessage(conn));
        PQclear(res);
        exit_nicely(conn);
    }

    /*
     * Should PQclear PGresult whenever it is no longer needed to avoid memory
     * leaks
     */
    PQclear(res);

    /* Fetch the article codes from base_a_artico */
    res = PQexec(conn, "DECLARE articoli CURSOR FOR "
                       "select cdart from base_a_artico ORDER BY cdart");
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
    {
        fprintf(stderr, "DECLARE CURSOR failed: %s", PQerrorMessage(conn));
        PQclear(res);
        exit_nicely(conn);
    }
    PQclear(res);

    res = PQexec(conn, "FETCH ALL in articoli");
    if (PQresultStatus(res) != PGRES_TUPLES_OK)
    {
        fprintf(stderr, "FETCH ALL failed: %s", PQerrorMessage(conn));
        PQclear(res);
        exit_nicely(conn);
    }

    /* For each article, read its rows from adp_d_rigpia */
    for (i = 0; i < PQntuples(res); i++)
        read_rigpia(conn, PQgetvalue(res, i, 0));
    PQclear(res);

    /* close the portal ... we don't bother to check for errors ... */
    res = PQexec(conn, "CLOSE articoli");
    PQclear(res);

    /* end the transaction */
    res = PQexec(conn, "END");
    PQclear(res);

    /* close the connection to the database and cleanup */
    PQfinish(conn);
    return 0;
}

/*
 * Declare a cursor over adp_d_rigpia for one article code, fetch all of its
 * rows, and close the cursor. Returns 1 on success, 0 on error.
 */
static int
read_rigpia(PGconn *conn, const char *cdart)
{
    PGresult   *res;
    char        sql[1024];
    int         i;
    const char *dtfab;
    const char *sum;

    snprintf(sql, sizeof(sql),
             "DECLARE rigpia CURSOR FOR select dtfab,sum(qtfan-qtpro) "
             "from adp_d_rigpia where flsta='' and cdart='%s' and qtfan>qtpro "
             "and cddpu not in ('04','05','06','07','08','09',"
             "'91','92','93','94','95','96','97','98','A0','B8','C2','LF','SC') "
             "group by dtfab", cdart);

    res = PQexec(conn, sql);
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
    {
        fprintf(stderr, "DECLARE CURSOR rigpia failed: %s *** %s",
                PQerrorMessage(conn), sql);
        PQclear(res);
        return 0;
    }
    PQclear(res);

    res = PQexec(conn, "FETCH ALL in rigpia");
    if (PQresultStatus(res) != PGRES_TUPLES_OK)
    {
        fprintf(stderr, "FETCH ALL failed in rigpia: %s",
                PQerrorMessage(conn));
        PQclear(res);
        return 0;
    }

    /* Read the rows; the values are not used further in this test */
    for (i = 0; i < PQntuples(res); i++)
    {
        dtfab = PQgetvalue(res, i, 0);
        sum = PQgetvalue(res, i, 1);
        (void) dtfab;
        (void) sum;
    }
    PQclear(res);

    res = PQexec(conn, "CLOSE rigpia");
    PQclear(res);
    return 1;
}
```
Regards,
Andrea