Обсуждение: BUG #10141: Server fails to send query result.

Поиск
Список
Период
Сортировка

BUG #10141: Server fails to send query result.

От
molind@gmail.com
Дата:
The following bug has been logged on the website:

Bug reference:      10141
Logged by:          Evgen Bodunov
Email address:      molind@gmail.com
PostgreSQL version: 9.3.4
Operating system:   Ubuntu 13.10
Description:

I have big database with geospatial data from OpenStreetMap. I prepare
vector tiles from this data using console app which requests data from
database for each tile in given area.

While it works it opens 1 db connection per thread (8 threads) and queries
all tiles from list 25k tiles in average. 5 queries per tile. During map
generation usually 2 threads stuck in idle state.

http://i.stack.imgur.com/VUU4O.png

Look at the last row. Query started. We have a timestamp, then it finished
and changed state to idle, and we have timestamp of that moment. Everything
looks ok, except console app which waits answer from server inside

PGresult *res = PQexec(conn, query);

It waits for query results from server. And I can't find why server thinks
that job is done and app says that there was no answer from db yet.

I added PQtrace for each thread and wait for stuck thread. There is trace:

>From backend> D
>From backend (#4)> 139891
>From backend> D
>From backend (#4)> 139891
>From backend> D
>From backend (#4)> 139891
>From backend> D
>From backend (#4)> 139891
>From backend> D
>From backend (#4)> 139891
>From backend> D
>From backend (#4)> 139891

Seems problem somewhere inside PostgreSQL. It tries to send result but
fails.

PostgreSQL 9.3.4-1.pgdg70+1 installed from apt source
deb http://apt.postgresql.org/pub/repos/apt/ wheezy-pgdg main

Re: BUG #10141: Server fails to send query result.

От
Stephen Frost
Дата:
Greetings,

* molind@gmail.com (molind@gmail.com) wrote:
> While it works it opens 1 db connection per thread (8 threads) and queries
> all tiles from list 25k tiles in average. 5 queries per tile. During map
> generation usually 2 threads stuck in idle state.
>=20
> http://i.stack.imgur.com/VUU4O.png

This only shows what's happening server-side.

> Look at the last row. Query started. We have a timestamp, then it finished
> and changed state to idle, and we have timestamp of that moment. Everythi=
ng
> looks ok, except console app which waits answer from server inside
>=20
> PGresult *res =3D PQexec(conn, query);
>=20
> It waits for query results from server. And I can't find why server thinks
> that job is done and app says that there was no answer from db yet.

Can you provide a small example test case which illustrates this?  You
didn't provide your code, so it's really difficult to have any idea
what's going on.

What version of libpq is being used and on what platform?  Are you using
SSL at all for your connection?

> I added PQtrace for each thread and wait for stuck thread. There is trace:
>=20
> >From backend> D
> >From backend (#4)> 139891
> >From backend> D
> >From backend (#4)> 139891
> >From backend> D
> >From backend (#4)> 139891
> >From backend> D
> >From backend (#4)> 139891
> >From backend> D
> >From backend (#4)> 139891
> >From backend> D
> >From backend (#4)> 139891
>=20
> Seems problem somewhere inside PostgreSQL. It tries to send result but
> fails.

I don't see anything here which shows that to be the case.  Is the TCP
connection still fully ESTABLISHED on both sides?  What's in the
middle..?  Could you have a firewall or IDS which is dropping the
connections?

    Thanks,

        Stephen

Re: BUG #10141: Server fails to send query result.

От
Tom Lane
Дата:
Stephen Frost <sfrost@snowman.net> writes:
> * molind@gmail.com (molind@gmail.com) wrote:
>> I added PQtrace for each thread and wait for stuck thread. There is trace:

>> From backend> D
>> From backend (#4)> 139891
>> From backend> D
>> From backend (#4)> 139891
>> From backend> D
>> From backend (#4)> 139891
>> From backend> D
>> From backend (#4)> 139891
>> From backend> D
>> From backend (#4)> 139891
>> From backend> D
>> From backend (#4)> 139891

FWIW, this looks suspiciously like libpq is unable to consume data from
its input buffer (and keeps retrying to process the same data).  Are you
using PQsetnonblocking by any chance?  If so, this probably indicates
failure to follow the required call sequencing to process data.

Another likely theory, given that you mentioned multiple client threads,
is that the threads are stomping on each others' toes somehow.  libpq
does not defend itself against that: it's up to you to be sure that
only one thread is touching each PGconn object.

>> Seems problem somewhere inside PostgreSQL. It tries to send result but
>> fails.

> I don't see anything here which shows that to be the case.

Indeed.  I'd bet considerable money that this is a client-side issue.
It's possible that it's a libpq bug, but what seems more likely is
that you're using libpq improperly.

            regards, tom lane

Re: BUG #10141: Server fails to send query result.

От
Evgen Bodunov
Дата:
There is more details:

Even when it works it sends D more than once usually

=46rom backend> D
=46rom backend (#4)> 33170
=46rom backend> D
=46rom backend (#4)> 33170
=46rom backend> D
=46rom backend (#4)> 33170
=46rom backend> D
=46rom backend (#4)> 33170
=46rom backend> D
=46rom backend (#4)> 33170
=46rom backend (#2)> 2
=46rom backend (#4)> 33132
=46rom backend (33132)> =
\x01060000006c0000000103000000010000000a00000080ce26eb13eaa640003fad186435=
9840800c2b00bdeea6400055d
=46rom backend (#4)> 24
=46rom backend (24)> "landuse"=3D>"residential"

My query looks like
select * from poly9(3130860.6785608195, 7592337.1455099881, =
3209132.1955248402, 7670608.6624740083);

while poly9 is function generated by app before data processing starts. =
And basically it looks like:

CREATE FUNCTION "public"."poly9"(IN minx numeric, IN miny numeric, IN =
maxx numeric, IN maxy numeric)=20
returns table (geom_out bytea, tags_out public.hstore)=20
    AS $BODY$
declare
bbox =3D st_makebox2d(st_makepoint(minx, miny), st_makepoint(maxx, =
maxy));
begin
-- simplified version
return query
SELECT way, tags FROM planet_polygon WHERE way && bbox;=20
end;
$BODY$
    LANGUAGE plpgsql
    COST 100
    ROWS 1000
    CALLED ON NULL INPUT
    SECURITY INVOKER
    VOLATILE;

Hope it helps. This solution worked fine 2 months ago. Is there a small =
chance that something broken inside 9.3.4?

Best Regards,
Evgen Bodunov.

On Apr 25, 2014, at 10:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Stephen Frost <sfrost@snowman.net> writes:
>> * molind@gmail.com (molind@gmail.com) wrote:
>>> I added PQtrace for each thread and wait for stuck thread. There is =
trace:
>=20
>>> =46rom backend> D
>>> =46rom backend (#4)> 139891
>>> =46rom backend> D
>>> =46rom backend (#4)> 139891
>>> =46rom backend> D
>>> =46rom backend (#4)> 139891
>>> =46rom backend> D
>>> =46rom backend (#4)> 139891
>>> =46rom backend> D
>>> =46rom backend (#4)> 139891
>>> =46rom backend> D
>>> =46rom backend (#4)> 139891
>=20
> FWIW, this looks suspiciously like libpq is unable to consume data =
from
> its input buffer (and keeps retrying to process the same data).  Are =
you
> using PQsetnonblocking by any chance?  If so, this probably indicates
> failure to follow the required call sequencing to process data.
>=20
> Another likely theory, given that you mentioned multiple client =
threads,
> is that the threads are stomping on each others' toes somehow.  libpq
> does not defend itself against that: it's up to you to be sure that
> only one thread is touching each PGconn object.
>=20
>>> Seems problem somewhere inside PostgreSQL. It tries to send result =
but
>>> fails.
>=20
>> I don't see anything here which shows that to be the case.
>=20
> Indeed.  I'd bet considerable money that this is a client-side issue.
> It's possible that it's a libpq bug, but what seems more likely is
> that you're using libpq improperly.
>=20
>             regards, tom lane

Re: BUG #10141: Server fails to send query result.

От
Evgen Bodunov
Дата:
Hello guys,=20

I checked it with direct ethernet connection and now it works without =
issues. When i switched OpenVPN server from "proto udp=94 to =93proto =
tcp" mode it seems also works ok.

Thanks for your help.

Best Regards,
Evgen Bodunov.

On Apr 25, 2014, at 11:55 PM, Evgen Bodunov <molind@gmail.com> wrote:

> There is more details:
>=20
> Even when it works it sends D more than once usually
>=20
> =46rom backend> D
> =46rom backend (#4)> 33170
> =46rom backend> D
> =46rom backend (#4)> 33170
> =46rom backend> D
> =46rom backend (#4)> 33170
> =46rom backend> D
> =46rom backend (#4)> 33170
> =46rom backend> D
> =46rom backend (#4)> 33170
> =46rom backend (#2)> 2
> =46rom backend (#4)> 33132
> =46rom backend (33132)> =
\x01060000006c0000000103000000010000000a00000080ce26eb13eaa640003fad186435=
9840800c2b00bdeea6400055d
> =46rom backend (#4)> 24
> =46rom backend (24)> "landuse"=3D>"residential"
>=20
> My query looks like
> select * from poly9(3130860.6785608195, 7592337.1455099881, =
3209132.1955248402, 7670608.6624740083);
>=20
> while poly9 is function generated by app before data processing =
starts. And basically it looks like:
>=20
> CREATE FUNCTION "public"."poly9"(IN minx numeric, IN miny numeric, IN =
maxx numeric, IN maxy numeric)=20
> returns table (geom_out bytea, tags_out public.hstore)=20
>     AS $BODY$
> declare
> bbox =3D st_makebox2d(st_makepoint(minx, miny), st_makepoint(maxx, =
maxy));
> begin
> -- simplified version
> return query
> SELECT way, tags FROM planet_polygon WHERE way && bbox;=20
> end;
> $BODY$
>     LANGUAGE plpgsql
>     COST 100
>     ROWS 1000
>     CALLED ON NULL INPUT
>     SECURITY INVOKER
>     VOLATILE;
>=20
> Hope it helps. This solution worked fine 2 months ago. Is there a =
small chance that something broken inside 9.3.4?
>=20
> Best Regards,
> Evgen Bodunov.
>=20
> On Apr 25, 2014, at 10:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>=20
>> Stephen Frost <sfrost@snowman.net> writes:
>>> * molind@gmail.com (molind@gmail.com) wrote:
>>>> I added PQtrace for each thread and wait for stuck thread. There is =
trace:
>>=20
>>>> =46rom backend> D
>>>> =46rom backend (#4)> 139891
>>>> =46rom backend> D
>>>> =46rom backend (#4)> 139891
>>>> =46rom backend> D
>>>> =46rom backend (#4)> 139891
>>>> =46rom backend> D
>>>> =46rom backend (#4)> 139891
>>>> =46rom backend> D
>>>> =46rom backend (#4)> 139891
>>>> =46rom backend> D
>>>> =46rom backend (#4)> 139891
>>=20
>> FWIW, this looks suspiciously like libpq is unable to consume data =
from
>> its input buffer (and keeps retrying to process the same data).  Are =
you
>> using PQsetnonblocking by any chance?  If so, this probably indicates
>> failure to follow the required call sequencing to process data.
>>=20
>> Another likely theory, given that you mentioned multiple client =
threads,
>> is that the threads are stomping on each others' toes somehow.  libpq
>> does not defend itself against that: it's up to you to be sure that
>> only one thread is touching each PGconn object.
>>=20
>>>> Seems problem somewhere inside PostgreSQL. It tries to send result =
but
>>>> fails.
>>=20
>>> I don't see anything here which shows that to be the case.
>>=20
>> Indeed.  I'd bet considerable money that this is a client-side issue.
>> It's possible that it's a libpq bug, but what seems more likely is
>> that you're using libpq improperly.
>>=20
>>             regards, tom lane
>=20