Discussion: libpq bad async behaviour

libpq bad async behaviour

From
Daurnimator
Date:
I'm worried about libpq blocking in some circumstances, particularly
around SSL renegotiations.
This came up while writing an async Postgres library for Lua, when I
realised that this code was dangerous:
https://github.com/daurnimator/cqueues-pgsql/blob/ee9c3fc85c94669b8128386d99d730fe93d9dbad/cqueues-pgsql.lua#L121


e.g. 1:
When a PQ connection is in non-blocking mode and PQflush returns 1, the docs say:
> wait for the socket to be write-ready and call it again
However, if the SSL layer is waiting on data for a renegotiation,
write readiness is not enough:
Waiting for POLLOUT and calling PQflush again will (untested) just
return 1 again, and continue to do so until data is received.
This is a busy-loop, and will block the host application.
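
To illustrate why waiting on the wrong event spins, here is a minimal sketch that involves neither libpq nor SSL. It uses a socketpair as a stand-in for the connection socket, and the function names are made up for the example. An idle socket is essentially always write-ready, so every poll() for POLLOUT returns at once, while the data the layer is actually waiting for would only show up as POLLIN.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Count how many of `rounds` poll() calls report `events` as ready on `fd`
 * within `timeout_ms` each. Hypothetical helper, not part of libpq. */
int count_ready_wakeups(int fd, short events, int rounds, int timeout_ms)
{
    assert(rounds >= 0);
    int ready = 0;
    for (int i = 0; i < rounds; i++) {
        struct pollfd pfd = { .fd = fd, .events = events, .revents = 0 };
        if (poll(&pfd, 1, timeout_ms) > 0 && (pfd.revents & events))
            ready++;
    }
    return ready;
}

/* Returns 1 if an idle socket was write-ready on every attempt and never
 * read-ready, 0 otherwise, -1 if the socketpair could not be created. */
int idle_socket_spins_on_pollout(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    /* Nothing has been sent in either direction. */
    int writable = count_ready_wakeups(sv[0], POLLOUT, 100, 10);
    int readable = count_ready_wakeups(sv[0], POLLIN, 3, 10);

    close(sv[0]);
    close(sv[1]);
    return writable == 100 && readable == 0;
}
```

A caller that loops on "wait for POLLOUT, call PQflush" while the SSL layer needs incoming data would be in the first situation: each wait returns immediately and nothing makes progress until the peer sends something.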

e.g. 2:
An SSL renegotiation happens while trying to receive a response.
According to 'andres' on IRC, inside of `PQisBusy` there is a busy loop:
> 14:22:32 andres You'll not see that. Even though the explanation for it is absolutely horrid.
> 14:23:32 andres There's a busy retry loop because of exactly that reason inside libpq's ssl read function whenever it hits a WANT_WRITE.
> 14:23:58 daurnimator so... libpq will block my process? :(
> 14:24:25 andres daurnimator: That case is unlikely to be hit often luckily because of the OS buffering. But yea, it's really unsatisfying.
> 14:26:06 andres daurnimator: I think this'll need a new API to be properly fixed.


One idea that came to mind, if we want to keep the same API, is to hide
the socket behind an epoll file descriptor:
an epoll fd always polls read-ready when an fd in its set becomes ready.
I think this is also possible for kqueue on bsd, ports on solaris and
IOCP on windows.
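
A minimal, Linux-only sketch of that behaviour, assuming a socketpair as a stand-in for the connection socket. The helper names are hypothetical and nothing here is libpq code. It registers the inner socket in an epoll set with a chosen interest and then checks whether the outer epoll fd itself polls as readable.

```c
#include <assert.h>
#include <poll.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Wrap `fd` in a new epoll instance that watches `inner_events`
 * (EPOLLIN and/or EPOLLOUT). Returns the epoll fd, or -1 on error. */
int wrap_in_epoll(int fd, unsigned int inner_events)
{
    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;
    struct epoll_event ev = { .events = inner_events, .data = { .fd = fd } };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) != 0) {
        close(epfd);
        return -1;
    }
    return epfd;
}

/* Returns 1 if `fd` polls read-ready within `timeout_ms`, else 0. */
int polls_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    return poll(&pfd, 1, timeout_ms) > 0 && (pfd.revents & POLLIN) != 0;
}

/* Returns 1 if the outer epoll fd is read-ready exactly when the inner
 * interest is satisfied, 0 otherwise, -1 if setup failed. */
int epoll_hides_write_readiness(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    /* Inner interest: write-readiness. An idle socket is writable. */
    int ep_out = wrap_in_epoll(sv[0], EPOLLOUT);
    /* Inner interest: read-readiness. Nothing was sent, so not ready. */
    int ep_in = wrap_in_epoll(sv[0], EPOLLIN);

    int ok = ep_out >= 0 && ep_in >= 0
          && polls_readable(ep_out, 100) == 1
          && polls_readable(ep_in, 10) == 0;

    if (ep_out >= 0) close(ep_out);
    if (ep_in >= 0) close(ep_in);
    close(sv[0]);
    close(sv[1]);
    return ok;
}
```

In a library, the inner interest could presumably be switched with EPOLL_CTL_MOD whenever the SSL layer wants the other direction, so the application would only ever wait for read-readiness on the outer fd.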


Regards,
Daurnimator.



Re: libpq bad async behaviour

From
Robert Haas
Date:
On Fri, Jan 9, 2015 at 2:57 PM, Daurnimator <quae@daurnimator.com> wrote:
> I'm worried about libpq blocking in some circumstances; particularly
> around SSL renegotiations.
> This came up while writing an async postgres library for lua, I
> realised that this code was dangerous:
> https://github.com/daurnimator/cqueues-pgsql/blob/ee9c3fc85c94669b8128386d99d730fe93d9dbad/cqueues-pgsql.lua#L121
>
>
> e.g. 1:
> When a PQ connection is in non-blocking mode, PQflush returns 1, the docs say:
>> wait for the socket to be write-ready and call it again
> However, if the SSL layer is waiting on data for a renegotiation,
> write readiness is not enough:
> Waiting for POLLOUT and calling PQflush again will (untested) just
> return 1 again, and continue to do so until data is received.
> This is a busy-loop, and will block the host application.
>
> e.g. 2:
> An SSL renegotiation happens while trying to receive a response.
> According to 'andres' on IRC, inside of `PQisBusy` there is a busy loop:
>> 14:22:32 andres You'll not see that. Even though the explanation for it is absolutely horrid.
>> 14:23:32 andres There's a busy retry loop because of exactly that reason inside libpq's ssl read function whenever it hits a WANT_WRITE.
>> 14:23:58 daurnimator so... libpq will block my process? :(
>> 14:24:25 andres daurnimator: That case is unlikely to be hit often luckily because of the OS buffering. But yea, it's really unsatisfying.
>> 14:26:06 andres daurnimator: I think this'll need a new API to be properly fixed.
>
>
> One idea that came to mind if we want to keep the same api, is to hide
> the socket behind an epoll file descriptor,
> they always poll read ready when an fd in their set becomes ready.
> I think this is also possible for kqueue on bsd, ports on solaris and
> IOCP on windows.

Yeah, this is a problem. Fixing it isn't easy, though, I think.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: libpq bad async behaviour

From
Andres Freund
Date:
On 2015-01-14 08:32:19 -0500, Robert Haas wrote:
> On Fri, Jan 9, 2015 at 2:57 PM, Daurnimator <quae@daurnimator.com> wrote:
> > I'm worried about libpq blocking in some circumstances; particularly
> > around SSL renegotiations.
> > This came up while writing an async postgres library for lua, I
> > realised that this code was dangerous:
> > https://github.com/daurnimator/cqueues-pgsql/blob/ee9c3fc85c94669b8128386d99d730fe93d9dbad/cqueues-pgsql.lua#L121
> >
> >
> > e.g. 1:
> > When a PQ connection is in non-blocking mode, PQflush returns 1, the docs say:
> >> wait for the socket to be write-ready and call it again
> > However, if the SSL layer is waiting on data for a renegotiation,
> > write readiness is not enough:
> > Waiting for POLLOUT and calling PQflush again will (untested) just
> > return 1 again, and continue to do so until data is received.
> > This is a busy-loop, and will block the host application.
> >
> > e.g. 2:
> > An SSL renegotiation happens while trying to receive a response.
> > According to 'andres' on IRC, inside of `PQisBusy` there is a busy loop:
> >> 14:22:32 andres You'll not see that. Even though the explanation for it is absolutely horrid.
> >> 14:23:32 andres There's a busy retry loop because of exactly that reason inside libpq's ssl read function whenever it hits a WANT_WRITE.
> >> 14:23:58 daurnimator so... libpq will block my process? :(
> >> 14:24:25 andres daurnimator: That case is unlikely to be hit often luckily because of the OS buffering. But yea, it's really unsatisfying.
> >> 14:26:06 andres daurnimator: I think this'll need a new API to be properly fixed.
> >
> >
> > One idea that came to mind if we want to keep the same api, is to hide
> > the socket behind an epoll file descriptor,
> > they always poll read ready when an fd in their set becomes ready.
> > I think this is also possible for kqueue on bsd, ports on solaris and
> > IOCP on windows.

I think that kind of solution isn't likely to be satisfying. The amount
of porting work is just not going to be worth the cost. And it won't be
easily hideable in the API at all as the callers will expect a normal
fd.

> Yeah, this is a problem. Fixing it isn't easy, though, I think.

I think
extern PostgresPollingStatusType PQconnectPoll(PGconn *conn);
has the right interface. It returns what upper layers need to wait
for. I think we should extend pretty much that to more interfaces. This
likely means that we'll need extended versions of PQflush() and
PQconsumeInput() - afaics it shouldn't be much more?
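
A rough sketch of what calling code could look like if flush-style and consume-input-style calls reported what to wait for, the way PQconnectPoll does. Nothing below is libpq API: `sketch_polling_status` is a local stand-in that only mirrors the shape of PostgresPollingStatusType, and `sketch_step_fn` is a hypothetical signature for an extended PQflush() or PQconsumeInput().

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* Local stand-in for PostgresPollingStatusType (assumption: a real caller
 * would take the enum from libpq-fe.h instead). */
typedef enum {
    SKETCH_POLLING_FAILED,
    SKETCH_POLLING_READING,   /* wait until the socket is read-ready  */
    SKETCH_POLLING_WRITING,   /* wait until the socket is write-ready */
    SKETCH_POLLING_OK
} sketch_polling_status;

/* Hypothetical shape of an extended flush or consume-input call: do as
 * much work as possible without blocking and report what to wait for. */
typedef sketch_polling_status (*sketch_step_fn)(void *conn);

/* Translate the reported status into poll() events. */
short sketch_events_for(sketch_polling_status st)
{
    switch (st) {
    case SKETCH_POLLING_READING: return POLLIN;
    case SKETCH_POLLING_WRITING: return POLLOUT;
    default:                     return 0;
    }
}

/* Drive one operation until it completes. Returns 0 on success and -1 on
 * failure; a poll() timeout or error (including EINTR) counts as failure
 * in this sketch. */
int sketch_drive(int fd, void *conn, sketch_step_fn step, int timeout_ms)
{
    assert(step != NULL);
    for (;;) {
        sketch_polling_status st = step(conn);
        if (st == SKETCH_POLLING_OK)
            return 0;
        if (st == SKETCH_POLLING_FAILED)
            return -1;

        /* Wait for exactly the direction the lower layer asked for. */
        struct pollfd pfd = { .fd = fd, .events = sketch_events_for(st),
                              .revents = 0 };
        if (poll(&pfd, 1, timeout_ms) <= 0)
            return -1;
    }
}
```

The point of the design is that the caller never guesses the direction: during a renegotiation a "flush" could report reading and a "consume input" could report writing, and the wait would still be the right one.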

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: libpq bad async behaviour

From
Daurnimator
Date:
On 14 January 2015 at 08:40, Andres Freund <andres@2ndquadrant.com> wrote:
> I think that kind of solution isn't likely to be satisfying. The amount
> of porting work is just not going to be worth the cost. And it won't be
> easily hideable in the API at all as the callers will expect a normal
> fd.

All that consumers of the API need is something they can `select()` or equivalent on.
 
> > Yeah, this is a problem. Fixing it isn't easy, though, I think.
>
> I think
> extern PostgresPollingStatusType PQconnectPoll(PGconn *conn);
> has the right interface. It returns what upper layers need to wait
> for. I think we should extend pretty much that to more interfaces.

This would be a fine solution. That enum indeed has the correct values/semantics.
  
> This likely means that we'll need extended versions of PQflush() and
> PQconsumeInput() - afaics it shouldn't be much more?

PQping?
PQconnectPoll already has it.

Though, I think we could probably even reduce this down to a single common function for all cases:
PQpoll() or similar.