Обсуждение: libpq v17 PQsocketPoll timeout is not granular enough

Поиск
Список
Период
Сортировка

libpq v17 PQsocketPoll timeout is not granular enough

От
Dominique Devienne
Дата:
Hi. I've noticed [that libpq API in v17 beta1][1], and wanted to use
it to replace an existing Boost.ASIO-based async polling of the
connection's socket, waiting for notifications. The use case being
using PostgreSQL LISTEN/NOTIFY for a simple message queue. The code
needs to be cross-platform Windows and Linux. My goal would be to
eliminate that Boost.ASIO dependency for that, to use just libpq.

PQsocketPoll() being based on time_t, it has only second resolution, AFAIK.
Despite the [underlying implementation in fe-misc.c][2] supporting at
least milliseconds.

My use case is clients posting "requests" to the "queue" (i.e. a
PostgreSQL table), which trigger NOTIFY messages, waited on by
"servers"; and those servers informing back clients via further
notifications (on per-request channels) about the processing status of
their requests.

In that use case, second-resolution on long-lived servers is OK.
But OTOH, for the client side, waiting 1s or more to know whether a
server picked up their request is "too long / slow". I'd need
millisecond timeouts for that.

The doc for PQsocketPoll() mentions a different use case for that API.
But I think it would a pity if that unreleased API couldn't be made to
accomodate sub-second timeouts and more use-cases, like above.
Especially given that libpq v17 isn't out yet. I may come late to the
game, but hopefully it is not too late.

Thoughts? Thanks, --DD

[1]: https://www.postgresql.org/docs/17/libpq-connect.html#LIBPQ-PQSOCKETPOLL
[2]: https://github.com/postgres/postgres/blob/master/src/interfaces/libpq/fe-misc.c#L1086



Re: libpq v17 PQsocketPoll timeout is not granular enough

От
Tom Lane
Дата:
Dominique Devienne <ddevienne@gmail.com> writes:
> PQsocketPoll() being based on time_t, it has only second resolution, AFAIK.
> Despite the [underlying implementation in fe-misc.c][2] supporting at
> least milliseconds.
> ...
> But I think it would a pity if that unreleased API couldn't be made to
> accomodate sub-second timeouts and more use-cases, like above.
> Especially given that libpq v17 isn't out yet. I may come late to the
> game, but hopefully it is not too late.

This is an interesting suggestion, but I think yes it's too late.
We're already past beta1 and this'd require some fairly fundamental
rethinking, since there's no easy substitute for type time_t that has
millisecond resolution.  (The callers do want to specify an end time
not a timeout interval, since some of them loop around PQsocketPoll
and don't want the end time to slip.)

I guess conceivably we could use gettimeofday() and struct timeval
instead of time() and time_t, but it'd touch a lot of places in
libpq and it'd make some of the calculations a lot more complex.

Maybe a better idea would be to convert to using our
src/include/portability/instr_time.h abstraction?  But that
would be problematic for outside callers.

In any case this doesn't seem like a sane thing to be redesigning
post-beta.  A few months ago maybe we'd have done it, but ...

            regards, tom lane



Re: libpq v17 PQsocketPoll timeout is not granular enough

От
Dominique Devienne
Дата:
Bummer… I didn’t presume to suggest an api before, but simply adding an extra int with the milliseconds offset from the time_t is simple, and trivial to plug into the implementation I saw. Callers who don’t care can simply pass zero. while I could pass a computed time_t and ms offset using <chrono>. No need for fancy types imho. Aren’t betas precisely for the purpose of exposing apis to those like myself to vet them? This is also beta1, I,e, the first one. My €0.02

On Mon, Jun 10, 2024 at 6:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Dominique Devienne <ddevienne@gmail.com> writes:
> PQsocketPoll() being based on time_t, it has only second resolution, AFAIK.
> Despite the [underlying implementation in fe-misc.c][2] supporting at
> least milliseconds.
> ...
> But I think it would a pity if that unreleased API couldn't be made to
> accomodate sub-second timeouts and more use-cases, like above.
> Especially given that libpq v17 isn't out yet. I may come late to the
> game, but hopefully it is not too late.

This is an interesting suggestion, but I think yes it's too late.
We're already past beta1 and this'd require some fairly fundamental
rethinking, since there's no easy substitute for type time_t that has
millisecond resolution.  (The callers do want to specify an end time
not a timeout interval, since some of them loop around PQsocketPoll
and don't want the end time to slip.)

I guess conceivably we could use gettimeofday() and struct timeval
instead of time() and time_t, but it'd touch a lot of places in
libpq and it'd make some of the calculations a lot more complex.

Maybe a better idea would be to convert to using our
src/include/portability/instr_time.h abstraction?  But that
would be problematic for outside callers.

In any case this doesn't seem like a sane thing to be redesigning
post-beta.  A few months ago maybe we'd have done it, but ...

                        regards, tom lane

Re: libpq v17 PQsocketPoll timeout is not granular enough

От
Adrian Klaver
Дата:
On 6/10/24 11:43, Dominique Devienne wrote:
> Bummer… I didn’t presume to suggest an api before, but simply adding an 
> extra int with the milliseconds offset from the time_t is simple, and 
> trivial to plug into the implementation I saw. Callers who don’t care 
> can simply pass zero. while I could pass a computed time_t and ms offset 
> using <chrono>. No need for fancy types imho. Aren’t betas precisely for 

https://www.postgresql.org/developer/beta/

"PostgreSQL beta and release candidate releases are pre-release testing 
versions before the community makes a new release generally available. 
They are feature-frozen (i.e. no new features are added), and we release 
these to the public for testing before our final release. PostgreSQL 
beta and release candidate release are not meant for use in production 
systems."


> the purpose of exposing apis to those like myself to vet them? This is 
> also beta1, I,e, the first one. My €0.02
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: libpq v17 PQsocketPoll timeout is not granular enough

От
Thomas Munro
Дата:
On Tue, Jun 11, 2024 at 2:36 AM Dominique Devienne <ddevienne@gmail.com> wrote:
> Hi. I've noticed [that libpq API in v17 beta1][1], and wanted to use
> it to replace an existing Boost.ASIO-based async polling of the
> connection's socket, waiting for notifications. The use case being
> using PostgreSQL LISTEN/NOTIFY for a simple message queue. The code
> needs to be cross-platform Windows and Linux. My goal would be to
> eliminate that Boost.ASIO dependency for that, to use just libpq.

One idea I have wondered about is why you wouldn't just use poll()
directly, if you want a facility that works on all known operating
systems without extra portability libraries.  Windows has WSApoll(),
which AFAIK was designed to be Unix-compatible and a drop-in
replacement, requiring just a rename but otherwise having the same
macros and struct etc.

For some period of time, people who had to deal with socket connection
events (that includes us) avoided it, with the Curl guys' blog being
the most often cited public explanation for why:

https://daniel.haxx.se/blog/2012/10/10/wsapoll-is-broken/

However, as far as I know, that was fixed ~4 years ago:

https://learn.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-wsapoll

"Note  As of Windows 10 version 2004, when a TCP socket fails to
connect, (POLLHUP \| POLLERR \| POLLWRNORM) is indicated."

I wonder if that means that it's now completely usable on all
reasonable versions of the OS.  I think so?  I don't use Windows
myself, my interest in this topic is that I have a slow moving
background project to figure out how and when to remove all remaining
uses of select() from our tree, and this one is on my hit list.

> PQsocketPoll() being based on time_t, it has only second resolution, AFAIK.
> Despite the [underlying implementation in fe-misc.c][2] supporting at
> least milliseconds.

Yeah, that is not nice and your complaint is very reasonable, and we
should probably do something like what Tom suggested.

Hmm, but if what I speculated above is true, I wonder if the extern
function is even worth its bits...  but I don't know how to determine
that completely.



Re: libpq v17 PQsocketPoll timeout is not granular enough

От
Tom Lane
Дата:
Thomas Munro <thomas.munro@gmail.com> writes:
> On Tue, Jun 11, 2024 at 2:36 AM Dominique Devienne <ddevienne@gmail.com> wrote:
>> PQsocketPoll() being based on time_t, it has only second resolution, AFAIK.
>> Despite the [underlying implementation in fe-misc.c][2] supporting at
>> least milliseconds.

> Yeah, that is not nice and your complaint is very reasonable, and we
> should probably do something like what Tom suggested.

> Hmm, but if what I speculated above is true, I wonder if the extern
> function is even worth its bits...  but I don't know how to determine
> that completely.

I think if we're going to change anything at all here, we should
define the external API in microseconds not milliseconds.  The lesson
we need to be taking from this is that system calls come and go,
but libpq API is forever ;-).  Somebody could conceivably want
sub-millisecond wait resolution within the lifespan of libpq.

            regards, tom lane



Re: libpq v17 PQsocketPoll timeout is not granular enough

От
Francisco Olarte
Дата:
Tom:

On Tue, 11 Jun 2024 at 01:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I think if we're going to change anything at all here, we should
> define the external API in microseconds not milliseconds.  The lesson
> we need to be taking from this is that system calls come and go,
> but libpq API is forever ;-).  Somebody could conceivably want
> sub-millisecond wait resolution within the lifespan of libpq.

I was thinking on that, but since you need at least 20 bits, so 32,
for microseconds I would suggest nanoseconds, which fit in 32 too.
Sure, nanos seems too much for the current time but it pushes the
future problem further down, nearly forever for comms between
different machines until someone develops FTL networks.

Francisco Olarte.