Discussion: RFC: Async query processing


RFC: Async query processing

From:
Florian Weimer
Date:
I would like to add truly asynchronous query processing to libpq, 
enabling command pipelining.  The idea is to allow applications to 
auto-tune to the bandwidth-delay product and reduce the number of 
context switches when running against a local server.

Here's a sketch of what the interface would look like.  I believe it can 
be implemented in principle within the existing libpq structure, 
although the interaction with single-row mode looks complicated.

If the application is not interested in intermediate query results, it 
would use something like this:
  PGAsyncMode oldMode = PQsetsendAsyncMode(conn, PQASYNC_DISCARD);
  PGresult *res = NULL;
  while (more_data()) {
     int ret = PQsendQueryParams(conn, "INSERT ...", ...);
     if (ret == 0) {
       // handle low-level error
     }
     // We can issue a second command.  There is no need to fetch
     // the result immediately.
     ret = PQsendQueryParams(conn, "INSERT ...", ...);
     if (ret == 0) {
       // handle low-level error
     }
     res = PQgetResultNoWait(conn);
     if (res != NULL) {
       // Only returns non-NULL in case of error.  Does not wait for
       // data to arrive from the server.  If there is an error,
       // drains all pending errors and reports only the first error.
       break;
     }
  }
  if (res == NULL) {
     // NULL means that all commands completed without error.
     // Again, drains all pending errors.
     // (Necessarily syncs with the server.)
     res = PQgetResult(conn);
  }
  PQsetsendAsyncMode(conn, oldMode);
  if (res != NULL) {
     // Handle error result.
     ...
     PQclear(res);
     throw_error();
  }

If there is no need to exit from the loop early (say, because errors are 
expected to be extremely rare), the PQgetResultNoWait call can be left out.

Internally, libpq will check in PQsendQueryParams (and similar 
functions) whether data is available for reading.  If yes, it is 
consumed, discarded unless it is an error, and stored for later 
retrieval with PQgetResultNoWait or PQgetResult.  (After the first 
error, all responses are discarded—this is necessary to avoid 
deadlocks.)  PQgetResultNoWait returns immediately if there is no stored 
error object.  If there is one, it syncs with the server (draining all 
pending results).  PQgetResult just drains all pending results, 
reporting the first error.
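To make that drain-and-discard behavior concrete, here is a rough sketch
of the bookkeeping described above.  It is only an illustration:
pqConsumePendingInput, data_available_on_socket, parse_one_response and
the stored-error accessors are hypothetical names, not existing libpq
internals.

  static void
  pqConsumePendingInput(PGconn *conn)
  {
      // Consume whatever the server has already sent, without blocking.
      while (data_available_on_socket(conn))
      {
          PGresult *res = parse_one_response(conn);
          if (res == NULL)
              break;                      // incomplete message; wait
          ExecStatusType st = PQresultStatus(res);
          if (stored_error(conn) == NULL &&
              (st == PGRES_BAD_RESPONSE ||
               st == PGRES_NONFATAL_ERROR ||
               st == PGRES_FATAL_ERROR))
              store_error(conn, res);     // keep only the first error
          else
              PQclear(res);               // discard everything else
      }
  }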

The general expectation here is that commands are sent within a 
transaction block, so only the first error is interesting.  (It makes 
sense for fatal errors to overtake regular errors.)

If the programmer needs results, not just the errors, a different mode 
is needed:
  PGAsyncMode oldMode = PQsetsendAsyncMode(conn, PQASYNC_RESULT);
  bool more_data;
  do {
     more_data = ...;
     if (more_data) {
       int ret = PQsendQueryParams(conn,
         "INSERT ... RETURNING ...", ...);
       if (ret == 0) {
         // handle low-level error
       }
     }
     // Consume all pending results.
     while (1) {
       PGresult *res;
       if (more_data) {
         res = PQgetResultNoWait(conn);
       } else {
         res = PQgetResult(conn);
       }
       if (res == NULL) {
         // No result so far.
         break;
       }

       // Result data is available.  Check if it is an error.
       switch (PQresultStatus(res)) {
       case PGRES_TUPLES_OK:
         // Process asynchronous result data.
         ...
         break;

       // Error handling.
       case PGRES_BAD_RESPONSE:
       case PGRES_NONFATAL_ERROR:
       case PGRES_FATAL_ERROR:
         // Store error somewhere.
         ...
         // Keep draining results, eventually
         // synchronizing with the server.
         more_data = false;
         break;
       }
       PQclear(res);
     }
  } while (more_data);
  PQsetsendAsyncMode(conn, oldMode);

(This suggests that PQgetResultNoWait should have a different name and a 
flag argument that tells it whether it is blocking or not.)

In PQASYNC_RESULT mode, libpq has to build up a list of pending results 
each time it notices available data during a PQsendQueryParams 
operation.  The number of responses that need to be stored this way is 
not totally unbounded; it is related to what fits into the 
bandwidth-delay product of the link.  This, and the fairly elaborate 
processing logic, is the reason why I think it makes sense to have 
PQASYNC_DISCARD as well.

Instead of buffering the results, we could buffer the encoded command 
messages in PQASYNC_RESULT mode.  This means that PQsendQueryParams 
would not block when it cannot send the (complete) command message, but 
store it in the connection object so that the subsequent PQgetResultNoWait 
and PQgetResult would send it.  This might work better with single-tuple 
result mode.  We cannot avoid buffering either multiple queries or 
multiple responses if we want to utilize the link bandwidth, or we'd 
risk deadlocks.
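As a sketch of that deferred send (the socket handling is simplified, and
append_pending is a hypothetical helper, not a libpq internal):

  // Try to hand the encoded command message to the kernel right away;
  // whatever does not fit is kept in a connection-level buffer that
  // PQgetResultNoWait/PQgetResult would drain before reading.
  ssize_t sent = send(sock, msg, msglen, MSG_DONTWAIT);
  if (sent < 0)
      sent = 0;              // assume EAGAIN; real code would check errno
  if ((size_t) sent < (size_t) msglen)
      append_pending(conn, msg + sent, msglen - sent);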

There'd also be the default PQASYNC_DEFAULT mode, where it is an error 
to call PQsendQueryParams multiple times without a PQgetResult call 
in-between.

In PQASYNC_DISCARD mode, we could implicitly synchronize with the server 
when a synchronous API is used, reporting a pending error against the 
synchronous command.  (I think this is what the delayed error reporting 
in X11 does.)  Typically, that would be the "COMMIT" at the end of the 
transaction.  With a result-buffering implementation of PQASYNC_RESULT, 
we could do that as well, but this might be too much magic.

I thought I'd ask for comments before starting coding because this looks 
a bit more complicated than I expected. :)
-- 
Florian Weimer / Red Hat Product Security Team



Re: RFC: Async query processing

From:
Claudio Freire
Date:
On Sun, Nov 3, 2013 at 3:58 PM, Florian Weimer <fweimer@redhat.com> wrote:
> I would like to add truly asynchronous query processing to libpq, enabling
> command pipelining.  The idea is to allow applications to auto-tune to
> the bandwidth-delay product and reduce the number of context switches when
> running against a local server.
...
> If the application is not interested in intermediate query results, it would
> use something like this:
...
> If there is no need to exit from the loop early (say, because errors are
> expected to be extremely rare), the PQgetResultNoWait call can be left out.

It doesn't seem wise to me to make such a distinction. It sounds like
you're oversimplifying, and that's why you need "modes": to overcome
the evidently restrictive limits of the simplified interface. It would
only be a matter of (a short) time before some other limitation
requires some other mode.


>   PGAsyncMode oldMode = PQsetsendAsyncMode(conn, PQASYNC_RESULT);
>   bool more_data;
>   do {
>      more_data = ...;
>      if (more_data) {
>        int ret = PQsendQueryParams(conn,
>          "INSERT ... RETURNING ...", ...);
>        if (ret == 0) {
>          // handle low-level error
>        }
>      }
>      // Consume all pending results.
>      while (1) {
>        PGresult *res;
>        if (more_data) {
>          res = PQgetResultNoWait(conn);
>        } else {
>          res = PQgetResult(conn);
>        }

Somehow, that code looks backwards. I mean, really backwards. Wouldn't
that be !more_data?

In any case, pipelining like that, without a clear distinction, in the
wire protocol, of which results pertain to which query, could be a
recipe for trouble when subtle bugs, either in lib usage or
implementation, mistakenly treat one query's result as another's.

Notice that it's not an uncommon mistake, and this is much more likely
with such an unclear interface.


> Instead of buffering the results, we could buffer the encoded command
> messages in PQASYNC_RESULT mode.  This means that PQsendQueryParams would
> not block when it cannot send the (complete) command message, but store in
> the connection object so that the subsequent PQgetResultNoWait and
> PQgetResult would send it.  This might work better with single-tuple result
> mode.  We cannot avoid buffering either multiple queries or multiple
> responses if we want to utilize the link bandwidth, or we'd risk deadlocks.

This is a non-solution. Such an implementation, at least as described,
would remove neither network latency nor context switches; it
would be a pure API change with no externally visible behavior
change.

An effective solution must include multi-command packets. Without
knowing the wire protocol in detail, something like:

PARSE: INSERT blah
BIND: args
EXECUTE with DISCARD
PARSE: INSERT blah
BIND: args
EXECUTE with DISCARD
PARSE: SELECT  blah
BIND: args
EXECUTE with FETCH ALL

All in one packet would be efficient and error-free (IMO). This
precludes multiple result-containing commands, but it could hold
multiple result-less ones.

This could be better specified with expectations:

PARSE: INSERT blah
EXPECT: ParseComplete
BIND: args
EXECUTE
EXPECT: CommandComplete
PARSE: INSERT blah
EXPECT: ParseComplete
BIND: args
EXECUTE
EXPECT: CommandComplete
PARSE: SELECT blah
EXPECT: ParseComplete
BIND: args
EXECUTE

A failed expectation would discard the command buffer and send an
ExpectationFailed message.

API ways of performing that could vary. Perhaps a
PQqueueQueryExpectNoResults could build up a message buffer, flushed
with either an explicit call to a flush command or by a PQsendQuery /
PQgetResult. Perhaps all of libpq could work with a message queue and
only flush it at specific points (PQgetResult, for instance; multiple
PQsendQuery calls could be turned into expect-command-complete).
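Usage of such a queueing call might look like this
(PQqueueQueryExpectNoResults and PQflushQueue are made-up names, sketched
only to show the shape of the API):

  while (more_data())
  {
      // Queue the command and its expectation; nothing is sent yet.
      if (!PQqueueQueryExpectNoResults(conn, "INSERT ...", ...)) {
        // handle local queueing error
      }
  }
  // Flush the whole queue; a failed expectation comes back as a single
  // ExpectationFailed-style error result.
  if (!PQflushQueue(conn)) {
    // handle server-reported error
  }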



Re: RFC: Async query processing

From:
Florian Weimer
Date:
On 11/04/2013 02:51 AM, Claudio Freire wrote:
> On Sun, Nov 3, 2013 at 3:58 PM, Florian Weimer <fweimer@redhat.com> wrote:
>> I would like to add truly asynchronous query processing to libpq, enabling
>> command pipelining.  The idea is to allow applications to auto-tune to
>> the bandwidth-delay product and reduce the number of context switches when
>> running against a local server.
> ...
>> If the application is not interested in intermediate query results, it would
>> use something like this:
> ...
>> If there is no need to exit from the loop early (say, because errors are
>> expected to be extremely rare), the PQgetResultNoWait call can be left out.
>
> It doesn't seem wise to me to make such a distinction. It sounds like
> you're oversimplifying, and that's why you need "modes": to overcome
> the evidently restrictive limits of the simplified interface. It would
> only be a matter of (a short) time before some other limitation
> requires some other mode.

I need modes because I want to avoid unbounded buffering, which means that 
result data has to be consumed in the order queries are issued.

>>    PGAsyncMode oldMode = PQsetsendAsyncMode(conn, PQASYNC_RESULT);
>>    bool more_data;
>>    do {
>>       more_data = ...;
>>       if (more_data) {
>>         int ret = PQsendQueryParams(conn,
>>           "INSERT ... RETURNING ...", ...);
>>         if (ret == 0) {
>>           // handle low-level error
>>         }
>>       }
>>       // Consume all pending results.
>>       while (1) {
>>         PGresult *res;
>>         if (more_data) {
>>           res = PQgetResultNoWait(conn);
>>         } else {
>>           res = PQgetResult(conn);
>>         }
>
> Somehow, that code looks backwards. I mean, really backwards. Wouldn't
> that be !more_data?

No, if more data is available to transfer to the server, the no-wait 
variant has to be used to avoid a needless synchronization with the server.

> In any case, pipelining like that, without a clear distinction, in the
> wire protocol, of which results pertain to which query, could be a
> recipe for trouble when subtle bugs, either in lib usage or
> implementation, mistakenly treat one query's result as another's.

We already use pipelining in libpq (see pqFlush, PQsendQueryGuts and 
pqParseInput3), the server is supposed to support it, and there is a 
lack of a clear tit-for-tat response mechanism anyway because of 
NOTIFY/LISTEN and the way certain errors are reported.

>> Instead of buffering the results, we could buffer the encoded command
>> messages in PQASYNC_RESULT mode.  This means that PQsendQueryParams would
>> not block when it cannot send the (complete) command message, but store in
>> the connection object so that the subsequent PQgetResultNoWait and
>> PQgetResult would send it.  This might work better with single-tuple result
>> mode.  We cannot avoid buffering either multiple queries or multiple
>> responses if we want to utilize the link bandwidth, or we'd risk deadlocks.
>
> This is a non-solution. Such an implementation, at least as described,
> would remove neither network latency nor context switches; it
> would be a pure API change with no externally visible behavior
> change.

Ugh, why?

> An effective solution must include multi-command packets. Without
> knowing the wire protocol in detail, something like:
>
> PARSE: INSERT blah
> BIND: args
> EXECUTE with DISCARD
> PARSE: INSERT blah
> BIND: args
> EXECUTE with DISCARD
> PARSE: SELECT  blah
> BIND: args
> EXECUTE with FETCH ALL
>
> All in one packet would be efficient and error-free (IMO).

No, because this doesn't scale automatically with the bandwidth-delay 
product.  It also requires that the client buffers queries and their 
parameters even though the network has to do that anyway.

In any case, I don't want to change the wire protocol, I just want to 
enable libpq clients to use more of its capabilities.

-- 
Florian Weimer / Red Hat Product Security Team



Re: RFC: Async query processing

From:
Claudio Freire
Date:
On Wed, Dec 18, 2013 at 1:50 PM, Florian Weimer <fweimer@redhat.com> wrote:
> On 11/04/2013 02:51 AM, Claudio Freire wrote:
>>
>> On Sun, Nov 3, 2013 at 3:58 PM, Florian Weimer <fweimer@redhat.com> wrote:
>>>
>>> I would like to add truly asynchronous query processing to libpq,
>>> enabling
>>> command pipelining.  The idea is to allow applications to auto-tune to
>>> the bandwidth-delay product and reduce the number of context switches
>>> when
>>> running against a local server.
>>
>> ...
>>>
>>> If the application is not interested in intermediate query results, it
>>> would
>>> use something like this:
>>
>> ...
>>>
>>> If there is no need to exit from the loop early (say, because errors are
>>> expected to be extremely rare), the PQgetResultNoWait call can be left
>>> out.
>>
>>
>> It doesn't seem wise to me to make such a distinction. It sounds like
>> you're oversimplifying, and that's why you need "modes": to overcome
>> the evidently restrictive limits of the simplified interface. It would
>> only be a matter of (a short) time before some other limitation
>> requires some other mode.
>
>
> I need modes because I want to avoid unbounded buffering, which means that
> result data has to be consumed in the order queries are issued.
...
> In any case, I don't want to change the wire protocol, I just want to enable
> libpq clients to use more of its capabilities.

I believe you will at least need to use TCP_CORK or some advanced
socket options if you intend to decrease the number of packets without
changing the protocol.

Due to the interactive and synchronized nature of the protocol, TCP
will immediately send the first query in a packet since it's already
ready to do so. Buffering will only happen from the second query
onwards, and this won't benefit a two-query loop as the one in your
sample.
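For reference, corking on Linux would look roughly like this (PQsocket is
real libpq API; TCP_CORK is Linux-specific, and the corking policy around
the batch is only a sketch):

  #include <netinet/in.h>
  #include <netinet/tcp.h>
  #include <sys/socket.h>

  int fd = PQsocket(conn);
  int on = 1, off = 0;

  // Hold small writes back while several commands are queued...
  setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
  // ... issue a batch of PQsendQueryParams() calls here ...
  // ...then uncork, releasing the batch in as few packets as possible.
  setsockopt(fd, IPPROTO_TCP, TCP_CORK, &off, sizeof(off));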

As for expectations, they can be part of the connection object and not
the wire protocol if you wish. The point I was making is that the
expectation should be part of the query call, since that's less error
prone than setting a "discard results" mode. Think of it as
PQsendQueryParams with an extra "async" argument that defaults to
PQASYNC_NOT (ie, sync). There you can tell libpq to expect either no
results, expect and discard them, or whatever. The benefit here is a
simplified usage: your example code will be part of libpq and thus all
this complexity will be hidden from users. Furthermore, libpq will do
the small sanity check of actually verifying that the server returns no
results when none are expected.
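A sketch of what that could look like as a signature (the trailing mode
argument and the PQEXPECT_/PQASYNC_ flags are hypothetical):

  // Like PQsendQueryParams, plus an expectation/async mode argument.
  int PQsendQueryParamsAsync(PGconn *conn, const char *command,
                             int nParams, const Oid *paramTypes,
                             const char *const *paramValues,
                             const int *paramLengths,
                             const int *paramFormats,
                             int resultFormat, int mode);

  // Queue an INSERT whose only interesting outcome is an error:
  PQsendQueryParamsAsync(conn, "INSERT ...", 1, NULL, values,
                         NULL, NULL, 0, PQEXPECT_NO_RESULT | PQASYNC);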

>>>    PGAsyncMode oldMode = PQsetsendAsyncMode(conn, PQASYNC_RESULT);
>>>    bool more_data;
>>>    do {
>>>       more_data = ...;
>>>       if (more_data) {
>>>         int ret = PQsendQueryParams(conn,
>>>           "INSERT ... RETURNING ...", ...);
>>>         if (ret == 0) {
>>>           // handle low-level error
>>>         }
>>>       }
>>>       // Consume all pending results.
>>>       while (1) {
>>>         PGresult *res;
>>>         if (more_data) {
>>>           res = PQgetResultNoWait(conn);
>>>         } else {
>>>           res = PQgetResult(conn);
>>>         }
>>
>>
>> Somehow, that code looks backwards. I mean, really backwards. Wouldn't
>> that be !more_data?
>
> No, if more data is available to transfer to the server, the no-wait variant
> has to be used to avoid a needless synchronization with the server.

Ok, yeah. Now I get it. It's client-side more_data.

>> In any case, pipelining like that, without a clear distinction, in the
>> wire protocol, of which results pertain to which query, could be a
>> recipe for trouble when subtle bugs, either in lib usage or
>> implementation, mistakenly treat one query's result as another's.
>
>
> We already use pipelining in libpq (see pqFlush, PQsendQueryGuts and
> pqParseInput3), the server is supposed to support it, and there is a lack of
> a clear tit-for-tat response mechanism anyway because of NOTIFY/LISTEN and
> the way certain errors are reported.

pqFlush doesn't seem overly related, since the API specifically states
that you cannot queue multiple PQsendQuery. It looks more like
low-level buffering, i.e. when the command itself is larger than the OS
buffer and nonblocking operation requires multiple send() calls for
one PQsendQuery. Am I wrong?

>>> Instead of buffering the results, we could buffer the encoded command
>>> messages in PQASYNC_RESULT mode.  This means that PQsendQueryParams would
>>> not block when it cannot send the (complete) command message, but store
>>> in
>>> the connection object so that the subsequent PQgetResultNoWait and
>>> PQgetResult would send it.  This might work better with single-tuple
>>> result
>>> mode.  We cannot avoid buffering either multiple queries or multiple
>>> responses if we want to utilize the link bandwidth, or we'd risk
>>> deadlocks.
>>
>>
>> This is a non-solution. Such an implementation, at least as described,
>> would not remove neither network latency nor context switches, it
>> would be a purely API change with no externally visible behavior
>> change.
>
>
> Ugh, why?

Oh, sorry. I had this elaborate answer prepared, but I just noticed
it's wrong: you do say "if it cannot send it right away".

So yes, I guess that's quite similar to the kind of buffering I was
talking about anyway.

Still, I'd suggest using TCP_CORK when expecting this kind of usage
pattern, or the first call in your example won't buffer at all. It's
essentially the TCP slow-start issue: unless you've got a great many
queries to pipeline, you won't see the benefit without careful use of
TCP_CORK.

Since TCP_CORK is quite platform-dependent, I'd recommend "corking" on
the library side rather than trusting the network stack.


>> An effective solution must include multi-command packets. Without
>> knowing the wire protocol in detail, something like:
>>
>> PARSE: INSERT blah
>> BIND: args
>> EXECUTE with DISCARD
>> PARSE: INSERT blah
>> BIND: args
>> EXECUTE with DISCARD
>> PARSE: SELECT  blah
>> BIND: args
>> EXECUTE with FETCH ALL
>>
>> All in one packet would be efficient and error-free (IMO).
>
>
> No, because this doesn't scale automatically with the bandwidth-delay
> product.  It also requires that the client buffers queries and their
> parameters even though the network has to do that anyway.

Why not? I'm talking about transport-level packets, btw, not libpq
frames/whatever.

Yes, the network stack will sometimes do that. But it doesn't have
to. It does it sometimes, which is not the same.

And buffering algorithms are quite platform-dependent anyway, so it's
not the best idea to make libpq highly reliant on them.

But yes. You would get the benefit for a large number of queries.

Launch a tcpdump and test it. This is a simple test on the loopback
interface, with Python.

On the server:

>>> import socket
>>> s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>>> s.bind(('',8000))
>>> s.listen(10)
>>> s2 = s.accept()[0]
>>> s2.recv(256)
'hola mundo\n'

On the client:

>>> import socket
>>> s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>>> s.connect(('127.0.0.1',8000))
>>> s.send('hola') ; s.send(' mundo\n')
4
7

Tcpdump output:

15:33:16.112991 IP localhost.49138 > localhost.irdmi: Flags [S], seq
2768629731, win 43690, options [mss 65495,sackOK,TS val 3152304 ecr
0,nop,wscale 7], length 0
E..<.S@.@.Ff...........@.............0.........
.0..........
15:33:16.113004 IP localhost.irdmi > localhost.49138: Flags [S.], seq
840184739, ack 2768629732, win 43690, options [mss 65495,sackOK,TS val
3152304 ecr 3152304,nop,wscale 7], length 0
E..<..@.@.<..........@..2.3..........0.........
.0...0......
15:33:16.113016 IP localhost.49138 > localhost.irdmi: Flags [.], ack
1, win 342, options [nop,nop,TS val 3152304 ecr 3152304], length 0
E..4.T@.@.Fm...........@....2.3....V.(.....
.0...0..
15:34:32.843626 IP localhost.49138 > localhost.irdmi: Flags [P.], seq
1:5, ack 1, win 342, options [nop,nop,TS val 3229034 ecr 3152304],
length 4
E..8.U@.@.Fh...........@....2.3....V.,.....
.1Ej.0..hola
15:34:32.843675 IP localhost.irdmi > localhost.49138: Flags [.], ack
5, win 342, options [nop,nop,TS val 3229035 ecr 3229034], length 0
E..4*.@.@............@..2.3........V.(.....
.1Ek.1Ej
15:34:32.843696 IP localhost.49138 > localhost.irdmi: Flags [P.], seq
5:12, ack 1, win 342, options [nop,nop,TS val 3229035 ecr 3229035],
length 7
E..;.V@.@.Fd...........@....2.3....V./.....
.1Ek.1Ek mundo

15:34:32.843701 IP localhost.irdmi > localhost.49138: Flags [.], ack
12, win 342, options [nop,nop,TS val 3229035 ecr 3229035], length 0
E..4*.@.@............@..2.3........V.(.....
.1Ek.1Ek

See how there are two packets and two ACKs.

On Ethernet, it's the same, except the server doesn't even get the whole
"hola mundo", but just the first "hola", on the first recv, because of
network delay.

So, trusting the network stack to do the quick start won't work. For
steady streams of queries, it will work. But not for short bursts,
which will be the most heavily used case I believe (most apps create
short bursts of inserts and not continuous streams at full bandwidth).



Re: RFC: Async query processing

From:
Florian Weimer
Date:
On 01/02/2014 07:52 PM, Claudio Freire wrote:

>> No, because this doesn't scale automatically with the bandwidth-delay
>> product.  It also requires that the client buffers queries and their
>> parameters even though the network has to do that anyway.
>
> Why not? I'm talking about transport-level packets, btw, not libpq
> frames/whatever.
>
> Yes, the network stack will sometimes do that. But it doesn't have
> to. It does it sometimes, which is not the same.

The network inevitably buffers because the speed of light is not infinite.

Here's a concrete example.  Suppose the server is 100ms away, and you 
want to send data at a constant rate of 10 Mbps.  The server needs to 
acknowledge the data you sent, but this acknowledgment arrives after 200 
ms.  As a result, you've sent 2 Mbits before the acknowledgment arrives, 
so the network appears to have buffered 250 KB.  This effect can 
actually be used for data storage, called "delay line memory", but it is 
somewhat out of fashion now.

> And buffering algorithms are quite platform-dependent anyway, so it's
> not the best idea to make libpq highly reliant on them.

That is why I think libpq needs to keep sending until the first response 
from the server arrives.  Batching a fixed number of INSERTs together in 
a single conceptual query does not achieve auto-tuning to the buffering 
characteristics of the path.

> So, trusting the network stack to do the quick start won't work. For
> steady streams of queries, it will work. But not for short bursts,
> which will be the most heavily used case I believe (most apps create
> short bursts of inserts and not continuous streams at full bandwidth).

Loading data into the database isn't such an uncommon task.  Not 
everything is OLTP.

-- 
Florian Weimer / Red Hat Product Security Team



Re: RFC: Async query processing

From:
Claudio Freire
Date:
On Fri, Jan 3, 2014 at 10:22 AM, Florian Weimer <fweimer@redhat.com> wrote:
> On 01/02/2014 07:52 PM, Claudio Freire wrote:
>
>>> No, because this doesn't scale automatically with the bandwidth-delay
>>> product.  It also requires that the client buffers queries and their
>>> parameters even though the network has to do that anyway.
>>
>>
>> Why not? I'm talking about transport-level packets, btw, not libpq
>> frames/whatever.
>>
>> Yes, the network stack will sometimes do that. But it doesn't have
>> to. It does it sometimes, which is not the same.
>
>
> The network inevitably buffers because the speed of light is not infinite.
>
> Here's a concrete example.  Suppose the server is 100ms away, and you want
> to send data at a constant rate of 10 Mbps.  The server needs to acknowledge
> the data you sent, but this acknowledgment arrives after 200 ms.  As a
> result, you've sent 2 Mbits before the acknowledgment arrives, so the
> network appears to have buffered 250 KB.  This effect can actually be used
> for data storage, called "delay line memory", but it is somewhat out of
> fashion now.
...
>> So, trusting the network stack to do the quick start won't work. For
>> steady streams of queries, it will work. But not for short bursts,
>> which will be the most heavily used case I believe (most apps create
>> short bursts of inserts and not continuous streams at full bandwidth).
>
>
> Loading data into the database isn't such an uncommon task.  Not everything
> is OLTP.

Truly, but a sustained insert stream of 10 Mbps is certainly way
beyond common non-OLTP loads. This is far more specific than non-OLTP.

Buffering will benefit the vast majority of applications that don't do
steady, sustained query streams, which is the vast majority of
applications. An ORM doing a flush falls in this category, so it's an
overwhelmingly common case.

>> And buffering algorithms are quite platform-dependent anyway, so it's
>> not the best idea to make libpq highly reliant on them.
>
>
> That is why I think libpq needs to keep sending until the first response
> from the server arrives.  Batching a fixed number of INSERTs together in a
> single conceptual query does not achieve auto-tuning to the buffering
> characteristics of the path.

Not on its own, but it does improve throughput during slow start, which
benefits OLTP, which is a hugely common use case. As you say, the
network will then auto-tune when the query stream is consistent
enough, so what's the problem with explicitly buffering a little then?



Re: RFC: Async query processing

From:
Tom Lane
Date:
Claudio Freire <klaussfreire@gmail.com> writes:
> On Fri, Jan 3, 2014 at 10:22 AM, Florian Weimer <fweimer@redhat.com> wrote:
>> Loading data into the database isn't such an uncommon task.  Not everything
>> is OLTP.

> Truly, but a sustained insert stream of 10 Mbps is certainly way
> beyond common non-OLTP loads. This is far more specific than non-OLTP.

I think Florian has a good point there, and the reason is this: what
you are talking about will be of exactly zero use to applications that
want to see the results of one query before launching the next.  Which
eliminates a whole lot of apps.  I suspect that almost the *only*
common use case in which a stream of queries can be launched without
feedback is going to be bulk data loading.  It's not clear at all
that pipelining the PQexec code path is the way to better performance
for that --- why not use COPY, instead?

Or to put it another way, I don't subscribe to "if you build it they
will come" for this proposed feature.  I think that making any use of
it would be so complex and error-prone that the vast majority of apps
won't bother.  Before we start adding a lot of complexity to libpq's
API and internals to support this, you need to make a better case
that there would be a significant number of users.
        regards, tom lane



Re: RFC: Async query processing

From:
Florian Weimer
Date:
On 01/03/2014 04:20 PM, Tom Lane wrote:

> I think Florian has a good point there, and the reason is this: what
> you are talking about will be of exactly zero use to applications that
> want to see the results of one query before launching the next.  Which
> eliminates a whole lot of apps.  I suspect that almost the *only*
> common use case in which a stream of queries can be launched without
> feedback is going to be bulk data loading.  It's not clear at all
> that pipelining the PQexec code path is the way to better performance
> for that --- why not use COPY, instead?

The data I encounter has to be distributed across multiple tables. 
Switching between the COPY commands would again need client-side 
buffering and heuristics for sizing these buffers.  Lengths of runs vary 
a lot in my case.

I also want to use binary mode as far as possible to avoid the integer 
conversion overhead, but some columns use custom enum types and are 
better transferred in text mode.

Some INSERTs happen via stored procedures, to implement de-duplication.

These issues could be addressed by using temporary staging tables. 
However, when I did that in the past, this caused pg_shdepend bloat. 
Carefully reusing them when possible might avoid that.  Again, due to 
the variance in lengths of runs, the staging tables are not always 
beneficial.

I understand that pipelining introduces complexity.  But solving the 
issues described above is no picnic, either.

-- 
Florian Weimer / Red Hat Product Security Team



Re: RFC: Async query processing

From:
Merlin Moncure
Date:
On Fri, Jan 3, 2014 at 9:46 AM, Florian Weimer <fweimer@redhat.com> wrote:
> On 01/03/2014 04:20 PM, Tom Lane wrote:
>
>> I think Florian has a good point there, and the reason is this: what
>> you are talking about will be of exactly zero use to applications that
>> want to see the results of one query before launching the next.  Which
>> eliminates a whole lot of apps.  I suspect that almost the *only*
>> common use case in which a stream of queries can be launched without
>> feedback is going to be bulk data loading.  It's not clear at all
>> that pipelining the PQexec code path is the way to better performance
>> for that --- why not use COPY, instead?
>
>
> The data I encounter has to be distributed across multiple tables. Switching
> between the COPY commands would again need client-side buffering and
> heuristics for sizing these buffers.  Lengths of runs vary a lot in my case.
>
> I also want to use binary mode as far as possible to avoid the integer
> conversion overhead, but some columns use custom enum types and are better
> transferred in text mode.
>
> Some INSERTs happen via stored procedures, to implement de-duplication.
>
> These issues could be addressed by using temporary staging tables. However,
> when I did that in the past, this caused pg_shdepend bloat. Carefully
> reusing them when possible might avoid that.  Again, due to the variance in
> lengths of runs, the staging tables are not always beneficial.
>
> I understand that pipelining introduces complexity.  But solving the issues
> described above is no picnic, either.

Maybe consider using libpqtypes (http://libpqtypes.esilo.com/)?  It
transfers most everything in binary (enums notably are handled as
strings).  A typical usage of libpqtypes would be to arrange multiple
records into an array on the client then hand them off to a stored
procedure on the server side (perhaps over an asynchronous call while
you assemble the next batch).  libpqtypes was written for C
applications with very high performance requirements (for
non-performance-critical cases we might use json instead).  In my
experience it's not too difficult to arrange an assembly/push loop
that amortizes the round trip overhead to zero; it's not as efficient
as COPY but much more flexible and will blow away any scheme that
sends data row per query.

I agree with Tom that major changes to the libpq network stack are
probably not a good idea.

merlin



Re: RFC: Async query processing

From:
Claudio Freire
Date:
On Fri, Jan 3, 2014 at 12:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Claudio Freire <klaussfreire@gmail.com> writes:
>> On Fri, Jan 3, 2014 at 10:22 AM, Florian Weimer <fweimer@redhat.com> wrote:
>>> Loading data into the database isn't such an uncommon task.  Not everything
>>> is OLTP.
>
>> Truly, but a sustained insert stream of 10 Mbps is certainly way
>> beyond common non-OLTP loads. This is far more specific than non-OLTP.
>
> I think Florian has a good point there, and the reason is this: what
> you are talking about will be of exactly zero use to applications that
> want to see the results of one query before launching the next.  Which
> eliminates a whole lot of apps.  I suspect that almost the *only*
> common use case in which a stream of queries can be launched without
> feedback is going to be bulk data loading.  It's not clear at all
> that pipelining the PQexec code path is the way to better performance
> for that --- why not use COPY, instead?

You're forgetting ORM workloads.

ORMs can usually plan the inserts in a sequence that both doesn't
require feedback (except the knowledge that the inserts were successful)
and doesn't violate constraints.

Flushing a whole object hierarchy for instance, can be done without
feedback. Not even serial columns need feedback, since many ORMs
(SQLAlchemy, Hibernate) support allocation of ID sequences in batches
(by issuing a proper select nextval).

I agree that, with the proposed API, it's too error-prone to be
useful. But I also think that, if the API is simple and fool-proof
enough, it could be a case of "build it and they will come". I know
I'll be happy to implement support for SQLAlchemy (since it will
benefit me), if the API resembles the proposition below (at least in
simplicity).

Per-query expectations could be such a thing. And it can even work with PQexec:

PQexec(con, "SELECT nextval('a_id_seq') FROM generate_series(1,10);");
--read--
PQexec(con, "SELECT nextval('b_id_seq') FROM generate_series(1,10);");
--read--
PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
... 9 times...
PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC);
do {  // do something useful
} while (PQflush());

Here, the PQASYNC flag would temporarily switch to non-blocking I/O,
and buffer what cannot be sent. PQASYNC_CORK would only buffer (only
send if the buffer is full). After any ASYNC call, PQflush would be
necessary (to flush the send queue and to consume the expected
responses), but I can imagine any synchronous call (PQexec,
PQsendQuery or whatever) could detect a non-empty buffer and just
blockingly flush right there.

This can benefit many useful patterns. ORM flush, is one, if there can
be preallocation of IDs (which I know at least SQLAlchemy and
Hibernate both support).

Execute-many of prepared statements is another one, quite common.

I'm not sure what would happen if one of the queries returned an
error. If in a transaction, all the following queries would error out
I'd imagine. If not, they would simply be executed blindly... am I
correct?



Re: RFC: Async query processing

From:
Merlin Moncure
Date:
On Fri, Jan 3, 2014 at 11:06 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Fri, Jan 3, 2014 at 12:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Claudio Freire <klaussfreire@gmail.com> writes:
>>> On Fri, Jan 3, 2014 at 10:22 AM, Florian Weimer <fweimer@redhat.com> wrote:
>>>> Loading data into the database isn't such an uncommon task.  Not everything
>>>> is OLTP.
>>
>>> Truly, but a sustained insert stream of 10 Mbps is certainly way
>>> beyond common non-OLTP loads. This is far more specific than non-OLTP.
>>
>> I think Florian has a good point there, and the reason is this: what
>> you are talking about will be of exactly zero use to applications that
>> want to see the results of one query before launching the next.  Which
>> eliminates a whole lot of apps.  I suspect that almost the *only*
>> common use case in which a stream of queries can be launched without
>> feedback is going to be bulk data loading.  It's not clear at all
>> that pipelining the PQexec code path is the way to better performance
>> for that --- why not use COPY, instead?
>
> You're forgetting ORM workloads.
>
> ORMs can usually plan the inserts in a sequence that both doesn't
> require feedback (except the knowledge that the inserts were successful)
> and doesn't violate constraints.
>
> Flushing a whole object hierarchy for instance, can be done without
> feedback. Not even serial columns need feedback, since many ORMs
> (SQLAlchemy, Hibernate) support allocation of ID sequences in batches
> (by issuing a proper select nextval).
>
> I agree that, with the proposed API, it's too error-prone to be
> useful. But I also think that, if the API is simple and fool-proof
> enough, it could be a case of "build it and they will come". I know
> I'll be happy to implement support for SQLAlchemy (since it will
> benefit me), if the API resembles the proposition below (at least in
> simplicity).
>
> Per-query expectations could be such a thing. And it can even work with PQexec:
>
> PQexec(con, "SELECT nextval('a_id_seq') FROM generate_series(1,10);");
> --read--
> PQexec(con, "SELECT nextval('b_id_seq') FROM generate_series(1,10);");
> --read--
> PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
> PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
> PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
> PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
> PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
> PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
> ... 9 times...
> PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
> PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC);
> do {
>    // do something useful
> } while (PQflush());
>
> Here, the PQASYNC flag would temporarily switch to non-blocking I/O,
> and buffer what cannot be sent. PQASYNC_CORK would only buffer (only
> send if the buffer is full). After any ASYNC call, PQflush would be
> necessary (to flush the send queue and to consume the expected
> responses), but I can imagine any synchronous call (PQexec,
> PQsendQuery or whatever) could detect a non-empty buffer and just
> blockingly flush right there.
>
> This can benefit many useful patterns. ORM flush, is one, if there can
> be preallocation of IDs (which I know at least SQLAlchemy and
> Hibernate both support).
>
> Execute-many of prepared statements is another one, quite common.
>
> I'm not sure what would happen if one of the queries returned an
> error. If in a transaction, all the following queries would error out
> I'd imagine. If not, they would simply be executed blindly... am I
> correct?

Long term, I'd rather see an optimized 'ORM flush' assemble the data
into a structured data set (perhaps a JSON document) and pass it to
some receiving routine that decomposed it into records.  This is a
better way to do things on so many levels.  Maybe I'm an old cranky
guy yelling at pigeons, but I don't think the current approach that
many ORMs take is going to withstand the test of time.

merlin



Re: RFC: Async query processing

From:
Martijn van Oosterhout
Date:
On Fri, Jan 03, 2014 at 04:46:23PM +0100, Florian Weimer wrote:
> On 01/03/2014 04:20 PM, Tom Lane wrote:
>
> >I think Florian has a good point there, and the reason is this: what
> >you are talking about will be of exactly zero use to applications that
> >want to see the results of one query before launching the next.  Which
> >eliminates a whole lot of apps.  I suspect that almost the *only*
> >common use case in which a stream of queries can be launched without
> >feedback is going to be bulk data loading.  It's not clear at all
> >that pipelining the PQexec code path is the way to better performance
> >for that --- why not use COPY, instead?
>
> The data I encounter has to be distributed across multiple tables.
> Switching between the COPY TO commands would again need client-side
> buffering and heuristics for sizing these buffers.  Lengths of runs
> vary a lot in my case.

Why switch between COPY commands? Why could you not do it in one? For
example:

COPY table1(col1, col2, ...),
     table2(col1, col2, ...)
FROM STDIN WITH (tableoids);
tableoid1<tab>col1<tab>col2...
tableoid2<tab>...
...
\.

There's no especially good reason why a COPY can only write to one
table. In this way you provide, per row, which table this row should go
to.

There's always the issue of generated primary keys. So you could, if
you wanted to, do:

COPY table1(col1, col2, ...) RETURNING pkey1,
     table2(col1, col2, ...)
FROM STDIN WITH (tableoids);
tableoid1<tab>col1<tab>col2...
tableoid2<tab>\K <tableoid><tab>...
...
\.

So, like we have \N for NULL, a \K <oid> would be the value of the
RETURNING column for the table with that oid.

This may be way outfield, but we have a streaming interface, there's no
reason why we can't extend it.

There's also the idea of WITH x AS (COPY ... FROM STDIN) which you
could do with a similar multiplexing of rows in one stream. This would
be extremely powerful together with MERGE.

Have a nice way,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.  -- Arthur Schopenhauer

Re: RFC: Async query processing

From:
Craig Ringer
Date:
On 01/04/2014 01:06 AM, Claudio Freire wrote:
> You're forgetting ORM workloads.

I'm impressed that you've come up with an area where ORMs are beneficial ;-)

JDBC also has a statement batching interface. Right now PgJDBC just
unwraps the batch and runs each query individually. Any async-support
improvements server-side should probably consider the need to execute
a batch. The batch might be one PreparedStatement with many different
parameters, or it might be a series of unrelated statements. A way for
PgJDBC to run the batch without syncing with the server after each query
would be really helpful.

So would a way to BIND an array of parameters, so we could execute a
prepared statement once with multiple parameters and then sync up with
the server after all executions.

As for ORMs benefiting from this: Remember that Npgsql and PgJDBC don't
use libpq. So the libpq changes would only help ORMs based on things
like Python (psycopg2), Ruby (Pg gem), etc., where they're using libpq
wrapper drivers.

> Execute-many of prepared statements is another one, quite common.

That's the case I'd really love to see proper server-side batch support
for. BIND_MULTIPLE, EXECUTE.

> I'm not sure what would happen if one of the queries returned an
> error. If in a transaction, all the following queries would error out
> I'd imagine. If not, they would simply be executed blindly... am I
> correct?

It's not just dealing with erroring out. Many ORMs look at the count of
rows affected to detect whether an operation conflicted with another
concurrent operation when optimistic concurrency control is in use. E.g.
   UPDATE t SET x = 'fred' WHERE rowversion = 4;

will be seen to "fail" if it reports that it's affected zero rows. This
is one of the reasons ORM users have such serious problems with the
write side of our partitioning support - we discard affected row counts,
and many ORMs don't deal well with that.

At least in JDBC, executeBatch returns an array of rowcounts. So you
can't throw away affected row counts when running batches, they must be
returned to the client. Doesn't matter if it's a single protocol message
with a list of IDs, or a series of individual messages, though.

-- 
Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: RFC: Async query processing

From:
Craig Ringer
Date:
On 01/04/2014 01:22 AM, Merlin Moncure wrote:
> Long term, I'd rather see an optimized 'ORM flush' assemble the data
> into a structured data set (perhaps a JSON document) and pass it to
> some receiving routine that decomposed it into records.

The same is true on the input side. I'd much rather be sending an ORM
client a big JSON / YAML / whatever graph than a horrible,
duplication-filled chained LEFT JOIN projection like they currently rely
on. When they're not just doing n+1 selects, which is worse.

I think that's really a side-issue though. ORMs aren't going to change
in a hurry, and batching / fire-and-forget support is good for all sorts
of other jobs too.

-- 
Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: RFC: Async query processing

From:
Greg Stark
Date:
On Fri, Jan 3, 2014 at 3:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think Florian has a good point there, and the reason is this: what
> you are talking about will be of exactly zero use to applications that
> want to see the results of one query before launching the next.

There are techniques for handling that actually. For a better
explanation than I can give, see
http://kentonv.github.io/capnproto/rpc.html. It's mostly a language
feature but it does require support from the protocol to be able to
reference data in earlier responses in your subsequent requests.

You can pull this trick by having an RPC server near the database so
all the latency is handled by the RPC server and not the database
connection. But it seems to me that that's unnecessary complication
and shouldn't be necessary. The database protocol basically is an RPC
layer.


-- 
greg



Re: RFC: Async query processing

From:
Florian Weimer
Date:
On 01/05/2014 03:11 PM, Greg Stark wrote:
> On Fri, Jan 3, 2014 at 3:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think Florian has a good point there, and the reason is this: what
>> you are talking about will be of exactly zero use to applications that
>> want to see the results of one query before launching the next.
>
> There are techniques for handling that actually. For a better
> explanation than I can do see
> http://kentonv.github.io/capnproto/rpc.html. It's mostly a language
> feature but it does require support from the protocol to be able to
> reference data in earlier responses in your subsequent requests.

Purely from a perspective of latency avoidance, sub-queries, WITH or 
stored procedures can achieve the same thing, and work even if the 
intermediate result has to undergo some transformation. :-)

-- 
Florian Weimer / Red Hat Product Security Team



Re: RFC: Async query processing

From:
Florian Weimer
Date:
On 01/04/2014 04:39 PM, Martijn van Oosterhout wrote:

> Why switch between COPY commands, why could you not do it in one? For
> example:
>
> COPY table1(col1, col2, ...),
>       table2(col1, col2, ...)
> FROM STDIN WITH (tableoids);
> tableoid1<tab>col1<tab>col2...
> tableoid2<tab>...
> ...
> \.

My original idea was to avoid any server-side changes, so that 
applications do not need fallback code for talking to old servers.

-- 
Florian Weimer / Red Hat Product Security Team



Re: RFC: Async query processing

From:
David Fetter
Date:
On Fri, Jan 03, 2014 at 03:06:11PM -0200, Claudio Freire wrote:
> On Fri, Jan 3, 2014 at 12:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Claudio Freire <klaussfreire@gmail.com> writes:
> >> On Fri, Jan 3, 2014 at 10:22 AM, Florian Weimer <fweimer@redhat.com> wrote:
> >>> Loading data into the database isn't such an uncommon task.  Not everything
> >>> is OLTP.
> >
> >> Truly, but a sustained insert stream of 10 Mbps is certainly way
> >> beyond common non-OLTP loads. This is far more specific than non-OLTP.
> >
> > I think Florian has a good point there, and the reason is this: what
> > you are talking about will be of exactly zero use to applications that
> > want to see the results of one query before launching the next.  Which
> > eliminates a whole lot of apps.  I suspect that almost the *only*
> > common use case in which a stream of queries can be launched without
> > feedback is going to be bulk data loading.  It's not clear at all
> > that pipelining the PQexec code path is the way to better performance
> > for that --- why not use COPY, instead?
> 
> You're forgetting ORM workloads.
> 
> ORMs can usually plan the inserts in a sequence that both doesn't
> require feedback (except the knowledge that the inserts were successful)
> and doesn't violate constraints.
> 
> Flushing a whole object hierarchy for instance, can be done without
> feedback. Not even serial columns need feedback, since many ORMs
> (SQLAlchemy, Hibernate) support allocation of ID sequences in batches
> (by issuing a proper select nextval).

This is already doable using data-modifying WITH clauses.  Whether
ORMs are (or should be made) smart enough to take advantage of this is
probably out of the scope of this discussion, though.  For what it's
worth, I'm all in favor of orthogonality in languages, so +1 from me.
Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: RFC: Async query processing

From:
Florian Weimer
Date:
On 01/05/2014 01:56 PM, Craig Ringer wrote:

> JDBC also has a statement batching interface. Right now PgJDBC just
> unwraps the batch and runs each query individually. Any async-support
> improvements server-side should probably consider the need of executing
> a batch. The batch might be one PreparedStatement with many different
> parameters, or it might be a series of unrelated statements. A way for
> PgJDBC to run the batch without syncing with the server after each query
> would be really helpful.

The server already supports this because it supports pipelining.

Anyway, I have now switched my application to COPY FROM STDIN where 
possible, and the increase in throughput is phenomenal—even though it is 
running completely locally.  I could squeeze out more round-trips if I 
had an asynchronous query facility in libpq.
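For comparison, a minimal sketch of that COPY path using the existing
libpq copy functions (the table and the row-producing helpers are made
up; error handling is abbreviated):

  PGresult *res = PQexec(conn, "COPY items (a, b) FROM STDIN");
  if (PQresultStatus(res) != PGRES_COPY_IN) {
    // handle error
  }
  PQclear(res);

  while (have_rows()) {
    const char *line = next_row_as_text();   // e.g. "1\t2\n"
    if (PQputCopyData(conn, line, strlen(line)) != 1) {
      // handle error
    }
  }
  if (PQputCopyEnd(conn, NULL) != 1) {
    // handle error
  }

  res = PQgetResult(conn);                   // PGRES_COMMAND_OK on success
  if (PQresultStatus(res) != PGRES_COMMAND_OK) {
    // handle error
  }
  PQclear(res);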

Feedback in this thread was, "we want something like this in libpq, but 
not the thing you proposed".  But there have been no concrete 
counter-proposals, and some of the responses did not take into account 
the inherent complexities of round-trip avoidance.  So I'm not sure how 
to move this topic forward.

-- 
Florian Weimer / Red Hat Product Security Team



Re: RFC: Async query processing

From:
Claudio Freire
Date:
On Tue, Apr 22, 2014 at 8:19 AM, Florian Weimer <fweimer@redhat.com> wrote:
> Feedback in this thread was, "we want something like this in libpq, but not
> the thing you proposed".  But there have been no concrete counter-proposals,
> and some of the responses did not take into account the inherent
> complexities of round-trip avoidance.  So I'm not sure how to move this
> topic forward.

What exactly do you mean by not taking into account?

I do not believe you responded to my proposed interface. I'm not
requesting a change in roundtrip avoidance measures, just the
interface.



Re: RFC: Async query processing

From:
Florian Weimer
Date:
On 01/03/2014 06:06 PM, Claudio Freire wrote:

> Per-query expectations could be such a thing. And it can even work with PQexec:
>
> PQexec(con, "SELECT nextval('a_id_seq') FROM generate_series(1,10);");
> --read--
> PQexec(con, "SELECT nextval('b_id_seq') FROM generate_series(1,10);");
> --read--
> PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
> PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
> PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
> PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
> PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
> PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
> ... 9 times...
> PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
> PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC);
> do {
>     // do something useful
> } while (PQflush());
>
> Here, the PQASYNC flag would temporarily switch to non-blocking I/O,
> and buffer what cannot be sent. PQASYNC_CORK would only buffer (only
> send if the buffer is full). After any ASYNC call, PQflush would be
> necessary (to flush the send queue and to consume the expected
> responses), but I can imagine any synchronous call (PQexec,
> PQsendQuery or whatever) could detect a non-empty buffer and just
> blockingly flush right there.

How would you consume results once they arrive?  I think this only 
covers the no-result case, and it has since come to my attention that 
the Node.js folks are looking for general async response processing.

(In order to implement your proposal, we'd have to add quite a few 
additional APIs to include the flag argument, but that's a minor detail.)

-- 
Florian Weimer / Red Hat Product Security Team



Re: RFC: Async query processing

From:
Florian Weimer
Date:
On 04/22/2014 07:03 PM, Claudio Freire wrote:
> On Tue, Apr 22, 2014 at 8:19 AM, Florian Weimer <fweimer@redhat.com> wrote:
>> Feedback in this thread was, "we want something like this in libpq, but not
>> the thing you proposed".  But there have been no concrete counter-proposals,
>> and some of the responses did not take into account the inherent
>> complexities of round-trip avoidance.  So I'm not sure how to move this
>> topic forward.
>
> What exactly do you mean by not taking into account?

A couple of them were along the lines "let's just send batches of 
queries and sync between batches".  This does not seem very helpful to 
me because sizing the batches is difficult, and the sizes could be quite 
large.

> I do not believe you responded to my proposed interface. I'm not
> requesting a change in roundtrip avoidance measures, just the
> interface.

I've sent a reply now, I think I missed this part.  Sorry.

-- 
Florian Weimer / Red Hat Product Security Team



Re: RFC: Async query processing

From:
Claudio Freire
Date:
On Tue, Apr 22, 2014 at 3:45 PM, Florian Weimer <fweimer@redhat.com> wrote:
> On 01/03/2014 06:06 PM, Claudio Freire wrote:
>
>> Per-query expectations could be such a thing. And it can even work with
>> PQexec:
>>
>> PQexec(con, "SELECT nextval('a_id_seq') FROM generate_series(1,10);");
>> --read--
>> PQexec(con, "SELECT nextval('b_id_seq') FROM generate_series(1,10);");
>> --read--
>> PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
>> PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
>> PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
>> PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
>> PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
>> PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
>> ... 9 times...
>> PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
>> PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC);
>> do {
>>     // do something useful
>> } while (PQflush());
>>
>> Here, the PQASYNC flag would temporarily switch to non-blocking I/O,
>> and buffer what cannot be sent. PQASYNC_CORK would only buffer (only
>> send if the buffer is full). After any ASYNC call, PQflush would be
>> necessary (to flush the send queue and to consume the expected
>> responses), but I can imagine any synchronous call (PQexec,
>> PQsendQuery or whatever) could detect a non-empty buffer and just
>> blockingly flush right there.
>
>
> How would you consume results once they arrive?  I think this only covers
> the no-result case,

You could do PQEXPECT_ROWCOUNT for storing rowcounts (and supply a
pointer to a result buffer), or PQ_BUFFER_RESULTS and do the same. The
user would have to know beforehand the size of the result set (or an
upper bound of it), and readiness notification would also need to be
solved. There could also be PQEXPECT_DISCARD_RESULTS.
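Roughly, with made-up names again, an expectation that captures a
rowcount might look like:

  int rowcount;
  PQexec(con, "UPDATE ...;", PQEXPECT_ROWCOUNT | PQASYNC_CORK, &rowcount);
  // rowcount is filled in once the corresponding response is consumed.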

Alternatively, you could use a callback, node-style, and it would
solve everything (including readiness and error notification), but I'm
not sure about the portability of such a thing. Callbacks certainly
would be tricky when ABI compatibility has to be maintained. It would
however be a much better interface.

The pattern here is that the API needs to perform all the magic and
complex buffering and flushing; it should not be on the application
side.
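A sketch of the callback shape (the typedef and function are hypothetical
additions, not existing libpq API):

  typedef void (*PQresultCallback)(PGconn *conn, PGresult *res, void *arg);

  // Like PQsendQueryParams, but registers a callback that libpq would
  // invoke as each response is parsed, matching results to queries in
  // send order via an internal callback queue.
  int PQsendQueryParamsCB(PGconn *conn, const char *command,
                          int nParams, const Oid *paramTypes,
                          const char *const *paramValues,
                          const int *paramLengths,
                          const int *paramFormats,
                          int resultFormat,
                          PQresultCallback cb, void *cb_arg);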

> and it has since come to my attention that the Node.js
> folks are looking for general async response processing.

Node support would take a little more work. Specifically, for node to
work with this API, the non-blocking case has to be handled properly,
allowing node to wait on the FDs instead of requiring it to flush and
block on the event loop thread.

That means a buffer at least as large as the query parameters, which
should be no problem (but might be tricky to implement), and handling
the equivalent of EWOULDBLOCK at the PQexec(.., PQASYNC) calls.

In any case, on any specific connection, query processing is linear.
So you really need a "result callback queue" (however you implement
it, be it in the application or the API). What I propose is moving as much as
possible to the API, since it will be common to all users of the async
functionality, and it will make it possible to fix bugs in that code
centrally too.

My earlier examples were all about discarding results, because that's
what enables the most throughput, and it covers lots of cases. But, as
has been mentioned in previous posts, rowcounts at the very least have
to be handled as well, so there's that. I guess we can throw in
generic result callbacks (and errbacks) if the ABI allows it, and it
will be a net win in clarity and simplicity.


On Tue, Apr 22, 2014 at 3:49 PM, Florian Weimer <fweimer@redhat.com> wrote:
> On 04/22/2014 07:03 PM, Claudio Freire wrote:
>>
>> On Tue, Apr 22, 2014 at 8:19 AM, Florian Weimer <fweimer@redhat.com>
>> wrote:
>>>
>>> Feedback in this thread was, "we want something like this in libpq, but
>>> not
>>> the thing you proposed".  But there have been no concrete
>>> counter-proposals,
>>> and some of the responses did not take into account the inherent
>>> complexities of round-trip avoidance.  So I'm not sure how to move this
>>> topic forward.
>>
>>
>> What exactly do you mean by not taking into account?
>
>
> A couple of them were along the lines "let's just send batches of queries
> and sync between batches".  This does not seem very helpful to me because
> sizing the batches is difficult, and the sizes could be quite large.

Not at all.

The proposal certainly has that in its examples, but the API can be
used with no explicit synchronization. That's what I was going for
when I suggested that other API calls could flush implicitly when
needed. If you never call flush, flush happens on its own when
necessary. If you never call synchronous API, you never wait for
replies. Though when you need the results (as in right now), you need
to flush explicitly, there's no way around that.

Also, feel free to consider all this mere opinion. I'm certainly not
the one implementing it ;-)