Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From: Andres Freund
Subject: Re: [HACKERS] PATCH: Batch/pipelining support for libpq
Msg-id: 20170823094053.wug6junk35346ajz@alap3.anarazel.de
In response to: Re: [HACKERS] PATCH: Batch/pipelining support for libpq  (Vaishnavi Prabakaran <vaishnaviprabakaran@gmail.com>)
Responses: Re: [HACKERS] PATCH: Batch/pipelining support for libpq  (Vaishnavi Prabakaran <vaishnaviprabakaran@gmail.com>)
List: pgsql-hackers
Hi,

On 2017-08-10 15:23:06 +1000, Vaishnavi Prabakaran wrote:
> Andres Freund wrote :
> > If you were to send a gigabyte of queries, it'd buffer them all up in
> > memory... So some more intelligence is going to be needed.
> 
> Firstly, sorry for the delayed response as I got busy with other
> activities.

No worries - development of new features was slowed down anyway, due to
the v10 feature freeze.


> To buffer up the queries before flushing them to the socket, I think
> "conn->outCount>=65536" is ok to use, as 64k is considered to be safe in
> Windows as per comments in pqSendSome() API.
> 
> Attached the code patch to replace pqFlush calls with pqBatchFlush in the
> asynchronous libpq function calls flow.
> Still pqFlush is used in "PQbatchSyncQueue" and
> "PQexitBatchMode" functions.


> Am failing to see the benefit in allowing user to set
> PQBatchAutoFlush(true|false) property? Is it really needed?

I'm inclined not to introduce that for now. If somebody comes up with a
convincing use case and numbers, we can add it later. The libpq API is set
in stone, so I'd rather not introduce unnecessary stuff...



> +   <para>
> +    Much like asynchronous query mode, there is no performance disadvantage to
> +    using batching and pipelining. It increases client application complexity
> +    and extra caution is required to prevent client/server deadlocks but
> +    can sometimes offer considerable performance improvements.
> +   </para>

That's not necessarily true, is it? Unless you count always doing
batches of exactly size 1.


> +   <para>
> +    Batching is most useful when the server is distant, i.e. network latency
> +    (<quote>ping time</quote>) is high, and when many small operations are being performed in
> +    rapid sequence. There is usually less benefit in using batches when each
> +    query takes many multiples of the client/server round-trip time to execute.
> +    A 100-statement operation run on a server 300ms round-trip-time away would take
> +    30 seconds in network latency alone without batching; with batching it may spend
> +    as little as 0.3s waiting for results from the server.
> +   </para>

I'd add a remark that this is frequently beneficial even in cases of
minimal latency - as e.g. shown by the numbers I presented upthread.


> +   <para>
> +    Use batches when your application does lots of small
> +    <literal>INSERT</literal>, <literal>UPDATE</literal> and
> +    <literal>DELETE</literal> operations that can't easily be transformed into
> +    operations on sets or into a
> +    <link linkend="libpq-copy"><literal>COPY</literal></link> operation.
> +   </para>

Aren't SELECTs also a major beneficiary of this?


> +   <para>
> +    Batching is less useful when information from one operation is required by the
> +    client before it knows enough to send the next operation.

s/less/not/


> +   <note>
> +    <para>
> +     The batch API was introduced in PostgreSQL 10.0, but clients using PostgresSQL 10.0 version of libpq can
> +     use batches on server versions 8.4 and newer. Batching works on any server
> +     that supports the v3 extended query protocol.
> +    </para>
> +   </note>

Where's the 8.4 coming from?


> +   <para>
> +    The client uses libpq's asynchronous query functions to dispatch work,
> +    marking the end of each batch with <function>PQbatchSyncQueue</function>.
> +    And to get results, it uses <function>PQgetResult</function> and
> +    <function>PQbatchProcessQueue</function>. It may eventually exit
> +    batch mode with <function>PQexitBatchMode</function> once all results are
> +    processed.
> +   </para>
> +
> +   <note>
> +    <para>
> +     It is best to use batch mode with <application>libpq</application> in
> +     <link linkend="libpq-pqsetnonblocking">non-blocking mode</link>. If used in
> +     blocking mode it is possible for a client/server deadlock to occur. The
> +     client will block trying to send queries to the server, but the server will
> +     block trying to send results from queries it has already processed to the
> +     client. This only occurs when the client sends enough queries to fill its
> +     output buffer and the server's receive buffer before switching to
> +     processing input from the server, but it's hard to predict exactly when
> +     that'll happen so it's best to always use non-blocking mode.
> +    </para>
> +   </note>

Mention that nonblocking only actually helps if send/recv is done as
required, and can essentially require unbounded memory?  We probably
should either document or implement some smarts about when to signal
read/write readiness. Otherwise we might e.g. be receiving tons of
result data without having sent the next query - or the other way round.
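
Roughly what I have in mind for the readiness side, as an application-level
sketch (function names like PQbatchSyncQueue() are the ones from your patch;
MAX_QUEUED and the in-flight accounting are made up for illustration):

#include <poll.h>
#include <stdbool.h>
#include <libpq-fe.h>

#define MAX_QUEUED 100			/* arbitrary application-side cap */

/*
 * Decide what to wait for on the connection's socket.  Always wait for
 * read readiness so results get drained; only wait for write readiness
 * when there is actually something left to send, i.e. libpq still has
 * buffered output (PQflush() returned 1), or we have more queries to
 * dispatch and haven't hit the in-flight cap.
 */
int
wait_for_socket(PGconn *conn, bool have_more_work, int n_queued)
{
	struct pollfd pfd;

	pfd.fd = PQsocket(conn);
	pfd.events = POLLIN;
	pfd.revents = 0;

	if (PQflush(conn) == 1 ||
		(have_more_work && n_queued < MAX_QUEUED))
		pfd.events |= POLLOUT;

	return poll(&pfd, 1, -1);
}

That way the client never keeps queueing without also draining results, and
vice versa.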


> +   <sect3 id="libpq-batch-sending">
> +    <title>Issuing queries</title>
> +
> +    <para>
> +     After entering batch mode the application dispatches requests
> +     using normal asynchronous <application>libpq</application> functions such as 
> +     <function>PQsendQueryParams</function>, <function>PQsendPrepare</function>,
> +     <function>PQsendQueryPrepared</function>, <function>PQsendDescribePortal</function>,
> +     <function>PQsendDescribePrepared</function>.
> +     The asynchronous requests are followed by a <link
> +     linkend="libpq-PQbatchSyncQueue"><function>PQbatchSyncQueue(conn)</function></link> call to mark
> +     the end of the batch. The client <emphasis>does not</emphasis> need to call
> +     <function>PQgetResult</function> immediately after dispatching each
> +     operation. <link linkend="libpq-batch-results">Result processing</link>
> +     is handled separately.
> +    </para>
> +    
> +    <para>
> +     Batched operations will be executed by the server in the order the client
> +     sends them. The server will send the results in the order the statements
> +     executed. The server may begin executing the batch before all commands
> +     in the batch are queued and the end of batch command is sent. If any
> +     statement encounters an error the server aborts the current transaction and
> +     skips processing the rest of the batch. Query processing resumes after the
> +     end of the failed batch.
> +    </para>

Maybe note that multiple batches can be "in flight"?
I.e. PQbatchSyncQueue() is about error handling, nothing else? I don't
have a great alternative, but we might want to rename...
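
For the docs, maybe something like this as an illustration of multiple
in-flight batches (sketch only, error checks omitted, the function names are
the ones proposed in the patch):

const char *values1[] = {"1"};
const char *values2[] = {"2"};

PQenterBatchMode(conn);

/* first batch */
PQsendQueryParams(conn, "INSERT INTO t VALUES ($1)",
				  1, NULL, values1, NULL, NULL, 0);
PQsendQueryParams(conn, "INSERT INTO t VALUES ($1)",
				  1, NULL, values2, NULL, NULL, 0);
PQbatchSyncQueue(conn);

/* second batch, sent without waiting for the first one's results */
PQsendQueryParams(conn, "SELECT count(*) FROM t",
				  0, NULL, NULL, NULL, NULL, 0);
PQbatchSyncQueue(conn);

/* ... then drain results with PQbatchProcessQueue()/PQgetResult() ... */

If my reading is right, an error in the first batch only skips the remainder
of that batch up to its sync; the second batch still runs. That's really all
PQbatchSyncQueue() buys you, hence the naming gripe.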


> +    <note>
> +     <para>
> +      The client must not assume that work is committed when it
> +      <emphasis>sends</emphasis> a <literal>COMMIT</literal>, only when the
> +      corresponding result is received to confirm the commit is complete.
> +      Because errors arrive asynchronously the application needs to be able to
> +      restart from the last <emphasis>received</emphasis> committed change and
> +      resend work done after that point if something goes wrong.
> +     </para>
> +    </note>

This seems fairly independent of batching.


> +   </sect3>
> +
> +   <sect3 id="libpq-batch-interleave">
> +    <title>Interleaving result processing and query dispatch</title>
> +
> +    <para>
> +     To avoid deadlocks on large batches the client should be structured around
> +     a nonblocking I/O loop using a function like <function>select</function>,
> +     <function>poll</function>, <function>epoll</function>,
> +     <function>WaitForMultipleObjectEx</function>, etc.
> +    </para>
> +
> +    <para>
> +     The client application should generally maintain a queue of work still to
> +     be dispatched and a queue of work that has been dispatched but not yet had
> +     its results processed.

Hm. Why? If queries are just issued, no such queue is required?


> When the socket is writable it should dispatch more
> +     work. When the socket is readable it should read results and process them,
> +     matching them up to the next entry in its expected results queue. Batches
> +     should be scoped to logical units of work, usually (but not always) one
> +     transaction per batch. There's no need to exit batch mode and re-enter it
> +     between batches or to wait for one batch to finish before sending the next.
> +    </para>

This really needs to take memory usage into account.
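
Strawman for the memory side (sketch; have_work_to_send(), send_next_query()
and handle_result() are hypothetical application callbacks, and I may have
the exact PQgetResult()/PQbatchProcessQueue() choreography slightly off -
the point is the cap on in-flight queries):

#include <stdbool.h>
#include <libpq-fe.h>

#define MAX_INFLIGHT 64			/* arbitrary cap, bounds buffering on both ends */

extern bool have_work_to_send(void);
extern void send_next_query(PGconn *conn);	/* PQsendQueryParams() etc., plus
											 * PQbatchSyncQueue() at batch ends */
extern void handle_result(PGresult *res);	/* must PQclear() */
extern int	wait_for_socket(PGconn *conn, bool have_more_work, int n_queued);
											/* poll()/select(), as sketched above */

static void
run_pipeline(PGconn *conn)
{
	int			inflight = 0;

	while (have_work_to_send() || inflight > 0)
	{
		/* read whatever has arrived, then process any complete results */
		if (PQconsumeInput(conn) == 0)
			return;				/* connection trouble */

		while (inflight > 0 && !PQisBusy(conn))
		{
			PGresult   *res = PQgetResult(conn);

			if (res != NULL)
				handle_result(res);
			else
			{
				/* this query's results are complete */
				inflight--;
				/* position on the next queued query, if any */
				if (!PQbatchProcessQueue(conn))
					break;
			}
		}

		/* top up the pipeline, but only up to the cap */
		while (have_work_to_send() && inflight < MAX_INFLIGHT)
		{
			send_next_query(conn);
			inflight++;
		}
		PQflush(conn);

		wait_for_socket(conn, have_work_to_send(), inflight);
	}
}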

> +       </variablelist>
> +
> +     </listitem>
> +    </varlistentry>
> +
> +    <varlistentry id="libpq-PQenterBatchMode">
> +     <term>
> +      <function>PQenterBatchMode</function>
> +      <indexterm>
> +       <primary>PQenterBatchMode</primary>
> +      </indexterm>
> +     </term>
> +
> +     <listitem>
> +      <para>
> +      Causes a connection to enter batch mode if it is currently idle or
> +      already in batch mode.
> +
> +<synopsis>
> +int PQenterBatchMode(PGconn *conn);
> +</synopsis>
> +
> +        </para>
> +        <para>
> +          Returns 1 for success. Returns 0 and has no 
> +          effect if the connection is not currently idle, i.e. it has a result 
> +          ready, is waiting for more input from the server, etc. This function 
> +          does not actually send anything to the server, it just changes the 
> +          <application>libpq</application> connection state.
> +
> +        </para>
> +     </listitem>
> +    </varlistentry>
> +
> +    <varlistentry id="libpq-PQexitBatchMode">
> +     <term>
> +      <function>PQexitBatchMode</function>
> +      <indexterm>
> +       <primary>PQexitBatchMode</primary>
> +      </indexterm>
> +     </term>
> +
> +     <listitem>
> +      <para>
> +      Causes a connection to exit batch mode if it is currently in batch mode
> +      with an empty queue and no pending results.
> +<synopsis>
> +int PQexitBatchMode(PGconn *conn);
> +</synopsis>
> +        </para>
> +        <para>Returns 1 for success.
> +      Returns 1 and takes no action if not in batch mode. If the connection has

"returns 1"?


> +    <varlistentry id="libpq-PQbatchSyncQueue">
> +     <term>
> +      <function>PQbatchSyncQueue</function>
> +      <indexterm>
> +       <primary>PQbatchSyncQueue</primary>
> +      </indexterm>
> +     </term>
> +
> +     <listitem>
> +      <para>
> +      Delimits the end of a set of a batched commands by sending a <link
> +      linkend="protocol-flow-ext-query">sync message</link> and flushing
> +      the send buffer. The end of a batch serves as 
> +      the delimiter of an implicit transaction and
> +      an error recovery point; see <link linkend="libpq-batch-errors">
> +      error handling</link>.

I wonder why this isn't framed as PQbatchIssue/Send/...()? Syncing seems
to mostly make sense from a protocol POV.


> +    <varlistentry id="libpq-PQbatchQueueCount">
> +     <term>
> +      <function>PQbatchQueueCount</function>
> +      <indexterm>
> +       <primary>PQbatchQueueCount</primary>
> +      </indexterm>
> +     </term>
> +
> +     <listitem>
> +      <para>
> +      Returns the number of queries still in the queue for this batch, not
> +      including any query that's currently having results being processed.
> +      This is the number of times <function>PQbatchProcessQueue</function> has to be
> +      called before the query queue is empty again.
> +
> +<synopsis>
> +int PQbatchQueueCount(PGconn *conn);
> +</synopsis>
> +
> +      </para>
> +     </listitem>
> +    </varlistentry>

Given that apps are supposed to track this, I'm not sure why we have
this?


> +/*
> + * PQrecyclePipelinedCommand
> + *    Push a command queue entry onto the freelist. It must be a dangling entry
> + *    with null next pointer and not referenced by any other entry's next pointer.
> + */
> +static void
> +PQrecyclePipelinedCommand(PGconn *conn, PGcommandQueueEntry * entry)
> +{
> +    if (entry == NULL)
> +        return;
> +    if (entry->next != NULL)
> +    {
> +        fprintf(stderr, libpq_gettext("tried to recycle non-dangling command queue entry"));
> +        abort();

Don't think we use abort() in libpq like that. There are some Assert()s
though.
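
I.e. something more along these lines (freelist push elided, as in your
patch):

static void
PQrecyclePipelinedCommand(PGconn *conn, PGcommandQueueEntry *entry)
{
	if (entry == NULL)
		return;

	/* must be a dangling entry, not still linked from anywhere */
	Assert(entry->next == NULL);

	/* ... push onto the freelist ... */
}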


>  static bool
>  PQsendQueryStart(PGconn *conn)
> @@ -1377,20 +1486,59 @@ PQsendQueryStart(PGconn *conn)
>                            libpq_gettext("no connection to the server\n"));
>          return false;
>      }
> -    /* Can't send while already busy, either. */
> -    if (conn->asyncStatus != PGASYNC_IDLE)
> +    /* Can't send while already busy, either, unless enqueuing for later */
> +    if (conn->asyncStatus != PGASYNC_IDLE && conn->batch_status == PQBATCH_MODE_OFF)
>      {
>          printfPQExpBuffer(&conn->errorMessage,
>                            libpq_gettext("another command is already in progress\n"));
>          return false;
>      }
>  
> -    /* initialize async result-accumulation state */
> -    pqClearAsyncResult(conn);
> +    if (conn->batch_status != PQBATCH_MODE_OFF)
> +    {
> +    /*

Weirdly indented.


> +    if (conn->batch_status != PQBATCH_MODE_OFF)
> +    {
> +        pipeCmd = PQmakePipelinedCommand(conn);
> +
> +        if (pipeCmd == NULL)
> +            return 0;            /* error msg already set */
> +
> +        last_query = &pipeCmd->query;
> +        queryclass = &pipeCmd->queryclass;
> +    }
> +    else
> +    {
> +        last_query = &conn->last_query;
> +        queryclass = &conn->queryclass;
> +    }

The amount of complexity / branches we're adding to all of these is more
than a bit unsightly.


> +/*
> + * PQbatchQueueCount
> + *     Return number of queries currently pending in batch mode
> + */
> +int
> +PQbatchQueueCount(PGconn *conn)
> +{
> +    int            count = 0;
> +    PGcommandQueueEntry *entry;
> +
> +    if (PQbatchStatus(conn) == PQBATCH_MODE_OFF)
> +        return 0;
> +
> +    entry = conn->cmd_queue_head;
> +    while (entry != NULL)
> +    {
> +        ++count;
> +        entry = entry->next;
> +    }
> +    return count;
> +}

Ugh, O(N)? In that case I'd rather just remove this.
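
If it stays at all, I'd rather see a counter maintained as entries are
queued/dequeued, roughly (cmd_queue_len being a new, purely hypothetical
field kept up to date in PQmakePipelinedCommand() and
PQbatchProcessQueue()):

int
PQbatchQueueCount(PGconn *conn)
{
	if (!conn || PQbatchStatus(conn) == PQBATCH_MODE_OFF)
		return 0;

	return conn->cmd_queue_len;
}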


> +/*
> + * PQbatchBegin

Mismatched w/ actual function name.


> + *     Put an idle connection in batch mode. Commands submitted after this
> + *     can be pipelined on the connection, there's no requirement to wait for
> + *     one to finish before the next is dispatched.
> + *
> + *     Queuing of new query or syncing during COPY is not allowed.

+"a"?

> + *     A set of commands is terminated by a PQbatchQueueSync. Multiple sets of batched
> + *     commands may be sent while in batch mode. Batch mode can be exited by
> + *     calling PQbatchEnd() once all results are processed.
> + *
> + *     This doesn't actually send anything on the wire, it just puts libpq
> + *     into a state where it can pipeline work.
> + */
> +int
> +PQenterBatchMode(PGconn *conn)
> +{
> +    if (!conn)
> +        return false;

true/false isn't quite in line with int return code.


> +/*
> + * PQbatchEnd

wrong name.

> + *     End batch mode and return to normal command mode.
> + *
> + *     Has no effect unless the client has processed all results
> + *     from all outstanding batches and the connection is idle.

That seems wrong - will lead to hard-to-diagnose errors.
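
I'd rather have it fail loudly, something along these lines (condition just
indicative):

	if (conn->cmd_queue_head != NULL || conn->asyncStatus != PGASYNC_QUEUED)
	{
		printfPQExpBuffer(&conn->errorMessage,
						  libpq_gettext("cannot exit batch mode while results are pending\n"));
		return 0;
	}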


> + *     Returns true if batch mode ended.
> + */
> +int
> +PQexitBatchMode(PGconn *conn)
> +{
> +    if (!conn)
> +        return false;
> +
> +    if (conn->batch_status == PQBATCH_MODE_OFF)
> +        return true;
> +
> +    switch (conn->asyncStatus)
> +    {
> +        case PGASYNC_IDLE:
> +            printfPQExpBuffer(&conn->errorMessage,
> +                   libpq_gettext_noop("internal error, IDLE in batch mode"));
> +            break;
> +        case PGASYNC_COPY_IN:
> +        case PGASYNC_COPY_OUT:
> +        case PGASYNC_COPY_BOTH:
> +            printfPQExpBuffer(&conn->errorMessage,
> +                   libpq_gettext_noop("internal error, COPY in batch mode"));
> +            break;

So we'll still check the queue in this case, that's a bit weird?



> +/*
> + * PQbatchQueueSync

Out of sync.

> + *     End a batch submission by sending a protocol sync. The connection will
> + *     remain in batch mode and unavailable for new non-batch commands until all
> + *     results from the batch are processed by the client.

"unavailable for new non-batch commands" - that's hard to follow, and
seems pretty redundant with PQendBatchMode (or however it's called).

> + *     It's legal to start submitting another batch immediately, without waiting
> + *     for the results of the current batch. There's no need to end batch mode
> + *     and start it again.
> + *
> + *     If a command in a batch fails, every subsequent command up to and including
> + *     the PQbatchQueueSync command result gets set to PGRES_BATCH_ABORTED state. If the
> + *     whole batch is processed without error, a PGresult with PGRES_BATCH_END is
> + *     produced.

Hm, should probably mention that that's only true for commands since the
last PQbatchQueueSync?



> +/*
> + * PQbatchQueueProcess

Out of sync.


> + * Returns false if the current query isn't done yet, the connection
> + * is not in a batch, or there are no more queries to process.

Last complaint about this - think this forgiving mode is a mistake.


> + */
> +int
> +PQbatchProcessQueue(PGconn *conn)
> +{

> +    /* This command's results will come in immediately.
> +     * Initialize async result-accumulation state */
> +    pqClearAsyncResult(conn);

I'm not following?


>  /*
>   * PQgetResult
> @@ -1749,10 +2228,32 @@ PQgetResult(PGconn *conn)

> +            if (conn->batch_status != PQBATCH_MODE_OFF)
> +            {
> +                /*
> +                 * batched queries aren't followed by a Sync to put us back in
> +                 * PGASYNC_IDLE state, and when we do get a sync we could
> +                 * still have another batch coming after this one.

This needs rephrasing.


> +                 * The connection isn't idle since we can't submit new
> +                 * nonbatched commands. It isn't also busy since the current
> +                 * command is done and we need to process a new one.
> +                 */
> +                conn->asyncStatus = PGASYNC_QUEUED;

Not sure I like the name.


> +    if (conn->asyncStatus == PGASYNC_QUEUED || conn->batch_status != PQBATCH_MODE_OFF)
> +    {
> +        printfPQExpBuffer(&conn->errorMessage,
> +                          libpq_gettext("Synchronous command execution functions are not allowed in batch mode\n"));
> +        return false;
> +    }

Why do we need the PGASYNC_QUEUED test here?

> +/* pqBatchFlush
> + * In batch mode, data will be flushed only when the out buffer reaches the threshold value.
> + * In non-batch mode, data will be flushed all the time.
> + */
> +static int
> +pqBatchFlush(PGconn *conn)
> +{
> +    if ((conn->batch_status == PQBATCH_MODE_OFF)||(conn->outCount>=65536))
> +        return(pqFlush(conn));
> +    return 0; /* Just to keep compiler quiet */
> +}

This should be defined in a macro or such, rather than hardcoded.
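
E.g. (name's just a suggestion):

#define OUTBUFFER_THRESHOLD	65536

static int
pqBatchFlush(PGconn *conn)
{
	if (conn->batch_status == PQBATCH_MODE_OFF ||
		conn->outCount >= OUTBUFFER_THRESHOLD)
		return pqFlush(conn);

	return 0;
}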


Falling over now. This seems like enough feedback for a bit of work
anyway.

Regards,

Andres


