Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From Craig Ringer
Subject Re: [HACKERS] PATCH: Batch/pipelining support for libpq
Date
Msg-id CAMsr+YE2N5Am=iXPuLtNnyH_vgZSjfL40JMTSY3hnNEFZTDsaw@mail.gmail.com
In response to Re: [HACKERS] PATCH: Batch/pipelining support for libpq  (Vaishnavi Prabakaran <vaishnaviprabakaran@gmail.com>)
Responses Re: [HACKERS] PATCH: Batch/pipelining support for libpq  (Vaishnavi Prabakaran <vaishnaviprabakaran@gmail.com>)
List pgsql-hackers


On 13 September 2017 at 13:06, Vaishnavi Prabakaran <vaishnaviprabakaran@gmail.com> wrote:


On Wed, Aug 23, 2017 at 7:40 PM, Andres Freund <andres@anarazel.de> wrote:



> Am failing to see the benefit in allowing user to set
> PQBatchAutoFlush(true|false) property? Is it really needed?

I'm inclined not to introduce that for now. If somebody comes up with a
convincing use case and numbers, we can add it later. Libpq API is set in
stone, so I'd rather not introduce unnecessary stuff...


Thanks for reviewing the patch and yes ok.
 


> +   <para>
> +    Much like asynchronous query mode, there is no performance disadvantage to
> +    using batching and pipelining. It increases client application complexity
> +    and extra caution is required to prevent client/server deadlocks but
> +    can sometimes offer considerable performance improvements.
> +   </para>

That's not necessarily true, is it? Unless you count always doing
batches of exactly size 1.

Client application complexity is increased in batch mode, because the application needs to remember the query queue status. Results can be processed at any time, so the application needs to know up to which query the results have been consumed.
 

Yep. Also, the client/server deadlocks at issue here are a buffer management issue, and deadlock is probably not exactly the right word. Your app has to process replies from the server while it's sending queries, otherwise it can get into a state where it has no room left in its send buffer, but the server isn't consuming its receive buffer because the server's send buffer is full. To allow the system to make progress, the client must read from the client receive buffer.

This isn't an issue when using libpq normally.

PgJDBC has similar issues with its batch mode, but in PgJDBC it's much worse because there's no non-blocking send available. In libpq you can at least set your sending socket to non-blocking.
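
To make the buffer issue concrete, here is a minimal sketch that uses a plain socketpair rather than libpq. The names echo_server and pump_batch are hypothetical, and the forked echo process only stands in for the backend. The client side puts its socket in non-blocking mode and uses poll() to read replies whenever they are available while it is still sending, which is the pattern described above. With libpq the analogous pieces would be PQsetnonblocking(), PQflush() and PQconsumeInput().

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for the server: echo every byte back using blocking I/O. */
static void echo_server(int fd)
{
    char buf[4096];
    ssize_t n;

    while ((n = read(fd, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(fd, buf + off, (size_t) (n - off));
            if (w < 0)
                _exit(1);
            off += w;
        }
    }
    _exit(0);
}

/* Nonzero if errno only says "try again later". */
static int transient(void)
{
    return errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR;
}

/*
 * Send `total` bytes of "queries" and read back `total` bytes of "replies"
 * without ever blocking in write(). Returns the number of reply bytes read,
 * or -1 on error.
 */
long pump_batch(size_t total)
{
    int sv[2];
    char out[4096] = {0}, in[4096];
    size_t sent = 0, received = 0;
    pid_t pid;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if ((pid = fork()) < 0)
        return -1;
    if (pid == 0) {
        close(sv[0]);
        echo_server(sv[1]);     /* never returns */
    }
    close(sv[1]);
    fcntl(sv[0], F_SETFL, fcntl(sv[0], F_GETFL) | O_NONBLOCK);

    while (received < total) {
        struct pollfd p = { .fd = sv[0], .events = POLLIN };

        if (sent < total)
            p.events |= POLLOUT;        /* still have queries to send */
        if (poll(&p, 1, -1) < 0) {
            if (errno == EINTR)
                continue;
            break;
        }
        /* Drain replies first so the server can keep making progress. */
        if (p.revents & POLLIN) {
            ssize_t n = read(sv[0], in, sizeof in);
            if (n > 0)
                received += (size_t) n;
            else if (n == 0 || !transient())
                break;
        } else if (p.revents & (POLLERR | POLLHUP | POLLNVAL)) {
            break;
        }
        /* Send only as much as the socket accepts right now. */
        if ((p.revents & POLLOUT) && sent < total) {
            size_t want = total - sent < sizeof out ? total - sent : sizeof out;
            ssize_t n = write(sv[0], out, want);
            if (n > 0)
                sent += (size_t) n;
            else if (n < 0 && !transient())
                break;
        }
        assert(received <= sent);       /* the server only echoes what it got */
    }
    close(sv[0]);                       /* EOF makes the echo process exit */
    waitpid(pid, NULL, 0);
    return received == total ? (long) received : -1;
}
```

Calling pump_batch() with a size well above the socket buffer sizes, say 4 MiB, should complete and return that size. If the client instead did one blocking write of the whole batch before reading anything, both sides would be expected to stall once the buffers in each direction fill up.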

 

> +   <para>
> +    Use batches when your application does lots of small
> +    <literal>INSERT</literal>, <literal>UPDATE</literal> and
> +    <literal>DELETE</literal> operations that can't easily be transformed into
> +    operations on sets or into a
> +    <link linkend="libpq-copy"><literal>COPY</literal></link> operation.
> +   </para>

Aren't SELECTs also a major beneficiary of this?

Yes, many individual SELECTs that cannot be assembled into a single more efficient query would definitely also benefit.
 
Hmm, though SELECTs also benefit from batch mode, doing multiple SELECTs in batch mode will fill up memory rapidly and might not be as beneficial as the other operations listed.

Depends on the SELECT. With wide results you'll get less benefit, but even then you can gain if you're on a high latency network. With "n+1" patterns and similar, you'll see huge gains.
 
Maybe note that multiple batches can be "in flight"?
I.e. PQbatchSyncQueue() is about error handling, nothing else? Don't
have a great idea, but we might want to rename...


This function not only does error handling, but also sends the "Sync" message to the backend. In batch mode, the "Sync" message is not sent with every query; it is sent only via this function, to mark the end of the implicit transaction. Renamed it to PQbatchCommitQueue. Kindly let me know if you can think of a better name.

I really do not like calling it "commit" as that conflates with a database commit.

A batch can embed multiple BEGINs and COMMITs. In that case it's entirely possible for an earlier part of the batch to succeed and commit, then for a later part to fail. So that name is IMO wrong.



> +    <varlistentry id="libpq-PQbatchSyncQueue">
> +     <term>
> +      <function>PQbatchSyncQueue</function>
> +      <indexterm>
> +       <primary>PQbatchSyncQueue</primary>
> +      </indexterm>
> +     </term>

I wonder why this isn't framed as PQbatchIssue/Send/...()? Syncing seems
to mostly make sense from a protocol POV.


Renamed to PQbatchCommitQueue. 


Per above, strong -1 on that. But SendQueue seems OK, or FlushQueue?



> + *   Put an idle connection in batch mode. Commands submitted after this
> + *   can be pipelined on the connection, there's no requirement to wait for
> + *   one to finish before the next is dispatched.
> + *
> + *   Queuing of new query or syncing during COPY is not allowed.

+"a"?

Hmm, can you explain the question, please? I don't understand.

s/of new query/of a new query/


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
