Re: PATCH: Batch/pipelining support for libpq

Поиск
Список
Период
Сортировка
От Shay Rojansky
Тема Re: PATCH: Batch/pipelining support for libpq
Дата
Msg-id CADT4RqBBw=XJ7geNkMdx+NmkQ4T1epPhH-yHeGpKmgNfGXWduA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: PATCH: Batch/pipelining support for libpq  (Craig Ringer <craig@2ndquadrant.com>)
Ответы Re: PATCH: Batch/pipelining support for libpq  (Craig Ringer <craig@2ndquadrant.com>)
Список pgsql-hackers
> It has recently come to my attention that this implementation is problematic
> because it forces the batch to occur within a transaction; in other words,
> there's no option for a non-transactional batch.

That's not strictly the case. If you explicitly BEGIN and COMMIT,
those operations are honoured within the batch.

I wasn't precise in my formulation (although I think we understand each other)... The problem I'm trying to address here is the fact that in the "usual" batching implementation (i.e. where a single Sync message is sent at the end of the batch), there's no support for batches which have no transactions whatsoever (i.e. where each statement is auto-committed and errors in earlier statements don't trigger skipping of later statements).

What's missing is implicit transactions. Usually if you send a series
of messages they will each get their own implicit transaction. If you
batch them, the whole lot gets an implicit transaction. This is
because the PostgreSQL v3 protocol conflates transaction delineation
with protocol error recovery into a single Sync message type.

That's right. I sent a message complaining about this conflation a while ago: https://www.postgresql.org/message-id/CADT4RqDdo9EcFbxwB_YO2H3BVZ0t-1qqZ%3D%2B%2BdVMnYaN6BpyUGQ%40mail.gmail.com. There weren't any responses, although I'll add a note on the wiki on this as a requested feature for the v4 protocol.

If the mythical v4 protocol ever happens I'd want to split Sync into
two messages, one which is a protocol synchronisation message and
another that is a transaction delimiter. Or give it flags or whatever.

Totally agree.
 
In the mean time:

> This can be a problem for
> several reasons: users may want to sent off a batch of inserts, not caring
> whether one of them fails (e.g. because of a unique constraint violation).
> In other words, in some scenarios it may be appropriate for later batched
> statements to be executed when an earlier batched statement raised an error.
> If Sync is only sent at the very end, this isn't possible.

Right, and that remains the case even with explicit transaction
delineation, because the first ERROR causes processing of all
subsequent messages to be skipped.

The design I have in libpq allows for this by allowing the client to
delimit batches without ending batch mode, concurrently consuming a
stream of multiple batches. Each endbatch is a Sync. So a client that
wants autocommit-like behavour can send a series of 1-query batches. 

I think I'll need to document this a bit better since it's more subtle
than I properly explained.

Ah, I see. libpq's API is considerably more low-level than what Npgsql needs to provide. If I understand correctly, you allow users to specify exactly where to insert Sync messages (if at all), so that any number of statements arbitrarily interleaved with Sync messages can be sent without starting to read any results. If so, then the user indeed has everything they need to control the exact transactional behavior they want (including full auto-commit) without compromising on performance in any way (i.e. by increasing roundtrips).

The only minor problem I can see is that PQsendEndBatch not only adds a Sync message to the buffer, but also flushes it. This means that you may be forcing users to needlessly flush the buffer just because they wanted to insert a Sync. In other words, users can't send the following messages in a single buffer/packet: Prepare1/Bind1/Describe1/Execute1/Sync1/Prepare2/Bind2/Describe2/Execute2/Sync2 - they have to be split into different packets. Of course, this is a relatively minor performance issue (especially when compared to the overall performance benefits provided by batching), and providing an API distinction between adding a Sync and flushing the buffer may over-complicate the API. I just thought I'd mention it.
 
Yes, that's what I suggest, and basically what the libpq batch
interface does, though it expects the client to deal with the
transaction boundaries.

You will need to think hard about transaction boundaries as they
relate to multi-statements unless nPgSQL parses out each statement
from multi-statement strings like PgJDBC does. Otherwise a user can
sneak in:

somestatement; BEGIN; someotherstatement;

or

somestatement; CoMMiT; otherstatement;

That's a good point. I definitely don't want to depend on client-side parsing of SQL in any way (in fact a planned feature is to allow using Npgsql without any sort of client-side parsing/manipulation of SQL).  However, the fact that BEGIN/COMMIT can be sent in batches doesn't appear too problematic to me.

When it's about to send a batch, Npgsql knows whether it's in an (explicit) transaction or not (by examining the transaction indicator on the last ReadyForQuery message it received). If it's not in an (explicit) transaction, it automatically inserts a Sync message after every Execute. If some statement happens to be a BEGIN, it will be executed as a normal statement and so on. The only issue is that if an error occurs after a sneaked-in BEGIN, all subsequent statements will fail, and all have the Sync messages Npgsql inserted. The end result will be a series of errors that will be raised up to the user, but this isn't fundamentally different from the case of a simple auto-commit batch containing multiple failures (because of unique constraint violation or whatever) - multiple errors is something that will have to be handled in any case.

Thanks for all your comments. Npgsql's support of batches needs to be more complicated than libpq's since it's a more high-level interface - whereas libpq offloads some of the sending/processing complexity to the user, Npgsql needs to take care of most of it internally (another good example is deadlock avoidance...).

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Heikki Linnakangas
Дата:
Сообщение: Re: signal handling in plpython
Следующее
От: Mario De Frutos Dieguez
Дата:
Сообщение: Re: signal handling in plpython