Thread: WIP: libpq: add a possibility to not send D(escribe) when executing a prepared statement

WIP: libpq: add a possibility to not send D(escribe) when executing a prepared statement

From
Ivan Trofimov
Date:
Hi!


Currently libpq sends B(ind), D(escribe), E(xecute), S(ync) when
executing a prepared statement.

The response to that D message is a RowDescription, which doesn't
change during the prepared statement's lifetime (with the attributes'
formats being an exception, as they aren't known before execution).


In the presumably very common case of repeatedly executing the same
statement, this leads to both client and server parsing/sending exactly
the same RowDescription data over and over again.


Instead, a library user could acquire a statement's result
RowDescription once (via PQdescribePrepared) and reuse it in subsequent
calls to PQexecPrepared and/or its async friends. Doing it this way
saves a measurable amount of CPU for both client and server and saves a
lot of network traffic. For example, when selecting a single row from a
table with 30 columns, where each column has a 10-character name and
every value in the row is a 10-character TEXT, I'm seeing the number of
bytes sent to the client decrease by a factor of 2.8, and the CPU time
the client spends in userland decrease by a factor of ~1.5.


The attached patch adds a family of functions,
PQsendQueryPreparedPredescribed, PQgetResultPredescribed and
PQisBusyPredescribed, which allow a user to do just that.

If the idea seems reasonable, I'd be happy to extend these to
PQexecPrepared as well and cover it with tests.


P.S. This is my first time ever sending a patch via email, so please
don't hesitate to point out any mistakes I'm making in the process.
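
A rough way to sanity-check the traffic figure in the message above is to add up protocol v3 message sizes for the 30-column example. The sketch below does that under stated assumptions: non-NULL values of uniform length, a CommandComplete tag of "SELECT 1", and no TCP/TLS framing. The function names are made up for illustration and are not part of libpq.

```c
#include <assert.h>
#include <stddef.h>

/* Every v3 message starts with a 1-byte type tag and a 4-byte length word. */
enum { MSG_HEADER = 5 };

/* Fixed part of one RowDescription field: table OID (4) + column number (2)
 * + type OID (4) + type length (2) + type modifier (4) + format code (2). */
enum { ROWDESC_FIELD_FIXED = 18 };

/* Size of a RowDescription ('T') message for nfields columns whose names
 * are all name_len bytes long (assumed uniform schema). */
size_t
rowdesc_size(size_t nfields, size_t name_len)
{
    /* header + Int16 field count + per field: name, NUL terminator, fixed part */
    return MSG_HEADER + 2 + nfields * (name_len + 1 + ROWDESC_FIELD_FIXED);
}

/* Size of a DataRow ('D') message with nfields non-NULL values of
 * value_len bytes each. */
size_t
datarow_size(size_t nfields, size_t value_len)
{
    /* header + Int16 column count + per column: Int32 length + data */
    return MSG_HEADER + 2 + nfields * (4 + value_len);
}

/* Bytes the server sends for one single-row execution: BindComplete,
 * optional RowDescription, DataRow, CommandComplete, ReadyForQuery. */
size_t
response_size(size_t nfields, size_t name_len, size_t value_len,
              int with_describe)
{
    size_t total = 0;

    assert(nfields <= 0xFFFF);                  /* count is 16 bits on the wire */

    total += MSG_HEADER;                        /* BindComplete, empty body */
    if (with_describe)
        total += rowdesc_size(nfields, name_len);
    total += datarow_size(nfields, value_len);
    total += MSG_HEADER + sizeof("SELECT 1");   /* tag string incl. NUL */
    total += MSG_HEADER + 1;                    /* ReadyForQuery + status byte */
    return total;
}
```

For 30 columns with 10-byte names and 10-byte values this estimate gives 1329 bytes with the RowDescription and 452 bytes without it, a factor of roughly 2.9, which is in the same range as the measured 2.8.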

Attachments

Re: WIP: libpq: add a possibility to not send D(escribe) when executing a prepared statement

From
"Andrey M. Borodin"
Date:
Hi Ivan,

thank you for the patch.

> On 22 Nov 2023, at 03:58, Ivan Trofimov <i.trofimow@yandex.ru> wrote:
>
> Currently libpq sends B(ind), D(escribe), E(xecute), S(ync) when executing a prepared statement.
> The response to that D message is a RowDescription, which doesn't change during the prepared
> statement's lifetime (with the attributes' formats being an exception, as they aren't known before execution).
From my POV the idea seems reasonable (though I’m not a real libpq expert).
BTW some drivers also send Describe even before Bind. This creates some fuss for routing connection poolers.

> In a presumably very common case of repeatedly executing the same statement, this leads to
> both client and server parsing/sending exactly the same RowDescription data over and over again.
> Instead, library user could acquire a statement result RowDescription once (via PQdescribePrepared),
> and reuse it in subsequent calls to PQexecPrepared and/or its async friends.
But what if the query result structure changes? Will we detect this gracefully and return a correct error?


Best regards, Andrey Borodin.


Re: WIP: libpq: add a possibility to not send D(escribe) when executing a prepared statement

From
Ivan Trofimov
Date:
>> In a presumably very common case of repeatedly executing the same statement, this leads to
>> both client and server parsing/sending exactly the same RowDescription data over and over again.
>> Instead, library user could acquire a statement result RowDescription once (via PQdescribePrepared),
>> and reuse it in subsequent calls to PQexecPrepared and/or its async friends.
> But what if the query result structure changes? Will we detect this gracefully and return a correct error?

Afaik changing prepared statement result structure is prohibited by
Postgres server-side, and should always lead to "ERROR: cached plan
must not change result type", see src/test/regress/sql/plancache.sql.

So yes, from the libpq point of view this is just a server error, which
would be passed on to the user; the patch shouldn't change any behavior
here.

The claim that this is always a server-side error had better be
confirmed by someone from the Postgres team, of course.



Re: WIP: libpq: add a possibility to not send D(escribe) when executing a prepared statement

From
Tom Lane
Date:
Ivan Trofimov <i.trofimow@yandex.ru> writes:
> Afaik changing prepared statement result structure is prohibited by
> Postgres server-side, and should always lead to "ERROR: cached plan
> must not change result type", see src/test/regress/sql/plancache.sql.

Independently of whether we're willing to guarantee that that will
never change, I think this patch is basically a bad idea as presented.
It adds a whole new set of programmer-error possibilities, and I doubt
it saves enough in typical cases to justify creating such a foot-gun.

Moreover, it will force us to devote attention to the problem of
keeping libpq itself from misbehaving badly in the inevitable
situation that somebody passes the wrong tuple descriptor.
That is developer effort we could better spend elsewhere.

I say this as somebody who deliberately designed the v3 protocol
to allow clients to skip Describe steps if they were confident
they knew the query result type.  I am not disavowing that choice;
I just think that successful use of that option requires a client-
side coding structure that allows tying a previously-obtained
tuple descriptor to the current query with confidence.  The proposed
API fails badly at that, or at least leaves it up to the end-user
programmer while providing no tools to help her get it right.

Instead, I'm tempted to suggest having PQprepare/PQexecPrepared
maintain a cache that maps statement name to result tupdesc, so that
this is all handled internally to libpq.  The main hole in that idea
is that it's possible to issue PREPARE, DEALLOCATE, etc via PQexec, so
that a user could possibly redefine a prepared statement without libpq
noticing it.  Maybe that's not a big issue.  For a little more safety,
we could add some extra step that the library user has to take to
enable caching of result tupdescs, whereupon it's definitely caller
error if she does that and then changes the statement behind our back.

BTW, the submitted patch lacks both documentation and tests.
For a feature like this, there is much to be said for writing
the documentation *first*.  Having to explain how to use something
often helps you figure out weak spots in your initial design.

            regards, tom lane
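
The cache idea in the message above can be sketched as a small table keyed by statement name with an explicit opt-in flag. This is a minimal illustration, not libpq code: `DescCache`, its functions, the slot count and the name bound are all hypothetical, and the cached description is kept as an opaque pointer (in a real client it might be a `PGresult *` obtained from PQdescribePrepared).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DESC_CACHE_SLOTS 64     /* hypothetical capacity */
#define STMT_NAME_MAX    64     /* hypothetical name bound, including NUL */

typedef struct DescCacheEntry
{
    int         in_use;
    char        stmt_name[STMT_NAME_MAX];
    const void *desc;           /* opaque cached result description */
} DescCacheEntry;

typedef struct DescCache
{
    int            enabled;     /* caller must opt in explicitly */
    DescCacheEntry slots[DESC_CACHE_SLOTS];
} DescCache;

void
desc_cache_init(DescCache *cache, int enabled)
{
    assert(cache != NULL);
    memset(cache, 0, sizeof(*cache));
    cache->enabled = enabled;
}

/* Find the slot holding stmt_name, or NULL if it is not cached. */
static DescCacheEntry *
desc_cache_find(DescCache *cache, const char *stmt_name)
{
    size_t i;

    for (i = 0; i < DESC_CACHE_SLOTS; i++)
        if (cache->slots[i].in_use &&
            strcmp(cache->slots[i].stmt_name, stmt_name) == 0)
            return &cache->slots[i];
    return NULL;
}

/* Remember desc for stmt_name, replacing any older entry.
 * Returns 1 if stored, 0 if the statement is not cacheable. */
int
desc_cache_put(DescCache *cache, const char *stmt_name, const void *desc)
{
    DescCacheEntry *e;
    size_t i;

    assert(cache != NULL && stmt_name != NULL);
    /* The unnamed statement ("") is replaced by any later unnamed Parse,
     * so it is never cached. */
    if (!cache->enabled || stmt_name[0] == '\0' ||
        strlen(stmt_name) >= STMT_NAME_MAX)
        return 0;

    e = desc_cache_find(cache, stmt_name);
    for (i = 0; e == NULL && i < DESC_CACHE_SLOTS; i++)
        if (!cache->slots[i].in_use)
            e = &cache->slots[i];
    if (e == NULL)
        return 0;               /* cache is full */

    e->in_use = 1;
    strcpy(e->stmt_name, stmt_name);
    e->desc = desc;
    return 1;
}

/* Return the cached description, or NULL if caching is off or missing. */
const void *
desc_cache_get(DescCache *cache, const char *stmt_name)
{
    DescCacheEntry *e;

    assert(cache != NULL && stmt_name != NULL);
    if (!cache->enabled)
        return NULL;
    e = desc_cache_find(cache, stmt_name);
    return e ? e->desc : NULL;
}

/* Invalidate an entry, e.g. when the statement is deallocated. */
void
desc_cache_forget(DescCache *cache, const char *stmt_name)
{
    DescCacheEntry *e;

    assert(cache != NULL && stmt_name != NULL);
    e = desc_cache_find(cache, stmt_name);
    if (e != NULL)
        e->in_use = 0;
}
```

A caller would store an entry after a successful describe and call `desc_cache_forget` whenever it deallocates or re-prepares the statement. The hole mentioned above remains: a PREPARE or DEALLOCATE issued as plain SQL bypasses this table entirely.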



Re: WIP: libpq: add a possibility to not send D(escribe) when executing a prepared statement

From
Ivan Trofimov
Date:
Hi Tom! Thank you for considering this.

> It adds a whole new set of programmer-error possibilities, and I doubt
> it saves enough in typical cases to justify creating such a foot-gun.
Although I agree it adds a considerable amount of complexity, I'd argue 
it doesn't bring the complexity to a new level, since matching queries 
against responses is a concept users of asynchronous processing are 
already familiar with, especially so when pipelining is in play.

In the case of a single-row select this can easily save as much as half
of the network traffic, which is likely to be encrypted/decrypted
through multiple hops (a connection pooler, for example), and has to be
serialized/parsed on a server, a client, a pooler, etc.
For example, I have a service which bombards its Postgres database with
~20k RPS of "SELECT * FROM users WHERE id=$1", with "users" being a
table of just a bunch of textual ids, a couple of timestamps and some
enums, and for that service alone this change would save ~10 megabytes
of server-originated traffic per second, and I have hundreds of such
services at my workplace.

I can provide more elaborate network/CPU measurements of different 
workloads if needed.

> Instead, I'm tempted to suggest having PQprepare/PQexecPrepared
> maintain a cache that maps statement name to result tupdesc, so that
> this is all handled internally to libpq
From the perspective of someone who maintains a library built on top of
libpq and is familiar with other such libraries, I think this is much
more easily done at the level above libpq, simply because there is more
control over when and how invalidation/eviction is done, and the level
above also has a more straightforward way to access the cache across
different asynchronous processing points.

> I just think that successful use of that option requires a client-
> side coding structure that allows tying a previously-obtained
> tuple descriptor to the current query with confidence. The proposed
> API fails badly at that, or at least leaves it up to the end-user
> programmer while providing no tools to help her get it right
I understand your concerns about the usability/safety of what I
propose, and I think I have an idea of how to make this much less of a
foot-gun: what if we add a new function

PGresult *
PQexecPreparedPredescribed(PGconn *conn,
                           const char *stmtName,
                           PGresult *description,
                           ...);
which requires both a prepared statement and its tuple descriptor (or
these two could even be tied together by a struct), and expose its
implementation (basically what I've prototyped in the patch) in
libpq-int.h?

This way users of the synchronous API get a nice thing too, which is
arguably pretty hard to misuse: if the description isn't available
upfront then there's no point in reaching for the new function, since
PQexecPrepared is strictly better performance/usability-wise, and if
the description is available it's most likely cached alongside the
statement.
If a user still manages to provide an erroneous description, well,
they either get a parsing error or the erroneous description back,
I don't see how libpq could misbehave badly here.

Exposing the implementation in the internal includes lets users juggle
the actual foot-gun, but implies they know very well what they are
doing and are ready to be on their own.

What do you think of such an approach?
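
To illustrate the "parsing error" outcome mentioned above, here is a minimal sketch of how a client could cross-check an incoming DataRow body against the column count of a caller-supplied description. It follows the protocol v3 DataRow layout (Int16 column count, then per column an Int32 length, with -1 meaning NULL, followed by that many bytes). The function name is hypothetical and this is not libpq's actual parsing code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian (network order) 16-bit unsigned integer. */
static uint16_t
read_be16(const unsigned char *p)
{
    return (uint16_t) ((p[0] << 8) | p[1]);
}

/* Read a big-endian (network order) 32-bit signed integer. */
static int32_t
read_be32(const unsigned char *p)
{
    return (int32_t) (((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
                      ((uint32_t) p[2] << 8) | (uint32_t) p[3]);
}

/*
 * body points just past the DataRow type byte and length word.
 * Returns 1 if the row has exactly expected_nfields columns and every
 * column length fits inside the body, 0 otherwise.
 */
int
datarow_matches_description(const unsigned char *body, size_t body_len,
                            unsigned expected_nfields)
{
    size_t   pos = 2;
    unsigned i;

    assert(body != NULL);

    /* The column count must agree with the supplied description. */
    if (body_len < 2 || read_be16(body) != expected_nfields)
        return 0;

    for (i = 0; i < expected_nfields; i++)
    {
        int32_t len;

        if (body_len - pos < 4)
            return 0;           /* truncated length word */
        len = read_be32(body + pos);
        pos += 4;
        if (len == -1)
            continue;           /* NULL value, no data bytes follow */
        if (len < 0 || (size_t) len > body_len - pos)
            return 0;           /* value would run past the message */
        pos += (size_t) len;
    }
    return pos == body_len;     /* no trailing garbage */
}
```

A check like this only catches a description with the wrong number of columns. A stale description with the same column count but different names or types passes it, which corresponds to the "erroneous description back" case above.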