Обсуждение: WIP: libpq: add a possibility to not send D(escribe) when executing a prepared statement
WIP: libpq: add a possibility to not send D(escribe) when executing a prepared statement
От
Ivan Trofimov
Дата:
Hi! Currently libpq sends B(ind), D(escribe), E(execute), S(ync) when executing a prepared statement. The response for that D message is a RowDescription, which doesn't change during prepared statement lifetime (with the attributes format being an exception, as they aren't know before execution) . In a presumably very common case of repeatedly executing the same statement, this leads to both client and server parsing/sending exactly the same RowDescritpion data over and over again. Instead, library user could acquire a statement result RowDescription once (via PQdescribePrepared), and reuse it in subsequent calls to PQexecPrepared and/or its async friends. Doing it this way saves a measurable amount of CPU for both client and server and saves a lot of network traffic, for example: when selecting a single row from a table with 30 columns, where each column has 10-symbols name, and every value in a row is a 10-symbols TEXT, i'm seeing an amount of bytes sent to client decreased by a factor of 2.8, and the CPU time client spends in userland decreased by a factor of ~1.5. The patch attached adds a family of functions PQsendQueryPreparedPredescribed, PQgetResultPredescribed, PQisBusyPredescribed, which allow a user to do just that. If the idea seems reasonable i'd be happy to extend these to PQexecPrepared as well and cover it with tests. P.S. This is my first time ever sending a patch via email, so please don't hesitate to point at mistakes i'm doing in the process.
Вложения
Re: WIP: libpq: add a possibility to not send D(escribe) when executing a prepared statement
От
"Andrey M. Borodin"
Дата:
Hi Ivan, thank you for the patch. > On 22 Nov 2023, at 03:58, Ivan Trofimov <i.trofimow@yandex.ru> wrote: > > Currently libpq sends B(ind), D(escribe), E(execute), S(ync) when executing a prepared statement. > The response for that D message is a RowDescription, which doesn't change during prepared > statement lifetime (with the attributes format being an exception, as they aren't know before execution) . From my POV the idea seems reasonable (though I’m not a real libpq expert). BTW some drivers also send Describe even before Bind. This creates some fuss for routing connection poolers. > In a presumably very common case of repeatedly executing the same statement, this leads to > both client and server parsing/sending exactly the same RowDescritpion data over and over again. > Instead, library user could acquire a statement result RowDescription once (via PQdescribePrepared), > and reuse it in subsequent calls to PQexecPrepared and/or its async friends. But what if query result structure changes? Will we detect this error gracefully and return correct error? Best regards, Andrey Borodin.
Re: WIP: libpq: add a possibility to not send D(escribe) when executing a prepared statement
От
Ivan Trofimov
Дата:
>> In a presumably very common case of repeatedly executing the same statement, this leads to >> both client and server parsing/sending exactly the same RowDescritpion data over and over again. >> Instead, library user could acquire a statement result RowDescription once (via PQdescribePrepared), >> and reuse it in subsequent calls to PQexecPrepared and/or its async friends. > But what if query result structure changes? Will we detect this error gracefully and return correct error? Afaik changing prepared statement result structure is prohibited by Postgres server-side, and should always lead to "ERROR: cached plan must not change result type", see src/test/regress/sql/plancache.sql. So yes, from the libpq point of view this is just an server error, which would be given to the user, the patch shouldn't change any behavior here. The claim about this always being a server-side error better be reassured from someone from the Postgres team, of course.
Ivan Trofimov <i.trofimow@yandex.ru> writes:
> Afaik changing prepared statement result structure is prohibited by
> Postgres server-side, and should always lead to "ERROR: cached plan
> must not change result type", see src/test/regress/sql/plancache.sql.
Independently of whether we're willing to guarantee that that will
never change, I think this patch is basically a bad idea as presented.
It adds a whole new set of programmer-error possibilities, and I doubt
it saves enough in typical cases to justify creating such a foot-gun.
Moreover, it will force us to devote attention to the problem of
keeping libpq itself from misbehaving badly in the inevitable
situation that somebody passes the wrong tuple descriptor.
That is developer effort we could better spend elsewhere.
I say this as somebody who deliberately designed the v3 protocol
to allow clients to skip Describe steps if they were confident
they knew the query result type. I am not disavowing that choice;
I just think that successful use of that option requires a client-
side coding structure that allows tying a previously-obtained
tuple descriptor to the current query with confidence. The proposed
API fails badly at that, or at least leaves it up to the end-user
programmer while providing no tools to help her get it right.
Instead, I'm tempted to suggest having PQprepare/PQexecPrepared
maintain a cache that maps statement name to result tupdesc, so that
this is all handled internally to libpq. The main hole in that idea
is that it's possible to issue PREPARE, DEALLOCATE, etc via PQexec, so
that a user could possibly redefine a prepared statement without libpq
noticing it. Maybe that's not a big issue. For a little more safety,
we could add some extra step that the library user has to take to
enable caching of result tupdescs, whereupon it's definitely caller
error if she does that and then changes the statement behind our back.
BTW, the submitted patch lacks both documentation and tests.
For a feature like this, there is much to be said for writing
the documentation *first*. Having to explain how to use something
often helps you figure out weak spots in your initial design.
regards, tom lane
Re: WIP: libpq: add a possibility to not send D(escribe) when executing a prepared statement
От
Ivan Trofimov
Дата:
Hi Tom! Thank you for considering this.
> It adds a whole new set of programmer-error possibilities, and I doubt
> it saves enough in typical cases to justify creating such a foot-gun.
Although I agree it adds a considerable amount of complexity, I'd argue
it doesn't bring the complexity to a new level, since matching queries
against responses is a concept users of asynchronous processing are
already familiar with, especially so when pipelining is in play.
In case of a single-row select this can easily save as much as a half of
the network traffic, which is likely to be encrypted/decrypted through
multiple hops (a connection-pooler, for example), and has to be
serialized/parsed on a server, a client, a pooler etc.
For example, i have a service which bombards its Postgres database with
~20kRPS of "SELECT * FROM users WHERE id=$1", with "users" being a table
of just a bunch of textual ids, a couple of timestamps and some enums in
it, and for that service alone this change would save
~10Megabytes of server-originated traffic per second, and i have
hundreds of such services at my workplace.
I can provide more elaborate network/CPU measurements of different
workloads if needed.
> Instead, I'm tempted to suggest having PQprepare/PQexecPrepared
> maintain a cache that maps statement name to result tupdesc, so that
> this is all handled internally to libpq
From a perspective of someone who maintains a library built on top of
libpq and is familiar with other such libraries, I think this is much
easier done on the level above libpq, simply because there is more
control of when and how invalidation/eviction is done, and the level
above also has a more straightforward way to access the cache across
different asynchronous processing points.
> I just think that successful use of that option requires a client-
> side coding structure that allows tying a previously-obtained
> tuple descriptor to the current query with confidence. The proposed
> API fails badly at that, or at least leaves it up to the end-user
> programmer while providing no tools to help her get it right
I understand your concerns of usability/safety of what I propose, and I
think I have an idea of how to make this much less of a foot-gun: what
if we add a new function
PGresult *
PQexecPreparedPredescribed(PGconn *conn,
const char *stmtName,
PGresult* description,
...);
which requires both a prepared statement and its tuple descriptor (or
these two could even be tied together by a struct), and exposes its
implementation (basically what I've prototyped in the patch) in the
libpq-int.h?
This way users of synchronous API get a nice thing too, which is
arguably pretty hard to misuse:
if the description isn't available upfront then there's no point to
reach for the function added since PQexecPrepared is strictly better
performance/usability-wise, and if the description is available it's
most likely cached alongside the statement.
If a user still manages to provide an erroneous description, well,
they either get a parsing error or the erroneous description back,
I don't see how libpq could misbehave badly here.
Exposure of the implementation in the internal includes gives a
possibility for users to juggle the actual foot-gun, but implies they
know very well what they are doing, and are ready to be on their own.
What do you think of such approach?