Обсуждение: WIP: libpq: add a possibility to not send D(escribe) when executing a prepared statement
WIP: libpq: add a possibility to not send D(escribe) when executing a prepared statement
От
Ivan Trofimov
Дата:
Hi! Currently libpq sends B(ind), D(escribe), E(execute), S(ync) when executing a prepared statement. The response for that D message is a RowDescription, which doesn't change during prepared statement lifetime (with the attributes format being an exception, as they aren't know before execution) . In a presumably very common case of repeatedly executing the same statement, this leads to both client and server parsing/sending exactly the same RowDescritpion data over and over again. Instead, library user could acquire a statement result RowDescription once (via PQdescribePrepared), and reuse it in subsequent calls to PQexecPrepared and/or its async friends. Doing it this way saves a measurable amount of CPU for both client and server and saves a lot of network traffic, for example: when selecting a single row from a table with 30 columns, where each column has 10-symbols name, and every value in a row is a 10-symbols TEXT, i'm seeing an amount of bytes sent to client decreased by a factor of 2.8, and the CPU time client spends in userland decreased by a factor of ~1.5. The patch attached adds a family of functions PQsendQueryPreparedPredescribed, PQgetResultPredescribed, PQisBusyPredescribed, which allow a user to do just that. If the idea seems reasonable i'd be happy to extend these to PQexecPrepared as well and cover it with tests. P.S. This is my first time ever sending a patch via email, so please don't hesitate to point at mistakes i'm doing in the process.
Вложения
Re: WIP: libpq: add a possibility to not send D(escribe) when executing a prepared statement
От
"Andrey M. Borodin"
Дата:
Hi Ivan, thank you for the patch. > On 22 Nov 2023, at 03:58, Ivan Trofimov <i.trofimow@yandex.ru> wrote: > > Currently libpq sends B(ind), D(escribe), E(execute), S(ync) when executing a prepared statement. > The response for that D message is a RowDescription, which doesn't change during prepared > statement lifetime (with the attributes format being an exception, as they aren't know before execution) . From my POV the idea seems reasonable (though I’m not a real libpq expert). BTW some drivers also send Describe even before Bind. This creates some fuss for routing connection poolers. > In a presumably very common case of repeatedly executing the same statement, this leads to > both client and server parsing/sending exactly the same RowDescritpion data over and over again. > Instead, library user could acquire a statement result RowDescription once (via PQdescribePrepared), > and reuse it in subsequent calls to PQexecPrepared and/or its async friends. But what if query result structure changes? Will we detect this error gracefully and return correct error? Best regards, Andrey Borodin.
Re: WIP: libpq: add a possibility to not send D(escribe) when executing a prepared statement
От
Ivan Trofimov
Дата:
>> In a presumably very common case of repeatedly executing the same statement, this leads to >> both client and server parsing/sending exactly the same RowDescritpion data over and over again. >> Instead, library user could acquire a statement result RowDescription once (via PQdescribePrepared), >> and reuse it in subsequent calls to PQexecPrepared and/or its async friends. > But what if query result structure changes? Will we detect this error gracefully and return correct error? Afaik changing prepared statement result structure is prohibited by Postgres server-side, and should always lead to "ERROR: cached plan must not change result type", see src/test/regress/sql/plancache.sql. So yes, from the libpq point of view this is just an server error, which would be given to the user, the patch shouldn't change any behavior here. The claim about this always being a server-side error better be reassured from someone from the Postgres team, of course.
Ivan Trofimov <i.trofimow@yandex.ru> writes: > Afaik changing prepared statement result structure is prohibited by > Postgres server-side, and should always lead to "ERROR: cached plan > must not change result type", see src/test/regress/sql/plancache.sql. Independently of whether we're willing to guarantee that that will never change, I think this patch is basically a bad idea as presented. It adds a whole new set of programmer-error possibilities, and I doubt it saves enough in typical cases to justify creating such a foot-gun. Moreover, it will force us to devote attention to the problem of keeping libpq itself from misbehaving badly in the inevitable situation that somebody passes the wrong tuple descriptor. That is developer effort we could better spend elsewhere. I say this as somebody who deliberately designed the v3 protocol to allow clients to skip Describe steps if they were confident they knew the query result type. I am not disavowing that choice; I just think that successful use of that option requires a client- side coding structure that allows tying a previously-obtained tuple descriptor to the current query with confidence. The proposed API fails badly at that, or at least leaves it up to the end-user programmer while providing no tools to help her get it right. Instead, I'm tempted to suggest having PQprepare/PQexecPrepared maintain a cache that maps statement name to result tupdesc, so that this is all handled internally to libpq. The main hole in that idea is that it's possible to issue PREPARE, DEALLOCATE, etc via PQexec, so that a user could possibly redefine a prepared statement without libpq noticing it. Maybe that's not a big issue. For a little more safety, we could add some extra step that the library user has to take to enable caching of result tupdescs, whereupon it's definitely caller error if she does that and then changes the statement behind our back. BTW, the submitted patch lacks both documentation and tests. For a feature like this, there is much to be said for writing the documentation *first*. Having to explain how to use something often helps you figure out weak spots in your initial design. regards, tom lane
Re: WIP: libpq: add a possibility to not send D(escribe) when executing a prepared statement
От
Ivan Trofimov
Дата:
Hi Tom! Thank you for considering this. > It adds a whole new set of programmer-error possibilities, and I doubt > it saves enough in typical cases to justify creating such a foot-gun. Although I agree it adds a considerable amount of complexity, I'd argue it doesn't bring the complexity to a new level, since matching queries against responses is a concept users of asynchronous processing are already familiar with, especially so when pipelining is in play. In case of a single-row select this can easily save as much as a half of the network traffic, which is likely to be encrypted/decrypted through multiple hops (a connection-pooler, for example), and has to be serialized/parsed on a server, a client, a pooler etc. For example, i have a service which bombards its Postgres database with ~20kRPS of "SELECT * FROM users WHERE id=$1", with "users" being a table of just a bunch of textual ids, a couple of timestamps and some enums in it, and for that service alone this change would save ~10Megabytes of server-originated traffic per second, and i have hundreds of such services at my workplace. I can provide more elaborate network/CPU measurements of different workloads if needed. > Instead, I'm tempted to suggest having PQprepare/PQexecPrepared > maintain a cache that maps statement name to result tupdesc, so that > this is all handled internally to libpq From a perspective of someone who maintains a library built on top of libpq and is familiar with other such libraries, I think this is much easier done on the level above libpq, simply because there is more control of when and how invalidation/eviction is done, and the level above also has a more straightforward way to access the cache across different asynchronous processing points. > I just think that successful use of that option requires a client- > side coding structure that allows tying a previously-obtained > tuple descriptor to the current query with confidence. The proposed > API fails badly at that, or at least leaves it up to the end-user > programmer while providing no tools to help her get it right I understand your concerns of usability/safety of what I propose, and I think I have an idea of how to make this much less of a foot-gun: what if we add a new function PGresult * PQexecPreparedPredescribed(PGconn *conn, const char *stmtName, PGresult* description, ...); which requires both a prepared statement and its tuple descriptor (or these two could even be tied together by a struct), and exposes its implementation (basically what I've prototyped in the patch) in the libpq-int.h? This way users of synchronous API get a nice thing too, which is arguably pretty hard to misuse: if the description isn't available upfront then there's no point to reach for the function added since PQexecPrepared is strictly better performance/usability-wise, and if the description is available it's most likely cached alongside the statement. If a user still manages to provide an erroneous description, well, they either get a parsing error or the erroneous description back, I don't see how libpq could misbehave badly here. Exposure of the implementation in the internal includes gives a possibility for users to juggle the actual foot-gun, but implies they know very well what they are doing, and are ready to be on their own. What do you think of such approach?