dynamic result sets support in extended query protocol

Поиск
Список
Период
Сортировка
От Peter Eisentraut
Тема dynamic result sets support in extended query protocol
Дата
Msg-id 6e747f98-835f-2e05-cde5-86ee444a7140@2ndquadrant.com
обсуждение исходный текст
Ответы Re: dynamic result sets support in extended query protocol  (Tatsuo Ishii <ishii@sraoss.co.jp>)
Re: dynamic result sets support in extended query protocol  (Andrew Dunstan <andrew@dunslane.net>)
Re: dynamic result sets support in extended query protocol  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
I want to progress work on stored procedures returning multiple result 
sets.  Examples of how this could work on the SQL side have previously 
been shown [0].  We also have ongoing work to make psql show multiple 
result sets [1].  This appears to work fine in the simple query 
protocol.  But the extended query protocol doesn't support multiple 
result sets at the moment [2].  This would be desirable to be able to 
use parameter binding, and also since one of the higher-level goals 
would be to support the use case of stored procedures returning multiple 
result sets via JDBC.

[0]: 
https://www.postgresql.org/message-id/flat/4580ff7b-d610-eaeb-e06f-4d686896b93b%402ndquadrant.com
[1]: https://commitfest.postgresql.org/29/2096/
[2]: https://www.postgresql.org/message-id/9507.1534370765%40sss.pgh.pa.us

(Terminology: I'm calling this project "dynamic result sets", which 
includes several concepts: 1) multiple result sets, 2) those result sets 
can have different structures, 3) the structure of the result sets is 
decided at run time, not declared in the schema/procedure definition/etc.)

One possibility I rejected was to invent a third query protocol beside 
the simple and extended one.  This wouldn't really match with the 
requirements of JDBC and similar APIs because the APIs for sending 
queries don't indicate whether dynamic result sets are expected or 
required, you only indicate that later by how you process the result 
sets.  So we really need to use the existing ways of sending off the 
queries.  Also, avoiding a third query protocol is probably desirable in 
general to avoid extra code and APIs.

So here is my sketch on how this functionality could be woven into the 
extended query protocol.  I'll go through how the existing protocol 
exchange works and then point out the additions that I have in mind.

These additions could be enabled by a _pq_ startup parameter sent by the 
client.  Alternatively, it might also work without that because the 
client would just reject protocol messages it doesn't understand, but 
that's probably less desirable behavior.

So here is how it goes:

C: Parse
S: ParseComplete

At this point, the server would know whether the statement it has parsed 
can produce dynamic result sets.  For a stored procedure, this would be 
declared with the procedure definition, so when the CALL statement is 
parsed, this can be noticed.  I don't actually plan any other cases, but 
for the sake of discussion, perhaps some variant of EXPLAIN could also 
return multiple result sets, and that could also be detected from 
parsing the EXPLAIN invocation.

At this point a client would usually do

C: Describe (statement)
S: ParameterDescription
S: RowDescription

New would be that the server would now also respond with a new message, say,

S: DynamicResultInfo

that indicates that dynamic result sets will follow later.  The message 
would otherwise be empty.  (We could perhaps include the number of 
result sets, but this might not actually be useful, and perhaps it's 
better not to spent effort on counting things that don't need to be 
counted.)

(If we don't guard this by a _pq_ startup parameter from the client, an 
old client would now error out because of an unexpected protocol message.)

Now the normal bind and execute sequence follows:

C: Bind
S: BindComplete
(C: Describe (portal))
(S: RowDescription)
C: Execute
S: ... (DataRows)
S: CommandComplete

In the case of a CALL with output parameters, this "primary" result set 
contains one row with the output parameters (existing behavior).

Now, if the client has seen DynamicResultInfo earlier, it should now go 
into a new subsequence to get the remaining result sets, like this 
(naming obviously to be refined):

C: NextResult
S: NextResultReady
C: Describe (portal)
S: RowDescription
C: Execute
....
S: CommandComplete
C: NextResult
...
C: NextResult
S: NoNextResult
C: Sync
S: ReadyForQuery

I think this would all have to use the unnamed portal, but perhaps there 
could be other uses with named portals.  Some details to be worked out.

One could perhaps also do without the DynamicResultInfo message and just 
put extra information into the CommandComplete message indicating "there 
are more result sets after this one".

(Following the model from the simple query protocol, CommandComplete 
really means one result set complete, not the whole top-level command. 
ReadyForQuery means the whole command is complete.  This is perhaps 
debatable, and interesting questions could also arise when considering 
what should happen in the simple query protocol when a query string 
consists of multiple commands each returning multiple result sets.  But 
it doesn't really seem sensible to cater to that.)

One thing that's missing in this sequence is a way to specify the 
desired output format (text/binary) for each result set.  This could be 
added to the NextResult message, but at that point the client doesn't 
yet know the number of columns in the result set, so we could only do it 
globally.  Then again, since the result sets are dynamic, it's less 
likely that a client would be coded to set per-column output codes. 
Then again, I would hate to bake such a restriction into the protocol, 
because some is going to try.  (I suspect what would be more useful in 
practice is to designate output formats per data type.)  So if we wanted 
to have this fully featured, it might have to look something like this:

C: NextResult
S: NextResultReady
C: Describe (dynamic) (new message subkind)
S: RowDescription
C: Bind (zero parameters, optionally format codes)
S: BindComplete
C: Describe (portal)
S: RowDescription
C: Execute
...

While this looks more complicated, client libraries could reuse existing 
code that starts processing with a Bind message and continues to 
CommandComplete, and then just loops back around.

The mapping of this to libpq in a simple case could look like this:

PQsendQueryParams(conn, "CALL ...", ...);
PQgetResult(...);  // gets output parameters
PQnextResult(...);  // new: sends NextResult+Bind
PQgetResult(...);  // and repeat

Again, it's not clear here how to declare the result column output 
formats.  Since libpq doesn't appear to expose the Bind message 
separately, I'm not sure what to do here.

In JDBC, the NextResult message would correspond to the 
Statement.getMoreResults() method.  It will need a bit of conceptual 
adjustment because the first result set sent on the protocol is actually 
the output parameters, which the JDBC API returns separately from a 
ResultSet, so the initial CallableStatement.execute() call will need to 
process the primary result set and then send NextResult and obtain the 
first dynamic result as the first ResultSet for its API, but that can be 
handled internally.

Thoughts so far?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Dilip Kumar
Дата:
Сообщение: Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables
Следующее
От: Greg Nancarrow
Дата:
Сообщение: Re: Parallel INSERT (INTO ... SELECT ...)