Обсуждение: dynamic result sets support in extended query protocol

Поиск
Список
Период
Сортировка

dynamic result sets support in extended query protocol

От
Peter Eisentraut
Дата:
I want to progress work on stored procedures returning multiple result 
sets.  Examples of how this could work on the SQL side have previously 
been shown [0].  We also have ongoing work to make psql show multiple 
result sets [1].  This appears to work fine in the simple query 
protocol.  But the extended query protocol doesn't support multiple 
result sets at the moment [2].  This would be desirable to be able to 
use parameter binding, and also since one of the higher-level goals 
would be to support the use case of stored procedures returning multiple 
result sets via JDBC.

[0]: 
https://www.postgresql.org/message-id/flat/4580ff7b-d610-eaeb-e06f-4d686896b93b%402ndquadrant.com
[1]: https://commitfest.postgresql.org/29/2096/
[2]: https://www.postgresql.org/message-id/9507.1534370765%40sss.pgh.pa.us

(Terminology: I'm calling this project "dynamic result sets", which 
includes several concepts: 1) multiple result sets, 2) those result sets 
can have different structures, 3) the structure of the result sets is 
decided at run time, not declared in the schema/procedure definition/etc.)

One possibility I rejected was to invent a third query protocol beside 
the simple and extended one.  This wouldn't really match with the 
requirements of JDBC and similar APIs because the APIs for sending 
queries don't indicate whether dynamic result sets are expected or 
required, you only indicate that later by how you process the result 
sets.  So we really need to use the existing ways of sending off the 
queries.  Also, avoiding a third query protocol is probably desirable in 
general to avoid extra code and APIs.

So here is my sketch on how this functionality could be woven into the 
extended query protocol.  I'll go through how the existing protocol 
exchange works and then point out the additions that I have in mind.

These additions could be enabled by a _pq_ startup parameter sent by the 
client.  Alternatively, it might also work without that because the 
client would just reject protocol messages it doesn't understand, but 
that's probably less desirable behavior.

So here is how it goes:

C: Parse
S: ParseComplete

At this point, the server would know whether the statement it has parsed 
can produce dynamic result sets.  For a stored procedure, this would be 
declared with the procedure definition, so when the CALL statement is 
parsed, this can be noticed.  I don't actually plan any other cases, but 
for the sake of discussion, perhaps some variant of EXPLAIN could also 
return multiple result sets, and that could also be detected from 
parsing the EXPLAIN invocation.

At this point a client would usually do

C: Describe (statement)
S: ParameterDescription
S: RowDescription

New would be that the server would now also respond with a new message, say,

S: DynamicResultInfo

that indicates that dynamic result sets will follow later.  The message 
would otherwise be empty.  (We could perhaps include the number of 
result sets, but this might not actually be useful, and perhaps it's 
better not to spent effort on counting things that don't need to be 
counted.)

(If we don't guard this by a _pq_ startup parameter from the client, an 
old client would now error out because of an unexpected protocol message.)

Now the normal bind and execute sequence follows:

C: Bind
S: BindComplete
(C: Describe (portal))
(S: RowDescription)
C: Execute
S: ... (DataRows)
S: CommandComplete

In the case of a CALL with output parameters, this "primary" result set 
contains one row with the output parameters (existing behavior).

Now, if the client has seen DynamicResultInfo earlier, it should now go 
into a new subsequence to get the remaining result sets, like this 
(naming obviously to be refined):

C: NextResult
S: NextResultReady
C: Describe (portal)
S: RowDescription
C: Execute
....
S: CommandComplete
C: NextResult
...
C: NextResult
S: NoNextResult
C: Sync
S: ReadyForQuery

I think this would all have to use the unnamed portal, but perhaps there 
could be other uses with named portals.  Some details to be worked out.

One could perhaps also do without the DynamicResultInfo message and just 
put extra information into the CommandComplete message indicating "there 
are more result sets after this one".

(Following the model from the simple query protocol, CommandComplete 
really means one result set complete, not the whole top-level command. 
ReadyForQuery means the whole command is complete.  This is perhaps 
debatable, and interesting questions could also arise when considering 
what should happen in the simple query protocol when a query string 
consists of multiple commands each returning multiple result sets.  But 
it doesn't really seem sensible to cater to that.)

One thing that's missing in this sequence is a way to specify the 
desired output format (text/binary) for each result set.  This could be 
added to the NextResult message, but at that point the client doesn't 
yet know the number of columns in the result set, so we could only do it 
globally.  Then again, since the result sets are dynamic, it's less 
likely that a client would be coded to set per-column output codes. 
Then again, I would hate to bake such a restriction into the protocol, 
because some is going to try.  (I suspect what would be more useful in 
practice is to designate output formats per data type.)  So if we wanted 
to have this fully featured, it might have to look something like this:

C: NextResult
S: NextResultReady
C: Describe (dynamic) (new message subkind)
S: RowDescription
C: Bind (zero parameters, optionally format codes)
S: BindComplete
C: Describe (portal)
S: RowDescription
C: Execute
...

While this looks more complicated, client libraries could reuse existing 
code that starts processing with a Bind message and continues to 
CommandComplete, and then just loops back around.

The mapping of this to libpq in a simple case could look like this:

PQsendQueryParams(conn, "CALL ...", ...);
PQgetResult(...);  // gets output parameters
PQnextResult(...);  // new: sends NextResult+Bind
PQgetResult(...);  // and repeat

Again, it's not clear here how to declare the result column output 
formats.  Since libpq doesn't appear to expose the Bind message 
separately, I'm not sure what to do here.

In JDBC, the NextResult message would correspond to the 
Statement.getMoreResults() method.  It will need a bit of conceptual 
adjustment because the first result set sent on the protocol is actually 
the output parameters, which the JDBC API returns separately from a 
ResultSet, so the initial CallableStatement.execute() call will need to 
process the primary result set and then send NextResult and obtain the 
first dynamic result as the first ResultSet for its API, but that can be 
handled internally.

Thoughts so far?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: dynamic result sets support in extended query protocol

От
Tatsuo Ishii
Дата:
Are you proposing to bump up the protocol version (either major or
minor)?  I am asking because it seems you are going to introduce some
new message types.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

> I want to progress work on stored procedures returning multiple result
> sets.  Examples of how this could work on the SQL side have previously
> been shown [0].  We also have ongoing work to make psql show multiple
> result sets [1].  This appears to work fine in the simple query
> protocol.  But the extended query protocol doesn't support multiple
> result sets at the moment [2].  This would be desirable to be able to
> use parameter binding, and also since one of the higher-level goals
> would be to support the use case of stored procedures returning
> multiple result sets via JDBC.
> 
> [0]:
> https://www.postgresql.org/message-id/flat/4580ff7b-d610-eaeb-e06f-4d686896b93b%402ndquadrant.com
> [1]: https://commitfest.postgresql.org/29/2096/
> [2]:
> https://www.postgresql.org/message-id/9507.1534370765%40sss.pgh.pa.us
> 
> (Terminology: I'm calling this project "dynamic result sets", which
> includes several concepts: 1) multiple result sets, 2) those result
> sets can have different structures, 3) the structure of the result
> sets is decided at run time, not declared in the schema/procedure
> definition/etc.)
> 
> One possibility I rejected was to invent a third query protocol beside
> the simple and extended one.  This wouldn't really match with the
> requirements of JDBC and similar APIs because the APIs for sending
> queries don't indicate whether dynamic result sets are expected or
> required, you only indicate that later by how you process the result
> sets.  So we really need to use the existing ways of sending off the
> queries.  Also, avoiding a third query protocol is probably desirable
> in general to avoid extra code and APIs.
> 
> So here is my sketch on how this functionality could be woven into the
> extended query protocol.  I'll go through how the existing protocol
> exchange works and then point out the additions that I have in mind.
> 
> These additions could be enabled by a _pq_ startup parameter sent by
> the client.  Alternatively, it might also work without that because
> the client would just reject protocol messages it doesn't understand,
> but that's probably less desirable behavior.
> 
> So here is how it goes:
> 
> C: Parse
> S: ParseComplete
> 
> At this point, the server would know whether the statement it has
> parsed can produce dynamic result sets.  For a stored procedure, this
> would be declared with the procedure definition, so when the CALL
> statement is parsed, this can be noticed.  I don't actually plan any
> other cases, but for the sake of discussion, perhaps some variant of
> EXPLAIN could also return multiple result sets, and that could also be
> detected from parsing the EXPLAIN invocation.
> 
> At this point a client would usually do
> 
> C: Describe (statement)
> S: ParameterDescription
> S: RowDescription
> 
> New would be that the server would now also respond with a new
> message, say,
> 
> S: DynamicResultInfo
> 
> that indicates that dynamic result sets will follow later.  The
> message would otherwise be empty.  (We could perhaps include the
> number of result sets, but this might not actually be useful, and
> perhaps it's better not to spent effort on counting things that don't
> need to be counted.)
> 
> (If we don't guard this by a _pq_ startup parameter from the client,
> an old client would now error out because of an unexpected protocol
> message.)
> 
> Now the normal bind and execute sequence follows:
> 
> C: Bind
> S: BindComplete
> (C: Describe (portal))
> (S: RowDescription)
> C: Execute
> S: ... (DataRows)
> S: CommandComplete
> 
> In the case of a CALL with output parameters, this "primary" result
> set contains one row with the output parameters (existing behavior).
> 
> Now, if the client has seen DynamicResultInfo earlier, it should now
> go into a new subsequence to get the remaining result sets, like this
> (naming obviously to be refined):
> 
> C: NextResult
> S: NextResultReady
> C: Describe (portal)
> S: RowDescription
> C: Execute
> ....
> S: CommandComplete
> C: NextResult
> ...
> C: NextResult
> S: NoNextResult
> C: Sync
> S: ReadyForQuery
> 
> I think this would all have to use the unnamed portal, but perhaps
> there could be other uses with named portals.  Some details to be
> worked out.
> 
> One could perhaps also do without the DynamicResultInfo message and
> just put extra information into the CommandComplete message indicating
> "there are more result sets after this one".
> 
> (Following the model from the simple query protocol, CommandComplete
> really means one result set complete, not the whole top-level
> command. ReadyForQuery means the whole command is complete.  This is
> perhaps debatable, and interesting questions could also arise when
> considering what should happen in the simple query protocol when a
> query string consists of multiple commands each returning multiple
> result sets.  But it doesn't really seem sensible to cater to that.)
> 
> One thing that's missing in this sequence is a way to specify the
> desired output format (text/binary) for each result set.  This could
> be added to the NextResult message, but at that point the client
> doesn't yet know the number of columns in the result set, so we could
> only do it globally.  Then again, since the result sets are dynamic,
> it's less likely that a client would be coded to set per-column output
> codes. Then again, I would hate to bake such a restriction into the
> protocol, because some is going to try.  (I suspect what would be more
> useful in practice is to designate output formats per data type.)  So
> if we wanted to have this fully featured, it might have to look
> something like this:
> 
> C: NextResult
> S: NextResultReady
> C: Describe (dynamic) (new message subkind)
> S: RowDescription
> C: Bind (zero parameters, optionally format codes)
> S: BindComplete
> C: Describe (portal)
> S: RowDescription
> C: Execute
> ...
> 
> While this looks more complicated, client libraries could reuse
> existing code that starts processing with a Bind message and continues
> to CommandComplete, and then just loops back around.
> 
> The mapping of this to libpq in a simple case could look like this:
> 
> PQsendQueryParams(conn, "CALL ...", ...);
> PQgetResult(...);  // gets output parameters
> PQnextResult(...);  // new: sends NextResult+Bind
> PQgetResult(...);  // and repeat
> 
> Again, it's not clear here how to declare the result column output
> formats.  Since libpq doesn't appear to expose the Bind message
> separately, I'm not sure what to do here.
> 
> In JDBC, the NextResult message would correspond to the
> Statement.getMoreResults() method.  It will need a bit of conceptual
> adjustment because the first result set sent on the protocol is
> actually the output parameters, which the JDBC API returns separately
> from a ResultSet, so the initial CallableStatement.execute() call will
> need to process the primary result set and then send NextResult and
> obtain the first dynamic result as the first ResultSet for its API,
> but that can be handled internally.
> 
> Thoughts so far?
> 
> -- 
> Peter Eisentraut              http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> 
> 



Re: dynamic result sets support in extended query protocol

От
Peter Eisentraut
Дата:
On 2020-10-08 10:23, Tatsuo Ishii wrote:
> Are you proposing to bump up the protocol version (either major or
> minor)?  I am asking because it seems you are going to introduce some
> new message types.

It wouldn't be a new major version.  It could either be a new minor 
version, or it would be guarded by a _pq_ protocol message to enable 
this functionality from the client, as described.  Or both?  We haven't 
done this sort of thing a lot, so some discussion on the details might 
be necessary.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: dynamic result sets support in extended query protocol

От
Andrew Dunstan
Дата:
On 10/8/20 3:46 AM, Peter Eisentraut wrote:
> I want to progress work on stored procedures returning multiple result
> sets.  Examples of how this could work on the SQL side have previously
> been shown [0].  We also have ongoing work to make psql show multiple
> result sets [1].  This appears to work fine in the simple query
> protocol.  But the extended query protocol doesn't support multiple
> result sets at the moment [2].  This would be desirable to be able to
> use parameter binding, and also since one of the higher-level goals
> would be to support the use case of stored procedures returning
> multiple result sets via JDBC.
>
> [0]:
> https://www.postgresql.org/message-id/flat/4580ff7b-d610-eaeb-e06f-4d686896b93b%402ndquadrant.com
> [1]: https://commitfest.postgresql.org/29/2096/
> [2]:
> https://www.postgresql.org/message-id/9507.1534370765%40sss.pgh.pa.us
>
> (Terminology: I'm calling this project "dynamic result sets", which
> includes several concepts: 1) multiple result sets, 2) those result
> sets can have different structures, 3) the structure of the result
> sets is decided at run time, not declared in the schema/procedure
> definition/etc.)
>
> One possibility I rejected was to invent a third query protocol beside
> the simple and extended one.  This wouldn't really match with the
> requirements of JDBC and similar APIs because the APIs for sending
> queries don't indicate whether dynamic result sets are expected or
> required, you only indicate that later by how you process the result
> sets.  So we really need to use the existing ways of sending off the
> queries.  Also, avoiding a third query protocol is probably desirable
> in general to avoid extra code and APIs.
>
> So here is my sketch on how this functionality could be woven into the
> extended query protocol.  I'll go through how the existing protocol
> exchange works and then point out the additions that I have in mind.
>
> These additions could be enabled by a _pq_ startup parameter sent by
> the client.  Alternatively, it might also work without that because
> the client would just reject protocol messages it doesn't understand,
> but that's probably less desirable behavior.
>
> So here is how it goes:
>
> C: Parse
> S: ParseComplete
>
> At this point, the server would know whether the statement it has
> parsed can produce dynamic result sets.  For a stored procedure, this
> would be declared with the procedure definition, so when the CALL
> statement is parsed, this can be noticed.  I don't actually plan any
> other cases, but for the sake of discussion, perhaps some variant of
> EXPLAIN could also return multiple result sets, and that could also be
> detected from parsing the EXPLAIN invocation.
>
> At this point a client would usually do
>
> C: Describe (statement)
> S: ParameterDescription
> S: RowDescription
>
> New would be that the server would now also respond with a new
> message, say,
>
> S: DynamicResultInfo
>
> that indicates that dynamic result sets will follow later.  The
> message would otherwise be empty.  (We could perhaps include the
> number of result sets, but this might not actually be useful, and
> perhaps it's better not to spent effort on counting things that don't
> need to be counted.)
>
> (If we don't guard this by a _pq_ startup parameter from the client,
> an old client would now error out because of an unexpected protocol
> message.)
>
> Now the normal bind and execute sequence follows:
>
> C: Bind
> S: BindComplete
> (C: Describe (portal))
> (S: RowDescription)
> C: Execute
> S: ... (DataRows)
> S: CommandComplete
>
> In the case of a CALL with output parameters, this "primary" result
> set contains one row with the output parameters (existing behavior).
>
> Now, if the client has seen DynamicResultInfo earlier, it should now
> go into a new subsequence to get the remaining result sets, like this
> (naming obviously to be refined):
>
> C: NextResult
> S: NextResultReady
> C: Describe (portal)
> S: RowDescription
> C: Execute
> ....
> S: CommandComplete
> C: NextResult
> ...
> C: NextResult
> S: NoNextResult
> C: Sync
> S: ReadyForQuery
>
> I think this would all have to use the unnamed portal, but perhaps
> there could be other uses with named portals.  Some details to be
> worked out.
>
> One could perhaps also do without the DynamicResultInfo message and
> just put extra information into the CommandComplete message indicating
> "there are more result sets after this one".
>
> (Following the model from the simple query protocol, CommandComplete
> really means one result set complete, not the whole top-level command.
> ReadyForQuery means the whole command is complete.  This is perhaps
> debatable, and interesting questions could also arise when considering
> what should happen in the simple query protocol when a query string
> consists of multiple commands each returning multiple result sets. 
> But it doesn't really seem sensible to cater to that.)
>
> One thing that's missing in this sequence is a way to specify the
> desired output format (text/binary) for each result set.  This could
> be added to the NextResult message, but at that point the client
> doesn't yet know the number of columns in the result set, so we could
> only do it globally.  Then again, since the result sets are dynamic,
> it's less likely that a client would be coded to set per-column output
> codes. Then again, I would hate to bake such a restriction into the
> protocol, because some is going to try.  (I suspect what would be more
> useful in practice is to designate output formats per data type.)  So
> if we wanted to have this fully featured, it might have to look
> something like this:
>
> C: NextResult
> S: NextResultReady
> C: Describe (dynamic) (new message subkind)
> S: RowDescription
> C: Bind (zero parameters, optionally format codes)
> S: BindComplete
> C: Describe (portal)
> S: RowDescription
> C: Execute
> ...
>
> While this looks more complicated, client libraries could reuse
> existing code that starts processing with a Bind message and continues
> to CommandComplete, and then just loops back around.
>
> The mapping of this to libpq in a simple case could look like this:
>
> PQsendQueryParams(conn, "CALL ...", ...);
> PQgetResult(...);  // gets output parameters
> PQnextResult(...);  // new: sends NextResult+Bind
> PQgetResult(...);  // and repeat
>
> Again, it's not clear here how to declare the result column output
> formats.  Since libpq doesn't appear to expose the Bind message
> separately, I'm not sure what to do here.
>
> In JDBC, the NextResult message would correspond to the
> Statement.getMoreResults() method.  It will need a bit of conceptual
> adjustment because the first result set sent on the protocol is
> actually the output parameters, which the JDBC API returns separately
> from a ResultSet, so the initial CallableStatement.execute() call will
> need to process the primary result set and then send NextResult and
> obtain the first dynamic result as the first ResultSet for its API,
> but that can be handled internally.
>
> Thoughts so far?
>


Exciting stuff. But I'm a bit concerned about the sequence of
resultsets. The JDBC docco for CallableStatement says:

    A CallableStatement can return one ResultSet object or multiple
    ResultSet objects. Multiple ResultSet objects are handled using
    operations inherited from Statement.

    For maximum portability, a call's ResultSet objects and update
    counts should be processed prior to getting the values of output
    parameters.

And this is more or less in line with the pattern that I've seen when
converting SPs from other systems - the OUT params are usually set at
the end with things like status flags and error messages.

If the OUT parameter resultset has to come first (which is how I read
your proposal - please correct me if I'm wrong) we'll have to stack up
all the resultsets until the SP returns, then send the OUT params, then
send the remaining resultsets. That seems ... suboptimal.  The
alternative would be to send the OUT params last. That might result in
the driver needing to do some lookahead and caching, but I don't think
it's unmanageable. Of course, your protocol would also need changing.


cheers


andrew


--
Andrew Dunstan
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: dynamic result sets support in extended query protocol

От
Dave Cramer
Дата:



On Fri, 9 Oct 2020 at 13:33, Andrew Dunstan <andrew@dunslane.net> wrote:

On 10/8/20 3:46 AM, Peter Eisentraut wrote:
> I want to progress work on stored procedures returning multiple result
> sets.  Examples of how this could work on the SQL side have previously
> been shown [0].  We also have ongoing work to make psql show multiple
> result sets [1].  This appears to work fine in the simple query
> protocol.  But the extended query protocol doesn't support multiple
> result sets at the moment [2].  This would be desirable to be able to
> use parameter binding, and also since one of the higher-level goals
> would be to support the use case of stored procedures returning
> multiple result sets via JDBC.
>
> [0]:
> https://www.postgresql.org/message-id/flat/4580ff7b-d610-eaeb-e06f-4d686896b93b%402ndquadrant.com
> [1]: https://commitfest.postgresql.org/29/2096/
> [2]:
> https://www.postgresql.org/message-id/9507.1534370765%40sss.pgh.pa.us
>
> (Terminology: I'm calling this project "dynamic result sets", which
> includes several concepts: 1) multiple result sets, 2) those result
> sets can have different structures, 3) the structure of the result
> sets is decided at run time, not declared in the schema/procedure
> definition/etc.)
>
> One possibility I rejected was to invent a third query protocol beside
> the simple and extended one.  This wouldn't really match with the
> requirements of JDBC and similar APIs because the APIs for sending
> queries don't indicate whether dynamic result sets are expected or
> required, you only indicate that later by how you process the result
> sets.  So we really need to use the existing ways of sending off the
> queries.  Also, avoiding a third query protocol is probably desirable
> in general to avoid extra code and APIs.
>
> So here is my sketch on how this functionality could be woven into the
> extended query protocol.  I'll go through how the existing protocol
> exchange works and then point out the additions that I have in mind.
>
> These additions could be enabled by a _pq_ startup parameter sent by
> the client.  Alternatively, it might also work without that because
> the client would just reject protocol messages it doesn't understand,
> but that's probably less desirable behavior.
>
> So here is how it goes:
>
> C: Parse
> S: ParseComplete
>
> At this point, the server would know whether the statement it has
> parsed can produce dynamic result sets.  For a stored procedure, this
> would be declared with the procedure definition, so when the CALL
> statement is parsed, this can be noticed.  I don't actually plan any
> other cases, but for the sake of discussion, perhaps some variant of
> EXPLAIN could also return multiple result sets, and that could also be
> detected from parsing the EXPLAIN invocation.
>
> At this point a client would usually do
>
> C: Describe (statement)
> S: ParameterDescription
> S: RowDescription
>
> New would be that the server would now also respond with a new
> message, say,
>
> S: DynamicResultInfo
>
> that indicates that dynamic result sets will follow later.  The
> message would otherwise be empty.  (We could perhaps include the
> number of result sets, but this might not actually be useful, and
> perhaps it's better not to spent effort on counting things that don't
> need to be counted.)
>
> (If we don't guard this by a _pq_ startup parameter from the client,
> an old client would now error out because of an unexpected protocol
> message.)
>
> Now the normal bind and execute sequence follows:
>
> C: Bind
> S: BindComplete
> (C: Describe (portal))
> (S: RowDescription)
> C: Execute
> S: ... (DataRows)
> S: CommandComplete
>
> In the case of a CALL with output parameters, this "primary" result
> set contains one row with the output parameters (existing behavior).
>
> Now, if the client has seen DynamicResultInfo earlier, it should now
> go into a new subsequence to get the remaining result sets, like this
> (naming obviously to be refined):
>
> C: NextResult
> S: NextResultReady
> C: Describe (portal)
> S: RowDescription
> C: Execute
> ....
> S: CommandComplete
> C: NextResult
> ...
> C: NextResult
> S: NoNextResult
> C: Sync
> S: ReadyForQuery
>
> I think this would all have to use the unnamed portal, but perhaps
> there could be other uses with named portals.  Some details to be
> worked out.
>
> One could perhaps also do without the DynamicResultInfo message and
> just put extra information into the CommandComplete message indicating
> "there are more result sets after this one".
>
> (Following the model from the simple query protocol, CommandComplete
> really means one result set complete, not the whole top-level command.
> ReadyForQuery means the whole command is complete.  This is perhaps
> debatable, and interesting questions could also arise when considering
> what should happen in the simple query protocol when a query string
> consists of multiple commands each returning multiple result sets. 
> But it doesn't really seem sensible to cater to that.)
>
> One thing that's missing in this sequence is a way to specify the
> desired output format (text/binary) for each result set.  This could
> be added to the NextResult message, but at that point the client
> doesn't yet know the number of columns in the result set, so we could
> only do it globally.  Then again, since the result sets are dynamic,
> it's less likely that a client would be coded to set per-column output
> codes. Then again, I would hate to bake such a restriction into the
> protocol, because some is going to try.  (I suspect what would be more
> useful in practice is to designate output formats per data type.)  So
> if we wanted to have this fully featured, it might have to look
> something like this:
>
> C: NextResult
> S: NextResultReady
> C: Describe (dynamic) (new message subkind)
> S: RowDescription
> C: Bind (zero parameters, optionally format codes)
> S: BindComplete
> C: Describe (portal)
> S: RowDescription
> C: Execute
> ...
>
> While this looks more complicated, client libraries could reuse
> existing code that starts processing with a Bind message and continues
> to CommandComplete, and then just loops back around.
>
> The mapping of this to libpq in a simple case could look like this:
>
> PQsendQueryParams(conn, "CALL ...", ...);
> PQgetResult(...);  // gets output parameters
> PQnextResult(...);  // new: sends NextResult+Bind
> PQgetResult(...);  // and repeat
>
> Again, it's not clear here how to declare the result column output
> formats.  Since libpq doesn't appear to expose the Bind message
> separately, I'm not sure what to do here.
>
> In JDBC, the NextResult message would correspond to the
> Statement.getMoreResults() method.  It will need a bit of conceptual
> adjustment because the first result set sent on the protocol is
> actually the output parameters, which the JDBC API returns separately
> from a ResultSet, so the initial CallableStatement.execute() call will
> need to process the primary result set and then send NextResult and
> obtain the first dynamic result as the first ResultSet for its API,
> but that can be handled internally.
>
> Thoughts so far?
>


Exciting stuff. But I'm a bit concerned about the sequence of
resultsets. The JDBC docco for CallableStatement says:

    A CallableStatement can return one ResultSet object or multiple
    ResultSet objects. Multiple ResultSet objects are handled using
    operations inherited from Statement.

    For maximum portability, a call's ResultSet objects and update
    counts should be processed prior to getting the values of output
    parameters.

And this is more or less in line with the pattern that I've seen when
converting SPs from other systems - the OUT params are usually set at
the end with things like status flags and error messages.

If the OUT parameter resultset has to come first (which is how I read
your proposal - please correct me if I'm wrong) we'll have to stack up
all the resultsets until the SP returns, then send the OUT params, then
send the remaining resultsets. That seems ... suboptimal.  The
alternative would be to send the OUT params last. That might result in
the driver needing to do some lookahead and caching, but I don't think
it's unmanageable. Of course, your protocol would also need changing.


cheers


andrew


--
Andrew Dunstan
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Currently the JDBC driver does NOT  do :

 At this point a client would usually do
>
> C: Describe (statement)
> S: ParameterDescription
> S: RowDescription

We do not do the Describe until we use a named statement and decide that the extra round trip is worth it.

Making this assumption will cause a performance regression on all queries.

If we are going to make a protocol change there are a number of other things the drivers want. https://github.com/pgjdbc/pgjdbc/blob/master/backend_protocol_v4_wanted_features.md

Thanks,

Dave

Re: dynamic result sets support in extended query protocol

От
Andres Freund
Дата:
Hi,

On 2020-10-08 09:46:38 +0200, Peter Eisentraut wrote:
> New would be that the server would now also respond with a new message, say,
> 
> S: DynamicResultInfo

> Now, if the client has seen DynamicResultInfo earlier, it should now go into
> a new subsequence to get the remaining result sets, like this (naming
> obviously to be refined):

Hm. Isn't this going to be a lot more latency sensitive than we'd like?
This would basically require at least one additional roundtrip for
everything that *potentially* could return multiple result sets, even if
no additional results are returned, right? And it'd add at least one
additional roundtrip for every result set that's actually sent.

Is there really a good reason for forcing the client to issue
NextResult, Describe, Execute for each of the dynamic result sets? It's
not like there's really a case for allowing the clients to skip them,
right?  Why aren't we sending something more like

S: CommandPartiallyComplete
S: RowDescription
S: DataRow...
S: CommandPartiallyComplete
S: RowDescription
S: DataRow...
...
S: CommandComplete
C: Sync

gated by a _pq_ parameter, of course.


> I think this would all have to use the unnamed portal, but perhaps there
> could be other uses with named portals.  Some details to be worked out.

Which'd avoid this too, but:

> One thing that's missing in this sequence is a way to specify the desired
> output format (text/binary) for each result set.

Is a good point. I personally think avoiding the back and forth is more
important though. But if we could address both at the same time...


> (I suspect what would be more useful in practice is to designate
> output formats per data type.)

Yea, that'd be *really* useful. It sucks that we basically require
multiple round trips to make realistic use of the binary data for the
few types where it's a huge win (e.g. bytea).

Greetings,

Andres Freund



Re: dynamic result sets support in extended query protocol

От
Dave Cramer
Дата:
Hi,


On Fri, 9 Oct 2020 at 14:46, Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2020-10-08 09:46:38 +0200, Peter Eisentraut wrote:
> New would be that the server would now also respond with a new message, say,
>
> S: DynamicResultInfo

> Now, if the client has seen DynamicResultInfo earlier, it should now go into
> a new subsequence to get the remaining result sets, like this (naming
> obviously to be refined):

Hm. Isn't this going to be a lot more latency sensitive than we'd like?
This would basically require at least one additional roundtrip for
everything that *potentially* could return multiple result sets, even if
no additional results are returned, right? And it'd add at least one
additional roundtrip for every result set that's actually sent.

Agreed as mentioned. 

Is there really a good reason for forcing the client to issue
NextResult, Describe, Execute for each of the dynamic result sets? It's
not like there's really a case for allowing the clients to skip them,
right?  Why aren't we sending something more like

S: CommandPartiallyComplete
S: RowDescription
S: DataRow...
S: CommandPartiallyComplete
S: RowDescription
S: DataRow...
...
S: CommandComplete
C: Sync

gated by a _pq_ parameter, of course.


> I think this would all have to use the unnamed portal, but perhaps there
> could be other uses with named portals.  Some details to be worked out.

Which'd avoid this too, but:

> One thing that's missing in this sequence is a way to specify the desired
> output format (text/binary) for each result set.

Is a good point. I personally think avoiding the back and forth is more
important though. But if we could address both at the same time...


> (I suspect what would be more useful in practice is to designate
> output formats per data type.)

Yea, that'd be *really* useful. It sucks that we basically require
multiple round trips to make realistic use of the binary data for the
few types where it's a huge win (e.g. bytea).

Yes!!! Ideally in the startup message. 

Dave

Re: dynamic result sets support in extended query protocol

От
Andres Freund
Дата:
Hi,

On 2020-10-09 14:49:11 -0400, Dave Cramer wrote:
> On Fri, 9 Oct 2020 at 14:46, Andres Freund <andres@anarazel.de> wrote:
> > > (I suspect what would be more useful in practice is to designate
> > > output formats per data type.)
> >
> > Yea, that'd be *really* useful. It sucks that we basically require
> > multiple round trips to make realistic use of the binary data for the
> > few types where it's a huge win (e.g. bytea).
> >
> 
> Yes!!! Ideally in the startup message.

I don't think startup is a good choice. For one, it's size limited. But
more importantly, before having successfully established a connection,
there's really no way the driver can know which types it should list as
to be sent in binary (consider e.g. some postgis types, which'd greatly
benefit from being sent in binary, but also just version dependent
stuff).

The hard part around this really is whether and how to deal with changes
in type definitions. From types just being created - comparatively
simple - to extensions being dropped and recreated, with oids
potentially being reused.

Greetings,

Andres Freund



Re: dynamic result sets support in extended query protocol

От
Dave Cramer
Дата:


On Fri, 9 Oct 2020 at 14:59, Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2020-10-09 14:49:11 -0400, Dave Cramer wrote:
> On Fri, 9 Oct 2020 at 14:46, Andres Freund <andres@anarazel.de> wrote:
> > > (I suspect what would be more useful in practice is to designate
> > > output formats per data type.)
> >
> > Yea, that'd be *really* useful. It sucks that we basically require
> > multiple round trips to make realistic use of the binary data for the
> > few types where it's a huge win (e.g. bytea).
> >
>
> Yes!!! Ideally in the startup message.

I don't think startup is a good choice. For one, it's size limited. But
more importantly, before having successfully established a connection,
there's really no way the driver can know which types it should list as
to be sent in binary (consider e.g. some postgis types, which'd greatly
benefit from being sent in binary, but also just version dependent
stuff).

For the most part we know exactly which types we want in binary for 99% of queries.
 
The hard part around this really is whether and how to deal with changes
in type definitions. From types just being created - comparatively
simple - to extensions being dropped and recreated, with oids
potentially being reused.

Fair point but this is going to be much more complex than just sending most of the results in binary which would speed up the overwhelming majority of queries

Dave Cramer

Re: dynamic result sets support in extended query protocol

От
Peter Eisentraut
Дата:
On 2020-10-09 21:02, Dave Cramer wrote:
> For the most part we know exactly which types we want in binary for 99% 
> of queries.
> 
>     The hard part around this really is whether and how to deal with changes
>     in type definitions. From types just being created - comparatively
>     simple - to extensions being dropped and recreated, with oids
>     potentially being reused.
> 
> 
> Fair point but this is going to be much more complex than just sending 
> most of the results in binary which would speed up the overwhelming 
> majority of queries

I've been studying in more detail how the JDBC driver handles binary 
format use.  Having some kind of message "use binary for these types" 
would match its requirements quite exactly.  (I have also studied 
npgsql, but it appears to work quite differently.  More input from there 
and other places with similar requirements would be welcome.)  The 
question as mentioned above is how to deal with type changes.  Let's 
work through a couple of options.

We could send the type/format list with every query.  For example, we 
could extend/enhance/alter the Bind message so that instead of a 
format-per-column it sends a format-per-type.  But then you'd need to 
send the complete type list every time.  The JDBC driver currently has 
20+ types already hardcoded and more optionally, so you'd send 100+ 
bytes for every query, plus required effort for encoding and decoding. 
That seems unattractive.

Or we send the type/format list once near the beginning of the session. 
Then we need to deal with types being recreated or updated etc.

The first option is that we "lock" the types against changes (ignoring 
whether that's actually possible right now).  That would mean you 
couldn't update an affected type/extension while a JDBC session is 
active.  That's no good.  (Imagine connection pools with hours of server 
lifetime.)

Another option is that we invalidate the session when a thus-registered 
type changes.  Also no good.  (We don't want an extension upgrade 
suddenly breaking all open connections.)

Finally, we could do it an a best-effort basis.  We use binary format 
for registered types, until there is some invalidation event for the 
type, at which point we revert to default/text format until the end of a 
session (or until another protocol message arrives re-registering the 
type).  This should work, because the result row descriptor contains the 
actual format type, and there is no guarantee that it's the same one 
that was requested.

So how about that last option?  I imagine a new protocol message, say, 
TypeFormats, that contains a number of type/format pairs.  The message 
would typically be sent right after the first ReadyForQuery, gets no 
response.  It could also be sent at any other time, but I expect that to 
be less used in practice.  Binary format is used for registered types if 
they have binary format support functions, otherwise text continues to 
be used.  There is no error response for types without binary support. 
(There should probably be an error response for registering a type that 
does not exist.)

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: dynamic result sets support in extended query protocol

От
Dave Cramer
Дата:


On Tue, 20 Oct 2020 at 05:57, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 2020-10-09 21:02, Dave Cramer wrote:
> For the most part we know exactly which types we want in binary for 99%
> of queries.
>
>     The hard part around this really is whether and how to deal with changes
>     in type definitions. From types just being created - comparatively
>     simple - to extensions being dropped and recreated, with oids
>     potentially being reused.
>
>
> Fair point but this is going to be much more complex than just sending
> most of the results in binary which would speed up the overwhelming
> majority of queries

I've been studying in more detail how the JDBC driver handles binary
format use.  Having some kind of message "use binary for these types"
would match its requirements quite exactly.  (I have also studied
npgsql, but it appears to work quite differently.  More input from there
and other places with similar requirements would be welcome.)  The
question as mentioned above is how to deal with type changes.  Let's
work through a couple of options.

I've added Vladimir (pgjdbc), Shay (npgsql) and Mark Paluch (r2dbc)  to this discussion. 
I'm sure there are others but I'm not acquainted with them

We could send the type/format list with every query.  For example, we
could extend/enhance/alter the Bind message so that instead of a
format-per-column it sends a format-per-type.  But then you'd need to
send the complete type list every time.  The JDBC driver currently has
20+ types already hardcoded and more optionally, so you'd send 100+
bytes for every query, plus required effort for encoding and decoding.
That seems unattractive.

Or we send the type/format list once near the beginning of the session.
Then we need to deal with types being recreated or updated etc.

The first option is that we "lock" the types against changes (ignoring
whether that's actually possible right now).  That would mean you
couldn't update an affected type/extension while a JDBC session is
active.  That's no good.  (Imagine connection pools with hours of server
lifetime.)

Another option is that we invalidate the session when a thus-registered
type changes.  Also no good.  (We don't want an extension upgrade
suddenly breaking all open connections.)

Agreed the first 2 options are not viable.
 
Finally, we could do it an a best-effort basis.  We use binary format
for registered types, until there is some invalidation event for the
type, at which point we revert to default/text format until the end of a
session (or until another protocol message arrives re-registering the
type). 
 
Does the driver tell the server what registered types it wants in binary ?
 
This should work, because the result row descriptor contains the
actual format type, and there is no guarantee that it's the same one
that was requested.

So how about that last option?  I imagine a new protocol message, say,
TypeFormats, that contains a number of type/format pairs.  The message
would typically be sent right after the first ReadyForQuery, gets no
response. 
 
This seems a bit hard to control. How long do you wait for no response?
 
It could also be sent at any other time, but I expect that to
be less used in practice.  Binary format is used for registered types if
they have binary format support functions, otherwise text continues to
be used.  There is no error response for types without binary support.
(There should probably be an error response for registering a type that
does not exist.)

I'm not sure we (pgjdbc) want all types with binary support functions sent automatically. Turns out that decoding binary is sometimes slower than decoding the text and the on wire overhead isn't significant. Timestamps/dates with timezone are also interesting as the binary output does not include the timezone.

The notion of a status change message is appealing however. I used the term status change on purpose as there are other server changes we would like to be made aware of. For instance if someone changes the search path, we would like to know. I'm sort of expanding the scope here but if we are imagining ... :)

Dave

Re: dynamic result sets support in extended query protocol

От
Shay Rojansky
Дата:
Very interesting conversation, thanks for including me Dave. Here are some thoughts from the Npgsql perspective,

Re the binary vs. text discussion... A long time ago, Npgsql became a "binary-only" driver, meaning that it never sends or receives values in text encoding, and practically always uses the extended protocol. This was because in most (all?) cases, encoding/decoding binary is more efficient, and maintaining two encoders/decoders (one for text, one for binary) made less and less sense. So by default, Npgsql just requests "all binary" in all Bind messages it sends (there's an API for the user to request text, in which case they get pure strings which they're responsible for parsing). Binary handling is implemented for almost all PG types which support it, and I've hardly seen any complaints about this for the last few years. I'd be interested in any arguments against this decision (Dave, when have you seen that decoding binary is slower than decoding text?).

Given the above, allowing the client to specify in advance which types should be in binary sounds good, but wouldn't help Npgsql much (since by default it already requests binary for everything). It would slightly help in allowing binary-unsupported types to automatically come back as text without manual user API calls, but as I wrote above this is an extremely rare scenario that people don't care much about.

> Is there really a good reason for forcing the client to issue NextResult, Describe, Execute for each of the dynamic result sets?

I very much agree - it should be possible to execute a procedure and consume all results in a single roundtrip, otherwise this is quite a perf killer.

Peter, from your original message:

> Following the model from the simple query protocol, CommandComplete really means one result set complete, not the whole top-level command. ReadyForQuery means the whole command is complete. This is perhaps debatable, and interesting questions could also arise when considering what should happen in the simple query protocol when a query string consists of multiple commands each returning multiple result sets. But it doesn't really seem sensible to cater to that

Npgsql implements batching of multiple statements via the extended protocol in a similar way. In other words, the .NET API allows users to pack multiple SQL statements and execute them in one roundtrip, and Npgsql does this by sending Parse1/Bind1/Describe1/Execute1/Parse2/Bind2/Describe2/Execute2/Sync. So CommandComplete signals completion of a single statement in the batch, whereas ReadyForQuery signals completion of the entire batch. This means that the "interesting questions" mentioned above are possibly relevant to the extended protocol as well.

Re: dynamic result sets support in extended query protocol

От
Jack Christensen
Дата:
Regarding decoding binary vs text performance: There can be a significant performance cost to fetching the binary format over the text format for types such as text. See https://www.postgresql.org/message-id/CAMovtNoHFod2jMAKQjjxv209PCTJx5Kc66anwWvX0mEiaXwgmA%40mail.gmail.com for the previous discussion.

From the pgx driver (https://github.com/jackc/pgx) perspective:

A "use binary for these types" message sent once at the beginning of the session would not only be helpful for dynamic result sets but could simplify use of the extended protocol in general.

Upthread someone posted a page pgjdbc detailing desired changes to the backend protocol (https://github.com/pgjdbc/pgjdbc/blob/master/backend_protocol_v4_wanted_features.md). I concur with almost everything there, but in particular the first suggestion of the backend automatically converting binary values like it does text values would be huge. That combined with the "use binary for these types" message could greatly simplify the driver side work in using the binary format.

CommandComplete vs ReadyForQuery -- pgx does the same as Npgsql in that it bundles batches multiple queries together in the extended protocol and uses CommandComplete for statement completion and ReadyForQuery for batch completion.



On Tue, Oct 20, 2020 at 9:28 AM Shay Rojansky <roji@roji.org> wrote:
Very interesting conversation, thanks for including me Dave. Here are some thoughts from the Npgsql perspective,

Re the binary vs. text discussion... A long time ago, Npgsql became a "binary-only" driver, meaning that it never sends or receives values in text encoding, and practically always uses the extended protocol. This was because in most (all?) cases, encoding/decoding binary is more efficient, and maintaining two encoders/decoders (one for text, one for binary) made less and less sense. So by default, Npgsql just requests "all binary" in all Bind messages it sends (there's an API for the user to request text, in which case they get pure strings which they're responsible for parsing). Binary handling is implemented for almost all PG types which support it, and I've hardly seen any complaints about this for the last few years. I'd be interested in any arguments against this decision (Dave, when have you seen that decoding binary is slower than decoding text?).

Given the above, allowing the client to specify in advance which types should be in binary sounds good, but wouldn't help Npgsql much (since by default it already requests binary for everything). It would slightly help in allowing binary-unsupported types to automatically come back as text without manual user API calls, but as I wrote above this is an extremely rare scenario that people don't care much about.

> Is there really a good reason for forcing the client to issue NextResult, Describe, Execute for each of the dynamic result sets?

I very much agree - it should be possible to execute a procedure and consume all results in a single roundtrip, otherwise this is quite a perf killer.

Peter, from your original message:

> Following the model from the simple query protocol, CommandComplete really means one result set complete, not the whole top-level command. ReadyForQuery means the whole command is complete. This is perhaps debatable, and interesting questions could also arise when considering what should happen in the simple query protocol when a query string consists of multiple commands each returning multiple result sets. But it doesn't really seem sensible to cater to that

Npgsql implements batching of multiple statements via the extended protocol in a similar way. In other words, the .NET API allows users to pack multiple SQL statements and execute them in one roundtrip, and Npgsql does this by sending Parse1/Bind1/Describe1/Execute1/Parse2/Bind2/Describe2/Execute2/Sync. So CommandComplete signals completion of a single statement in the batch, whereas ReadyForQuery signals completion of the entire batch. This means that the "interesting questions" mentioned above are possibly relevant to the extended protocol as well.

Re: dynamic result sets support in extended query protocol

От
Andres Freund
Дата:
Hi,

On 2020-10-20 18:55:41 -0500, Jack Christensen wrote:
> Upthread someone posted a page pgjdbc detailing desired changes to the
> backend protocol (
> https://github.com/pgjdbc/pgjdbc/blob/master/backend_protocol_v4_wanted_features.md).

A lot of the stuff on there seems way beyond what can be achieved in
something incrementally added to the protocol. Fair enough in an article
about "v4" of the protocol. But I don't think we are - nor should we be
- talking about a full new protocol version here. Instead we are talking
about extending the protocol, where the extensions are opt-in.

Greetings,

Andres Freund



Re: dynamic result sets support in extended query protocol

От
Dave Cramer
Дата:


On Tue, 20 Oct 2020 at 20:09, Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2020-10-20 18:55:41 -0500, Jack Christensen wrote:
> Upthread someone posted a page pgjdbc detailing desired changes to the
> backend protocol (
> https://github.com/pgjdbc/pgjdbc/blob/master/backend_protocol_v4_wanted_features.md).

A lot of the stuff on there seems way beyond what can be achieved in
something incrementally added to the protocol. Fair enough in an article
about "v4" of the protocol. But I don't think we are - nor should we be
- talking about a full new protocol version here. Instead we are talking
about extending the protocol, where the extensions are opt-in.

You are correct we are not talking about a whole new protocol, but why not ?
Seems to me we would have a lot more latitude to get it right if we didn't have this limitation.

Dave

Re: dynamic result sets support in extended query protocol

От
Andres Freund
Дата:
Hi,

On 2020-10-20 20:17:45 -0400, Dave Cramer wrote:
> You are correct we are not talking about a whole new protocol, but why not ?
> Seems to me we would have a lot more latitude to get it right if we didn't
> have this limitation.

A new protocol will face a much bigger adoption hurdle, and there's much
stuff that we'll want to do that we'll have a hard time ever getting off
the ground. Whereas opt-in extensions are much easier to get off the ground.

Greetings,

Andres Freund



Re: dynamic result sets support in extended query protocol

От
Peter Eisentraut
Дата:
On 2020-10-20 12:24, Dave Cramer wrote:
>     Finally, we could do it an a best-effort basis.  We use binary format
>     for registered types, until there is some invalidation event for the
>     type, at which point we revert to default/text format until the end
>     of a
>     session (or until another protocol message arrives re-registering the
>     type). 
> 
> Does the driver tell the server what registered types it wants in binary ?

Yes, the driver tells the server, "whenever you send these types, send 
them in binary" (all other types keep sending in text).

>     This should work, because the result row descriptor contains the
>     actual format type, and there is no guarantee that it's the same one
>     that was requested.
> 
>     So how about that last option?  I imagine a new protocol message, say,
>     TypeFormats, that contains a number of type/format pairs.  The message
>     would typically be sent right after the first ReadyForQuery, gets no
>     response. 
> 
> This seems a bit hard to control. How long do you wait for no response?

In this design, you don't need a response.

>     It could also be sent at any other time, but I expect that to
>     be less used in practice.  Binary format is used for registered
>     types if
>     they have binary format support functions, otherwise text continues to
>     be used.  There is no error response for types without binary support.
>     (There should probably be an error response for registering a type that
>     does not exist.)
> 
> I'm not sure we (pgjdbc) want all types with binary support functions 
> sent automatically. Turns out that decoding binary is sometimes slower 
> than decoding the text and the on wire overhead isn't significant. 
> Timestamps/dates with timezone are also interesting as the binary output 
> does not include the timezone.

In this design, you pick the types you want.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: dynamic result sets support in extended query protocol

От
Peter Eisentraut
Дата:
On 2020-10-09 20:46, Andres Freund wrote:
> Is there really a good reason for forcing the client to issue
> NextResult, Describe, Execute for each of the dynamic result sets? It's
> not like there's really a case for allowing the clients to skip them,
> right?  Why aren't we sending something more like
> 
> S: CommandPartiallyComplete
> S: RowDescription
> S: DataRow...
> S: CommandPartiallyComplete
> S: RowDescription
> S: DataRow...
> ...
> S: CommandComplete
> C: Sync

I want to post my current patch, to keep this discussion moving.  There 
are still a number of pieces to pull together, but what I have is a 
self-contained functioning prototype.

The interesting thing about the above message sequence is that the 
"CommandPartiallyComplete" isn't actually necessary.  Since an Execute 
message normally does not issue a RowDescription response, the 
appearance of one is already enough to mark the beginning of a new 
result set.  Moreover, libpq already handles this correctly, so we 
wouldn't need to change it at all.

We might still want to add a new protocol message, for clarity perhaps, 
and that would probably only be a few lines of code on either side, but 
that would only serve for additional error checking and wouldn't 
actually be needed to identify what's going on.

What else we need:

- Think about what should happen if the Execute message specifies a row 
count, and what should happen during subsequent Execute messages on the 
same portal.  I suspect that there isn't a particularly elegant answer, 
but we need to pick some behavior.

- Some way for psql to display multiple result sets. Proposals have been 
made in [0] and [1].  (You need either patch or one like it for the 
regression tests in this patch to pass.)

- Session-level default result formats setting, proposed in [2].  Not 
strictly necessary, but would be most sensible to coordinate these two.

- We don't have a way to test the extended query protocol.  I have 
attached my test program, but we might want to think about something 
more permanent.  Proposals for this have already been made in [3].

- Right now, this only supports returning dynamic result sets from a 
top-level CALL.  Specifications for passing dynamic result sets from one 
procedure to a calling procedure exist in the SQL standard and could be 
added later.

(All the SQL additions in this patch are per SQL standard.  DB2 appears 
to be the closest existing implementation.)


[0]:
https://www.postgresql.org/message-id/flat/4580ff7b-d610-eaeb-e06f-4d686896b93b%402ndquadrant.com
[1]: https://commitfest.postgresql.org/29/2096/
[2]: https://commitfest.postgresql.org/31/2812/
[3]: 
https://www.postgresql.org/message-id/4f733cca-5e07-e167-8b38-05b5c9066d04%402ndQuadrant.com

-- 
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/

Вложения

Re: dynamic result sets support in extended query protocol

От
Peter Eisentraut
Дата:
Here is an updated patch with some merge conflicts resolved, to keep it 
fresh.  It's still pending in the commit fest from last time.

My focus right now is to work on the "psql - add SHOW_ALL_RESULTS 
option" patch (https://commitfest.postgresql.org/33/2096/) first, which 
is pretty much a prerequisite to this one.  The attached patch set 
contains a minimal variant of that patch in 0001 and 0002, just to get 
this working, but disregard those for the purposes of code review.

The 0003 patch contains comprehensive documentation and test changes 
that can explain the feature in its current form.

Вложения

Re: dynamic result sets support in extended query protocol

От
vignesh C
Дата:
On Tue, Jun 29, 2021 at 7:10 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
>
> Here is an updated patch with some merge conflicts resolved, to keep it
> fresh.  It's still pending in the commit fest from last time.
>
> My focus right now is to work on the "psql - add SHOW_ALL_RESULTS
> option" patch (https://commitfest.postgresql.org/33/2096/) first, which
> is pretty much a prerequisite to this one.  The attached patch set
> contains a minimal variant of that patch in 0001 and 0002, just to get
> this working, but disregard those for the purposes of code review.
>
> The 0003 patch contains comprehensive documentation and test changes
> that can explain the feature in its current form.

One of the patch v3-0003-Dynamic-result-sets-from-procedures.patch
does not apply on HEAD, please post an updated patch for it:
Hunk #1 FAILED at 57.
1 out of 1 hunk FAILED -- saving rejects to file
src/include/commands/defrem.h.rej

Regards,
Vignesh



Re: dynamic result sets support in extended query protocol

От
Peter Eisentraut
Дата:
rebased patch set

On 22.07.21 08:06, vignesh C wrote:
> On Tue, Jun 29, 2021 at 7:10 PM Peter Eisentraut
> <peter.eisentraut@enterprisedb.com> wrote:
>>
>> Here is an updated patch with some merge conflicts resolved, to keep it
>> fresh.  It's still pending in the commit fest from last time.
>>
>> My focus right now is to work on the "psql - add SHOW_ALL_RESULTS
>> option" patch (https://commitfest.postgresql.org/33/2096/) first, which
>> is pretty much a prerequisite to this one.  The attached patch set
>> contains a minimal variant of that patch in 0001 and 0002, just to get
>> this working, but disregard those for the purposes of code review.
>>
>> The 0003 patch contains comprehensive documentation and test changes
>> that can explain the feature in its current form.
> 
> One of the patch v3-0003-Dynamic-result-sets-from-procedures.patch
> does not apply on HEAD, please post an updated patch for it:
> Hunk #1 FAILED at 57.
> 1 out of 1 hunk FAILED -- saving rejects to file
> src/include/commands/defrem.h.rej
> 
> Regards,
> Vignesh
> 
> 


Вложения

Re: dynamic result sets support in extended query protocol

От
Zhihong Yu
Дата:


On Mon, Aug 30, 2021 at 1:23 PM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
rebased patch set

On 22.07.21 08:06, vignesh C wrote:
> On Tue, Jun 29, 2021 at 7:10 PM Peter Eisentraut
> <peter.eisentraut@enterprisedb.com> wrote:
>>
>> Here is an updated patch with some merge conflicts resolved, to keep it
>> fresh.  It's still pending in the commit fest from last time.
>>
>> My focus right now is to work on the "psql - add SHOW_ALL_RESULTS
>> option" patch (https://commitfest.postgresql.org/33/2096/) first, which
>> is pretty much a prerequisite to this one.  The attached patch set
>> contains a minimal variant of that patch in 0001 and 0002, just to get
>> this working, but disregard those for the purposes of code review.
>>
>> The 0003 patch contains comprehensive documentation and test changes
>> that can explain the feature in its current form.
>
> One of the patch v3-0003-Dynamic-result-sets-from-procedures.patch
> does not apply on HEAD, please post an updated patch for it:
> Hunk #1 FAILED at 57.
> 1 out of 1 hunk FAILED -- saving rejects to file
> src/include/commands/defrem.h.rej
>
> Regards,
> Vignesh
>
>

Hi,

+    <term><literal>WITH RETURN</literal></term>
+    <term><literal>WITHOUT RETURN</literal></term>
+    <listitem>
+     <para>
+      This option is only valid for cursors defined inside a procedure.

Since there are two options listed, I think using 'These options are' would be better. 

For CurrentProcedure(),

+       return InvalidOid;
+   else
+       return llast_oid(procedure_stack);

The word 'else' can be omitted.

Cheers

Re: dynamic result sets support in extended query protocol

От
Julien Rouhaud
Дата:
Hi,

On Mon, Aug 30, 2021 at 02:11:34PM -0700, Zhihong Yu wrote:
> On Mon, Aug 30, 2021 at 1:23 PM Peter Eisentraut <
> peter.eisentraut@enterprisedb.com> wrote:
> 
> > rebased patch set
> 
> +    <term><literal>WITH RETURN</literal></term>
> +    <term><literal>WITHOUT RETURN</literal></term>
> +    <listitem>
> +     <para>
> +      This option is only valid for cursors defined inside a procedure.
> 
> Since there are two options listed, I think using 'These options are' would
> be better.
> 
> For CurrentProcedure(),
> 
> +       return InvalidOid;
> +   else
> +       return llast_oid(procedure_stack);
> 
> The word 'else' can be omitted.

The cfbot reports that the patch doesn't apply anymore:
http://cfbot.cputube.org/patch_36_2911.log.

Since you mentioned that this patch depends on the SHOW_ALL_RESULTS psql patch
which is still being worked on, I'm not expecting much activity here until the
prerequirements are done.  It also seems better to mark this patch as Waiting
on Author as further reviews are probably not really needed for now.



Re: dynamic result sets support in extended query protocol

От
Peter Eisentraut
Дата:
On 12.01.22 11:20, Julien Rouhaud wrote:
> Since you mentioned that this patch depends on the SHOW_ALL_RESULTS psql patch
> which is still being worked on, I'm not expecting much activity here until the
> prerequirements are done.  It also seems better to mark this patch as Waiting
> on Author as further reviews are probably not really needed for now.

Well, a review on the general architecture and approach would have been 
useful.  But I understand that without the psql work, it's difficult for 
a reviewer to even get started on this patch.  It's also similarly 
difficult for me to keep updating it.  So I'll set it to Returned with 
feedback for now and take it off the table.  I want to get back to it 
when the prerequisites are more settled.



Re: dynamic result sets support in extended query protocol

От
Peter Eisentraut
Дата:
On 01.02.22 15:40, Peter Eisentraut wrote:
> On 12.01.22 11:20, Julien Rouhaud wrote:
>> Since you mentioned that this patch depends on the SHOW_ALL_RESULTS 
>> psql patch
>> which is still being worked on, I'm not expecting much activity here 
>> until the
>> prerequirements are done.  It also seems better to mark this patch as 
>> Waiting
>> on Author as further reviews are probably not really needed for now.
> 
> Well, a review on the general architecture and approach would have been 
> useful.  But I understand that without the psql work, it's difficult for 
> a reviewer to even get started on this patch.  It's also similarly 
> difficult for me to keep updating it.  So I'll set it to Returned with 
> feedback for now and take it off the table.  I want to get back to it 
> when the prerequisites are more settled.

Now that the psql support for multiple result sets exists, I want to 
revive this patch.  It's the same as the last posted version, except now 
it doesn't require any psql changes or any weird test modifications anymore.

(Old news: This patch allows declaring a cursor WITH RETURN in a 
procedure to make the cursor's data be returned as a result of the CALL 
invocation.  The procedure needs to be declared with the DYNAMIC RESULT 
SETS attribute.)

Вложения

Re: dynamic result sets support in extended query protocol

От
Pavel Stehule
Дата:
Hi


pá 14. 10. 2022 v 9:12 odesílatel Peter Eisentraut <peter.eisentraut@enterprisedb.com> napsal:
On 01.02.22 15:40, Peter Eisentraut wrote:
> On 12.01.22 11:20, Julien Rouhaud wrote:
>> Since you mentioned that this patch depends on the SHOW_ALL_RESULTS
>> psql patch
>> which is still being worked on, I'm not expecting much activity here
>> until the
>> prerequirements are done.  It also seems better to mark this patch as
>> Waiting
>> on Author as further reviews are probably not really needed for now.
>
> Well, a review on the general architecture and approach would have been
> useful.  But I understand that without the psql work, it's difficult for
> a reviewer to even get started on this patch.  It's also similarly
> difficult for me to keep updating it.  So I'll set it to Returned with
> feedback for now and take it off the table.  I want to get back to it
> when the prerequisites are more settled.

Now that the psql support for multiple result sets exists, I want to
revive this patch.  It's the same as the last posted version, except now
it doesn't require any psql changes or any weird test modifications anymore.

(Old news: This patch allows declaring a cursor WITH RETURN in a
procedure to make the cursor's data be returned as a result of the CALL
invocation.  The procedure needs to be declared with the DYNAMIC RESULT
SETS attribute.)

I did a quick test of this patch, and it is working pretty well.

I have two ideas.

1. there can be possibility to set "dynamic result sets" to unknown. The behaviour of the "dynamic result sets" option is a little bit confusing. I expect the number of result sets should be exactly the same as this number. But the warning is raised only when this number is acrossed. For this implementation the correct name should be like "max dynamic result sets" or some like this. At this moment, I see this feature "dynamic result sets" more confusing, and because the effect is just a warning, then I don't see a strong benefit. I can see some benefit if I can declare so CALL will be without dynamic result sets, or with exact number of dynamic result sets or with unknown number of dynamic result sets. And if the result is not expected, then an exception should be raised (not warning).

2. Unfortunately, it doesn't work nicely with pagers. It starts a pager for one result, and waits for the end, and starts pager for the second result, and waits for the end. There is not a possibility to see all results at one time. The current behavior is correct, but I don't think it is user friendly. I think I can teach pspg to support multiple documents. But I need a more robust protocol and some separators - minimally an empty line (but some ascii control char can be safer). As second step we can introduce new psql option like PSQL_MULTI_PAGER, that can be used when possible result sets is higher than 1

Regards

Pavel

Re: dynamic result sets support in extended query protocol

От
Peter Eisentraut
Дата:
On 14.10.22 19:22, Pavel Stehule wrote:
> 1. there can be possibility to set "dynamic result sets" to unknown. The 
> behaviour of the "dynamic result sets" option is a little bit confusing. 
> I expect the number of result sets should be exactly the same as this 
> number. But the warning is raised only when this number is acrossed. For 
> this implementation the correct name should be like "max dynamic result 
> sets" or some like this. At this moment, I see this feature "dynamic 
> result sets" more confusing, and because the effect is just a warning, 
> then I don't see a strong benefit. I can see some benefit if I can 
> declare so CALL will be without dynamic result sets, or with exact 
> number of dynamic result sets or with unknown number of dynamic result 
> sets. And if the result is not expected, then an exception should be 
> raised (not warning).

All of this is specified by the SQL standard.  (What I mean by that is 
that if we want to deviate from that, we should have strong reasons 
beyond "it seems a bit odd".)

> 2. Unfortunately, it doesn't work nicely with pagers. It starts a pager 
> for one result, and waits for the end, and starts pager for the second 
> result, and waits for the end. There is not a possibility to see all 
> results at one time. The current behavior is correct, but I don't think 
> it is user friendly. I think I can teach pspg to support multiple 
> documents. But I need a more robust protocol and some separators - 
> minimally an empty line (but some ascii control char can be safer). As 
> second step we can introduce new psql option like PSQL_MULTI_PAGER, that 
> can be used when possible result sets is higher than 1

I think that is unrelated to this patch.  Multiple result sets already 
exist and libpq and psql handle them.  This patch introduces another way 
in which multiple result sets can be produced on the server, but it 
doesn't touch the client side.  So your concerns should be added either 
as a new feature or possibly as a bug against existing psql functionality.



Re: dynamic result sets support in extended query protocol

От
Peter Eisentraut
Дата:
On 14.10.22 09:11, Peter Eisentraut wrote:
> Now that the psql support for multiple result sets exists, I want to 
> revive this patch.  It's the same as the last posted version, except now 
> it doesn't require any psql changes or any weird test modifications 
> anymore.
> 
> (Old news: This patch allows declaring a cursor WITH RETURN in a 
> procedure to make the cursor's data be returned as a result of the CALL 
> invocation.  The procedure needs to be declared with the DYNAMIC RESULT 
> SETS attribute.)

I added tests using the new psql \bind command to test this 
functionality in the extended query protocol, which showed that this got 
broken since I first wrote this patch.  This "blame" is on the pipeline 
mode in libpq patch (acb7e4eb6b1c614c68a62fb3a6a5bba1af0a2659).  I need 
to spend more time on this and figure out how to repair it.  In the 
meantime, here is an updated patch set with the current status.

Вложения

Re: dynamic result sets support in extended query protocol

От
Alvaro Herrera
Дата:
On 2022-Nov-22, Peter Eisentraut wrote:

> I added tests using the new psql \bind command to test this functionality in
> the extended query protocol, which showed that this got broken since I first
> wrote this patch.  This "blame" is on the pipeline mode in libpq patch
> (acb7e4eb6b1c614c68a62fb3a6a5bba1af0a2659).  I need to spend more time on
> this and figure out how to repair it.  In the meantime, here is an updated
> patch set with the current status.

I looked at this a little bit to understand why it fails with \bind.  As
you say, it does interact badly with pipeline mode -- more precisely, it
collides with the queue handling that was added for pipeline.  The
problem is that in extended query mode, we "advance" the queue in
PQgetResult when asyncStatus is READY -- fe-exec.c line 2110 ff.  But
the protocol relies on returning READY when the second RowDescriptor
message is received (fe-protocol3.c line 319), so libpq gets confused
and everything blows up.  libpq needs the queue to stay put until all
the results from that query have been consumed.

If you comment out the pqCommandQueueAdvance() in fe-exec.c line 2124,
your example works correctly and no longer throws a libpq error (but of
course, other things break).

I suppose that in order for this to work, we would have to find another
way to "advance" the queue that doesn't rely on the status being
PGASYNC_READY.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/



Re: dynamic result sets support in extended query protocol

От
Alvaro Herrera
Дата:
On 2022-Dec-21, Alvaro Herrera wrote:

> I suppose that in order for this to work, we would have to find another
> way to "advance" the queue that doesn't rely on the status being
> PGASYNC_READY.

I think the way to make this work is to increase the coupling between
fe-exec.c and fe-protocol.c by making the queue advance occur when
CommandComplete is received.  This is likely more correct protocol-wise
than what we're doing now: we would consider the command as done when
the server tells us it is done, rather than relying on internal libpq
state.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/



Re: dynamic result sets support in extended query protocol

От
Alvaro Herrera
Дата:
On 2022-Nov-22, Peter Eisentraut wrote:

> I added tests using the new psql \bind command to test this functionality in
> the extended query protocol, which showed that this got broken since I first
> wrote this patch.  This "blame" is on the pipeline mode in libpq patch
> (acb7e4eb6b1c614c68a62fb3a6a5bba1af0a2659).  I need to spend more time on
> this and figure out how to repair it.

Applying this patch, your test queries seem to work correctly.

This is quite WIP, especially because there's a couple of scenarios
uncovered by tests that I'd like to ensure correctness about, but if you
would like to continue adding tests for extended query and dynamic
result sets, it may be helpful.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"How strange it is to find the words "Perl" and "saner" in such close
proximity, with no apparent sense of irony. I doubt that Larry himself
could have managed it."         (ncm, http://lwn.net/Articles/174769/)

Вложения

Re: dynamic result sets support in extended query protocol

От
Peter Eisentraut
Дата:
On 30.01.23 14:06, Alvaro Herrera wrote:
> On 2022-Nov-22, Peter Eisentraut wrote:
> 
>> I added tests using the new psql \bind command to test this functionality in
>> the extended query protocol, which showed that this got broken since I first
>> wrote this patch.  This "blame" is on the pipeline mode in libpq patch
>> (acb7e4eb6b1c614c68a62fb3a6a5bba1af0a2659).  I need to spend more time on
>> this and figure out how to repair it.
> 
> Applying this patch, your test queries seem to work correctly.

Great!

> This is quite WIP, especially because there's a couple of scenarios
> uncovered by tests that I'd like to ensure correctness about, but if you
> would like to continue adding tests for extended query and dynamic
> result sets, it may be helpful.

I should note that it is debatable whether my patch extends the extended 
query protocol or just uses it within its existing spec but in new ways. 
  It just happened to work in old libpq versions without any changes. 
So you should keep that in mind as you refine your patch, since the way 
the protocol has been extended/creatively-used is still subject to review.




Re: dynamic result sets support in extended query protocol

От
Peter Eisentraut
Дата:
On 31.01.23 12:07, Peter Eisentraut wrote:
>> Applying this patch, your test queries seem to work correctly.
> 
> Great!
> 
>> This is quite WIP, especially because there's a couple of scenarios
>> uncovered by tests that I'd like to ensure correctness about, but if you
>> would like to continue adding tests for extended query and dynamic
>> result sets, it may be helpful.
> 
> I should note that it is debatable whether my patch extends the extended 
> query protocol or just uses it within its existing spec but in new ways. 
>   It just happened to work in old libpq versions without any changes. So 
> you should keep that in mind as you refine your patch, since the way the 
> protocol has been extended/creatively-used is still subject to review.

After some consideration, I have an idea how to proceed with this.  I 
have split my original patch into two incremental patches.  The first 
patch implements the original feature, but just for the simple query 
protocol.  (The simple query protocol already supports multiple result 
sets.)  Attempting to return dynamic result sets using the extended 
query protocol will result in an error.  The second patch then adds the 
extended query protocol support back in, but it still has the issues 
with libpq that we are discussing.

I think this way we could have a chance to get the first part into PG16 
or early into PG17, and then the second part can be worked on with less 
stress.  This would also allow us to consider a minor protocol version 
bump, and the handling of binary format for dynamic result sets (like 
https://commitfest.postgresql.org/42/3777/), and maybe some other issues.

The attached patches are the same as before, rebased over master and 
split up as described.  I haven't done any significant work on the 
contents, but I will try to get the 0001 patch into a more polished 
state soon.
Вложения

Re: dynamic result sets support in extended query protocol

От
Peter Eisentraut
Дата:
On 20.02.23 13:58, Peter Eisentraut wrote:
> The attached patches are the same as before, rebased over master and 
> split up as described.  I haven't done any significant work on the 
> contents, but I will try to get the 0001 patch into a more polished 
> state soon.

I've done a bit of work on this patch, mainly cleaned up and expanded 
the tests, and also added DO support, which is something that had been 
requested (meaning you can return result sets from DO with this 
facility).  Here is a new version.

Вложения