Обсуждение: Query preparation
Hi, i was wondering if there is any reason why the query preparation is done when the query is executed instead of when the preparedStatement is created. Has any one looked into moving it? Thanks
John Lister wrote: > Hi, i was wondering if there is any reason why the query preparation is > done when the query is executed instead of when the preparedStatement is > created. It would be an extra network round-trip on every query, and there's no requirement to prepare it early. > Has any one looked into moving it? Why would you want to move it? -O
Oliver Jowett wrote > John Lister wrote: > >> Hi, i was wondering if there is any reason why the query preparation is >> done when the query is executed instead of when the preparedStatement is >> created. >> > > It would be an extra network round-trip on every query, and there's no > requirement to prepare it early. > > I'm not sure why it would be there would be an extra round trip? The parse is done anyway, the only extra round trip would be if you did a describe (which admittedly i'm looking into optionally doing for binary transfers) or if you didn't use the preparedStatement to do any queries which is i'd imagine is rare. Maybe i've missed something? >> Has any one looked into moving it? >> > > Why would you want to move it? > > To me it would be cleaner, it looks like the current driver was written prior to v3 of the protocol where only simple queries are used and everything is transmitted as text. The extended protocol stuff has been bolted subsequently to the execute part of the driver rather than doing it when the query is prepared... btw, i have nothing but respect for the writers after doing some work on the another driver some years ago. Also, in fairness i've done a lot of work with the oracle stuff vwhich create portals (cursors) and prepare the statement before the execution and this makes more sense to me. Also, with regard to extra network traffic and latency i'm not suggesting changing the way normal statements are used, only PreparedStatements. This to me fits in more with the way the JDBC spec is worded and i'd guess the more usual use case of statements - Statement for quick and dirty one off queries, PreparedStatement for more complicated/multiple use queries. JOHN
Oliver Jowett wrote: > John Lister wrote: > >> Hi, i was wondering if there is any reason why the query preparation is >> done when the query is executed instead of when the preparedStatement is >> created. >> > > It would be an extra network round-trip on every query, and there's no > requirement to prepare it early. > Just going through the code again and please correct me if i'm wrong, but at the moment there is unnecessary network traffic/latency. Assuming a PreparedStatement, every time execute is called, sendParse and sendDescribePortal are called (the former checks if a server side prepared statement has been created and aborts if so). Admittedly sendParse is only called as many times as setPrepareThreshold is set to (which admittedly may be as low as 1) Apart from the use of unnamed statements/portals within a multi threaded environment which complicates things with parsing, i would have thought you could get away with calling describe(portal) just once. I appreciate that it would need a lot of refactoring to move things around, but are there any objections? Also, i don't know how many server side prepared statements/portals postgresql supports (hopefully memory limited) but would it also not make more sense to make all preparedStatements actually be server side preparedStatements. I can't see what overhead this would add apart from named statements/portals I'd be happy to look into this if there was support... Thanks
John Lister <john.lister-ps@kickstone.com> writes:
> Oliver Jowett wrote:
>> It would be an extra network round-trip on every query, and there's no
>> requirement to prepare it early.
>>
> Just going through the code again and please correct me if i'm wrong,
> but at the moment there is unnecessary network traffic/latency. Assuming
> a PreparedStatement, every time execute is called, sendParse and
> sendDescribePortal are called (the former checks if a server side
> prepared statement has been created and aborts if so). Admittedly
> sendParse is only called as many times as setPrepareThreshold is set to
> (which admittedly may be as low as 1)
I think you're missing the cue that those are "send" operations.  They
don't wait for a response to come back.  In fact, assuming that the JDBC
driver is implemented the way I think, the actual data for both messages
is expected to go out as a single network packet after the driver
reaches the point of expecting a response.
            regards, tom lane
			
		Tom Lane wrote: > John Lister <john.lister-ps@kickstone.com> writes: > >> Oliver Jowett wrote: >> >>> It would be an extra network round-trip on every query, and there's no >>> requirement to prepare it early. >>> >>> >> Just going through the code again and please correct me if i'm wrong, >> but at the moment there is unnecessary network traffic/latency. Assuming >> a PreparedStatement, every time execute is called, sendParse and >> sendDescribePortal are called (the former checks if a server side >> prepared statement has been created and aborts if so). Admittedly >> sendParse is only called as many times as setPrepareThreshold is set to >> (which admittedly may be as low as 1) >> > > I think you're missing the cue that those are "send" operations. They > don't wait for a response to come back. In fact, assuming that the JDBC > driver is implemented the way I think, the actual data for both messages > is expected to go out as a single network packet after the driver > reaches the point of expecting a response. > I appreciate that the send operations (and responses) may be grouped into a single packet, but i was thinking of the scenario (which i would have thought is fairly common) where you prepare the statement and execute it multiple times with different parameters... This will result in the parse and describe(portal) commands being sent twice as is (unless i set the prepare threshold to be 1). Admittedly, this will probably not result in any extra packets as the total length of all commands is likely to be less than the packet length (the query size is likely to be the limiting factor here), but this causes extra load on both the server and client - although maybe not enough to worry about. I was thinking that you could parse/describe the query just once. I also accept that (for the binary transfer stuff) describing the statement first would require an additional network packet or 2 as you'd need to read the results before you could work out the bind.. JOHN
John Lister wrote: > Oliver Jowett wrote >> John Lister wrote: >> >>> Hi, i was wondering if there is any reason why the query preparation is >>> done when the query is executed instead of when the preparedStatement is >>> created. >>> >> >> It would be an extra network round-trip on every query, and there's no >> requirement to prepare it early. >> >> > I'm not sure why it would be there would be an extra round trip? The > parse is done anyway, the only extra round trip would be if you did a > describe (which admittedly i'm looking into optionally doing for binary > transfers) or if you didn't use the preparedStatement to do any queries > which is i'd imagine is rare. > > Maybe i've missed something? You presumably want to wait for the result of Parse before returning from prepareStatement() if you are doing preparation at that point. Otherwise I don't quite see the point? And that's where the extra round-trip is. There are also another couple of hurdles to doing preparation earlier: 1) use of the unnamed statement would become impossible 2) at the time of prepareStatement, you don't have parameter type information, which is needed by Parse. (2) seems to be a showstopper to me. >> Why would you want to move it? >> > To me it would be cleaner I don't think it would be much cleaner, if at all. Are there any other reasons for moving it, other than implementation reasons? > Also, with regard to extra network traffic and latency i'm not suggesting changing the way normal statements are used,only PreparedStatements. This to me fits in more with the way the JDBC spec is worded and i'd guess the more usual usecase of statements - Statement for quick and dirty one off queries, PreparedStatement for more complicated/multiple usequeries. A common use of PreparedStatements is for safe parameter interpolation. I very rarely use plain statements in my own code for this reason, even for one-shot queries. No point in writing yet another SQL string escaper when the driver already deals with it .. -O
John Lister wrote: > Oliver Jowett wrote: >> John Lister wrote: >> >>> Hi, i was wondering if there is any reason why the query preparation is >>> done when the query is executed instead of when the preparedStatement is >>> created. >>> >> >> It would be an extra network round-trip on every query, and there's no >> requirement to prepare it early. >> > Just going through the code again and please correct me if i'm wrong, > but at the moment there is unnecessary network traffic/latency. Assuming > a PreparedStatement, every time execute is called, sendParse and > sendDescribePortal are called (the former checks if a server side > prepared statement has been created and aborts if so). Admittedly > sendParse is only called as many times as setPrepareThreshold is set to > (which admittedly may be as low as 1) Recreating the unnamed statement each time is exactly the point! Use of an unnamed statement forces replanning (at Bind) using the actual parameter values available at the time of execution, which can substantially change the performance of the query plan (because when the actual parameter values are known, the query planner can use gathered column statistics to provide better cost estimates). Using a named statement in all cases is *not* desirable. Parameter types may change between executions which can completely change the meaning of the query, so yes, it is necessary to re-Describe each time. There's no additional latency here (beyond latency induced by the extra network traffic itself, which should be very minor) because the driver doesn't wait for a response from the Parse/Bind/Describe before doing the rest of the query. > I appreciate that it would need a lot of refactoring to move things around, but are there any objections? Moving to use named statements exclusively is a Bad Idea. You will kill the performance of many simple queries. See above. -O
John Lister wrote: > The extended protocol stuff has been > bolted subsequently to the execute part of the driver rather than doing > it when the query is prepared... That was indeed what happened (slightly more than "bolted on", there was a rewrite of the network layer!), but remember that the driver actually supports both v2 and v3 protocols. There are fewer cases that need v2 support these days, but I'm sure there would be some unhappy people if the driver suddenly dropped it altogether. So keep in mind that any architectural changes will still need to support v2 as well at this stage. -O
John Lister wrote: > I appreciate that the send operations (and responses) may be grouped > into a single packet, but i was thinking of the scenario (which i would > have thought is fairly common) where you prepare the statement and > execute it multiple times with different parameters... > > This will result in the parse and describe(portal) commands being sent > twice as is (unless i set the prepare threshold to be 1). Admittedly, > this will probably not result in any extra packets as the total length > of all commands is likely to be less than the packet length (the query > size is likely to be the limiting factor here), but this causes extra > load on both the server and client - although maybe not enough to worry > about. > > I was thinking that you could parse/describe the query just once. Incidentally, a customer of ours bumped into the overhead of the describe portal step just yesterday. It doesn't cause any extra network roundtrips, and is usually pretty cheap, but in this case they were selecting a single row from a table with 145 columns thousands of times in a loop. Furthermore, I believe most of the columns were NULL. They're using a PreparedStatement, but we still send the Describe portal message on every execution. A quick & dirty patch to cache the RowDescription information cut over 10% off the total runtime of the application. Does anyone see a problem with caching the result set descriptor (RowDescription)? AFAICS it should never change after a statement is prepared. If not, I'll polish up and submit the patch. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote: > Does anyone see a problem with caching the result set descriptor > (RowDescription)? AFAICS it should never change after a statement is > prepared. If not, I'll polish up and submit the patch. If you tie it to the existing named statement mechanisms I think that works. We invalidate the named statement anyway if the parameter types change, and you want to invalidate any cached row description at the same time. Schema changes could bite you, but I think they bite named statements in other ways anyway. (Not sure how far the server-side efforts to do plan invalidation progressed) -O
Oliver Jowett <oliver@opencloud.com> writes:
> Heikki Linnakangas wrote:
>> Does anyone see a problem with caching the result set descriptor
>> (RowDescription)? AFAICS it should never change after a statement is
>> prepared. If not, I'll polish up and submit the patch.
> If you tie it to the existing named statement mechanisms I think that
> works. We invalidate the named statement anyway if the parameter types
> change, and you want to invalidate any cached row description at the
> same time. Schema changes could bite you, but I think they bite named
> statements in other ways anyway. (Not sure how far the server-side
> efforts to do plan invalidation progressed)
It should be safe.  The server is set up to throw an error if you try to
re-use a prepared statement whose output tuple descriptor would have
changed due to schema changes.  So there's a backstop if the driver's
invalidation logic misses anything.
            regards, tom lane
			
		Tom Lane wrote: > Oliver Jowett <oliver@opencloud.com> writes: >> Heikki Linnakangas wrote: >>> Does anyone see a problem with caching the result set descriptor >>> (RowDescription)? AFAICS it should never change after a statement is >>> prepared. If not, I'll polish up and submit the patch. > >> If you tie it to the existing named statement mechanisms I think that >> works. We invalidate the named statement anyway if the parameter types >> change, and you want to invalidate any cached row description at the >> same time. Schema changes could bite you, but I think they bite named >> statements in other ways anyway. (Not sure how far the server-side >> efforts to do plan invalidation progressed) > > It should be safe. The server is set up to throw an error if you try to > re-use a prepared statement whose output tuple descriptor would have > changed due to schema changes. So there's a backstop if the driver's > invalidation logic misses anything. Here's the patch. Describe message is only sent in the first invocation of a query. The Field[] array constructed from the RowDescription message is saved in the SimpleQuery object for reuse. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com Index: org/postgresql/core/v3/QueryExecutorImpl.java =================================================================== RCS file: /cvsroot/jdbc/pgjdbc/org/postgresql/core/v3/QueryExecutorImpl.java,v retrieving revision 1.43 diff -u -r1.43 QueryExecutorImpl.java --- org/postgresql/core/v3/QueryExecutorImpl.java 15 Nov 2008 17:48:52 -0000 1.43 +++ org/postgresql/core/v3/QueryExecutorImpl.java 16 Apr 2009 10:43:03 -0000 @@ -894,7 +894,7 @@ } } - private void sendDescribePortal(Portal portal) throws IOException { + private void sendDescribePortal(SimpleQuery query, Portal portal) throws IOException { // // Send Describe. // @@ -915,6 +915,9 @@ if (encodedPortalName != null) pgStream.Send(encodedPortalName); // portal name to close pgStream.SendChar(0); // end of portal name + + pendingDescribePortalQueue.add(query); + query.setDescribed(true); } private void sendDescribeStatement(SimpleQuery query, SimpleParameterList params, boolean describeOnly) throws IOException{ @@ -938,9 +941,10 @@ pgStream.SendChar(0); // end message pendingDescribeStatementQueue.add(new Object[]{query, params, new Boolean(describeOnly), query.getStatementName()}); + query.setDescribed(true); } - private void sendExecute(Query query, Portal portal, int limit) throws IOException { + private void sendExecute(SimpleQuery query, Portal portal, int limit) throws IOException { // // Send Execute. // @@ -1071,8 +1075,8 @@ // A statement describe will also output a RowDescription, // so don't reissue it here if we've already done so. // - if (!noMeta && !describeStatement) - sendDescribePortal(portal); + if (!noMeta && !describeStatement && !query.isDescribed()) + sendDescribePortal(query, portal); sendExecute(query, portal, rows); } @@ -1159,7 +1163,6 @@ boolean noResults = (flags & QueryExecutor.QUERY_NO_RESULTS) != 0; boolean bothRowsAndStatus = (flags & QueryExecutor.QUERY_BOTH_ROWS_AND_STATUS) != 0; - Field[] fields = null; Vector tuples = null; int len; @@ -1175,6 +1178,7 @@ int parseIndex = 0; int describeIndex = 0; + int describePortalIndex = 0; int bindIndex = 0; int executeIndex = 0; @@ -1260,15 +1264,19 @@ if (doneAfterRowDescNoData) { Object describeData[] = (Object[])pendingDescribeStatementQueue.get(describeIndex++); - Query currentQuery = (Query)describeData[0]; + SimpleQuery currentQuery = (SimpleQuery)describeData[0]; - if (fields != null || tuples != null) + Field[] fields = currentQuery.getFields(); + + if (fields != null) { // There was a resultset. + tuples = new Vector(); handler.handleResultRows(currentQuery, fields, tuples, null); - fields = null; tuples = null; } } + else + describePortalIndex++; break; case 's': // Portal Suspended (end of Execute) @@ -1281,12 +1289,16 @@ { Object[] executeData = (Object[])pendingExecuteQueue.get(executeIndex++); - Query currentQuery = (Query)executeData[0]; + SimpleQuery currentQuery = (SimpleQuery)executeData[0]; Portal currentPortal = (Portal)executeData[1]; + + Field[] fields = currentQuery.getFields(); + if (fields != null && !noResults && tuples == null) + tuples = new Vector(); + handler.handleResultRows(currentQuery, fields, tuples, currentPortal); } - fields = null; tuples = null; break; @@ -1298,13 +1310,16 @@ { Object[] executeData = (Object[])pendingExecuteQueue.get(executeIndex++); - Query currentQuery = (Query)executeData[0]; + SimpleQuery currentQuery = (SimpleQuery)executeData[0]; Portal currentPortal = (Portal)executeData[1]; + Field[] fields = currentQuery.getFields(); + if (fields != null && !noResults && tuples == null) + tuples = new Vector(); + if (fields != null || tuples != null) { // There was a resultset. handler.handleResultRows(currentQuery, fields, tuples, null); - fields = null; tuples = null; if (bothRowsAndStatus) @@ -1411,18 +1426,20 @@ break; case 'T': // Row Description (response to Describe) - fields = receiveFields(); + Field[] fields = receiveFields(); tuples = new Vector(); + if (doneAfterRowDescNoData) { Object describeData[] = (Object[])pendingDescribeStatementQueue.get(describeIndex++); Query currentQuery = (Query)describeData[0]; - if (fields != null || tuples != null) - { // There was a resultset. - handler.handleResultRows(currentQuery, fields, tuples, null); - fields = null; - tuples = null; - } + handler.handleResultRows(currentQuery, fields, tuples, null); + tuples = null; + } + else + { + SimpleQuery query = (SimpleQuery)pendingDescribePortalQueue.get(describePortalIndex++); + query.setFields(fields); } break; @@ -1440,6 +1457,7 @@ pendingParseQueue.clear(); // No more ParseComplete messages expected. pendingDescribeStatementQueue.clear(); // No more ParameterDescription messages expected. + pendingDescribePortalQueue.clear(); // No more RowDescription messages expected. pendingBindQueue.clear(); // No more BindComplete messages expected. pendingExecuteQueue.clear(); // No more query executions expected. break; @@ -1688,6 +1706,7 @@ private final ArrayList pendingBindQueue = new ArrayList(); // list of Portal instances private final ArrayList pendingExecuteQueue = new ArrayList(); // list of {SimpleQuery,Portal} object arrays private final ArrayList pendingDescribeStatementQueue = new ArrayList(); // list of {SimpleQuery, SimpleParameterList,Boolean} object arrays + private final ArrayList pendingDescribePortalQueue = new ArrayList(); // list of {SimpleQuery, SimpleParameterList,Boolean} object arrays private long nextUniqueID = 1; private final ProtocolConnectionImpl protoConnection; Index: org/postgresql/core/v3/SimpleQuery.java =================================================================== RCS file: /cvsroot/jdbc/pgjdbc/org/postgresql/core/v3/SimpleQuery.java,v retrieving revision 1.13 diff -u -r1.13 SimpleQuery.java --- org/postgresql/core/v3/SimpleQuery.java 30 Sep 2008 23:41:23 -0000 1.13 +++ org/postgresql/core/v3/SimpleQuery.java 16 Apr 2009 10:43:03 -0000 @@ -104,6 +104,20 @@ return encodedStatementName; } + void setFields(Field[] fields) { + this.fields = fields; + } + Field[] getFields() { + return fields; + } + // Have we sent a Describe (portal or statement) message for this query yet? + boolean isDescribed() { + return described; + } + void setDescribed(boolean described) { + this.described = described; + } + void setCleanupRef(PhantomReference cleanupRef) { if (this.cleanupRef != null) { this.cleanupRef.clear(); @@ -122,12 +136,16 @@ statementName = null; encodedStatementName = null; + fields = null; + described = false; } private final String[] fragments; private final ProtocolConnectionImpl protoConnection; private String statementName; private byte[] encodedStatementName; + private Field[] fields; + private boolean described; private PhantomReference cleanupRef; private int[] preparedTypes;
Heikki Linnakangas wrote: > > Here's the patch. Describe message is only sent in the first > invocation of a query. The Field[] array constructed from the > RowDescription message is saved in the SimpleQuery object for reuse. > Cheers, I'd already added similar code while expanding the binary support for arrays. good to see both on the same lines.. I'll try and get the binary stuff sorted soon.
On Thu, 16 Apr 2009, Heikki Linnakangas wrote:
> Here's the patch. Describe message is only sent in the first invocation
> of a query. The Field[] array constructed from the RowDescription
> message is saved in the SimpleQuery object for reuse.
>
Applied with minor revisions.  When sending a Describe Statement message
you'll also get a Row Description message back, so we need to add the
query to the pendingDescribePortalQueue there as well.  Doing this always
avoids some of the other special casing you tried to put in to handle
describe statement (which didn't work if you see the new unit test I
added).
While looking at this code it reminded me that we're also sending Describe
Statement too often and I've fixed that to only do it when necessary.
Previously the following code would describe the statement for every
execution.
PreparedStatement ps = conn.prepareStatement("SELECT ?::int");
for (int i=0; i<100; i++) {
     ps.setObject(1, new Integer(i), Types.OTHER);
     ResultSet rs = ps.executeQuery();
}
Also the JDBC coding style guidelines require four space indentation
rather than tabs, so you'll see some additional changes from that.
Kris Jurka