Обсуждение: Query preparation

Поиск
Список
Период
Сортировка

Query preparation

От
John Lister
Дата:
Hi, i was wondering if there is any reason why the query preparation is
done when the query is executed instead of when the preparedStatement is
created.

Has any one looked into moving it?

Thanks

Re: Query preparation

От
Oliver Jowett
Дата:
John Lister wrote:
> Hi, i was wondering if there is any reason why the query preparation is
> done when the query is executed instead of when the preparedStatement is
> created.

It would be an extra network round-trip on every query, and there's no
requirement to prepare it early.

> Has any one looked into moving it?

Why would you want to move it?

-O

Re: Query preparation

От
John Lister
Дата:
Oliver Jowett wrote
> John Lister wrote:
>
>> Hi, i was wondering if there is any reason why the query preparation is
>> done when the query is executed instead of when the preparedStatement is
>> created.
>>
>
> It would be an extra network round-trip on every query, and there's no
> requirement to prepare it early.
>
>
I'm not sure why it would be there would be an extra round trip? The
parse is done anyway, the only extra round trip would be if you did a
describe (which admittedly i'm looking into optionally doing for binary
transfers) or if you didn't use the preparedStatement to do any queries
which is i'd imagine is rare.

Maybe i've missed something?


>> Has any one looked into moving it?
>>
>
> Why would you want to move it?
>
>

To me it would be cleaner, it looks like the current driver was written
prior to v3 of the protocol where only simple queries are used and
everything is transmitted as text. The extended protocol stuff has been
bolted subsequently to the execute part of the driver rather than doing
it when the query is prepared... btw, i have nothing but respect for the
writers after doing some work on the another driver some years ago.

Also, in fairness i've done a lot of work with the oracle stuff vwhich
create portals (cursors) and prepare the statement before the execution
and this makes more sense to me.

Also, with regard to extra network traffic and latency i'm not
suggesting changing the way normal statements are used, only
PreparedStatements. This to me fits in more with the way the JDBC spec
is worded and i'd guess the more usual use case of statements -
Statement for quick and dirty one off queries, PreparedStatement for
more complicated/multiple use queries.

JOHN


Re: Query preparation

От
John Lister
Дата:
Oliver Jowett wrote:
> John Lister wrote:
>
>> Hi, i was wondering if there is any reason why the query preparation is
>> done when the query is executed instead of when the preparedStatement is
>> created.
>>
>
> It would be an extra network round-trip on every query, and there's no
> requirement to prepare it early.
>
Just going through the code again and please correct me if i'm wrong,
but at the moment there is unnecessary network traffic/latency. Assuming
a PreparedStatement, every time execute is called, sendParse and
sendDescribePortal are called (the former checks if a server side
prepared statement has been created and aborts if so). Admittedly
sendParse is only called as many times as setPrepareThreshold is set to
(which admittedly may be as low as 1)

Apart from the use of unnamed statements/portals within a multi threaded
environment which complicates things with parsing, i would have thought
you could get away with calling describe(portal) just once.

I appreciate that it would need a lot of refactoring to move things
around, but are there any objections?
Also, i don't know how many server side prepared statements/portals
postgresql supports (hopefully memory limited) but would it also not
make more sense to make all preparedStatements actually be server side
preparedStatements. I can't see what overhead this would add apart from
named statements/portals

I'd be happy to look into this if there was support...

Thanks

Re: Query preparation

От
Tom Lane
Дата:
John Lister <john.lister-ps@kickstone.com> writes:
> Oliver Jowett wrote:
>> It would be an extra network round-trip on every query, and there's no
>> requirement to prepare it early.
>>
> Just going through the code again and please correct me if i'm wrong,
> but at the moment there is unnecessary network traffic/latency. Assuming
> a PreparedStatement, every time execute is called, sendParse and
> sendDescribePortal are called (the former checks if a server side
> prepared statement has been created and aborts if so). Admittedly
> sendParse is only called as many times as setPrepareThreshold is set to
> (which admittedly may be as low as 1)

I think you're missing the cue that those are "send" operations.  They
don't wait for a response to come back.  In fact, assuming that the JDBC
driver is implemented the way I think, the actual data for both messages
is expected to go out as a single network packet after the driver
reaches the point of expecting a response.

            regards, tom lane

Re: Query preparation

От
John Lister
Дата:
Tom Lane wrote:
> John Lister <john.lister-ps@kickstone.com> writes:
>
>> Oliver Jowett wrote:
>>
>>> It would be an extra network round-trip on every query, and there's no
>>> requirement to prepare it early.
>>>
>>>
>> Just going through the code again and please correct me if i'm wrong,
>> but at the moment there is unnecessary network traffic/latency. Assuming
>> a PreparedStatement, every time execute is called, sendParse and
>> sendDescribePortal are called (the former checks if a server side
>> prepared statement has been created and aborts if so). Admittedly
>> sendParse is only called as many times as setPrepareThreshold is set to
>> (which admittedly may be as low as 1)
>>
>
> I think you're missing the cue that those are "send" operations.  They
> don't wait for a response to come back.  In fact, assuming that the JDBC
> driver is implemented the way I think, the actual data for both messages
> is expected to go out as a single network packet after the driver
> reaches the point of expecting a response.
>
I appreciate that the send operations (and responses) may be grouped
into a single packet, but i was thinking of the scenario (which i would
have thought is fairly common) where you prepare the statement and
execute it multiple times with different parameters...

This will result in the parse and describe(portal) commands being sent
twice as is (unless i set the prepare threshold to be 1). Admittedly,
this will probably not result in any extra packets as the total length
of all commands is likely to be less than the packet length (the query
size is likely to be the limiting factor here), but this causes extra
load on both the server and client - although maybe not enough to worry
about.

I was thinking that you could parse/describe the query just once.

I also accept that (for the binary transfer stuff) describing the
statement first would require an additional network packet or 2 as you'd
need to read the results before you could work out the bind..

JOHN




Re: Query preparation

От
Oliver Jowett
Дата:
John Lister wrote:
> Oliver Jowett wrote
>> John Lister wrote:
>>
>>> Hi, i was wondering if there is any reason why the query preparation is
>>> done when the query is executed instead of when the preparedStatement is
>>> created.
>>>
>>
>> It would be an extra network round-trip on every query, and there's no
>> requirement to prepare it early.
>>
>>
> I'm not sure why it would be there would be an extra round trip? The
> parse is done anyway, the only extra round trip would be if you did a
> describe (which admittedly i'm looking into optionally doing for binary
> transfers) or if you didn't use the preparedStatement to do any queries
> which is i'd imagine is rare.
>
> Maybe i've missed something?

You presumably want to wait for the result of Parse before returning
from prepareStatement() if you are doing preparation at that point.
Otherwise I don't quite see the point? And that's where the extra
round-trip is.

There are also another couple of hurdles to doing preparation earlier:

1) use of the unnamed statement would become impossible
2) at the time of prepareStatement, you don't have parameter type
information, which is needed by Parse.

(2) seems to be a showstopper to me.

>> Why would you want to move it?
>>
> To me it would be cleaner

I don't think it would be much cleaner, if at all. Are there any other
reasons for moving it, other than implementation reasons?

> Also, with regard to extra network traffic and latency i'm not suggesting changing the way normal statements are
used,only PreparedStatements. This to me fits in more with the way the JDBC spec is worded and i'd guess the more usual
usecase of statements - Statement for quick and dirty one off queries, PreparedStatement for more complicated/multiple
usequeries. 

A common use of PreparedStatements is for safe parameter interpolation.
I very rarely use plain statements in my own code for this reason, even
for one-shot queries. No point in writing yet another SQL string escaper
when the driver already deals with it ..

-O

Re: Query preparation

От
Oliver Jowett
Дата:
John Lister wrote:
> Oliver Jowett wrote:
>> John Lister wrote:
>>
>>> Hi, i was wondering if there is any reason why the query preparation is
>>> done when the query is executed instead of when the preparedStatement is
>>> created.
>>>
>>
>> It would be an extra network round-trip on every query, and there's no
>> requirement to prepare it early.
>>
> Just going through the code again and please correct me if i'm wrong,
> but at the moment there is unnecessary network traffic/latency. Assuming
> a PreparedStatement, every time execute is called, sendParse and
> sendDescribePortal are called (the former checks if a server side
> prepared statement has been created and aborts if so). Admittedly
> sendParse is only called as many times as setPrepareThreshold is set to
> (which admittedly may be as low as 1)

Recreating the unnamed statement each time is exactly the point! Use of
an unnamed statement forces replanning (at Bind) using the actual
parameter values available at the time of execution, which can
substantially change the performance of the query plan (because when the
actual parameter values are known, the query planner can use gathered
column statistics to provide better cost estimates). Using a named
statement in all cases is *not* desirable.

Parameter types may change between executions which can completely
change the meaning of the query, so yes, it is necessary to re-Describe
each time.

There's no additional latency here (beyond latency induced by the extra
network traffic itself, which should be very minor) because the driver
doesn't wait for a response from the Parse/Bind/Describe before doing
the rest of the query.

> I appreciate that it would need a lot of refactoring to move things around, but are there any objections?

Moving to use named statements exclusively is a Bad Idea. You will kill
the performance of many simple queries. See above.

-O

Re: Query preparation

От
Oliver Jowett
Дата:
John Lister wrote:
> The extended protocol stuff has been
> bolted subsequently to the execute part of the driver rather than doing
> it when the query is prepared...

That was indeed what happened (slightly more than "bolted on", there was
a rewrite of the network layer!), but remember that the driver actually
supports both v2 and v3 protocols. There are fewer cases that need v2
support these days, but I'm sure there would be some unhappy people if
the driver suddenly dropped it altogether. So keep in mind that any
architectural changes will still need to support v2 as well at this stage.

-O


Re: Query preparation

От
Heikki Linnakangas
Дата:
John Lister wrote:
> I appreciate that the send operations (and responses) may be grouped
> into a single packet, but i was thinking of the scenario (which i would
> have thought is fairly common) where you prepare the statement and
> execute it multiple times with different parameters...
>
> This will result in the parse and describe(portal) commands being sent
> twice as is (unless i set the prepare threshold to be 1). Admittedly,
> this will probably not result in any extra packets as the total length
> of all commands is likely to be less than the packet length (the query
> size is likely to be the limiting factor here), but this causes extra
> load on both the server and client - although maybe not enough to worry
> about.
>
> I was thinking that you could parse/describe the query just once.

Incidentally, a customer of ours bumped into the overhead of the
describe portal step just yesterday. It doesn't cause any extra network
roundtrips, and is usually pretty cheap, but in this case they were
selecting a single row from a table with 145 columns thousands of times
in a loop. Furthermore, I believe most of the columns were NULL. They're
using a PreparedStatement, but we still send the Describe portal message
on every execution. A quick & dirty patch to cache the RowDescription
information cut over 10% off the total runtime of the application.

Does anyone see a problem with caching the result set descriptor
(RowDescription)? AFAICS it should never change after a statement is
prepared. If not, I'll polish up and submit the patch.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Query preparation

От
Oliver Jowett
Дата:
Heikki Linnakangas wrote:

> Does anyone see a problem with caching the result set descriptor
> (RowDescription)? AFAICS it should never change after a statement is
> prepared. If not, I'll polish up and submit the patch.

If you tie it to the existing named statement mechanisms I think that
works. We invalidate the named statement anyway if the parameter types
change, and you want to invalidate any cached row description at the
same time. Schema changes could bite you, but I think they bite named
statements in other ways anyway. (Not sure how far the server-side
efforts to do plan invalidation progressed)

-O

Re: Query preparation

От
Tom Lane
Дата:
Oliver Jowett <oliver@opencloud.com> writes:
> Heikki Linnakangas wrote:
>> Does anyone see a problem with caching the result set descriptor
>> (RowDescription)? AFAICS it should never change after a statement is
>> prepared. If not, I'll polish up and submit the patch.

> If you tie it to the existing named statement mechanisms I think that
> works. We invalidate the named statement anyway if the parameter types
> change, and you want to invalidate any cached row description at the
> same time. Schema changes could bite you, but I think they bite named
> statements in other ways anyway. (Not sure how far the server-side
> efforts to do plan invalidation progressed)

It should be safe.  The server is set up to throw an error if you try to
re-use a prepared statement whose output tuple descriptor would have
changed due to schema changes.  So there's a backstop if the driver's
invalidation logic misses anything.

            regards, tom lane

Re: Query preparation

От
Heikki Linnakangas
Дата:
Tom Lane wrote:
> Oliver Jowett <oliver@opencloud.com> writes:
>> Heikki Linnakangas wrote:
>>> Does anyone see a problem with caching the result set descriptor
>>> (RowDescription)? AFAICS it should never change after a statement is
>>> prepared. If not, I'll polish up and submit the patch.
>
>> If you tie it to the existing named statement mechanisms I think that
>> works. We invalidate the named statement anyway if the parameter types
>> change, and you want to invalidate any cached row description at the
>> same time. Schema changes could bite you, but I think they bite named
>> statements in other ways anyway. (Not sure how far the server-side
>> efforts to do plan invalidation progressed)
>
> It should be safe.  The server is set up to throw an error if you try to
> re-use a prepared statement whose output tuple descriptor would have
> changed due to schema changes.  So there's a backstop if the driver's
> invalidation logic misses anything.

Here's the patch. Describe message is only sent in the first invocation
of a query. The Field[] array constructed from the RowDescription
message is saved in the SimpleQuery object for reuse.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
Index: org/postgresql/core/v3/QueryExecutorImpl.java
===================================================================
RCS file: /cvsroot/jdbc/pgjdbc/org/postgresql/core/v3/QueryExecutorImpl.java,v
retrieving revision 1.43
diff -u -r1.43 QueryExecutorImpl.java
--- org/postgresql/core/v3/QueryExecutorImpl.java    15 Nov 2008 17:48:52 -0000    1.43
+++ org/postgresql/core/v3/QueryExecutorImpl.java    16 Apr 2009 10:43:03 -0000
@@ -894,7 +894,7 @@
         }
     }

-    private void sendDescribePortal(Portal portal) throws IOException {
+    private void sendDescribePortal(SimpleQuery query, Portal portal) throws IOException {
         //
         // Send Describe.
         //
@@ -915,6 +915,9 @@
         if (encodedPortalName != null)
             pgStream.Send(encodedPortalName); // portal name to close
         pgStream.SendChar(0);                 // end of portal name
+
+        pendingDescribePortalQueue.add(query);
+        query.setDescribed(true);
     }

     private void sendDescribeStatement(SimpleQuery query, SimpleParameterList params, boolean describeOnly) throws
IOException{ 
@@ -938,9 +941,10 @@
         pgStream.SendChar(0);                       // end message

         pendingDescribeStatementQueue.add(new Object[]{query, params, new Boolean(describeOnly),
query.getStatementName()});
+        query.setDescribed(true);
     }

-    private void sendExecute(Query query, Portal portal, int limit) throws IOException {
+    private void sendExecute(SimpleQuery query, Portal portal, int limit) throws IOException {
         //
         // Send Execute.
         //
@@ -1071,8 +1075,8 @@
         // A statement describe will also output a RowDescription,
         // so don't reissue it here if we've already done so.
         //
-        if (!noMeta && !describeStatement)
-            sendDescribePortal(portal);
+        if (!noMeta && !describeStatement && !query.isDescribed())
+            sendDescribePortal(query, portal);

         sendExecute(query, portal, rows);
     }
@@ -1159,7 +1163,6 @@
         boolean noResults = (flags & QueryExecutor.QUERY_NO_RESULTS) != 0;
         boolean bothRowsAndStatus = (flags & QueryExecutor.QUERY_BOTH_ROWS_AND_STATUS) != 0;

-        Field[] fields = null;
         Vector tuples = null;

         int len;
@@ -1175,6 +1178,7 @@

         int parseIndex = 0;
         int describeIndex = 0;
+        int describePortalIndex = 0;
         int bindIndex = 0;
         int executeIndex = 0;

@@ -1260,15 +1264,19 @@

                 if (doneAfterRowDescNoData) {
                     Object describeData[] = (Object[])pendingDescribeStatementQueue.get(describeIndex++);
-                    Query currentQuery = (Query)describeData[0];
+                    SimpleQuery currentQuery = (SimpleQuery)describeData[0];

-                    if (fields != null || tuples != null)
+                    Field[] fields = currentQuery.getFields();
+
+                    if (fields != null)
                     { // There was a resultset.
+                        tuples = new Vector();
                         handler.handleResultRows(currentQuery, fields, tuples, null);
-                        fields = null;
                         tuples = null;
                     }
                 }
+                else
+                    describePortalIndex++;
                 break;

             case 's':    // Portal Suspended (end of Execute)
@@ -1281,12 +1289,16 @@

                 {
                     Object[] executeData = (Object[])pendingExecuteQueue.get(executeIndex++);
-                    Query currentQuery = (Query)executeData[0];
+                    SimpleQuery currentQuery = (SimpleQuery)executeData[0];
                     Portal currentPortal = (Portal)executeData[1];
+
+                    Field[] fields = currentQuery.getFields();
+                    if (fields != null && !noResults && tuples == null)
+                        tuples = new Vector();
+
                     handler.handleResultRows(currentQuery, fields, tuples, currentPortal);
                 }

-                fields = null;
                 tuples = null;
                 break;

@@ -1298,13 +1310,16 @@

                 {
                     Object[] executeData = (Object[])pendingExecuteQueue.get(executeIndex++);
-                    Query currentQuery = (Query)executeData[0];
+                    SimpleQuery currentQuery = (SimpleQuery)executeData[0];
                     Portal currentPortal = (Portal)executeData[1];

+                    Field[] fields = currentQuery.getFields();
+                    if (fields != null && !noResults && tuples == null)
+                        tuples = new Vector();
+
                     if (fields != null || tuples != null)
                     { // There was a resultset.
                         handler.handleResultRows(currentQuery, fields, tuples, null);
-                        fields = null;
                         tuples = null;

                         if (bothRowsAndStatus)
@@ -1411,18 +1426,20 @@
                 break;

             case 'T':  // Row Description (response to Describe)
-                fields = receiveFields();
+                Field[] fields = receiveFields();
                 tuples = new Vector();
+
                 if (doneAfterRowDescNoData) {
                     Object describeData[] = (Object[])pendingDescribeStatementQueue.get(describeIndex++);
                     Query currentQuery = (Query)describeData[0];

-                    if (fields != null || tuples != null)
-                    { // There was a resultset.
-                        handler.handleResultRows(currentQuery, fields, tuples, null);
-                        fields = null;
-                        tuples = null;
-                    }
+                    handler.handleResultRows(currentQuery, fields, tuples, null);
+                    tuples = null;
+                }
+                else
+                {
+                    SimpleQuery query = (SimpleQuery)pendingDescribePortalQueue.get(describePortalIndex++);
+                    query.setFields(fields);
                 }
                 break;

@@ -1440,6 +1457,7 @@

                 pendingParseQueue.clear();              // No more ParseComplete messages expected.
                 pendingDescribeStatementQueue.clear();  // No more ParameterDescription messages expected.
+                pendingDescribePortalQueue.clear();     // No more RowDescription messages expected.
                 pendingBindQueue.clear();               // No more BindComplete messages expected.
                 pendingExecuteQueue.clear();            // No more query executions expected.
                 break;
@@ -1688,6 +1706,7 @@
     private final ArrayList pendingBindQueue = new ArrayList(); // list of Portal instances
     private final ArrayList pendingExecuteQueue = new ArrayList(); // list of {SimpleQuery,Portal} object arrays
     private final ArrayList pendingDescribeStatementQueue = new ArrayList(); // list of {SimpleQuery,
SimpleParameterList,Boolean} object arrays 
+    private final ArrayList pendingDescribePortalQueue = new ArrayList(); // list of {SimpleQuery,
SimpleParameterList,Boolean} object arrays 

     private long nextUniqueID = 1;
     private final ProtocolConnectionImpl protoConnection;
Index: org/postgresql/core/v3/SimpleQuery.java
===================================================================
RCS file: /cvsroot/jdbc/pgjdbc/org/postgresql/core/v3/SimpleQuery.java,v
retrieving revision 1.13
diff -u -r1.13 SimpleQuery.java
--- org/postgresql/core/v3/SimpleQuery.java    30 Sep 2008 23:41:23 -0000    1.13
+++ org/postgresql/core/v3/SimpleQuery.java    16 Apr 2009 10:43:03 -0000
@@ -104,6 +104,20 @@
         return encodedStatementName;
     }

+    void setFields(Field[] fields) {
+        this.fields = fields;
+    }
+    Field[] getFields() {
+        return fields;
+    }
+    // Have we sent a Describe (portal or statement) message for this query yet?
+    boolean isDescribed() {
+        return described;
+    }
+    void setDescribed(boolean described) {
+        this.described = described;
+    }
+
     void setCleanupRef(PhantomReference cleanupRef) {
         if (this.cleanupRef != null) {
             this.cleanupRef.clear();
@@ -122,12 +136,16 @@

         statementName = null;
         encodedStatementName = null;
+        fields = null;
+        described = false;
     }

     private final String[] fragments;
     private final ProtocolConnectionImpl protoConnection;
     private String statementName;
     private byte[] encodedStatementName;
+    private Field[] fields;
+    private boolean described;
     private PhantomReference cleanupRef;
     private int[] preparedTypes;


Re: Query preparation

От
John Lister
Дата:
Heikki Linnakangas wrote:
>
> Here's the patch. Describe message is only sent in the first
> invocation of a query. The Field[] array constructed from the
> RowDescription message is saved in the SimpleQuery object for reuse.
>
Cheers, I'd already added similar code while expanding the binary
support for arrays. good to see both on the same lines..

I'll try and get the binary stuff sorted soon.


Re: Query preparation

От
Kris Jurka
Дата:

On Thu, 16 Apr 2009, Heikki Linnakangas wrote:

> Here's the patch. Describe message is only sent in the first invocation
> of a query. The Field[] array constructed from the RowDescription
> message is saved in the SimpleQuery object for reuse.
>

Applied with minor revisions.  When sending a Describe Statement message
you'll also get a Row Description message back, so we need to add the
query to the pendingDescribePortalQueue there as well.  Doing this always
avoids some of the other special casing you tried to put in to handle
describe statement (which didn't work if you see the new unit test I
added).

While looking at this code it reminded me that we're also sending Describe
Statement too often and I've fixed that to only do it when necessary.
Previously the following code would describe the statement for every
execution.

PreparedStatement ps = conn.prepareStatement("SELECT ?::int");
for (int i=0; i<100; i++) {
     ps.setObject(1, new Integer(i), Types.OTHER);
     ResultSet rs = ps.executeQuery();
}

Also the JDBC coding style guidelines require four space indentation
rather than tabs, so you'll see some additional changes from that.

Kris Jurka