Discussion: SOLVED ... Re: Getting rid of a cursor from JDBC .... Re: Re: HELP: How to tame the 8.3.x JDBC driver with a big query result set

Statement.close() appears to get the job done (in my environment, PG's driver never sees a Connection.close() because of DBCP).

I'd consider the fact that ResultSet.close() does not release the implicit cursor to be something of a bug, but it may well have been fixed already.

Cheers
Dave

On Mon, Apr 19, 2010 at 6:28 PM, Dave Crooke <dcrooke@gmail.com> wrote:
Scott - I tried to post a SOLVED followup to the JDBC list but it was rejected :-!

I now have the opposite problem of getting rid of the cursor :-) ResultSet.close() does not work. I am trying to do a DROP TABLE from the other Connection, to whack the table I just finished the ETL on, but it just hangs indefinitely, and pg_locks shows the shared read lock still sitting there.

I am trying a Statement.close() and Connection.close() now, but I fear I may have to do something slightly ugly, as I have Apache DBCP sitting in between me and the actual PG JDBC driver.

I am hoping the slightly ugly thing is only closing the underlying connection, and does not have to be /etc/init.d/postgresql8.3 restart :-) Is there a backdoor way to forcibly get rid of a lock you don't need any more?

Cheers
Dave

On Mon, Apr 19, 2010 at 1:05 PM, Scott Carey <scott@richrelevance.com> wrote:
On Apr 15, 2010, at 1:01 PM, Dave Crooke wrote:
> On Thu, Apr 15, 2010 at 2:42 PM, Dave Crooke <dcrooke@gmail.com> wrote:
> Hey folks
>
> I am trying to do a full table scan on a large table from Java, using a straightforward "select * from foo". I've run into these problems:
>
> 1. By default, the PG JDBC driver attempts to suck the entire result set into RAM, resulting in java.lang.OutOfMemoryError ... this is not cool, in fact I consider it a serious bug (even MySQL gets this right ;-) I am only testing with a 9GB result set, but production needs to scale to 200GB or more, so throwing hardware at it is not feasible.
>

For scrolling large result sets you have to do the following to prevent it from loading the whole thing into memory:


Use forward-only, read-only result scrolling and set the fetch size.  Some of these may be the default depending on what the connection pool is doing, but if set otherwise it may cause the whole result set to load into memory.  I regularly read several GB result sets with ~10K fetch size batches.

Something like:
Statement st = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY, java.sql.ResultSet.CONCUR_READ_ONLY);
st.setFetchSize(FETCH_SIZE);
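
(A fuller, runnable sketch of that setup -- table, database, and credentials are made up; note the PG driver also requires autocommit to be off before it will use a server-side cursor behind the scenes:)

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class BigScanSketch {
        private static final int FETCH_SIZE = 10000;

        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/mydb", "user", "password");
            conn.setAutoCommit(false);  // cursor-based fetching needs this
            Statement st = conn.createStatement(
                    ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
            st.setFetchSize(FETCH_SIZE);  // stream in 10K-row batches
            ResultSet rs = st.executeQuery("SELECT * FROM big_table");
            try {
                while (rs.next()) {
                    // process each row as it arrives; memory stays bounded
                }
            } finally {
                rs.close();
                st.close();
                conn.commit();  // locks are held until transaction end
                conn.close();
            }
        }
    }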

That's what I'm using, albeit without any args to createStatement, and it now works.

> 2. I tried using the official taming method, namely java.sql.Statement.setFetchSize(1000) and this makes it blow up entirely with an error I have no context for, as follows (the number C_10 varies, e.g. C_12 last time) ...
>
> org.postgresql.util.PSQLException: ERROR: portal "C_10" does not exist
>     at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1592)
>     at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1327)
>     at org.postgresql.core.v3.QueryExecutorImpl.fetch(QueryExecutorImpl.java:1527)
>     at org.postgresql.jdbc2.AbstractJdbc2ResultSet.next(AbstractJdbc2ResultSet.java:1843)
>
> This is definitely a bug :-)
>
>

I have no idea what that is.

It was because I was also writing to the same Connection ... when you call Connection.commit() with the PG JDBC driver, it also kills all your open cursors.

I think this is a side effect of the PG internal design where it does MVCC within a table (rows have multiple versions with min and max transaction ids) ... even a query in PG has a notional virtual transaction ID, whereas in e.g. Oracle, a query has a start time and visibility horizon, and as long as you have enough undo tablespace, it has an existence which is totally independent of any transactions going on around it even on the same JDBC connection.
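
(A sketch of the workaround this implies -- keep the streaming read on its own connection, so commits on the writing connection can't kill the cursor. The pool, connection names, and table are hypothetical:)

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;
    import javax.sql.DataSource;

    class SplitReadWrite {
        static void etl(DataSource pool) throws SQLException {
            try (Connection readConn = pool.getConnection();
                 Connection writeConn = pool.getConnection()) {
                readConn.setAutoCommit(false);   // required for cursor-based fetching
                writeConn.setAutoCommit(false);
                try (Statement st = readConn.createStatement(
                             ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
                    st.setFetchSize(10000);
                    try (ResultSet rs = st.executeQuery("SELECT * FROM staging_table")) {
                        while (rs.next()) {
                            // ... write the transformed row via writeConn ...
                            writeConn.commit();  // does not touch readConn's open cursor
                        }
                    }
                }
                readConn.commit();  // ends the read transaction, releasing its locks
            }
        }
    }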


 


Dave Crooke <dcrooke@gmail.com> wrote:

> I'd consider the fact that ResultSet.close() does not release the
> implicit cursor to be something of a bug

What's your reasoning on that?  The definitions of cursors in the
spec, if memory serves, allow a cursor to be closed and re-opened;
why would this be treated differently?

-Kevin

AFAICT from the Java end, ResultSet.close() is supposed to be final. There is no way I know of in JDBC to get a handle back to the cursor on the server side once you have made this call - in fact, its sole purpose is to inform the server in a timely fashion that this cursor is no longer required, since the ResultSet itself is a Java object and thus subject to garbage collection and finalizer hooks.

At a pragmatic level, the PGSQL JDBC driver has a lot of odd behaviours which, while they may or may not be in strict compliance with the letter of the standard, are very different from any other mainstream database that I have accessed from Java .... what I'd consider as normative behaviour, using regular JDBC calls without the need to jump through all these weird hoops, is exhibited by all of the following: Oracle, SQL Server, DB2, MySQL, Apache Derby and JET (MS-Access file-based back end, the .mdb format)

In practice, this places PGSQL as the odd one out, which is a bit of a turn-off to experienced Java people who are PG newbies for what is otherwise an excellent database.

At my current post, I came into a shop that had PG as the only real database, so I have learned to love it, and de-supported Derby and the other toy back ends we used to use. And to be fair, from a back end perspective, PG is better than MySQL in terms of manageability .... I am running 250GB databases on small systems with no issues.

At my previous shop, we built a couple of database-backed apps from scratch, and despite a desire to use PG due to there being more certainty over its future licensing (it was just after Sun had bought MySQL AG), I ended up switching from PG to MySQL 5.0.47 (last open source version) because of the difficulties I was having with the PG driver.

I consider part of the acme of great FOSS to be making it easy for newbies to use, and thus attracting a larger user base, but that is just my $0.02 worth.

Cheers
Dave

On Tue, Apr 20, 2010 at 9:28 AM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
Dave Crooke <dcrooke@gmail.com> wrote:

> I'd consider the fact that ResultSet.close() does not release the
> implicit cursor to be something of a bug

What's your reasoning on that?  The definitions of cursors in the
spec, if memory serves, allow a cursor to be closed and re-opened;
why would this be treated differently?

-Kevin


On Mon, 19 Apr 2010, Dave Crooke wrote:

> Statement.close() appears to get the job done (in my environment, PG's
> driver never sees a Connection.close() because of DBCP).
>
> I'd consider the fact that ResultSet.close() does not release the implicit
> cursor to be something of a bug, but it may well have been fixed already.

PG doesn't release the locks acquired by the query until transaction end.
So closing a cursor will release some backend memory, but it won't release
the locks.  The way the driver implements ResultSet.close() is to put
the close message into a queue so that the next time a message is sent to
the backend we'll also send the cursor close message.  This avoids an
extra network roundtrip for the close action.

In any case Statement.close isn't helping you here either.  It's really
Connection.commit/rollback that's releasing the locks.

Kris Jurka
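
(To make Kris's point concrete, a sketch with two hypothetical connections and a made-up table: the DROP TABLE on the second connection blocks until the first commits, not when the ResultSet or Statement is closed:)

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    class LockReleaseSketch {
        static void demo(Connection conn1, Connection conn2) throws SQLException {
            conn1.setAutoCommit(false);
            Statement st = conn1.createStatement();
            st.setFetchSize(1000);
            ResultSet rs = st.executeQuery("SELECT * FROM staging_table");
            rs.next();
            rs.close();     // frees backend memory; the table lock remains
            st.close();     // still holds the lock...
            conn1.commit(); // ...only transaction end releases it
            try (Statement drop = conn2.createStatement()) {
                drop.execute("DROP TABLE staging_table");  // now proceeds immediately
            }
        }
    }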

Dave Crooke <dcrooke@gmail.com> wrote:

> AFAICT from the Java end, ResultSet.close() is supposed to be
> final.

For that ResultSet.  That doesn't mean a ResultSet defines a cursor.
Such methods as setCursorName, setFetchSize, and setFetchDirection
are associated with a Statement.  Think of the ResultSet as the
result of a cursor *scan* generated by opening the cursor defined by
the Statement.

http://java.sun.com/javase/6/docs/api/java/sql/ResultSet.html#close%28%29

Notice that the ResultSet is automatically closed if the Statement
that generated it is re-executed.  That is very much consistent with
Statement as the equivalent of a cursor, and not very consistent
with a ResultSet as the equivalent of a cursor.

> There is no way I know of in JDBC to get a handle back to the
> cursor on the server side once you have made this call - in fact,
> its sole purpose is to inform the server in a timely fashion that
> this cursor is no longer required, since the ResultSet itself is a
> Java object and thus subject to garbage collection and finalizer
> hooks.

Again, you're talking about the *results* from *opening* the cursor.

> At a pragmatic level, the PGSQL JDBC driver has a lot of odd
> behaviours which, while they may or may not be in strict
> compliance with the letter of the standard, are very different
> from any other mainstream database that I have accessed from Java
> .... what I'd consider as normative behaviour, using regular JDBC
> calls without the need to jump through all these weird hoops, is
> exhibited by all of the following: Oracle, SQL Server, DB2, MySQL,
> Apache Derby and JET (MS-Access file-based back end, the .mdb
> format)

Are you talking about treating the Statement object as representing
a cursor and the ResultSet representing the results from opening
the cursor, or are you thinking of something else here?

> In practice, this places PGSQL as the odd one out, which is a bit
> of a turn-off to experienced Java people who are PG newbies for
> what is otherwise an excellent database.

Huh.  I dropped PostgreSQL into an environment with hundreds of
databases, and the applications pretty much "just worked" for us.
Of course, we were careful to write to the SQL standard and the JDBC
API, not to some other product's implementation of them.

There were a few bugs we managed to hit which hadn't previously been
noticed, but those were promptly fixed.  As I recall, about the only
other things which caused me problems were:

(1)  Needing to setFetchSize to avoid materializing the entire
result set in RAM on the client.

(2)  Fixing a race condition in our software which was benign in
other products, but clearly my own bug.

(3)  Working around the fact that COALESCE(NULL, NULL) can't be used
everywhere NULL can.

> At my previous shop, we built a couple of database-backed apps
> from scratch, and despite a desire to use PG due to there being
> more certainty over its future licensing (it was just after Sun
> had bought MySQL AG), I ended up switching from PG to MySQL 5.0.47
> (last open source version) because of the difficulties I was
> having with the PG driver.

Just out of curiosity, did you discuss that on the PostgreSQL lists?
Can you reference the thread(s)?

> I consider part of the acme of great FOSS is to make it easy to
> use for newbies and thus attract a larger user base, but that is
> just my $0.02 worth.

Sure, but I would consider it a step away from that to follow
MySQL's interpretation of cursors rather than the standard's.
YMMV, of course.

-Kevin

I don't want to get into a big debate about standards, but I will clarify a couple of things inline below.

My key point is that the PG JDBC driver resets the expectations of people who have used JDBC with other databases, and that is going to reflect negatively on Postgres if Postgres is in the minority, standards notwithstanding, and I feel badly about that, because PG rocks!

Cheers
Dave

On Tue, Apr 20, 2010 at 11:32 AM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
Dave Crooke <dcrooke@gmail.com> wrote:

> AFAICT from the Java end, ResultSet.close() is supposed to be
> final.

For that ResultSet.  That doesn't mean a ResultSet defines a cursor.
Such methods as setCursorName, setFetchSize, and setFetchDirection
are associated with a Statement.  Think of the ResultSet as the
result of a cursor *scan* generated by opening the cursor defined by
the Statement.

http://java.sun.com/javase/6/docs/api/java/sql/ResultSet.html#close%28%29

Notice that the ResultSet is automatically closed if the Statement
that generated it is re-executed.  That is very much consistent with
Statement as the equivalent of a cursor, and not very consistent
with a ResultSet as the equivalent of a cursor.

True, but mechanically there is no other choice - the ResultSet is created by Statement.executeQuery() and by then it's already in motion .... in the case of Postgres with default settings, the JVM blows out before that call returns.

I am not explicitly creating any cursors, all I'm doing is running a query with a very large ResultSet.
 

Again, you're talking about the *results* from *opening* the cursor.

> At a pragmatic level, the PGSQL JDBC driver has a lot of odd
> behaviours which, while they may or may not be in strict
> compliance with the letter of the standard, are very different
> from any other mainstream database that I have accessed from Java
> .... what I'd consider as normative behaviour, using regular JDBC
> calls without the need to jump through all these weird hoops, is
> exhibited by all of the following: Oracle, SQL Server, DB2, MySQL,
> Apache Derby and JET (MS-Access file-based back end, the .mdb
> format)

Are you talking about treating the Statement object as representing
a cursor and the ResultSet representing the results from opening
the cursor, or are you thinking of something else here?

Specific examples:

a. the fact that Statement.executeQuery("select * from huge_table") works out of the box with every one of those databases, but results in java.lang.OutOfMemoryError with PG without special setup. Again, this is to the letter of the standard, it's just not very user friendly.

b. The fact that with enterprise grade commercial databases, you can mix reads and writes on the same Connection, whereas with PG Connection.commit() kills open cursors.

The fact that I've been using JDBC for 12 years with half a dozen database products, in blissful ignorance of these fine distinctions in the standard until I had to deal with them with PG, is kinda what my point is :-)

I understand the reasons for some of these limitations, but by no means all of them.
 
Huh.  I dropped PostgreSQL into an environment with hundreds of
databases, and the applications pretty much "just worked" for us.
Of course, we were careful to write to the SQL standard and the JDBC
API, not to some other product's implementation of them.

True, but not everyone can hire every developer to be a JDBC / SQL language lawyer. All of our SQL is either ANSI or created by the Hibernate PGSQL adapter, with the exception of a daily "VACUUM ANALYSE" which I added ;-)

I do believe that when there are two ways to implement a standard, the "it just works" way is far preferable to the "well, I know you probably think this is a bug, because 90% of the client code out there chokes on it, but actually we are standards compliant, it's everyone else who is doing it wrong" way.

I used to work at a storage startup that did exactly the latter, using an obscure HTTP/1.1 standard feature that absolutely none of the then-current browsers or HTTP libraries supported, and so it was a constant source of frustration for customers and tech support alike. I no longer work there ;-)

It's kinda like making stuff that has to work with Windows - you know Microsoft doesn't follow its own standards, but you gotta make your code work with theirs, so you play ball with their rules.


(1)  Needing to setFetchSize to avoid materializing the entire
result set in RAM on the client.

I don't understand the rationale for why PG, unlike every other database, doesn't make this a sensible default, e.g., 10,000 rows ... maybe because the locks stay in place until you call Connection.close() or Connection.commit()? ;-)
 

(2)  Fixing a race condition in our software which was benign in
other products, but clearly my own bug.

Been there and done that with code developed on single-threaded DB's (JET / Derby) ... not what I'm griping about here, though; the base code with no extra JDBC setup calls works perfectly against Oracle.
 
Just out of curiosity, did you discuss that on the PostgreSQL lists?
Can you reference the thread(s)?

No, I was in a hurry, and the "just works" model was available with both MySQL and Berkeley DB, so I didn't see the point in engaging. I felt the in-house paranoia about the MySQL licensing (our CFO's) was not justified, and MySQL was the devil I knew; I was only taking a look at PG, which was then foreign to me, as a "genius of the AND" alternative.
 

Sure, but I would consider it a step away from that to follow
MySQL's interpretation of cursors rather than the standard's.
YMMV, of course.

I wouldn't hold MySQL up to be a particularly good implementation of anything, other than speed (MyISAM) and usability (the CLI) .... I find Oracle's JDBC implementation to be both user friendly and (largely) standards compliant.

YMMV too :-)

I hope this can be taken in the amicable spirit of gentlemanly debate in which it is offered, and in the context that we all want to see PG grow and continue to succeed.

Cheers
Dave


On Tue, Apr 20, 2010 at 3:29 PM, Dave Crooke <dcrooke@gmail.com> wrote:
I wouldn't hold MySQL up to be a particularly good implementation of anything, other than speed (MyISAM) and usability (the CLI) .... I find Oracle's JDBC implementation to be both user friendly and (largely) standards compliant.

Dave,

I've been following along at home and agree with you right up until you mention the MySQL CLI being usable.  I work with the thing every day.  The plain, vanilla install on my Ubuntu laptop lacks proper readline support.  Hitting ctrl-c will sometimes kill the running query and sometimes kill the CLI.  It's far from a paragon of usability.  The last time I used psql it didn't have any of those issues.

Full disclosure:  mysql does have proper readline support on a Centos machine I have access to.  ctrl-c still kills the shell.

Your other points are good though.

--Nik
Dave Crooke <dcrooke@gmail.com> wrote:

> a. the fact that Statement.executeQuery("select * from
> huge_table") works out of the box with every one of those
> databases, but results in java.lang.OutOfMemoryError with PG without
> special setup. Again, this is to the letter of the standard, it's
> just not very user friendly.

The way I read it, it's *allowed* by the standard, but not
*required* by the standard.  I agree it's not very friendly
behavior.  I made some noise about it early in my use of PostgreSQL,
but let it go once I had it covered for my own shop.  I agree it's a
barrier to conversion -- it often comes up here with new PostgreSQL
users, and who knows how many people give up on PostgreSQL without
coming here when they hit it?

It's not just an issue in JDBC, either; it's generally the default
in PostgreSQL interfaces.  That seems to be by design, with the
rationale that it prevents returning some part of a result set and
then throwing an error.  Anyone coming from another database
probably already handles that, so they won't tend to be impressed by
that argument, but it would be hard to change that as a default
behavior in PostgreSQL without breaking a lot of existing code for
PostgreSQL users at this point.  :-(

> b. The fact that with enterprise grade commercial databases, you
> can mix reads and writes on the same Connection, whereas with PG
> Connection.commit() kills open cursors.

Well, I know that with Sybase ASE (and therefore it's probably also
true of Microsoft SQL Server, since last I saw they both use TDS
protocol), unless you're using a cursor, if you execute another
statement through JDBC on the same connection which has a pending
ResultSet, it reads the rest of the ResultSet into RAM (the behavior
you don't like), before executing the new statement.  So at least
for those databases you can't really claim *both* a and b as points.

Oops -- I just noticed you said "enterprise grade".  ;-)

> The fact that I've been using JDBC for 12 years with half a dozen
> database products, in blissful ignorance of these fine
> distinctions in the standard until I had to deal with them with
> PG, is kinda what my point is :-)

OK, point taken.

> I understand the reasons for some of these limitations, but by no
> means all of them.

Well, one of the cool things about open source is that users have
the opportunity to "scratch their own itches".  The JDBC
implementation is 100% Java, so if changing something there would be
helpful to you, you can do so.  If you're careful about it, you may
be able to contribute it back to the community to save others the
pain.  If you want to take a shot at some of this, I'd be willing to
help a bit.  If nothing else, the attempt may give you better
perspective on the reasons for some of the limitations.  ;-)

>> (1)  Needing to setFetchSize to avoid materializing the entire
>> result set in RAM on the client.
>
> I don't understand the rationale for why PG, unlike every other
> database, doesn't make this a sensible default, e.g., 10,000 rows

I took a bit of a look at this, years ago.  My recollection is that,
based on the nature of the data stream, you would need to do
something similar to databases using TDS -- you could read as you go
as long as no other statement is executed on the connection; but
you'd need to add code to recognize the exceptional circumstance and
suck the rest of the result set down the wire to RAM should it be
necessary to "clear the way" for another statement.

If you give it a shot, you might want to see whether it's possible
to avoid an irritating implementation artifact of the TDS JDBC
drivers: if you close a ResultSet or a Statement with an open
ResultSet without first invoking Statement.cancel, they would suck
back the rest of the results (and ignore them) -- making for a big
delay sometimes on a close invocation.  As I recall, the
justification was that for executions involving multiple result
sets, they needed to do this to get at the next one cleanly;
although some forms of execute don't support multiple results, and
it doesn't do you a lot of good on Statement close, so you'd think
these could have been optimized.

> I find Oracle's JDBC implementation to be both user friendly and
> (largely) standards compliant.

Where there are issues with usability or standards compliance with
PostgreSQL, especially for something which works well for you in
other products, I hope you raise them on these lists.  Perhaps there
are already ways to deal with them, perhaps we need to better
document something, and perhaps some change can be made to
accommodate the issue.  Even if no action is taken at the time it is
helpful to the project, because the number of people raising an
issue is often taken into consideration when deciding whether to
change something.  Also, someone running into the issue later may
find the discussion on a search and gain helpful information.

> I hope this can be taken in the amicable spirit of gentlemanly
> debate in which it is offered, and in the context that we all want
> to see PG grow and continue to succeed.

Sure -- and I hope my posts haven't been taken in any other light.

-Kevin

I digest this down to "this is the best that can be achieved on a connection that's single-threaded".

I think the big difference with Oracle is this:

i. in Oracle, a SELECT does not have to be a transaction, in the sense that PG's SELECT does ... but in Oracle, a SELECT can fail mid-stream if you wait too long and the UNDO tablespace wraps (ORA-01555, "snapshot too old"), i.e. Oracle does not lock on SELECT. Oracle is optimized for lots of small transactions that typically commit; PG supports arbitrary transaction mixes of any size, but is less efficient at the workload for which Oracle is specialized.

ii. SELECT always creates an implicit cursor in Oracle, but access to these cursors can be interleaved arbitrarily on one connection, both with each other and with transactions (writes).

After considering the context you offered, I'd recommend the following two minor changes to the PG driver ....

a. Make setFetchSize(10000) the default

b. If someone does call rs.close() before the end of the ResultSet, and has not created an explicit cursor at the JDBC level, flag the query / lock / virtual transaction in some way in the JDBC driver that tells it that it can just dump the cursor on a subsequent stmt.close(), conn.commit() or conn.close() call without sucking down the rest of the data.

AFAICT, this will make the behaviour more like other DB's without sacrificing anything, but I don't know what default behaviour expectations might be out there in PG land.

Cheers
Dave

On Tue, Apr 20, 2010 at 3:22 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
(Lots of good explanatory stuff)


On Tue, 20 Apr 2010, Dave Crooke wrote:

> a. Make setFetchSize(10000) the default

The reason this is not done is that the mechanism used for fetching a
piece of the results at a time can change the query plan used if using a
PreparedStatement.  There are three ways to plan a PreparedStatement:

a) Using the exact parameter values by substituting them directly into the
query.  This isn't really "planned" as you can't re-use it at all.  This
is only available using the V2 protocol.

b) Using the parameter values for statistics, but not making any stronger
guarantees about them.  So the parameters will be used for evaluating the
selectivity, but not to perform other optimizations like
constraint_exclusion or transforming a LIKE operation to a range query.
This is the default plan type the JDBC driver uses.

c) Planning the query with no regard for the parameters passed to it.
This is the plan type the JDBC driver uses when it sees the same
PreparedStatement being re-used multiple times or when it is respecting
setFetchSize and allowing for partial results.

We must use (c) for partial results instead of (b) because of some
limitations of the server.  Currently you cannot have two statements of
type (b) open on the same connection.  So since the driver can't know if
the user will issue another query before fetching the remainder of the
first query's results, it must set up the first query to be of type (c) so
that multiple statements can exist simultaneously.

Switching the default plan type to (c) will cause a significant number of
complaints as performance on some queries will go into the tank.  Perhaps
we could have a default fetchSize for plain Statements as it won't affect
the plan.  I could also see making this a URL parameter though so it could
be set as the default with only a configuration, not a code change.
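
(For what it's worth, the reuse threshold that triggers the switch to plan type (c) is tunable; a hedged sketch using the driver-specific PGStatement interface -- the query is made up, and as I understand it the same knob exists as the prepareThreshold URL parameter. A pool such as DBCP may wrap the statement, requiring an unwrap before the cast:)

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import org.postgresql.PGStatement;

    class PlanThresholdSketch {
        static void run(Connection conn) throws SQLException {
            PreparedStatement ps = conn.prepareStatement(
                    "SELECT * FROM orders WHERE customer_name LIKE ?");
            // Default threshold is 5: early executions are planned with the
            // actual parameter values (type b); after the threshold the driver
            // switches to a server-prepared statement planned without them
            // (type c). Raising it keeps value-aware planning for longer.
            ((PGStatement) ps).setPrepareThreshold(20);
            ps.setString(1, "ACME%");
            ResultSet rs = ps.executeQuery();
            while (rs.next()) {
                // ...
            }
            rs.close();
            ps.close();
        }
    }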

> b. If someone does call rs.close() before the end of the ResultSet, and has
> not created an explicit cursor at the JDBC level, flag the query / lock /
> virtual transaction in some way in the JDBC driver that tells it that it can
> just dump the cursor on a subsequent stmt.close(), conn.commit() or
> conn.close() call without sucking down the rest of the data.

This is already true.  The JDBC driver only asks the server for more of
the ResultSet when a next() call requires it.  So the server isn't
constantly spewing out rows that the driver must deal with; the driver
only gets the rows it asks for.  Once the ResultSet is closed, it won't
ask for any more.

Kris Jurka
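
(In other words, something like this sketch only ever pulls the first batch over the wire -- the table name is hypothetical:)

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    class LazyFetchSketch {
        static void peek(Connection conn) throws SQLException {
            conn.setAutoCommit(false);      // cursor mode needs an open transaction
            Statement st = conn.createStatement();
            st.setFetchSize(1000);
            ResultSet rs = st.executeQuery("SELECT * FROM big_table");
            for (int i = 0; i < 50 && rs.next(); i++) {
                // rows come from the first 1000-row batch already buffered locally
            }
            rs.close();    // queues the cursor close; no further batches are fetched
            st.close();
            conn.commit();
        }
    }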


On Tue, Apr 20, 2010 at 5:05 PM, Kris Jurka <books@ejurka.com> wrote:
> The reason this is not done is that the mechanism used for fetching a piece
> of the results at a time can change the query plan used if using a
> PreparedStatement.  There are three ways to plan a PreparedStatement:
>
> a) Using the exact parameter values by substituting them directly into the
> query.  This isn't really "planned" as you can't re-use it at all.  This is
> only available using the V2 protocol.
>
> b) Using the parameter values for statistics, but not making any stronger
> guarantees about them.  So the parameters will be used for evaluating the
> selectivity, but not to perform other optimizations like constraint_exclusion
> or transforming a LIKE operation to a range query. This is the default plan
> type the JDBC driver uses.

Hmm.  I didn't think this was possible.  How are you doing this?

> c) Planning the query with no regard for the parameters passed to it. This
> is the plan type the JDBC driver uses when it sees the same
> PreparedStatement being re-used multiple times or when it is respecting
> setFetchSize and allowing for partial results.
>
> We must use (c) for partial results instead of (b) because of some
> limitations of the server.  Currently you cannot have two statements of type
> (b) open on the same connection.  So since the driver can't know if the user
> will issue another query before fetching the remainder of the first query's
> results, it must set up the first query to be of type (c) so that multiple
> statements can exist simultaneously.
>
> Switching the default plan type to (c) will cause a significant number of
> complaints as performance on some queries will go into the tank.  Perhaps we
> could have a default fetchSize for plain Statements as it won't affect the
> plan.  I could also see making this a URL parameter though so it could be
> set as the default with only a configuration, not a code change.

...Robert



On Wed, Apr 21, 2010 at 10:41 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Apr 20, 2010 at 5:05 PM, Kris Jurka <books@ejurka.com> wrote:
> The reason this is not done is that the mechanism used for fetching a piece
> of the results at a time can change the query plan used if using a
> PreparedStatement.  There are three ways to plan a PreparedStatement:
>
> a) Using the exact parameter values by substituting them directly into the
> query.  This isn't really "planned" as you can't re-use it at all.  This is
> only available using the V2 protocol.
>
> b) Using the parameter values for statistics, but not making any stronger
> guarantees about them.  So the parameters will be used for evaluating the
> selectivity, but not to perform other optimizations like constraint_exclusion
> or transforming a LIKE operation to a range query. This is the default plan
> type the JDBC driver uses.

Hmm.  I didn't think this was possible.  How are you doing this?

More to the point, is there some option that can shift you into method a?  I'm thinking of warehousing-type applications where you want to re-plan a good portion of your queries.

>> On Tue, Apr 20, 2010 at 5:05 PM, Kris Jurka <books@ejurka.com> wrote:
>>> ... There are three ways to plan a PreparedStatement:

FWIW, I think there is some consensus to experiment (in the 9.1 cycle)
with making the server automatically try replanning of parameterized
queries with the actual parameter values substituted.  It'll keep doing
so if it finds that that produces a significantly better plan than the
generic parameterized plan; which is what you'd expect if there's a
chance to optimize a LIKE search, eliminate partitions, etc.

            regards, tom lane



On Wed, Apr 21, 2010 at 11:30 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> On Tue, Apr 20, 2010 at 5:05 PM, Kris Jurka <books@ejurka.com> wrote:
>>> ... There are three ways to plan a PreparedStatement:

FWIW, I think there is some consensus to experiment (in the 9.1 cycle)
with making the server automatically try replanning of parameterized
queries with the actual parameter values substituted.  It'll keep doing
so if it finds that that produces a significantly better plan than the
generic parameterized plan; which is what you'd expect if there's a
chance to optimize a LIKE search, eliminate partitions, etc.

                       regards, tom lane

That'd be wonderful.

On Wed, 21 Apr 2010, Robert Haas wrote:

> On Tue, Apr 20, 2010 at 5:05 PM, Kris Jurka <books@ejurka.com> wrote:
>>
>> b) Using the parameter values for statistics, but not making any stronger
>> guarantees about them.  So the parameters will be used for evaluating the
>> selectivity, but not to perform other optimizations like constraint_exclusion
>> or transforming a LIKE operation to a range query. This is the default plan
>> type the JDBC driver uses.
>
> Hmm.  I didn't think this was possible.  How are you doing this?

This is only possible at the protocol level; it's not available through SQL
commands alone.  You do this by creating an unnamed statement instead of a
named one:

http://www.postgresql.org/docs/8.4/static/protocol-flow.html#PROTOCOL-FLOW-EXT-QUERY

     Query planning for named prepared-statement objects occurs when
     the Parse message is processed. If a query will be repeatedly
     executed with different parameters, it might be beneficial to send
     a single Parse message containing a parameterized query, followed
     by multiple Bind and Execute messages. This will avoid replanning
     the query on each execution.

     The unnamed prepared statement is likewise planned during Parse
     processing if the Parse message defines no parameters. But if
     there are parameters, query planning occurs during Bind processing
     instead. This allows the planner to make use of the actual values
     of the parameters provided in the Bind message when planning the
     query.


Kris Jurka


On Wed, 21 Apr 2010, Nikolas Everett wrote:

> More to the point is there some option that can shift you into method a?
>  I'm thinking of warehousing type applications where you want to re-plan
> a good portion of your queries.
>

This can be done by connecting to the database using the V2 protocol (use
URL option protocolVersion=2).  This does remove some functionality of
the driver that is only available for V3 protocol, but will work just
fine for query execution.

Kris Jurka
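
(For the archives, a minimal sketch of that URL option -- host, database, and credentials are placeholders:)

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    class V2ConnectSketch {
        static Connection open() throws SQLException {
            // With the V2 protocol the driver substitutes parameter values into
            // the query text, so each execution is planned with the real values
            // (type "a" in Kris's list above).
            String url = "jdbc:postgresql://dbhost:5432/warehouse?protocolVersion=2";
            return DriverManager.getConnection(url, "etl_user", "secret");
        }
    }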