Discussion: libpq - lack of support to set the fetch size


libpq - lack of support to set the fetch size

From: matshyeq
Date:
Hello,

I've run into an issue when trying to fetch rows from a big table (2 million rows) in my app.
Basically, I can't find an elegant and easy way (other than always using cursors) to limit the number of rows fetched at a time.
This causes my application to break due to excessive memory consumption.

I'm using Perl and the DBD::Pg library, but the maintainer I contacted pointed out that this is an issue that goes much deeper, into libpq:

"Unfortunately, this is a limitation in the underlying driver (libpq) rather than DBD::Pg itself. There have been talks over the years of supporting this, but nothing concrete yet. Your best bet would be to ask about this on the Postgres lists"

Would you consider putting this on the roadmap, so one day it gets improved?

Regarding the details of the issue, I believe this has been well described at:
http://stackoverflow.com/questions/21960121/perl-dbdpg-script-fails-when-selecting-data-from-big-table

Kind Regards
~Msciwoj

Re: libpq - lack of support to set the fetch size

From: Adrian Klaver
Date:
On 03/09/2014 06:43 AM, matshyeq wrote:
> Hello,
>
> I've run into an issue when trying to fetch rows from a big table
> (2 million rows) in my app.
> Basically, I can't find an elegant and easy way (other than always using
> cursors) to limit the number of rows fetched at a time.
> This causes my application to break due to excessive memory consumption.

LIMIT does not work?

>
> I'm using Perl and the DBD::Pg library, but the maintainer I contacted
> pointed out that this is an issue that goes much deeper, into libpq:
>
> "Unfortunately, this is a limitation in the underlying driver (libpq)
> rather than DBD::Pg itself. There have been talks over the years of
> supporting this, but nothing concrete yet. Your best bet would be to ask
> about this on the Postgres lists"
>
> Would you consider putting this on the roadmap, so one day it gets improved?
>
> Regarding the details of the issue, I believe this has been well described at:
> http://stackoverflow.com/questions/21960121/perl-dbdpg-script-fails-when-selecting-data-from-big-table
>
> Kind Regards
> ~Msciwoj


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: libpq - lack of support to set the fetch size

From: "Daniel Verite"
Date:
    matshyeq wrote:

> "Unfortunately, this is a limitation in the underlying driver (libpq) rather
> than DBD::Pg itself. There have been talks over the years of supporting
> this, but nothing concrete yet. Your best bet would be to ask about this on
> the Postgres lists"
>
> Would you consider putting this on the roadmap, so one day it gets improved?

This improvement seems to have actually been made since 9.2 with
the PQsetSingleRowMode() function:

http://postgresql.org/docs/current/static/libpq-single-row-mode.html
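
In outline, the usage pattern is roughly this (a sketch only: error
handling is omitted, and the connection and table name are made up):

    #include <stdio.h>
    #include <libpq-fe.h>

    /* Stream a large result set one row at a time (PostgreSQL >= 9.2). */
    void stream_rows(PGconn *conn)
    {
        PGresult *res;

        PQsendQuery(conn, "SELECT * FROM bigtable");  /* asynchronous send */
        PQsetSingleRowMode(conn);  /* must precede the first PQgetResult */

        while ((res = PQgetResult(conn)) != NULL)
        {
            if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
                printf("%s\n", PQgetvalue(res, 0, 0));  /* process one row */
            /* the last result is an empty PGRES_TUPLES_OK (or an error) */
            PQclear(res);
        }
    }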


Best regards,
--
Daniel
PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org


Re: libpq - lack of support to set the fetch size

From: Albe Laurenz
Date:
Daniel Verite wrote:
> matshyeq wrote:

[ runs out of memory on the client because all results from a large query are retrieved at once ]

>> "Unfortunately, this is a limitation in the underlying driver (libpq) rather
>> than DBD::Pg itself. There have been talks over the years of supporting
>> this, but nothing concrete yet. Your best bet would be to ask about this on
>> the Postgres lists"
>>
>> Would you consider putting this on the roadmap, so one day it gets improved?
> 
> This improvement seems to have actually been made since 9.2 with
> the PQsetSingleRowMode() function:
> 
> http://postgresql.org/docs/current/static/libpq-single-row-mode.html

Yes, DBD::Pg could be improved to make use of that; the problem is probably
that the code would have to differentiate between PostgreSQL versions.

Your solution with using
   SELECT ... OFFSET ? LIMIT 1
in a loop is bound to suck.

First of all, there is no guarantee that the rows will be returned in
the same order each time, see for example
http://www.postgresql.org/docs/current/static/runtime-config-compatible.html#GUC-SYNCHRONIZE-SEQSCANS
Also, unless you operate with an isolation level higher than READ COMMITTED,
the various SELECTs could operate on different data sets.

So you are likely to end up with incorrect results sooner or later
if you use OFFSET and LIMIT without an ORDER BY clause.

Then you will have really bad performance, especially with a large table,
because each SELECT statement will have to start scanning the table again.
The complexity will rise from O(n) to O(n^2).

You can improve on this by using ORDER BY with an index and remembering
the last returned row (get and read http://sql-performance-explained.com/).
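
For example (a sketch, assuming a unique indexed column "id"), each batch
then becomes a cheap index range scan:

   SELECT * FROM bigtable
   WHERE id > ?          -- the last id seen in the previous batch
   ORDER BY id
   LIMIT 1000;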

Finally, you will have a client-server round trip for each row returned.
This is a problem you would also have when using PQsetSingleRowMode().

Yours,
Laurenz Albe

Re: libpq - lack of support to set the fetch size

From: Albe Laurenz
Date:
matshyeq wrote:
> PostgreSQL has for a good while been perceived as one of the best (or simply the best!?)
> available open source DB solutions, so I'm really surprised this functionality is not yet supported...

You can retrieve the full result set,
you can retrieve it row by row,
you can use a LIMIT clause to retrieve it in batches.

Can you explain how exactly the functionality would look that
you are missing?

Yours,
Laurenz Albe

Re: libpq - lack of support to set the fetch size

From: John R Pierce
Date:
On 3/9/2014 6:43 AM, matshyeq wrote:
Hello,

I've run into an issue when trying to fetch rows from a big table (2 million rows) in my app.
Basically, I can't find an elegant and easy way (other than always using cursors) to limit the number of rows fetched at a time.
This causes my application to break due to excessive memory consumption.

I'm using Perl and the DBD::Pg library, but the maintainer I contacted pointed out that this is an issue that goes much deeper, into libpq:

"Unfortunately, this is a limitation in the underlying driver (libpq) rather than DBD::Pg itself. There have been talks over the years of supporting this, but nothing concrete yet. Your best bet would be to ask about this on the Postgres lists"


In addition to what the others suggested, you can use a CURSOR to read through the results in blocks of arbitrary size.
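
For example (a sketch; a cursor has to live inside a transaction):

   BEGIN;
   DECLARE csr CURSOR FOR SELECT * FROM bigtable;
   FETCH 1000 FROM csr;   -- repeat until it returns no more rows
   CLOSE csr;
   COMMIT;
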
-- 
john r pierce                                      37N 122W
somewhere on the middle of the left coast

Re: libpq - lack of support to set the fetch size

From: Marko Kreen
Date:
On Mon, Mar 10, 2014 at 06:58:26AM +0000, Albe Laurenz wrote:
> Daniel Verite wrote:
> > matshyeq wrote:
>
> [ runs out of memory on the client because all results from a large query are retrieved at once ]
>
> >> "Unfortunately, this is a limitation in the underlying driver (libpq) rather
> >> than DBD::Pg itself. There have been talks over the years of supporting
> >> this, but nothing concrete yet. Your best bet would be to ask about this on
> >> the Postgres lists"
> >>
> >> Would you consider putting this on the roadmap, so one day it gets improved?
> >
> > This improvement seems to have actually been made since 9.2 with
> > the PQsetSingleRowMode() function:
> >
> > http://postgresql.org/docs/current/static/libpq-single-row-mode.html

> Finally, you will have a client-server round trip for each row returned.
> This is a problem you would also have when using PQsetSingleRowMode().

PQsetSingleRowMode() does not do additional round trips; it loads rows
from libpq's internal buffer.

--
marko



Re: libpq - lack of support to set the fetch size

From: matshyeq
Date:
Fully agree with Laurenz.
LIMIT could be seen as a workaround in some (limited!) cases, but it's far from elegant (what if your end user types the query?).
If 'SingleRowMode' goes row by row, then again it's not a solution, especially given that this particular issue applies to rather large row sets.
The only solution is CURSOR-based, which I find an awkward low-level hack compared to an elegant option supported by the native library.

PostgreSQL has for a good while been perceived as one of the best (or simply the best!?) available open source DB solutions, so I'm really surprised this functionality is not yet supported...



On Mon, Mar 10, 2014 at 6:58 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
Daniel Verite wrote:
> matshyeq wrote:

[ runs out of memory on the client because all results from a large query are retrieved at once ]

>> "Unfortunately, this is a limitation in the underlying driver (libpq) rather
>> than DBD::Pg itself. There have been talks over the years of supporting
>> this, but nothing concrete yet. Your best bet would be to ask about this on
>> the Postgres lists"
>>
>> Would you consider putting this on the roadmap, so one day it gets improved?
>
> This improvement seems to have actually been made since 9.2 with
> the PQsetSingleRowMode() function:
>
> http://postgresql.org/docs/current/static/libpq-single-row-mode.html

Yes, DBD::Pg could be improved to make use of that; the problem is probably
that the code would have to differentiate between PostgreSQL versions.

Your solution with using
   SELECT ... OFFSET ? LIMIT 1
in a loop is bound to suck.

First of all, there is no guarantee that the rows will be returned in
the same order each time, see for example
http://www.postgresql.org/docs/current/static/runtime-config-compatible.html#GUC-SYNCHRONIZE-SEQSCANS
Also, unless you operate with an isolation level higher than READ COMMITTED,
the various SELECTs could operate on different data sets.

So you are likely to end up with incorrect results sooner or later
if you use OFFSET and LIMIT without an ORDER BY clause.

Then you will have really bad performance, especially with a large table,
because each SELECT statement will have to start scanning the table again.
The complexity will rise from O(n) to O(n^2).

You can improve on this by using ORDER BY with an index and remembering
the last returned row (get and read http://sql-performance-explained.com/).

Finally, you will have a client-server round trip for each row returned.
This is a problem you would also have when using PQsetSingleRowMode().

Yours,
Laurenz Albe



--
Thank you,
Kind Regards
~Maciek

Re: libpq - lack of support to set the fetch size

From: matshyeq
Date:
> Albe Laurenz wrote:

I believe the stackoverflow question referred to
(http://stackoverflow.com/questions/21960121/perl-script-fails-when-selecting-data-from-big-postgresql-table)
explains the issue well.

> You can retrieve the full result set,
not an option because of client memory limitations (in this case it's a poor client spec, but there are always some limits, especially when you want to pull 1e7 rows)

> you can retrieve it row by row,
not an option because of performance (DB calls/network round trips)

> you can use a LIMIT clause to retrieve it in batches.
you yourself pointed out best why it's not a feasible option (complexity, isolation levels, not always possible, e.g. with a custom query, and last but not least: far from elegant)

> CURSOR option
As already explained on stackoverflow, I'm using it as a workaround. My general point is that it forces developers to use lower-level communication with the DB (cursors), and is therefore not as elegant as just setting the RowCacheSize parameter as specified by DBI. According to the DBD::Pg maintainer, this hasn't been and can't be implemented for PostgreSQL due to the lack of support in its underlying libpq library.
So again, I'm really surprised this functionality is not yet supported in PostgreSQL. Does that mean everybody has been implementing this through cursors?

To recap what's on stackoverflow: the functionality I'm talking about would be an equivalent of the JDBC setFetchSize() function, to optimize the load from (any) database in batches, like in the example below:

     Statement st = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
                                         java.sql.ResultSet.CONCUR_READ_ONLY);

     // Set the fetch size to 1000.
     st.setFetchSize(1000);

     // Execute the given SQL query.
     String sql = "select * from bigtable";
     ResultSet rs = st.executeQuery(sql);

     while (rs.next()) {
          ⋮
     }


where underneath, ResultSet.next() doesn't actually fetch one row at a time from the server-side RESULT-SET. It returns rows from the (local) ROW-SET and (transparently) fetches a new ROW-SET whenever the local one becomes exhausted on the client.

Actually, I'm curious now whether this functionality has been implemented in the PostgreSQL JDBC driver...?

Anyway, according to one of the DBD::Pg developers it's impossible to bring in this functionality, as the problem lies deeper, within the libpq library:

"Unfortunately, this is a limitation in the underlying driver (libpq) rather than DBD::Pg itself. There have been talks over the years of supporting this, but nothing concrete yet."

So probably the best option is to ask Greg to speak to the details if anything is still unclear.

Kind Regards,
Maciek



On Mon, Mar 10, 2014 at 9:42 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
matshyeq wrote:
> PostgreSQL has for a good while been perceived as one of the best (or simply the best!?)
> available open source DB solutions, so I'm really surprised this functionality is not yet supported...

You can retrieve the full result set,
you can retrieve it row by row,
you can use a LIMIT clause to retrieve it in batches.

Can you explain how exactly the functionality would look that
you are missing?

Yours,
Laurenz Albe



--
Thank you,
Kind Regards
~Maciek

Re: libpq - lack of support to set the fetch size

From: Adrian Klaver
Date:
On 03/10/2014 04:51 AM, matshyeq wrote:
>>Albe Laurenz wrote:
>
> I would believe the stackoverflow
> (http://stackoverflow.com/questions/21960121/perl-script-fails-when-selecting-data-from-big-postgresql-table)
> question referred to explains the issue well.
>
>> You can retrieve the full result set,
> not an option because of client memory limitations (in this case it's
> poor client spec but there always are some, especially when you want to
> pull 1e7 rows)
>
>> you can retrieve it row by row,
> not an option because of performance (db calls/network roundtrips)
>
>> you can use a LIMIT clause to retrieve it in batches.
> you pointed the best why it's not a feasible option (complexity,
> isolation levels, not always possible ie. when custom query and last but
> not least: far from being elegant)
>
>> CURSOR option
> As already explained at stackoverflow - I'm using it as a workaround. My
> general point is it forces developers to use lower level communication
> with DB (cursors) therefore not as elegant as just setting RowCacheSize
> parameter as specified by DBI. According to DBD::Pg maintainer this
> hasn't and can't be implemented for PostgreSQL due to the lack of
> support in its own libpq library.
> So again.., I'm really surprised this functionality is not yet supported
> in PostgreSQL. Does that mean everybody have been implementing this
> through cursors?
>
> To recap what's on stackoverflow - The functionality I'm talking about
> would be an equivalent of JDBC setFetchSize()
> <http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#setFetchSize(int)> function
> to optimize the load from (any) database in batches, like in the example
> below:
>
>       Statement st =
> conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
> java.sql.ResultSet.CONCUR_READ_ONLY);
>
>       // Set the fetch size to 1000.
>
>       st.setFetchSize(1000);
>
>       // Execute the given sql query
>
>       String sql = "select * from bigtable";
>
>       ResultSet rs = st.executeQuery(sql);
>
>       while (rs.next()) {
>
>            ⋮
>
>       }
>
>
> where underneath ResultSet.next() doesn't actually fetch one row at a
> time from the RESULT-SET. It returns that from the (local) ROW-SET and
> fetches ROW-SET (transparently) whenever it becomes exhausted on the
> local client.
>
> Actually, curious now if this functionality has been implemented in
> PostgreSQL JDBC drivers...?

Yes, using a cursor.

http://jdbc.postgresql.org/documentation/92/query.html

By default the driver collects all the results for the query at once.
This can be inconvenient for large data sets so the JDBC driver provides
a means of basing a ResultSet on a database cursor and only fetching a
small number of rows.

.....
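
For illustration, a minimal sketch based on those notes (the driver only
switches to a cursor-backed ResultSet when autocommit is off and a fetch
size is set; the table name is made up):

    conn.setAutoCommit(false);             // required for cursor-based fetching
    Statement st = conn.createStatement();
    st.setFetchSize(1000);                 // rows fetched per round trip
    ResultSet rs = st.executeQuery("SELECT * FROM bigtable");
    while (rs.next()) {
        // process one row; the driver transparently fetches the next batch
    }
    rs.close();
    st.close();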

>
> Anyway, according to one of the DBD::Pg developers it's impossible to
> bring this functionality as the problem lies deeper, within libpq library:
>
> "Unfortunately, this is a limitation in the underlying driver (libpq)
> rather than DBD::Pg itself. There have been talks over the years of
> supporting this, but nothing concrete yet."
>
> So probably the best is to ask Greg to speak to details if still unclear.
>
> Kind Regards,
> Maciek
>
>
>
> On Mon, Mar 10, 2014 at 9:42 AM, Albe Laurenz <laurenz.albe@wien.gv.at
> <mailto:laurenz.albe@wien.gv.at>> wrote:
>
>     matshyeq wrote:
>      > Postgresql is there for a good while perceived as one of the best
>     (or just simply the best!?)
>      > available open source DB solution, so I'm really surprised this
>     functionality is not yet supported...
>
>     You can retrieve the full result set,
>     you can retrieve it row by row,
>     you can use a LIMIT clause to retrieve it in batches.
>
>     Can you explain how exactly the functionality would look that
>     you are missing?
>
>     Yours,
>     Laurenz Albe
>
>
>
>
> --
> Thank you,
> Kind Regards
> ~Maciek


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: libpq - lack of support to set the fetch size

From: Tom Lane
Date:
matshyeq <matshyeq@gmail.com> writes:
> If 'SingleRowMode' goes row by row, then again it's not a solution,
> especially given that this particular issue applies to rather large row
> sets.

Perhaps you should actually experiment with that solution instead of
rejecting it out of hand.  Or at least RTFM about it.

It does have limitations: you can't interleave fetching of different
large query results.  But I don't have a problem telling people they
ought to use cursors for such cases.

            regards, tom lane


Re: libpq - lack of support to set the fetch size

From: Jeff Janes
Date:
On Sun, Mar 9, 2014 at 6:43 AM, matshyeq <matshyeq@gmail.com> wrote:
Hello,

I've run into an issue when trying to fetch rows from a big table (2 million rows) in my app.
Basically, I can't find an elegant and easy way (other than always using cursors) to limit the number of rows fetched at a time.
This causes my application to break due to excessive memory consumption.

I'm using Perl and the DBD::Pg library, but the maintainer I contacted pointed out that this is an issue that goes much deeper, into libpq:

"Unfortunately, this is a limitation in the underlying driver (libpq) rather than DBD::Pg itself. There have been talks over the years of supporting this, but nothing concrete yet. Your best bet would be to ask about this on the Postgres lists"

I don't think this is correct.  First, DBD::Pg could get tricky and automatically wrap your query in a cursor and then fetch from the cursor behind the scenes.  I believe that this is what Python's module does for you in some modes.  Second, the feature needed to do this without even using a cursor was added 1.5 years ago (PQsetSingleRowMode).  DBD::Pg just hasn't been taught how to use it yet.

The first strategy could probably be done purely in Perl, the second would require changes to the C parts of DBD::Pg.
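
For what it's worth, the first strategy can already be done by hand with
today's DBD::Pg; a sketch (the connection string, table name, and batch
size are made up):

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=test', '', '', { RaiseError => 1 });

    $dbh->begin_work;   # a cursor has to live inside a transaction
    $dbh->do('DECLARE csr CURSOR FOR SELECT * FROM bigtable');
    while (1) {
        my $rows = $dbh->selectall_arrayref('FETCH 1000 FROM csr');
        last unless @$rows;   # empty batch: the cursor is exhausted
        for my $row (@$rows) {
            # process one row ($row is an arrayref of column values)
        }
    }
    $dbh->do('CLOSE csr');
    $dbh->commit;
    $dbh->disconnect;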

Of course just because it can be implemented in DBD::Pg doesn't mean anyone has an obligation to do it.  You could speed that along by contributing the code yourself.  But I would say the ball is firmly in DBD::Pg's court.


Cheers,

Jeff

Re: libpq - lack of support to set the fetch size

From: "Daniel Verite"
Date:
    matshyeq wrote:

> The only solution is CURSOR-based, which I find an awkward low-level hack
> compared to an elegant option supported by the native library.

That's not the only solution; even with the current DBD::Pg
you could do:

$dbh->do("COPY (sql-query) TO STDOUT");
my $data;
# pg_getcopydata() fills $data with one row of COPY output per call
# and returns -1 once the COPY is complete.
while ($dbh->pg_getcopydata($data) >= 0) {
    # process $data
}

The results would be streamed as opposed to being accumulated in
memory. Also COPY is optimized for high performance.

The drawback is you'd have to parse $data according to the
specific rules of the COPY format, which may be easy or
not-so-easy depending on the actual data, numeric or text
or other, whether it has NULLs, backslashes and so on.

Best regards,
--
Daniel
PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org


Re: libpq - lack of support to set the fetch size

From: matshyeq
Date:

Marko, Tom, Adrian, Jeff, Daniel - thank you all for the valuable feedback!


Two general questions:

- when using the PQsetSingleRowMode() function, does it give an option to define how many rows to cache on the client's side (like JDBC setFetchSize() does), or does it leave that to libpq's discretion?

- is it/would it be possible to add a corresponding option to pgAdmin to limit the initially (and each subsequently) returned rows in the Query Tool to a custom-defined max value?


@Tom

> Perhaps you should actually experiment with that solution instead of rejecting it out of hand.  Or at least RTFM about it.

As I'm not using libpq directly, I'm unable to leverage the PQsetSingleRowMode() call (or at least I'm not aware of how to do this from DBD::Pg).

I simply passed on the feedback they gave me.


@Adrian

The example in the documentation you refer to actually demonstrates that this has been properly implemented in JDBC.

By properly I mean that a call to

setFetchSize()

works; whatever it actually does behind the scenes (cursors?), it doesn't require the developer to declare and use cursors explicitly, like:

st.execute("DECLARE csr CURSOR FOR SELECT * FROM myBigTable");

ResultSet rs = st.executeQuery("FETCH 1000 FROM csr");


@Jeff

I'll make a suggestion to the DBD::Pg developers.


@Daniel

Very interesting alternative. You're welcome to contribute to this stackoverflow question!

Does that mean $data holds a single returned row of data (COPY-formatted)?


Kind Regards

Msciwoj


Re: libpq - lack of support to set the fetch size

From: Marko Kreen
Date:
On Tue, Mar 11, 2014 at 12:39:12PM +0000, matshyeq wrote:
> - when using the PQsetSingleRowMode() function, does it give an option to
> define how many rows to cache on the client's side (like JDBC setFetchSize()
> does), or does it leave that to libpq's discretion?

This option would not make sense, as you are not "fetching" anything; the
full result set is being streamed from the server over the TCP connection.

> - is it/would it be possible to add a corresponding option to pgAdmin to
> limit the initially (and each subsequently) returned rows in the Query Tool
> to a custom-defined max value?

It could close the connection in the middle of the result set, but that seems
like a bad idea.  LIMIT N or FETCH N are better for such a task.

--
marko



Re: libpq - lack of support to set the fetch size

From: Marko Kreen
Date:
On Wed, Mar 12, 2014 at 10:57:03AM +0000, matshyeq wrote:
> On Wed, Mar 12, 2014 at 9:30 AM, Marko Kreen <markokr@gmail.com> wrote:
> > This option would not make sense, as you are not "fetching" anything; the
> > full result set is being streamed from the server over the TCP connection.
>
> Well, I don't know what "streamed" means exactly here.
> If the server sequentially pushes all the data without asking the client if
> it's ready to receive, then that's exactly what this issue is about.

This problem is handled by the kernel's TCP stack - it will slow down
the connection if userspace does not read fast enough.

IOW, don't worry about it.

> If the client asks the server for another chunk each time it has received
> the previous one, then to me it's an implicit 'fetching' scenario where the
> user/developer doesn't have an option to define the fetch size.

This is how it is usually implemented, and a configurable fetch size
is indeed useful in such a situation.  But it requires a separate round trip
for each chunk, so single-row mode is the superior method for processing
large queries without huge buffers.

> > > - is it/would it be possible to add a corresponding option to pgAdmin to
> > > limit the initially (and each subsequently) returned rows in the Query
> > > Tool to a custom-defined max value?
> >
> > It could close the connection in the middle of the result set, but that
> > seems like a bad idea.  LIMIT N or FETCH N are better for such a task.
>
> I don't see why? I can't think of any single SQL tool I've worked
> with that didn't have this functionality, really.

Yeah, I see no reason pgAdmin cannot implement it.  Implementing
such a feature by transparently adding LIMIT or FETCH is better.

PQsetSingleRowMode() is not meant for partial result sets, but it can
be used if you accept the downsides.

--
marko



Re: libpq - lack of support to set the fetch size

From: matshyeq
Date:



On Wed, Mar 12, 2014 at 9:30 AM, Marko Kreen <markokr@gmail.com> wrote:
On Tue, Mar 11, 2014 at 12:39:12PM +0000, matshyeq wrote:
> - when using the PQsetSingleRowMode() function, does it give an option to
> define how many rows to cache on the client's side (like JDBC setFetchSize()
> does), or does it leave that to libpq's discretion?

This option would not make sense, as you are not "fetching" anything; the
full result set is being streamed from the server over the TCP connection.

Well, I don't know what "streamed" means exactly here.
If the server sequentially pushes all the data without asking the client if it's ready to receive, then that's exactly what this issue is about.
If the client asks the server for another chunk each time it has received the previous one, then to me it's an implicit 'fetching' scenario where the user/developer doesn't have an option to define the fetch size.

> - is it/would it be possible to add a corresponding option to pgAdmin to
> limit the initially (and each subsequently) returned rows in the Query Tool
> to a custom-defined max value?

It could close the connection in the middle of the result set, but that seems
like a bad idea.  LIMIT N or FETCH N are better for such a task.
 
I don't see why? I can't think of any single SQL tool I've worked with that didn't have this functionality, really.
The principle I find very simple and useful.
There is a defined "fetch row size" parameter (each tool gives it its own name);
after submitting ANY query, the client fetches result set rows, but no more than that.
Some programs even automatically derive this value from the size of the result grid displayed on the screen.
The user then usually has two buttons, fetch another batch/screen or fetch all - he decides.
If he decides way too late (coffee break), he simply resubmits the query (and potentially changes the parameter first)...

I don't see the value in auto-fetching millions of rows to present to the user on the screen.
Also, I don't think it's particularly useful to have to know and apply database-specific SQL syntax to limit the rows.
If you join multiple tables it may be even more tricky (which table do you apply the limit to? or use subqueries instead?).

Last but not least, I can even see that this option was once available in pgAdmin, but disappeared over time:

"In the options dialog, you can specify a default limit for the rowset size to retrieve. By default, this value will be 100. If the number of rows to retrieve from the server exceeds this value, a message box will appear asking what to do to prevent retrieval of an unexpected high amount of data. You may decide to retrieve just the first rows, as configured with the max rows setting, or retrieving the complete rowset regardless of the setting, or abort the query, effectively retrieving zero rows."

This is pretty much exactly what I'm talking about here (pgAdmin functionality)
and its equivalent for developers (libpq).

Regards
Msciwoj

Re: libpq - lack of support to set the fetch size

From: matshyeq
Date:
> Well, I don't know what "streamed" means exactly here.
> If the server sequentially pushes all the data without asking the client if
> it's ready to receive, then that's exactly what this issue is about.

This problem is handled by the kernel's TCP stack - it will slow down
the connection if userspace does not read fast enough.

IOW, don't worry about it.

Again, this is something 'at the discretion' of the libpq library, so I'm not sure how the 'slowing down' really works in different scenarios, e.g. server and app client on the same machine?
I still see quite some value in specifying this explicitly (like JDBC does), which BTW I did by using a CURSOR as a workaround.

> If the client asks the server for another chunk each time it has received
> the previous one, then to me it's an implicit 'fetching' scenario where the
> user/developer doesn't have an option to define the fetch size.

This is how it is usually implemented, and a configurable fetch size
is indeed useful in such a situation.  But it requires a separate round trip
for each chunk, so single-row mode is the superior method for processing
large queries without huge buffers.

I wouldn't worry about this (round trips) myself - at the end of the day the client decides when and how much to pull.
Also, in the data warehouse area, we consciously tune this parameter for each DB source individually, which is considered very common (and good!) practice.

> > > - is it/would it be possible to add a corresponding option to pgAdmin to
> > > limit the initially (and each subsequently) returned rows in the Query
> > > Tool to a custom-defined max value?
> >
> > It could close the connection in the middle of the result set, but that
> > seems like a bad idea.  LIMIT N or FETCH N are better for such a task.
>
> I don't see why? I can't think of any single SQL tool I've worked
> with that didn't have this functionality, really.

Yeah, I see no reason pgAdmin cannot implement it.  Implementing
such a feature by transparently adding LIMIT or FETCH is better.
 
I don't fully agree with the last statement. FETCH is WAY better and more functional than LIMIT.

PQsetSingleRowMode() is not meant for partial result sets, but it can
be used if you accept the downsides.

Given the 'discretion mode', I'm still not sure whether that's a proper and elegant way to meet the requirement.
Maybe it is? Anyway, this is just me, an end user, giving you, the developers maintaining PostgreSQL, a chance to hear 'the voice of the customer'.

Regards,
~Msciwoj

Re: libpq - lack of support to set the fetch size

From: Adrian Klaver
Date:
On 03/12/2014 06:05 AM, matshyeq wrote:

> I don't fully agree with the last statement. FETCH is WAY better and more
> functional than LIMIT.
>
>         PQsetSingleRowMode() is not meant for partial result sets, but it can
>         be used if you accept the downsides.
>
> Given the 'discretion mode', I'm still not sure whether that's a proper and
> elegant way to meet the requirement.

Following this discussion it would seem that Postgres meets the
requirement, but what you want is for the interfaces you use to make
use of the capabilities. As mentioned previously, the JDBC driver and the
Python driver (psycopg2) already do what you want. The next step, to me
at least, is to take what you have learned and contact the projects
(DBD::Pg, pgAdmin) in question.

> Maybe it is? Anyway, this is just me, an end user, giving you, the
> developers maintaining PostgreSQL, a chance to hear 'the voice of the
> customer'.
>
> Regards,
> ~Msciwoj


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: libpq - lack of support to set the fetch size

From: John DeSoi
Date:
On Mar 12, 2014, at 5:57 AM, matshyeq <matshyeq@gmail.com> wrote:

> I don't see why? I can't think of any single SQL tool I've worked with that
> didn't have this functionality, really.
> The principle I find very simple and useful.
> There is a defined "fetch row size" parameter (each tool gives it its own
> name); after submitting ANY query, the client fetches result set rows, but
> no more than that.
> Some programs even automatically derive this value from the size of the
> result grid displayed on the screen.
> The user then usually has two buttons, fetch another batch/screen or fetch
> all - he decides.
> If he decides way too late (coffee break), he simply resubmits the query
> (and potentially changes the parameter first)...
>
> I don't see the value in auto-fetching millions of rows to present to the
> user on the screen.
> Also, I don't think it's particularly useful to have to know and apply
> database-specific SQL syntax to limit the rows.
> If you join multiple tables it may be even more tricky (which table do you
> apply the limit to? or use subqueries instead?).

Using the extended query protocol, Postgres has a built-in way to limit the number of rows returned from any select without any textual manipulation of the query.

I'm not sure if libpq exposes this capability in the API, but it should not be too difficult to implement.

See:

http://www.postgresql.org/docs/current/static/protocol-flow.html#PROTOCOL-FLOW-EXT-QUERY

> Once a portal exists, it can be executed using an Execute message. The Execute
> message specifies the portal name (empty string denotes the unnamed portal)
> and a maximum result-row count (zero meaning "fetch all rows"). The result-row
> count is only meaningful for portals containing commands that return row sets;
> in other cases the command is always executed to completion, and the row count
> is ignored. The possible responses to Execute are the same as those described
> above for queries issued via simple query protocol, except that Execute
> doesn't cause ReadyForQuery or RowDescription to be issued.


John DeSoi, Ph.D.

Re: libpq - lack of support to set the fetch size

From: "Greg Sabino Mullane"
Date:


> Second, the feature needed to do this without even
> using a cursor was added 1.5 years ago (PQsetSingleRowMode).  DBD::Pg
> just hasn't been taught how to use it yet.

True. And we were hoping for something better, so we can make one request
for 10,000 rows to libpq rather than call PQgetResult 10,000 times, but
we'll move ahead with implementing RowCacheSize via PQsetSingleRowMode.
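
For the record, roughly what that emulation looks like (a sketch only,
with a hypothetical helper; not the actual DBD::Pg code):

    /* Fill rows[] with up to cache_size PGRES_SINGLE_TUPLE results.
       Returns the number of rows buffered; 0 means the query is done
       (the caller must still drain PQgetResult until it returns NULL). */
    int fetch_batch(PGconn *conn, PGresult **rows, int cache_size)
    {
        int n = 0;
        PGresult *res;

        while (n < cache_size && (res = PQgetResult(conn)) != NULL)
        {
            if (PQresultStatus(res) != PGRES_SINGLE_TUPLE)
            {
                PQclear(res);   /* final PGRES_TUPLES_OK, or an error */
                break;
            }
            rows[n++] = res;    /* caller PQclears each row result */
        }
        return n;
    }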

--
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 201404021428
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8