Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
>> I am trying to run a select statement, and I keep running out of
>> memory. I have noticed that libpq has the nice feature that I don't have
>> to request each row one at a time. After a little investigating, I found
>> that libpq appears to get the entire result set at once. Well, at least
>> PQgetvalue() doesn't appear to do anything besides return a pointer to a
>> string. There is no lazy evaluation. It doesn't just fetch the row I
>> need, and flush old ones as memory permits.
> Use a cursor at the query level to keep the result set in the backend.
> But you will still (possibly) run out of memory, since the *backend*
> must keep the result set in memory and/or on disk.

Er, no. DECLARE CURSOR does not cause the backend to buffer the whole
result set; it just stores the state of the executor's plan tree. AFAIK
you can deal with an indefinitely large result set if you use a cursor
and fetch just a limited number of rows at a time.
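
For example, the cursor approach at the SQL level might look like this
(the table and cursor names, and the batch size, are illustrative):

```sql
BEGIN;  -- a non-holdable cursor must live inside a transaction block
DECLARE myportal CURSOR FOR SELECT * FROM mytable;
FETCH 1000 FROM myportal;  -- process this batch, then fetch again
FETCH 1000 FROM myportal;  -- ...repeat until FETCH returns zero rows
CLOSE myportal;
COMMIT;
```

Only one batch of rows is materialized on the client at a time, so
client memory stays bounded regardless of how large the full result
set is.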
The bottleneck here is mainly that libpq's API is defined in terms of
providing random access to a result set, no matter how large --- so
libpq has to buffer the whole result set in client memory.

Aside from random access there are also error-reporting issues.
Currently libpq guarantees to tell you about any errors encountered
during a query before you start to read result rows. That guarantee
wouldn't hold in a streaming-results scenario.

These issues have been discussed quite a few times before --- see the
pg-interfaces archives. I think everyone agrees that it'd be a good
idea to have a streamable libpq interface, but no one's stepped up to
the plate to define or implement one...

regards, tom lane