Help with retrieving large results sets and memory usage.
From | Peter.Rupp@ual.com |
---|---|
Subject | Help with retrieving large results sets and memory usage. |
Date | |
Msg-id | H0002de120a5a419.1061217851.awhq6377@MHS |
Responses | Re: Help with retrieving large results sets and memory usage. |
List | pgsql-interfaces |
Hi everybody,

I have several general and specific questions about the protocol PostgreSQL uses to send large result sets to client processes over a TCP network connection. I'm connecting to a 7.3.4 Postgres database running on HP-UX 11i, using two ways to connect over the network: 1) from Java, using the JDBC type-4 driver built in the Postgres distribution, and 2) from C-based Python, using the PyGreSQL interface module (pg), also supplied and built from the 7.3.4 Postgres distribution.

With either interface, I notice that when I select rows from a table, the client interface appears to retrieve the entire result set at once and load it into memory on the client machine. When I recently had to retrieve all rows in a table (the data size totalled 110 MB), memory usage in the JVM jumped to roughly 330 MB; in Python, it went to 440-450 MB. I observed the memory usage with a fairly sophisticated performance-monitoring tool (HP's glance/perfview). I understand that all interfaces need to keep a copy of the data around somewhere, so this is understandable. However, there are times when I will need to select a large amount of data, where there could be thousands or millions of rows and the data returned could easily exceed the memory size of our machine.

As a work-around to this issue, I looked at the following:

1) I could break the SQL statement up into multiple calls, attempting to limit the size of each retrieved result set. However, this proves very difficult (if not impossible) for several of our tables.

2) Perhaps the Postgres network protocol and/or the 'postmaster' daemon allow for retrieval of a portion of a result set at a time? In other words, is there a way for the client JDBC or PyGreSQL module to bring only a portion of the result set into memory? That is, read say 500 rows, or 50 MB of data, from the network and hand this off to the client program. When the program needs more, the adaptor reads the next 500 rows, or something to that effect.

3) I'm not sure whether JDBC (or any other database connection API) allows for the row/size buffering I describe in #2; perhaps this is an efficiency that's left to the implementation. However, I'm proficient enough in C and Java to add this functionality, if the Postgres engine itself will allow it. Do you folks publish the network semantics and protocols used to make SQL requests and retrieve their results? Any documentation about byte-stream structures, etc. would be really helpful; I could look at the source code, but if there's anybody who can summarize this, it could help me a lot.

Thanks very much for your assistance.

Best regards,
==pete==

P. A. Rupp
United Airlines Corp. (WHQKT)
voice: 847-700-3226
email: peter.rupp@ual.com
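[Editor's note] One pattern that addresses point 2 without any driver or protocol changes is an explicit SQL-level cursor, which PostgreSQL supports: inside a transaction, DECLARE a cursor for the query, then FETCH it in batches, so the backend ships only a batch of rows per round trip. Below is a minimal sketch in Python. The names `run_query`, `bigset`, and `big_table` are hypothetical placeholders: `run_query` stands for whatever call your adaptor uses to send one SQL statement to the backend and return its rows as a list.

```python
def fetch_in_chunks(run_query, chunk_size=500):
    """Stream a large result set via a server-side SQL cursor, so the
    client holds at most chunk_size rows from each round trip.

    run_query: callable that sends one SQL statement to the backend and
    returns the resulting rows as a list (empty list when exhausted).
    """
    # PostgreSQL cursors exist only inside a transaction block.
    run_query("BEGIN")
    # 'bigset' and 'big_table' are placeholder names for illustration.
    run_query("DECLARE bigset CURSOR FOR SELECT * FROM big_table")
    try:
        while True:
            rows = run_query("FETCH %d FROM bigset" % chunk_size)
            if not rows:          # empty batch: result set exhausted
                break
            for row in rows:
                yield row
    finally:
        run_query("CLOSE bigset")
        run_query("COMMIT")
```

Against a live backend, `run_query` would wrap the adaptor's own query call; FETCH returns zero rows once the cursor is exhausted, which is the loop's stop condition. The trade-off is one extra network round trip per chunk in exchange for bounded client memory.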