Discussion: cursor interface to libpq
I am trying to run a select statement, and I keep running out of memory. I have noticed that libpq has the nice feature that I don't have to request each row one at a time. After a little investigation, I found that libpq appears to get the entire result set at once. Well, at least PQgetvalue() doesn't appear to do anything besides return a pointer to a string. There is no lazy evaluation: it doesn't just fetch the row I need and flush old ones as memory permits.

Is there any interface that I can link into a C/C++ program that will allow me row-at-a-time access to the data? Not being able to run select statements is going to be a fairly sizeable problem for me.

This might also explain why my attempts to dump the database fail, since pg_dump seems to be based on libpq. To be safe I have been shutting down my database and just backing up the filesystem. I run in a shop that hopes to go 24/7 sometime soon, so I need to find an answer RSN.

Thanks,
Kirby
> I am trying to run a select statement, and I keep running out of
> memory. I have noticed that libpq has the nice feature that I don't have
> to request each row one at a time. After a little investigation, I found
> that libpq appears to get the entire result set at once. Well, at least
> PQgetvalue() doesn't appear to do anything besides return a pointer to a
> string. There is no lazy evaluation: it doesn't just fetch the row I
> need and flush old ones as memory permits.

Use a cursor at the query level to keep the result set in the backend. But you will still (possibly) run out of memory, since the *backend* must keep the result set in memory and/or on disk. Make sure you have enough swap space, disk space, and main memory to handle the size of database you have.

> Is there any interface that I can link into a C/C++ program that
> will allow me row-at-a-time access to the data? Not being able to run
> select statements is going to be a fairly sizeable problem for me.

Look at cursors, but that doesn't eliminate the need for enough memory.

Expect to buy more memory, and make sure you have enough swap space. Send more details on what you have currently and what your database looks like, and we can give you better feedback on your resource requirements.

- Thomas
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
>> I am trying to run a select statement, and I keep running out of
>> memory. I have noticed that libpq has the nice feature that I don't have
>> to request each row one at a time. After a little investigation, I found
>> that libpq appears to get the entire result set at once. Well, at least
>> PQgetvalue() doesn't appear to do anything besides return a pointer to a
>> string. There is no lazy evaluation: it doesn't just fetch the row I
>> need and flush old ones as memory permits.

> Use a cursor at the query level to keep the result set in the backend.
> But you will still (possibly) run out of memory, since the *backend*
> must keep the result set in memory and/or on disk.

Er, no. DECLARE CURSOR does not cause the backend to buffer the whole result set; it just stores the state of the executor's plan tree. AFAIK you can deal with an indefinitely large result set if you use a cursor and fetch just a limited number of rows at a time.

The bottleneck here is mainly that libpq's API is defined in terms of providing random access to a result set, no matter how large --- so libpq has to buffer the whole result set in client memory. Aside from random access there are also error-reporting issues. Currently libpq guarantees to tell you about any errors encountered during a query before you start to read result rows. That guarantee wouldn't hold in a streaming-results scenario.

These issues have been discussed quite a few times before --- see the pg-interfaces archives. I think everyone agrees that it'd be a good idea to have a streamable libpq interface, but no one's stepped up to the plate to define or implement one...

			regards, tom lane
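For reference, a minimal sketch of the batched-fetch approach Tom describes, using only standard libpq calls (PQexec, PQntuples, PQgetvalue). The connection string, table, and column names are hypothetical placeholders, the batch size is arbitrary, and error handling is abbreviated.

/*
 * Sketch only: fetch a large result set in batches through a cursor,
 * so neither libpq nor the client has to hold the whole set at once.
 * "bigtable" and its columns are hypothetical.
 */
#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn   *conn = PQconnectdb("dbname=mydb");
    PGresult *res;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* Cursors only live inside a transaction block. */
    res = PQexec(conn, "BEGIN");
    PQclear(res);

    res = PQexec(conn, "DECLARE c CURSOR FOR SELECT id, payload FROM bigtable");
    PQclear(res);

    for (;;)
    {
        int ntuples, i;

        /* Pull a limited batch; only this batch is buffered in the client. */
        res = PQexec(conn, "FETCH 1000 FROM c");
        if (PQresultStatus(res) != PGRES_TUPLES_OK)
        {
            fprintf(stderr, "FETCH failed: %s", PQerrorMessage(conn));
            PQclear(res);
            break;
        }

        ntuples = PQntuples(res);
        if (ntuples == 0)
        {
            PQclear(res);   /* no more rows */
            break;
        }

        for (i = 0; i < ntuples; i++)
            printf("%s: %s\n", PQgetvalue(res, i, 0), PQgetvalue(res, i, 1));

        PQclear(res);       /* free this batch before fetching the next */
    }

    res = PQexec(conn, "CLOSE c");
    PQclear(res);
    res = PQexec(conn, "COMMIT");
    PQclear(res);

    PQfinish(conn);
    return 0;
}

The client's peak memory use is then bounded by the batch size rather than by the size of the result set, though as noted above any backend-side sorting or materialization still needs its own resources.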
Thomas,

Right now I have a database that has right around 4 million rows in the primary table. I am porting it away from MySQL basically because I want transactions. When the table is written out as insert statements it is around 3.2Gb; gzipped, that comes out to 1.3Gb. The table will probably continue to grow at a rate of 200,000 rows a week. I don't know the exact size of each row, but it is under the 8K limit. Right now I am working on a FreeBSD box with 1.5Gb of swap and 256MB of RAM. I believe that I could get that upgraded to 1Gb of RAM and add as much swap space as I wanted, but to be honest I really don't want to store every row of every select statement in memory. I believe that the database is around 1.5Gb on disk, almost all of it in the one table.

The whole thought of having to keep the entire table in memory at once to do a backup is scary. I don't expect anyone at work to appreciate having to match the disk space of the database in order to do backups. I will either have to switch RDBMSs, or unwrap pg_dump to use the lower-level primitives parseInput and getAnotherTuple -- or better yet, implement it with cursors at the query level, and then I could get by until a better solution came along. I can deal with most everything with the exception of the backups.

Any advice, or expectations of memory requirements, would be appreciated. Sorry if this is the wrong forum for this discussion, but my original question started here.

Thanks,
Kirby

On Wed, 20 Sep 2000, Thomas Lockhart wrote:

> > I am trying to run a select statement, and I keep running out of
> > memory. I have noticed that libpq has the nice feature that I don't have
> > to request each row one at a time. After a little investigation, I found
> > that libpq appears to get the entire result set at once. Well, at least
> > PQgetvalue() doesn't appear to do anything besides return a pointer to a
> > string. There is no lazy evaluation: it doesn't just fetch the row I
> > need and flush old ones as memory permits.
>
> Use a cursor at the query level to keep the result set in the backend.
> But you will still (possibly) run out of memory, since the *backend*
> must keep the result set in memory and/or on disk. Make sure you have
> enough swap space, disk space, and main memory to handle the size of
> database you have.
>
> > Is there any interface that I can link into a C/C++ program that
> > will allow me row-at-a-time access to the data? Not being able to run
> > select statements is going to be a fairly sizeable problem for me.
>
> Look at cursors, but that doesn't eliminate the need for enough memory.
>
> Expect to buy more memory, and make sure you have enough swap space.
> Send more details on what you have currently and what your database
> looks like, and we can give you better feedback on your resource
> requirements.
>
> - Thomas
> Right now I have a database that has right around 4 million
> rows in the primary table. I am porting it away from MySQL basically
> because I want transactions. When the table is written out as insert
> statements it is around 3.2Gb; gzipped, that comes out to 1.3Gb.
> The table will probably continue to grow at a rate of 200,000
> rows a week. I don't know the exact size of each row, but it is under the
> 8K limit. Right now I am working on a FreeBSD box with 1.5Gb of swap and
> 256MB of RAM. I believe that I could get that upgraded to 1Gb of RAM and
> add as much swap space as I wanted, but to be honest I really don't want
> to store every row of every select statement in memory. I believe that
> the database is around 1.5Gb on disk, almost all of it in the one table.

afaik this should all work. You can run pg_dump and pipe the output to a tape drive or to gzip. You *know* that a real backup will take something like the size of the database (maybe a factor of two or so less) since the data has to go somewhere.

Postgres *should* be able to store intermediate results etc. on disk, so the "out of memory" might be due to a per-process limit on your FreeBSD machine. Others with experience on that platform might have some specific suggestions. Scrappy, a FreeBSD partisan, probably has tables on his systems much bigger than the one under discussion. Perhaps he will speak up here??

Good luck!

- Thomas
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
> afaik this should all work. You can run pg_dump and pipe the output to a
> tape drive or to gzip. You *know* that a real backup will take something
> like the size of the database (maybe a factor of two or so less) since
> the data has to go somewhere.

pg_dump in default mode (ie, dump data as COPY commands) doesn't have a problem with huge tables because the COPY data is just dumped out in a streaming fashion. If you insist on using the "dump data as insert commands" option then huge tables cause a memory problem in pg_dump, but on the other hand you are going to get pretty tired of waiting for such a script to reload, too. I recommend just using the default behavior ...

			regards, tom lane
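To illustrate the streaming behavior Tom mentions, here is a minimal sketch of a client dumping a table with COPY ... TO STDOUT through libpq's line-oriented COPY calls (PQgetline/PQendcopy), so no more than one line is buffered at a time. The connection string and table name are hypothetical, and error handling is abbreviated; this is an illustration of the mechanism, not a replacement for pg_dump.

/*
 * Sketch only: stream a whole table to stdout with COPY, the same
 * mechanism pg_dump uses in its default mode.  "bigtable" is hypothetical.
 */
#include <stdio.h>
#include <string.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn   *conn = PQconnectdb("dbname=mydb");
    PGresult *res;
    char      line[8192];

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    res = PQexec(conn, "COPY bigtable TO STDOUT");
    if (PQresultStatus(res) != PGRES_COPY_OUT)
    {
        fprintf(stderr, "COPY failed: %s", PQerrorMessage(conn));
        PQclear(res);
        PQfinish(conn);
        return 1;
    }
    PQclear(res);

    for (;;)
    {
        int rc = PQgetline(conn, line, sizeof(line));

        if (rc == EOF)
            break;                          /* connection trouble */
        if (rc == 0 && strcmp(line, "\\.") == 0)
            break;                          /* end-of-copy marker */

        fputs(line, stdout);
        if (rc == 0)                        /* rc == 1 means a partial line; */
            fputc('\n', stdout);            /* only terminate complete lines */
    }

    PQendcopy(conn);                        /* resynchronize after the COPY */
    PQfinish(conn);
    return 0;
}

The output can be piped straight to gzip or a tape drive, so client memory stays flat regardless of table size, which is why the default COPY-based pg_dump output avoids the buffering problem discussed earlier in the thread.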