Обсуждение: Out of Memory errors while running pg_dump
Hello, this past weekend I received a couple of Out of Memory errors while running pg_dump for two different selects against the catalogs, one with pg_get_viewdef() and the other with one of the pg_index join pg_class left join pg_depend queries). Is it work_mem I should be increasing with PGOPTIONS to avoid this? Erik Jones DBA | Emma® erik@myemma.com 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com
Erik Jones <erik@myemma.com> writes: > Hello, this past weekend I received a couple of Out of Memory errors > while running pg_dump for two different selects against the > catalogs, one with pg_get_viewdef() and the other with one of the > pg_index join pg_class left join pg_depend queries). Is it work_mem > I should be increasing with PGOPTIONS to avoid this? We'd need to see more details to really give decent advice. Exactly what queries and exactly what was the error message (in particular I'm wondering how large the failed request was)? Which PG version? Can you get the memory context dump out of the postmaster log? regards, tom lane
On Feb 4, 2008, at 1:27 PM, Tom Lane wrote: > We'd need to see more details to really give decent advice. Exactly > what queries and exactly what was the error message (in particular > I'm wondering how large the failed request was)? Which PG version? > Can you get the memory context dump out of the postmaster log? Sure. I've attached an archive with the full memory context and error for each. Note that I'm already 99% sure that this is due to our exorbitantly large relation set which is why I think pg_dump's catalog queries are running out of work_mem (currently at just over 32MB). Erik Jones DBA | Emma® erik@myemma.com 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com
Вложения
Erik Jones <erik@myemma.com> writes: > Sure. I've attached an archive with the full memory context and > error for each. Note that I'm already 99% sure that this is due to > our exorbitantly large relation set which is why I think pg_dump's > catalog queries are running out of work_mem (currently at just over > 32MB). work_mem doesn't seem to be your problem --- what it looks like to me is that it's CacheMemoryContext and subsidiary contexts that's growing to unreasonable size, no doubt because of all the relcache entries for all those tables pg_dump has to touch. I'm wondering a bit why CacheMemoryContext has so much free space in it, but even if it had none you'd still be at risk. There isn't any provision in the current backend to limit the number of relcache entries, so eventually you're gonna run out of space if you have enough tables. Even so, you seem to be well under 1Gb in the server process. How much RAM is in the machine? Are you sure the postmaster is being launched under ulimit unlimited? If it's a 32-bit machine, maybe you need to back off shared_buffers or other shmem size parameters so that more address space is left for backend private memory. In the long run you probably ought to rethink having so many tables; that doesn't sound like great database design to me. A possible stopgap answer is to be selective about how many tables get dumped per pg_dump run, though I'm worried about the risk of leaving some out entirely. regards, tom lane
On Feb 4, 2008, at 3:26 PM, Tom Lane wrote: > Erik Jones <erik@myemma.com> writes: >> Sure. I've attached an archive with the full memory context and >> error for each. Note that I'm already 99% sure that this is due to >> our exorbitantly large relation set which is why I think pg_dump's >> catalog queries are running out of work_mem (currently at just over >> 32MB). > > work_mem doesn't seem to be your problem --- what it looks like to > me is > that it's CacheMemoryContext and subsidiary contexts that's growing to > unreasonable size, no doubt because of all the relcache entries for > all > those tables pg_dump has to touch. I'm wondering a bit why > CacheMemoryContext has so much free space in it, but even if it had > none > you'd still be at risk. There isn't any provision in the current > backend to limit the number of relcache entries, so eventually you're > gonna run out of space if you have enough tables. > > Even so, you seem to be well under 1Gb in the server process. How > much > RAM is in the machine? 16GB total. When this occurred we had over 9G free on the system. > Are you sure the postmaster is being launched > under ulimit unlimited? ulimit -a gives: core file size (blocks, -c) unlimited data seg size (kbytes, -d) unlimited file size (blocks, -f) unlimited open files (-n) 256 pipe size (512 bytes, -p) 10 stack size (kbytes, -s) 10240 cpu time (seconds, -t) unlimited max user processes (-u) 16357 virtual memory (kbytes, -v) unlimited > If it's a 32-bit machine, maybe you need to > back off shared_buffers or other shmem size parameters so that more > address space is left for backend private memory. It is a 32-bit machine and we're currently set @ 2GB for shared_buffers. For the others: 512 max_connections, 512 max_locks_per_transaction and 0 max_prepared_transactions. While having both of those two 512s may seem large, much less and I get Out of Memory errors that specifically suggest increasing max_lock.s_per_transaction > In the long run you probably ought to rethink having so many tables; > that doesn't sound like great database design to me. No, this is definitely not a good design. It was one set up by application developers about three years ago who apparently had the very wrong idea about table inheritance. Since I took over the database administration last summer it's been a game of whack-a-mole trying deal with issues that keep popping up because of it. I do have migration plans to remedy this, but it's slow going as I'm also dependent on application developer time for the necessary migrations there. > A possible stopgap answer is to be selective about how many tables get > dumped per pg_dump run, though I'm worried about the risk of leaving > some out entirely. Well, once a month I dump tables specific to accounts that have been closed for more than 90 days, everything left is pretty critical as it's either internal company data or active client data. Thanks for all of your help and suggestions yet again, Tom, You're willingness to take at look at these kind of things is more appreciated than you probably know :) Erik Jones DBA | Emma® erik@myemma.com 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com
Erik Jones <erik@myemma.com> writes: > On Feb 4, 2008, at 3:26 PM, Tom Lane wrote: >> Are you sure the postmaster is being launched >> under ulimit unlimited? > ulimit -a gives: One possible gotcha is that ulimit in an interactive shell isn't necessarily the same environment that an init script will see; you might want to temporarily put it into the init script, or if you have plperlu or another untrusted language installed, you could try system("ulimit -a >/tmp/foo") to check what the backend is really seeing. But assuming that that's not it... >> If it's a 32-bit machine, maybe you need to >> back off shared_buffers or other shmem size parameters so that more >> address space is left for backend private memory. > It is a 32-bit machine and we're currently set @ 2GB for > shared_buffers. For the others: 512 max_connections, 512 > max_locks_per_transaction and 0 max_prepared_transactions. While > having both of those two 512s may seem large, much less and I get Out > of Memory errors that specifically suggest increasing > max_lock.s_per_transaction Yeah, because pg_dump will need a lock for each table it's touching, so you need a lot of lock table slots. But the fact that you're using half the address space for shared buffers explains why it's tanking not too far past the 1Gb point for local memory. I'd suggest backing that off to 1Gb or less of shared_buffers to give yourself breathing room for a large relcache. Given the amount of RAM in the machine, I doubt you'll lose much in performance --- the disk blocks that might have been in shared buffers will live in OS disk buffers instead, which is not that much more expensive to access. regards, tom lane
I wrote: > ... I'm wondering a bit why > CacheMemoryContext has so much free space in it, but even if it had none > you'd still be at risk. I tried to reproduce this by creating a whole lot of trivial tables and then pg_dump'ing them: create table t0 (f1 int primary key); insert into t0 values(0); create table t1 (f1 int primary key); insert into t1 values(1); create table t2 (f1 int primary key); insert into t2 values(2); create table t3 (f1 int primary key); insert into t3 values(3); create table t4 (f1 int primary key); insert into t4 values(4); create table t5 (f1 int primary key); insert into t5 values(5); ... (about 17000 tables before I got bored) I looked at the backend memory stats at the end of the pg_dump run and found CacheMemoryContext: 50624864 total in 29 blocks; 608160 free (2 chunks); 50016704 used which compares awfully favorably to your results of CacheMemoryContext: 897715768 total in 129 blocks; 457826000 free (2305222 chunks); 439889768 used CacheMemoryContext: 788990232 total in 147 blocks; 192993824 free (1195074 chunks); 595996408 used Have you really got 200000+ tables? Even if you do, the amount of wasted memory in your runs seems really high. What PG version is this exactly? Can you show us the exact schemas of some representative tables? regards, tom lane
On Mon, 2008-02-04 at 16:11 -0600, Erik Jones wrote: > > Are you sure the postmaster is being launched > > under ulimit unlimited? > > ulimit -a gives: > > core file size (blocks, -c) unlimited > data seg size (kbytes, -d) unlimited > file size (blocks, -f) unlimited > open files (-n) 256 > pipe size (512 bytes, -p) 10 > stack size (kbytes, -s) 10240 > cpu time (seconds, -t) unlimited > max user processes (-u) 16357 > virtual memory (kbytes, -v) unlimited > If you are using FreeBSD, is it possible you ran into the kern.maxdsiz kernel tunable? It can only be adjusted at boot, I think. Regards, Jeff Davis
On Feb 5, 2008, at 4:28 PM, Jeff Davis wrote: > On Mon, 2008-02-04 at 16:11 -0600, Erik Jones wrote: >>> Are you sure the postmaster is being launched >>> under ulimit unlimited? >> >> ulimit -a gives: >> >> core file size (blocks, -c) unlimited >> data seg size (kbytes, -d) unlimited >> file size (blocks, -f) unlimited >> open files (-n) 256 >> pipe size (512 bytes, -p) 10 >> stack size (kbytes, -s) 10240 >> cpu time (seconds, -t) unlimited >> max user processes (-u) 16357 >> virtual memory (kbytes, -v) unlimited >> > > If you are using FreeBSD, is it possible you ran into the kern.maxdsiz > kernel tunable? It can only be adjusted at boot, I think. We're on Solaris. Erik Jones DBA | Emma® erik@myemma.com 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com