Discussion: psql gets "out of memory"
Hi all, I'm a newbie here. I'm trying to test pgsql with my mysql data. If the performance is good, I will migrate from mysql to pgsql. I installed pgsql 9.1rc on my Ubuntu server and am trying to import a large SQL file dumped from mysql into pgsql with 'psql -f'. The file is around 30 GB and consists of bulk INSERT commands. It ran for several hours and then aborted with an "out of memory" error. This is the tail of the log I got:

INSERT 0 280
INSERT 0 248
INSERT 0 210
INSERT 0 199
invalid command \n
out of memory

On the server side, I only found these errors about invalid UTF-8 characters, which come from escape characters in the mysql export:

2011-08-29 19:19:29 CST ERROR: invalid byte sequence for encoding "UTF8": 0x00
2011-08-29 19:55:35 CST LOG: unexpected EOF on client connection

My understanding is that this is a client-side issue and not related to any server memory setting. But how can I adjust the memory settings of the psql program?

To handle the escape character '\', which is the default in mysql but not in pgsql, I have already made some rough modifications to the exported SQL dump file: sed "s/,'/,E'/g" | sed 's/\\0/ /g'. I guess there might still be some characters I'm not handling, and that might cause an INSERT command to be split into several invalid pgsql commands. Would that be the cause of the "out of memory" error?
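For reference, the preprocessing-and-import step described above amounts to something like the following shell sketch (the file and database names are placeholders, not from the original post; the sed expressions are the ones quoted above):

    # Prefix string literals that follow a comma with E'' and blank out \0
    # escapes, then feed the result to psql and keep a log of the output.
    sed "s/,'/,E'/g" mysql_dump.sql | sed 's/\\0/ /g' > fixed_dump.sql
    psql -d mydb -f fixed_dump.sql 2>&1 | tee import.log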
Well, thank you for the quick reply, but I'm not actually concerned about performance at this point. My problem is with the bulk insert done by the client-side program psql. Or rather, it's a problem of migrating vast data from mysql to pgsql.

On 2011-08-29 21:25:29, Julio Leyva wrote:
> check this out
> http://www.revsys.com/writings/postgresql-performance.html
Rural Hunter <ruralhunter@gmail.com> wrote:

> it's a problem of migrating vast data from mysql to pgsql.

I don't know how helpful you'll find this, but when we migrated to PostgreSQL a few years ago, we had good luck using Java to stream from one database to another. Be sure to include a lot of inserts in each database transaction. We used prepared statements for the inserts.

I don't know whether you would consider our data "vast" -- we had about 100 production databases. At the time the largest was 200 GB.

-Kevin
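Kevin's approach was a Java program streaming rows between the two databases; expressed at the SQL level (table name, columns, and values are made up for illustration), the batching-plus-prepared-statement idea he describes looks roughly like this:

    -- batch.sql: run with "psql -d mydb -f batch.sql"
    BEGIN;
    -- one reusable statement, many rows per transaction
    PREPARE ins (int, text) AS INSERT INTO items (id, name) VALUES ($1, $2);
    EXECUTE ins (1, 'first row');
    EXECUTE ins (2, 'second row');
    -- ...thousands more EXECUTEs per transaction...
    COMMIT;

In JDBC the same effect comes from turning off autocommit, reusing one PreparedStatement for every row, and committing every few thousand rows.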
Yes, the size of my database (around 200 GB) is very close to yours. How long did it take you to migrate the data (100-200 GB)? I thought about writing Java code to transfer the data, but it's a bit tedious. It would be best if I could complete this with the regular export/import approach; if that won't work, then writing Java code to do the task will be the choice.

Another problem is that, from my test, the performance of the bulk insert doesn't look very good. I estimated that importing the 30 GB of data would take about 10-20 hours based on the progress so far. I have already done some tweaks such as:

fsync = off
archive_mode = off
increased checkpoint_segments
dropped indexes and primary keys

I have about 5 GB of memory free on the server and these memory settings:

shared_buffers = 1GB
work_mem = 8MB
maintenance_work_mem = 1GB
effective_cache_size = 4GB

I'm not sure if there is any room to improve the performance of the bulk insert. I monitored the server while the import was running: CPU and disk utilization are very low, and memory usage barely changes whether pgsql is up or down. So it looks like the performance bottleneck is not CPU, disk, or memory. I'm a bit lost on this and have no idea what to check or change.

On 2011/8/29 21:44, Kevin Grittner wrote:
> ... we had good luck using Java to stream from one database to another.
> Be sure to include a lot of inserts in each database transaction. We
> used prepared statements for the inserts. ...
> I don't know whether you would consider our data "vast" -- we had
> about 100 production databases. At the time the largest was 200 GB.
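Collected in one place, the load-time settings mentioned above would sit in postgresql.conf roughly as follows (the checkpoint_segments value is a made-up example, since only "increased" is stated; the rest are the values quoted in this message, and fsync in particular must be turned back on once the load is done):

    # postgresql.conf -- bulk-load-only tweaks discussed in this thread
    fsync = off                   # no crash safety while loading; revert afterwards
    archive_mode = off            # no WAL archiving during the load
    checkpoint_segments = 32      # "increased"; exact value is a hypothetical example
    shared_buffers = 1GB
    work_mem = 8MB
    maintenance_work_mem = 1GB
    effective_cache_size = 4GB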
Rural Hunter <ruralhunter@gmail.com> wrote:

> Yes, the size of my database (around 200 GB) is very close to yours.
> How long did it take you to migrate the data (100-200 GB)?

I don't remember off-hand, and I'm not sure it's worth digging for the info because so much has changed in the intervening six years. Recent PostgreSQL releases are much faster, and our current hardware is also much faster. I just ran bonnie++ on our new server, which just arrived (yet to be put into production), and got 619930 KB/sec for sequential writes. 200 GB at that rate is 338 seconds. :-)

> Another problem is that, from my test, the performance of the bulk
> insert doesn't look very good. I estimated that importing the 30 GB of
> data would take about 10-20 hours based on the progress so far. I have
> already done some tweaks such as:
> fsync = off
> archive_mode = off

Good (but don't forget to change that once the bulk load is done). You should probably also turn off full_page_writes and synchronous_commit. I've seen benchmarks which show that this helps, even with the other settings you mention.

> increased checkpoint_segments

There can be paradoxical results with that. For reasons yet to be determined, bulk conversion (unlike most workloads) sometimes runs faster with a small setting like the default. You have to test to see how it works in your environment.

> dropped indexes and primary keys

Yeah, you definitely want to build those only after the data for a table is loaded. I also recommend a VACUUM FREEZE ANALYZE on the database unless most of these rows will be deleted or updated before you run a billion database transactions. Otherwise you will get a painful "anti-wraparound" autovacuum on everything, probably at a time of heavy usage.

> I have about 5 GB of memory free on the server and these memory settings:
> shared_buffers = 1GB
> work_mem = 8MB
> maintenance_work_mem = 1GB
> effective_cache_size = 4GB

Reasonable for that amount of RAM, but that seems like underpowered hardware for the size of the database. If the production conversion is going to be run against different hardware, these tests may not be giving you very good numbers for what to expect.

> I'm not sure if there is any room to improve the performance of the
> bulk insert. I monitored the server while the import was running: CPU
> and disk utilization are very low.

We tended to see low disk utilization and saturated CPUs. We worked around this by using a small queue of rows in the Java conversion process, with one thread reading and a separate thread writing. We also ran several conversion processes at once, on separate tables.

See about using COPY, either through JDBC support or by outputting your files in a format COPY can digest. It can be significantly faster than INSERT, even if you optimize the INSERTs by using prepared statements and a large number of rows per transaction. Also, if at all possible, load the data within the same database transaction where the table is created. This can allow for an optimization where the data is not logged to WAL, because if the transaction aborts, the table is not there anyway.

> Memory usage barely changes whether pgsql is up or down. So it looks
> like the performance bottleneck is not CPU, disk, or memory. I'm a bit
> lost on this and have no idea what to check or change.

Network latency? Avoid encrypted connections for the load, and do whatever you can to minimize latency, like attaching both machines to the same switch. You can't improve performance much if you're working on the things which are only using a small part of the time. Identifying the source of your delays is the most important thing at this point.

-Kevin
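Two of the suggestions above (COPY instead of INSERT, and loading in the same transaction that creates the table) combine into something like the following sketch -- the table, columns, and file path are placeholders, and the data file is assumed to already be in a COPY-friendly tab-separated format:

    -- load_items.sql: run with "psql -d mydb -f load_items.sql"
    BEGIN;
    CREATE TABLE items (id int, name text);
    -- COPY FROM a server-side file needs superuser; psql's \copy can read a
    -- client-side file instead.  Because the table is created in the same
    -- transaction, PostgreSQL can skip WAL-logging the loaded rows when WAL
    -- archiving and replication are not in use.
    COPY items (id, name) FROM '/tmp/items.tsv';
    COMMIT;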
Hi Kevin, thank you very much for the quick and detailed answers and suggestions. I will check and try them.

On 2011/8/29 23:18, Kevin Grittner wrote:
> Good (but don't forget to change that once the bulk load is done). You
> should probably also turn off full_page_writes and synchronous_commit.
> I've seen benchmarks which show that this helps, even with the other
> settings you mention.

I forgot to mention that synchronous_commit is already off, and I will test with full_page_writes off.

>> increased checkpoint_segments
>
> There can be paradoxical results with that. For reasons yet to be
> determined, bulk conversion (unlike most workloads) sometimes runs
> faster with a small setting like the default. You have to test to see
> how it works in your environment.

Yes, will check that.

> Yeah, you definitely want to build those only after the data for a
> table is loaded. I also recommend a VACUUM FREEZE ANALYZE on the
> database unless most of these rows will be deleted or updated before
> you run a billion database transactions. Otherwise you will get a
> painful "anti-wraparound" autovacuum on everything, probably at a time
> of heavy usage.

Hmm... I will try turning autovacuum off, though I didn't see any resource contention caused by it.

>> I have about 5 GB of memory free on the server and these memory settings:
>> shared_buffers = 1GB
>> work_mem = 8MB
>> maintenance_work_mem = 1GB
>> effective_cache_size = 4GB
>
> Reasonable for that amount of RAM, but that seems like underpowered
> hardware for the size of the database. If the production conversion is
> going to be run against different hardware, these tests may not be
> giving you very good numbers for what to expect.

I have mysql running on the same server; the memory above is just what is set aside for the pgsql import. I will shut down mysql and give the memory (32 GB in total) to pgsql when I do the performance tests.

> Network latency? Avoid encrypted connections for the load, and do
> whatever you can to minimize latency, like attaching both machines to
> the same switch. You can't improve performance much if you're working
> on the things which are only using a small part of the time.
> Identifying the source of your delays is the most important thing at
> this point.

No, I do the import locally on the db server, so network latency can be excluded.
Rural Hunter <ruralhunter@gmail.com> wrote:
> 2011/8/29 23:18, Kevin Grittner:
>> I also recommend a VACUUM FREEZE ANALYZE on the database unless most
>> of these rows will be deleted or updated before you run a billion
>> database transactions. Otherwise you will get a painful
>> "anti-wraparound" autovacuum on everything, probably at a time of
>> heavy usage.
> Hmm... I will try turning autovacuum off, though I didn't see any
> resource contention caused by it.

Well, turning off autovacuum during a bulk load is probably a net gain if it's insert-only (i.e., no need to query just-loaded data to decide what to do with new rows); but that's not what I was getting at. Bulk loading 200 GB of data which is not going to be deleted or updated heavily is setting a performance time bomb without a VACUUM FREEZE. At some point, perhaps months later, it will be necessary to freeze the tuples to prevent data loss, and since this occurs based on a threshold of how many transaction IDs have been consumed, it is most likely to happen at peak OLTP loads, when it will be the biggest problem. A VACUUM FREEZE (and you might as well throw in ANALYZE while you're at it) will take care of that up front. As a side benefit it will keep SELECT statements from generating heavy *write* loads on the first access to tuples, and will perform other maintenance which will improve database performance.

I just count the time for VACUUM FREEZE ANALYZE as part of the bulk load time required before letting in users.

>> Network latency?
> No, I do the import locally on the db server, so network latency can
> be excluded.

Hmm... I don't remember the details, but there was a problem at some point where Linux pipe connections could introduce significant latency, and you could get much better performance on a TCP connection through localhost. It might be worth a try. (Maybe someone else will remember the details.)

-Kevin
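In concrete terms (the database and file names are placeholders), the two suggestions above come down to something like:

    # After the bulk load finishes, freeze and analyze everything so the
    # anti-wraparound work is paid up front rather than at peak load:
    psql -d mydb -c 'VACUUM FREEZE ANALYZE;'

    # And to test whether the local socket connection is part of the
    # problem, force a TCP connection through localhost for the import:
    psql -h localhost -d mydb -f fixed_dump.sql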
Thank you. I didn't understand what VACUUM FREEZE actually does; I will check the details to see whether it's good for my situation, and I will also test the load over a TCP connection. Thanks again for all your advice -- it has been really helpful to me!
2011/8/29 Rural Hunter <ruralhunter@gmail.com>:
> I'm trying to import a large SQL file dumped from mysql into pgsql
> with 'psql -f'. The file is around 30 GB and consists of bulk INSERT
> commands. It ran for several hours and then aborted with an "out of
> memory" error. This is the tail of the log I got:
> INSERT 0 280
> INSERT 0 248
> INSERT 0 210
> INSERT 0 199
> invalid command \n
> out of memory

I'd look at what's leading to the "invalid command \n" up there. I doubt the out of memory is more than a symptom of that. I.e., some part of your INSERTs is improperly formatted, and the machine is then trying to insert a row with what it thinks is a column of several gigabytes.
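A made-up illustration of that failure mode, assuming a hypothetical two-column table t: MySQL dumps escape quotes as \', but with standard_conforming_strings on (the default in 9.1) an ordinary '...' literal treats the backslash as a plain character, so the quoting gets out of sync. Later lines can then be misread -- escape sequences such as \n landing at the start of a line outside any string would explain the "invalid command \n" -- while psql keeps buffering what it thinks is one enormous statement until it runs out of memory.

    -- MySQL-style line that breaks the quoting under PostgreSQL defaults:
    --   INSERT INTO t VALUES (1, 'it\'s broken');
    -- The literal ends at \', and the following quote opens a new string
    -- that swallows everything up to the next quote it happens to find.
    -- The same value written as PostgreSQL expects it:
    INSERT INTO t VALUES (1, 'it''s fine');   -- doubled single quote
    INSERT INTO t VALUES (1, E'it\'s fine');  -- or an E'' escape string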
Yes, it seems so. I separated out one of the tables having the problem and did a test. First I loaded it from the original file and saw the same thing: "invalid command" followed by "out of memory". After I fixed the escape characters, the table loaded successfully.

On 2011/8/30 10:14, Scott Marlowe wrote:
> I'd look at what's leading to the "invalid command \n" up there. I
> doubt the out of memory is more than a symptom of that. I.e., some
> part of your INSERTs is improperly formatted, and the machine is then
> trying to insert a row with what it thinks is a column of several
> gigabytes.
Hi Kevin,

I did another try with the following additional changes based on our discussion:
1. use a TCP connection
2. turn off autovacuum
3. turn off full_page_writes

I could import more than 30 GB of data in about 2 hours. That's totally acceptable performance to me with the current server capability. There is a minor issue, though: I saw a few errors during the import:

ERROR: invalid byte sequence for encoding "UTF8": 0xe6272c
ERROR: invalid byte sequence for encoding "UTF8": 0xe5272c
ERROR: invalid byte sequence for encoding "UTF8": 0xe5272c
ERROR: invalid byte sequence for encoding "UTF8": 0xe5272c
ERROR: invalid byte sequence for encoding "UTF8": 0xe68e27
ERROR: invalid byte sequence for encoding "UTF8": 0xe7272c
ERROR: invalid byte sequence for encoding "UTF8": 0xe5272c
ERROR: invalid byte sequence for encoding "UTF8": 0xe5a427

My data was exported from a UTF-8 MySQL database and my pgsql db is also UTF-8. I got only the 8 errors above out of about 3 million records imported. The strange thing is that I usually see the problematic SQL statement in the log whenever a statement fails, so I have a chance to fix the data manually. But for the errors above I don't see any SQL logged; the pgsql log just contains the same errors with no additional information:

2011-09-01 11:26:32 CST ERROR: invalid byte sequence for encoding "UTF8": 0xe6272c
2011-09-01 11:26:47 CST ERROR: invalid byte sequence for encoding "UTF8": 0xe5272c
2011-09-01 11:26:53 CST ERROR: invalid byte sequence for encoding "UTF8": 0xe5272c
2011-09-01 11:26:58 CST ERROR: invalid byte sequence for encoding "UTF8": 0xe5272c
2011-09-01 11:26:58 CST ERROR: invalid byte sequence for encoding "UTF8": 0xe68e27
2011-09-01 11:27:01 CST ERROR: invalid byte sequence for encoding "UTF8": 0xe7272c
2011-09-01 11:27:06 CST ERROR: invalid byte sequence for encoding "UTF8": 0xe5272c
2011-09-01 11:27:15 CST ERROR: invalid byte sequence for encoding "UTF8": 0xe5a427

What could be the cause of that?

On 2011/8/30 9:29, Rural Hunter wrote:
> Thank you. I didn't understand what VACUUM FREEZE actually does; I
> will check the details to see whether it's good for my situation, and
> I will also test the load over a TCP connection. ...
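For what it's worth, in each of the reported sequences the lead byte (0xe5/0xe6/0xe7) announces a three-byte UTF-8 character, yet the bytes that follow are an ASCII quote (0x27) and comma (0x2c), which suggests (only a guess from the hex values) that a multibyte character was truncated somewhere in the export or the sed munging. Two common ways to hunt for such bytes in the dump before loading it, with the file name as a placeholder:

    # Print the line numbers of lines that are not valid UTF-8
    # (GNU grep; '.' will not match an invalid byte sequence):
    LC_ALL=en_US.UTF-8 grep -naxv '.*' fixed_dump.sql | head

    # Or produce a copy with the invalid sequences dropped -- use with
    # care, as this silently removes the offending bytes:
    iconv -f UTF-8 -t UTF-8 -c fixed_dump.sql > clean_dump.sql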
On Fri, Sep 2, 2011 at 9:30 PM, Rural Hunter <ruralhunter@gmail.com> wrote:
> I could import more than 30 GB of data in about 2 hours. ... There is
> a minor issue, though: I saw a few errors during the import:
> ERROR: invalid byte sequence for encoding "UTF8": 0xe6272c
> ...
> My data was exported from a UTF-8 MySQL database and my pgsql db is
> also UTF-8. I got only the 8 errors above out of about 3 million
> records imported. ...
> What could be the cause of that?

MySQL probably has looser checking of proper UTF-8 encodings.