Discussion: plsql gets "out of memory"

plsql gets "out of memory"

From:
Rural Hunter
Date:
Hi all,
I'm a newbie here. I'm testing pgsql with my mysql data; if the
performance is good, I will migrate from mysql to pgsql.
I installed pgsql 9.1rc on my Ubuntu server and I'm trying to import a
large SQL file dumped from mysql into pgsql with 'psql -f'. The file is
around 30 GB and consists of bulk INSERT commands. It ran for several
hours and then aborted with an "out of memory" error. This is the tail
of the log I got:
INSERT 0 280
INSERT 0 248
INSERT 0 210
INSERT 0 199
invalid command \n
out of memory

On the server side, I only found these errors about invalid UTF-8
characters, which are related to escape characters in the mysql export.
2011-08-29 19:19:29 CST ERROR:  invalid byte sequence for encoding
"UTF8": 0x00
2011-08-29 19:55:35 CST LOG:  unexpected EOF on client connection

My understanding is that this is a client-side issue, not related to any
server memory setting. But how can I adjust the memory settings of the
psql program?

To handle the escape character '\', which is the default in mysql but not
in pgsql, I have already made some rough modifications to the exported SQL
dump file:
sed "s/,'/,E'/g" | sed 's/\\0/ /g'. I guess some characters are still not
handled, and that might cause an INSERT command to be split into several
invalid pgsql commands. Would that be the cause of the "out of memory"
error?


Re: plsql gets "out of memory"

From:
Rural Hunter
Date:
Well, thank you for the quick reply, but actually I'm not concerned about
performance at this point. My problem is with the bulk insert done by the
client-side program psql; or rather, it's a problem of migrating a large
amount of data from mysql to pgsql.

On 2011-08-29 21:25:29, Julio Leyva wrote:
>
> check this out
> http://www.revsys.com/writings/postgresql-performance.html
>
>
> > Date: Mon, 29 Aug 2011 21:11:19 +0800
> > From: ruralhunter@gmail.com
> > To: pgsql-admin@postgresql.org
> > Subject: [ADMIN] plsql gets "out of memory"
> >
> > Hi all,
> > I'm a newbie here. I'm trying to test pgsql with my mysql data. If the
> > performance is good, I will migrate from mysql to pgsql.
> > I installed pgsql 9.1rc on my Ubuntu server. I'm trying to import a
> > large sql file dumped from mysql into pgsql with 'psql -f'. The file is
> > around 30G with bulk insert commands in it. It ran several hours and
> > then aborted with an "out of memory" error. This is the tail of the log
> > I got:
> > INSERT 0 280
> > INSERT 0 248
> > INSERT 0 210
> > INSERT 0 199
> > invalid command \n
> > out of memory
> >
> > On server side, I only found these errors related to invalid UTF-8
> > characters which is related to escape characters when exported from
> mysql.
> > 2011-08-29 19:19:29 CST ERROR: invalid byte sequence for encoding
> > "UTF8": 0x00
> > 2011-08-29 19:55:35 CST LOG: unexpected EOF on client connection
> >
> > My understanding is this is a client side issue and not related to any
> > server memory setting. But how can I adjust the memory settings of the psql
> > program?
> >
> > To handle the escape character '\' which is default in mysql but not in
> > pgsql, I have already made some rough modification to the exported sql
> > dump file:
> > sed "s/,'/,E'/g" |sed 's/\\0/ /g'. I guess there might still be some
> > characters missing handling and that might cause the insert command to
> > be split to several invalid pgsql commands. Would that be the cause of
> > the "out of memory" error?
> >
> >
> > --
> > Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-admin



Re: plsql gets "out of memory"

From:
"Kevin Grittner"
Date:
Rural Hunter <ruralhunter@gmail.com> wrote:

> it's a problem of migrating vast data from mysql to pgsql.

I don't know how helpful you'll find this, but when we migrated to
PostgreSQL a few years ago, we had good luck using Java to
stream from one database to another.  Be sure to include a lot of
inserts in each database transaction.  We used prepared statements
for the inserts.

I don't know whether you would consider our data "vast" -- we had
about 100 production databases.  At the time the largest was 200 GB.

-Kevin

Re: plsql gets "out of memory"

From:
Rural Hunter
Date:
Yes, the size of my database (around 200 GB) is very close to yours. How
long did it take you to migrate the data (100-200 GB)? I have thought about
writing Java code to transfer the data, but that is a bit tedious. It would
be best if I could complete this with the regular export/import approach;
if that won't work, then writing Java code to do the task will be the choice.

Another problem: from my test, the bulk insert performance does not look
very good. Based on the progress, I estimated that importing the 30 GB of
data would take about 10-20 hours. I have already applied some tweaks such as:
fsync=off
archive_mode=off
increase checkpoint_segments
drop indexes and primary keys

I have about 5 GB of memory free on the server and these memory settings:
shared_buffers = 1GB
work_mem = 8MB
maintenance_work_mem = 1GB
effective_cache_size = 4GB

Not sure if there is any room to improve the bulk insert performance. I
monitored the server while the import was running: CPU and disk utilization
are very low, and memory usage barely changes whether pgsql is up or down.
So it looks like the performance bottleneck is not CPU, disk, or memory. I'm
a bit lost on this and have no idea what to check or change.

On 2011/8/29 21:44, Kevin Grittner wrote:
> Rural Hunter<ruralhunter@gmail.com>  wrote:
>
>> it's a problem of migrating vast data from mysql to pgsql.
>
> I don't know how helpful you'll find this, but when we migrated to
> PostgreSQL a few years ago, we had good luck with using Java to
> stream from one database to another.  Be sure to include a lot of
> inserts in each database transaction.  We used prepared statements
> for the inserts.
>
> I don't know whether you would consider our data "vast" -- we had
> about 100 production databases.  At the time the largest was 200 GB.
>
> -Kevin
>


Re: plsql gets "out of memory"

From:
"Kevin Grittner"
Date:
Rural Hunter <ruralhunter@gmail.com> wrote:

> yes, the size of my database(around 200g) is very close to yours.
> How long did it take you to migrate the data(100-200G)?

I don't remember off-hand, and I'm not sure it's worth digging for
the info because so much has changed in the intervening six years.
Recent PostgreSQL releases are much faster, and our current hardware
is also much faster.  I just ran bonnie++ on our new server, which
just arrived (yet to be put into production), and got 619930 KB/sec
for sequential writes.  200 GB at that rate is 338 seconds.  :-)

> Another problem is, from my test, the performance of bulk insert
> looks not very good. I estimated the importing of 30G data would
> take about 10-20 hours based on the progress. I already did some
> tweaks such as:
> fsync=off
> archive_mode=off

Good (but don't forget to change those back once the bulk load is done).
You should probably also turn off full_page_writes and
synchronous_commit.  I've seen benchmarks which show that this
helps, even with the other settings you mention.
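
On Ubuntu, the whole set of temporary load-time changes could be applied to
postgresql.conf along these lines (the path assumes the stock 9.1 packaging;
this is only a sketch, and every one of these must be reverted before real
use):

sudo tee -a /etc/postgresql/9.1/main/postgresql.conf >/dev/null <<'EOF'
# temporary bulk-load settings -- revert after the import
fsync = off                  # crash-unsafe; acceptable only for a reloadable import
full_page_writes = off
synchronous_commit = off
archive_mode = off           # changing this one requires a restart
autovacuum = off             # re-enable once the load is done
EOF
sudo service postgresql restart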

> Increase checkpoint_segments

There can be paradoxical results with that.  For reasons yet to be
determined, bulk conversion (unlike most workloads) sometimes runs
faster with a small setting like the default.  You have to test to
see how it works in your environment.

> drop indexes and primary keys

Yeah, you definitely want to build those only after the data for a
table is loaded.  I also recommend a VACUUM FREEZE ANALYZE on the
database unless most of these rows will be deleted or updated before
you run a billion database transactions.  Otherwise you will get a
painful "anti-wraparound" autovacuum on everything, probably at a
time of heavy usage.
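
After the load, that is a single command (the database name here is a
placeholder):

psql -d mydb -c 'VACUUM FREEZE ANALYZE;'
# or the same thing through the bundled wrapper:
vacuumdb --freeze --analyze mydb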

> I have about 5G memory free on the server and have these memory
> settings:
> shared_buffers = 1GB
> work_mem = 8MB
> maintenance_work_mem = 1GB
> effective_cache_size = 4GB

Reasonable for that amount of RAM, but that seems like
underpowered hardware for the size of the database.  If the
production conversion is going to be run against different hardware,
these tests may not be giving you very good numbers for what to
expect.

> Not sure if there is any room to increase the performance of bulk
> insert. I monitored the server when the import running. The
> utilization of CPU/disk is very low.

We tended to see low disk utilization, and saturated CPUs.  We
worked around this by using a small queue of rows in the Java
conversion process, with one thread reading and a separate thread
writing.  We also ran several conversion processes at once, on
separate tables.  See about using COPY, either through JDBC support
or outputting your files in a format COPY can digest.  It can be
significantly faster than INSERT, even if you optimize the INSERTs
by using prepared statements and a large number of rows per
transaction.  Also, if at all possible, load the data within the
same database transaction where the table is created.  This can
allow for an optimization where the data is not logged to WAL
because if the transaction aborts, the table is not there anyway.
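
A rough sketch of that pattern, with made-up table and file names:

psql -d mydb <<'EOF'
BEGIN;
CREATE TABLE items (
    id    integer,
    name  text
);
-- \copy runs client-side, so no superuser is needed; with WAL archiving off,
-- loading into a table created in the same transaction can skip WAL entirely.
\copy items from 'items.tsv'
COMMIT;
EOF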

> The memory usage seems no much change when pgsql is up or down. So
> looks the performance bottleneck is not on cpu/disk/memory. I'm a
> bit lost on this and have no idea what to check/change.

Network latency?  Avoid encrypted connections for the load, and do
whatever you can to minimize latency, like attaching both machines
to the same switch.  You can't improve performance much if you're
working on the things which are only using a small part of the time.
Identifying the source of your delays is the most important thing
at this point.
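
If nothing looks saturated, the usual tools are enough to at least show
where the time is not going (iostat is in Ubuntu's sysstat package):

vmstat 5        # CPU, run queue, swap activity
iostat -xm 5    # per-device utilization and average wait times
top -c          # per-process CPU; watch psql and its postgres backend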

-Kevin

Re: plsql gets "out of memory"

From:
Rural Hunter
Date:
Hi Kevin,

Thank you very much for the quick and detailed answers/suggestions. I
will check and try them.

On 2011/8/29 23:18, Kevin Grittner wrote:
> Good (but don't forget to change that once the bulk load is done). You
> should probably also turn off full_page_writes and synchronous_commit.
> I've seen benchmarks which show that this helps, even with the other
> settings you mention.
I forgot to mention that synchronous_commit is already off; I will also
test with full_page_writes off.
>> Increase checkpoint_segments
>
> There can be paradoxical results with that.  For reasons yet to be
> determined, bulk conversion (unlike most workloads) sometimes runs
> faster with a small setting like the default.  You have to test to
> see how it works in your environment.
yes, will check that.
> Yeah, you definitely want to build those only after the data for a
> table is loaded. I also recommend a VACUUM FREEZE ANALYZE on the
> database unless most of these rows will be deleted or updated before
> you run a billion database transactions. Otherwise you will get a
> painful "anti-wraparound" autovacuum on everything, probably at a time
> of heavy usage.
Hmm... I will try turning autovacuum off, though I didn't see any
resource contention caused by it.
>> I have about 5G memory free on the server and have these memory
>> settings:
>> shared_buffers = 1GB
>> work_mem = 8MB
>> maintenance_work_mem = 1GB
>> effective_cache_size = 4GB
>
> Reasonable for that amount of RAM, but that seems like
> underpowered hardware for the size of the database.  If the
> production conversion is going to be run against different hardware,
> these tests may not be giving you very good numbers for what to
> expect.
I have MySQL running on the server; the memory above is just for the
pgsql import. I will shut down MySQL and give the memory (32 GB in total)
to pgsql when I do the performance tests.
> Network latency? Avoid encrypted connections for the load, and do
> whatever you can to minimize latency, like attaching both machines to
> the same switch. You can't improve performance much if you're working
> on the things which are only using a small part of the time.
> Identifying the source of your delays is the most important thing at
> this point.
No, I do the import locally on the db server, so network latency can
be excluded.

Re: plsql gets "out of memory"

From:
"Kevin Grittner"
Date:
Rural Hunter <ruralhunter@gmail.com> wrote:
> 2011/8/29 23:18, Kevin Grittner:

>> I also recommend a VACUUM FREEZE ANALYZE on the database unless
>> most of these rows will be deleted or updated before you run a
>> billion database transactions. Otherwise you will get a painful
>> "anti-wraparound" autovacuum on everything, probably at a time
>> of heavy usage.
> hmm....I will try to turn autovacuum off though I didn't see any
> resource contention caused by it.

Well, turning off autovacuum during  a bulk load is probably a net
gain if it's insert-only (i.e., no need to query just-loaded data to
decide what to do with new rows); but that's not what I was getting
at.  Bulk loading 200 GB of data which is not going to be deleted or
updated heavily is setting a performance time bomb without a VACUUM
FREEZE.  At some point, perhaps months later, it will be necessary
to freeze the tuples to prevent data loss, and since this occurs
based on a threshold of how many transaction IDs have been consumed,
it is most likely to happen at peak OLTP loads, when it will be the
biggest problem.  A VACUUM FREEZE (and you might as well throw in
ANALYZE while you're at it) will take care of that up front.  As a
side benefit it will keep SELECT statements from generating heavy
*write* loads on the first access to tuples, and will perform other
maintenance which will improve database performance.

I just count the time for VACUUM FREEZE ANALYZE as part of the bulk
load time required before letting in users.

>> Network latency?

> No, I do the import locally on the db server so the network
> latency can be excluded.

Hmm...  I don't remember the details, but there was a problem at
some point where Linux pipe connections could introduce significant
latency, and you could get much better performance on a TCP
connection through localhost.  It might be worth a try.  (Maybe
someone else will remember the details.)
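
Switching is just a matter of adding -h (the database and file names are
placeholders):

psql -d mydb -f dump.sql                  # default: local Unix-domain socket
psql -h localhost -d mydb -f dump.sql     # TCP over the loopback interface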

-Kevin

Re: plsql gets "out of memory"

From:
Rural Hunter
Date:
Thank you. I didn't understand what 'vacuum freeze' actually does; I will
check the details to see if it's a good fit for my situation, and I will
also test loading over a TCP connection. Thanks again for all your advice;
it has been really helpful to me!

On 2011/8/30 0:06, Kevin Grittner wrote:
> Rural Hunter<ruralhunter@gmail.com>  wrote:
>> 2011/8/29 23:18, Kevin Grittner:
>
>>> I also recommend a VACUUM FREEZE ANALYZE on the database unless
>>> most of these rows will be deleted or updated before you run a
>>> billion database transactions. Otherwise you will get a painful
>>> "anti-wraparound" autovacuum on everything, probably at a time
>>> of heavy usage.
>> hmm....I will try to turn autovacuum off though I didn't see any
>> resource contention caused by it.
>
> Well, turning off autovacuum during  a bulk load is probably a net
> gain if it's insert-only (i.e., no need to query just-loaded data to
> decide what to do with new rows); but that's not what I was getting
> at.  Bulk loading 200 GB of data which is not going to be deleted or
> updated heavily is setting a performance time bomb without a VACUUM
> FREEZE.  At some point, perhaps months later, it will be necessary
> to freeze the tuples to prevent data loss, and since this occurs
> based on a threshold of how many transaction IDs have been consumed,
> it is most likely to happen at peak OLTP loads, when it will be the
> biggest problem.  A VACUUM FREEZE (and you might as well throw in
> ANALYZE while you're at it) will take care of that up front.  As a
> side benefit it will keep SELECT statements from generating heavy
> *write* loads on the first access to tuples, and will perform other
> maintenance which will improve database performance.
>
> I just count the time for VACUUM FREEZE ANALYZE as part of the bulk
> load time required before letting in users.
>
>>> Network latency?
>
>> No, I do the import locally on the db server so the network
>> latency can be excluded.
>
> Hmm...  I don't remember the details, but there was a problem at
> some point where Linux pipe connections could introduce significant
> latency, and you could get much better performance on a TCP
> connection through localhost.  It might be worth a try.  (Maybe
> someone else will remember the details.)
>
> -Kevin
>


Re: plsql gets "out of memory"

From:
Scott Marlowe
Date:
2011/8/29 Rural Hunter <ruralhunter@gmail.com>:
> Hi all,
> I'm a newbie here. I'm trying to test pgsql with my mysql data. If the
> performance is good, I will migrate from mysql to pgsql.
> I installed pgsql 9.1rc on my Ubuntu server. I'm trying to import a large
> sql file dumped from mysql into pgsql with 'psql -f'. The file is around
> 30G with bulk insert commands in it. It ran several hours and then aborted
> with an "out of memory" error. This is the tail of the log I got:
> INSERT 0 280
> INSERT 0 248
> INSERT 0 210
> INSERT 0 199
> invalid command \n
> out of memory

I'd look at what's leading to the invalid command \n up there.  I
doubt the out of memory is more than a symptom of that.  I.e., some
part of your inserts is improperly formatted and the machine is then
trying to insert a row with what it thinks is a column of several
gigabytes.

Re: plsql gets "out of memory"

From:
Rural Hunter
Date:
Yes, it seems so. I separated out one of the tables having the problem and
did a test. First I loaded it with the original file and saw the same thing:
an invalid command followed by "out of memory". After I fixed those escape
characters, the table loaded successfully.

On 2011/8/30 10:14, Scott Marlowe wrote:
> 2011/8/29 Rural Hunter<ruralhunter@gmail.com>:
>> Hi all,
>> I'm a newbie here. I'm trying to test pgsql with my mysql data. If the
>> performance is good, I will migrate from mysql to pgsql.
>> I installed pgsql 9.1rc on my Ubuntu server. I'm trying to import a large
>> sql file dumped from mysql into pgsql with 'psql -f'. The file is around
>> 30G with bulk insert commands in it. It ran several hours and then aborted
>> with an "out of memory" error. This is the tail of the log I got:
>> INSERT 0 280
>> INSERT 0 248
>> INSERT 0 210
>> INSERT 0 199
>> invalid command \n
>> out of memory
> I'd look at what's leading to the invalid command \n up there.  I
> doubt the out of memory is more than a symptom from that.  I.e. some
> part of your inserts are improperly formatted and the machine is then
> trying to insert a row with what it thinks is a column of several
> gigabytes.
>


Re: plsql gets "out of memory"

From:
Rural Hunter
Date:
Hi Kevin,

I did another try with the following additional changes based on our discussion:
1. use a TCP connection
2. turn off autovacuum
3. turn off full_page_writes

I could import more than 30 GB of data in about 2 hours. That's totally
acceptable performance for me with the current server capability. There
is a minor issue, though: I saw a few errors during the import:
ERROR:  invalid byte sequence for encoding "UTF8": 0xe6272c
ERROR:  invalid byte sequence for encoding "UTF8": 0xe5272c
ERROR:  invalid byte sequence for encoding "UTF8": 0xe5272c
ERROR:  invalid byte sequence for encoding "UTF8": 0xe5272c
ERROR:  invalid byte sequence for encoding "UTF8": 0xe68e27
ERROR:  invalid byte sequence for encoding "UTF8": 0xe7272c
ERROR:  invalid byte sequence for encoding "UTF8": 0xe5272c
ERROR:  invalid byte sequence for encoding "UTF8": 0xe5a427

My data was exported from a UTF-8 MySQL database and my pgsql database is
also UTF-8. I got only the 8 errors above with about 3 million records
imported. The strange thing is, I usually see the problematic SQL statement
in the log when there is an error for it, so I have a chance to fix the
data manually. But for the errors above, I don't see any SQL logged; the
pgsql log just contains the same errors as above with no additional info:
2011-09-01 11:26:32 CST ERROR:  invalid byte sequence for encoding
"UTF8": 0xe6272c
2011-09-01 11:26:47 CST ERROR:  invalid byte sequence for encoding
"UTF8": 0xe5272c
2011-09-01 11:26:53 CST ERROR:  invalid byte sequence for encoding
"UTF8": 0xe5272c
2011-09-01 11:26:58 CST ERROR:  invalid byte sequence for encoding
"UTF8": 0xe5272c
2011-09-01 11:26:58 CST ERROR:  invalid byte sequence for encoding
"UTF8": 0xe68e27
2011-09-01 11:27:01 CST ERROR:  invalid byte sequence for encoding
"UTF8": 0xe7272c
2011-09-01 11:27:06 CST ERROR:  invalid byte sequence for encoding
"UTF8": 0xe5272c
2011-09-01 11:27:15 CST ERROR:  invalid byte sequence for encoding
"UTF8": 0xe5a427

What could be the cause of that?


On 2011/8/30 9:29, Rural Hunter wrote:
> Thank you. I didn't understand what 'vacuum freeze' actually does. I
> will check the detail to see if it's good for my situation. and I will
> also test the load by tcp connection. Thanks again for all your
> advice and it has really been very helpful to me!
>
> On 2011/8/30 0:06, Kevin Grittner wrote:
>> Rural Hunter<ruralhunter@gmail.com>  wrote:
>>> 2011/8/29 23:18, Kevin Grittner:
>>
>>>> I also recommend a VACUUM FREEZE ANALYZE on the database unless
>>>> most of these rows will be deleted or updated before you run a
>>>> billion database transactions. Otherwise you will get a painful
>>>> "anti-wraparound" autovacuum on everything, probably at a time
>>>> of heavy usage.
>>> hmm....I will try to turn autovacuum off though I didn't see any
>>> resource contention caused by it.
>>
>> Well, turning off autovacuum during  a bulk load is probably a net
>> gain if it's insert-only (i.e., no need to query just-loaded data to
>> decide what to do with new rows); but that's not what I was getting
>> at.  Bulk loading 200 GB of data which is not going to be deleted or
>> updated heavily is setting a performance time bomb without a VACUUM
>> FREEZE.  At some point, perhaps months later, it will be necessary
>> to freeze the tuples to prevent data loss, and since this occurs
>> based on a threshold of how many transaction IDs have been consumed,
>> it is most likely to happen at peak OLTP loads, when it will be the
>> biggest problem.  A VACUUM FREEZE (and you might as well throw in
>> ANALYZE while you're at it) will take care of that up front.  As a
>> side benefit it will keep SELECT statements from generating heavy
>> *write* loads on the first access to tuples, and will perform other
>> maintenance which will improve database performance.
>>
>> I just count the time for VACUUM FREEZE ANALYZE as part of the bulk
>> load time required before letting in users.
>>
>>>> Network latency?
>>
>>> No, I do the import locally on the db server so the network
>>> latency can be excluded.
>>
>> Hmm...  I don't remember the details, but there was a problem at
>> some point where Linux pipe connections could introduce significant
>> latency, and you could get much better performance on a TCP
>> connection through localhost.  It might be worth a try.  (Maybe
>> someone else will remember the details.)
>>
>> -Kevin
>>
>


Re: plsql gets "out of memory"

From:
Scott Marlowe
Date:
On Fri, Sep 2, 2011 at 9:30 PM, Rural Hunter <ruralhunter@gmail.com> wrote:
> Hi Kevin,
>
> I did another try with following additional changes based on our discussion:
> 1. use the tcp connection
> 2. turn off autovacuum
> 3. turn off full_page_writes
>
> I could import more than 30G data in about 2 hours. That's totally
> acceptable performance to me with the current server capability.  There is a
> minor issue though. I saw a few errors during the import:
> ERROR:  invalid byte sequence for encoding "UTF8": 0xe6272c
> ERROR:  invalid byte sequence for encoding "UTF8": 0xe5272c
> ERROR:  invalid byte sequence for encoding "UTF8": 0xe5272c
> ERROR:  invalid byte sequence for encoding "UTF8": 0xe5272c
> ERROR:  invalid byte sequence for encoding "UTF8": 0xe68e27
> ERROR:  invalid byte sequence for encoding "UTF8": 0xe7272c
> ERROR:  invalid byte sequence for encoding "UTF8": 0xe5272c
> ERROR:  invalid byte sequence for encoding "UTF8": 0xe5a427
>
> My data was exported from an UTF8 MySQL database and my pgsql db is also
> UTF8. I got 8 errors above only with about 3 million records imported. The
> strange thing is, I usually see the problematic SQL output in the log if
> there is any error for that SQL so I have a chance to fix the data manually.
> But for the errors above, I don't see any SQL logged. The pgsql log just
> output error log same as above with no additional info:
> 2011-09-01 11:26:32 CST ERROR:  invalid byte sequence for encoding "UTF8":
> 0xe6272c
> 2011-09-01 11:26:47 CST ERROR:  invalid byte sequence for encoding "UTF8":
> 0xe5272c
> 2011-09-01 11:26:53 CST ERROR:  invalid byte sequence for encoding "UTF8":
> 0xe5272c
> 2011-09-01 11:26:58 CST ERROR:  invalid byte sequence for encoding "UTF8":
> 0xe5272c
> 2011-09-01 11:26:58 CST ERROR:  invalid byte sequence for encoding "UTF8":
> 0xe68e27
> 2011-09-01 11:27:01 CST ERROR:  invalid byte sequence for encoding "UTF8":
> 0xe7272c
> 2011-09-01 11:27:06 CST ERROR:  invalid byte sequence for encoding "UTF8":
> 0xe5272c
> 2011-09-01 11:27:15 CST ERROR:  invalid byte sequence for encoding "UTF8":
> 0xe5a427
>
> What could be the cause of that?

MySQL probably has looser checking of proper UTF-8 encodings.
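
If you want to locate the offending rows in the dump before the next load,
GNU grep in a UTF-8 locale can usually flag them, or iconv will report the
first bad byte (the file name is a placeholder):

grep -naxv '.*' clean.sql    # -a binary-as-text, -x whole line, -v invert: prints lines that are not valid UTF-8
iconv -f UTF-8 -t UTF-8 clean.sql > /dev/null    # stops with an error at the first invalid sequence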