Discussion: Postgres batch write very slow - what to do

From:
femski
Date:

Folks !

I have a batch application that writes approx. 4 million rows into a narrow
table. I am using JDBC addBatch/executeBatch with auto commit turned off.
Batch size is 100. So far I am seeing Postgres take roughly five times the
time it takes to do this in Oracle.

I have played with many parameters. The only one that seems to have any effect
is fsync - but that's only 10% or so.
Initially I got the warning that checkpoints were happening too often, so I
increased checkpoint_segments to 24. The warnings stopped, but there was no real
improvement in performance.

Is Postgres really that slow? What am I missing?

Here are the changes to my postgresql.conf file.

shared_buffers = 768MB
work_mem = 256MB
maintenance_work_mem = 128MB
fsync = off

checkpoint_segments = 24
autovacuum = on

Thank you,

-Sanjay
--
View this message in context: http://www.nabble.com/Postgres-batch-write-very-slow---what-to-do-tf3395195.html#a9452092
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


From:
Heikki Linnakangas
Date:

femski wrote:
> I have a batch application that writes approx. 4 million rows into a narrow
> table. I am using JDBC addBatch/executeBatch with auto commit turned off.
> Batch size is 100. So far I am seeing Postgres take roughly five times the
> time it takes to do this in Oracle.

The usual tricks are:
- Drop indexes before loading, and rebuild them afterwards.
- Disable foreign key references while loading
- Use COPY instead of JDBC

Is the loading I/O or CPU bound? How long does the load actually take?

Are you running the latest version of PostgreSQL and the JDBC driver?

Have you read the "Populating a Database" chapter in the manual:
http://www.postgresql.org/docs/8.2/interactive/populate.html
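That chapter's advice translates to JDBC roughly as follows: one transaction, one prepared statement, and a periodic executeBatch() flush. This is only a sketch; the table and column names (t1, f1) and the connection URL are illustrative placeholders, not details from the thread, and the actual JDBC calls are shown as comments since they need a live server.

```java
// Sketch of a JDBC bulk-load loop following the "Populating a Database" advice:
// autocommit off, a single prepared INSERT, executeBatch() every N rows,
// one commit at the end. Table/column names are placeholders.
public class BatchLoad {

    // Number of executeBatch() round trips needed for a given load.
    static int flushCount(int totalRows, int batchSize) {
        return (totalRows + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        // With a live server this would look like (hypothetical URL/creds):
        //
        // Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost/test");
        // conn.setAutoCommit(false);
        // PreparedStatement ps = conn.prepareStatement("INSERT INTO t1 (f1) VALUES (?)");
        // for (int i = 0; i < totalRows; i++) {
        //     ps.setInt(1, i);
        //     ps.addBatch();
        //     if (i % batchSize == batchSize - 1) ps.executeBatch();
        // }
        // ps.executeBatch();   // flush the remainder
        // conn.commit();
        //
        // At the OP's settings, 4 million rows in batches of 100 means:
        System.out.println(flushCount(4000000, 100)); // 40000 executeBatch() calls
    }
}
```

Even with batching, that is 40,000 driver round trips; a larger batch size (1000+) is usually a cheap first experiment.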

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

From:
"Merlin Moncure"
Date:

On 3/13/07, femski <> wrote:
>
> Folks !
>
> I have a batch application that writes approx. 4 million rows into a narrow
> table. I am using JDBC addBatch/executeBatch with auto commit turned off.
> Batch size is 100. So far I am seeing Postgres take roughly five times the
> time it takes to do this in Oracle.
>
> I have played with many parameters. The only one that seems to have any effect
> is fsync - but that's only 10% or so.
> Initially I got the warning that checkpoints were happening too often, so I
> increased checkpoint_segments to 24. The warnings stopped, but there was no real
> improvement in performance.
>
> Is Postgres really that slow? What am I missing?

How many inserts/sec are you getting, approximately? Maybe you have
some type of network issue.

merlin

From:
femski
Date:

I am running Postgres 8.2 on openSUSE 10.2 with the latest JDBC driver. I moved
the app to be collocated with the server. Oracle takes 60 sec, Postgres 275
sec, for 4.7 million rows.

There are 4 CPUs on the server and one is running close to 100% during
inserts.
Network history shows spikes of up to 60% of the bandwidth (Gnome System
Monitor graph). I have a gigabit card - but that should not enter into the
picture since it's on the local host.

thanks,

-Sanjay


From:
Tom Lane
Date:

femski <> writes:
> I am running Postgres 8.2 on openSUSE 10.2 with the latest JDBC driver. I moved
> the app to be collocated with the server. Oracle takes 60 sec, Postgres 275
> sec, for 4.7 million rows.

> There are 4 CPUs on the server and one is running close to 100% during
> inserts.
> Network history shows spikes of up to 60% of the bandwidth (Gnome System
> Monitor graph).

It sounds like you're incurring a network round trip for each row, which
will be expensive even for a co-located application.  Perhaps Oracle's
JDBC driver is smart enough to avoid that.  I'm not sure what tricks are
available for bulk loading with our JDBC driver --- the page Heikki
mentioned explains things from a server perspective but I dunno how that
translates into JDBC.  The folks who hang out on pgsql-jdbc could
probably give you some tips.

            regards, tom lane

From:
Heikki Linnakangas
Date:

Tom Lane wrote:
> It sounds like you're incurring a network round trip for each row, which
> will be expensive even for a co-located application.  Perhaps Oracle's
> JDBC driver is smart enough to avoid that.  I'm not sure what tricks are
> available for bulk loading with our JDBC driver --- the page Heikki
> mentioned explains things from a server perspective but I dunno how that
> translates into JDBC.  The folks who hang out on pgsql-jdbc could
> probably give you some tips.

OP said he's using addBatch/executeBatch with a batch size of 100. The
JDBC driver sends the whole batch before waiting for responses.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

From:
Tom Lane
Date:

Heikki Linnakangas <> writes:
> OP said he's using addBatch/executeBatch with a batch size of 100. The
> JDBC driver sends the whole batch before waiting for responses.

Perhaps a bit of checking with a packet sniffer would be warranted.
If it's really working like that he shouldn't see the network utilization
load he reported ...

            regards, tom lane

From:
femski
Date:

Ok, I turned off XDMCP and network bandwidth utilization dropped to less than
5%.
Timings remained the same.

Curiously, the five-times-faster time for Oracle came from a client running on a
different host than the server.
To make things worse for Postgres, when I replace "hostname" in the JDBC string
with "localhost" or 127.0.0.1,
it runs another 60% slower (446 sec vs 275 sec). Strange.

Before I take this discussion to the JDBC list: why is CPU utilization 100%
during inserts? Could that be a bottleneck? How do I eliminate it? These are
Intel Woodcrest 5110 Xeon cores.

thank you

-Sanjay


femski wrote:
>
> I am running Postgres 8.2 on openSUSE 10.2 with the latest JDBC driver. I moved
> the app to be collocated with the server. Oracle takes 60 sec, Postgres
> 275 sec, for 4.7 million rows.
>
> There are 4 CPUs on the server and one is running close to 100% during
> inserts.
> Network history shows spikes of up to 60% of the bandwidth (Gnome System
> Monitor graph). I have a gigabit card - but that should not enter into the
> picture since it's on the local host.
>
> thanks,
>
> -Sanjay
>



From:
"Merlin Moncure"
Date:

On 3/14/07, femski <> wrote:
>
> Ok, I turned off XDMCP and network bandwidth utilization dropped to less than
> 5%.
> Timings remained the same.
>
> Curiously, the five-times-faster time for Oracle came from a client running on a
> different host than the server.
> To make things worse for Postgres, when I replace "hostname" in the JDBC string
> with "localhost" or 127.0.0.1,
> it runs another 60% slower (446 sec vs 275 sec). Strange.
>
> Before I take this discussion to the JDBC list: why is CPU utilization 100%
> during inserts? Could that be a bottleneck? How do I eliminate it? These are
> Intel Woodcrest 5110 Xeon cores.

when loading to oracle, does it utilize more than one core? istm your
best bet would be to split the load process across 4+ backends...

merlin

From:
femski
Date:

I am using Oracle XE, so it's using only one core and doing just fine.
How do I split the backend into 4+ processes?
I don't want to write a multithreaded loader app.
And I didn't think Postgres could utilize multiple cores for the
same insert statement.

thanks,

-Sanjay


On 3/14/07, femski <> wrote:

when loading to oracle, does it utilize more than one core? istm your
best bet would be to split the load process across 4+ backends...

merlin




From:
"Merlin Moncure"
Date:

On 3/15/07, femski <> wrote:
>
> I am using Oracle XE so its using only one core and doing just fine.
> How do I split backend to 4+ processes ?
> I don't want to write a multithreaded loader app.
> And I didn't think Postgres can utilize multiple cores for the
> same insert statement.

well, what SQL is the JDBC driver creating exactly?  It is probably
running inserts in a transaction.  Your load is about 17k inserts/sec,
which is about right for Postgres on your hardware.  You have the
following options to play with to increase insert performance:

* tweak postgresql.conf
  fsync: off, if it is not already
  checkpoint_segments: bump to at least 24 or so
  maintenance_work_mem: if you create keys after the insert, bump this high
(it speeds up create index)
  bgwriter settings: you can play with these; try disabling the bgwriter
first (maxpages = 0)
  full_page_writes = off might help; not 100% sure about this

* distribute load
  make the load app multithreaded.

* use copy for bulk load
  [is there a way to make the JDBC driver do this?]

* use multi-row inserts (at least 10 rows/insert)...nearly as fast as copy

* if the JDBC driver is not already doing so, prepare your statements and execute.
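The multi-row insert option above can be prepared once and reused. A minimal sketch, assuming an 8.2 server and a narrow table t1(f1 int) like Tom's example later in the thread; the names and the row count per statement are illustrative, not from the thread:

```java
// Sketch: build an 8.2-style multi-row INSERT with one placeholder per row,
// so a single round trip carries many rows. Table/column names are placeholders.
public class MultiRowInsert {

    // Builds e.g. "INSERT INTO t1 (f1) VALUES (?),(?),(?)" for rows = 3.
    static String buildInsert(String table, String column, int rows) {
        StringBuilder sql = new StringBuilder();
        sql.append("INSERT INTO ").append(table)
           .append(" (").append(column).append(") VALUES ");
        for (int i = 0; i < rows; i++) {
            if (i > 0) sql.append(",");
            sql.append("(?)");
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        String sql = buildInsert("t1", "f1", 10);
        System.out.println(sql);
        // With a live connection you would prepare this once and bind 10
        // values per execution (hypothetical usage, needs a server):
        // PreparedStatement ps = conn.prepareStatement(sql);
        // for (int i = 0; i < 10; i++) ps.setInt(i + 1, values[i]);
        // ps.executeUpdate();
    }
}
```

Note that this syntax is new in 8.2, as Merlin points out below, so it does nothing for older servers.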

merlin

From:
femski
Date:

I tried maxpages = 0 and full_page_writes = off, and it seemed to be taking
forever.
All the other tricks I have already tried.

At this point I am wondering if it's a JDBC client-side issue - I am using the
latest, 8.1.
(As I said in an earlier post, I am using addBatch with a batch size of 100.)
But just in case, I may be missing something.

If 17k records/sec is right around expected, then I must say I am a little
disappointed in the "most advanced open source database".

thanks for all your help.

-Sanjay


Merlin Moncure-2 wrote:
>
> On 3/15/07, femski <> wrote:
> well, what sql is the jdbc driver creating exactly?  It is probably
> running inserts in a transaction.   your load is about 17k inserts/sec
> which about right for postgres on your hardware.  you have the
> following options to play increase insert performance:
>
> * tweak postgresql.conf
>   fsync: off, if it is not already
>   wal_segments: bump to at least 24 or so
>   maintenance_work_mem: if you create key after insert, bump this high
> (it speeds create index)
>   bgwriter settings: you can play with these, try disabling bgwriter
> first (maxpages=0)
>   full_page_writes=off might help, not 100% sure about this
>
> * distribute load
>   make the load app multithreaded.
>
> * use copy for bulk load
>   [is there a way to make jdbc driver do this?]
>
> * use multi-line inserts (at least 10 rows/insert)...nearly as fast as
> copy
>
> * if jdbc driver is not already doing so, prepare your statements and
> execute.
>
> merlin
>
>



From:
Tom Lane
Date:

femski <> writes:
> If 17k records/sec is right around expected, then I must say I am a little
> disappointed in the "most advanced open source database".

Well, the software is certainly capable of much more than that;
for instance, on a not-too-new Dell x86_64 machine:

regression=# \timing
Timing is on.
regression=# create table t1(f1 int);
CREATE TABLE
Time: 3.614 ms
regression=# insert into t1 select * from generate_series(1,1000000);
INSERT 0 1000000
Time: 3433.483 ms

which works out to something a little shy of 300K rows/sec.  Of course
the main difference from what I think you're trying to do is the lack of
any per-row round trips to the client code.  But you need to look into
where the bottleneck is, not just assume it's insoluble.

            regards, tom lane

From:
"Merlin Moncure"
Date:

On 3/15/07, femski <> wrote:
>
> I tried maxpages = 0 and  full_page_writes=off and it seemed to be taking
> forever.
> All other tricks I have already tried.
>
> At this point I wondering if its a jdbc client side issue - I am using the
> latest 8.1.
> (as I said in an earlier post - I am using addBatch with batch size of 100).
> But just in case - I am missing something.
>
> If 17k record/sec is right around expected then I must say I am little
> disappointed from the "most advanced open source database".

Be careful...you are just testing one very specific thing, and it is
extremely possible that the Oracle JDBC batch insert is more optimized
than PostgreSQL's.  On my little Pentium 4 workstation, by inserting
10 rows per insert:
insert values ([...]), ([...]), [8 more rows];

I got a 5x speedup in insert performance using this feature (which is
unfortunately new for 8.2).  Oracle is most likely pulling similar
tricks inside the driver.  PostgreSQL is much faster than you think...

merlin

From:
"Merlin Moncure"
Date:

On 3/16/07, Bob Dusek <> wrote:
> This may or may not be related to what you're seeing... but, when we
> changed from Postgres 7.4.2 to 7.4.8, our batch processing slowed down
> fairly significantly.
>
> Here's what we were doing:
>
> Step 1) Build a large file full of SQL insert statements.
> Step 2) Feed the file directly to "psql" using "psql dbname <
> insertfile".
> insertfile".
>
> The time of execution for step 2 seemed to nearly double from
> 7.4.2 to 7.4.8, for whatever reason (it could have been the way SuSE
> compiled the binaries).  Perhaps the slowdown was something we could
> have/should have tweaked with config options.

> At any rate, what we did to speed it up was to wrap the entire file in a
> transaction, as such: "BEGIN; ..filecontents.. COMMIT;"
>
> Apparently the autocommit stuff in the version of 7.4.8 we were using
> was just *doggedly* slow.
>
> Perhaps you're already using a transaction for your batch, though.  Or,
> maybe the problem isn't with Postgres.  Just thought I'd share.

If you are inserting records one by one without a transaction (and no
fsync), I/O is going to determine your insertion speed.  Not really
sure what was happening in your case...it looks like quite a different
type of issue from the OP's.

Anyways, to the OP: some quick googling regarding the PostgreSQL JDBC
driver showed that the batch insert case is just not as optimized (in
the driver) as it could be.  The driver could do multi-statement
inserts or use the libpq copy API, either of which would result in a
huge performance gain.
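Whether the driver of that era exposes a copy API is exactly what is in question here, so the copyIn call below is hypothetical; what can be shown concretely is the COPY text format itself (tab-delimited fields, \N for NULL, newline-terminated rows), which any driver-level copy support would have to produce:

```java
// Sketch: format rows as COPY text-format input, which a copy-capable driver
// could stream to the server. Only the formatter is concrete; the driver call
// in the comment is an assumption about a copy API, not a confirmed method.
public class CopyFormat {

    // One COPY text-format line: fields tab-separated, null as \N, newline at end.
    // (Real COPY input also needs backslash/tab/newline escaping inside values,
    // omitted here for brevity.)
    static String toCopyLine(Object... fields) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) line.append('\t');
            line.append(fields[i] == null ? "\\N" : fields[i].toString());
        }
        return line.append('\n').toString();
    }

    public static void main(String[] args) {
        System.out.print(toCopyLine(1, "abc", null));
        // Hypothetical use with a driver that exposes a copy API:
        // copyApi.copyIn("COPY t1 FROM STDIN", new StringReader(data));
    }
}
```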

merlin

From:
"Merlin Moncure"
Date:

On 3/16/07, Merlin Moncure <> wrote:
> Be careful...you are just testing one very specific thing, and it is
> extremely possible that the Oracle JDBC batch insert is more optimized
> than PostgreSQL's.  On my little Pentium 4 workstation, by inserting
> 10 rows per insert:
> insert values ([...]), ([...]), [8 more rows];

Small correction here: I actually went and looked at the JDBC API and
realized 'addBatch' means to run multiple statements at once, not batch
inserting.  femski, your best bet is to lobby the JDBC folks to build
support for 'copy' into the driver for faster bulk loads (or help out
in that regard).

merlin

From:
"Joshua D. Drake"
Date:

femski wrote:
> Folks !
>
> I have a batch application that writes approx. 4 million rows into a narrow
> table. I am using JDBC addBatch/executeBatch with auto commit turned off.
> Batch size is 100. So far I am seeing Postgres take roughly five times the
> time it takes to do this in Oracle.

If you are using 8.2 could you try with the multi value inserts?

insert into foo(bar) values (bang) (bong) (bing) ...?

>
> I have played with many parameters. The only one that seems to have any effect
> is fsync - but that's only 10% or so.
> Initially I got the warning that checkpoints were happening too often, so I
> increased checkpoint_segments to 24. The warnings stopped, but there was no real
> improvement in performance.
>
> Is Postgres really that slow? What am I missing?
>
> Here are the changes to my postgresql.conf file.
>
> shared_buffers = 768MB
> work_mem = 256MB
> maintenance_work_mem = 128MB
> fsync = off
>
> checkpoint_segments = 24
> autovacuum = on
>
> Thank you,
>
> -Sanjay


--

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


From:
Carlos Moreno
Date:

Joshua D. Drake wrote:

>insert into foo(bar) values (bang) (bong) (bing) ...?
>
>
>

Nit pick  (with a "correct me if I'm wrong" disclaimer :-)) :

Wouldn't that be  (bang), (bong), (bing) ....  ??

Carlos
--


From:
"Joshua D. Drake"
Date:

Carlos Moreno wrote:
> Joshua D. Drake wrote:
>
>> insert into foo(bar) values (bang) (bong) (bing) ...?
>>
>>
>>
>
> Nit pick  (with a "correct me if I'm wrong" disclaimer :-)) :
>
> Wouldn't that be  (bang), (bong), (bing) ....  ??

Yes.

J


>
> Carlos
> --
>
>
>


--

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


From:
"hubert depesz lubaczewski"
Date:

On 3/13/07, femski <> wrote:
> I have a batch application that writes approx. 4 million rows into a narrow
> table. I am using JDBC addBatch/executeBatch with auto commit turned off.
> Batch size is 100. So far I am seeing Postgres take roughly five times the
> time it takes to do this in Oracle.

You can try pg_bulkload.
Since it is called as a standard function, you shouldn't have problems
with JDBC. And it's apparently fast.

depesz

http://pgfoundry.org/projects/pgbulkload/