Discussion: [PERFORM] Improving PostgreSQL insert performance
On 09.06.2017 at 15:04, Frits Jalvingh wrote:
> Hi all,
>
> I am trying to improve the runtime of a big data warehouse
> application. One significant bottleneck found was insert performance,
> so I am investigating ways of getting Postgresql to insert data faster.

* use COPY instead of INSERT, it is much faster
* bundle all INSERTs into one transaction
* use a separate disk/spindle for the transaction log

> I already changed the following config parameters:
> work_mem 512MB
> synchronous_commit off
> shared_buffers 512mb
> commit_delay 100000
> autovacuum_naptime 10min
>
> Postgres version is 9.6.3 on Ubuntu 17.04 64 bit, on a i7-4790K with
> 16GB memory and an Intel 750 SSD. JDBC driver is postgresql-42.1.1.

increase shared_buffers; with 16GB RAM I would suggest 8GB

Regards, Andreas
--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com
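[A minimal JDBC sketch of the single-transaction tip above; the table test(a integer, b text) and the connection details are placeholders, not from the thread:]

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchInsert {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/testdb", "postgres", "secret")) {
            con.setAutoCommit(false);          // bundle everything into one transaction
            try (PreparedStatement ps =
                     con.prepareStatement("INSERT INTO test (a, b) VALUES (?, ?)")) {
                for (int i = 0; i < 24_000_000; i++) {
                    ps.setInt(1, i);
                    ps.setString(2, "row" + i);
                    ps.addBatch();
                    if (i % 10_000 == 0) {
                        ps.executeBatch();     // flush periodically to bound driver memory
                    }
                }
                ps.executeBatch();
            }
            con.commit();                      // one commit at the end
        }
    }
}

Flushing the batch periodically keeps memory bounded while still paying for only a single commit.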
On Fri, Jun 09, 2017 at 03:24:15PM +0200, Andreas Kretschmer wrote:
>
>
> On 09.06.2017 at 15:04, Frits Jalvingh wrote:
> >Hi all,
> >
> >I am trying to improve the runtime of a big data warehouse
> >application. One significant bottleneck found was insert
> >performance, so I am investigating ways of getting Postgresql to
> >insert data faster.
>
> * use COPY instead of INSERT, it is much faster
> * bundle all INSERTs into one transaction
> * use a separate disk/spindle for the transaction log
>
>
>
> >
> >I already changed the following config parameters:
> >work_mem 512MB
> >synchronous_commit off
> >shared_buffers 512mb
> >commit_delay 100000
> >autovacuum_naptime 10min
> >
> >Postgres version is 9.6.3 on Ubuntu 17.04 64 bit, on a i7-4790K
> >with 16GB memory and an Intel 750 SSD. JDBC driver is
> >postgresql-42.1.1.
> >
>
> increase shared_buffers; with 16GB RAM I would suggest 8GB
+1 Without even checking, I think Oracle is configured to use a LOT
more memory than 512mb.
Regards,
Ken
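[If you would rather apply that setting from a client session than edit postgresql.conf, a sketch using ALTER SYSTEM; it requires superuser, and shared_buffers only takes effect after a server restart. Connection details are placeholders:]

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class TuneSharedBuffers {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/postgres", "postgres", "secret");
             Statement st = con.createStatement()) {
            // Written to postgresql.auto.conf (9.4+); restart the server afterwards.
            st.execute("ALTER SYSTEM SET shared_buffers = '8GB'");
        }
    }
}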
On Fri, Jun 09, 2017 at 01:56:58PM +0000, Frits Jalvingh wrote:
> I will try copy, but I first need to investigate how to use it- its
> interface seems odd to say the least ;) I'll report back on that once done.
>
> Any other tips would be welcome!
>
> Regards,
>
> Frits
Hi Frits,
Here is an article that is still valid:
https://www.depesz.com/2007/07/05/how-to-insert-data-to-database-as-fast-as-possible/
Regards,
Ken
On Fri, Jun 9, 2017 at 7:56 AM, Frits Jalvingh <jal@etc.to> wrote:
> Hi Kenneth, Andreas,
>
> Thanks for your tips!
>
> I increased shared_buffers to 8GB but it has no measurable effect at all. I
> think that is logical: shared buffers are important for querying but not for
> inserting; for that the speed to write to disk seems most important- no big
> reason to cache the data if the commit requires a full write anyway.
> I also changed the code to do only one commit; this also has no effect I can
> see.
>
> It is true that Oracle had more memory assigned to it (1.5G), but unlike
> Postgres (which is completely on a fast SSD) Oracle runs on slower disk
> (ZFS)..
>
> I will try copy, but I first need to investigate how to use it- its
> interface seems odd to say the least ;) I'll report back on that once done.
If you want an example of copy, just pg_dump a table:
pg_dump -d smarlowe -t test
(SNIP)
COPY test (a, b) FROM stdin;
1 abc
2 xyz
\.
(SNIP)
Hi all,

Thanks a lot for the many responses!

About preparing statements: this is done properly in Java, and pgsql does it by itself. So that cannot be done better ;)

I tried the copy command, and that indeed works quite brilliantly:
Inserted 24000000 rows in 22004 milliseconds, 1090710.7798582076 rows per second

That's faster than Oracle. But with a very bad interface, I have to say, for normal database work.. I will try to make this work in the tooling, but it needs some very special code to format all possible values properly, and to manage the end of the copy, so it is not usable in general, which is a pity, I think.

So, I am still very interested in getting normal inserts faster, because that will gain speed for all work.. If Oracle can do it, and Postgres is able to insert fast with copy, where lies the bottleneck with the insert command? There seems to be quite a performance hit with the JDBC driver itself (as the stored procedure is a lot faster), so I can look into that. But even after that there is quite a gap..

Regards,
Frits
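[A minimal sketch of driving COPY from the Java tooling with pgjdbc's CopyManager; the table test(a integer, b text) and connection details are placeholders. Values must follow the COPY text format: tab-separated columns, newline-terminated rows, \N for NULL, with embedded tabs, newlines and backslashes escaped:]

import java.io.StringReader;
import java.sql.Connection;
import java.sql.DriverManager;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class CopyLoad {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/testdb", "postgres", "secret")) {
            CopyManager cm = con.unwrap(PGConnection.class).getCopyAPI();
            // Build rows in COPY text format (trivial values, so no escaping needed here).
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 1_000_000; i++) {
                sb.append(i).append('\t').append("row").append(i).append('\n');
            }
            long rows = cm.copyIn("COPY test (a, b) FROM STDIN",
                    new StringReader(sb.toString()));
            System.out.println("Inserted " + rows + " rows");
        }
    }
}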
On Fri, Jun 09, 2017 at 02:39:37PM +0000, Frits Jalvingh wrote:
> So, I am still very interested in getting normal inserts faster, because
> that will gain speed for all work.. If Oracle can do it, and Postgres is
> able to insert fast with copy, where lies the bottleneck with the insert
> command?
Hi Frits,
Have you looked at UNLOGGED tables? Also consider having more than one insert stream running at a time; sometimes multiple parallel inserts can be faster.
Regards,
Ken
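[A sketch of the unlogged-table idea, with hypothetical names. Note that an unlogged table is truncated after a crash, and that ALTER TABLE ... SET LOGGED (PostgreSQL 9.5+) rewrites the table through the WAL:]

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class UnloggedLoad {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/testdb", "postgres", "secret");
             Statement st = con.createStatement()) {
            // UNLOGGED skips WAL for the table data; it is emptied after a crash,
            // so only use it for data you can reload.
            st.execute("CREATE UNLOGGED TABLE staging (a integer, b text)");
            // ... bulk-load into staging here (COPY or batched inserts) ...
            // Optionally make it crash-safe afterwards; this rewrites the table.
            st.execute("ALTER TABLE staging SET LOGGED");
        }
    }
}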
Frits,
When you use the copy command, are you doing anything special to get the run time that you are indicating?
You need to be careful with setFetchSize: we have tables with over 10 million rows and many columns, and the PostgreSQL JDBC driver silently fails, ignores the fetch size, and tries to read the entire table content into memory. I spent many agonizing days on this.
ps.setFetchSize(65536);
Regards
John
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Frits Jalvingh
Sent: Friday, June 09, 2017 7:55 AM
To: Sunkara, Amrutha; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Improving PostgreSQL insert performance
I am not doing anything special I guess. I am adding the results of the tests and the programs I'm using to the following page:
https://etc.to/confluence/display/~admjal/PostgreSQL+performance+tests
The copy example, in Java, is at the end. All of the examples use trivial data and the same data. If you find fault please let me know ;) But the copy does insert the records as they can be seen ;)
On Fri, Jun 9, 2017 at 4:47 PM Sunkara, Amrutha <amrutha@nytimes.com> wrote:
Frits,
When you use the copy command, are you doing anything special to get the run time that you are indicating?
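[For reference on John's caveat: pgjdbc only honors the fetch size when autocommit is off and the result set is forward-only; otherwise it materializes the whole result. A sketch with placeholder table and connection details:]

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class CursorRead {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/testdb", "postgres", "secret")) {
            con.setAutoCommit(false);        // required, or the fetch size is ignored
            try (PreparedStatement ps = con.prepareStatement("SELECT a, b FROM test")) {
                ps.setFetchSize(65536);      // stream in 64K-row chunks
                try (ResultSet rs = ps.executeQuery()) {
                    long n = 0;
                    while (rs.next()) {
                        n++;                 // process the row here
                    }
                    System.out.println(n + " rows read");
                }
            }
        }
    }
}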
On Fri, Jun 9, 2017 at 9:12 AM, Frits Jalvingh <jal@etc.to> wrote:
> Hi John,
>
> Yes, I was aware and amazed by that ;) It is actually the fetch size in
> combination with autocommit being on; that dies the sweet OOM death as soon
> as the table gets big.
>
> But Postgres read performance, with autocommit off and fetch size around
> 64K, is quite OK. But it's good to get this mentioned a lot, because as you
> said you can spend quite some time wondering about this!

No production db server should have the OOM killer enabled.
On Fri, Jun 09, 2017 at 03:22:35PM +0000, Frits Jalvingh wrote:
> Hi Babu,
>
> That was all already done, as it is common practice for JDBC. Your
> parameter was added to the code that already did all that - and worked
> brilliantly there ;)
>
Hi Frits,
What was the parameter? I did not see an Email in the thread from Babu.
Regards,
Ken
> I already changed the following config parameters:
> work_mem 512MB
> synchronous_commit off
> shared_buffers 512mb
Would you mind sharing the source code of your benchmark?
> I tried the copy command, and that indeed works quite brilliantly:
> Inserted 24000000 rows in 22004 milliseconds, 1090710.7798582076 rows per
> second
>
> That's faster than Oracle. But with a very bad interface I have to say for
> normal database work.. I will try to make this work in the tooling, but it
> needs some very special code to format all possible values properly, and to
> manage the end of the copy, so it is not usable in general which is a pity, I
> think.

Have you thought about COPY with binary format? That looks more robust than the text format you used in your benchmarks.
The binary format option causes all data to be stored/read as binary format rather than as text. It is somewhat faster than the text and CSV formats, but a binary-format file is less portable across machine architectures and PostgreSQL versions. Also, the binary format is very data type specific; for example it will not work to output binary data from a smallint column and read it into an integer column, even though that would work fine in text format.
By itself it is similarly bad, as both require completely different semantics than insert..
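[A rough sketch of what binary-format COPY involves from Java, following the documented COPY binary file layout: an 11-byte signature, 32-bit flags and header-extension length, then per tuple a 16-bit field count and a 32-bit length before each field (-1 meaning NULL), and a 16-bit -1 trailer. The table test(a integer, b text) and connection details are placeholders:]

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.DriverManager;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class BinaryCopyLoad {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        // File header: signature, flags, header-extension length.
        out.write("PGCOPY\n\377\r\n\0".getBytes(StandardCharsets.ISO_8859_1));
        out.writeInt(0);
        out.writeInt(0);
        for (int i = 0; i < 1000; i++) {
            out.writeShort(2);            // number of fields in this tuple
            out.writeInt(4);              // int4: length in bytes
            out.writeInt(i);              // value, big-endian as DataOutputStream writes
            byte[] txt = ("row" + i).getBytes(StandardCharsets.UTF_8);
            out.writeInt(txt.length);     // text: length, then raw bytes
            out.write(txt);
        }
        out.writeShort(-1);               // file trailer
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/testdb", "postgres", "secret")) {
            CopyManager cm = con.unwrap(PGConnection.class).getCopyAPI();
            long rows = cm.copyIn("COPY test (a, b) FROM STDIN WITH (FORMAT binary)",
                    new ByteArrayInputStream(buf.toByteArray()));
            System.out.println(rows + " rows loaded");
        }
    }
}

As the docs quoted above note, the binary values are type- and architecture-specific, so this buys speed at the cost of portability.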
Frits Jalvingh wrote:
> So, I am still very interested in getting normal inserts faster, because
> that will gain speed for all work.. If Oracle can do it, and Postgres is
> able to insert fast with copy- where lies the bottleneck with the insert
> command? There seems to be quite a performance hit with the JDBC driver
> itself (as the stored procedure is a lot faster), so I can look into that.
> But even after that there is quite a gap..

Did you try inserting multiple tuples in one command? Something like

INSERT INTO .. VALUES ('col1', 'col2'), ('col1', 'col2'), ('col1', 'col2')

It's supposed to be faster than single-row inserts, though I don't
know by how much.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 06/10/2017 07:32 PM, Alvaro Herrera wrote:
> Did you try inserting multiple tuples in one command? Something like
> INSERT INTO .. VALUES ('col1', 'col2'), ('col1', 'col2'), ('col1', 'col2')
> It's supposed to be faster than single-row inserts, though I don't
> know by how much.

When I did the testing of the patch originally I saw significant improvements, e.g. 8x in early versions. The thread is here:

https://www.postgresql.org/message-id/flat/44C4451A.4010906%40joeconway.com#44C4451A.4010906@joeconway.com

Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
Frits>https://etc.to/confluence/display/~admjal/PostgreSQL+performance+tests
Do you really intend to measure just a single insert operation?
Vladimir Sitnikov wrote:
> Alvaro> Something like
> Alvaro> INSERT INTO .. VALUES ('col1', 'col2'), ('col1', 'col2'), ('col1', 'col2')
>
> Frits> I did not try that, to be honest.
>
> pgjdbc does automatically rewrite insert values(); into insert ...
> values(),(),(),() when reWriteBatchedInserts=true. I don't expect manual
> multivalues to be noticeably faster there.

Ahh, so that's what that option does :-) Nice to know -- great feature.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
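[A sketch of turning that option on from JDBC; the parameter goes on the connection URL, and the driver rewrites the batch into multi-values statements transparently. Table and connection details are placeholders:]

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class RewriteBatchedDemo {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:postgresql://localhost/testdb?reWriteBatchedInserts=true";
        try (Connection con = DriverManager.getConnection(url, "postgres", "secret");
             PreparedStatement ps =
                 con.prepareStatement("INSERT INTO test (a, b) VALUES (?, ?)")) {
            for (int i = 0; i < 100_000; i++) {
                ps.setInt(1, i);
                ps.setString(2, "row" + i);
                ps.addBatch();
            }
            ps.executeBatch();   // sent as INSERT ... VALUES (?,?),(?,?),... on the wire
        }
    }
}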