Discussion: [PERFORM] Improving PostgreSQL insert performance
On 09.06.2017 at 15:04, Frits Jalvingh wrote:
> Hi all,
>
> I am trying to improve the runtime of a big data warehouse
> application. One significant bottleneck found was insert performance,
> so I am investigating ways of getting Postgresql to insert data faster.

* use COPY instead of INSERT, it is much faster
* bundle all INSERTs into one transaction
* use a separate disk/spindle for the transaction log

> I already changed the following config parameters:
> work_mem 512MB
> synchronous_commit off
> shared_buffers 512mb
> commit_delay 100000
> autovacuum_naptime 10min
>
> Postgres version is 9.6.3 on Ubuntu 17.04 64 bit, on a i7-4790K with
> 16GB memory and an Intel 750 SSD. JDBC driver is postgresql-42.1.1.

increase shared_buffers; with 16GB RAM I would suggest 8GB

Regards, Andreas
--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com
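[A minimal JDBC sketch of the single-transaction tip above; the table test(a integer, b text) and the connection details are placeholders, not from the thread:]

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchInsert {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/testdb", "postgres", "secret")) {
            con.setAutoCommit(false);          // bundle everything into one transaction
            try (PreparedStatement ps =
                     con.prepareStatement("INSERT INTO test (a, b) VALUES (?, ?)")) {
                for (int i = 0; i < 24_000_000; i++) {
                    ps.setInt(1, i);
                    ps.setString(2, "row" + i);
                    ps.addBatch();
                    if (i % 10_000 == 0) {
                        ps.executeBatch();     // flush periodically to bound driver memory
                    }
                }
                ps.executeBatch();
            }
            con.commit();                      // one commit at the end
        }
    }
}

Flushing the batch periodically keeps memory bounded while still paying for only a single commit.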
On Fri, Jun 09, 2017 at 03:24:15PM +0200, Andreas Kretschmer wrote:
>
>
> On 09.06.2017 at 15:04, Frits Jalvingh wrote:
> >Hi all,
> >
> >I am trying to improve the runtime of a big data warehouse
> >application. One significant bottleneck found was insert
> >performance, so I am investigating ways of getting Postgresql to
> >insert data faster.
>
> * use COPY instead of INSERT, it is much faster
> * bundle all INSERTs into one transaction
> * use a separate disk/spindle for the transaction log
>
>
>
> >
> >I already changed the following config parameters:
> >work_mem 512MB
> >synchronous_commit off
> >shared_buffers 512mb
> >commit_delay 100000
> >autovacuum_naptime 10min
> >
> >Postgres version is 9.6.3 on Ubuntu 17.04 64 bit, on a i7-4790K
> >with 16GB memory and an Intel 750 SSD. JDBC driver is
> >postgresql-42.1.1.
> >
>
> increase shared_buffers; with 16GB RAM I would suggest 8GB
+1 Without even checking, I think Oracle is configured to use a LOT
more memory than 512mb.
Regards,
Ken
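[If you would rather apply that setting from a client session than edit postgresql.conf, a sketch using ALTER SYSTEM; it requires superuser, and shared_buffers only takes effect after a server restart. Connection details are placeholders:]

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class TuneSharedBuffers {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/postgres", "postgres", "secret");
             Statement st = con.createStatement()) {
            // Written to postgresql.auto.conf (9.4+); restart the server afterwards.
            st.execute("ALTER SYSTEM SET shared_buffers = '8GB'");
        }
    }
}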
On Fri, Jun 09, 2017 at 01:56:58PM +0000, Frits Jalvingh wrote:
> I will try copy, but I first need to investigate how to use it- its
> interface seems odd to say the least ;) I'll report back on that once done.
>
> Any other tips would be welcome!
>
> Regards,
>
> Frits
Hi Frits,
Here is an article that is still valid:
https://www.depesz.com/2007/07/05/how-to-insert-data-to-database-as-fast-as-possible/
Regards,
Ken
On Fri, Jun 9, 2017 at 7:56 AM, Frits Jalvingh <jal@etc.to> wrote:
> Hi Kenneth, Andreas,
>
> Thanks for your tips!
>
> I increased shared_buffers to 8GB but it has no measurable effect at all. I
> think that is logical: shared buffers are important for querying but not for
> inserting; for that the speed to write to disk seems most important- no big
> reason to cache the data if the commit requires a full write anyway.
> I also changed the code to do only one commit; this also has no effect I can
> see.
>
> It is true that Oracle had more memory assigned to it (1.5G), but unlike
> Postgres (which is completely on a fast SSD) Oracle runs on slower disk
> (ZFS)..
>
> I will try copy, but I first need to investigate how to use it- its
> interface seems odd to say the least ;) I'll report back on that once done.
If you want an example of copy, just pg_dump a table:
pg_dump -d smarlowe -t test
(SNIP)
COPY test (a, b) FROM stdin;
1 abc
2 xyz
\.
(SNIP)
Hi all,

Thanks a lot for the many responses!

About preparing statements: this is done properly in Java, and pgsql does it by itself. So that cannot be done better ;)

I tried the copy command, and that indeed works quite brilliantly:
Inserted 24000000 rows in 22004 milliseconds, 1090710.7798582076 rows per second

That's faster than Oracle. But with a very bad interface, I have to say, for normal database work.. I will try to make this work in the tooling, but it needs some very special code to format all possible values properly, and to manage the end of the copy, so it is not usable in general, which is a pity, I think.

So, I am still very interested in getting normal inserts faster, because that will gain speed for all work.. If Oracle can do it, and Postgres is able to insert fast with copy, where lies the bottleneck with the insert command? There seems to be quite a performance hit with the JDBC driver itself (as the stored procedure is a lot faster), so I can look into that. But even after that there is quite a gap..

Regards,
Frits
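[A minimal sketch of driving COPY from the Java tooling with pgjdbc's CopyManager; the table test(a integer, b text) and connection details are placeholders. Values must follow the COPY text format: tab-separated columns, newline-terminated rows, \N for NULL, with embedded tabs, newlines and backslashes escaped:]

import java.io.StringReader;
import java.sql.Connection;
import java.sql.DriverManager;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class CopyLoad {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/testdb", "postgres", "secret")) {
            CopyManager cm = con.unwrap(PGConnection.class).getCopyAPI();
            // Build rows in COPY text format (trivial values, so no escaping needed here).
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 1_000_000; i++) {
                sb.append(i).append('\t').append("row").append(i).append('\n');
            }
            long rows = cm.copyIn("COPY test (a, b) FROM STDIN",
                    new StringReader(sb.toString()));
            System.out.println("Inserted " + rows + " rows");
        }
    }
}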
On Fri, Jun 09, 2017 at 02:39:37PM +0000, Frits Jalvingh wrote:
> So, I am still very interested in getting normal inserts faster, because
> that will gain speed for all work.. If Oracle can do it, and Postgres is
> able to insert fast with copy, where lies the bottleneck with the insert
> command?
Hi Frits,
Have you looked at UNLOGGED tables? Also consider having more than one insert stream running at a time; sometimes multiple parallel inserts can be faster.
Regards,
Ken
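[A sketch of the unlogged-table idea, with hypothetical names. Note that an unlogged table is truncated after a crash, and that ALTER TABLE ... SET LOGGED (PostgreSQL 9.5+) rewrites the table through the WAL:]

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class UnloggedLoad {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/testdb", "postgres", "secret");
             Statement st = con.createStatement()) {
            // UNLOGGED skips WAL for the table data; it is emptied after a crash,
            // so only use it for data you can reload.
            st.execute("CREATE UNLOGGED TABLE staging (a integer, b text)");
            // ... bulk-load into staging here (COPY or batched inserts) ...
            // Optionally make it crash-safe afterwards; this rewrites the table.
            st.execute("ALTER TABLE staging SET LOGGED");
        }
    }
}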
Frits,
When you use the copy command, are you doing anything special to get the run time that you are indicating?
You need to be careful with setFetchSize: we have tables with over 10 million rows and many columns, and the PostgreSQL JDBC driver silently fails, ignores the fetch size, and tries to read the entire table content into memory. I spent many agonizing days on this.
ps.setFetchSize(65536);
Regards
John
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Frits Jalvingh
Sent: Friday, June 09, 2017 7:55 AM
To: Sunkara, Amrutha; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Improving PostgreSQL insert performance
I am not doing anything special I guess. I am adding the results of the tests and the programs I'm using to the following page:
https://etc.to/confluence/display/~admjal/PostgreSQL+performance+tests
The copy example, in Java, is at the end. All of the examples use trivial data and the same data. If you find fault please let me know ;) But the copy does insert the records as they can be seen ;)
On Fri, Jun 9, 2017 at 4:47 PM Sunkara, Amrutha <amrutha@nytimes.com> wrote:
Frits,
When you use the copy command, are you doing anything special to get the run time that you are indicating?
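[For reference on John's caveat: pgjdbc only honors the fetch size when autocommit is off and the result set is forward-only; otherwise it materializes the whole result. A sketch with placeholder table and connection details:]

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class CursorRead {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/testdb", "postgres", "secret")) {
            con.setAutoCommit(false);        // required, or the fetch size is ignored
            try (PreparedStatement ps = con.prepareStatement("SELECT a, b FROM test")) {
                ps.setFetchSize(65536);      // stream in 64K-row chunks
                try (ResultSet rs = ps.executeQuery()) {
                    long n = 0;
                    while (rs.next()) {
                        n++;                 // process the row here
                    }
                    System.out.println(n + " rows read");
                }
            }
        }
    }
}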
On Fri, Jun 9, 2017 at 9:12 AM, Frits Jalvingh <jal@etc.to> wrote:
> Hi John,
>
> Yes, I was aware and amazed by that ;) It is actually the fetch size in
> combination with autocommit being on; that dies the sweet OOM death as soon
> as the table gets big.
>
> But Postgres read performance, with autocommit off and fetch size around
> 64K, is quite OK. But it's good to get this mentioned a lot, because as you
> said you can spend quite some time wondering about this!

No production db server should have the OOM killer enabled.
On Fri, Jun 09, 2017 at 03:22:35PM +0000, Frits Jalvingh wrote:
> Hi Babu,
>
> That was all already done, as it is common practice for JDBC. Your
> parameter was added to the code that already did all that - and worked
> brilliantly there ;)
>
Hi Frits,
What was the parameter? I did not see an Email in the thread from Babu.
Regards,
Ken
> I already changed the following config parameters:
> work_mem 512MB
> synchronous_commit off
> shared_buffers 512mb
Would you mind sharing the source code of your benchmark?
> I tried the copy command, and that indeed works quite brilliantly:
> Inserted 24000000 rows in 22004 milliseconds, 1090710.7798582076 rows per
> second
>
> That's faster than Oracle. But with a very bad interface I have to say for
> normal database work.. I will try to make this work in the tooling, but it
> needs some very special code to format all possible values properly, and to
> manage the end of the copy, so it is not usable in general which is a pity, I
> think.

Have you thought about COPY with binary format? That looks more robust than the text format you used in your benchmarks.
The binary format option causes all data to be stored/read as binary format rather than as text. It is somewhat faster than the text and CSV formats, but a binary-format file is less portable across machine architectures and PostgreSQL versions. Also, the binary format is very data type specific; for example it will not work to output binary data from a smallint column and read it into an integer column, even though that would work fine in text format.
By itself it is similarly bad, as both require completely different semantics than insert..
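[A rough sketch of what binary-format COPY involves from Java, following the documented COPY binary file layout: an 11-byte signature, 32-bit flags and header-extension length, then per tuple a 16-bit field count and a 32-bit length before each field (-1 meaning NULL), and a 16-bit -1 trailer. The table test(a integer, b text) and connection details are placeholders:]

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.DriverManager;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class BinaryCopyLoad {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        // File header: signature, flags, header-extension length.
        out.write("PGCOPY\n\377\r\n\0".getBytes(StandardCharsets.ISO_8859_1));
        out.writeInt(0);
        out.writeInt(0);
        for (int i = 0; i < 1000; i++) {
            out.writeShort(2);            // number of fields in this tuple
            out.writeInt(4);              // int4: length in bytes
            out.writeInt(i);              // value, big-endian as DataOutputStream writes
            byte[] txt = ("row" + i).getBytes(StandardCharsets.UTF_8);
            out.writeInt(txt.length);     // text: length, then raw bytes
            out.write(txt);
        }
        out.writeShort(-1);               // file trailer
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/testdb", "postgres", "secret")) {
            CopyManager cm = con.unwrap(PGConnection.class).getCopyAPI();
            long rows = cm.copyIn("COPY test (a, b) FROM STDIN WITH (FORMAT binary)",
                    new ByteArrayInputStream(buf.toByteArray()));
            System.out.println(rows + " rows loaded");
        }
    }
}

As the docs quoted above note, the binary values are type- and architecture-specific, so this buys speed at the cost of portability.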
Frits Jalvingh wrote:
> So, I am still very interested in getting normal inserts faster, because
> that will gain speed for all work.. If Oracle can do it, and Postgres is
> able to insert fast with copy- where lies the bottleneck with the insert
> command? There seems to be quite a performance hit with the JDBC driver
> itself (as the stored procedure is a lot faster), so I can look into that.
> But even after that there is quite a gap..

Did you try inserting multiple tuples in one command? Something like

INSERT INTO .. VALUES ('col1', 'col2'), ('col1', 'col2'), ('col1', 'col2')

It's supposed to be faster than single-row inserts, though I don't
know by how much.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 06/10/2017 07:32 PM, Alvaro Herrera wrote:
> Did you try inserting multiple tuples in one command? Something like
> INSERT INTO .. VALUES ('col1', 'col2'), ('col1', 'col2'), ('col1', 'col2')
> It's supposed to be faster than single-row inserts, though I don't
> know by how much.

When I did the testing of the patch originally I saw significant improvements, e.g. 8x in early versions. The thread is here:

https://www.postgresql.org/message-id/flat/44C4451A.4010906%40joeconway.com#44C4451A.4010906@joeconway.com

Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
Frits>https://etc.to/confluence/display/~admjal/PostgreSQL+performance+tests
Do you really intend to measure just a single insert operation?
Vladimir Sitnikov wrote:
> Alvaro> Something like
> Alvaro> INSERT INTO .. VALUES ('col1', 'col2'), ('col1', 'col2'), ('col1', 'col2')
>
> Frits> I did not try that, to be honest.
>
> pgjdbc does automatically rewrite insert values(); into insert ...
> values(),(),(),() when reWriteBatchedInserts=true. I don't expect manual
> multivalues to be noticeably faster there.

Ahh, so that's what that option does :-) Nice to know -- great feature.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
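[A sketch of turning that option on from JDBC; the parameter goes on the connection URL, and the driver rewrites the batch into multi-values statements transparently. Table and connection details are placeholders:]

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class RewriteBatchedDemo {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:postgresql://localhost/testdb?reWriteBatchedInserts=true";
        try (Connection con = DriverManager.getConnection(url, "postgres", "secret");
             PreparedStatement ps =
                 con.prepareStatement("INSERT INTO test (a, b) VALUES (?, ?)")) {
            for (int i = 0; i < 100_000; i++) {
                ps.setInt(1, i);
                ps.setString(2, "row" + i);
                ps.addBatch();
            }
            ps.executeBatch();   // sent as INSERT ... VALUES (?,?),(?,?),... on the wire
        }
    }
}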