Thread: degrading insert performance
Dear list

I am experiencing a rather severe degradation of insert performance
starting from an empty database:

120.000 mio SNPs imported in 28.9 sec - 4.16 mio/sec
120.000 mio SNPs imported in 40.9 sec - 2.93 mio/sec
120.000 mio SNPs imported in 49.7 sec - 2.41 mio/sec
120.000 mio SNPs imported in 58.8 sec - 2.04 mio/sec
120.000 mio SNPs imported in 68.9 sec - 1.74 mio/sec
120.000 mio SNPs imported in 77.0 sec - 1.56 mio/sec
120.000 mio SNPs imported in 85.1 sec - 1.41 mio/sec
120.000 mio SNPs imported in 94.0 sec - 1.28 mio/sec
120.000 mio SNPs imported in 103.4 sec - 1.16 mio/sec
120.000 mio SNPs imported in 108.9 sec - 1.10 mio/sec
120.000 mio SNPs imported in 117.2 sec - 1.02 mio/sec
120.000 mio SNPs imported in 122.1 sec - 0.98 mio/sec
120.000 mio SNPs imported in 132.6 sec - 0.90 mio/sec
120.000 mio SNPs imported in 142.0 sec - 0.85 mio/sec
120.000 mio SNPs imported in 147.3 sec - 0.81 mio/sec
120.000 mio SNPs imported in 154.4 sec - 0.78 mio/sec
120.000 mio SNPs imported in 163.9 sec - 0.73 mio/sec
120.000 mio SNPs imported in 170.1 sec - 0.71 mio/sec
120.000 mio SNPs imported in 179.1 sec - 0.67 mio/sec
120.000 mio SNPs imported in 186.1 sec - 0.64 mio/sec

Each line represents the insertion of 20000 records into two tables, which
is not really a whole lot. These 20000 get inserted in one program run; the
following lines are then each a further execution of that program. The
inserts are a text string in one table and a bit varying packing 24000
bits in the other, also no big deal.

As can be seen, the degradation is severe, going from 29 sec up to 186 sec
for the same amount of data inserted.

I have dropped the indices and primary keys, but that did not change the
picture. Making commits every 100 records also had no effect. I have also
played around with postgresql.conf, but this too had no real effect (which
is actually not surprising considering the small size of the database).

At this stage the whole database has a size of around 1GB.

I am using pg 9.4.

Any idea of what might be going on?

cheers

Eildert
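(A quick sanity check of the figures above; a throwaway sketch, assuming 120 mio SNPs per run as stated.)

```python
# Reproduce the reported rates: mio SNPs divided by elapsed seconds.
timings = [28.9, 40.9, 49.7, 58.8, 186.1]        # a few of the runs above
rates = [round(120.0 / t, 2) for t in timings]   # mio SNPs per second
print(rates)   # close to the reported 4.16, 2.93, 2.41, 2.04, 0.64
print(round(timings[-1] / timings[0], 1))        # last run is ~6.4x slower
```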
On 17.9.2015 13:32, Eildert Groeneveld wrote:
> Dear list
>
> I am experiencing a rather severe degradation of insert performance
> starting from an empty database:
>
> [timing table snipped]
>
> each line represents the insertion of 20000 records in two tables which is
> not really a whole lot. Also, these 20000 get inserted in one program run.
> The following lines are then again each the execution of that program.
> The inserts are a text string in one table and a bit varying of length
> packed 24000 bits, also no big deal.
>
> As can be seen the degradation is severe going from 29 sec up to 186 sec
> for the same amount of data inserted.
>
> I have dropped the indices and primary keys, but that did not change the
> picture. Made commits every 100 records: also no effect.
> I have also played around with postgresql.conf but also this had no real
> effect (which is actually not surprising considering the small size of the
> database).
>
> At this stage the whole database has a size of around 1GB.
>
> I am using pg 9.4
>
> any idea of what might be going on?

Hello.

Just a couple of questions...

You talk about two tables; have you also dropped FKs (you only mention
indices and PKs)?

What SQL do you use for inserting the data:
* one INSERT per row with autocommit
* one INSERT per row inside BEGIN...COMMIT
* one INSERT per bulk (20 000 rows)
* one COPY per bulk (20 000 rows)
?

Is the loading of data the only activity on the server?

See also:
http://www.postgresql.org/docs/9.4/static/populate.html

HTH,

Ladislav Lenart
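(The four options listed can be sketched as the SQL/data each one sends to the server; the table and column names below are invented for illustration, no live database involved.)

```python
# Rows to load: a name plus a packed bit string, as in the thread.
rows = [("s1", "0101"), ("s2", "1100")]

# 1) one INSERT per row, autocommit: one statement + one implicit commit per row
per_row = ["INSERT INTO snp (name, bits) VALUES ('%s', B'%s');" % r for r in rows]

# 2) the same INSERTs inside one explicit transaction: a single commit
in_tx = ["BEGIN;"] + per_row + ["COMMIT;"]

# 3) one multi-row INSERT per bulk: a single statement for the whole batch
bulk = ("INSERT INTO snp (name, bits) VALUES "
        + ", ".join("('%s', B'%s')" % r for r in rows) + ";")

# 4) one COPY per bulk: the rows travel as a single tab-separated stream
copy_data = "\n".join("%s\t%s" % r for r in rows)

print(len(per_row), len(in_tx), bulk)
```

Each step down this list removes per-row overhead (commits, round trips, statement parsing), which is why the options are listed in that order.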
On Do, 2015-09-17 at 14:11 +0200, Ladislav Lenart wrote:
> On 17.9.2015 13:32, Eildert Groeneveld wrote:
> > Dear list
> >
> > I am experiencing a rather severe degradation of insert performance
> > starting from an empty database:
> >
> > [timing table and description of the load snipped]
>
> Hello.
>
> Just a couple of questions...
>
> You talk about two tables; have you also dropped FKs (you only mention
> indices and PK)?

yes, they were all gone

> What SQL do you use for inserting the data:

I go through ecpg

> * one INSERT per row with autocommit

yes

> * one INSERT per row inside BEGIN...COMMIT

also this, same result as above

> * one INSERT per bulk (20 000 rows)
> * one COPY per bulk (20 000 rows)

COPY does not fit so well, as it is not only initial populating.

> Is the loading of data the only activity on the server?

yes, it is. I have this "feature" on every machine.

> See also:
> http://www.postgresql.org/docs/9.4/static/populate.html

Thanks, yes, I have been through this. Millions of records seem to be the
staple diet of PG; here the degradation starts already with the second
20000-record batch.

greetings

Eildert

--
Eildert Groeneveld
===================================================
Institute of Farm Animal Genetics (FLI)
Mariensee 31535 Neustadt Germany
Tel : (+49)(0)5034 871155 Fax : (+49)(0)5034 871143
e-mail: eildert.groeneveld@fli.bund.de
web: http://vce.tzv.fal.de
==================================================
On Thu, Sep 17, 2015 at 9:19 AM, Eildert Groeneveld <eildert.groeneveld@fli.bund.de> wrote:
>> * one COPY per bulk (20 000 rows)
>
> copy does not fit so well, as it is not only initial populating.
Why do you say COPY doesn't fit? It seems to me that COPY fits perfectly for your case, and would certainly make the load faster.
I suspect (not sure though) that the degradation is mostly because you are inserting one row at a time: PostgreSQL needs to check the FSM (Free Space Map) for each tuple inserted, and as the table gets more populated this check becomes slower. If that is really the case, COPY would certainly improve things, as would a single INSERT with many rows at once.
Regards,
--
Matheus de Oliveira
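(If per-row overhead really is the culprit, batching is the usual fix. A minimal sketch of the batching side only, assuming a driver call such as psycopg2's `execute_values` for the actual send:)

```python
from itertools import islice

def batches(rows, size):
    """Yield lists of at most `size` rows; each list can then be sent as one
    multi-row INSERT (or one COPY) instead of `size` single-row INSERTs."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk

# With psycopg2 this would be something like:
#   execute_values(cur, "INSERT INTO snp (name, bits) VALUES %s", chunk)
# Here we only check the chunking itself.
print([len(c) for c in batches(range(25), 10)])   # [10, 10, 5]
```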
Thanks for your input!
On Do, 2015-09-17 at 11:21 -0300, Matheus de Oliveira wrote:
> On Thu, Sep 17, 2015 at 9:19 AM, Eildert Groeneveld
> <eildert.groeneveld@fli.bund.de> wrote:
> >> * one COPY per bulk (20 000 rows)
> >
> > copy does not fit so well, as it is not only initial populating.
>
> Why do you say COPY doesn't fit? It seems to me that COPY fits perfectly
> for your case, and would certainly make the load faster.
well, more than one table needs to get populated and data is not really available in one file.
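(For what it's worth, "more than one table, no single file" does not by itself rule COPY out: rows can be staged per table in memory and each buffer sent with its own COPY. A sketch with invented table names, assuming psycopg2's `copy_from` as the send path:)

```python
import io

# Each input record feeds two tables; build one in-memory buffer per table.
records = [("sample1", "0101"), ("sample2", "1100")]

animal_buf, genotype_buf = io.StringIO(), io.StringIO()
for i, (name, bits) in enumerate(records, start=1):
    animal_buf.write("%d\t%s\n" % (i, name))    # rows for: COPY animal FROM STDIN
    genotype_buf.write("%d\t%s\n" % (i, bits))  # rows for: COPY genotype FROM STDIN

# With psycopg2 this would be cur.copy_from(animal_buf, 'animal'), etc.
animal_buf.seek(0)
print(animal_buf.read().splitlines())           # ['1\tsample1', '2\tsample2']
```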
> I suspect (not sure though) that the degradation is mostly because you
> are inserting one row at a time: it needs to verify the FSM (Free Space
> Map) for each tuple inserted, and when the table starts to get more
> populated this verification becomes slower. If that is really the case,
> COPY would certainly improve that, or even INSERT with many rows at once.
All right, that sounds reasonable.
But what is your experience: is it possible that inserting the first 20000
records takes 29 seconds, while inserting lot 20 (i.e. 19*20000 records
later) takes 186 sec? After all, we are talking about only 400000 records
in total, yet it takes more than 6 times longer!

Odd. Anyone have an idea?
greetings
Eildert
--
Eildert Groeneveld
===================================================
Institute of Farm Animal Genetics (FLI)
Mariensee 31535 Neustadt Germany
Tel : (+49)(0)5034 871155 Fax : (+49)(0)5034 871143
e-mail: eildert.groeneveld@fli.bund.de
web: http://vce.tzv.fal.de
==================================================
Nobody has asked what kind of machine this is!
Hard disks, memory, etc.?
What are your relevant settings in postgresql.conf? Shared buffers, checkpoints, etc.
Also, how big are the inserts? What else is this machine doing? Is it bare hardware, or a VM?
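(For reference, the postgresql.conf knobs usually examined for bulk loads on 9.4 look like this; the values are illustrative only and must be sized to the actual machine, which the thread has not yet established:)

```
shared_buffers = 1GB               # typically around 25% of RAM
checkpoint_segments = 32           # fewer, larger checkpoints during the load (9.4 setting)
checkpoint_completion_target = 0.9 # spread checkpoint writes out
maintenance_work_mem = 512MB       # speeds up index builds after the load
```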
On 17 September 2015 at 10:41, Eildert Groeneveld
<eildert.groeneveld@fli.bund.de> wrote:
> [quoted exchange with Matheus de Oliveira snipped]
>
> well, more than one table needs to get populated and data is not really
> available in one file.
>
> [...]
>
> But what is your experience: is it possible that inserting the first
> 20000 records takes 29 seconds while inserting lot 20 takes 186 sec?
>
> odd, anyone has an idea?
>
> greetings
>
> Eildert