Discussion: degrading insert performance


degrading insert performance

From: Eildert Groeneveld
Date:
Dear list

I am experiencing a rather severe degradation of insert performance
starting from an empty database:


       120.000 mio SNPs imported in    28.9 sec -    4.16 mio/sec
       120.000 mio SNPs imported in    40.9 sec -    2.93 mio/sec
       120.000 mio SNPs imported in    49.7 sec -    2.41 mio/sec
       120.000 mio SNPs imported in    58.8 sec -    2.04 mio/sec
       120.000 mio SNPs imported in    68.9 sec -    1.74 mio/sec
       120.000 mio SNPs imported in    77.0 sec -    1.56 mio/sec
       120.000 mio SNPs imported in    85.1 sec -    1.41 mio/sec
       120.000 mio SNPs imported in    94.0 sec -    1.28 mio/sec
       120.000 mio SNPs imported in   103.4 sec -    1.16 mio/sec
       120.000 mio SNPs imported in   108.9 sec -    1.10 mio/sec
       120.000 mio SNPs imported in   117.2 sec -    1.02 mio/sec
       120.000 mio SNPs imported in   122.1 sec -    0.98 mio/sec
       120.000 mio SNPs imported in   132.6 sec -    0.90 mio/sec
       120.000 mio SNPs imported in   142.0 sec -    0.85 mio/sec
       120.000 mio SNPs imported in   147.3 sec -    0.81 mio/sec
       120.000 mio SNPs imported in   154.4 sec -    0.78 mio/sec
       120.000 mio SNPs imported in   163.9 sec -    0.73 mio/sec
       120.000 mio SNPs imported in   170.1 sec -    0.71 mio/sec
       120.000 mio SNPs imported in   179.1 sec -    0.67 mio/sec
       120.000 mio SNPs imported in   186.1 sec -    0.64 mio/sec

Each line represents the insertion of 20000 records into two tables, which is
not really a whole lot. Each batch of 20000 is inserted in one program run;
the following lines are each a further run of that same program.
The inserts are a text string into one table and a bit varying holding 24000
packed bits into the other, also no big deal.
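
A minimal sketch of what such a layout might look like (table and column
names here are made up, not the real schema):

    -- hypothetical layout; real names and columns differ
    CREATE TABLE animal (
        name text                   -- the text string per record
    );

    CREATE TABLE genotype (
        snps bit varying(24000)     -- ~24000 packed SNP bits per record
    );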

As can be seen, the degradation is severe: the time goes from 29 sec up to
186 sec for the same amount of data inserted.

I have dropped the indices and primary keys, but that did not change the
picture. Committing every 100 records also had no effect.
I have also played around with postgresql.conf, but this too had no real
effect (which is actually not surprising, considering the small size of the
database).

At this stage the whole database has a size of around 1GB.

I am using pg 9.4.

Any idea what might be going on?

cheers

Eildert





Re: degrading insert performance

From: Ladislav Lenart
Date:
On 17.9.2015 13:32, Eildert Groeneveld wrote:
> I am experiencing a rather severe degradation of insert performance
> starting from an empty database:
>
> [timing table snipped; 20 batches of 20000 records each, slowing from
> 28.9 sec (4.16 mio/sec) to 186.1 sec (0.64 mio/sec)]
>
> any idea of what might be going on?


Hello.

Just a couple of questions...

You talk about two tables; have you also dropped FKs (you only mention indices
and PK)?

What SQL do you use for inserting the data (see the sketches after this
list)?
 * one INSERT per row with autocommit
 * one INSERT per row inside BEGIN...COMMIT
 * one INSERT per bulk (20 000 rows)
 * one COPY per bulk (20 000 rows)
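
For clarity, rough sketches of the four variants (table and column names
are placeholders, not your real schema):

    -- 1. One INSERT per row with autocommit: every statement is its own
    --    transaction, so each row pays a full commit.
    INSERT INTO genotype (snps) VALUES (B'0101');

    -- 2. One INSERT per row inside an explicit transaction: one commit
    --    for the whole batch.
    BEGIN;
    INSERT INTO genotype (snps) VALUES (B'0101');
    INSERT INTO genotype (snps) VALUES (B'1100');
    COMMIT;

    -- 3. One multi-row INSERT per bulk: a single statement carries many rows.
    INSERT INTO genotype (snps) VALUES (B'0101'), (B'1100'), (B'0011');

    -- 4. One COPY per bulk: rows are streamed inline and terminated by a
    --    backslash-dot line.
    COPY genotype (snps) FROM STDIN;
    0101
    1100
    \.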

Is the loading of data the only activity on the server?

See also:
http://www.postgresql.org/docs/9.4/static/populate.html


HTH,

Ladislav Lenart




Re: degrading insert performance

From: Eildert Groeneveld
Date:
On Thu, 2015-09-17 at 14:11 +0200, Ladislav Lenart wrote:
> On 17.9.2015 13:32, Eildert Groeneveld wrote:
> > [original report snipped; see above]
>
>
> Hello.
>
> Just a couple of questions...
>
> You talk about two tables; have you also dropped FKs (you only mention
> indices and PK)?
Yes, they were all gone.
>
> What SQL do you use for inserting the data:
I go through ecpg
>  * one INSERT per row with autocommit
yes
>  * one INSERT per row inside BEGIN...COMMIT
also this, same result as above
>  * one INSERT per bulk (20 000 rows)
>  * one COPY per bulk (20 000 rows)
COPY does not fit so well, as this is not only initial population.

> Is the loading of data the only activity on the server?
Yes, it is. And I see this "feature" on every machine.

> See also:
> http://www.postgresql.org/docs/9.4/static/populate.html
Thanks, yes, I have been through this.

Millions of records seem to be the staple diet of PG; here the degradation
already starts with the second batch of 20000 records.
greetings

Eildert
--
Eildert Groeneveld
===================================================
Institute of Farm Animal Genetics (FLI)
Mariensee 31535 Neustadt Germany
Tel : (+49)(0)5034 871155 Fax : (+49)(0)5034 871143
e-mail: eildert.groeneveld@fli.bund.de
web:    http://vce.tzv.fal.de
==================================================




Re: degrading insert performance

From: Matheus de Oliveira
Date:

On Thu, Sep 17, 2015 at 9:19 AM, Eildert Groeneveld <eildert.groeneveld@fli.bund.de> wrote:
> >  * one COPY per bulk (20 000 rows)
> COPY does not fit so well, as this is not only initial population.

Why do you say COPY doesn't fit? It seems to me that COPY fits perfectly for your case, and would certainly make the load faster.

I suspect (though I am not sure) that the degradation is mostly because you are inserting one row at a time: PostgreSQL has to consult the FSM (Free Space Map) for each tuple inserted, and as the table gets more populated this lookup becomes slower. If that is really the case, COPY would certainly improve things, and so would an INSERT with many rows at once.
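
One way to watch this effect (an untested sketch; 'genotype' is a placeholder table name) is to track how the table and its free space map grow between batches:

    -- size of the table's main fork and of its free space map (FSM) fork
    SELECT pg_size_pretty(pg_relation_size('genotype'))        AS table_size,
           pg_size_pretty(pg_relation_size('genotype', 'fsm')) AS fsm_size;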

Regards,
--
Matheus de Oliveira


Re: degrading insert performance

From: Eildert Groeneveld
Date:
Thanks for your input!
On Thu, 2015-09-17 at 11:21 -0300, Matheus de Oliveira wrote:

> On Thu, Sep 17, 2015 at 9:19 AM, Eildert Groeneveld <eildert.groeneveld@fli.bund.de> wrote:
> > >  * one COPY per bulk (20 000 rows)
> > COPY does not fit so well, as this is not only initial population.


> Why do you say COPY doesn't fit? It seems to me that COPY fits perfectly
> for your case, and would certainly make the load faster.
Well, more than one table needs to get populated, and the data is not really
available in one file (though see the sketch below).
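
For what it's worth, COPY does not require a single server-side file; each
table can be fed its own inline stream, one COPY per table. A hypothetical
sketch with made-up names:

    COPY animal (name) FROM STDIN;
    sample_0001
    \.

    COPY genotype (snps) FROM STDIN;
    010110
    \.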

> I suspect (not sure though) that the degradation is most because you are
> inserting one row at a time [... FSM explanation snipped ...]
Alright, that sounds reasonable.

But what is your experience: is it possible that inserting the first 20000
records takes 29 seconds while inserting lot 20 (i.e. 19*20000 records
later) takes 186.1 sec? After all, we are only talking about 400000 records
in total, yet that takes 6 times longer!

Odd. Does anyone have an idea?

greetings

Eildert




-- 
Eildert Groeneveld
===================================================
Institute of Farm Animal Genetics (FLI)
Mariensee 31535 Neustadt Germany
Tel : (+49)(0)5034 871155 Fax : (+49)(0)5034 871143
e-mail: eildert.groeneveld@fli.bund.de 
web:    http://vce.tzv.fal.de
==================================================

Re: degrading insert performance

From: Dave Cramer
Date:
Nobody has asked what kind of machine this is???

Hard disks, memory, etc.

What are your relevant settings in postgresql.conf? Shared buffers, checkpoints, etc.

Also, how big are the inserts? What else is this machine doing? Is it bare hardware, or a VM?
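
A quick way to collect those numbers (a sketch; on 9.4 checkpoint_segments
is still the relevant checkpoint knob, replaced by max_wal_size in 9.5):

    SHOW shared_buffers;
    SHOW checkpoint_segments;
    SHOW checkpoint_completion_target;
    SHOW wal_buffers;
    SHOW synchronous_commit;
    SELECT version();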

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca

On 17 September 2015 at 10:41, Eildert Groeneveld <eildert.groeneveld@fli.bund.de> wrote:
> [previous exchange snipped; see above]
>
> But what is your experience: is it possible that inserting the first
> 20000 records takes 29 seconds while inserting lot 20 takes 186.1 sec?
>
> Odd. Does anyone have an idea?