Discussion: Inserts and bad performance
Greetings
I am inserting a large number of rows, 5, 10, 15 million at a time. The Python code commits every 5000 inserts. The table has partitioned children.
At first, when a low number of rows had been inserted, the inserts would run at a good clip – 30–50K inserts per second. Now, after inserting oh, say, 1.5 billion rows, the insert rate has dropped to around 5000 inserts per second. I dropped the unique index, rebuilt the other indexes, and no change. The instance has 16 vCPUs and 64GB of RAM.
I'm perplexed; I can't seem to find any reason for the slowdown…
Thanks,
pg
Phil Godfrin | Database Administration
NOV
NOV US | Engineering Data
9720 Beechnut St | Houston, Texas 77036
M 281.825.2311
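The batch-and-commit loop described above might look roughly like this sketch with psycopg2 (the table name, columns, and chunking helper are assumptions for illustration, not code from the thread):

```python
# Hypothetical sketch of "commit every 5000 inserts" with psycopg2.
# Table/column names are placeholders.
import itertools

BATCH_SIZE = 5000

def batches(rows, size=BATCH_SIZE):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            return
        yield chunk

def load(conn, rows):
    """Insert rows in batches, committing after each batch."""
    with conn.cursor() as cur:
        for chunk in batches(rows):
            cur.executemany(
                "INSERT INTO readings (id, ts, value) VALUES (%s, %s, %s)",
                chunk,
            )
            conn.commit()  # one commit per BATCH_SIZE inserts
```

Note that `executemany` sends one statement per row; the later suggestions in this thread (COPY, `execute_batch`, multi-row VALUES) all exist to cut that per-row round-trip cost.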
On Wed, Nov 24, 2021 at 07:15:31PM +0000, Godfrin, Philippe E wrote:
> I am inserting a large number of rows, 5,10, 15 million. The python code commits every 5000 inserts. The table has partitioned children.
> [...]

Hi,

With not much information, it may be I/O related. CPU and RAM cannot fix that once items need to be written to disk. Are there any errors in the logs, or are the CPUs maxed out?

Regards,
Ken
My apologies for the dearth of details. No on both the CPU and the errors. But I do believe it is I/O related; I just can't find it. I thought maybe it was index page splitting, so I altered the unique index with fillfactor=40 and reindexed. No change. I then dropped the unique index. No change. I thought maybe it was checkpoint timeouts, but there was no correlation. Oddly enough, some of the jobs running concurrently, also inserting but most likely into different partitions, are running about 2x faster than others. I'm rather perplexed.

pg

-----Original Message-----
From: Kenneth Marshall <ktm@rice.edu>
Sent: Wednesday, November 24, 2021 1:20 PM
To: Godfrin, Philippe E <Philippe.Godfrin@nov.com>
Cc: pgsql-general@lists.postgresql.org
Subject: [EXTERNAL] Re: Inserts and bad performance

[snip]
"Godfrin, Philippe E" <Philippe.Godfrin@nov.com> writes:
> I am inserting a large number of rows, 5,10, 15 million. The python code commits every 5000 inserts. The table has partitioned children.
> [...]

Can you drop the indexes and not rebuild them till after the bulk load is done? Once the indexes exceed available RAM, insert performance is going to fall off a cliff, except maybe for indexes that are receiving purely sequential inserts (so that only the right end of the index gets touched).

Also see
https://www.postgresql.org/docs/current/populate.html

regards, tom lane
> I am inserting a large number of rows, 5,10, 15 million. The python code commits every 5000 inserts. The table has partitioned children.
Hi Tom. Good point about the index paging out of the buffer. I did that, and no change. I do have shared_buffers at 40GB, so there's a good bit there, but I also did all those things on the page you referred to, except for using COPY. At this point the data has not been scrubbed, so I'm trapping data errors and duplicates. I am curious, though, as a sidebar, why COPY is considered faster than inserts. I was unable to get COPY faster than around 25K inserts a second (pretty fast anyway). Frankly, initially I was running 3 concurrent insert jobs and getting 90K ins/sec! But after a certain number of records, the speed just dropped off.
pg
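One common way to keep COPY's speed while still trapping the duplicates and bad rows mentioned above is to COPY into an UNLOGGED staging table and merge from there. This is a hedged sketch, not something the thread prescribes; the table and column names, and the assumption that `id` is the unique key, are hypothetical:

```python
# Sketch: COPY into an UNLOGGED staging table, then merge into the real
# table, silently skipping duplicates, instead of trapping errors per row
# on the client. Table/column names are hypothetical.
import io

STAGING_DDL = """
CREATE UNLOGGED TABLE IF NOT EXISTS readings_stage
    (LIKE readings INCLUDING DEFAULTS);
"""

MERGE_SQL = """
INSERT INTO readings SELECT * FROM readings_stage
ON CONFLICT (id) DO NOTHING;        -- duplicates silently skipped
TRUNCATE readings_stage;
"""

def rows_to_copy_buffer(rows):
    """Render rows as tab-separated text for COPY FROM STDIN (\\N = NULL)."""
    buf = io.StringIO()
    for row in rows:
        buf.write("\t".join("\\N" if v is None else str(v) for v in row))
        buf.write("\n")
    buf.seek(0)
    return buf

def load_batch(conn, rows):
    """COPY a batch into staging, then merge it into the target table."""
    with conn.cursor() as cur:
        cur.execute(STAGING_DDL)
        cur.copy_expert("COPY readings_stage FROM STDIN",
                        rows_to_copy_buffer(rows))
        cur.execute(MERGE_SQL)
    conn.commit()
```

The merge step needs a unique constraint on the target for ON CONFLICT to act on, so it only fits once that index is back in place.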
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Wednesday, November 24, 2021 1:32 PM
To: Godfrin, Philippe E <Philippe.Godfrin@nov.com>
Cc: pgsql-general@lists.postgresql.org
Subject: [EXTERNAL] Re: Inserts and bad performance
"Godfrin, Philippe E" <Philippe.Godfrin@nov.com> writes:
> I am inserting a large number of rows, 5,10, 15 million. The python code commits every 5000 inserts. The table has partitioned children.
> At first, when there were a low number of rows inserted, the inserts would run at a good clip - 30 - 50K inserts per second. Now, after inserting oh say 1.5 Billion rows, the insert rate has dropped to around 5000 inserts per second. I dropped the unique index , rebuilt the other indexes and no change. The instance is 16 vcpu and 64GB ram.
Can you drop the indexes and not rebuild them till after the bulk load is
done? Once the indexes exceed available RAM, insert performance is going
to fall off a cliff, except maybe for indexes that are receiving purely
sequential inserts (so that only the right end of the index gets touched).
Also see
https://www.postgresql.org/docs/current/populate.html
regards, tom lane
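Tom's drop-then-rebuild advice could be scripted roughly as below. The index names, the maintenance_work_mem value, and the assumption of exclusive access during the load are all placeholders, not details from the thread:

```python
# Sketch of drop indexes -> bulk load -> rebuild indexes.
# Index/table names and settings are hypothetical.
DROP_SQL = [
    "DROP INDEX IF EXISTS readings_value_idx;",
    "DROP INDEX IF EXISTS readings_ts_idx;",
]
REBUILD_SQL = [
    # maintenance_work_mem governs index-build memory; raise it for the session
    "SET maintenance_work_mem = '2GB';",
    "CREATE INDEX readings_value_idx ON readings (value);",
    "CREATE INDEX readings_ts_idx ON readings (ts);",
]

def bulk_load(conn, do_load):
    """Drop secondary indexes, run the load callback, rebuild at the end."""
    with conn.cursor() as cur:
        for stmt in DROP_SQL:
            cur.execute(stmt)
        conn.commit()
        do_load(conn)            # the actual COPY/INSERT work
        for stmt in REBUILD_SQL:
            cur.execute(stmt)
        conn.commit()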
The notion of doing COPY in blocks, and asynchronously, is very interesting
From: Gavin Roy <gavinr@aweber.com>
Sent: Wednesday, November 24, 2021 1:50 PM
To: Godfrin, Philippe E <Philippe.Godfrin@nov.com>
Cc: pgsql-general@lists.postgresql.org
Subject: [EXTERNAL] Re: Inserts and bad performance
On Wed, Nov 24, 2021 at 2:15 PM Godfrin, Philippe E <Philippe.Godfrin@nov.com> wrote:
Greetings
I am inserting a large number of rows, 5,10, 15 million. The python code commits every 5000 inserts. The table has partitioned children.
On the Python client side, if you're using psycopg, you should consider using COPY instead of INSERT if you're not already:
And if using psycopg2, execute_batch might be of value:
Regards,
Gavin
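As a sketch of Gavin's two suggestions with psycopg2 (table and column names are placeholders; `psycopg2.extras.execute_batch` and `cursor.copy_expert` are the relevant calls):

```python
# Two client-side alternatives to per-row INSERTs, per Gavin's suggestion.
# Table/column names are placeholders.
import io

def copy_buffer(rows):
    """Render (id, value) pairs as tab-separated text for COPY FROM STDIN."""
    return io.StringIO("".join(f"{i}\t{v}\n" for i, v in rows))

def insert_with_execute_batch(conn, rows):
    from psycopg2.extras import execute_batch  # deferred import; needs psycopg2
    with conn.cursor() as cur:
        execute_batch(
            cur,
            "INSERT INTO readings (id, value) VALUES (%s, %s)",
            rows,
            page_size=1000,  # statements sent per network round trip
        )
    conn.commit()

def insert_with_copy(conn, rows):
    with conn.cursor() as cur:
        cur.copy_expert("COPY readings (id, value) FROM STDIN",
                        copy_buffer(rows))
    conn.commit()
```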
On Thu, 25 Nov 2021 at 08:59, Godfrin, Philippe E <Philippe.Godfrin@nov.com> wrote:
> Hi Tom. Good point about the index paging out of the buffer. I did that and no change. [...]

EXPLAIN (ANALYZE, BUFFERS) works with INSERTs. You just need to be aware that using ANALYZE will perform the actual insert too, so you might want to use BEGIN; and ROLLBACK; if it's not data that you want to keep.

SET track_io_timing = on; might help you too.

David
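David's BEGIN / EXPLAIN (ANALYZE, BUFFERS) / ROLLBACK recipe might be wrapped like this in the Python loader (a sketch; the table and columns are placeholders):

```python
# Profile one batch's INSERT with EXPLAIN (ANALYZE, BUFFERS) inside a
# rolled-back transaction so the test row is not kept.
# Table/column names are placeholders.
EXPLAIN_SQL = """
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
INSERT INTO readings (id, value) VALUES (%s, %s);
"""

def profile_insert(conn, sample_row):
    """Return the EXPLAIN plan lines for a test insert, then roll it back."""
    with conn.cursor() as cur:
        cur.execute("SET track_io_timing = on;")  # adds I/O timings to the plan
        cur.execute("BEGIN;")
        cur.execute(EXPLAIN_SQL, sample_row)
        plan = [line for (line,) in cur.fetchall()]
        cur.execute("ROLLBACK;")                  # discard the test insert
    return plan
```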
[snip]
I dropped the unique index , rebuilt the other indexes and no change.
IMNSHO, this is the worst possible approach. Drop everything except the unique index, and then (if possible) sort the input file by the unique key. That'll increase buffered IO; otherwise, you're bopping all around the filesystem.
Using a bulk loader if possible would increase speeds
Angular momentum makes the world go 'round.
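Ron's pre-sort suggestion, as a minimal sketch (tab-separated input with an integer unique key in the first column is an assumption):

```python
# Pre-sort the input by the unique key so index insertions hit the
# right-hand edge of the btree instead of random leaf pages.
# Input format (tab-separated, integer key first) is hypothetical.
def sort_by_key(lines, key_col=0):
    """Sort input lines on the (integer) unique-key column."""
    return sorted(lines, key=lambda ln: int(ln.split("\t")[key_col]))
```

For files too large for memory, an external sort (e.g. the Unix `sort -n` utility) does the same job before the COPY.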
Excellent idea David, silly me, I didn't think of that. For the other questions:

> How many partitions?
14
> How many rows do they have when performance is slowing considerably?
Not sure, maybe in the low millions.
> Does this table get many updates or is it insert only?
Insert only.
> What version of PostgreSQL?
13
> Are the inserts randomly distributed among the partitions or targeting one or a few partitions?
Sequentially, one partition at a time, so each set of runs is inserting across each partition.
> Are you able to capture an example and run it in a transaction with explain (analyze, buffers, verbose) and then rollback?
Yes, I'm looking into that.

pg

-----Original Message-----
From: David Rowley <dgrowleyml@gmail.com>
Sent: Wednesday, November 24, 2021 7:13 PM
To: Godfrin, Philippe E <Philippe.Godfrin@nov.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; pgsql-general@lists.postgresql.org
Subject: Re: [EXTERNAL] Re: Inserts and bad performance

[snip]
Hi Gavin – thanks, I hadn't realized that about psycopg. I'm on the earlier version, so I can't use what you recommended at this point. But I did use copy_expert.
Interestingly enough, the performance of the COPY statement is only slightly better than the inserts, as I was running inserts with 5000-row VALUES clauses. In the end, the current config couldn't keep up with the WAL creation, so I turned all that off. But still no perf gains. I also turned off fsync and set the kernel settings to 10% and 98% for dirty pages…
I wonder if there's a better load product than COPY? But I'd still like to know what separates COPY from bulk inserts…
pg
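A 5000-row VALUES clause like the one described above can be generated client-side with psycopg2's `execute_values`; this is a sketch with placeholder table and column names:

```python
# execute_values folds up to page_size rows into one
# INSERT ... VALUES (...), (...), ... statement per round trip,
# like the 5000-values INSERTs described above.
def pages(rows, page_size=5000):
    """Split rows into the per-statement pages execute_values would send."""
    return [rows[i:i + page_size] for i in range(0, len(rows), page_size)]

def insert_multivalues(conn, rows, page_size=5000):
    from psycopg2.extras import execute_values  # deferred import; needs psycopg2
    with conn.cursor() as cur:
        execute_values(
            cur,
            "INSERT INTO readings (id, value) VALUES %s",
            rows,
            page_size=page_size,
        )
    conn.commit()
```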
From: Gavin Roy <gavinr@aweber.com>
Sent: Wednesday, November 24, 2021 1:50 PM
To: Godfrin, Philippe E <Philippe.Godfrin@nov.com>
Cc: pgsql-general@lists.postgresql.org
Subject: [EXTERNAL] Re: Inserts and bad performance
[snip]
Right you are sir! I figured that out a few hours ago!
pg
From: Ron <ronljohnsonjr@gmail.com>
Sent: Wednesday, November 24, 2021 10:58 PM
To: pgsql-general@lists.postgresql.org
Subject: [EXTERNAL] Re: Inserts and bad performance
On 11/24/21 1:15 PM, Godfrin, Philippe E wrote:
[snip]
On 28/11/21 17:17, Godfrin, Philippe E wrote:
> Right you are sir! I figured that out a few hours ago!
[snip]

Please don't top post!

Cheers,
Gavin
Ok, thanks
--
Sent from Mail.ru app for Android