Discussion: Inserts and bad performance
Greetings
I am inserting a large number of rows, 5, 10, 15 million at a time. The Python code commits every 5000 inserts. The table has partitioned children.
At first, when a low number of rows had been inserted, the inserts would run at a good clip – 30–50K inserts per second. Now, after inserting oh, say, 1.5 billion rows, the insert rate has dropped to around 5000 inserts per second. I dropped the unique index, rebuilt the other indexes, and no change. The instance has 16 vCPUs and 64GB of RAM.
I'm perplexed; I can't seem to find any reason for the slowdown…
Thanks,
pg
Phil Godfrin | Database Administration
NOV
NOV US | Engineering Data
9720 Beechnut St | Houston, Texas 77036
M 281.825.2311
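The batch-and-commit loop described above might look roughly like this sketch with psycopg2 (the table name, columns, and chunking helper are assumptions for illustration, not code from the thread):

```python
# Hypothetical sketch of "commit every 5000 inserts" with psycopg2.
# Table/column names are placeholders.
import itertools

BATCH_SIZE = 5000

def batches(rows, size=BATCH_SIZE):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            return
        yield chunk

def load(conn, rows):
    """Insert rows in batches, committing after each batch."""
    with conn.cursor() as cur:
        for chunk in batches(rows):
            cur.executemany(
                "INSERT INTO readings (id, ts, value) VALUES (%s, %s, %s)",
                chunk,
            )
            conn.commit()  # one commit per BATCH_SIZE inserts
```

Note that `executemany` sends one statement per row; the later suggestions in this thread (COPY, `execute_batch`, multi-row VALUES) all exist to cut that per-row round-trip cost.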
On Wed, Nov 24, 2021 at 07:15:31PM +0000, Godfrin, Philippe E wrote:
> I am inserting a large number of rows, 5,10, 15 million. The python code commits every 5000 inserts. The table has partitioned children.
> [...]

Hi,

With not much information, it may be I/O related. CPU and RAM cannot fix that once items need to be written to disk. Are there any errors in the logs, or are the CPUs maxed out?

Regards,
Ken
My apologies for the dearth of details. No on both the CPU and the errors. But I do believe it is I/O related; I just can't find it. I thought maybe it was index page splitting, so I altered the unique index with fillfactor=40 and reindexed. No change. I then dropped the unique index. No change. I thought maybe it was checkpoint timeouts, but there was no correlation. Oddly enough, some of the jobs running concurrently, also inserting but most likely into different partitions, are running about 2x faster than others. I'm rather perplexed.

pg

-----Original Message-----
From: Kenneth Marshall <ktm@rice.edu>
Sent: Wednesday, November 24, 2021 1:20 PM
To: Godfrin, Philippe E <Philippe.Godfrin@nov.com>
Cc: pgsql-general@lists.postgresql.org
Subject: [EXTERNAL] Re: Inserts and bad performance

[snip]
"Godfrin, Philippe E" <Philippe.Godfrin@nov.com> writes:
> I am inserting a large number of rows, 5,10, 15 million. The python code commits every 5000 inserts. The table has partitioned children.
> [...]

Can you drop the indexes and not rebuild them till after the bulk load is done? Once the indexes exceed available RAM, insert performance is going to fall off a cliff, except maybe for indexes that are receiving purely sequential inserts (so that only the right end of the index gets touched).

Also see
https://www.postgresql.org/docs/current/populate.html

regards, tom lane
> I am inserting a large number of rows, 5,10, 15 million. The python code commits every 5000 inserts. The table has partitioned children.
Hi Tom. Good point about the index paging out of the buffer. I did that, and no change. I do have shared_buffers at 40GB, so there's a good bit there, but I also did all those things on the page you referred to, except for using COPY. At this point the data has not been scrubbed, so I'm trapping data errors and duplicates. I am curious, though, as a sidebar, why COPY is considered faster than inserts. I was unable to get COPY faster than around 25K inserts a second (pretty fast anyway). Frankly, initially I was running 3 concurrent insert jobs and getting 90K ins/sec! But after a certain number of records, the speed just dropped off.
pg
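One common way to keep COPY's speed while still trapping the duplicates and bad rows mentioned above is to COPY into an UNLOGGED staging table and merge from there. This is a hedged sketch, not something the thread prescribes; the table and column names, and the assumption that `id` is the unique key, are hypothetical:

```python
# Sketch: COPY into an UNLOGGED staging table, then merge into the real
# table, silently skipping duplicates, instead of trapping errors per row
# on the client. Table/column names are hypothetical.
import io

STAGING_DDL = """
CREATE UNLOGGED TABLE IF NOT EXISTS readings_stage
    (LIKE readings INCLUDING DEFAULTS);
"""

MERGE_SQL = """
INSERT INTO readings SELECT * FROM readings_stage
ON CONFLICT (id) DO NOTHING;        -- duplicates silently skipped
TRUNCATE readings_stage;
"""

def rows_to_copy_buffer(rows):
    """Render rows as tab-separated text for COPY FROM STDIN (\\N = NULL)."""
    buf = io.StringIO()
    for row in rows:
        buf.write("\t".join("\\N" if v is None else str(v) for v in row))
        buf.write("\n")
    buf.seek(0)
    return buf

def load_batch(conn, rows):
    """COPY a batch into staging, then merge it into the target table."""
    with conn.cursor() as cur:
        cur.execute(STAGING_DDL)
        cur.copy_expert("COPY readings_stage FROM STDIN",
                        rows_to_copy_buffer(rows))
        cur.execute(MERGE_SQL)
    conn.commit()
```

The merge step needs a unique constraint on the target for ON CONFLICT to act on, so it only fits once that index is back in place.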
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Wednesday, November 24, 2021 1:32 PM
To: Godfrin, Philippe E <Philippe.Godfrin@nov.com>
Cc: pgsql-general@lists.postgresql.org
Subject: [EXTERNAL] Re: Inserts and bad performance
"Godfrin, Philippe E" <Philippe.Godfrin@nov.com> writes:
> I am inserting a large number of rows, 5,10, 15 million. The python code commits every 5000 inserts. The table has partitioned children.
> At first, when there were a low number of rows inserted, the inserts would run at a good clip - 30 - 50K inserts per second. Now, after inserting oh say 1.5 Billion rows, the insert rate has dropped to around 5000 inserts per second. I dropped the unique index , rebuilt the other indexes and no change. The instance is 16 vcpu and 64GB ram.
Can you drop the indexes and not rebuild them till after the bulk load is
done? Once the indexes exceed available RAM, insert performance is going
to fall off a cliff, except maybe for indexes that are receiving purely
sequential inserts (so that only the right end of the index gets touched).
Also see
https://www.postgresql.org/docs/current/populate.html
regards, tom lane
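Tom's drop-then-rebuild advice could be scripted roughly as below. The index names, the maintenance_work_mem value, and the assumption of exclusive access during the load are all placeholders, not details from the thread:

```python
# Sketch of drop indexes -> bulk load -> rebuild indexes.
# Index/table names and settings are hypothetical.
DROP_SQL = [
    "DROP INDEX IF EXISTS readings_value_idx;",
    "DROP INDEX IF EXISTS readings_ts_idx;",
]
REBUILD_SQL = [
    # maintenance_work_mem governs index-build memory; raise it for the session
    "SET maintenance_work_mem = '2GB';",
    "CREATE INDEX readings_value_idx ON readings (value);",
    "CREATE INDEX readings_ts_idx ON readings (ts);",
]

def bulk_load(conn, do_load):
    """Drop secondary indexes, run the load callback, rebuild at the end."""
    with conn.cursor() as cur:
        for stmt in DROP_SQL:
            cur.execute(stmt)
        conn.commit()
        do_load(conn)            # the actual COPY/INSERT work
        for stmt in REBUILD_SQL:
            cur.execute(stmt)
        conn.commit()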
The notion of doing COPY in blocks, and asynchronously, is very interesting
From: Gavin Roy <gavinr@aweber.com>
Sent: Wednesday, November 24, 2021 1:50 PM
To: Godfrin, Philippe E <Philippe.Godfrin@nov.com>
Cc: pgsql-general@lists.postgresql.org
Subject: [EXTERNAL] Re: Inserts and bad performance
On Wed, Nov 24, 2021 at 2:15 PM Godfrin, Philippe E <Philippe.Godfrin@nov.com> wrote:
Greetings
I am inserting a large number of rows, 5,10, 15 million. The python code commits every 5000 inserts. The table has partitioned children.
On the Python client side, if you're using psycopg, you should consider using COPY instead of INSERT if you're not already:
And if using psycopg2, execute_batch might be of value:
Regards,
Gavin
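As a sketch of Gavin's two suggestions with psycopg2 (table and column names are placeholders; `psycopg2.extras.execute_batch` and `cursor.copy_expert` are the relevant calls):

```python
# Two client-side alternatives to per-row INSERTs, per Gavin's suggestion.
# Table/column names are placeholders.
import io

def copy_buffer(rows):
    """Render (id, value) pairs as tab-separated text for COPY FROM STDIN."""
    return io.StringIO("".join(f"{i}\t{v}\n" for i, v in rows))

def insert_with_execute_batch(conn, rows):
    from psycopg2.extras import execute_batch  # deferred import; needs psycopg2
    with conn.cursor() as cur:
        execute_batch(
            cur,
            "INSERT INTO readings (id, value) VALUES (%s, %s)",
            rows,
            page_size=1000,  # statements sent per network round trip
        )
    conn.commit()

def insert_with_copy(conn, rows):
    with conn.cursor() as cur:
        cur.copy_expert("COPY readings (id, value) FROM STDIN",
                        copy_buffer(rows))
    conn.commit()
```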
On Thu, 25 Nov 2021 at 08:59, Godfrin, Philippe E <Philippe.Godfrin@nov.com> wrote:
> Hi Tom. Good point about the index paging out of the buffer. I did that and no change. [...]

EXPLAIN (ANALYZE, BUFFERS) works with INSERTs. You just need to be aware that using ANALYZE will perform the actual insert too, so you might want to use BEGIN; and ROLLBACK; if it's not data that you want to keep.

SET track_io_timing = on; might help you too.

David
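David's BEGIN / EXPLAIN (ANALYZE, BUFFERS) / ROLLBACK recipe might be wrapped like this in the Python loader (a sketch; the table and columns are placeholders):

```python
# Profile one batch's INSERT with EXPLAIN (ANALYZE, BUFFERS) inside a
# rolled-back transaction so the test row is not kept.
# Table/column names are placeholders.
EXPLAIN_SQL = """
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
INSERT INTO readings (id, value) VALUES (%s, %s);
"""

def profile_insert(conn, sample_row):
    """Return the EXPLAIN plan lines for a test insert, then roll it back."""
    with conn.cursor() as cur:
        cur.execute("SET track_io_timing = on;")  # adds I/O timings to the plan
        cur.execute("BEGIN;")
        cur.execute(EXPLAIN_SQL, sample_row)
        plan = [line for (line,) in cur.fetchall()]
        cur.execute("ROLLBACK;")                  # discard the test insert
    return plan
```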
[snip]
I dropped the unique index , rebuilt the other indexes and no change.
IMNSHO, this is the worst possible approach. Drop everything except the unique index, and then (if possible) sort the input file by the unique key. That'll increase buffered IO; otherwise, you're bopping all around the filesystem.
Using a bulk loader if possible would increase speeds
Angular momentum makes the world go 'round.
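Ron's pre-sort suggestion, as a minimal sketch (tab-separated input with an integer unique key in the first column is an assumption):

```python
# Pre-sort the input by the unique key so index insertions hit the
# right-hand edge of the btree instead of random leaf pages.
# Input format (tab-separated, integer key first) is hypothetical.
def sort_by_key(lines, key_col=0):
    """Sort input lines on the (integer) unique-key column."""
    return sorted(lines, key=lambda ln: int(ln.split("\t")[key_col]))
```

For files too large for memory, an external sort (e.g. the Unix `sort -n` utility) does the same job before the COPY.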
Excellent idea David, silly me, I didn't think of that. For the other questions:

> How many partitions?
14
> How many rows do they have when performance is slowing considerably?
Not sure, maybe in the low millions.
> Does this table get many updates or is it insert only?
Insert only.
> What version of PostgreSQL?
13
> Are the inserts randomly distributed among the partitions or targeting one or a few partitions?
Sequentially, one partition at a time, so each set of runs is inserting across each partition.
> Are you able to capture an example and run it in a transaction with explain (analyze, buffers, verbose) and then rollback?
Yes, I'm looking into that.

pg

-----Original Message-----
From: David Rowley <dgrowleyml@gmail.com>
Sent: Wednesday, November 24, 2021 7:13 PM
To: Godfrin, Philippe E <Philippe.Godfrin@nov.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; pgsql-general@lists.postgresql.org
Subject: Re: [EXTERNAL] Re: Inserts and bad performance

[snip]
Hi Gavin – thanks, I hadn't realized that about psycopg. I'm on the earlier version, so I can't use what you recommended at this point. But I did use copy_expert.
Interestingly enough, the performance of the COPY statement is only slightly better than the inserts, as I was running inserts with 5000-row VALUES clauses. In the end, the current config couldn't keep up with the WAL creation, so I turned all that off. But still no perf gains. I also turned off fsync and set the kernel settings to 10% and 98% for dirty pages…
I wonder if there's a better load product than COPY? But I'd still like to know what separates COPY from bulk inserts…
pg
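A 5000-row VALUES clause like the one described above can be generated client-side with psycopg2's `execute_values`; this is a sketch with placeholder table and column names:

```python
# execute_values folds up to page_size rows into one
# INSERT ... VALUES (...), (...), ... statement per round trip,
# like the 5000-values INSERTs described above.
def pages(rows, page_size=5000):
    """Split rows into the per-statement pages execute_values would send."""
    return [rows[i:i + page_size] for i in range(0, len(rows), page_size)]

def insert_multivalues(conn, rows, page_size=5000):
    from psycopg2.extras import execute_values  # deferred import; needs psycopg2
    with conn.cursor() as cur:
        execute_values(
            cur,
            "INSERT INTO readings (id, value) VALUES %s",
            rows,
            page_size=page_size,
        )
    conn.commit()
```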
From: Gavin Roy <gavinr@aweber.com>
Sent: Wednesday, November 24, 2021 1:50 PM
To: Godfrin, Philippe E <Philippe.Godfrin@nov.com>
Cc: pgsql-general@lists.postgresql.org
Subject: [EXTERNAL] Re: Inserts and bad performance
[snip]
Right you are sir! I figured that out a few hours ago!
pg
From: Ron <ronljohnsonjr@gmail.com>
Sent: Wednesday, November 24, 2021 10:58 PM
To: pgsql-general@lists.postgresql.org
Subject: [EXTERNAL] Re: Inserts and bad performance
On 11/24/21 1:15 PM, Godfrin, Philippe E wrote:
[snip]
On 28/11/21 17:17, Godfrin, Philippe E wrote:
> Right you are sir! I figured that out a few hours ago!
[snip]

Please don't top post!

Cheers,
Gavin
Ok, thanks
--
Sent from Mail.ru app for Android