Обсуждение: Postgres bulk insert/ETL performance on high speed servers - test results
High level summary:  server ram has a significant impact on batch processing
performance (no surprise), and AWS processing can largely compete with local
servers IF the AWS network connection is optimized.
With the recent threads about insert performance (and performance in
general), I thought I'd share some numbers that could assist some other
Postgres users in planning their environments.
I am currently running a Postgres dev server in AWS and we are evaluating a
high powered physical server for our data center, for which we received a
demo unit from Cisco for testing.  Great opportunity to test a range of
pinch points that could restrict scalability and performance, comparing how
2 very different servers behave under a high bulk loading/transform
scenario.  The scenario is that I'm migrating mysql data ("v1", eventually
20tb of genomics data) over to a new Postgres server ("v2").
[As a side note, I'm attempting to get a third server spun up, being a high
powered AWS EC2 instance (an r3.4xlarge with 122gb ram, 16 cores, 6tb SSD
EBS Optimized with 16k guaranteed IOPS).  When I finish the testing against
the 3rd server, I'll report again.]
Landscape:
Source mysql server:  Dell physical 24 cores at 2.8ghz, 32gb ram, 1gbe
networking, Percona/mysql v5.5.3 on linux in our data center
AWS:  EC2 m4.xlarge instance with 16 gb ram, 4 cores at 2.4ghz, 3tb SSD.  PG
v9.5.1 on Red Hat 4.8.5-4 64 bit, on a 10gb Direct Connect link from our
data center to.
Cisco:  Hyperflex HX240c M4 node with UCS B200 M4 blade, with 256gb ram, 48
cores at 2.2ghz, 4tb direct attached Intel flash (SSD) for the OS, 10tb of
NetApp Filer SSD storage via 4gb HBA cards.  PG v9.5.1 on Red Hat 4.8.5-4 64
bit, 10gbe networking but has to throttle down to 1gbe when talking to the
mysql source server.
PASS 1:
Process:      Extract (pull the raw v1 data over the network to the 32 v2
staging tables)
Num Source Rows:      8,232,673 (Small Test)
Rowcount Compression:      1.0 (1:1 copy)
AWS Time in Secs:      1,516**
Cisco Time in Secs:      376
Difference:      4.0x
Comment:      AWS:  5.7k rows/sec    cisco:  21.9k rows/sec
(**network speed appears to be the factor, see notes below)
Process:      Transform/Load (all work local to the server - read,
transform, write as a single batch)
Num Source Rows:      5,575,255 (many smaller batches from the source
tables, all writes going to a single target table)
Avg Rowcount Compression:      10.3 (jsonb row compression resulting in 10x
fewer rows)
AWS Time in Secs:      408
Cisco Time in Secs:      294
Difference:      1.4x  (the Cisco is 40% faster...not a huge difference)
Comment:AWS:  13.6k rows/sec   Cisco:  19k rows/sec
Notes:  The testing has revealed an issue with the networking in our data
center, which appears to be causing abnormally slow transfer speed to AWS.
That is being investigated.  So if we look at just the Transform/Load
process, we can see that both AWS and the local Cisco server have comparable
processing speeds on the small dataset.
However, when I moved to a medium sized dataset of 204m rows, a different
pattern emerged.  I'm including just the Transform/Load process here, and
testing just ONE table out of the batch:
PASS 2:
Process:      Transform/Load (all work local to the server - read,
transform, write as a single batch)
Num Source Rows:      10,554,800 (one batch from just a single source table
going to a single target table)
Avg Rowcount Compression:      31.5 (jsonb row compression resulting in
31.5x fewer rows)
AWS Time in Secs:      2,493 (41.5 minutes)
Cisco Time in Secs:      661 (10 minutes)
Difference:      3.8x
Comment:AWS:  4.2k rows/sec   Cisco:  16k rows/sec
It's obvious the size of the batch exceeded the AWS server memory, resulting
in a profoundly slower processing time.  This was a true, apples to apples
comparison between Pass 1 and Pass 2: average row lengths were within 7% of
each other (1121 vs 1203) using identical table structures and processing
code, the only difference was the target server.
I'm happy to answer questions about these results.
Mike Sofen (USA)
			
		Re: Postgres bulk insert/ETL performance on high speed servers - test results
On Thu, Sep 1, 2016 at 11:30 PM, Mike Sofen <msofen@runbox.com> wrote: > PASS 2: > Process: Transform/Load (all work local to the server - read, > transform, write as a single batch) > Num Source Rows: 10,554,800 (one batch from just a single source table > going to a single target table) > Avg Rowcount Compression: 31.5 (jsonb row compression resulting in > 31.5x fewer rows) > AWS Time in Secs: 2,493 (41.5 minutes) > Cisco Time in Secs: 661 (10 minutes) > Difference: 3.8x > Comment:AWS: 4.2k rows/sec Cisco: 16k rows/sec > > It's obvious the size of the batch exceeded the AWS server memory, resulting > in a profoundly slower processing time. This was a true, apples to apples > comparison between Pass 1 and Pass 2: average row lengths were within 7% of > each other (1121 vs 1203) using identical table structures and processing > code, the only difference was the target server. > > I'm happy to answer questions about these results. Are you sure it's a memory thing and not an EBS bandwidth thing? EBS has significantly less bandwidth than direct-attached flash.
From: Claudio Freire   Sent: Friday, September 02, 2016 1:27 PM
On Thu, Sep 1, 2016 at 11:30 PM, Mike Sofen <msofen@runbox.com> wrote:
> It's obvious the size of the batch exceeded the AWS server memory,
> resulting in a profoundly slower processing time. This was a true,
> apples to apples comparison between Pass 1 and Pass 2: average row
> lengths were within 7% of each other (1121 vs 1203) using identical
> table structures and processing code, the only difference was the target server.
>
> I'm happy to answer questions about these results.
Are you sure it's a memory thing and not an EBS bandwidth thing?
EBS has significantly less bandwidth than direct-attached flash.
You raise a good point. However, other disk activities involving large data (like backup/restore and pure large table copying), on both platforms, do not seem to support that notion. I did have both our IT department and Cisco turn on instrumentation for my last test, capturing all aspects of both tests on both platforms, and I’m hoping to see the results early next week and will reply again.
Mike
On 9/4/16 7:34 AM, Mike Sofen wrote: > You raise a good point. However, other disk activities involving large > data (like backup/restore and pure large table copying), on both > platforms, do not seem to support that notion. I did have both our IT > department and Cisco turn on instrumentation for my last test, capturing > all aspects of both tests on both platforms, and I’m hoping to see the > results early next week and will reply again. Something important to remember about Postgres is that it makes virtually no efforts to optimize IO; it throws the entire problem in the OSes lap. So differences in OS config or in IO *latency* can have a massive impact on performance. Because of the sensitivity to IO latency, you can also end up with a workload that only reports say 60% IO utilization but is essentially IO bound (would be 100% IO utilization if enough read-ahead was happening). -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) mobile: 512-569-9461
From: Jim Nasby [mailto:Jim.Nasby@BlueTreble.com]  Sent: Wednesday, September 07, 2016 12:22 PM
On 9/4/16 7:34 AM, Mike Sofen wrote:
> You raise a good point. However, other disk activities involving
> large data (like backup/restore and pure large table copying), on both
> platforms, do not seem to support that notion. I did have both our IT
> department and Cisco turn on instrumentation for my last test,
> capturing all aspects of both tests on both platforms, and I’m hoping
> to see the results early next week and will reply again.
Something important to remember about Postgres is that it makes virtually no efforts to optimize IO; it throws the entire problem in the OSes lap. So differences in OS config or in IO *latency* can have a massive impact on performance. Because of the sensitivity to IO latency, you can also end up with a workload that only reports say 60% IO utilization but is essentially IO bound (would be 100% IO utilization if enough read-ahead was happening).
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461
=============
Hi Jim,
Thanks for that info regarding the sensitivity to IO latency. As it turns out, our network guys have determined while the AWS Direct Connect pipe is running at “normal” speed, the end to latency is quite high and are working with AWS support to see if there are any optimizations to be done. To me, the performance differences have to be tied to networking, especially since it does appear that for these EC2 instances, all data – both SSD and network – is consuming bandwidth in their network “connection”, possibly adding to PG IO pressure. I’ll keep your note in mind as we evaluate next steps.
Mike