Thread: amazon ec2


amazon ec2

From
Joel Reymont
Date:
What are the best practices for setting up PG 9.x on Amazon EC2 to get the best performance?

    Thanks in advance, Joel

--------------------------------------------------------------------------
- for hire: mac osx device driver ninja, kernel extensions and usb drivers
---------------------+------------+---------------------------------------
http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
---------------------+------------+---------------------------------------




Re: amazon ec2

From
Alan Hodgson
Date:
On May 3, 2011 11:48:35 am Joel Reymont wrote:
> What are the best practices for setting up PG 9.x on Amazon EC2 to get the
> best performance?
>

I am also interested in tips for this. EBS seems to suck pretty bad.


Re: amazon ec2

From
Joel Reymont
Date:
On May 3, 2011, at 8:41 PM, Alan Hodgson wrote:

> I am also interested in tips for this. EBS seems to suck pretty bad.

Alan, can you elaborate? Are you using PG on top of EBS?

--------------------------------------------------------------------------
- for hire: mac osx device driver ninja, kernel extensions and usb drivers
---------------------+------------+---------------------------------------
http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
---------------------+------------+---------------------------------------




Re: amazon ec2

From
Alan Hodgson
Date:
On May 3, 2011 12:43:13 pm you wrote:
> On May 3, 2011, at 8:41 PM, Alan Hodgson wrote:
> > I am also interested in tips for this. EBS seems to suck pretty bad.
>
> Alan, can you elaborate? Are you using PG on top of EBS?
>

Trying to, yes.

Let's see ...

EBS volumes seem to vary in speed. Some are relatively fast. Some are really
slow. Some fast ones become slow randomly. Some are fast attached to one
instance, but really slow attached to another.

Fast being a relative term, though. The fast ones seem to be able to do maybe
400 random IOPS. And of course you can only get about 80MB/sec sequential
access to them on a good day.

Which is why I'm interested in how other people are doing it. So far EC2
doesn't seem well suited to running databases at all.

Re: amazon ec2

From
Greg Spiegelberg
Date:
On Tue, May 3, 2011 at 2:09 PM, Alan Hodgson <ahodgson@simkin.ca> wrote:
> On May 3, 2011 12:43:13 pm you wrote:
> > On May 3, 2011, at 8:41 PM, Alan Hodgson wrote:
> > > I am also interested in tips for this. EBS seems to suck pretty bad.
> >
> > Alan, can you elaborate? Are you using PG on top of EBS?
> >
>
> Trying to, yes.
>
> Let's see ...
>
> EBS volumes seem to vary in speed. Some are relatively fast. Some are really
> slow. Some fast ones become slow randomly. Some are fast attached to one
> instance, but really slow attached to another.


I ran pgbench tests late last year comparing EC2, GoGrid, a 5-year-old lab server, and a new server.  Whether I used a stock postgresql.conf or a tweaked one, the current 8.4 or 9.0, or varied the EC2 instance size, EC2 was always at the bottom, ranging from 409.834 to 693.100 tps.  GoGrid's pgbench TPS numbers in similar tests were, on average, 3X those of EC2 (1,399.550 to 1,631.887 tps).  The tests I conducted were small: 10 connections and 5,000 total transactions.  The single variable that helped the pgbench tests in EC2 was selecting an instance size where the number of cores was equal to or greater than the number of connections used in the tests; however, this only improved things slightly (715.931 tps).

For comparison purposes, I ran the same tests on a 24-way X5650 with 12 GB and SAS RAID 10.  This server typically ranged from 2,188.348 to 2,216.377 tps.

I attributed GoGrid's superior performance over EC2 to EC2 simply being over-allocated, but that's just speculation on my part.  To test the theory, I had wanted to put the database on a ramdisk, or a similar device, in both EC2 and GoGrid, but never got around to it.

 
> Fast being a relative term, though. The fast ones seem to be able to do maybe
> 400 random IOPS. And of course you can only get about 80MB/sec sequential
> access to them on a good day.
>
> Which is why I'm interested in how other people are doing it. So far EC2
> doesn't seem well suited to running databases at all.

 
I was running these tests partly to convince management to give me some time to validate our software (PG-backed) on some of the cloud providers, but with those abysmal numbers I didn't even bother at the time.  I may revisit at some point, because I know Amazon at least has been making architecture adjustments and updates.

Greg

Re: amazon ec2

From
Mark Rostron
Date:
iowait is a problem on any platform that relies on spinning media, compared to RAM.
no matter how fast a disk is, and no matter how intelligent the controller is, you are still dealing with an access speed differential of 10^6 (speed of disk access compared to memory access).
i have had good results by avoiding it.
if you can do this, ec2 is not too shabby, but beware - it doesn't come free.
this is achievable under the following circumstances (and maybe there are other ways to do this).
i use a technique of pro-actively querying enough of my anticipated result set with a daemon procedure.
as long as the frequency of your query daemon execution is greater than that of the competitor processes (eg ETL and other update activity), AND a substantial part of the result set will fit in available RAM, then the result set will be served from file system cache at the time you want it.
i have found that it doesn't take much to get this to happen, once you have identified your critical result set.
like - you can get away with running it once/hour, and i'm still reducing the frequency and getting good results.
this approach basically assumes a 90/10 rule - at any point in time, you only want to access 10% of your data. if you can work out what the 10% is, and it will fit into RAM, then you can set it up to cache it.
it also imposes no additional cost in ec2, because Amazon doesn't bill you for CPU activity, although the large-RAM machines do cost more. Depends on how big your critical result set is, and how much speed you need.

don't know if this helps - the success/failure of it depends on your typical query activity, the size of your critical result set, and whether you are able to get enough RAM to make this work.

as i said it doesn't come for free, but you can make it work.
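a rough python sketch of the daemon idea - the query list and the execute hook here are made up, in real use you'd point this at your actual critical result set (via psycopg2 or whatever your stack uses):

```python
import time

# hypothetical "critical result set" - the ~10% of the data the 90/10
# rule says you actually query. replace with your real hot queries.
CRITICAL_QUERIES = [
    "SELECT * FROM orders WHERE created_at > now() - interval '1 day'",
    "SELECT count(*) FROM sessions WHERE active",
]

def warm_cache(execute, queries):
    # run each query and throw the rows away; the side effect we want
    # is that the touched pages land in the filesystem cache
    for q in queries:
        execute(q)

def run_daemon(execute, queries, interval_s=3600, cycles=None):
    # re-warm often enough that ETL/update activity can't evict the
    # hot set; cycles=None means run forever
    n = 0
    while cycles is None or n < cycles:
        warm_cache(execute, queries)
        n += 1
        if cycles is None or n < cycles:
            time.sleep(interval_s)
```

the only tunable that really matters is interval_s - it has to be shorter than the time competing activity takes to push your hot set out of RAM.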

as a further point, try also checking out greenplum - it is an excellent postgres derivative with a very powerful free version. the reason i bring it up is that it offers block-level compression (with caveats - it also doesn't come for free, so do due diligence and rtfm carefully). the compression enabled me to improve the cache hit rate, which further reduces the iowait problem.
greenplum is also a better parallel machine than postgres, so combining the cache technique above with greenplum compression and parallel query, i have been able to get 20:1 reduction in response times for some of our queries.
obviously introducing new database technology is a big deal, but we needed the speed, and it kinda worked.

mr


On Tue, May 3, 2011 at 1:09 PM, Alan Hodgson <ahodgson@simkin.ca> wrote:
> On May 3, 2011 12:43:13 pm you wrote:
> > On May 3, 2011, at 8:41 PM, Alan Hodgson wrote:
> > > I am also interested in tips for this. EBS seems to suck pretty bad.
> >
> > Alan, can you elaborate? Are you using PG on top of EBS?
> >
>
> Trying to, yes.
>
> Let's see ...
>
> EBS volumes seem to vary in speed. Some are relatively fast. Some are really
> slow. Some fast ones become slow randomly. Some are fast attached to one
> instance, but really slow attached to another.
>
> Fast being a relative term, though. The fast ones seem to be able to do maybe
> 400 random IOPS. And of course you can only get about 80MB/sec sequential
> access to them on a good day.
>
> Which is why I'm interested in how other people are doing it. So far EC2
> doesn't seem well suited to running databases at all.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: amazon ec2

From
david@lang.hm
Date:
Phoronix did some benchmarks of the EC2 machines, and they show pretty poor
numbers, especially on the I/O side of things:

http://www.phoronix.com/scan.php?page=article&item=amazon_ec2_round1&num=1
http://www.phoronix.com/scan.php?page=article&item=amazon_ec2_micro&num=1

David Lang


On Tue, 3 May 2011, Alan Hodgson wrote:

> Date: Tue, 3 May 2011 13:09:51 -0700
> From: Alan Hodgson <ahodgson@simkin.ca>
> To: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] amazon ec2
>
> On May 3, 2011 12:43:13 pm you wrote:
>> On May 3, 2011, at 8:41 PM, Alan Hodgson wrote:
>>> I am also interested in tips for this. EBS seems to suck pretty bad.
>>
>> Alan, can you elaborate? Are you using PG on top of EBS?
>>
>
> Trying to, yes.
>
> Let's see ...
>
> EBS volumes seem to vary in speed. Some are relatively fast. Some are really
> slow. Some fast ones become slow randomly. Some are fast attached to one
> instance, but really slow attached to another.
>
> Fast being a relative term, though. The fast ones seem to be able to do maybe
> 400 random IOPS. And of course you can only get about 80MB/sec sequential
> access to them on a good day.
>
> Which is why I'm interested in how other people are doing it. So far EC2
> doesn't seem well suited to running databases at all.
>
>

Re: amazon ec2

From
Josh Berkus
Date:
On 5/3/11 11:48 AM, Joel Reymont wrote:
> What are the best practices for setting up PG 9.x on Amazon EC2 to get the best performance?

Yes.  Don't use EC2.

There is no "best" performance on EC2.  There's not even "good
performance".  Basically, EC2 is the platform for when performance
doesn't matter.

Use a dedicated server, or use a better cloud host.

http://it.toolbox.com/blogs/database-soup/how-to-make-your-database-perform-well-on-amazon-ec2-45725

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: amazon ec2

From
Greg Smith
Date:
Mark Rostron wrote:
> the success/failure of it depends on your typical query activity, the
> size of your critical result set, and whether you are able to get
> enough RAM to make this work.

Basically, it all comes down to "does the working set of data I access
frequently fit in RAM?"  If it does, it's possible to get reasonable
performance out of an EC2 instance.  The EBS disks are so slow, both on
average and particularly in cases where you have contention with other
users slowing you down, that any situation where you have to use them is
never going to work well.  If most of the data fits in RAM, and the CPU
resources available to your instance are sufficient to service your
queries, you might see acceptable performance.

> greenplum is also a better parallel machine than postgres, so
> combining the cache technique above with greenplum compression and
> parallel query, i have been able to get 20:1 reduction in response
> times for some of our queries.

I've also seen over a 20:1 speedup over PostgreSQL by using Greenplum's
free Community Edition server, in situations where its column store +
compression features work well on the data set.  That's easiest with an
append-only workload, and the data set needs to fit within the
constraints where indexes on compressed data are useful.  But if you fit
the use profile it's good at, you end up with considerable ability to
trade-off using more CPU resources to speed up queries.  It effectively
increases the amount of data that can be cached in RAM by a large
multiple, and in the EC2 context (where any access to disk is very slow)
it can be quite valuable.  My colleague Gabrielle wrote something about
setting this up on an earlier version of Greenplum's software at
http://blog.2ndquadrant.com/en/2010/03/installing-greenplum-sne-ec2.html
that gives an idea of how that was set up.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


Re: amazon ec2

From
Greg Smith
Date:
Greg Spiegelberg wrote:
> I ran pgbench tests late last year comparing EC2, GoGrid, a 5 year-old
> lab server and a new server.  Whether I used a stock postgresql.conf
> or tweaked, the current 8.4 or 9.0, or varied the EC2 instance size
> EC2 was always at the bottom ranging from 409.834 to 693.100 tps.
>  GoGrid's pgbench TPS numbers in similar tests were, on average, 3X
> that of EC2 (1,399.550 to 1,631.887 tps).  The tests I conducted were
> small with 10 connections and total 5,000 transactions.  The single
> variable that helped pgbench tests in EC2 was to select an instance
> size where the number of cores was equal to or greater than the number
> of connections I used in the tests however this only improved things
> slightly (715.931 tps).

The standard pgbench test is extremely sensitive to how fast
transactions can be committed to disk.  That doesn't reflect what
performance looks like on most real-world workloads, which tend toward
more reads.  The fact that GoGrid is much faster at doing commits than
EC2 is interesting, but that's only one of many parameters that impact
performance on more normal workloads.

The one parameter that can change how the test runs is turning off
synchronous_commit, which pulls the commit time out of the results to
some extent.  And if you'd switched pgbench to a more read-oriented
test, you'd discover it becomes extremely sensitive to the size of the
database, as set by pgbench's scale parameter during setup.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


Re: amazon ec2

From
Jim Nasby
Date:
On May 3, 2011, at 5:39 PM, Greg Smith wrote:
> I've also seen over a 20:1 speedup over PostgreSQL by using Greenplum's free Community Edition server, in situations
> where its column store + compression features work well on the data set.  That's easiest with an append-only workload,
> and the data set needs to fit within the constraints where indexes on compressed data are useful.  But if you fit the
> use profile it's good at, you end up with considerable ability to trade-off using more CPU resources to speed up
> queries.  It effectively increases the amount of data that can be cached in RAM by a large multiple, and in the EC2
> context (where any access to disk is very slow) it can be quite valuable.

FWIW, EnterpriseDB's "InfiniCache" provides the same caching benefit. The way that works is when PG goes to evict a
page from shared buffers, that page gets compressed and stuffed into a memcache cluster. When PG determines that a given
page isn't in shared buffers, it will then check that memcache cluster before reading the page from disk. This allows you
to cache amounts of data that far exceed the amount of memory you could put in a physical server.
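In miniature, that read path looks something like this - plain dicts standing in for shared buffers and the memcache cluster, zlib standing in for whatever compression is actually used, and the page layout and names made up for illustration:

```python
import zlib

shared_buffers = {}  # page_id -> raw page bytes (small, fast tier)
memcache_tier = {}   # page_id -> compressed page bytes (huge tier)

def evict(page_id):
    # on eviction from shared buffers, compress the page into the
    # memcache tier instead of just dropping it
    page = shared_buffers.pop(page_id)
    memcache_tier[page_id] = zlib.compress(page)

def read_page(page_id, read_from_disk):
    # check shared buffers first, then the compressed tier, then disk
    if page_id in shared_buffers:
        return shared_buffers[page_id]
    if page_id in memcache_tier:
        page = zlib.decompress(memcache_tier[page_id])
    else:
        page = read_from_disk(page_id)  # the slow path being avoided
    shared_buffers[page_id] = page
    return page
```

Since a page of repetitive tuples can compress several-fold, the compressed tier holds a large multiple of its nominal size in logical data - exactly the property you want when the disk behind it is EBS.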
--
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net



Re: amazon ec2

From
Shaun Thomas
Date:
On 05/03/2011 01:48 PM, Joel Reymont wrote:

> What are the best practices for setting up PG 9.x on Amazon EC2 to
> get the best performance?

Use EC2 and other Amazon hosting for cloud-based client access only.
Their shared disk services are universally despised by basically
everyone who has tried to use them for database hosting.

The recommended pattern is to have the scalable cloud clients access
(and memcache) a remote DB at a colo or managed services host. EBS is a
nice idea, and probably fine for things like image or video hosting, but
database access, especially for OLTP databases, will just result in
wailing and gnashing of teeth.

Just ask anyone who got bit by the recent EBS failure that spanned
*several* availability zones. For all those clients who thought they
were safe by deploying across multiple AZs, it was a rather rude awakening.

http://aws.amazon.com/message/65648/

The consensus seems to be that Amazon's cloud is fine... so long as you
stay far, far away from EBS. Apparently that needs a little more work.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com

______________________________________________

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email

Re: amazon ec2

From
Josh Berkus
Date:
> FWIW, EnterpriseDB's "InfiniCache" provides the same caching benefit. The way that works is when PG goes to evict a
> page from shared buffers, that page gets compressed and stuffed into a memcache cluster. When PG determines that a given
> page isn't in shared buffers, it will then check that memcache cluster before reading the page from disk. This allows you
> to cache amounts of data that far exceed the amount of memory you could put in a physical server.

So memcached basically replaces the filesystem?

That sounds cool, but I'm wondering if it's actually a performance
speedup.  Seems like it would only be a benefit for single-row lookups;
any large reads would be a mess.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: amazon ec2

From
david@lang.hm
Date:
On Wed, 4 May 2011, Josh Berkus wrote:

> Date: Wed, 04 May 2011 17:02:53 -0700
> From: Josh Berkus <josh@agliodbs.com>
> To: postgres performance list <pgsql-performance@postgresql.org>
> Subject: Re: [PERFORM] amazon ec2
>
>
>> FWIW, EnterpriseDB's "InfiniCache" provides the same caching benefit. The way that works is when PG goes to evict a
>> page from shared buffers, that page gets compressed and stuffed into a memcache cluster. When PG determines that a given
>> page isn't in shared buffers, it will then check that memcache cluster before reading the page from disk. This allows you
>> to cache amounts of data that far exceed the amount of memory you could put in a physical server.
>
> So memcached basically replaces the filesystem?
>
> That sounds cool, but I'm wondering if it's actually a performance
> speedup.  Seems like it would only be a benefit for single-row lookups;
> any large reads would be a mess.

I think it would depend a lot on how well the page compresses.

if the network I/O plus decompression time is faster than the seek and
read from disk, it should be a win.
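with some assumed order-of-magnitude latencies (these vary a lot by hardware and network, so treat them as illustrative only):

```python
# back-of-envelope comparison, all numbers assumed for illustration
disk_seek_read_ms = 8.0  # one random read on spinning media
memcache_rtt_ms = 0.3    # LAN round trip to a memcached node
decompress_ms = 0.05     # zlib on an ~8kB page, roughly

memcache_path_ms = memcache_rtt_ms + decompress_ms
speedup = disk_seek_read_ms / memcache_path_ms  # ~20x per random page
```

the gap only closes when the disk read is sequential and readahead hides the seeks, which is the multipage case below.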

I don't see why the benefit would be limited to single-row lookups;
anything within that page should work the same.

for multipage actions, the disk would have less of a disadvantage as
readahead may be able to hide some of the work while the memcache approach
would need to do separate transactions for each page.

this does bring up an interesting variation for hierarchical storage,
using compressed pages in memcache rather than dedicated resources on a
local system. thanks for the thought, and I'll keep it in mind. I can think
of lots of cases where the database stores a relatively small set of values
that would compress well.

David Lang

Re: amazon ec2

From
Denis de Bernardy
Date:
----- Original Message -----

> From: Josh Berkus <josh@agliodbs.com>
> To: postgres performance list <pgsql-performance@postgresql.org>
> Cc:
> Sent: Thursday, May 5, 2011 2:02 AM
> Subject: Re: [PERFORM] amazon ec2
> So memcached basically replaces the filesystem?
>
> That sounds cool, but I'm wondering if it's actually a performance
> speedup.  Seems like it would only be a benefit for single-row lookups;
> any large reads would be a mess.


I've never tested with pgsql, but with mysql it makes a *huge* difference when you're pulling data repeatedly.
Multi-row lookups can be cached too:

$key  = md5($query . '--' . serialize($args));
$rows = $cache->get($key);

if ($rows === false) { // memcached returns false on a miss
    $rows = run_query($query, $args); // run_query() = whatever your DB layer is
    $cache->set($key, $rows, 4 * 3600); // cache for a few hours
}

This is true even with mysql's caching features turned on. You spare the DB from doing identical queries that get
repeated over and over. Memcache lets you pull those straight from memory, allowing the DB server to handle new
queries exclusively.


Re: amazon ec2

From
Dave Page
Date:
On Thu, May 5, 2011 at 1:02 AM, Josh Berkus <josh@agliodbs.com> wrote:
>
>> FWIW, EnterpriseDB's "InfiniCache" provides the same caching benefit. The way that works is when PG goes to evict a
>> page from shared buffers, that page gets compressed and stuffed into a memcache cluster. When PG determines that a given
>> page isn't in shared buffers, it will then check that memcache cluster before reading the page from disk. This allows you
>> to cache amounts of data that far exceed the amount of memory you could put in a physical server.
>
> So memcached basically replaces the filesystem?

No, it sits in between shared buffers and the filesystem, effectively
providing an additional layer of extremely large, compressed cache.
Even on a single server there can be benefits over larger shared
buffers due to the compression.

> That sounds cool, but I'm wondering if it's actually a performance
> speedup.  Seems like it would only be a benefit for single-row lookups;
> any large reads would be a mess.

Depends on the database and the workload - if you can fit your entire
100GB database in cache, and your workload is read intensive then the
speedups are potentially huge (I've seen benchmarks showing 20x+).
Write-intensive workloads benefit less, as do cases where the working
set is far larger than your cache size.

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company