Обсуждение: Migrating to Postgresql and new hardware

От:
Lars
Дата:

Hi,

We are in the process of moving a web based application from a MySql to Postgresql database.
Our main reason for moving to Postgresql is problems with MySql (MyISAM) table locking.
We will buy a new set of servers to run the Postgresql databases.

The current setup is five Dell PowerEdge 2950 with 2 *  XEON E5410, 4GB RAM. PERC 5/I 256MB NV Cache, 4 * 10K Disks (3
inRAID 5 + 1 spare). 

One server is used for shared data.
Four servers are used for sharded data. A user in the system only has data in one of the shards.
There is another server to which all data is replicated but I'll leave that one out of this discussion.
These are dedicated database servers. There are more or less no stored procedures. The shared database size is about
20GBand each shard database is about 40GB (total of 20 + 40 * 4 = 180GB). I would expect the size will grow 10%-15%
thisyear. Server load might increase with 15%-30% this year. This setup is disk I/O bound. The overwhelming majority of
sqlstatements are fast (typically single row selects, updates, inserts and deletes on primary key) but there are some
slowlong running (10min) queries. 

As new server we are thinking of PowerEdge R510, 1 * Xeon X5650, 24Gb RAM, H700 512MB NV Cache.
Dell has offered two alternative SSDs:
Samsung model SS805 (100GB Solid State Disk SATA 2.5").
(http://www.plianttechnology.com/lightning_lb.php)
Pliant model LB 150S (149GB Solid State Drive SAS 3Gbps 2.5").
(http://www.samsung.com/global/business/semiconductor/products/SSD/Products_Enterprise_SSD.html)

Both are SLC drives. The price of the Pliant is about 2,3 times the price of the Samsung (does it have twice the
performance?).

One alternative is 5 servers (1 shared and 4 shards) with 5 Samsung drives (4 in RAID 10 + 1 spare).
Another alternative would be 3 servers (1 shared and 2 shards) with 5 Pliant drives (4 in RAID 10 + 1 spare). This
wouldbe slightly more expensive than the first alternative but would be easier to upgrade with two new shard servers
whenit's needed. 

Anyone have experience using the Samsung or the Pliant SSD? Any information about degraded performance over time?
Any comments on the setups? How would an alternative with 15K disks (6 RAID 10 + 1 spare, or even 10 RAID10 + 1 spare)
compare?
How would these alternatives compare in I/O performance compared to the old setup?
Anyone care to guess how the two alternatives would compare in performance running Postgresql?
How would the hardware usage of Postgresql compare to MySqls?



Regards
/Lars

От:
Andy Colson
Дата:

On 1/18/2011 4:56 AM, Lars wrote:
> Hi,
>
> We are in the process of moving a web based application from a MySql
> to Postgresql database. Our main reason for moving to Postgresql is
> problems with MySql (MyISAM) table locking. We will buy a new set of
> servers to run the Postgresql databases.
>
> The current setup is five Dell PowerEdge 2950 with 2 *  XEON E5410,
> 4GB RAM. PERC 5/I 256MB NV Cache, 4 * 10K Disks (3 in RAID 5 + 1
> spare).
>
> One server is used for shared data. Four servers are used for sharded
> data. A user in the system only has data in one of the shards. There
> is another server to which all data is replicated but I'll leave that
> one out of this discussion. These are dedicated database servers.
> There are more or less no stored procedures. The shared database size
> is about 20GB and each shard database is about 40GB (total of 20 + 40
> * 4 = 180GB). I would expect the size will grow 10%-15% this year.
> Server load might increase with 15%-30% this year. This setup is disk
> I/O bound. The overwhelming majority of sql statements are fast
> (typically single row selects, updates, inserts and deletes on
> primary key) but there are some slow long running (10min) queries.
>

No idea what mysql thinks a shard is, but in PG we have read-only hot
standby's.

The standby database is exactly the same as the master (save a bit of
data that has not been synced yet.)  I assume you know this... but I'd
really recommend trying out PG's hot-standby and make sure it works the
way you need (because I bet its different than mysql's).

Assuming the "shared" and the "sharded" databases are totally different
(lets call them database a and c), with the PG setup you'd have database
a on one computer, then one master with database b on it (where all
writes go), then several hot-standby's mirroring database b (that
support read-only queries).

As for the hardware, you'd better test it.  Got any old servers you
could put a real-world workload on?  Or just buy one new server for
testing?  Its pretty hard to guess what your usage pattern is (70% read,
  small columns, no big blobs (like photos), etc)... and even then we'd
still have to guess.

I can tell you, however, having your readers and writers not block each
other is really nice.

Not only will I not compare apples to oranges, but I really wont compare
apples in Canada to oranges in Japan. :-)

-Andy

От:
Andy Colson
Дата:

oops, call them database 'a' and database 'b'.

От:
"Strange, John W"
Дата:

Are you going to RAID the SSD drives at all?  You would likely be better off with a couple of things.

- increasing ram to 256GB on each server to cache most of the databases. (easy, and cheaper than SSD)
- move to fusionIO
- move to SLC based SSD, warning not many raid controllers will get the performance out of the SSD's at this time.

Of the three I would suggest #1, and #2, the cost of a SLC SSD raid will cost more than the fusionIO drive and still
notmatch the fusionIO drive performance.
 

Of course this is based on my experience, and I have my fireproof suit since I mentioned the word fusionIO :)

- John

-----Original Message-----
From:  [mailto:] On Behalf Of Lars
Sent: Tuesday, January 18, 2011 4:57 AM
To: 
Subject: [PERFORM] Migrating to Postgresql and new hardware

Hi,

We are in the process of moving a web based application from a MySql to Postgresql database.
Our main reason for moving to Postgresql is problems with MySql (MyISAM) table locking.
We will buy a new set of servers to run the Postgresql databases.

The current setup is five Dell PowerEdge 2950 with 2 *  XEON E5410, 4GB RAM. PERC 5/I 256MB NV Cache, 4 * 10K Disks (3
inRAID 5 + 1 spare).
 

One server is used for shared data.
Four servers are used for sharded data. A user in the system only has data in one of the shards.
There is another server to which all data is replicated but I'll leave that one out of this discussion.
These are dedicated database servers. There are more or less no stored procedures. The shared database size is about
20GBand each shard database is about 40GB (total of 20 + 40 * 4 = 180GB). I would expect the size will grow 10%-15%
thisyear. Server load might increase with 15%-30% this year. This setup is disk I/O bound. The overwhelming majority of
sqlstatements are fast (typically single row selects, updates, inserts and deletes on primary key) but there are some
slowlong running (10min) queries.
 

As new server we are thinking of PowerEdge R510, 1 * Xeon X5650, 24Gb RAM, H700 512MB NV Cache.
Dell has offered two alternative SSDs:
Samsung model SS805 (100GB Solid State Disk SATA 2.5").
(http://www.plianttechnology.com/lightning_lb.php)
Pliant model LB 150S (149GB Solid State Drive SAS 3Gbps 2.5").
(http://www.samsung.com/global/business/semiconductor/products/SSD/Products_Enterprise_SSD.html)

Both are SLC drives. The price of the Pliant is about 2,3 times the price of the Samsung (does it have twice the
performance?).

One alternative is 5 servers (1 shared and 4 shards) with 5 Samsung drives (4 in RAID 10 + 1 spare).
Another alternative would be 3 servers (1 shared and 2 shards) with 5 Pliant drives (4 in RAID 10 + 1 spare). This
wouldbe slightly more expensive than the first alternative but would be easier to upgrade with two new shard servers
whenit's needed.
 

Anyone have experience using the Samsung or the Pliant SSD? Any information about degraded performance over time?
Any comments on the setups? How would an alternative with 15K disks (6 RAID 10 + 1 spare, or even 10 RAID10 + 1 spare)
compare?
How would these alternatives compare in I/O performance compared to the old setup?
Anyone care to guess how the two alternatives would compare in performance running Postgresql?
How would the hardware usage of Postgresql compare to MySqls?



Regards
/Lars

-- 
Sent via pgsql-performance mailing list ()
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
This communication is for informational purposes only. It is not
intended as an offer or solicitation for the purchase or sale of
any financial instrument or as an official confirmation of any
transaction. All market prices, data and other information are not
warranted as to completeness or accuracy and are subject to change
without notice. Any comments or statements made herein do not
necessarily reflect those of JPMorgan Chase & Co., its subsidiaries
and affiliates.

This transmission may contain information that is privileged,
confidential, legally privileged, and/or exempt from disclosure
under applicable law. If you are not the intended recipient, you
are hereby notified that any disclosure, copying, distribution, or
use of the information contained herein (including any reliance
thereon) is STRICTLY PROHIBITED. Although this transmission and any
attachments are believed to be free of any virus or other defect
that might affect any computer system into which it is received and
opened, it is the responsibility of the recipient to ensure that it
is virus free and no responsibility is accepted by JPMorgan Chase &
Co., its subsidiaries and affiliates, as applicable, for any loss
or damage arising in any way from its use. If you received this
transmission in error, please immediately contact the sender and
destroy the material in its entirety, whether in electronic or hard
copy format. Thank you.

Please refer to http://www.jpmorgan.com/pages/disclosures for
disclosures relating to European legal entities.

От:
"Mark Felder"
Дата:

On Tue, 18 Jan 2011 16:06:17 -0600, Strange, John W
<> wrote:

> Of course this is based on my experience, and I have my fireproof suit
> since I mentioned the word fusionIO :)

I'll throw a fire blanket up as well. We have a customer who has been
running Fusion IO with Postgres for about 2 years. They get amazing
performance, but also aren't running fsync. They haven't had corruption
with OS crashes (they're very abusive to their CentOS install), but did
with a power outage (a UPS of ours went up in smoke; they weren't paying
for N+1 power). Now their data is mostly static; they run analytics once a
day and if they have a problem they can reload yesterday's data and run
the analytics again to get back up to speed. If this is the type of stuff
you're doing and you can easily get your data back to a sane state by all
means give FusionIO a whirl!

This customer did discuss this with me in length last time they stopped in
and also pointed out that FusionIO was announced as being a major part of
some trading company or bank firm's database performance junk. I don't
know the details, but I think he said they were out of Chicago. If anyone
knows what I'm talking about please share the link. Either way, it seems
that people are actually doing money transactions on FusionIO, so you can
either take that as comforting reassurance or you can start getting really
nervous about the stock market :-)


Regards,


Mark


PS, don't turn off fsync unless you know what you're doing.

От:
"mark"
Дата:

Comments in line, take em for what you paid for em.



> -----Original Message-----
> From:  [mailto:pgsql-performance-
> ] On Behalf Of Lars
> Sent: Tuesday, January 18, 2011 3:57 AM
> To: 
> Subject: [PERFORM] Migrating to Postgresql and new hardware
>
> Hi,
>
> We are in the process of moving a web based application from a MySql to
> Postgresql database.
> Our main reason for moving to Postgresql is problems with MySql
> (MyISAM) table locking.

I would never try and talk someone out of switching but.... MyISAM? What
version of MySQL and did you pick MyISAM for a good reason or just happened
to end up there?



> We will buy a new set of servers to run the Postgresql databases.
>
> The current setup is five Dell PowerEdge 2950 with 2 *  XEON E5410, 4GB
> RAM. PERC 5/I 256MB NV Cache, 4 * 10K Disks (3 in RAID 5 + 1 spare).
>
> One server is used for shared data.
> Four servers are used for sharded data. A user in the system only has
> data in one of the shards.
> There is another server to which all data is replicated but I'll leave
> that one out of this discussion.
> These are dedicated database servers. There are more or less no stored
> procedures. The shared database size is about 20GB and each shard
> database is about 40GB (total of 20 + 40 * 4 = 180GB). I would expect
> the size will grow 10%-15% this year. Server load might increase with
> 15%-30% this year. This setup is disk I/O bound. The overwhelming
> majority of sql statements are fast (typically single row selects,
> updates, inserts and deletes on primary key) but there are some slow
> long running (10min) queries.
>
> As new server we are thinking of PowerEdge R510, 1 * Xeon X5650, 24Gb
> RAM, H700 512MB NV Cache.

One would think you should notice a nice speed improvement, ceteris paribus,
since the X5650 will have ->significantly<- more memory bandwidth than the
5410s you are used to, and you are going to have a heck of a lot more ram
for things to cache in. I think the H700 is a step up in raid cards as well
but with only 4 disks your probably not maxing out there.



> Dell has offered two alternative SSDs:
> Samsung model SS805 (100GB Solid State Disk SATA 2.5").
> (http://www.plianttechnology.com/lightning_lb.php)
> Pliant model LB 150S (149GB Solid State Drive SAS 3Gbps 2.5").
> (http://www.samsung.com/global/business/semiconductor/products/SSD/Prod
> ucts_Enterprise_SSD.html)

The Samsung ones seems to indicate that they have protection in the event of
a power failure, and the pliant does not mention it.

Granted I haven't done or seen any pull the plug under max load tests on
either family, so I got nothing beyond that it is the first thing I have
looked at with every SSD that crosses my path.



>
> Both are SLC drives. The price of the Pliant is about 2,3 times the
> price of the Samsung (does it have twice the performance?).
>
> One alternative is 5 servers (1 shared and 4 shards) with 5 Samsung
> drives (4 in RAID 10 + 1 spare).
> Another alternative would be 3 servers (1 shared and 2 shards) with 5
> Pliant drives (4 in RAID 10 + 1 spare). This would be slightly more
> expensive than the first alternative but would be easier to upgrade
> with two new shard servers when it's needed.

As others have mentioned, how are you going to be doing your "shards"?



>
> Anyone have experience using the Samsung or the Pliant SSD? Any
> information about degraded performance over time?
> Any comments on the setups? How would an alternative with 15K disks (6
> RAID 10 + 1 spare, or even 10 RAID10 + 1 spare) compare?


You still may find that breaking xlog out to its own logical drive (2 drives
in raid 1) gives a speed improvement to the overall. YMMV - so tinker and
find out before you go deploying.

> How would these alternatives compare in I/O performance compared to the
> old setup?
> Anyone care to guess how the two alternatives would compare in
> performance running Postgresql?
> How would the hardware usage of Postgresql compare to MySqls?


I won't hazard a guess on the performance difference between PG w/ Fsync ON
and MySQL running with MyISAM.

If you can get your OS and PG tuned you should be able to have a database
that can have pretty decent throughput for an OLTP workload. Since that
seems to be the majority of your intended workload.


-Mark

>
>
>
> Regards
> /Lars
>
> --
> Sent via pgsql-performance mailing list (pgsql-
> )
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance


От:
Craig Ringer
Дата:

On 18/01/11 18:56, Lars wrote:
> Hi,
>
> We are in the process of moving a web based application from a MySql to Postgresql database.
> Our main reason for moving to Postgresql is problems with MySql (MyISAM) table locking.
> We will buy a new set of servers to run the Postgresql databases.

Most people seem to simply move over to InnoDB when facing these issues,
saving themselves LOTS of pain over MyISAM while minimizing transition
costs. I assume you've rejected that, but I'm interested in why.

> The current setup is five Dell PowerEdge 2950 with 2 *  XEON E5410, 4GB RAM. PERC 5/I 256MB NV Cache, 4 * 10K Disks
(3in RAID 5 + 1 spare). 
>
> One server is used for shared data.
> Four servers are used for sharded data. A user in the system only has data in one of the shards.
> There is another server to which all data is replicated but I'll leave that one out of this discussion.

Don't, if you want to have a similar thing going in your Pg deployment
later. Replication in Pg remains ... interesting. An n-to-m (or n-to-1)
replication setup can't be achieved with the built-in replication in
9.0; you need to use things like Slony-I, Bucardo, etc each of which
have their own limitations and quirks.

> These are dedicated database servers. There are more or less no stored procedures. The shared database size is about
20GBand each shard database is about 40GB (total of 20 + 40 * 4 = 180GB). I would expect the size will grow 10%-15%
thisyear. Server load might increase with 15%-30% this year. This setup is disk I/O bound. The overwhelming majority of
sqlstatements are fast (typically single row selects, updates, inserts and deletes on primary key) but there are some
slowlong running (10min) queries. 

Since you're sharding (and thus clearly don't need strong cluster-wide
ACID) have you considered looking into relaxed semi-ACID / eventually
consistent database systems? If you're doing lots of simple queries and
few of the kind of heavy lifting reporting queries RDBMSs are great for,
it may be worth considering.

If your app uses a data acesss layer, it should be pretty easy to
prototype implementations on other databases and try them out.

Even if you do go for PostgreSQL, if you're not using memcached yet
you're wasting money and effort. You might get lots more life out of
your hardware with a bit of memcached love.

--
System & Network Administrator
POST Newspapers

От:
Scott Marlowe
Дата:

On Tue, Jan 18, 2011 at 3:56 AM, Lars <> wrote:

> Any comments on the setups? How would an alternative with 15K disks (6 RAID 10 + 1 spare, or even 10 RAID10 + 1
spare)compare? 

RAID-10 is going to trounce RAID-5 for writes, which is where you
usually have the most issues.

> How would these alternatives compare in I/O performance compared to the old setup?

Only testing can tell, but I've seen 4 SATA drives in RAID-10 with no
caching or fancy controller beat a 4 disk RAID-5 with BBU controller
more than once.

От:
Lars
Дата:

> No idea what mysql thinks a shard is, but in PG we have read-only hot standby's.
I used sharding as an expression for partitioning data into several databases.
Each user in the system is unaware of any other user. The user never accesses the private data of another user. Each
usercould in theory be assigned their own database server. This makes it easy to split the 40000 users over a number of
databaseservers. There are some shared data that is stored in a special "shared" database. 

> The standby database is exactly the same as the master (save a bit of
> data that has not been synced yet.)  I assume you know this... but I'd
> really recommend trying out PG's hot-standby and make sure it works the
> way you need (because I bet its different than mysql's).

> Assuming the "shared" and the "sharded" databases are totally different
> (lets call them database a and c), with the PG setup you'd have database
> a on one computer, then one master with database b on it (where all
> writes go), then several hot-standby's mirroring database b (that
> support read-only queries).
As our data is easily partitioned into any number of servers we do not plan to use replication for load balancing. We
dohowever plan to use it to set up a backup site. 

>  Its pretty hard to guess what your usage pattern is (70% read,
>   small columns, no big blobs (like photos), etc)... and even then we'd
> still have to guess.
It's more like 40% read 60% write.

> Not only will I not compare apples to oranges, but I really wont compare
> apples in Canada to oranges in Japan. :-)
Hehe

/Lars

От:
Lars
Дата:

> Are you going to RAID the SSD drives at all?
Yes, I was thinking four drives in RAID 10 and a (hot) spare drive...

> Of course this is based on my experience, and I have my fireproof suit since
> I mentioned the word fusionIO :)
Hehe

FusionIO has some impressive stats!
SSD in RAID10 provides redundancy in case of disc failure. How do you handle this with fusionIO? Two mirrored cards?

/Lars

От:
Lars
Дата:

Thanks for the reply!

MyISAM was chosen back in 2000. I'm not aware of the reasoning behind this choice...

Dell claims both the Samsung and the Pliant are safe to use.
Below is a quote from the Pliant datasheet:
"No Write Cache:
Pliant EFDs deliver outstanding
write performance
without any dependence on
write cache and thus does
not use battery/supercap."

> As others have mentioned, how are you going to be doing your "shards"?
Hmm... shards might not have been a good word to describe it. I'll paste what I wrote in another reply:
I used sharding as an expression for partitioning data into several databases.
Each user in the system is unaware of any other user. The user never accesses the private data of another user. Each
usercould in theory be assigned their own database server. This makes it easy to split the 40000 users over a number of
databaseservers. There are some shared data that is stored in a special "shared" database. 

/Lars

-----Ursprungligt meddelande-----
Från: mark [mailto:]
Skickat: den 19 januari 2011 05:10
Till: Lars
Kopia: 
Ämne: RE: [PERFORM] Migrating to Postgresql and new hardware

Comments in line, take em for what you paid for em.



> -----Original Message-----
> From:  [mailto:pgsql-performance-
> ] On Behalf Of Lars
> Sent: Tuesday, January 18, 2011 3:57 AM
> To: 
> Subject: [PERFORM] Migrating to Postgresql and new hardware
>
> Hi,
>
> We are in the process of moving a web based application from a MySql to
> Postgresql database.
> Our main reason for moving to Postgresql is problems with MySql
> (MyISAM) table locking.

I would never try and talk someone out of switching but.... MyISAM? What
version of MySQL and did you pick MyISAM for a good reason or just happened
to end up there?



> We will buy a new set of servers to run the Postgresql databases.
>
> The current setup is five Dell PowerEdge 2950 with 2 *  XEON E5410, 4GB
> RAM. PERC 5/I 256MB NV Cache, 4 * 10K Disks (3 in RAID 5 + 1 spare).
>
> One server is used for shared data.
> Four servers are used for sharded data. A user in the system only has
> data in one of the shards.
> There is another server to which all data is replicated but I'll leave
> that one out of this discussion.
> These are dedicated database servers. There are more or less no stored
> procedures. The shared database size is about 20GB and each shard
> database is about 40GB (total of 20 + 40 * 4 = 180GB). I would expect
> the size will grow 10%-15% this year. Server load might increase with
> 15%-30% this year. This setup is disk I/O bound. The overwhelming
> majority of sql statements are fast (typically single row selects,
> updates, inserts and deletes on primary key) but there are some slow
> long running (10min) queries.
>
> As new server we are thinking of PowerEdge R510, 1 * Xeon X5650, 24Gb
> RAM, H700 512MB NV Cache.

One would think you should notice a nice speed improvement, ceteris paribus,
since the X5650 will have ->significantly<- more memory bandwidth than the
5410s you are used to, and you are going to have a heck of a lot more ram
for things to cache in. I think the H700 is a step up in raid cards as well
but with only 4 disks your probably not maxing out there.



> Dell has offered two alternative SSDs:
> Samsung model SS805 (100GB Solid State Disk SATA 2.5").
> (http://www.plianttechnology.com/lightning_lb.php)
> Pliant model LB 150S (149GB Solid State Drive SAS 3Gbps 2.5").
> (http://www.samsung.com/global/business/semiconductor/products/SSD/Prod
> ucts_Enterprise_SSD.html)

The Samsung ones seems to indicate that they have protection in the event of
a power failure, and the pliant does not mention it.

Granted I haven't done or seen any pull the plug under max load tests on
either family, so I got nothing beyond that it is the first thing I have
looked at with every SSD that crosses my path.



>
> Both are SLC drives. The price of the Pliant is about 2,3 times the
> price of the Samsung (does it have twice the performance?).
>
> One alternative is 5 servers (1 shared and 4 shards) with 5 Samsung
> drives (4 in RAID 10 + 1 spare).
> Another alternative would be 3 servers (1 shared and 2 shards) with 5
> Pliant drives (4 in RAID 10 + 1 spare). This would be slightly more
> expensive than the first alternative but would be easier to upgrade
> with two new shard servers when it's needed.

As others have mentioned, how are you going to be doing your "shards"?



>
> Anyone have experience using the Samsung or the Pliant SSD? Any
> information about degraded performance over time?
> Any comments on the setups? How would an alternative with 15K disks (6
> RAID 10 + 1 spare, or even 10 RAID10 + 1 spare) compare?


You still may find that breaking xlog out to its own logical drive (2 drives
in raid 1) gives a speed improvement to the overall. YMMV - so tinker and
find out before you go deploying.

> How would these alternatives compare in I/O performance compared to the
> old setup?
> Anyone care to guess how the two alternatives would compare in
> performance running Postgresql?
> How would the hardware usage of Postgresql compare to MySqls?


I won't hazard a guess on the performance difference between PG w/ Fsync ON
and MySQL running with MyISAM.

If you can get your OS and PG tuned you should be able to have a database
that can have pretty decent throughput for an OLTP workload. Since that
seems to be the majority of your intended workload.


-Mark

>
>
>
> Regards
> /Lars
>
> --
> Sent via pgsql-performance mailing list (pgsql-
> )
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance


От:
Craig Ringer
Дата:

On 01/19/2011 05:09 PM, Lars wrote:
> Thanks for the reply!
>
> MyISAM was chosen back in 2000. I'm not aware of the reasoning behind this choice...
>
> Dell claims both the Samsung and the Pliant are safe to use.
> Below is a quote from the Pliant datasheet:
> "No Write Cache:
> Pliant EFDs deliver outstanding
> write performance
> without any dependence on
> write cache and thus does
> not use battery/supercap."

Er ... magic? I wouldn't trust them without details on *how* it achieves
good performance, and what "good" is.

Is there *any* device on the market that efficiently handles lots of
small writes?

>
>> As others have mentioned, how are you going to be doing your "shards"?
> Hmm... shards might not have been a good word to describe it. I'll paste what I wrote in another reply:
> I used sharding as an expression for partitioning data into several databases.

"sharding" or "shards" is pretty much the standard way that setup is
described. It doesn't come up on the Pg list a lot as most people doing
web-oriented horizontally scaled apps use MySQL or fashionable non-SQL
databases, but it's pretty well known in wider circles.

--
Craig Ringer

От:
Andy Colson
Дата:

On 1/19/2011 6:42 PM, Craig Ringer wrote:
> On 01/19/2011 05:09 PM, Lars wrote:
>> Thanks for the reply!
>>
>>
>>> As others have mentioned, how are you going to be doing your "shards"?
>> Hmm... shards might not have been a good word to describe it. I'll
>> paste what I wrote in another reply:
>> I used sharding as an expression for partitioning data into several
>> databases.
>
> "sharding" or "shards" is pretty much the standard way that setup is
> described. It doesn't come up on the Pg list a lot as most people doing
> web-oriented horizontally scaled apps use MySQL or fashionable non-SQL
> databases, but it's pretty well known in wider circles.
>
> --
> Craig Ringer
>


Or... PG is just so good we've never had to use more than one database
server!  :-)

-Andy

От:
Scott Marlowe
Дата:

On Thu, Jan 20, 2011 at 7:48 AM, Andy Colson <> wrote:

> Or... PG is just so good we've never had to use more than one database
> server!  :-)

Hehe, while you can do a lot with one server, there are some scenarios
where sharding is the answer.  I have a horror story about not
sharding when we should have I can tell you over a beer sometime.

От:
Greg Smith
Дата:

Lars wrote:
> Below is a quote from the Pliant datasheet:
> "No Write Cache:
> Pliant EFDs deliver outstanding
> write performance
> without any dependence on
> write cache and thus does
> not use battery/supercap."
>

I liked the article The Register wrote about them, with the headline
"Pliant's SSDs are awesome, says Pliant".  Of course they do.  Check out
the write benchmark figures in the information review at
http://oliveraaltonen.com/2010/09/29/preliminary-benchmark-results-of-the-pliant-ssd-drives/
to see how badly performance suffers on their design from those
decisions.  The Fusion I/O devices get nearly an order of magnitude more
write IOPS in those tests.

As far as I've been able to tell, what Pliant does is just push writes
out all the time without waiting for them to be aligned with block
sizes, followed by cleaning up the wreckage later via their internal
automatic maintenance ASICs (it's sort of an always on TRIM
implementation if I'm guessing right).  That has significant limitations
both in regards to total write speed as well as device longevity.  For a
database, I'd much rather have a supercap and get ultimate write
performance without those downsides.  Depends on the read/write ratio
though; I could see a heavily read-biased system work well with their
approach.  Of course, a heavily read-based system would be better served
by having a ton of RAM instead in most cases.

Could be worst though--they could be misleading about the whole topic of
write durability like Intel is.  I consider claiming high performance
when you don't always really have it, what Pliant is doing here, to be a
much lesser sin than losing data at random and not being clear about
when that can happen.  I'd like FusionIO to put a big "expect your
server to be down for many minutes after a power interruption" warning
on their drives, too, while I'm wishing for complete vendor transparency
here.

--
Greg Smith   2ndQuadrant US       Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


От:
Lars
Дата:

Interesting.
Would have been nice if the test was with a raid-10 setup as raid-5 is not very good for writes...

Would you get much of a performance increase with a write-cached ssd even if you got a raid controller with
(battery-backed)cache? 

/Lars

-----Ursprungligt meddelande-----
Från: Greg Smith [mailto:]
Skickat: den 29 januari 2011 01:27
Till: Lars
Kopia: mark; 
Ämne: Re: [PERFORM] Migrating to Postgresql and new hardware

Lars wrote:
> Below is a quote from the Pliant datasheet:
> "No Write Cache:
> Pliant EFDs deliver outstanding
> write performance
> without any dependence on
> write cache and thus does
> not use battery/supercap."
>

I liked the article The Register wrote about them, with the headline
"Pliant's SSDs are awesome, says Pliant".  Of course they do.  Check out
the write benchmark figures in the information review at
http://oliveraaltonen.com/2010/09/29/preliminary-benchmark-results-of-the-pliant-ssd-drives/
to see how badly performance suffers on their design from those
decisions.  The Fusion I/O devices get nearly an order of magnitude more
write IOPS in those tests.

As far as I've been able to tell, what Pliant does is just push writes
out all the time without waiting for them to be aligned with block
sizes, followed by cleaning up the wreckage later via their internal
automatic maintenance ASICs (it's sort of an always on TRIM
implementation if I'm guessing right).  That has significant limitations
both in regards to total write speed as well as device longevity.  For a
database, I'd much rather have a supercap and get ultimate write
performance without those downsides.  Depends on the read/write ratio
though; I could see a heavily read-biased system work well with their
approach.  Of course, a heavily read-based system would be better served
by having a ton of RAM instead in most cases.

Could be worst though--they could be misleading about the whole topic of
write durability like Intel is.  I consider claiming high performance
when you don't always really have it, what Pliant is doing here, to be a
much lesser sin than losing data at random and not being clear about
when that can happen.  I'd like FusionIO to put a big "expect your
server to be down for many minutes after a power interruption" warning
on their drives, too, while I'm wishing for complete vendor transparency
here.

--
Greg Smith   2ndQuadrant US       Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books