SSD options, small database, ZFS

From: CSS

Hello all,

I've spent some time looking through previous posts regarding
postgres and SSD drives and have also been reading up on the
subject of SSDs in general elsewhere.

Some quick background:

We're currently looking at changing our basic database setup as we
migrate away from some rather old (circa 2006 or so) hardware to
more current gear.  Our previous setup basically consisted of two
types of 1U boxes: dual-xeon machines with a low-end Adaptec RAID card
paired with two U320 or U160 10K SCSI drives for database and light
web frontend and mail submission duties, and single-P4 and dual-PIII
boxes with commodity drives handling mail deliveries.  Most of the
delivery servers have been migrated to current hardware (single and
dual quad-core xeons, 4 WD RE3 SATA drives, FreeBSD w/ZFS), and
we've also moved a handful of the db/web services to one of these
servers just to see what a large increase in RAM and CPU can do for
us.  As we sort of expected, the performance overall was better,
but there's a serious disk IO wall we hit when running nightly jobs
that are write heavy and long-running.

There are currently four db/web servers that have a combined total
of 133GB of on-disk postgres data.  Given the age of the servers,
the fact that each only has two drives on rather anemic RAID
controllers with no cache/BBU, and none has more than 2GB RAM I
think it's a safe bet that consolidating this down to two modern
servers can give us better performance, allow for growth over the
next few years, and let us split db duties from web/mail
submission.  One live db server, one standby.

And this is where SSDs come in.  We're not looking at terabytes of
data here, and I don't see us growing much beyond our current
database size in the next few years.  SSDs are getting cheap enough
that this feels like a good match - we can afford CPU and RAM, we
can't afford some massive 24 drive monster and a pile of SAS
drives.  The current db boxes top out at 300 IOPS, the SATA boxes
maybe about half that.  If I can achieve 300x4 IOPS (or better,
preferably...) on 2 or 4 SSDs, I'm pretty much sold.
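(To spell out the arithmetic: the four current boxes combined top out
around 4 x 300 = 1200 IOPS, so even a conservative 2000 IOPS from a
couple of mirrored SSDs would be roughly 1.7x our entire current fleet.)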

From what I've been reading here, this sounds quite possible.  I
understand the Intel 320 series are the best "bargain" option since
they can survive an unexpected shutdown, so I'm not going to go
looking beyond that - OCZ makes me a bit nervous and finding a
clear list of which drives have the "supercapacitor" has been
difficult.  I have no qualms about buying enough SSDs to mirror
them all.  I am aware I need to have automated alerts for various
SMART stats so I know when danger is imminent.  I know I need to
have this box replicated even with mirrors and monitoring up the
wazoo.
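
To be concrete about the monitoring, what I have in mind is a cron'd
check along these lines (just a sketch: the device names, alert
address, and wear floor of 20 are placeholders; attribute 233 is the
Media_Wearout_Indicator that Intel documents for these drives, counting
down from 100):

    #!/bin/sh
    # Warn when an SSD's media wearout indicator gets low.
    # smartctl comes from the smartmontools port.
    for disk in ada0 ada1; do
        # Field 4 of the smartctl -A attribute line is the normalized VALUE.
        wear=$(smartctl -A /dev/$disk | awk '$1 == 233 {print $4}')
        if [ -n "$wear" ] && [ "$wear" -lt 20 ]; then
            echo "/dev/$disk wear indicator at $wear" |
                mail -s "SSD wear alert: $disk" admin@example.com
        fi
    done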

Here are my remaining questions:

-I'm calling our combined databases at 133GB "small", fair
assumption?

-Is there any chance that a server with dual quad core
xeons, 32GB RAM, and 2 or 4 SSDs (assume mirrored) could be slower
than the 4 old servers described above?  I'm beating those on raw
cpu, quadrupling the amount of RAM (and consolidating said RAM),
and going from disks that top out at 4x300 IOPS to SSDs that
conservatively should provide 2000 IOPS.

-We're also finally automating more stuff and trying to standardize
server configs.  One tough decision we made that has paid off quite
well was to move to ZFS.  We find the features helpful to admin
tasks outweigh the drawbacks and RAM is cheap enough that we can
deal with its tendency to eat RAM.  Is ZFS + Postgres + SSDs a bad
combo?

-Should I even be looking at the option of ZFS on SATA or low-end
SAS drives and ZIL and L2ARC on SSDs?  Initially this intrigued me,
but I can't quite get my head around how the SSD-based ZIL can deal
with flushing the metadata out when the whole system is under any
sort of extreme write-heavy load - I mean if the ZIL is absorbing
2000 IOPS of metadata writes, at some point it has to get full as
it's trying to flush this data to much slower spinning drives.

-Should my standby box be the same configuration or should I look
at actual spinning disks on that?  How rough is replication on the
underlying storage?  Would the total data written on the slave be
less or equal to the master?

Any input is appreciated.  I did really mean for this to be a much
shorter post...

Thanks,

Charles

Re: SSD options, small database, ZFS

From: Arjen van der Meijden

On 14-10-2011 10:23, CSS wrote:
> -I'm calling our combined databases at 133GB "small", fair
> assumption?  -Is there any chance that a server with dual quad core
> xeons, 32GB RAM, and 2 or 4 SSDs (assume mirrored) could be slower
> than the 4 old servers described above?  I'm beating those on raw
> cpu, quadrupling the amount of RAM (and consolidating said RAM),
> and going from disks that top out at 4x300 IOPS with SSDs that
> conservatively should provide 2000 IOPS.

Whether 133GB is small or not probably mostly depends on how much of it
is actually touched during use.  But I'd agree that it isn't a terribly
large database; I'd guess a few simple SSDs would be plenty to achieve
2000 IOPS.  For linear writes they're still not really faster than
normal disks, but if that's combined with random access (either read or
write) you ought to be ok.
We went from 15x 15k SAS disks to 6x SSD several years back in our
MySQL box, but since we also increased the RAM from 16GB to 72GB, the
IO load dropped so much that the SSDs are normally only lightly loaded...

Btw, the 5500 and 5600 Xeons are normally more efficient with a multiple
of 6 ram-modules, so you may want to have a look at 24GB (6x4), 36GB
(6x4+6x2) or 48GB (12x4 or 6x8) RAM.

Given the historical questions on the list, there is always a risk of
getting slower queries with hardware that should be much faster.  For
instance, the huge increase in RAM may trigger a less efficient query
plan.  Or the disks may honor flush requests more correctly, making
writes slower but safer.
Assuming the queries are still getting good plans and there are no such
special differences, I'd agree with the assumption that it's a win on
every count.
Your move to a newer OS and PostgreSQL may also trigger a worse query
plan or hardware usage.

> -Should I even be looking at the option of ZFS on SATA or low-end
> SAS drives and ZIL and L2ARC on SSDs?  Initially this intrigued me,
> but I can't quite get my head around how the SSD-based ZIL can deal
> with flushing the metadata out when the whole system is under any
> sort of extreme write-heavy load - I mean if the ZIL is absorbing
> 2000 IOPS of metadata writes, at some point it has to get full as
> it's trying to flush this data to much slower spinning drives.

A fail-safe set-up with SSD's in ZFS assumes at least 3 in total, i.e. a
pair of SSD's for ZIL and as many as you want for L2ARC. Given your
database size, 4x160GB SSD (in "raid10") or 2x 300GB should yield plenty
of space. So given the same choice, I wouldn't bother with a set of
large capacity sata disks and ZIL/L2ARC-SSD's, I'd just go with 4x160GB
or 2x300GB SSD's.
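
To make the two layouts concrete, they'd look roughly like this on
FreeBSD (a sketch only; the device names are made up, see zpool(8) for
the details):

    # all-SSD option: two mirrored pairs striped together ("raid10")
    zpool create pgpool mirror da0 da1 mirror da2 da3

    # hybrid option: SATA mirror for data, SSDs as mirrored ZIL + L2ARC
    zpool create pgpool mirror da0 da1
    zpool add pgpool log mirror da2 da3
    zpool add pgpool cache da4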

> -Should my standby box be the same configuration or should I look
> at actual spinning disks on that?  How rough is replication on the
> underlying storage?  Would the total data written on the slave be
> less or equal to the master?

How bad is it for you if the performance of your database potentially
drops a fair bit when your slave becomes the master? If you have a
read-mostly database, you may not even need SSD's in your master-db
(given your amount of RAM). But honestly, I don't know the answer to
this question :)

Good luck with your choices,
Best regards,

Arjen

Re: SSD options, small database, ZFS

From: CSS

Resurrecting this long-dormant thread...

On Oct 14, 2011, at 6:41 AM, Arjen van der Meijden wrote:

> On 14-10-2011 10:23, CSS wrote:
>> -I'm calling our combined databases at 133GB "small", fair
>> assumption?  -Is there any chance that a server with dual quad core
>> xeons, 32GB RAM, and 2 or 4 SSDs (assume mirrored) could be slower
>> than the 4 old servers described above?  I'm beating those on raw
>> cpu, quadrupling the amount of RAM (and consolidating said RAM),
>> and going from disks that top out at 4x300 IOPS with SSDs that
>> conservatively should provide 2000 IOPS.
>
> Whether 133GB is small or not probably mostly depends on how much of
> it is actually touched during use.  But I'd agree that it isn't a
> terribly large database; I'd guess a few simple SSDs would be plenty
> to achieve 2000 IOPS.  For linear writes they're still not really
> faster than normal disks, but if that's combined with random access
> (either read or write) you ought to be ok.
> We went from 15x 15k SAS disks to 6x SSD several years back in our
> MySQL box, but since we also increased the RAM from 16GB to 72GB, the
> IO load dropped so much that the SSDs are normally only lightly loaded...

Thanks for your input on this.  It's taken some time, but I do finally
have some hardware on hand (http://imgur.com/LEC5I) and as more
trickles in over the coming days, I'll be putting together our first
SSD-based postgres box.  I have much testing to do, and I'm going to
have some questions regarding that subject in another thread.

> Btw, the 5500 and 5600 Xeons are normally more efficient with a
> multiple of 6 ram-modules, so you may want to have a look at 24GB
> (6x4), 36GB (6x4+6x2) or 48GB (12x4 or 6x8) RAM.

Thanks - I really had a hard time wrapping my head around the rules on
populating the banks.  If I understand it correctly, this is due to the
memory controller moving from the northbridge to being integrated in
the CPU.

> Given the historical questions on the list, there is always a risk of
> getting slower queries with hardware that should be much faster.  For
> instance, the huge increase in RAM may trigger a less efficient query
> plan.  Or the disks may honor flush requests more correctly, making
> writes slower but safer.
> Assuming the queries are still getting good plans and there are no
> such special differences, I'd agree with the assumption that it's a
> win on every count.
> Your move to a newer OS and PostgreSQL may also trigger a worse query
> plan or hardware usage.

That's an interesting point, I'd not even considered that.  Is there
any sort of simple documentation on the query planner that might cover
how things like increased RAM could impact how a query is executed?

>> -Should I even be looking at the option of ZFS on SATA or low-end
>> SAS drives and ZIL and L2ARC on SSDs?  Initially this intrigued me,
>> but I can't quite get my head around how the SSD-based ZIL can deal
>> with flushing the metadata out when the whole system is under any
>> sort of extreme write-heavy load - I mean if the ZIL is absorbing
>> 2000 IOPS of metadata writes, at some point it has to get full as
>> it's trying to flush this data to much slower spinning drives.
>
> A fail-safe set-up with SSD's in ZFS assumes at least 3 in total, i.e.
> a pair of SSD's for ZIL and as many as you want for L2ARC.  Given your
> database size, 4x160GB SSD (in "raid10") or 2x 300GB should yield
> plenty of space.  So given the same choice, I wouldn't bother with a
> set of large capacity sata disks and ZIL/L2ARC-SSD's, I'd just go with
> 4x160GB or 2x300GB SSD's.

Well, I've bought 4x160GB, so that's what I'll use.  I will still do
some tests with two SATA drives plus ZIL, just to see what happens.
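
(For those tests I'm planning to just watch the log devices with
"zpool iostat -v pool 5", pool name made up, to see how hard the ZIL
SSDs actually get hit under the nightly-job load.)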

>
>> -Should my standby box be the same configuration or should I look
>> at actual spinning disks on that?  How rough is replication on the
>> underlying storage?  Would the total data written on the slave be
>> less or equal to the master?
>
> How bad is it for you if the performance of your database potentially
> drops a fair bit when your slave becomes the master?  If you have a
> read-mostly database, you may not even need SSD's in your master-db
> (given your amount of RAM).  But honestly, I don't know the answer to
> this question :)

It's complicated - during the day we're mostly looking at very
scattered reads and writes, probably a bit biased towards writes.  But
each evening we kick off a number of jobs to pre-generate stats for
more complex queries...  If the job could still complete in 6-8 hours,
we'd probably be OK, but if it starts clogging up our normal queries
during the day, that would be a problem.

Thanks again for your input!

Charles



Re: SSD options, small database, ZFS

From: Arjen van der Meijden

On 18-11-2011 4:44 CSS wrote:
> Resurrecting this long-dormant thread...
>> Btw, the 5500 and 5600 Xeons are normally more efficient with a
>> multiple of 6 ram-modules, so you may want to have a look at 24GB
>> (6x4), 36GB (6x4+6x2) or 48GB (12x4 or 6x8) RAM.
>
> Thanks - I really had a hard time wrapping my head around the rules
> on populating the banks.  If I understand it correctly, this is due
> to the memory controller moving from the northbridge to being
> integrated in the CPU.

That's not the complete story.  A while back Intel introduced an
integrated memory controller in the Xeons (I think it was with the
5500).  In doing so, they brought NUMA to the mainstream Xeons
(Opterons had been doing that from the start).
The memory controllers in the 5500/5600 are "triple channel", i.e.
they can distribute their work over three memory channels at the same
time.  The next-generation E5 Xeons will be "quad channel", so the
optimal module count will climb even faster then.

With these kinds of CPUs it's normally best to add memory in
increments of (number of channels) x (number of CPUs) modules for
optimal performance.  I.e. with one "triple channel" CPU you'd add
three at a time; with two CPUs you'd go with six.

Having said that, it will work with many other amounts of memory
modules, just at a (slight?) disadvantage compared to the optimal
configuration.
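
For a dual-socket 5600 box the arithmetic works out like this (the
module sizes here are just examples):

    2 CPUs x 3 channels = 6 modules per increment
    6 x 4GB  = 24GB
    12 x 4GB = 48GB  (two modules on every channel)
    6 x 8GB  = 48GB  (one larger module per channel)

Anything in between still boots, it just leaves some channels more
heavily populated than others.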

Best regards,

Arjen

Re: SSD options, small database, ZFS

From: Greg Smith

On 11/17/2011 10:44 PM, CSS wrote:
> Is there any sort of simple documentation on the query planner that
> might cover how things like increased RAM could impact how a query is
> executed?

There is no *simple* documentation on any part of the query planner
that's also accurate.  Query planning is inherently complicated.

I think this point wasn't quite made clearly.  PostgreSQL has no idea
how much memory is in your system; it doesn't try to guess or detect
it.  However, when people move from one system to a larger one, they
tend to increase some of the query planning parameters in the
postgresql.conf to reflect the new capacity.  That type of change can
cause various types of query plan changes.  Let's say your old system
has 16GB of RAM and you set effective_cache_size to 12GB; if you upgrade
to a 64GB server, it seems logical to increase that value to 48GB to
keep the same proportions.  But that can give you different plans, and
it's possible they will be worse.  There's a similar concern if you
increase work_mem because you have more memory, since that will alter
how plans do things like sorting and hashing.
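
One useful property here: effective_cache_size can be changed
per-session, so you can preview whether a given query's plan flips
before touching postgresql.conf.  A sketch (the database, table, and
query are made up):

    psql yourdb <<'EOF'
    SET effective_cache_size = '12GB';
    EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
    SET effective_cache_size = '48GB';
    EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
    EOF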

But you don't have to make any changes.  You can migrate to the new
hardware with zero modifications to the Postgres configuration, then
introduce changes later.

The whole memory-speed topic is also much more complicated than any
simple explanation can cover.  How many banks of RAM you can use
effectively changes based on the number of CPUs and associated chipset
too.  Someone just sent me an explanation recently of why I was seeing
some strange things on my stream-scaling benchmark program.  That dove
into a bunch of trivia around how the RAM is actually accessed on the
motherboard.  One of the reasons I keep so many samples on that
program's page is to help people navigate this whole maze, and have some
data points to set expectations against.  See
https://github.com/gregs1104/stream-scaling for the code and the samples.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


Re: SSD options, small database, ZFS

From: Scott Marlowe

On Fri, Nov 18, 2011 at 3:09 AM, Greg Smith <greg@2ndquadrant.com> wrote:
> On 11/17/2011 10:44 PM, CSS wrote:
>>
>> Is there any sort of simple documentation on the query planner that might
>> cover how things like increased RAM could impact how a query is executed?
>
> There is no *simple* documentation on any part of the query planner that's
> also accurate.  Query planning is inherently complicated.
>
> I think this point wasn't quite made clearly.  PostgreSQL has no idea how
> much memory is in your system; it doesn't try to guess or detect it.

effective_cache_size tells the db how much memory you have.  Since you
have to set it, it can be anything you want, but if you've set it to
something much higher on the new machine then it can affect query
planning.

Re: SSD options, small database, ZFS

From: Tomas Vondra

On 18 November 2011, 17:17, Scott Marlowe wrote:
> On Fri, Nov 18, 2011 at 3:09 AM, Greg Smith <greg@2ndquadrant.com> wrote:
>> On 11/17/2011 10:44 PM, CSS wrote:
>>>
>>> Is there any sort of simple documentation on the query planner that
>>> might
>>> cover how things like increased RAM could impact how a query is
>>> executed?
>>
>> There is no *simple* documentation on any part of the query planner
>> that's
>> also accurate.  Query planning is inherently complicated.
>>
>> I think this point wasn't quite made clearly.  PostgreSQL has no idea
>> how
>> much memory is in your system; it doesn't try to guess or detect it.
>
> effective_cache_size tells the db how much memory you have.  Since you
> have to set it, it can be anything you want, but if you've set it to
> something much higher on the new machine then it can affect query
> planning.

That's only half the truth.  effective_cache_size is used to estimate
the page cache hit ratio, nothing else.  It influences the planning a
bit (AFAIK it's used only to estimate a nested loop with inner index
scan), but it has no impact on things like work_mem,
maintenance_work_mem, wal_buffers etc.

People often bump these settings up (especially work_mem) on new hw
without properly testing the impact. PostgreSQL will happily do that
because it was commanded to, but then the system starts swapping or the
OOM killer starts shooting the processes.
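
The arithmetic is easy to underestimate, because work_mem is a
per-sort/per-hash limit, not a per-server one.  A made-up worst case:

    100 backends x 2 sort/hash nodes each x 320MB work_mem = ~64GB

of potential allocation, on a box that may have half that much RAM.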

Tomas


Re: SSD options, small database, ZFS

From: Amitabh Kant

On Fri, Nov 18, 2011 at 3:39 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> There is no *simple* documentation on any part of the query planner
> that's also accurate.  Query planning is inherently complicated.
> [...]
> See https://github.com/gregs1104/stream-scaling for the code and the
> samples.

Greg

On a slightly unrelated note, you had once (http://archives.postgresql.org/pgsql-general/2011-08/msg00944.php) said to limit shared_buffers max to 8 GB on Linux and leave the rest for OS caching. Does the same advice hold on FreeBSD systems too?


Amitabh

Re: SSD options, small database, ZFS

From: Bruce Momjian

Greg Smith wrote:
> The whole memory-speed topic is also much more complicated than any
> simple explanation can cover.  How many banks of RAM you can use
> effectively changes based on the number of CPUs and associated chipset
> too.  Someone just sent me an explanation recently of why I was seeing
> some strange things on my stream-scaling benchmark program.  That dove
> into a bunch of trivia around how the RAM is actually accessed on the
> motherboard.  One of the reasons I keep so many samples on that
> program's page is to help people navigate this whole maze, and have some
> data points to set expectations against.  See
> https://github.com/gregs1104/stream-scaling for the code and the samples.

I can confirm that a Xeon E5620 CPU wants memory to be in multiples of
3, and a dual-CPU 5620 system needs memory in multiples of 6.  (I
installed 12 2GB modules.)

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

Re: SSD options, small database, ZFS

From: Bruce Momjian

Amitabh Kant wrote:
> [...]
> Greg
>
> On a slightly unrelated note, you had once (
> http://archives.postgresql.org/pgsql-general/2011-08/msg00944.php) said to
> limit shared_buffers max to 8 GB on Linux and leave the rest for OS
> caching. Does the same advice hold on FreeBSD systems too?

Hard to say.  We don't know why this is happening but we are guessing it
is the overhead of managing over one million shared buffers.  Please
test and let us know.
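
Something along these lines would be a useful data point (a sketch
only; the database name, scale, client count, and run length are
placeholders to adjust for your hardware):

    # initialize once, with a scale large enough to exceed shared_buffers
    pgbench -i -s 1000 bench

    # for each shared_buffers value (e.g. 4GB, 8GB, 16GB):
    #   edit postgresql.conf, restart, then run:
    pgbench -c 16 -T 600 bench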

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

Re: SSD options, small database, ZFS

From: Amitabh Kant

On Tue, Nov 22, 2011 at 11:41 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Amitabh Kant wrote:
>> On a slightly unrelated note, you had once
>> (http://archives.postgresql.org/pgsql-general/2011-08/msg00944.php)
>> said to limit shared_buffers max to 8 GB on Linux and leave the rest
>> for OS caching. Does the same advice hold on FreeBSD systems too?
>
> Hard to say.  We don't know why this is happening but we are guessing it
> is the overhead of managing over one million shared buffers.  Please
> test and let us know.

I do have a FreeBSD 8.0 server running PostgreSQL 8.4.9 with 32 GB of
RAM (dual processor, SAS HD/RAID 10 with BBU for data and SAS HD/RAID 1
for pg_xlog).  The current settings set effective_cache_size to 22 GB
(pgtune-generated values).  The server sees between 500 and 1200 TPS
and chugs along pretty nicely.  Sadly it's in production, so I am not
in a position to run any tests on it.  Changed values in
postgresql.conf are:
==========================================================
maintenance_work_mem = 1GB # pg_generate_conf wizard 2010-05-09
checkpoint_completion_target = 0.9 # pg_generate_conf wizard 2010-05-09
effective_cache_size = 22GB # pg_generate_conf wizard 2010-05-09
work_mem = 320MB # pg_generate_conf wizard 2010-05-09
wal_buffers = 8MB # pg_generate_conf wizard 2010-05-09
checkpoint_segments = 16 # pg_generate_conf wizard 2010-05-09
shared_buffers = 3840MB # pg_generate_conf wizard 2010-05-09
==========================================================

I do have another server that is to go into production (48 GB RAM, dual
processor, Intel 710 SSDs for data in RAID 1, SAS HD/RAID 1 for
pg_xlog).  Apart from running pgbench and bonnie++, is there something
else that I should be testing on the new server?  Greg's stream-scaling
seems to be for Linux, so I'm not sure if it will work on FreeBSD.

Amitabh