Discussion: SSD options, small database, ZFS
Hello all,

I've spent some time looking through previous posts regarding Postgres and SSD drives, and I've also been reading up on SSDs in general elsewhere.

Some quick background: we're currently looking at changing our basic database setup as we migrate away from some rather old (circa 2006 or so) hardware to more current gear. Our previous setup basically consisted of two types of 1U boxes: dual-Xeon machines with a low-end Adaptec RAID card paired with two U320 or U160 10K SCSI drives, for database and light web frontend and mail submission duties, and single-P4 and dual-PIII boxes with commodity drives handling mail deliveries. Most of the delivery servers have been migrated to current hardware (single and dual quad-core Xeons, 4 WD RE3 SATA drives, FreeBSD with ZFS), and we've also moved a handful of the db/web services to one of these servers just to see what a large increase in RAM and CPU can do for us. As we sort of expected, overall performance was better, but there's a serious disk I/O wall we hit when running nightly jobs that are write-heavy and long-running.

There are currently four db/web servers with a combined total of 133GB of on-disk Postgres data. Given the age of the servers, the fact that each has only two drives on rather anemic RAID controllers with no cache/BBU, and that none has more than 2GB of RAM, I think it's a safe bet that consolidating this down to two modern servers can give us better performance, allow for growth over the next few years, and let us split db duties from web/mail submission: one live db server, one standby.

And this is where SSDs come in. We're not looking at terabytes of data here, and I don't see us growing much beyond our current database size in the next few years. SSDs are getting cheap enough that this feels like a good match: we can afford CPU and RAM, but we can't afford some massive 24-drive monster and a pile of SAS drives. The current db boxes top out at 300 IOPS, the SATA boxes maybe about half that.
If I can achieve 300x4 IOPS (or better, preferably) on 2 or 4 SSDs, I'm pretty much sold. From what I've been reading here, this sounds quite possible. I understand the Intel 320 series is the best "bargain" option since those drives can survive an unexpected shutdown, so I'm not going to look beyond that; OCZ makes me a bit nervous, and finding a clear list of which drives have the supercapacitor has been difficult. I have no qualms about buying enough SSDs to mirror them all. I am aware I need automated alerts for various SMART stats so I know when danger is imminent, and I know I need this box replicated even with mirrors and monitoring up the wazoo.

Here are my remaining questions:

- I'm calling our combined databases at 133GB "small". Fair assumption?

- Is there any chance that a server with dual quad-core Xeons, 32GB RAM, and 2 or 4 SSDs (assume mirrored) could be slower than the 4 old servers described above? I'm beating those on raw CPU, quadrupling the amount of RAM (and consolidating said RAM), and going from disks that top out at 4x300 IOPS to SSDs that should conservatively provide 2000 IOPS.

- We're also finally automating more things and trying to standardize server configs. One tough decision we made that has paid off quite well was the move to ZFS. We find that the features helpful to admin tasks outweigh the drawbacks, and RAM is cheap enough that we can deal with its tendency to eat RAM. Is ZFS + Postgres + SSDs a bad combo?

- Should I even be looking at the option of ZFS on SATA or low-end SAS drives with the ZIL and L2ARC on SSDs? Initially this intrigued me, but I can't quite get my head around how an SSD-based ZIL can deal with flushing the metadata out when the whole system is under any sort of extreme write-heavy load. I mean, if the ZIL is absorbing 2000 IOPS of metadata writes, at some point it has to get full as it's trying to flush this data to much slower spinning drives.
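The worry about the ZIL filling up can be put into rough numbers. The sketch below is purely illustrative: the 8kB record size, the 8GB log size, and both IOPS figures are assumed for the example, not measured, and in practice ZFS throttles dirty data well before a log device fills, so this only bounds the best case:

```python
# Back-of-envelope: how long a dedicated ZFS log device can absorb a
# synchronous write burst before the slow pool becomes the bottleneck.
# All numbers here are illustrative assumptions, not measurements.

def burst_seconds(slog_gb: float, ingest_iops: float, flush_iops: float,
                  io_kb: float = 8.0) -> float:
    """Seconds until the log device fills, if ingest outpaces the pool."""
    net_iops = ingest_iops - flush_iops      # rate at which backlog grows
    if net_iops <= 0:
        return float("inf")                  # pool keeps up; log never fills
    slog_kb = slog_gb * 1024 * 1024
    return slog_kb / (net_iops * io_kb)

# 2000 IOPS of 8kB sync writes arriving, spinning pool flushing ~300 IOPS:
print(f"{burst_seconds(8, 2000, 300):.0f}s of headroom on an 8GB slog")
```

So a hypothetical 8GB log device absorbing that imbalance buys roughly ten minutes of burst, after which throughput degrades to what the spinning drives can sustain, which is exactly the intuition in the question above.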
- Should my standby box be the same configuration, or should I look at actual spinning disks for that? How rough is replication on the underlying storage? Would the total data written on the slave be less than or equal to that on the master?

Any input is appreciated. I really did mean for this to be a much shorter post...

Thanks,

Charles
On 14-10-2011 10:23, CSS wrote:
> -I'm calling our combined databases at 133GB "small", fair
> assumption? -Is there any chance that a server with dual quad core
> xeons, 32GB RAM, and 2 or 4 SSDs (assume mirrored) could be slower
> than the 4 old servers described above? I'm beating those on raw
> cpu, quadrupling the amount of RAM (and consolidating said RAM),
> and going from disks that top out at 4x300 IOPS with SSDs that
> conservatively should provide 2000 IOPS.

Whether 133GB is small or not probably mostly depends on how much of it is actually touched during use. But I'd agree that it isn't a terribly large database, and I'd guess a few simple SSDs would be plenty to achieve 2000 IOPS. For linear writes they're still not really faster than normal disks, but if that's combined with random access (either read or write) you ought to be OK. We went from 15x 15K SAS disks to 6x SSD several years back in our MySQL box, but since we also increased the RAM from 16GB to 72GB, the I/O load dropped so much that the SSDs are normally only lightly loaded...

Btw, the 5500 and 5600 Xeons are normally more efficient with a multiple of six RAM modules, so you may want to have a look at 24GB (6x4), 36GB (6x4 + 6x2) or 48GB (12x4 or 6x8) of RAM.

Given the historical questions on the list, there is always a risk of getting slower queries on hardware that should be much faster. For instance, the huge increase in RAM may trigger a less efficient query plan, or the new disks may abide by the flush policies more correctly. Your update to a newer OS and PostgreSQL may also trigger a worse query plan or hardware usage. Assuming the queries are still getting good plans and there are no such special differences, I'd agree with the assumption that it's a win on every count.

> -Should I even be looking at the option of ZFS on SATA or low-end
> SAS drives and ZIL and L2ARC on SSDs?
> Initially this intrigued me,
> but I can't quite get my head around how the SSD-based ZIL can deal
> with flushing the metadata out when the whole system is under any
> sort of extreme write-heavy load - I mean if the ZIL is absorbing
> 2000 IOPS of metadata writes, at some point it has to get full as
> it's trying to flush this data to much slower spinning drives.

A fail-safe setup with SSDs in ZFS assumes at least three in total, i.e. a pair of SSDs for the ZIL and as many as you want for L2ARC. Given your database size, 4x 160GB SSDs (in "RAID10") or 2x 300GB should yield plenty of space. So given the same choice, I wouldn't bother with a set of large-capacity SATA disks plus ZIL/L2ARC SSDs; I'd just go with 4x 160GB or 2x 300GB SSDs.

> -Should my standby box be the same configuration or should I look
> at actual spinning disks on that? How rough is replication on the
> underlying storage? Would the total data written on the slave be
> less or equal to the master?

How bad is it for you if the performance of your database potentially drops a fair bit when your slave becomes the master? If you have a read-mostly database, you may not even need SSDs in your master db (given your amount of RAM). But honestly, I don't know the answer to this question :)

Good luck with your choices,

Best regards,

Arjen
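For reference, the usable space behind those two layouts is easy to sanity-check against the 133GB in the thread. A quick sketch (it ignores filesystem and metadata overhead, and any SSD overprovisioning):

```python
# Usable capacity of the two mirrored SSD layouts suggested above,
# versus the 133GB of on-disk Postgres data from the thread.
# Ignores filesystem overhead and SSD overprovisioning.

def mirrored_capacity_gb(drives: int, size_gb: int) -> int:
    """Striped mirrors ("RAID10"): half the raw capacity is usable."""
    return drives * size_gb // 2

data_gb = 133
layouts = {"4x 160GB in RAID10": mirrored_capacity_gb(4, 160),
           "2x 300GB mirror":    mirrored_capacity_gb(2, 300)}

for name, usable in layouts.items():
    print(f"{name}: {usable}GB usable, {usable / data_gb:.1f}x current data")
```

Either way there is a bit over 2x the current data size, which matches the "plenty of space" conclusion while leaving room for WAL, snapshots, and growth.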
Resurrecting this long-dormant thread...

On Oct 14, 2011, at 6:41 AM, Arjen van der Meijden wrote:

> On 14-10-2011 10:23, CSS wrote:
>> -I'm calling our combined databases at 133GB "small", fair
>> assumption? -Is there any chance that a server with dual quad core
>> xeons, 32GB RAM, and 2 or 4 SSDs (assume mirrored) could be slower
>> than the 4 old servers described above? I'm beating those on raw
>> cpu, quadrupling the amount of RAM (and consolidating said RAM),
>> and going from disks that top out at 4x300 IOPS with SSDs that
>> conservatively should provide 2000 IOPS.
>
> Whether 133GB is small or not probably mostly depends on how much of it is actually touched during use. But I'd agree that it isn't a terribly large database, I'd guess a few simple SSDs would be plenty to achieve 2000 IOPS. For linear writes, they're still not really faster than normal disks, but if that's combined with random access (either read or write) you ought to be ok.
> We went from 15x 15k SAS disks to 6x SSD several years back in our MySQL box, but since we also increased the RAM from 16GB to 72GB, the I/O load dropped so much the SSDs are normally only lightly loaded...

Thanks for your input on this. It's taken some time, but I do finally have some hardware on hand (http://imgur.com/LEC5I), and as more trickles in over the coming days I'll be putting together our first SSD-based Postgres box. I have much testing to do, and I'm going to have some questions regarding that subject in another thread.

> Btw, the 5500 and 5600 Xeons are normally more efficient with a multiple of 6 ram-modules, so you may want to have a look at 24GB (6x4), 36GB (6x4+6x2) or 48GB (12x4 or 6x8) RAM.

Thanks - I really had a hard time wrapping my head around the rules on populating the banks. If I understand it correctly, this is due to the memory controller moving from the northbridge to being integrated in the CPU.
> Given the historical questions on the list, there is always a risk of getting slower queries with hardware that should be much faster. For instance, the huge increase in RAM may trigger a less efficient query-plan. Or the disks abide by the flush-policies more correctly.
> Assuming the queries are still getting good plans and there are no such special differences, I'd agree with the assumption that it's a win on every count.
> Or your update to a newer OS and PostgreSQL may trigger some worse query plan or hardware-usage.

That's an interesting point; I'd not even considered it. Is there any sort of simple documentation on the query planner that might cover how things like increased RAM could impact how a query is executed?

>> -Should I even be looking at the option of ZFS on SATA or low-end
>> SAS drives and ZIL and L2ARC on SSDs? Initially this intrigued me,
>> but I can't quite get my head around how the SSD-based ZIL can deal
>> with flushing the metadata out when the whole system is under any
>> sort of extreme write-heavy load - I mean if the ZIL is absorbing
>> 2000 IOPS of metadata writes, at some point it has to get full as
>> it's trying to flush this data to much slower spinning drives.
>
> A fail-safe set-up with SSDs in ZFS assumes at least 3 in total, i.e. a pair of SSDs for ZIL and as many as you want for L2ARC. Given your database size, 4x 160GB SSD (in "raid10") or 2x 300GB should yield plenty of space. So given the same choice, I wouldn't bother with a set of large capacity SATA disks and ZIL/L2ARC SSDs, I'd just go with 4x 160GB or 2x 300GB SSDs.

Well, I've bought 4x 160GB, so that's what I'll use. I will still do some tests with two SATA drives plus ZIL, just to see what happens.

>> -Should my standby box be the same configuration or should I look
>> at actual spinning disks on that? How rough is replication on the
>> underlying storage? Would the total data written on the slave be
>> less or equal to the master?
> How bad is it for you if the performance of your database potentially drops a fair bit when your slave becomes the master? If you have a read-mostly database, you may not even need SSDs in your master-db (given your amount of RAM). But honestly, I don't know the answer to this question :)

It's complicated - during the day we're mostly looking at very scattered reads and writes, probably biased a bit towards writes. But each evening we kick off a number of jobs to pre-generate stats for more complex queries... If those jobs could still complete in 6-8 hours we'd probably be OK, but if they start clogging up our normal queries during the day, that would be a problem.

Thanks again for your input!

Charles

> Good luck with your choices,
> Best regards,
>
> Arjen
On 18-11-2011 4:44, CSS wrote:
> Resurrecting this long-dormant thread...
>> Btw, the 5500 and 5600 Xeons are normally more efficient with a multiple of 6 ram-modules, so you may want to have a look at 24GB (6x4), 36GB (6x4+6x2) or 48GB (12x4 or 6x8) RAM.
>
> Thanks - I really had a hard time wrapping my head around the rules on populating the banks. If I understand it correctly, this is due to the memory controller moving from the northbridge to being integrated in the CPU.

That's not the complete story. A while back Intel introduced an integrated memory controller in its Xeons (I think it was with the 5500), and in doing so brought NUMA to the mainstream Xeons (Opterons had been doing that from the start). The memory controllers in the 5500/5600 are "triple channel", i.e. they can distribute their work over three memory channels at the same time. The next-generation E5 Xeons will have "quad channel", so the optimal module count will grow even faster then.

With these kinds of CPUs it's normally best to add memory in increments of (number of channels) x (number of CPUs) modules for optimal performance. I.e. with one triple-channel CPU you'd add three modules at a time; with two CPUs you'd go with six. Having said that, it will work with many other amounts of memory modules, just at a (slight?) disadvantage compared to the optimal configuration.

Best regards,

Arjen
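That population rule reduces to a couple of lines. This sketch assumes only the simple model described above (optimal module counts are multiples of channels times sockets) and nothing vendor-specific:

```python
# Optimal memory-module increments for NUMA-era Xeons, per the rule
# above: populate in multiples of (channels per CPU) x (CPU sockets).

def optimal_dimm_counts(channels: int, sockets: int, max_dimms: int = 18):
    """Module counts that keep every memory channel evenly populated."""
    step = channels * sockets
    return list(range(step, max_dimms + 1, step))

# Dual triple-channel 5600-series Xeons: increments of six modules,
# matching the suggested 24GB (6x4GB) and 48GB (12x4GB) configurations.
print(optimal_dimm_counts(channels=3, sockets=2))  # [6, 12, 18]
```

With a quad-channel E5 part the same function would give increments of eight per dual-socket box, which is the "going even faster" point above.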
On 11/17/2011 10:44 PM, CSS wrote:
> Is there any sort of simple documentation on the query planner that
> might cover how things like increased RAM could impact how a query is
> executed?

There is no *simple* documentation on any part of the query planner that's also accurate. Query planning is inherently complicated.

I think this point wasn't quite made clearly: PostgreSQL has no idea how much memory is in your system; it doesn't try to guess or detect it. However, when people move from one system to a larger one, they tend to increase some of the query planning parameters in postgresql.conf to reflect the new capacity. That type of change can cause various types of query plan changes. Say your old system has 16GB of RAM and you set effective_cache_size to 12GB; if you upgrade to a 64GB server, it seems logical to increase that value to 48GB to keep the same proportion. But that can give you different plans, and it's possible they will be worse. There's a similar concern if you change work_mem because you have more memory, because that will alter how plans do things like sorting and hashing.

But you don't have to make any changes. You can migrate to the new hardware with zero modifications to the Postgres configuration, then introduce changes later.

The whole memory speed topic is also much more complicated than any simple explanation can cover. How many banks of RAM you can use effectively changes based on the number of CPUs and the associated chipset too. Someone just sent me an explanation recently of why I was seeing some strange things in my stream-scaling benchmark program; it dove into a bunch of trivia around how the RAM is actually accessed on the motherboard. One of the reasons I keep so many samples on that program's page is to help people navigate this whole maze and have some data points to set expectations against. See https://github.com/gregs1104/stream-scaling for the code and the samples.
-- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
On Fri, Nov 18, 2011 at 3:09 AM, Greg Smith <greg@2ndquadrant.com> wrote: > On 11/17/2011 10:44 PM, CSS wrote: >> >> Is there any sort of simple documentation on the query planner that might >> cover how things like increased RAM could impact how a query is executed? > > There is no *simple* documentation on any part of the query planner that's > also accurate. Query planning is inherently complicated. > > I think this point wasn't quite made clearly. PostgreSQL has no idea how > much memory is in your system; it doesn't try to guess or detect it. effective_cache_size tells the db how much memory you have. Since you have to set it, it can be anything you want, but if you've set it to something much higher on the new machine then it can affect query planning.
On 18 November 2011, 17:17, Scott Marlowe wrote:
> On Fri, Nov 18, 2011 at 3:09 AM, Greg Smith <greg@2ndquadrant.com> wrote:
>> I think this point wasn't quite made clearly. PostgreSQL has no idea
>> how much memory is in your system; it doesn't try to guess or detect it.
>
> effective_cache_size tells the db how much memory you have. Since you
> have to set it, it can be anything you want, but if you've set it to
> something much higher on the new machine then it can affect query
> planning.

That's only half of the truth. effective_cache_size is used to estimate the page cache hit ratio, nothing else. It influences planning a bit (AFAIK it's only used when costing a nested loop with an inner index scan), but it has no impact on things like work_mem, maintenance_work_mem, wal_buffers, etc.

People often bump those settings up (especially work_mem) on new hardware without properly testing the impact. PostgreSQL will happily do what it was commanded to, but then the system starts swapping or the OOM killer starts shooting down processes.

Tomas
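The work_mem caution can be made concrete with a worst-case bound. The sketch below is a rough model only: work_mem is granted per sort/hash operation rather than per connection, and the two-operations-per-query figure here is an assumed workload, not a measurement:

```python
# Rough worst-case memory bound when raising work_mem: PostgreSQL
# grants work_mem per sort/hash node, so many concurrent queries can
# multiply it well past physical RAM -- swapping or the OOM killer.

def worst_case_gb(max_connections: int, work_mem_mb: int,
                  ops_per_query: int, shared_buffers_gb: float) -> float:
    """Upper bound if every backend runs ops_per_query sorts/hashes at once."""
    work = max_connections * work_mem_mb * ops_per_query / 1024
    return shared_buffers_gb + work

# 100 connections, work_mem bumped to 256MB, ~2 sort/hash nodes per
# query, 8GB of shared_buffers -- trouble on a 32GB machine:
print(f"~{worst_case_gb(100, 256, 2, 8):.0f}GB potential peak usage")
```

The bound is pessimistic (most backends are idle at any instant), but it shows why a value that looks generous per query can be dangerous in aggregate.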
On Fri, Nov 18, 2011 at 3:39 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> The whole memory speed topic is also much more complicated than any
> simple explanation can cover. How many banks of RAM you can use
> effectively changes based on the number of CPUs and associated chipset
> too. [...] See https://github.com/gregs1104/stream-scaling for the code
> and the samples.

Greg,
On a slightly unrelated note, you had once (http://archives.postgresql.org/pgsql-general/2011-08/msg00944.php) said to limit shared_buffers max to 8 GB on Linux and leave the rest for OS caching. Does the same advice hold on FreeBSD systems too?
Amitabh
Greg Smith wrote:
> The whole memory speed topic is also much more complicated than any
> simple explanation can cover. How many banks of RAM you can use
> effectively changes based on the number of CPUs and associated chipset
> too. [...] See https://github.com/gregs1104/stream-scaling for the code
> and the samples.

I can confirm that a Xeon E5620 CPU wants memory to be in multiples of 3, and a dual-CPU E5620 system needs memory in multiples of 6. (I installed 12 2GB modules.)

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
Amitabh Kant wrote:
> On a slightly unrelated note, you had once
> (http://archives.postgresql.org/pgsql-general/2011-08/msg00944.php) said
> to limit shared_buffers max to 8 GB on Linux and leave the rest for OS
> caching. Does the same advice hold on FreeBSD systems too?

Hard to say. We don't know why this is happening, but we are guessing it is the overhead of managing over one million shared buffers. Please test and let us know.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
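The "over one million" figure follows directly from PostgreSQL's default 8kB block size; a build with a nonstandard BLCKSZ would scale accordingly:

```python
# Number of 8kB buffers PostgreSQL must manage for a given
# shared_buffers setting -- the bookkeeping behind the 8GB advice.

BLOCK_KB = 8  # default BLCKSZ build assumed

def buffer_count(shared_buffers_gb: float) -> int:
    return int(shared_buffers_gb * 1024 * 1024 // BLOCK_KB)

# 3.75GB matches the 3840MB setting quoted later in this thread;
# 8GB is the advised ceiling; 24GB is a "use most of RAM" setting.
for gb in (3.75, 8, 24):
    print(f"shared_buffers={gb}GB -> {buffer_count(gb):,} buffers")
# At 8GB you are already managing 1,048,576 buffers; going much past
# that means several million, which is the suspected overhead.
```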
On Tue, Nov 22, 2011 at 11:41 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Amitabh Kant wrote:
>> On a slightly unrelated note, you had once
>> (http://archives.postgresql.org/pgsql-general/2011-08/msg00944.php) said
>> to limit shared_buffers max to 8 GB on Linux and leave the rest for OS
>> caching. Does the same advice hold on FreeBSD systems too?
>
> Hard to say. We don't know why this is happening but we are guessing it
> is the overhead of managing over one million shared buffers. Please
> test and let us know.
I do have a FreeBSD 8.0 server running PostgreSQL 8.4.9 with 32 GB of RAM (dual processor, SAS HD/RAID 10 with BBU for data and SAS HD/RAID 1 for pg_xlog). The current settings set effective_cache_size to 22 GB (pgtune-generated values). The server sees between 500 and 1200 TPS and chugs along pretty nicely. Sadly it's in production, so I am not in a position to run any tests on it. Changed values for postgresql.conf are:
==========================================================
maintenance_work_mem = 1GB # pg_generate_conf wizard 2010-05-09
checkpoint_completion_target = 0.9 # pg_generate_conf wizard 2010-05-09
effective_cache_size = 22GB # pg_generate_conf wizard 2010-05-09
work_mem = 320MB # pg_generate_conf wizard 2010-05-09
wal_buffers = 8MB # pg_generate_conf wizard 2010-05-09
checkpoint_segments = 16 # pg_generate_conf wizard 2010-05-09
shared_buffers = 3840MB # pg_generate_conf wizard 2010-05-09
==========================================================
I do have another server that is to go into production (48 GB RAM, dual processor, Intel 710 SSDs for data in RAID 1, SAS HD/RAID 1 for pg_xlog). Apart from running pgbench and bonnie++, is there something else I should be testing on the new server? Greg's stream-scaling seems to be for Linux, so I'm not sure it will work on FreeBSD.
Amitabh