Discussion: [OT] RAID controllers blocking one another?
We have a machine that serves as a fileserver and a database server. Our server hosts a RAID array of 40 disk drives, attached to two 3ware cards, one 9640SE-24 and one 9640SE-16. We have noticed that activity on one controller blocks access on the second controller, not only for disk I/O but also for the command-line tools, which become unresponsive for the inactive controller. The controllers are sitting in adjacent PCI-Express slots on a machine with dual-dual AMD and 16GB of RAM. Has anyone else noticed issues like this? Throughput for either controller is a pretty respectable 150-200MB/s writing and somewhat faster for reading, but the "blocking" is problematic, as the machine is serving multiple purposes.
I know this is off-topic, but I know lots of folks here deal with very large disk arrays; it is hard to get real-world input on machines such as these.
Thanks,
Sean
On Jan 17, 2008 2:17 PM, Sean Davis <sdavis2@mail.nih.gov> wrote:
> We have a machine that serves as a fileserver and a database server. Our
> server hosts a RAID array of 40 disk drives, attached to two 3ware cards,
> one 9640SE-24 and one 9640SE-16. We have noticed that activity on one
> controller blocks access on the second controller, not only for disk I/O
> but also for the command-line tools, which become unresponsive for the
> inactive controller.

Sounds like they're sharing something they shouldn't be. I'm not real familiar with PCI-Express. Aren't those the ones that use up to 16 channels for I/O? Can you divide it to 8 and 8 for each PCI-Express slot in the BIOS maybe, or something like that? Just a SWAG.
On Thu, 17 Jan 2008, Scott Marlowe wrote:
> On Jan 17, 2008 2:17 PM, Sean Davis <sdavis2@mail.nih.gov> wrote:
>> two 3ware cards, one 9640SE-24 and one 9640SE-16
> Sounds like they're sharing something they shouldn't be. I'm not real
> familiar with PCI-express. Aren't those the ones that use up to 16
> channels for I/O? Can you divide it to 8 and 8 for each PCI-express
> slot in the BIOS maybe, or something like that?

I can't find the 9640SE-24/16 anywhere, but presuming these are similar to (or are actually) the 9650SE cards, then each of them is using 8 lanes of the 16 available. I'd need to know the exact motherboard or system to even have a clue what the options are for adjusting the BIOS and whether they are shared or independent. But I haven't seen one where there's any real ability to adjust how the I/O is partitioned beyond changing what slot you plug things into, so that's probably a dead end anyway.

Given the original symptoms, one thing I would be suspicious of, though, is whether there's some sort of IRQ conflict going on. Sadly, we still haven't left that kind of junk behind even on current PC motherboards.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
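A quick way to check the IRQ-conflict theory on Linux is to see whether both controllers landed on the same interrupt line. This is only a sketch: `3w-9xxx` is the driver name the 3ware 9000-series cards register under, and the sample `/proc/interrupts` lines below are made up for illustration, not taken from Sean's machine (on the real box you'd read the file directly).

```shell
# Illustrative excerpt of /proc/interrupts; on the live machine just run:
#   grep 3w- /proc/interrupts
cat > /tmp/interrupts.sample <<'EOF'
 16:   1234567   IO-APIC-fasteoi  3w-9xxx
 17:   7654321   IO-APIC-fasteoi  3w-9xxx
EOF
# Two distinct IRQ lines means each controller has its own interrupt;
# a single line listing 3w-9xxx twice would mean they share one.
grep -c '3w-9xxx' /tmp/interrupts.sample   # prints: 2
```

If the count comes back lower than the number of controllers, trying a different slot (which usually changes the interrupt routing) would be the obvious next experiment.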
On Thu, Jan 17, 2008 at 03:07:02PM -0600, Scott Marlowe wrote:
> Sounds like they're sharing something they shouldn't be. I'm not real
> familiar with PCI-express. Aren't those the ones that use up to 16
> channels for I/O? Can you divide it to 8 and 8 for each PCI-express
> slot in the BIOS maybe, or something like that?

PCI-E is a point-to-point system, so the slots don't share a bus the way parallel PCI did.

/* Steinar */
--
Homepage: http://www.sesse.net/
On Jan 17, 2008 6:23 PM, Greg Smith <gsmith@gregsmith.com> wrote:
Thanks, Greg. After a little digging, 3ware suggested moving one of the cards as well. We will probably give that a try. I'll also look into the BIOS, but since the machine is running as a fileserver, there is precious little time for downtime tinkering. FYI, here are the specs on the server.
http://www.thinkmate.com/System/8U_Dual_Xeon_i2SS40-8U_Storage_Server
Sean
On Fri, 18 Jan 2008, Sean Davis wrote:
> FYI, here are the specs on the server.
> http://www.thinkmate.com/System/8U_Dual_Xeon_i2SS40-8U_Storage_Server

Now we're getting somewhere. I'll dump this on-list as it's a good example of how to fight this class of performance problems. The usual troubleshooting procedure is to figure out how the motherboard is mapping all the I/O internally and then try to move things out of the same path.

That tells us that you have an Intel S5000PSL motherboard, and the tech specs are at http://support.intel.com/support/motherboards/server/s5000psl/sb/CS-022619.htm

What you want to stare at is the block diagram in Figure 10, page 27 (usually this isn't in the motherboard documentation, and instead you have to drill down into the chipset documentation to find it). Slots 5 and 6, which have PCI Express x16 connectors (but run at x8 speed), both go straight into the memory hub. Slots 3 and 4, which are x8 connectors but run at x4 speed, go through the I/O controller first. Those should be slower, so if you put a card in there it will have degraded top-end performance compared to slots 5/6.

Line that up with the physical layout in Figure 2, page 17, and you should be able to get an idea what the possibilities are for moving the cards around and what the trade-offs are. Ideally you'd want both 3ware cards in slots 5+6, but if that's your current configuration you could try moving the less important of the two (maybe the one with fewer drives) to slot 3 or 4 and see if the contention you're seeing drops.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
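One way to see which path each card actually ended up on, without opening the case, is to check the negotiated link width in `lspci -vv` output: a card that negotiated x4 is on the slower I/O-controller path. This is a sketch that parses a saved excerpt so the idea is concrete; the bus addresses and widths below are invented for illustration, not taken from this server.

```shell
# Illustrative excerpt of `lspci -vv` output; on the live box you'd run:
#   lspci -vv | grep -E 'RAID|LnkSta'
cat > /tmp/lspci.sample <<'EOF'
03:00.0 RAID bus controller: 3ware 9650SE
        LnkSta: Speed 2.5GT/s, Width x8
04:00.0 RAID bus controller: 3ware 9650SE
        LnkSta: Speed 2.5GT/s, Width x4
EOF
# Print the negotiated width per card; "x4" here would mean that card is
# sitting behind the slower path through the I/O controller.
grep 'Width' /tmp/lspci.sample | awk '{print $NF}'
```

Comparing these values before and after moving a card is a cheap way to confirm the slot change actually took effect at the link level.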
On Jan 18, 2008 2:14 PM, Greg Smith <gsmith@gregsmith.com> wrote:
Greg, this is GREAT information, and I'm glad you stepped through the process. It is really interesting to see to what extent slots with similar (or identical) names can behave so differently. We'll try some of the things you suggest (although it will probably take a while), and if we come to any conclusions we'll let everyone know.
Sean
On Thu, 17 Jan 2008, Sean Davis wrote:
> We have a machine that serves as a fileserver and a database server. Our
> server hosts a RAID array of 40 disk drives, attached to two 3ware cards,
> one 9640SE-24 and one 9640SE-16. We have noticed that activity on one
> controller blocks access on the second controller, not only for disk I/O
> but also for the command-line tools, which become unresponsive for the
> inactive controller.

There have been a lot of discussions on the linux-kernel mailing list over the last several months on the topic of I/O to one set of drives interfering with I/O to another set of drives. The soon-to-be-released 2.6.24 kernel includes a substantial amount of work in this area that (at least on initial reports) is showing significant improvements. I haven't had the time to test this out yet, so I can't add personal experience, but it's definitely something to look at on a test system.

David Lang
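Before comparing kernels along the lines David suggests, it's worth recording which I/O scheduler each array is currently using, since scheduler differences can mask or mimic the interference being tested. A minimal sketch, assuming a Linux box with the usual sysfs layout; the sample below mirrors the `/sys/block/<dev>/queue/scheduler` format (bracketed entry is the active one) rather than reading a real device.

```shell
# Illustrative copy of /sys/block/sda/queue/scheduler; on the real machine:
#   cat /sys/block/sda/queue/scheduler
echo 'noop anticipatory deadline [cfq]' > /tmp/sched.sample
# Extract the active scheduler (the entry in square brackets):
sed -n 's/.*\[\(.*\)\].*/\1/p' /tmp/sched.sample   # prints: cfq
```

Noting this per device on both the old and the test kernel keeps the comparison apples-to-apples.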