Re: [PERFORM] Suggestions for a HBA controller (6 x SSDs + mdadm RAID10)

From: Pietro Pugni
Subject: Re: [PERFORM] Suggestions for a HBA controller (6 x SSDs + mdadm RAID10)
Date:
Msg-id: 1EBE383A-C04C-40F7-AC5B-03F75F33CF09@gmail.com
In reply to: Re: [PERFORM] Suggestions for a HBA controller (6 x SSDs + mdadm RAID10)  ("Wes Vaske (wvaske)" <wvaske@micron.com>)
Responses: Re: [PERFORM] Suggestions for a HBA controller (6 x SSDs + mdadm RAID10)  (Merlin Moncure <mmoncure@gmail.com>)
List: pgsql-performance
I just installed and configured my brand-new LSI 3008-8i. This server had one SAS expander connected to two backplanes (8 disks on the first backplane, no disks on the second). After some testing I found that the SAS expander was a bottleneck, so I removed it and connected the first backplane directly to the controller.

The following results are from 4k 100% random reads (32QD) run in parallel on each single SSD:

Raw SSDs [ 4k, 100% random reads, 32 Queue Depth]
ServeRaid m5110e (with SAS expander) [numjob=1]
  read : io=5111.2MB, bw=87227KB/s, iops=21806, runt= 60002msec
  read : io=4800.6MB, bw=81927KB/s, iops=20481, runt= 60002msec
  read : io=4997.6MB, bw=85288KB/s, iops=21322, runt= 60002msec
  read : io=4796.2MB, bw=81853KB/s, iops=20463, runt= 60001msec
  read : io=5062.6MB, bw=86400KB/s, iops=21599, runt= 60001msec
  read : io=4989.6MB, bw=85154KB/s, iops=21288, runt= 60001msec
Total read iops: 126,959 ( ~ 21,160 iops/disk)


Raw SSDs [ 4k, 100% random reads, 32 Queue Depth]
Lenovo N2215 (LSI 3008-8i flashed with LSI IT firmware, without SAS expander) [numjob=1]
  read : io=15032MB, bw=256544KB/s, iops=64136, runt= 60001msec
  read : io=16679MB, bw=284656KB/s, iops=71163, runt= 60001msec
  read : io=15046MB, bw=256779KB/s, iops=64194, runt= 60001msec
  read : io=16667MB, bw=284444KB/s, iops=71111, runt= 60001msec
  read : io=16692MB, bw=284867KB/s, iops=71216, runt= 60001msec
  read : io=15149MB, bw=258534KB/s, iops=64633, runt= 60002msec
Total read iops: 406,453 ( ~ 67,742 iops/disk)


That’s about 3.2× the aggregate iops (a ~220% improvement).
I chose 4k at 32QD because it should deliver the maximum iops and clearly show whether the I/O is properly configured.
I didn’t bother testing the embedded m5110e without the SAS expander because it would certainly be slower.
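
For reference, the test was essentially something like the following; the device names and the libaio engine here are placeholders for whatever your system actually uses:

  # one fio job per raw device, run in parallel (sda..sdh are placeholders)
  for dev in /dev/sda /dev/sdb /dev/sde /dev/sdf /dev/sdg /dev/sdh; do
    fio --name=randread-$(basename $dev) --filename=$dev \
        --ioengine=libaio --direct=1 --rw=randread --bs=4k \
        --iodepth=32 --numjobs=1 --runtime=60 --time_based &
  done
  wait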


> You might need to increase the number of jobs here. The primary reason for this parameter is to improve scaling when you’re single thread CPU bound. With numjob=1 FIO will use only a single thread and there’s only so much a single CPU core can do.

The HBA gave slightly better performance even with the expander still in place, and slightly better again after removing it, but then I tried increasing numjob from 1 to 16 (I also tried 12, 18, 20, 24 and 32, but 16 gave the highest iops) and the benchmarks returned the expected results. I wonder how this relates to Postgres… probably effective_io_concurrency, as suggested by Merlin Moncure, is the counterpart of numjob in fio?
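
As far as I understand, effective_io_concurrency only controls how many prefetch requests Postgres issues during bitmap heap scans, so it isn’t an exact counterpart of numjob, but it’s the easiest knob to experiment with. Something along these lines (the value 200 is only an example, not a recommendation):

  # set the GUC cluster-wide and reload (requires a superuser; PostgreSQL 9.4+)
  psql -c "ALTER SYSTEM SET effective_io_concurrency = 200;"
  psql -c "SELECT pg_reload_conf();"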


> I was a little unclear on the disk cache part. It’s a setting, generally in the RAID controller / HBA. It’s also a device-level option in Linux (hdparm) and Windows (somewhere in Device Manager?). The reason to disable the disk cache is that it’s NOT protected against power loss on the MX300. So by disabling it you can ensure 100% write consistency at the cost of write performance. (Using fully power-protected drives lets you keep the disk cache enabled.)

I always enabled the write cache during my tests. I tried to disable it but performance was too poor. Those SSDs are consumer drives and don’t have any capacitors :(
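
For anyone repeating this, the drive-level write cache can be checked and toggled from Linux with hdparm (the device name is a placeholder; behind some controllers you may need the vendor tool instead):

  hdparm -W /dev/sda      # show the current write-cache state
  hdparm -W 0 /dev/sda    # disable the volatile write cache
  hdparm -W 1 /dev/sda    # re-enable it (only safe with power-loss protection)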


>> Why 64k and QD=4? I thought of 8k and larger QD. Will test as soon as possible and report here the results :)
 
> It’s more representative of what you’ll see at the application level. If you’ve got a running system, you can just use iostat to see what your average QD is (iostat -x 10, and look at the avgqu-sz column; change the 10-second interval to whatever works best for your environment).

I tried your suggestion (64k, 70/30 random r/w, 4QD) on RAID0 and RAID10 (mdadm) with the new controller, and the results are quite good considering that the underlying SSDs are consumer drives with the original firmware (overprovisioned at 25%).
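
For reference, the mixed run was along these lines (the md device name and the libaio engine are placeholders):

  # 64k blocks, 70% random reads / 30% random writes, QD 4, 16 jobs
  fio --name=mixed --filename=/dev/md127 --ioengine=libaio --direct=1 \
      --rw=randrw --rwmixread=70 --bs=64k --iodepth=4 --numjobs=16 \
      --runtime=60 --time_based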

RAID10 is about 22% slower than RAID0 in both reads and writes, at least over a 1-minute run. The totals and averages were calculated from the full fio log output using the per-job iops.

These are the results:


############################################################################
mdadm RAID0 [ 64k, 70% random reads, 30% random writes, 04 Queue Depth]
Lenovo N2215 (LSI 3008-8i flashed with LSI IT firmware, without SAS expander) [numjob=16]
############################################################################
Run status group 0 (all jobs):
   READ: io=75943MB, aggrb=1265.7MB/s, minb=80445KB/s, maxb=81576KB/s, mint=60001msec, maxt=60004msec
  WRITE: io=32585MB, aggrb=556072KB/s, minb=34220KB/s, maxb=35098KB/s, mint=60001msec, maxt=60004msec

Disk stats (read/write):
    md127: ios=1213256/520566, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=202541/86892, aggrmerge=0/0, aggrticks=490418/137398, aggrin_queue=628566, aggrutil=99.20%
  sdf: ios=202557/86818, merge=0/0, ticks=450384/131512, in_queue=582528, util=98.58%
  sdb: ios=202626/87184, merge=0/0, ticks=573448/177336, in_queue=751784, util=99.20%
  sdg: ios=202391/86810, merge=0/0, ticks=463644/137084, in_queue=601272, util=98.46%
  sde: ios=202462/86551, merge=0/0, ticks=470028/121424, in_queue=592500, util=98.79%
  sda: ios=202287/86697, merge=0/0, ticks=473312/121192, in_queue=595044, util=98.95%
  sdh: ios=202928/87293, merge=0/0, ticks=511696/135840, in_queue=648272, util=99.14%

Total read iops: 20,242 ( ~ 3,374 iops/disk)
Total write iops: 8,679 ( ~ 1,447 iops/disk)



############################################################################
mdadm RAID10 [ 64k, 70% random reads, 30% random writes, 04 Queue Depth]
Lenovo N2215 (LSI 3008-8i flashed with LSI IT firmware, without SAS expander) [numjob=16]
############################################################################
Run status group 0 (all jobs):
   READ: io=58624MB, aggrb=976.11MB/s, minb=62125KB/s, maxb=62814KB/s, mint=60001msec, maxt=60005msec
  WRITE: io=25190MB, aggrb=429874KB/s, minb=26446KB/s, maxb=27075KB/s, mint=60001msec, maxt=60005msec

Disk stats (read/write):
    md127: ios=936349/402381, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=156357/134348, aggrmerge=0/0, aggrticks=433286/262226, aggrin_queue=696052, aggrutil=99.41%
  sdf: ios=150239/134315, merge=0/0, ticks=298268/168472, in_queue=466852, util=95.31%
  sdb: ios=153088/133664, merge=0/0, ticks=329160/188060, in_queue=517432, util=96.81%
  sdg: ios=157361/135065, merge=0/0, ticks=658208/459168, in_queue=1118588, util=99.16%
  sde: ios=161361/134315, merge=0/0, ticks=476388/278628, in_queue=756056, util=97.61%
  sda: ios=160431/133664, merge=0/0, ticks=548620/329708, in_queue=878708, util=99.41%
  sdh: ios=155667/135065, merge=0/0, ticks=289072/149324, in_queue=438680, util=96.71%

Total read iops: 15,625 ( ~ 2,604 iops/disk)
Total write iops: 6,709 ( ~ 1,118 iops/disk)



>> Do you have some HBA card to suggest? What do you think of the LSI SAS3008? I think it’s the same as the 3108 without the RAID-on-Chip feature. I will probably buy a Lenovo HBA card with that chip. It seems blazing fast (1 million IOPS) compared to the current embedded RAID controller (LSI 2008).
 
> I’ve been able to consistently get the same performance out of any of the LSI based cards. The 3008 and 3108 both work great, regardless of vendor. Just test or read up on the different configuration parameters (read ahead, write back vs write through, disk cache).

Do you have any suggestions for fine-tuning this controller? I’m referring to parameters like nr_requests, queue_depth, etc.
Also, is there any way to optimize the various mdadm parameters available under /sys/block/mdX/? I disabled the internal bitmap and write performance improved.
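
For context, these are the kinds of knobs I mean (device and array names are placeholders and the values shown are only examples):

  cat /sys/block/sda/queue/nr_requests           # block-layer request queue size
  echo 256 > /sys/block/sda/queue/nr_requests    # change it (as root)
  cat /sys/block/sda/device/queue_depth          # per-device SCSI queue depth
  mdadm --grow --bitmap=none /dev/md127          # drop the internal write-intent bitmap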



Thank you
 Pietro Pugni


