Discussion: Poor performance on HP Package Cluster
Hi!

I've set up a Package Cluster (Fail-Over Cluster) on our two HP DL380 G4 with MSA Storage G2 (Xeon 3.4 GHz, 6 GB RAM, 2x 36 GB @ 15k rpm, RAID 1). The system is running under SUSE Linux Enterprise Server.

My problem is that the performance is very low. On our old server (Celeron 2 GHz with 2 GB of RAM) an import of our data (1.1 GB) takes about 10 minutes. On one of the DL380s it takes more than 90 minutes. Select response times have also increased: 3 sec on the Celeron, 30-40 sec on the Xeon.

I've been trying to fix the problem for two days now and googled a lot, but I don't know what to do.

top says my CPU spends ~50% of its time in I/O wait:

top - 14:07:34 up 22 min,  3 users,  load average: 1.09, 1.04, 0.78
Tasks:  74 total,   3 running,  71 sleeping,   0 stopped,   0 zombie
Cpu(s): 50.0% us,  5.0% sy,  0.0% ni,  0.0% id, 45.0% wa,  0.0% hi,  0.0% si
Mem:   6050356k total,   982004k used,  5068352k free,    60300k buffers
Swap:  2097136k total,        0k used,  2097136k free,   786200k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9939 postgres  18   0  254m 143m 140m R 49.3  2.4   8:35.43 postgres: postgres plate [local] INSERT
 9938 postgres  16   0 13720 1440 1120 S  4.9  0.0   0:59.08 psql -d plate -f dump.sql
10738 root      15   0  3988 1120  840 R  4.9  0.0   0:00.05 top -d 0.2
    1 root      16   0   640  264  216 S  0.0  0.0   0:05.03 init [3]
    2 root      34  19     0    0    0 S  0.0  0.0   0:00.00 [ksoftirqd/0]

vmstat 1:

ClusterNode2 root $ vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd    free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  0      0 5032012  60888 821008    0    0   216  6938 1952  5049 40  8 15 37
 0  1      0 5031392  60892 821632    0    0     0  8152 2126  5725 45  6  0 49
 0  1      0 5030896  60900 822144    0    0     0  8124 2052  5731 46  6  0 47
 0  1      0 5030400  60908 822768    0    0     0  8144 2124  5717 44  7  0 50
 1  0      0 5029904  60924 823272    0    0     0  8304 2062  5763 43  7  0 49

I read in 2004 that Xeons may have problems with context switching. Does that problem still exist? Can I do something to minimize it?

postgresql.conf:

shared_buffers = 28672
effective_cache_size = 400000
random_page_cost = 2

shmall & shmmax are set to 268435456

hdparm:

ClusterNode2 root $ hdparm -tT /dev/cciss/c0d0p1

/dev/cciss/c0d0p1:
 Timing buffer-cache reads:   3772 MB in 2.00 seconds = 1885.34 MB/sec
 Timing buffered disk reads:  150 MB in 2.06 seconds = 72.72 MB/sec

Greetings
Ernst
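To make the figures in the post above easier to compare, here is a minimal Python sketch that converts them to common units. It rests on two assumptions that are not stated in the thread: that PostgreSQL is built with its default 8 kB block size (in releases of that era `shared_buffers` and `effective_cache_size` are counted in blocks), and that `vmstat` reports `bi`/`bo` in 1024-byte blocks per second, as its Linux man page describes. The helper names are made up for illustration.

```python
# Assumption: default PostgreSQL block size of 8 kB; both settings below
# are given as a number of blocks, not bytes.
PG_BLOCK_SIZE = 8192
# Assumption: Linux vmstat reports bi/bo in 1024-byte blocks per second.
VMSTAT_BLOCK_SIZE = 1024


def pages_to_mib(pages, block_size=PG_BLOCK_SIZE):
    """Convert a PostgreSQL setting expressed in blocks to MiB."""
    return pages * block_size / 2**20


def vmstat_blocks_to_mb_per_s(blocks_per_s, block_size=VMSTAT_BLOCK_SIZE):
    """Convert a vmstat bi/bo value to (decimal) megabytes per second."""
    return blocks_per_s * block_size / 1e6


# Values copied from the post.
shared_buffers_mib = pages_to_mib(28672)
effective_cache_mib = pages_to_mib(400000)
shmmax_mib = 268435456 / 2**20

# The "bo" column of the five vmstat samples in the post.
bo_samples = [6938, 8152, 8124, 8144, 8304]
avg_write_mb_s = vmstat_blocks_to_mb_per_s(sum(bo_samples) / len(bo_samples))

print(f"shared_buffers       = {shared_buffers_mib:.0f} MiB")
print(f"effective_cache_size = {effective_cache_mib:.0f} MiB")
print(f"shmmax               = {shmmax_mib:.0f} MiB")
print(f"average write rate   = {avg_write_mb_s:.1f} MB/s")
```

Under those assumptions, `shared_buffers` fits within `shmmax`, and the write rate seen during the import is roughly a tenth of the 72.72 MB/sec sequential read rate that hdparm reports.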
Your HD raw IO rate seems fine, so the problem is not likely to be with the HDs.

That consistent ~10x increase in how long it takes to do an import or a select is noteworthy. This "smells" like an interconnect problem. Was the Celeron locally connected to the HDs while the new Xeons are network connected?

Getting 10's or even 100's of MBps throughput out of local storage is much easier than it is to do over a network. 1GbE is required if you want HDs to push 72.72MBps over a network, and not even one 10GbE line will allow you to match local buffered IO of 1885.34MBps.

What size are those network connects (Server A <-> storage, Server B <-> storage, Server A <-> Server B)?

Ron Peacetree

At 10:16 AM 9/1/2005, Ernst Einstein wrote:
>My problem is, that the performance is very low. On our old Server (
>Celeron 2Ghz with 2 GB of Ram ) an import of our Data takes about 10
>minutes. ( 1,1GB data ). One of the DL380 it takes more than 90 minutes...
>Selects response time have also been increased. Celeron 3 sec, Xeon 30-40sec.
>
>hdparm:
>
>ClusterNode2 root $ hdparm -tT /dev/cciss/c0d0p1
>
>/dev/cciss/c0d0p1:
>Timing buffer-cache reads: 3772 MB in 2.00 seconds = 1885.34 MB/sec
>Timing buffered disk reads: 150 MB in 2.06 seconds = 72.72 MB/sec
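The bandwidth arithmetic in the reply above can be checked with a short Python sketch. It assumes decimal megabytes (1 MB = 10^6 bytes) and ignores protocol overhead, so real links would deliver somewhat less than their nominal rate; the function names and the list of link speeds are illustrative choices, not something taken from the thread.

```python
# Nominal Ethernet link speeds in Gbit/s (Fast Ethernet, 1GbE, 10GbE).
LINK_SPEEDS_GBIT = [0.1, 1.0, 10.0]


def required_gbit_per_s(mbytes_per_s):
    """Raw line rate needed to carry the given MB/s, ignoring overhead."""
    return mbytes_per_s * 1e6 * 8 / 1e9


def smallest_sufficient_link(mbytes_per_s):
    """Return the slowest listed link that can carry the rate, else None."""
    needed = required_gbit_per_s(mbytes_per_s)
    for speed in LINK_SPEEDS_GBIT:
        if speed >= needed:
            return speed
    return None


# hdparm figures from the original post.
for label, rate in [("buffered disk reads", 72.72),
                    ("buffer-cache reads", 1885.34)]:
    needed = required_gbit_per_s(rate)
    link = smallest_sufficient_link(rate)
    verdict = f"{link:g} GbE" if link is not None else "more than one 10GbE link"
    print(f"{label}: {rate} MB/s needs {needed:.2f} Gbit/s -> {verdict}")
```

The disk read rate needs a bit under 0.6 Gbit/s, which Fast Ethernet cannot carry but 1GbE can, while the buffer-cache rate needs about 15 Gbit/s, more than a single 10GbE link provides. The conclusions stay the same if a MB is taken as 2^20 bytes.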
Are you using the built-in HP SmartArray RAID/SCSI controllers? If so, that could be your problem; they are known to have terrible and variable performance with Linux.

The only good fix is to add a simple SCSI controller to your system (HP sells them) and stay away from hardware RAID.

- Luke

On 9/1/05 7:16 AM, "Ernst Einstein" <Crusader@gmx.ch> wrote:
> I've set up a Package Cluster ( Fail-Over Cluster ) on our two HP DL380
> G4 with MSA Storage G2.( Xeon 3,4Ghz, 6GB Ram, 2x 36GB@15rpm- Raid1)
> The system is running under Suse Linux Enterprise Server.
>
> My problem is, that the performance is very low. On our old Server
> ( Celeron 2Ghz with 2 GB of Ram ) an import of our Data takes about 10
> minutes. ( 1,1GB data )
> One of the DL380 it takes more than 90 minutes...
>
> Top says, my CPU spends ~50% time with wait io.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
Do you have any sources for that information? I am running dual SmartArray 6402's in my DL585 and haven't noticed anything poor about their performance.

On Sep 1, 2005, at 2:24 PM, Luke Lonergan wrote:
> Are you using the built-in HP SmartArray RAID/SCSI controllers? If
> so, that could be your problem, they are known to have terrible and
> variable performance with Linux.
Dan,

On 9/1/05 4:02 PM, "Dan Harris" <fbsd@drivefaster.net> wrote:
> Do you have any sources for that information? I am running dual
> SmartArray 6402's in my DL585 and haven't noticed anything poor about
> their performance.

I've previously posted comprehensive results for the 5i and 6xxx series Smart Arrays, using software RAID and HW RAID on 3 different kernels, alongside LSI and Adaptec SCSI controllers and an Adaptec 24xx HW RAID adapter. The results cover bonnie++ and simple sequential read/write with dd.

I'll post them again here for reference in the next message. Yes, the performance of the SmartArray controllers under Linux was abysmal, even when run by the labs at HP.

- Luke
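For readers who want a rough idea of what a "simple sequential read/write" test measures, here is a minimal Python sketch in the spirit of a dd run. It is not the benchmark used for the results mentioned above, and the function name, file name, and sizes are illustrative assumptions. Note that the read pass is served largely from the page cache unless the file is much larger than RAM, so only the write figure (which includes an fsync) says much about the storage path at small sizes.

```python
import os
import tempfile
import time


def sequential_write_read(path, total_mb=64, chunk_mb=1):
    """Write then read `path` sequentially; return (write, read) in MiB/s."""
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    n_chunks = total_mb // chunk_mb

    # Sequential write, flushed to stable storage before the clock stops.
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(n_chunks):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())
    write_s = time.perf_counter() - start

    # Sequential read of the same file, one chunk at a time.
    read_bytes = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            data = f.read(len(chunk))
            if not data:
                break
            read_bytes += len(data)
    read_s = time.perf_counter() - start

    written_mib = n_chunks * chunk_mb
    return written_mib / write_s, (read_bytes / 2**20) / read_s


if __name__ == "__main__":
    # Hypothetical small run in a temporary directory; point `path` at the
    # volume under test and raise total_mb well above RAM for real numbers.
    with tempfile.TemporaryDirectory() as tmp:
        w, r = sequential_write_read(os.path.join(tmp, "seqtest.bin"), 64)
        print(f"write: {w:.1f} MiB/s, read: {r:.1f} MiB/s")
```

Running the same script on the old and the new server, against the same kind of volume, would give a like-for-like comparison that is independent of PostgreSQL.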
Hi!
Sorry for my late answer. I was unavailable for a few days.
Yes, I'm using the built-in HP Smart Array controller. Both the internal disks and the external storage are attached to that controller.
Can you send me your test results? I'm interested in them.
I've done some testing now. I imported the data again and tuned the DB as recommended in some performance howtos. Now the database performs well, until it has to read from the disks.
Greetings Ernst
On Thursday, 01.09.2005, at 21:54 -0700, Luke Lonergan wrote:
> I've previously posted comprehensive results using the 5i and 6xxx series
> smart arrays using software RAID, HW RAID on 3 different kernels, alongside
> LSI and Adaptec SCSI controllers, and an Adaptec 24xx HW RAID adapter.
> Results with bonnie++ and simple sequential read/write with dd.
>
> I'll post them again here for reference in the next message. Yes, the
> performance of the SmartArray controllers under Linux was abysmal, even when
> run by the labs at HP.
>
> - Luke
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly
--
Ernst Einstein <Crusader@gmx.ch>
Sure, I posted the Excel spreadsheet to the list right after my message, but I think the list blocks attachments. I'll send it to you privately now.
I recommend switching to software RAID 10 or 5 using simple SCSI U320 adapter(s) from LSI or Adaptec, which you can buy from HP if you must.
Cheers,
- Luke
On 9/4/05 8:47 AM, "Ernst Einstein" <Crusader@gmx.ch> wrote:
> Yes, I'm using the built-in HP Smart Array controller. Both the internal
> disks and the external storage are attached to that controller.
>
> Can you send me your test results? I'm interested in them.
>
> I've done some testing now. I imported the data again and tuned the DB as
> recommended in some performance howtos. Now the database performs well,
> until it has to read from the disks.
>
> Greetings Ernst
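As a rough aid for weighing the RAID 10 versus RAID 5 recommendation above, the following Python sketch computes usable capacity for each level. The formulas are the standard ones (half of the disks for RAID 10, all but one disk for RAID 5). The 36 GB disk size comes from the original post, while the disk counts are hypothetical, since the thread does not say how many drives the external storage holds.

```python
def usable_gb(level, disks, disk_gb):
    """Usable capacity in GB for a RAID 10 or RAID 5 set of equal disks."""
    if level == 10:
        # Striped mirrors: needs an even number of disks, at least four.
        if disks < 4 or disks % 2 != 0:
            raise ValueError("RAID 10 needs an even number of disks, >= 4")
        return (disks // 2) * disk_gb
    if level == 5:
        # One disk's worth of capacity goes to parity.
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_gb
    raise ValueError("only RAID levels 10 and 5 are handled here")


# Disk size from the original post; disk counts are hypothetical examples.
DISK_GB = 36
for n in (4, 6, 8):
    print(f"{n} disks: RAID 10 = {usable_gb(10, n, DISK_GB)} GB, "
          f"RAID 5 = {usable_gb(5, n, DISK_GB)} GB")
```

Capacity is only one side of the trade-off; the sketch says nothing about write performance, which is what this thread is actually about.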