Our LSI 9650SE-12 RAID controller dropped the main Postgres disk offline. It just disappeared, as though the disk wasn't there. It was an 8-disk RAID10 unit. The other unit (RAID1 for Linux and pg_xlog) was still functional.
tw_cli showed the array as "DEGRADED" and claimed to be verifying it. One disk in the array was also listed as "DEGRADED". There was no /dev entry for the device; Linux couldn't see it at all.
There were two hot spares, but the controller didn't use them. Worse, there was nothing I could do to make it do anything. Every command reported "Failed" with no further explanation. Booting into the RAID BIOS gave the same problem: if I selected "rebuild" or "verify", it said "You must select an array..." even though I had selected the array. It was as though the array didn't exist, yet it was shown.
I shut off the computer, unplugged the BBU from the RAID card and plugged it back in, unplugged and reinserted all the SATA cables, and then restarted. Exact same symptoms.
I finally gave up trying to recover the database (we had a backup server). The RAID controller let me delete and recreate the degraded array, and now everything seems fine. I can rebuild the Postgres database on the new unit. But I've lost a HUGE amount of trust in the LSI 9650SE RAID controller card.