I noticed this morning that the RAID containers on my new Dell PowerEdge x9xx servers don’t have their caching options enabled. Now, I understand that write caching is potentially risky, and I understand why Dell ships the PERC 5/i controllers without write caching. However, no read caching? That doesn’t make much sense.
While I was mucking around I did some benchmarking of the controllers with different cache settings. I used Red Hat Enterprise Linux 4 Update 4, running on a Dell PowerEdge 2950 with six 146 GB 10K SAS disks. The filesystem is a 20 GB ext3 volume created in LVM, mounted with data=writeback. I used bonnie++ to generate load, which for that tool means sequential writes and reads. Each test was run five times and averaged, then the machine was rebooted to change the controller settings. While the machine was in multiuser mode (runlevel 3) I was alone on it, and it was not doing anything but my testing.
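For anyone who wants to repeat the "run five times and average" procedure, here is a minimal sketch in Python. It is not the script used for these results. The helper names (`bonnie_command`, `average_runs`) and the metric keys are made up for illustration, and parsing bonnie++'s CSV output is left out because the column layout differs between versions.

```python
import statistics


def bonnie_command(directory, size_mb, user):
    """Build a bonnie++ command line for a sequential-I/O-only run.

    The arguments are placeholders; pick values that match your own setup.
    """
    return [
        "bonnie++",
        "-d", directory,     # directory on the filesystem under test
        "-s", str(size_mb),  # test file size in MB; should exceed RAM
        "-n", "0",           # skip the small-file creation tests
        "-u", user,          # user to run as (required when started as root)
        "-q",                # quiet mode: only CSV results on stdout
    ]


def average_runs(runs):
    """Average each metric across runs.

    `runs` is a list of dicts that all share the same keys, one dict per
    benchmark run (for example, sequential write and read throughput).
    """
    if not runs:
        raise ValueError("need at least one run")
    return {
        metric: statistics.mean(run[metric] for run in runs)
        for metric in runs[0]
    }
```

The command list can be handed to `subprocess.run`, and the five per-run dicts to `average_runs`, once for each controller cache setting.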
Note: if you have a good tool to test random I/O please leave me a comment with the link. The most promising tool I could find was POSTmark, from NetApp, but it looks like they removed that from their site. I could write something, too, but I didn’t have time here.
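In the spirit of "I could write something," here is a rough sketch of what a very basic random-read test might look like in Python. It is an illustration only, not a replacement for a real benchmarking tool. The function name and defaults are made up, and it assumes a Unix-like system where `os.pread` is available. The test file needs to be much larger than RAM, otherwise the numbers mostly measure the operating system's page cache rather than the controller.

```python
import os
import random
import time


def random_read_test(path, block_size=4096, count=1000, seed=None):
    """Read `count` blocks at random block-aligned offsets in `path`.

    Returns (elapsed_seconds, bytes_read).
    """
    rng = random.Random(seed)
    blocks = os.path.getsize(path) // block_size
    if blocks == 0:
        raise ValueError("file is smaller than one block")

    fd = os.open(path, os.O_RDONLY)
    try:
        total = 0
        start = time.perf_counter()
        for _ in range(count):
            # Pick a random whole block and read it without moving the
            # file position (pread takes an explicit offset).
            offset = rng.randrange(blocks) * block_size
            total += len(os.pread(fd, block_size, offset))
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return elapsed, total
```

Dividing `count` by the elapsed time gives a crude reads-per-second figure that can be compared across cache settings.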
Anyhow, the results are fairly obvious. The graph is:
The conclusion, at least for sequential reads and writes, is to turn your cache on for maximum performance. No surprises there. :)
Update: The array was configured as RAID 5, across all six disks.
You should try http://www.iometer.org/
A word of warning: I have a Dell Perc DC with twin external arrays, each with 8 disks, in a cluster that is supposed to switch over in the event of a failure.
The Perc card did not fail. The only thing in the logs was some warnings about one drive that might fail; there were no orange lights and no errors on the controller.
Yet somehow it lost its NVRAM settings of the containers, it then updated the second server in the Cluster to no containers. This resulted in a complete systems failure as no drives could be seen.
We recreated the containers, but Windows could not see the volumes. We lost everything: the Windows cluster information, the SQL cluster, the data, everything.
Dell said it was a firmware issue and suggested we contact a data recovery specialist, who wanted $30k+ with no guarantees.
I do not know why these Perc controllers do not allow you to save the config to floppy or why they can’t properly log the problem.
I guess you have to keep the firmware updated but we have heard horror stories of those that did and lost their data.
Next time we are NOT trusting Dell with our data.
omg, that's an interesting thing to know, as I was planning to operate a large array on this controller. I guess this is going to change my mind…
It has been my experience that data and config loss occurs most often in SCSI PERC controllers when a firmware or driver update is performed while one of the arrays is degraded, or while a disk in the array has a sense key against it for predictive failure.
Backups are the key with RAID. RAID itself is not a backup option. In essence, it buys time in the event of a physical disk failure. Additionally, with SCSI signaling on a parallel bus, a bad signal from one drive can cause a failure in another due to SCSI bus cross-talk.
SAS (PERC 5/6) in some way eliminates this problem, as it is a point-to-point, switched architecture.
Turning writeback caching on is always an option that will increase read and write performance.
Loss of NVRAM can happen for several reasons. The metadata on the array itself can go bad due to poor array maintenance, lack of consistency checks, or turning patrol read off.
It can also be lost due to a failing CMOS.
“Yet somehow it lost its NVRAM settings of the containers, it then updated the second server in the Cluster to no containers. This resulted in a complete systems failure as no drives could be seen”
What kind of cluster, what was the config? Was it in an active/passive configuration? Was the storage unit set correctly in terms of bus settings?
I am just curious how one controller told another to dump configuration data.
bonnie++ is a good test tool as it attempts to simulate real-world disk usage.