I’m tired, as I was up sort of late last night. One of our storage arrays, the kind that is advertised to never fail, failed briefly. It took out our enterprise mail system and a variety of other databases and applications, all of which had been moved to this array because it was supposed to be super reliable. In my queue of things to write about is putting all your eggs in one basket while trusting vendors. “Where Data Lives” is more like “Where Data Dies And Support Shrugs At You Like They Think You Are An Idiot For Believing The Spec Sheet Or The Sales People.”
In the meantime, xkcd has a cartoon today that hits pretty close to home for me.
I think I’ve seen you mention CLARiiONs before. Did you lose a storage processor? I’m never confident a full set of trespasses like that will go cleanly because of the confusion around array-side failover settings and PowerPath/dm-multipath side settings.
No, this was our DMX-3. EMC claims nothing went wrong, but 15 of our biggest hosts disagree. We think it was a problem with a specific RAID group because other hosts that weren’t on that RAID group didn’t have a problem. Meanwhile, the array didn’t log a damn thing. All was well while our enterprise systems were dying. Like I said, the kicker is that we had moved all those systems to the DMX-3 precisely because it was never supposed to do crap like this.
Personally I’m not a fan of EMC at all. Their stuff just isn’t reliable, and when it doesn’t work as advertised you basically get a blank stare back from support. If you’re lucky you get a shrug.
In an interesting twist, once we got some of the high-I/O systems off of our CLARiiONs they became pretty stable. We used to blow a storage processor a month on the eight CX700s; now it’s been a few months since the last one.