How to Troubleshoot Unreliable or Malfunctioning Hardware

My post on Intel X710 NICs being awful has triggered a lot of emotion and commentary from my readers. One of the common questions has been: so I have X710 NICs, what do I do? How do I troubleshoot hardware that isn’t working right?

1. Document how to reproduce the problem and its severity. Is it a management annoyance or does it cause outages & downtime? Is there a reasonable expectation that what you’re trying to do should work the way you expect? That might seem like an odd question, but sometimes other people do the procurement for (and without) us and there are gotchas they didn’t think to ask about. In my case with the X710s I felt I …

Interesting Dell iDRAC Tricks

Deploying a bunch of machines all at once? Know your way around for loops in shell scripts, or know Excel well enough to do some basic text functions & autofill? You, too, can set up a few hundred servers in one shot. Here are some interesting things I’ve done in the recent past using the Dell iDRAC out-of-band hardware management controllers.

You need to install the racadm utility on your Windows or Linux host. I’ll leave this up to you, but you probably want to look in the Dell Downloads for your server, under “Systems Management.” I recently found it as “Dell OpenManage DRAC Tools, includes Racadm” in 32- and 64-bit flavors.

Basic Command

The basic racadm command I’ll represent with $racadm from …
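
The excerpt cuts off before the author's own command definition, so what follows is not that definition. It is only a rough sketch of the "for loop over many iDRACs" idea, under some loud assumptions: the host addresses, the placeholder credentials, and the helper name racadm_cmd are all hypothetical, and the -r/-u/-p remote options and the getsysinfo subcommand should be checked against the racadm version you actually installed. The helper only prints the command lines, so nothing touches real hardware until you decide to run them.

```shell
#!/bin/sh
# Hypothetical inventory: replace with your own iDRAC addresses.
IDRAC_HOSTS="192.0.2.11 192.0.2.12 192.0.2.13"

# Placeholder credentials (assumptions, not real defaults).
IDRAC_USER="IDRAC_USER_HERE"
IDRAC_PASS="IDRAC_PASSWORD_HERE"

# Print (do not run) the remote racadm command for one iDRAC.
# $1 is the iDRAC address, the remaining arguments are the subcommand.
racadm_cmd() {
    target="$1"
    shift
    echo "racadm -r $target -u $IDRAC_USER -p $IDRAC_PASS $*"
}

# Dry run: emit one command line per controller.
for h in $IDRAC_HOSTS; do
    racadm_cmd "$h" getsysinfo
done
```

Once the printed lines look right for your environment, piping the output to sh (or removing the echo) would run them for real. Keeping the dry run as the default makes it harder to push a typo to a few hundred servers at once.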

How To Increase Your "% Virtualized" Rates

The #VirtualizeDell tweet chat today got me thinking about what stalls most virtualization implementations at around 50-75%. These are just some thoughts on ways to kick things loose. @LethaW commented “[that some of them are] sneaky and underhanded, and I love it.” I took that as encouragement. Needless to say, your mileage may vary.

Problem: Physical hardware is required or requested by vendors.

Solutions: Actually check to make sure that a vendor does require physical hardware. For example, Oracle doesn’t require it for many things, but there’s this misconception out there that they do, and I hear it from DBAs a lot. Consultants will tell you a wide variety of things, too. Check the facts. Get it in writing. Don’t …

vCenter Hardware Status Stops Polling After 1 Hour

(Update, 1/19/2012, 1130 CST: The product manager for this feature, commenting below, has indicated this is actually a bug, and I’ve emailed her the details of my case so she can help track down where the information I was told came from, and fix my problem, too.)

For what seems like an eternity I’ve had a support case open with VMware because the hardware status functionality in vCenter (4.1 and 5) stops updating. I was told today by my support guy that, for a variety of reasons that cannot be known by me, VMware has decided that the hardware status polling should stop after 1 hour. So my bug isn’t a bug, it’s a feature, case closed. I am …
