How to Troubleshoot Unreliable or Malfunctioning Hardware

My post on Intel X710 NICs being awful has triggered a lot of emotion and commentary from my readers. One of the common questions has been: so I have X710 NICs, what do I do? How do I troubleshoot hardware that isn’t working right? 1. Document how to reproduce the problem and its severity. Is it a management annoyance or does it cause outages & downtime? Is there a reasonable expectation that what you’re trying to do should work the way you expect? That might seem like an odd question, but sometimes other people do the procurement for (and without) us and there are gotchas they didn’t think to ask about. In my case with the X710s I felt I …

Read More

Intel X710 NICs Are Crap

(I’m grumpy this week and I’m giving myself permission to return to my blogging roots and complain about stuff. Deal with it.) In the not so distant past we were growing a VMware cluster and ordered 17 new blade servers with X710 NICs. Bad idea. X710 NICs suck, as it turns out. Those NICs do all sorts of offloads, and the onboard processor intercepts things like CDP and LLDP packets so that the OS cannot see or participate. That’s a real problem for ESXi hosts where you want to listen for and broadcast meaningful neighbor advertisements. Under Linux you can echo a bunch of crap into the right spot in /dev and shut that off but no such luck on …

Read More