How to Troubleshoot Unreliable or Malfunctioning Hardware

My post on Intel X710 NICs being awful has triggered a lot of emotion and commentary from my readers. One of the common questions has been: so I have X710 NICs, what do I do? How do I troubleshoot hardware that isn’t working right?

1. Document how to reproduce the problem and its severity. Is it a management annoyance or does it cause outages & downtime? Is there a reasonable expectation that what you’re trying to do should work the way you expect? That might seem like an odd question, but sometimes other people do the procurement for (and without) us and there are gotchas they didn’t think to ask about. In my case with the X710s I felt I …

Intel X710 NICs Are Crap

(I’m grumpy this week and I’m giving myself permission to return to my blogging roots and complain about stuff. Deal with it.) In the not so distant past we were growing a VMware cluster and ordered 17 new blade servers with X710 NICs. Bad idea. X710 NICs suck, as it turns out. Those NICs do all sorts of offloads, and the onboard processor intercepts things like CDP and LLDP packets so that the OS cannot see or participate. That’s a real problem for ESXi hosts where you want to listen for and broadcast meaningful neighbor advertisements. Under Linux you can echo a bunch of crap into the right spot in /dev and shut that off but no such luck on …

Fix the Security Audits in vRealize Operations Manager

(I’m grumpy this week and I’m giving myself permission to return to my blogging roots and complain about stuff. Deal with it.) Several bloggers have written about the Runecast Analyzer lately. I was crazy bored in a meeting the other day so instead of stabbing myself with my pen to see if I still feel things I decided to go check out their website. My interest was piqued when I saw the screenshot where they show security hardening guideline compliance, as well as compliance with the DISA STIG for ESX 6. I do a lot of that crap nowadays. You know what my first thought was about the Runecast product, though? It was “This is what vRealize Operations Manager (vROPS) could …

How to Disable Windows IPv6 Temporary Addresses

The default Microsoft Windows IPv6 implementation has privacy extensions enabled, where IPv6 temporary addresses are used for client activities. The idea is that IPv6 has so many addresses available to it that we can create extra ones to help mask our activities. In practice these temporary addresses are largely pointless, and are very unhelpful if firewalls and ACLs are configured to allow access from a specific static address. By themselves, IP addresses aren’t a good way to authenticate people but they often form another layer of defense. This is especially important for IT infrastructure where there often aren’t (or can’t be) sophisticated authentication mechanisms. Paste these commands into an administrator-level PowerShell or Command Prompt and then restart your PC: netsh interface …

Should We Panic About the KPTI/KAISER Intel CPU Design Flaw?

As a followup to yesterday’s post, I’ve been asked: should we panic about the KPTI/KAISER/F*CKWIT Intel CPU design flaw? My answer was: it depends on a lot of unknowns. There are NDAs around a lot of the fixes so it’s hard to know the scope and effect. We also don’t know how much this will affect particular workloads. The folks over at Sophos have a nice writeup today about the actual problem (link below) but in short, the fix will reduce the effectiveness of the CPU’s speculative execution and on-die caches, forcing it to go out to main memory more. Main memory (what we call RAM) is 20x slower than the CPU’s L2 cache (look below for a good link showing …

Intel CPU Design Flaw, Performance Degradation, Security Updates

I was just taking a break and reading some tech news and I saw a wonderfully detailed post from El Reg (link below) about an Intel CPU design flaw and impending crisis-level security updates to fix it. As if that wasn’t bad enough, the fix for the problem is estimated to decrease performance by 5% to 30%, with older systems being the hardest hit. Welcome to 2018, folks. In short, an Intel CPU tries to keep itself busy by speculating about what it’s going to need to work on next. On Intel CPUs (but not AMD) this speculative execution doesn’t properly respect the security boundaries between the OS kernel and userspace applications, so you can trick an Intel processor into letting …

Apple Deserves What It Gets From This Battery Fiasco

Yesterday Apple issued an apology for intentionally slowing iPhones as their batteries age. As part of that they announced a number of changes, like a $29 battery replacement and actually giving people information and choices about how their device functions. This says a few things to me. First, it says that they have gouged consumers for the cost of a battery all these years. Second, it tells me they are scared enough of these class-action lawsuits to admit fault publicly. There are a million reasons why an iPhone might perform poorly, especially after an upgrade. This has little to do with the battery, and likely more to do with background maintenance tasks that happen after an …

Let’s Just Keep An Eye On The Time

“You’re asking me how a watch works. For now, let’s just keep an eye on the time.” – Alejandro, Sicario I’ve enjoyed the eclectic roles Benicio del Toro has been playing these last few years. His appearance in recent space movies reminded me of this quote of his from the movie Sicario. Often enough in our own technological roles we are asked to explain ourselves, explain why something is the way it is or why we want it to be a particular way. How do you convey to someone in just a minute the years of school, decades of experience, days in noisy data centers, nights bringing systems back online, hours staring at configurations that are wrong and scripts that don’t work, dumb …

Fixing Veeam “Can’t Delete Replica When It Is Being Processed” Errors

I’ve used Veeam Backup & Replication for a long time now, and when we restructure storage, redeploy VMs, or change our replication jobs we sometimes hit this error: “Error: Can’t delete replica when it is being processed.” Here’s how I fix it. As always, your mileage may vary, and free advice is often worth what you paid, especially from a stranger on the Internet. Veeam support is probably a safe but much higher latency source of non-free advice. Stop the affected jobs and disable them. Ensure that the replicas are gone, from both the VMware environment (vCenter) and Backup & Replication (Replicas -> Ready, then right-click and Delete From Disk). Don’t delete it from the …

7 Ways IT Staff Can Prepare for the Holidays

For us IT types it is important to maintain a good balance between work and our lives. Just as they say that good fences make good neighbors, I’ve found that a good delineation between work and home improves both. The holiday season is taxing, though. People rush around trying to wrap up loose ends, they’re using vacation they’re going to lose, and they’re generally scattered and distracted, which isn’t a good thing. If you’re lucky enough to work somewhere with a true 24×7 operations center then coverage over the holidays is already thought out. However, most IT staff in the world aren’t in places like that. Here are some thoughts I have about how to defend your time off over the …
