I’m not sure how many times I’ve been asked by coworkers, friends, and random people if I know how to fix a problem. The conversation always goes something like:
“Hi Bob. I am getting error XYZ when I try to use scp with public keys to copy a VMDK file from one ESX host to another. Can you tell me what I’m doing wrong?”
“Hi Joe. It could be one of thousands of things. You might try looking at /var/log/messages or /var/log/secure to see what SSH thinks the problem is.”
“Bob, thanks! It was a permission problem with my authorized_keys file.” Neato.
The nice thing about logs is that they often give you information that helps you solve a problem[0]. Like today, I’m trying to use VMware’s Update Manager to patch an ESX host but it keeps reporting that “VMware Update Manager had a failure.” Digging in a little, it turns out that it’s complaining about patch metadata being missing, which doesn’t make any sense because all of my other hosts work just fine. It’s just this one customer’s ESX hosts that are being difficult.
So I “tail -f /var/log/vmware/vpx/vpxa.log” to watch what happens when I tell it to scan for updates. Sure enough, I see the error, and it becomes painfully obvious what the problem is:
[2008-10-03 11:26:06.714 'App' 106433456 info] [VpxLRO] -- ERROR task-47870 -- -- vim.host.PatchManager.Scan: vim.fault.PatchMetadataNotFound: (vim.fault.PatchMetadataNotFound) { dynamicType = <unset>, patchID = "Unknown", metaData = (string) [ "http://vcserverhostname:80/vci/hostupdates/hostupdate/esx/esx-3.5.0/contents.xml.sig" ], msg = "Metadata for patch missing." }
Of course the metadata can’t be retrieved: to this host, ‘vcserverhostname’ isn’t resolvable[1].
About 15 seconds later I’d updated the host’s DNS configuration to include my Update Manager server’s domain (in the “Look for hosts in the following domains” setting). Problem solved.
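If you wanted to script that kind of check, here is a minimal Python sketch of the idea: pull any URLs out of a log line and see whether their hostnames resolve. The function names, the URL regex, and SAMPLE_LINE (a shortened copy of the log entry above) are my own for illustration, not anything VMware ships, and it assumes you have a Python 3 interpreter on a box that uses the same DNS settings as the host in question.

```python
import re
import socket
from urllib.parse import urlparse

# Hypothetical helper names; nothing here is part of VMware's tooling.
# Matches http/https URLs and stops at whitespace, quotes, or brackets.
URL_PATTERN = re.compile(r'https?://[^\s"\'\]\[<>]+')

# Shortened copy of the vpxa.log entry, used only as sample input.
SAMPLE_LINE = (
    'vim.fault.PatchMetadataNotFound: { patchID = "Unknown", metaData = (string) '
    '[ "http://vcserverhostname:80/vci/hostupdates/hostupdate/esx/esx-3.5.0/contents.xml.sig" ], '
    'msg = "Metadata for patch missing." }'
)


def hostnames_in_line(line):
    """Return the unique hostnames of all URLs found in a log line, in order."""
    hosts = []
    for url in URL_PATTERN.findall(line):
        host = urlparse(url).hostname
        if host and host not in hosts:
            hosts.append(host)
    return hosts


def resolves(hostname):
    """Return True if the local resolver can look up the hostname."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def unresolvable_hosts(line, resolver=resolves):
    """Return the hostnames mentioned in the line that the resolver cannot find."""
    return [host for host in hostnames_in_line(line) if not resolver(host)]
```

Here hostnames_in_line(SAMPLE_LINE) gives ['vcserverhostname'], and unresolvable_hosts(SAMPLE_LINE) should hand that same name back on any machine whose resolver can't find it. The resolver argument is only there so the lookup can be swapped out when there is no network to test against.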
Thanks /var/log/vmware/vpx/vpxa.log!
————
[0] This seems obvious, but given how many times I’ve had that same conversation, it doesn’t seem like a place people usually remember to look.
[1] Which is also easily checked, at least from ESX’s CLI: “host vcserverhostname”. If it tells you “Host vcserverhostname not found: 3(NXDOMAIN)” you know what the problem is.