Category: Virtualization

Size Labels for Virtual Environments – A Proposal

“How big is your virtual environment?”

I love that question. Find a virtual environment and ask ten people who work on it, and they’ll give you ten different answers. “It’s pretty big,” one person will say. The next person will say “oh, we’re small.” The next two people asked will argue with each other until you shake your head and walk away. It’s all relative, too. If most guys you know have 50 virtual machines, and you have 200, you’re big, relatively speaking. You’ve got problems they don’t have, and you’d probably like to talk with others who have had those same problems. Talking to a guy who has 2000 VMs isn’t going to help you much, though. He’s operating at a whole different scale, size, and budget level.

I spent some time this morning answering questions for a fellow who wants to build a large virtual environment. He didn’t have a lot of specifics to start with, but was really balking at what I was suggesting he look into for storage, servers, etc. As it turns out, “large” to him was really only 50 virtual machines in the next three years. That’s a big difference from what I perceive as large, which means many thousands of dollars, different storage and software strategies, completely different P2V approaches, etc.

As such, I propose some simple terminology, based on a logarithmic scale, to help sort out sizing:

[Figure: Virtual Environment Sizes]

I have 300 virtual machines, so I consider myself to be medium-sized. When I get to 1000 VMs I’ll be large.
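The original graph isn’t reproduced here, but the idea can be sketched as one label per power of ten. This is a minimal Python sketch assuming decade boundaries. Only “medium” (100 to 999 VMs) and “large” (1,000 and up) follow from the numbers above; the names “tiny,” “small,” and “huge” are hypothetical placeholders.

```python
# Hypothetical labels, one per power of ten of VM count.
# Only "medium" (100-999) and "large" (1,000-9,999) follow from the post;
# "tiny", "small", and "huge" are assumed for illustration.
LABELS = ["tiny", "small", "medium", "large", "huge"]


def size_label(vm_count: int) -> str:
    """Return a size label for an environment with vm_count VMs."""
    if vm_count < 1:
        raise ValueError("vm_count must be a positive integer")
    # For a positive integer, (number of digits - 1) equals floor(log10(n)),
    # computed exactly, with no floating-point rounding at 1000, 10000, ...
    decade = len(str(vm_count)) - 1
    # Anything at or beyond the last decade gets the last label.
    return LABELS[min(decade, len(LABELS) - 1)]


if __name__ == "__main__":
    for n in (50, 300, 1000, 2000):
        print(n, size_label(n))
```

Under these assumptions the fellow planning 50 VMs is “small,” my 300 VMs are “medium,” and 1,000 or 2,000 VMs are “large.”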

Complexity is a whole different problem. From a complexity perspective my environment is pretty simple. The thing is, someone can take a small environment and make it really complex. And some of the biggest environments I’ve seen have been pretty simple, overall. It’s only scale I’m proposing labels for.

At any rate, at least I have a graph to point to when I’m talking about this stuff. :-)

Should I Use Fibre Channel or iSCSI?

Yet another frequently asked question: My company is getting more serious about virtualization. Should we keep using our fibre channel SAN or switch to iSCSI or NFS?

My usual answer is a series of questions: What technology do you know best? How big is your SAN? Why are you thinking about switching? What’s your performance like? What will it cost?

The thing is, I never have a one-size-fits-all answer. Whereas some common virtualization questions have easy answers, this one depends heavily on what you’re trying to do. iSCSI is a great way for small- and medium-sized organizations to get into cluster filesystems. With 10 Gbps Ethernet you can get SAN-like performance, too, but 10 Gbps NICs are as costly as fibre channel HBAs. If you already have a fibre channel SAN, you may already have a lot of what you need. There are large iSCSI-based virtualization implementations, large NFS-based ones, and obviously large FC-based ones, and there are a lot of good reasons to choose any of them.

Sometimes one of the technologies offers massive savings, or there’s a particular storage vendor you like that only does iSCSI, NFS, or fibre channel. If there isn’t a compelling reason to go one direction or another, and everything you’re evaluating is on the Hardware Compatibility List, I’d make the call based on what I know and limit the number of new things I’m trying to do at once.

Update: Commenters are chiding me for not mentioning NFS, and the chiding is a good thing. I don’t use NFS, but a lot of people do, and they love it. The problem I’ve had with NFS is that VMware doesn’t seem to think it’s as cool as the other options: their certification of NFS-based arrays seems to lag, or doesn’t exist at all in a number of cases (like for SRM). So I’ve tended not to suggest it, because I don’t want to see someone get trapped with it. However, Storage VMotion makes migrating to a new storage technology easy, so that’s not really a problem anymore, and I’ve updated the post accordingly. Thanks, guys!

——————————–

This is the fourth post in my series of “what VMware questions do I hear most often?” The first three questions were:

  1. How much capacity should I have for VMware?
  2. Should I convert my old servers to ESX?
  3. What kind of servers should I buy for VMware?

If you think of a question you’d like me to answer please put it in the comments. Thanks!

What Kind Of Servers Should I Buy for VMware?

Another frequently asked question: What kind of servers should I buy to start my VMware cluster out?

My off-the-cuff answer: “the biggest machines you can afford at least three of, from whatever vendor you like the most,” followed by “it depends.”

Part of the great thing about virtualization, especially with VMware, is that you can use VMotion to move everything off of a running machine. This means that you need a place to put that workload, though (think of my “buckets of water” analogy). If you buy two machines you have to keep one empty, and 50% of your cluster capacity sits idle. If you buy three machines you can use two of them, and 33% of your cluster capacity sits idle. Four machines, 25%.

Of course, you don’t want to buy machines smaller than the workload you’re going to virtualize. If you want to virtualize an application that will need 8 vCPUs, consider carefully whether you want to purchase 8-core servers or go for something like a 16-core machine, since a busy 8-vCPU VM on an 8-core host would leave nothing for the hypervisor or any other VM.

CPU isn’t the only consideration. You need to think about RAM sizing, too. Do you get three machines with 256 GB of RAM each or seven with 96 GB of RAM? What’s the sweet spot for pricing, versus the size of your largest VM? What does it cost for each new server you put in, in incremental costs, like KVM, SAN, and network switch ports? That should play into this, too, because those things aren’t free, and they contribute to the cost of the environment. Would you rather spend $3000 on infrastructure costs or put that $3000 into the machines?
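To put numbers on the trade-off, here’s a small Python sketch that holds one host back as the spare and reports how much RAM is left for VMs, what share of the cluster sits idle, and the incremental infrastructure cost. It assumes identical hosts, and it assumes the $3000 figure above is a per-server cost for KVM, SAN, and switch ports; that is a hypothetical placeholder, not a quote. Server prices are left out because they vary so much.

```python
from dataclasses import dataclass


@dataclass
class ClusterOption:
    hosts: int                  # number of identical hosts
    ram_per_host_gb: int        # RAM in each host
    infra_per_host: int = 3000  # hypothetical KVM/SAN/switch-port cost per host, in dollars

    def usable_ram_gb(self) -> int:
        # One host's worth of capacity is held back so any host can be evacuated.
        return (self.hosts - 1) * self.ram_per_host_gb

    def idle_fraction(self) -> float:
        # The spare host's share of the cluster: 1/2, 1/3, 1/4, ...
        return 1 / self.hosts

    def infra_cost(self) -> int:
        return self.hosts * self.infra_per_host


for option in (ClusterOption(3, 256), ClusterOption(7, 96)):
    print(option, option.usable_ram_gb(), f"{option.idle_fraction():.0%}", option.infra_cost())
```

Under these assumptions, seven 96 GB hosts leave more usable RAM than three 256 GB hosts (576 GB versus 512 GB) and less idle capacity (about 14% versus 33%), but cost $12,000 more in infrastructure, and no single VM can be bigger than 96 GB.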

As you see, and as with many things in IT, there’s no “best way” to do things. It’s all about you, your workloads, and your budget.

——————————–

This is the third post in my series of “what VMware questions do I hear most often?” The first two questions were:

  1. How much capacity should I have for VMware?
  2. Should I convert my old servers to ESX?

If you think of a question you’d like me to answer please put it in the comments. Thanks!

Should I Convert My Old Servers to ESX?

Frequently asked question: My company is virtualizing our data center. Should we buy new servers or turn the ones we have into ESX servers?

My usual answer is a question: “How old are the servers you have?”

Average answer: “Somewhere around three years old.”

My reply: “Get new servers.”

Why new servers? Because, performance-wise, they smoke your old servers, and they have all the new technologies like Extended Page Tables, VT-x, VT-d, etc. RAM is often the limiting factor for how many VMs you can fit on a physical host, and newer servers can hold more RAM, more inexpensively, than old ones. New servers often come with four NICs built in: two for the service console and two for VMs. Older servers have two. And with new servers you won’t have to worry about them again for five years, until the warranty expires. Your old servers have two years left on their warranty, and extended warranties are expensive.

In short, take a look at what you’d spend to retrofit and then replace your old machines and compare that with the cost of new equipment over three to five years. It might be worth it, but in most cases the old machines are just sunk costs.
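One rough way to frame that comparison, as a Python sketch with hypothetical inputs: keeping the old servers costs the retrofit, any extended warranty, and the replacement you’ll end up buying within the planning horizon anyway, while the new-server path costs the purchase. What you originally paid for the old servers appears nowhere, because that money is already spent. All dollar amounts are placeholders to fill in from your own quotes.

```python
def keep_old_cost(retrofit: float, extended_warranty: float, replacement: float) -> float:
    """Cost over the planning horizon of upgrading old servers, then replacing them.

    The original purchase price is a sunk cost, so it is deliberately not a parameter.
    """
    return retrofit + extended_warranty + replacement


def buy_new_cost(purchase: float) -> float:
    """Cost over the same horizon of buying new servers now (assumed covered by warranty)."""
    return purchase


def cheaper_option(retrofit: float, extended_warranty: float,
                   replacement: float, purchase: float) -> str:
    old = keep_old_cost(retrofit, extended_warranty, replacement)
    new = buy_new_cost(purchase)
    return "keep old" if old < new else "buy new"
```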

How Much Capacity Should I Have For VMware?

Frequently asked question: How much capacity should I have in my VMware environment?

My stock answer to this: N+1 in each cluster.

If you have N physical hosts worth of work in a cluster, have N+1 physical hosts. That way you have spare capacity for maintenance operations, and you can take a whole server completely out of the cluster by VMotioning its workload to the spare machine.

Think about your servers as buckets, and your workload as water. If you have 30 gallons of water in six 5-gallon buckets, where will you put 5 gallons of water when you need to drain one of the buckets? You need an empty bucket that’s as big as the largest bucket you have. In this case you need 35 gallons of capacity, or an extra 5-gallon pail.
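The bucket arithmetic is simple enough to write down. This Python sketch treats each full bucket as a host’s worth of workload and returns the total capacity you need so that the largest bucket can always be drained somewhere; the function name is just for illustration.

```python
def capacity_needed(bucket_sizes: list[int]) -> int:
    """Total capacity needed so the largest full bucket can be emptied elsewhere.

    The spare must be at least as big as the biggest bucket (the "+1" in N+1).
    """
    if not bucket_sizes:
        raise ValueError("need at least one bucket")
    return sum(bucket_sizes) + max(bucket_sizes)


# Six full 5-gallon buckets hold 30 gallons of water.
print(capacity_needed([5] * 6))  # 35
```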

In one of my clusters I have 6 Dell PowerEdge R900s (four-socket, 96 GB RAM) and 6 Dell PowerEdge 2950s (two-socket, 32 GB RAM). I treat one R900 as extra capacity (the ‘+1’ in “N+1”) because it’s able to take all of the work from another R900, or any of the 2950s.

In practice I let VMware’s Distributed Resource Scheduler (DRS) move workloads around freely between all of the hosts in a cluster, including the spare. However, I periodically check the load on the cluster by putting my spare machine into maintenance mode and ensuring that the load on the rest of the cluster is within limits, for both RAM and CPU.
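Here’s a Python sketch of that periodic check, using the RAM sizes from the cluster described above. The host names, the per-host CPU capacities, and the 80% ceiling are hypothetical placeholders; you’d plug in the figures from your own monitoring.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    ram_gb: float
    cpu_ghz: float  # hypothetical aggregate CPU capacity of the host


def fits_without(hosts: list[Host], spare: str, ram_used_gb: float,
                 cpu_used_ghz: float, ceiling: float = 0.8) -> dict[str, bool]:
    """Would the current load stay under `ceiling` with `spare` in maintenance mode?"""
    remaining = [h for h in hosts if h.name != spare]
    ram_capacity = sum(h.ram_gb for h in remaining)
    cpu_capacity = sum(h.cpu_ghz for h in remaining)
    return {
        "ram": ram_used_gb <= ceiling * ram_capacity,
        "cpu": cpu_used_ghz <= ceiling * cpu_capacity,
    }


# Six R900s (96 GB) and six 2950s (32 GB); CPU capacities are placeholders.
cluster = (
    [Host(f"r900-{i}", 96, 40.0) for i in range(6)]
    + [Host(f"pe2950-{i}", 32, 20.0) for i in range(6)]
)
print(fits_without(cluster, "r900-0", ram_used_gb=500, cpu_used_ghz=200))
```

With one R900 out, the remaining hosts have 672 GB of RAM, so under an 80% ceiling the VMs’ RAM load should stay below about 538 GB.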

(As an aside, I did a presentation on virtualization where I used the water analogy as a demo, with four large, clear plastic cups and some red food coloring. People get what you’re talking about, plus all the people dozing off in the audience wake up.)

Intel Xeon 5500 Release

I’ve spent the morning looking at the new server models from Dell, based on the Intel Xeon 5500 series of CPUs (Nehalem-EP). These things look sweet, but there are some interesting caveats. A few of my observations so far:

1. Intel has killed the front-side bus and replaced it with QuickPath Interconnect (QPI), a competitor to AMD’s HyperTransport. Its speed is measured in gigatransfers per second (GT/s) and is 4.80 GT/s, 5.86 GT/s, or 6.40 GT/s, depending on the CPU. Each transfer carries 2 bytes of data in each direction, so that works out to roughly 9.6 to 12.8 GB/s per direction per link (the Wikipedia article I linked to has the details). Cool, but most people are going to pick a CPU based on price point rather than link speed, given that everything in the 5500 series smokes all previous Intel CPUs anyhow.

In multi-CPU configurations the CPUs themselves have a QPI link between them that is used to share L3 cache and memory information. This is a big win in the war against processor affinity, which promotes cache hits but is a management pain. For things like virtualization, where VMs are executing on different CPUs all the time, having cache data available globally will speed things up. No matter where you execute, your cached data is close.

2. There are options for 1333 MHz RAM, which achieves this through interleaving across multiples of six DIMMs. Multiples of six DIMMs combined with what seems to be a lack of 8 GB 1333 MHz DIMMs means a maximum of 24 GB of RAM at that speed. Thankfully 800 MHz and 1066 MHz RAM is available, and you can cram 144 GB and 96 GB, respectively, in a server.

One drawback is that with one CPU you can only access a certain amount of memory, because the memory controller is now built into each CPU, and the DIMM slots wired to an empty socket can’t be used. For Dell right now that caps a single-CPU configuration at 12 GB of RAM. This probably isn’t the end of the world, as most people stuffing 16+ GB of RAM in a machine will likely opt for two CPUs anyhow. It does point to where things are going with virtualization, though. If you need less CPU than two quad-cores, you’re probably also running a hypervisor of some sort, so it doesn’t pay for Intel to worry about the low-CPU, high-RAM users.

3. Hyper-Threading is back. Might as well have it: you’ve got all this extra CPU sitting there, sleeping, and it could be doing something useful instead. If they could just implement SETI@Home in the CPUs for those truly idle times…

4. Turbo Boost is a new feature that works with their new power technologies to let a CPU essentially overclock itself in certain cases, such as when some of its cores are idle. These CPUs also have new sleep states, including much deeper sleep for the CPU and RAM, so when the server is idle it consumes a lot less power. It’s also interesting that the Intelligent Power Technology lets you customize the power consumption with profiles per application.

5. All that Intel Virtualization Technology that we’ve been hearing about is in these things. Extended Page Tables, VT-c (hardware-assisted I/O), VT-d (directed I/O, or dedicated virtual I/O devices), and VT-x (with FlexMigration) are all in here and conspiring together to make VMs fly.

6. They’ve implemented PCI Express 2.0, which doubles the per-lane bandwidth of the slots. Now a 2.0 x8 slot can perform as fast as a 1.0 x16 slot. Most of us will probably only take advantage of this on our workstations/desktops, with more awesome video cards, but for some people with HPC clusters this matters.
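For anyone who wants to check the memory and PCI Express arithmetic in this list, here’s a back-of-the-envelope Python sketch. Its platform figures are assumptions, not taken from a vendor spec sheet: a two-socket server with three memory channels per socket; 1333 MHz with one DIMM per channel, 1066 MHz with two, and 800 MHz with three; 4 GB as the largest 1333 MHz DIMM and 8 GB otherwise; and PCI Express 1.0 and 2.0 signaling at 2.5 and 5 GT/s per lane with 8b/10b encoding.

```python
# Back-of-the-envelope checks for the Xeon 5500 notes.
# All platform figures are assumptions, not vendor specifications.

SOCKETS = 2
CHANNELS_PER_SOCKET = 3

# Assumed DIMMs per channel and largest available DIMM (GB) at each memory speed.
MEMORY_SPEEDS = {
    1333: {"dimms_per_channel": 1, "max_dimm_gb": 4},
    1066: {"dimms_per_channel": 2, "max_dimm_gb": 8},
    800: {"dimms_per_channel": 3, "max_dimm_gb": 8},
}


def max_ram_gb(speed_mhz: int) -> int:
    """Largest memory configuration at a given speed in a two-socket server."""
    cfg = MEMORY_SPEEDS[speed_mhz]
    dimms = SOCKETS * CHANNELS_PER_SOCKET * cfg["dimms_per_channel"]
    return dimms * cfg["max_dimm_gb"]


# Assumed per-lane signaling rate in megatransfers per second.
PCIE_MT_PER_S = {1: 2500, 2: 5000}


def pcie_mb_per_s(generation: int, lanes: int) -> int:
    """One-direction PCIe bandwidth in MB/s after 8b/10b encoding overhead."""
    mt = PCIE_MT_PER_S[generation]
    # 8b/10b carries 8 data bits per 10 transferred bits; then 8 bits per byte.
    per_lane = mt * 8 // 10 // 8
    return per_lane * lanes


for speed in (1333, 1066, 800):
    print(f"{speed} MHz: up to {max_ram_gb(speed)} GB")
print("PCIe 2.0 x8: ", pcie_mb_per_s(2, 8), "MB/s")
print("PCIe 1.0 x16:", pcie_mb_per_s(1, 16), "MB/s")
```

Under those assumptions the numbers line up with the list: 24, 96, and 144 GB of RAM, and 4000 MB/s per direction for both a 2.0 x8 and a 1.0 x16 slot.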

Dell in particular has used this rollout to launch its new management software, replacing the clunky IT Assistant, and to introduce new options to the server line, like solid state disks. Overall, though, I have to conclude that no matter what vendor you use, if you’re building out your own private cloud, these new CPUs are pretty sweet building blocks.

Not Running VMware Capacity Planner as root on Linux

I’ve recently been working on a VMware Capacity Planner project in my organization. It’s a useful tool for proving what I already know: that the physical hosts in my data center don’t do anything. :-)

The Capacity Planner Data Manager is a software component that you install at your site on a Windows host (in my case a virtual machine). It gathers data from your hosts, sanitizes it, and relays it or stores it for relaying to VMware’s data warehouse (where it’s analyzed). One of Data Manager’s features is that it’s agentless, and will just SSH into my Linux hosts and gather what it needs.

The problem with that, though, is that it wants to log in as root. All the documentation says to have it log in as root. But on my hosts nobody logs in as root, unless there’s some big crisis happening.

So I started messing around with it. As it turns out, at least under Red Hat Enterprise Linux, almost everything that the Data Manager needs to do can be run as a normal user. The only commands it runs that require root are ethtool, mii-tool, and dmidecode. So you have a few options:

1. Grant the user sudo rights and change the scripts that run to use sudo for just those commands. On the Data Manager machine the scripts are all stored in C:\Program Files\VMware\VMware Capacity Planner\scripts by default, and you can just edit them to use sudo. Depending on how you do your user and rights management, this could be either very easy or very hard.

2. Make those commands setuid for the duration of the exercise. Doing something like “chmod +s /usr/sbin/dmidecode” will make it run as its owner, which by default is root. When the data gathering is over you can just “chmod -s” those utilities again. This is riskier than allowing sudo rights, especially if you have a lot of users on your machines, because suddenly any user can change network settings, etc., or, even worse, trick dmidecode into doing something as root that it shouldn’t do. In my case I have very few local users on my hosts, and they’re all fellow admins with sudo rights, so the risk is a lot lower.

Another option is that you could make these setuid, do the inventory, remove the setuid bits, and then deactivate the Data Manager daily inventory job. No problem.

3. You can also copy the Capacity Planner scripts out to the hosts and tell Data Manager where to find them via the Options. This might open up some possibilities, as you could mess with the permissions on the scripts, either via chmod/chown/chgrp or via ACLs, so that nobody but your Capacity Planner user can read or run them. Keep in mind, though, that if the scripts are shell scripts, Linux ignores the setuid bit on them, so the root-only commands would still need sudo or setuid binaries.

There are some definite pros and cons to each, and a lot of it depends heavily on your environment. Personally, I was looking for an easy way to make this work and be able to revert any changes once we were done. It’s like “Leave No Trace” camping, only with servers.
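If you take the setuid route, it’s easy to lose track of which bits you flipped. This Python sketch reports whether the setuid bit is set on each of the three tools, so you can confirm everything is back to normal afterward. The paths are assumptions for a Red Hat-style host (check them with “which” on your own machines); only /usr/sbin/dmidecode is mentioned above.

```python
import os
import stat

# Assumed locations; only dmidecode's path comes from the post.
TOOLS = ["/usr/sbin/dmidecode", "/sbin/ethtool", "/sbin/mii-tool"]


def setuid_status(path: str):
    """Return True if the setuid bit is set, False if not, or None if the file is missing."""
    try:
        mode = os.stat(path).st_mode  # follows symlinks to the real binary
    except FileNotFoundError:
        return None
    return bool(mode & stat.S_ISUID)


if __name__ == "__main__":
    for tool in TOOLS:
        status = setuid_status(tool)
        label = {True: "SETUID", False: "ok", None: "not found"}[status]
        print(f"{tool}: {label}")
```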

Beyond figuring out how I was going to let it run dmidecode, ethtool, and mii-tool, I did the following:

1. On all my hosts I added a user with the same username and password so that I wouldn’t have to mess with entering hundreds of separate username/password pairs. This also means I can run one command via Capistrano to add the user, and in a month I’ll run one command to remove it again.

2. Since I use the pam_access.so module in /etc/pam.d/sshd, I added a line to my /etc/security/access.conf to restrict where the logins could come from:

-:vmcap:ALL EXCEPT 192.168.100.15 192.168.100.16 192.168.100.17

This prevents the ‘vmcap’ user from logging in from anywhere but those IP addresses.

3. I instructed VMware Capacity Planner Data Manager to sanitize the data it sends to the Capacity Planner dashboard, so that IP addresses and names and the like are stripped out.
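To see why that one access.conf line works, here’s a simplified Python model of how pam_access reads the file: rules are checked top to bottom, the first rule whose user and origin fields both match decides, and “ALL EXCEPT” excludes the listed entries. It’s a sketch, not pam_access itself. It only understands literal usernames and IP addresses (no groups, hostnames, networks, or LOCAL), and it assumes a login that matches no rule is allowed.

```python
def parse_rule(line: str):
    """Split an access.conf line into (permission, users, origins)."""
    # maxsplit=2 keeps the origins field whole (this model ignores IPv6 colons).
    permission, users, origins = line.split(":", 2)
    return permission.strip(), users.split(), origins.split()


def field_matches(tokens: list[str], value: str) -> bool:
    """Match a field such as 'vmcap' or 'ALL EXCEPT 192.168.100.15 ...'."""
    if "EXCEPT" in tokens:
        cut = tokens.index("EXCEPT")
        include, exclude = tokens[:cut], tokens[cut + 1:]
    else:
        include, exclude = tokens, []
    included = "ALL" in include or value in include
    return included and value not in exclude


def login_allowed(rules: list[str], user: str, origin: str) -> bool:
    for permission, users, origins in map(parse_rule, rules):
        if field_matches(users, user) and field_matches(origins, origin):
            return permission == "+"  # first matching rule wins
    return True  # assumed default when no rule matches


RULES = ["-:vmcap:ALL EXCEPT 192.168.100.15 192.168.100.16 192.168.100.17"]
print(login_allowed(RULES, "vmcap", "192.168.100.15"))  # allowed: one of the listed addresses
print(login_allowed(RULES, "vmcap", "10.1.2.3"))        # denied: any other address
print(login_allowed(RULES, "alice", "10.1.2.3"))        # unaffected: the rule only names vmcap
```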

As a result of all of this, I’m okay with having Capacity Planner running in our data center for a month, and, more importantly, so are my security officers.

I’m pretty new to Capacity Planner overall, and a lot of my knowledge of it is derived from dissecting the inventory and performance gathering scripts, as well as trial and error. As always, if I missed something here or there’s a better way to do something let me know in the comments. Thanks!

How File Deletions Work

Q: I deleted a bunch of files from one of my virtual machines yesterday. Deduplication happened overnight, but the total disk space in use didn’t go down. That doesn’t make any sense.

Q: I completely evacuated one of the LUNs on my NetApp array, but the NetApp still says that the LUN is almost completely full, even after deduplication. How can that be?

A: To understand what is happening you need to know a little bit about how a file system works. A simple way to explain it is that a file system stores the data in a file as data blocks, and it stores the name of the file (and other data, like access times, etc.) in a directory block. The entry in a directory block points to where all the file’s data blocks are.

Files are found on disk via their directories. When you move a file from one directory to another, its entry gets transferred to the new directory block and removed from the old one. All the data blocks that belong to the file stay put, because only the directory blocks need to change.

When you rename a file the file system just changes its name in the directory block, and leaves all the data alone.

When you want to delete a file, all the file system has to do is remove the entry in the correct directory block and mark the file’s data blocks as free for reuse. Once that happens the file is “gone.” However, the file system does nothing to erase the data in those blocks. It’s still out there on disk, just not visible through the file system.
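Here’s a toy Python model of the directory-block idea, simplified well beyond any real file system (the ToyFS class and its methods are invented for illustration). Moving, renaming, and deleting only touch directory entries, and deleting marks the blocks free without erasing their contents.

```python
class ToyFS:
    """Hypothetical toy file system: directory entries point at numbered data blocks."""

    def __init__(self, nblocks: int):
        self.disk = [b""] * nblocks      # raw data blocks
        self.free = set(range(nblocks))  # allocation map
        self.dirs = {"/": {}}            # directory -> {filename: [block numbers]}

    def mkdir(self, path: str) -> None:
        self.dirs[path] = {}

    def write(self, directory: str, name: str, chunks: list[bytes]) -> None:
        if len(chunks) > len(self.free):
            raise OSError("no space left on device")
        blocks = sorted(self.free)[: len(chunks)]
        for block, chunk in zip(blocks, chunks):
            self.disk[block] = chunk
            self.free.discard(block)
        self.dirs[directory][name] = blocks

    def move(self, src: str, dst: str, name: str) -> None:
        # Only the directory entry moves; the data blocks stay put.
        self.dirs[dst][name] = self.dirs[src].pop(name)

    def rename(self, directory: str, old: str, new: str) -> None:
        self.dirs[directory][new] = self.dirs[directory].pop(old)

    def delete(self, directory: str, name: str) -> None:
        # Remove the entry and mark its blocks free; the bytes are not erased.
        self.free.update(self.dirs[directory].pop(name))


fs = ToyFS(8)
fs.mkdir("/old")
fs.write("/", "report.txt", [b"quarterly", b"numbers"])
fs.move("/", "/old", "report.txt")
fs.delete("/old", "report.txt")
print(fs.dirs["/old"], fs.disk[:2])  # entry gone, data still on "disk"
```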

This is why deduplication doesn’t instantly shrink your disk space. All that data is still out there, just like it was before; it’s just that your OS can’t see it anymore. In the case of VMware, remember that not only do you have filesystems in your VMs, but VMFS is a filesystem, too, with these same properties. That’s why it’s possible to have a completely empty VMFS volume while your NetApp array complains that the LUN/volume/etc. is full.

If you want deduplication to reclaim the space, you have to actually overwrite that old data with something that’s easy to deduplicate, like a huge file full of zeros. On Linux you’d do something like what Leo Raikhman suggests in his zero-out script, and on Windows you can use sdelete to do the same.
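To see why zeroing helps, this Python sketch models an idealized block-level deduplicating array that stores exactly one copy of each distinct block. The 4-byte blocks are made up for illustration; this isn’t how NetApp or any particular array schedules or performs deduplication.

```python
BLOCK_SIZE = 4
ZERO = bytes(BLOCK_SIZE)  # a block full of zeros


def stored_blocks(disk: list[bytes]) -> int:
    # An idealized deduplicating array keeps one copy of each distinct block.
    return len(set(disk))


def zero_fill(disk: list[bytes], free: set[int]) -> list[bytes]:
    # Overwrite every block the file system considers free with zeros, which is
    # the effect of writing (and then deleting) one huge file full of zeros.
    return [ZERO if i in free else block for i, block in enumerate(disk)]


disk = [b"aaaa", b"bbbb", b"cccc", b"dddd", ZERO, ZERO]
free = {0, 1, 2}  # blocks 0-2 belonged to a file that was just deleted
print(stored_blocks(disk))                   # 5: the delete changed nothing on the array
print(stored_blocks(zero_fill(disk, free)))  # 2: "dddd" plus one shared zero block
```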