When to Cluster, When to Build Big

When you’re building a medium-to-large service there’s a fundamental choice you have to make: do you build it on one big machine, or do you cluster smaller machines? And if you choose a cluster, do you build it from a few bigger machines or a bunch of smaller ones?

A cluster is hard. A cluster usually means a shared filesystem. A shared filesystem means a SAN. A shared filesystem also usually means an IP-based lock server, which means your network now needs to be as reliable as your SAN. It also means file accesses will be slower. Sometimes a cluster means weirdo stuff like multicast, or other things your network guys might not like or understand. You also might need to run heartbeat traffic over the IP network. Your network guys might not like the 100% uptime you now need from the network, and they probably won’t let you put a $30 Linksys switch between your machines to get around it, either, because that looks bad now that they’ve bought all that fancy network gear.
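
If the heartbeat idea sounds abstract, here’s roughly what it boils down to. This is a minimal sketch in Python with made-up addresses and a made-up port, not any real cluster product’s wire protocol: one node announces it’s alive over UDP, and the other declares it dead after a few seconds of silence.

```python
#!/usr/bin/env python3
# Minimal UDP heartbeat sketch -- hypothetical peer address and port,
# not any particular cluster product's protocol.
import socket
import time

PEER = ("10.0.0.2", 9999)   # assumed address of the other node
TIMEOUT = 3.0               # seconds of silence before we declare the peer dead

def send_heartbeats():
    """Run on one node: announce liveness once a second."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        sock.sendto(b"alive", PEER)
        time.sleep(1.0)

def watch_heartbeats():
    """Run on the other node: complain if the peer goes quiet."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", PEER[1]))
    sock.settimeout(TIMEOUT)
    while True:
        try:
            sock.recv(64)
        except socket.timeout:
            print("peer missed its heartbeat -- time to fence it")
            break
```

Note that the watcher can’t tell a dead peer from a dead network, which is exactly why your network suddenly needs SAN-grade reliability.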

Clusters are a pain to troubleshoot. Each node has to be identical. You can’t just change something on one node and not on the others, or the whole cluster will stumble. You now have to watch the performance of five machines simultaneously, rather than just one, and try to correlate behaviour with potentially random-looking events, such as users actually using the system. One of your servers might die and take the cluster with it, despite all of the fancy fencing and shoot-the-other-machine-in-the-head (STONITH) tactics you implemented. Clusters also make you learn all of Visio’s fancy drawing features, so you can make nice charts for the meetings about how you’re going to troubleshoot the problems you’re having.
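
Keeping nodes identical is mostly a matter of checking, constantly. Here’s a quick-and-dirty sketch (the node names and file list are made up) that hashes the same config files on every node over SSH and complains about drift:

```python
#!/usr/bin/env python3
# Config-drift check sketch: hash the same files on every node and compare.
# Node names and file list are hypothetical examples.
import hashlib
import subprocess

NODES = ["node1", "node2", "node3", "node4", "node5"]
FILES = ["/etc/hosts", "/etc/cluster.conf"]

def remote_hash(node, path):
    """Fetch a file's bytes over ssh and hash them locally."""
    data = subprocess.check_output(["ssh", node, "cat", path])
    return hashlib.md5(data).hexdigest()

for path in FILES:
    hashes = {node: remote_hash(node, path) for node in NODES}
    if len(set(hashes.values())) > 1:
        print("DRIFT in %s: %s" % (path, hashes))
```

A hash comparison won’t fix the drift for you, but it turns “the cluster is stumbling and nobody knows why” into “node3 has a different cluster.conf.”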

At its best a cluster is fault-tolerant. At its worst it is fault-causing. When it’s working it’s tolerant of hardware, OS, and software failures. When it’s working right you can patch machines with a rolling restart, provided that the patches don’t require the entire cluster to come down. A cluster means you need a very similar test cluster, so that you can practice these things without destroying anything. A test cluster means more money spent on hardware and software. A cluster provides instant failover, sweet performance, 24×7 operation, and sometimes it just cannot be beat.
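
A rolling restart is conceptually simple; it’s something like the sketch below, where the drain/patch/enable commands are placeholders standing in for whatever your load balancer and patch tools actually use:

```python
#!/usr/bin/env python3
# Rolling-restart sketch: patch one node at a time so the service stays up.
# The drain/patch/enable commands are placeholders, not any real tool's CLI.
import subprocess
import time

NODES = ["web1", "web2", "web3"]   # hypothetical cluster members

def run(node, command):
    """Run a command on a node over ssh; raise if it fails."""
    subprocess.check_call(["ssh", node] + command.split())

for node in NODES:
    run(node, "balancer-drain")         # placeholder: pull it out of rotation
    run(node, "apply-patches")          # placeholder: your patching command
    run(node, "service myapp restart")  # restart the patched service
    time.sleep(60)                      # crude: give it time to come back up
    run(node, "balancer-enable")        # placeholder: put it back in rotation
```

The sleep is the crude part; a real script would poll the node’s health check before putting it back in rotation. This is exactly the sort of thing you practice on the test cluster first.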

Big machines are easy. They don’t need IP-based lock managers, rolling restarts, or huge test clusters. They do need maintenance windows, though. They still need test systems, but there can be one test system, not a whole cluster. You don’t have shared filesystems. You don’t have automatic failover, either, but with a single machine and a test box you can engineer things so that you can just switch the storage over to the test machine and fire everything up. A single machine requires some foresight into disaster planning; a cluster generally needs less. A single machine had better be darn reliable, though, and be able to limp along if it’s having a problem. IBM’s pSeries boxes are great at this, with neat messages like “hey, I lost a CPU, but it’s no big deal; I just shut the offending part off until you can fix me.”

Single, large machines are also easier to program for than clusters of machines. Say you have a cool network traffic graphing program, like MRTG, and you want it to graph so much stuff that a single 2-CPU box isn’t enough and you’d be taxing a 4-CPU box. Do you figure out how to split the work between two 4-CPU machines, or do you get an 8- or 16-CPU machine and call it even?
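
For the record, the splitting itself isn’t rocket science. Here’s a sketch (target and poller names are made up) that deterministically assigns each MRTG target to one of two poller boxes by hashing its name:

```python
#!/usr/bin/env python3
# Workload-splitting sketch: map each monitoring target to one of N
# collector boxes by hashing its name. Names below are hypothetical.
import hashlib

COLLECTORS = ["poller1", "poller2"]   # the two 4-CPU boxes
TARGETS = ["switch-a:1", "switch-a:2", "router-b:1", "router-c:3"]

def owner(target):
    """Deterministically map a target to a collector."""
    digest = hashlib.md5(target.encode()).hexdigest()
    return COLLECTORS[int(digest, 16) % len(COLLECTORS)]

for t in TARGETS:
    print("%s -> %s" % (t, owner(t)))
```

The hashing is the easy part; the real work is re-balancing when you add a third poller and stitching the graphs back together afterward, which is exactly why the 8- or 16-CPU box is tempting.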

Single machines also expose weird problems when you put that much load on a NIC, a filesystem, a Fibre Channel card, and so on. Maybe a patch from your OS vendor hoses I/O, which wouldn’t be as big a deal if the load were spread around. Maybe you need a third and fourth Fibre Channel card to spread the I/O out. Maybe having all those files in one directory causes some weird directory inode update problem. Maybe you’re just putting too much I/O on that single LUN on your storage array. At this level it’s all wacky, and all the troubleshooting takes a while.
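
The too-many-files-in-one-directory problem, at least, has a classic fix: fan the files out into hashed subdirectories so no single directory gets hot. A sketch, with a made-up base path (needs Python 3 for `exist_ok`):

```python
#!/usr/bin/env python3
# Sketch of the classic fix for "too many files in one directory":
# fan the files out into hashed subdirectories. The base path is made up.
import hashlib
import os

BASE = "/data/spool"

def spread_path(filename):
    """Return a two-level hashed path like /data/spool/ab/cd/<filename>."""
    h = hashlib.md5(filename.encode()).hexdigest()
    return os.path.join(BASE, h[:2], h[2:4], filename)

path = spread_path("msg-0001")
os.makedirs(os.path.dirname(path), exist_ok=True)
print(path)
```

Two hex characters per level gives you 256 directories per level and 65,536 leaves, which is usually enough to make the directory inode problem go away.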

Yeah, maybe single machines are just as bad as clusters, only in different ways.

My favorite cluster strategy is the “gang of single machines” approach. Take a Layer 4 load balancer, add web servers, or mail servers, or whatever, and balance them. You can do rolling restarts, patching, all sorts of neat things, because it’s like having six standalone machines with infinite maintenance windows. Not that I recommend it, but you can even do a certain level of testing on the production systems, so that if there’s a problem you’ve only hosed one of the machines. It’s easy to add capacity: just slap another machine in the mix. There are three drawbacks, though: keeping the configurations in sync, ending up with lots of little machines, and building layers of load-balanced services. Because of the way Layer 4 switches interact with OSI Layer 2, building a load-balanced cluster that uses another load-balanced cluster (like an LDAP server cluster) takes a little thinking through. Keeping the configuration in sync across a bunch of machines usually isn’t too hard with rsync and SSH public keys, or a database for configuration data.
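
The rsync-and-SSH-keys approach can be as simple as a loop like this (node names and paths are examples, not a real setup):

```python
#!/usr/bin/env python3
# Sketch of the rsync-plus-SSH-keys config push described above.
# Node list and config paths are hypothetical examples.
import subprocess

NODES = ["web1", "web2", "web3", "web4", "web5", "web6"]
CONFIG_DIRS = ["/etc/httpd/", "/etc/ssl/"]

for node in NODES:
    for d in CONFIG_DIRS:
        # -a preserves permissions and times, --delete removes stragglers;
        # assumes passwordless SSH keys are already in place.
        subprocess.check_call(
            ["rsync", "-a", "--delete", d, "%s:%s" % (node, d)])
```

Run that from one master box after every config change and the six machines stay in lockstep; the discipline part is remembering that the master copy is the only one you’re allowed to edit.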

The biggest question for a cluster is whether you get three 4-CPU machines or six 2-CPU machines. Smaller machines make expanding more granular, but they also don’t soak up big load spikes as well. Removing a smaller machine from the cluster for patching or whatnot is less of a hit than removing a big one, though: pulling one of six 2-CPU boxes takes out about 17% of your capacity, while pulling one of three 4-CPU boxes takes out 33%. Maintaining six machines is more annoying than maintaining three, and requires absolute discipline and good tools to keep them all identical. Six machines also means six Fibre Channel cards, six or twelve SAN switch ports, and six or twelve network switch ports, and all of those cost money. They’re all points of failure, too, and while any one failure might not be catastrophic, you’ll still have to fix it. Adding another big machine to the cluster means you probably won’t have to do that again for a while, and can spend some time playing Wolfenstein: ET instead of racking machines.

So what’s the answer? When do you cluster, and when do you build big? It seems to me the cutoff is either four or eight CPUs. If you need a machine bigger than 8 CPUs, get a cluster. If your application is already broken into logical pieces, cluster it; if it’s monolithic, throw big hardware at it (until you get to 8 CPUs, then balk). And always, always, always opt for three 4-CPU machines over six 2-CPU machines. Trust me. It’s cheaper on the whole, easier to deal with, and easier to draw in Visio. And, after all, it’s all about how it looks in Visio. 🙂

1 thought on “When to Cluster, When to Build Big”

  1. Interesting analysis, and there’s nothing in it I’d disagree with.

    I’d just like to say that the number of processors (or rather, the size of the box) has to follow the workload. You might want 4-way servers for the DB nodes of a cluster, but not for the LDAP nodes. And sometimes the number of CPUs isn’t the issue at all, if the workload is I/O-bound.

    Nice post!
