Disk Partition Alignment Is Still Important

This is post #9 in my December 2013 series about Linux Virtual Machine Performance Tuning. For more, please see the tag “Linux VM Performance Tuning.”

I have written about this almost yearly (beginning all the way back in 2006), but even now I routinely run across something, like a virtual appliance, that has poor partition alignment.

What’s the big deal? In short, misaligned I/O is killing your disk performance. Blame Logical Block Addressing, or LBA. Back in the day, a BIOS interacted with drives by knowing the exact geometry of the drive, namely how many cylinders, heads, and sectors were on a disk (CHS). Unfortunately that limited the size of the drives that could be used, and ignored some basic facts about circular items. So when newer, larger drives switched to zone bit recording, where the number of sectors per track varied based on the location of the track (tracks on the outside of the platter can hold more sectors than those on the inside), this broke CHS. So a scheme was invented to translate the addresses from CHS to something that worked with these newfangled drives, and LBA was born. LBA holds the number of sectors per track constant at 63, varies the number of heads, and as a result varies the number of cylinders available, which works for the old-school BIOSes and can be used by the drive to compute a meaningful new-style address.

The interaction between the Master Boot Record at sector 0 of your disk, LBA’s insistence on 63 sectors per track, and old-style disk utilities that align partitions to track boundaries means that the first partition will start at sector 63 (since sector 0 is occupied, the utility skips to the next unoccupied whole track, which begins at sector 63). The number 63 is persona non grata in the computer world. It isn’t a power of 2, and it certainly doesn’t line up with your storage’s idea of the world (no matter if it’s a local SSD, a RAID controller, or a big enterprise array). The misaligned partition has blocks that straddle the stripes on the array, and instead of reading a single stripe the array has to read from, or write to, two stripes.

[Figure: vmdk-vmfs-array]
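To make the arithmetic concrete, here is a minimal Python sketch of the alignment check. It assumes 512-byte sectors and a 4 KiB back-end unit; both numbers are assumptions for illustration, so use whatever stripe or block size your array manual specifies. The function names are mine, not part of any tool.

```python
SECTOR_BYTES = 512  # assumed classic sector size


def offset_bytes(start_sector):
    """Byte offset of a partition that starts at the given sector."""
    return start_sector * SECTOR_BYTES


def is_aligned(start_sector, unit_bytes=4096):
    """True if the partition start falls on a multiple of the back-end unit."""
    return offset_bytes(start_sector) % unit_bytes == 0


def next_aligned_sector(start_sector, unit_bytes=4096):
    """First sector at or after start_sector that sits on a unit boundary."""
    sectors_per_unit = unit_bytes // SECTOR_BYTES
    # Ceiling division without floats: -(-a // b)
    return -(-start_sector // sectors_per_unit) * sectors_per_unit


if __name__ == "__main__":
    for sector in (63, 64, 2048):
        print(sector, offset_bytes(sector), is_aligned(sector))
```

Sector 63 lands at byte 32,256, which is not a multiple of 4,096, while sectors 64 and 2048 both are.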

This isn’t a big problem on one or two VMs, but when hundreds of VMs have misaligned I/O the effect is crippling. For every I/O operation you do at the OS level, you’re really doing two on the back end. That hurts performance and deduplication, and on SSDs it reduces lifespan because SSDs have a limited number of writes they can do. Do twice as many writes as you meant to and your SSD lives half as long. Seriously. It also fills your disk cache with twice as much stuff, which means it’ll be half as useful (or less).
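The doubling can be modeled in a few lines of Python. This is only a toy model under stated assumptions: 512-byte sectors, 4 KiB I/Os issued on filesystem block boundaries, and a 4 KiB back-end unit. Real arrays, caches, and larger stripe sizes will behave differently, and the function names are hypothetical.

```python
SECTOR_BYTES = 512  # assumed sector size


def backend_units_touched(start_sector, io_offset_bytes, io_bytes, unit_bytes=4096):
    """Count the back-end units a single I/O overlaps.

    io_offset_bytes is measured from the start of the partition.
    """
    first_byte = start_sector * SECTOR_BYTES + io_offset_bytes
    last_byte = first_byte + io_bytes - 1
    return last_byte // unit_bytes - first_byte // unit_bytes + 1


def amplification(start_sector, io_count, io_bytes=4096, unit_bytes=4096):
    """Average back-end operations per OS-level I/O for sequential block I/Os."""
    total = sum(
        backend_units_touched(start_sector, i * io_bytes, io_bytes, unit_bytes)
        for i in range(io_count)
    )
    return total / io_count


if __name__ == "__main__":
    print("start at 63:  ", amplification(63, 1000))
    print("start at 2048:", amplification(2048, 1000))
```

Under these assumptions a partition starting at sector 63 costs two back-end operations per I/O, and one starting at sector 2048 costs one.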

To this day I run into misaligned VMs, as well as people who argue with me that they don’t need to be aligned, which is rarely the case for block storage. I also encounter tuning advice that leaves out alignment entirely. Please! Read your manuals and align your partitions! If you are using a recent OS, like Red Hat Enterprise Linux 6, Microsoft Windows 7 or 8, or Windows Server 2008 or 2012, it’ll auto-align things for you. Otherwise refer to your array manual, use good ol’ fdisk to fix things on new installations, or use the NetApp tools mbrscan & mbralign that are part of the NetApp Host Utilities. And if you encounter this on a virtual appliance, please submit it as a bug with the vendor. For example, a surprising number of storage vendors distribute misaligned VMs, including NetApp & EMC Isilon.
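If you want to check a disk or a virtual appliance image yourself, one option is to read the partition table straight out of the MBR. The sketch below is not one of the tools named above, just a minimal Python illustration. It relies on the classic MBR layout (four 16-byte entries starting at byte 446, with the starting LBA in bytes 8-11 of each entry) and assumes 512-byte sectors and a 4 KiB alignment unit. It only looks at the four primary entries and ignores logical partitions and GPT disks.

```python
import struct

MBR_SIZE = 512
PARTITION_TABLE_OFFSET = 446
PARTITION_ENTRY_SIZE = 16
BOOT_SIGNATURE = b"\x55\xaa"


def read_mbr_partitions(mbr):
    """Return (number, start_lba, sector_count) for each used primary entry."""
    if len(mbr) < MBR_SIZE or mbr[510:512] != BOOT_SIGNATURE:
        raise ValueError("not a valid MBR sector")
    partitions = []
    for index in range(4):
        start = PARTITION_TABLE_OFFSET + index * PARTITION_ENTRY_SIZE
        entry = mbr[start:start + PARTITION_ENTRY_SIZE]
        part_type = entry[4]
        # Bytes 8-15: starting LBA and sector count, both little-endian uint32.
        start_lba, sector_count = struct.unpack_from("<II", entry, 8)
        if part_type == 0 or sector_count == 0:
            continue  # unused slot
        partitions.append((index + 1, start_lba, sector_count))
    return partitions


def misaligned_partitions(mbr, unit_bytes=4096, sector_bytes=512):
    """Return (number, start_lba) for partitions not starting on a unit boundary."""
    return [
        (number, start_lba)
        for number, start_lba, _ in read_mbr_partitions(mbr)
        if (start_lba * sector_bytes) % unit_bytes != 0
    ]


if __name__ == "__main__":
    import sys
    # Pass a device or image path, e.g. /dev/sda (reading a device needs root).
    with open(sys.argv[1], "rb") as disk:
        print(misaligned_partitions(disk.read(MBR_SIZE)))
```

An empty list means every primary partition starts on a 4 KiB boundary under these assumptions.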

If you are using Logical Volume Management (LVM) on your VM you could also use pvmove to help you align a VM. Add a scratch virtual hard disk, align it properly, pvcreate, and then add it to your volume group. Use pvmove to migrate all the data from your misaligned LVM partition. Use vgreduce to get the misaligned volume out of the volume group, then use fdisk to fix the alignment (you might need to reboot here to pick up the new partition table). Then just pvcreate the re-aligned partition, vgextend, and pvmove back off the scratch volume. Finish with a vgreduce of the scratch partition and a shutdown to remove the scratch disk from the VM. I’ve used this a lot, especially with P2Vs, and while it won’t correct the alignment of /boot, there isn’t much I/O for /boot either, making it a non-issue.
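The order of operations matters here, so it may help to see the sequence written out. The following Python sketch only builds the list of commands as text; it does not run anything. The volume group and device names are hypothetical placeholders, the fdisk step stays manual, and you should verify each command against your own system (and have a backup) before trying it.

```python
def pvmove_realign_plan(volume_group, misaligned_pv, scratch_pv):
    """Return the ordered steps for re-aligning an LVM physical volume.

    Lines starting with '#' are manual steps, not commands.
    """
    return [
        f"# create an aligned partition {scratch_pv} on the scratch disk",
        f"pvcreate {scratch_pv}",
        f"vgextend {volume_group} {scratch_pv}",
        # Move every extent off the misaligned physical volume.
        f"pvmove {misaligned_pv} {scratch_pv}",
        f"vgreduce {volume_group} {misaligned_pv}",
        f"# re-create {misaligned_pv} with an aligned start using fdisk; "
        "reboot if the old partition table is still in use",
        f"pvcreate {misaligned_pv}",
        f"vgextend {volume_group} {misaligned_pv}",
        # Move everything back onto the re-aligned partition.
        f"pvmove {scratch_pv} {misaligned_pv}",
        f"vgreduce {volume_group} {scratch_pv}",
        "# shut down and remove the scratch disk from the VM",
    ]


if __name__ == "__main__":
    # Hypothetical names: adjust to your own volume group and devices.
    for step in pvmove_realign_plan("vg_root", "/dev/sda2", "/dev/sdb1"):
        print(step)
```

Printing the plan first and reading it through is a cheap way to catch a swapped source and destination before pvmove ever runs.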

Comments on this entry are closed.

  • You say “Logical Block Addressing on your disk drive makes the Master Boot Record 63 bytes long. This means it occupies sectors 0-62 on disk, and the first partition will start at sector 63.”. I don’t understand this, for several reasons:

    1) How does logical block addressing make the MBR any particular size?

    2) Your second sentence says that the MBR “occupies sectors 0-62 on disk”. I’ve always believed that a sector is 512 bytes. If your sentence is true, then this means that the MBR occupies 63 * 512 bytes = 32256 bytes. This would be a very strange size for an MBR, which I’ve also always thought was 512 bytes.

    I agree that fdisk attempts to put the first partition at sector 63, which can cause alignment issues for the reasons you mention. But, I think there’s something wrong with what you say at the beginning of the second paragraph.

    • You’re right, it was lacking some detail. I updated it.

  • far too complicated for me. i prefer clear skies to clouds; i’ll be paying a good techie to “clean” my computer