Mystery of storage misalignment

When people hear the term storage misalignment and how important it is to avoid it, in virtual and physical environments alike, they usually assume their “kingdoms” are not affected and simply ignore it. Then they suddenly run into storage performance problems caused by misalignment. In virtual environments with shared storage, misalignment is a really serious problem, because it affects not just a single OS instance but every virtual machine that shares the storage with the misaligned machine or machines. Of course, one misaligned VM among hundreds is usually not a problem (depending on its IOPS characteristics). The problem starts when many virtual machines are misaligned, which leads to serious storage performance degradation.

What exactly is storage misalignment?

Starting in 2009, storage vendors started producing disks with 4096-byte blocks instead of the 512-byte blocks used before. However, thanks to firmware, each 4096-byte block is divided into logical blocks of 512 bytes: each 4096-byte physical disk block contains 8 x 512-byte logical blocks, which are presented to the operating system.

storage block size

misaligned storage

properly aligned storage

Most modern file systems use data structures that are 4096 bytes or larger. When a VM reads or writes one of these data structures on a new disk with 4096-byte sectors and the file system data structures happen to align perfectly with the underlying physical sectors, a read or write of a 4096-byte data structure results in a read or write of a single sector. When the file system data structures do not align with the underlying physical sectors, the same operation must access two physical sectors. Misaligned virtual machines therefore significantly increase the number of I/O operations issued to the underlying storage, which in some cases can degrade the performance of all virtual machines.

Misaligned virtual machine filesystem


In the figure below, a single read/write from a misaligned virtual machine touches 2 physical chunks, causing additional IO on the underlying storage system. If it is only one VM with low IOPS, it is not a big problem; modern storage systems are fast enough to handle the workload even from a misaligned VM. The problem starts when there are more VMs with different IOPS characteristics and additional space-saving features such as deduplication are enabled.

R/W IO on misaligned storage


After the virtual machine's storage is aligned, a single write to a filesystem block inside the virtual machine issues a single IO to the underlying storage.


After alignment
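
The difference between the misaligned and the aligned case comes down to simple arithmetic. Here is a minimal Python sketch of it, assuming 4096-byte physical blocks and 512-byte logical sectors as described above. The function name physical_blocks_touched is made up for this illustration and does not come from any storage vendor tool.

```python
PHYSICAL_BLOCK = 4096  # bytes per physical disk block (assumed, as in the article)
LOGICAL_SECTOR = 512   # bytes per logical sector presented to the OS


def physical_blocks_touched(offset_bytes, length_bytes, block_size=PHYSICAL_BLOCK):
    """Return how many physical blocks an I/O starting at offset_bytes spans."""
    if length_bytes <= 0:
        raise ValueError("length_bytes must be positive")
    first = offset_bytes // block_size
    last = (offset_bytes + length_bytes - 1) // block_size
    return last - first + 1


# A 4096-byte filesystem block at the start of a partition that begins
# at sector 63 (misaligned) versus sector 64 (aligned).
misaligned = physical_blocks_touched(63 * LOGICAL_SECTOR, 4096)
aligned = physical_blocks_touched(64 * LOGICAL_SECTOR, 4096)
print(misaligned, aligned)  # 2 1
```

The same 4096-byte request costs two physical block accesses on the misaligned partition and only one on the aligned partition.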

Side effects of misalignment
  • reduced disk performance
  • backup – longer backup window
  • deduplication – longer window to perform data deduplication
How to identify storage misalignment
In Windows OS:
  • Run msinfo32.exe and go to Components –> Storage –> Disks. The Partition Starting Offset has to be divisible by 4096. In the figure below it is not divisible by 4096, which means the storage is misaligned.
  • Download the free tool VM Check Alignment v1.0 from ctxadmtools.com and run it against all Windows machines in the network.
disk offset on misaligned partition
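
The divisibility check on the Partition Starting Offset is easy to script. Below is a minimal Python sketch, assuming you have already copied the offset in bytes out of msinfo32. The helper is_partition_aligned is a hypothetical name, and the two sample offsets are only illustrative values: 32256 bytes corresponds to a start at sector 63, and 1048576 bytes to a start at 1 MiB.

```python
def is_partition_aligned(starting_offset_bytes, block_size=4096):
    """True when the partition starting offset is a multiple of block_size."""
    return starting_offset_bytes % block_size == 0


# Illustrative offsets only (not taken from a real system).
for offset in (32256, 1048576):
    status = "aligned" if is_partition_aligned(offset) else "misaligned"
    print(f"{offset}: {status}")
```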

Linux or Unix workloads:

SSH to the system, run fdisk -lu device and look at the Start column. In this example the partition starts at sector 63, which means it is not aligned: 63 x 512 = 32256 bytes is not a multiple of 4096, so the partition begins on the last 512-byte logical sector of a 4096-byte physical block instead of the first (see the first figures at the top).
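
To see where a given start sector lands inside a physical block, the following Python sketch may help. It assumes 512-byte logical sectors and 4096-byte physical blocks, and the function name locate_start_sector is invented for this example. You would feed it the start sector reported by fdisk.

```python
LOGICAL_SECTOR = 512    # bytes, assumed logical sector size
PHYSICAL_BLOCK = 4096   # bytes, assumed physical block size
SECTORS_PER_BLOCK = PHYSICAL_BLOCK // LOGICAL_SECTOR  # 8 logical sectors per block


def locate_start_sector(start_sector):
    """Return (physical block index, position inside that block, aligned?)."""
    block_index, position = divmod(start_sector, SECTORS_PER_BLOCK)
    return block_index, position, position == 0


print(locate_start_sector(63))    # (7, 7, False): last sector of a physical block
print(locate_start_sector(2048))  # (256, 0, True): first sector of a physical block
```

A position of 0 means the partition starts exactly on a physical block boundary; any other value means it is misaligned.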

Tools used to fix misalignment.
  1. Tool #1 is UberAlign, a fantastic tool.
  2. Storage vendor tools such as NetApp Virtual Storage Console.
  3. GParted – free Linux partition manager
  4. VMware Converter 5 and newer aligns storage during conversion
  5. PlateSpin, Quest vOptimizer Pro, Quest vConverter
How to avoid storage misalignment.
  1. follow storage vendor best practices and technical papers
  2. use the tools recommended by the vendor to manage the storage and vSphere environment
  3. prepare a golden image (virtual machine template)
  4. if possible, integrate vSphere with the storage system using the designated plugins, to give vSphere more visibility into the storage backend and vice versa
  5. use “virtualization aware” operating systems on virtual machines, such as Windows Server 2008 and newer, RHEL 6 and newer, and others
  6. if you have to use an older OS, prepare a virtual machine template with aligned storage
  7. check the infrastructure periodically

Artur Krzywdzinski

Artur is a Consulting Architect at Nutanix. He has been using, designing and deploying VMware-based solutions since 2005 and Microsoft since 2012. He specializes in designing and implementing private and hybrid cloud solutions based on VMware and Microsoft software stacks, datacenter migrations and transformation, and disaster avoidance. Artur has been in the IT industry since 1999 and consulting since 2008. Artur holds the VMware Certified Design Expert certification (VCDX #077).

  • Marek Lubinski

    Artur, I strongly disagree with this article and whole bullshit about misalignment stuff. What I noticed is that vendors (big ones like NetApp-direct experience and EMC) tend to find those lousy bullshit to bounce complaining customers even without checking what’s real customer issue. I went through this shit once and i know what i’m talking about. We had like hundreds of misaligned VM’s on our NetApp cluster and basically we had other issues related to 10Gbit throughput from particular luns (customer was complaining about his speeds from linux or whatever, don’t remember already). So we created case in NetApp about that. And guess what – nothing was checked. First point: you have misaligned servers – fix it – that’s the problem. In the beginning i was like: wtf? are you serious? But they were very serious claiming that those misaligned servers were VERY F***** BIG problem. I was like, c’mon be serious. We skipped this particular customer issue (he quit anyway) but i spent like 6months aligning all servers, templates etc to have everything intact. And guess what – I managed to have that. And guess also what? This particular issue WAS STILL THERE. Weird right? It should be related (according to this great netapp support) to misaligned vm’s which were KILLING storage (lol). Well it wasn’t the case 😉
    I respect vendors who did measure that bullshit and guess what: Dell in their official manuals/guides (about Compellent storage arrays) strictly say: “Performance benefit coming from 100% aligned disk is so low/minor that administrative hassle to align is not worth”. Wow, respect Dell. Also Nexenta says: we are not affected by misalignment so no worries there. Respect as well. Guess what, We were running many thousands of VM’s (notice: MISALIGNED) on those systems and guess what – NO PROBLEMS 🙂 Therefore don’t let yourself being bullshitted by vendors who say it’s a big problem (it’s not). Just accept the fact that big guys know it’s very easy to bounce your problem back to you with this statement “you have misaligned disks”.
    I thought it’s worth to comment on this post even though my comment is long 😀

    • Thanks for the valuable comment, as always. This post is based on my current experience with IBM N-Series problems: support pointed me to their knowledge base, and I have to align all servers, otherwise they are not going to help me further :-/ Next week I should have the majority of my VMs aligned, so I will be able to compare results before and after alignment and update the post accordingly 🙂

      • Marek Lubinski

        yeah exactly, as IBM N-Series is nothing more than re-branded NetApp box 😀
        what kind of issues you have?

        • One of the N-series heads in the cluster has an average CPU load of around 80%; when data deduplication kicks in, the CPU basically runs at 100% all the time and dedup never finishes. I tried to balance the load across both N-series heads in the cluster by moving the most IO-intensive VMs off it, but it didn’t help much. Any ideas?

          • Marek Lubinski

            no worries, you can first check sysstat -M 1 (in priv set diag mode) to see whether only 1 core/cpu is loaded or all of them. high cpu might be caused by lack of diskspace – there should be around >35% free disk space in aggregate (lower values will cause high cpu load). Also if you have lots of big snapshots done at same time, removal kicks in container block space reclamation process which is heavy on cpu 😉
            check those things, maybe it’s one of those. It’s not caused by misalignment for sure 😀

          • only one CPU is utilized and there is more than 35% free space

          • Marek Lubinski

            in this case i believe it might be caused by snapshot removals or lots of dedupe operations.

            run “wafl scan status” and see if you have any active: “Container block reclamation process”
            also run sis status to see if there are any concurrent dedupe processes at that time 🙂

          • The problem was solved a couple of days ago. It was a problem with deduplication metadata. We disabled deduplication on the volume and enabled it again, and the high CPU load disappeared. Thanks for the help, Marek.

          • Marek Lubinski

            my pleasure m8, I’m glad it’s fixed. And you just saved days of aligning that shit to be “compliant” with support statements 😉

  • Miguel Rodriguez

    Nice writeup and refresher. Agree, templates should be the first defense, and the NetApp VSC can quickly analyze the environment for misaligned VMs.

    Also agree with Marek. The first step in troubleshooting by vendors was, is your firmware up to date? Update and then call us back. Sometimes doing the initial search is valuable before calling support, as it helps me to gauge where we are going in the process. Just my 2 Euro:)