Guest OS disk usage script revisited.

I was recently asked to rewrite my good ol’ script that retrieves disk usage from virtual machines and formats this information into a nice’n’tidy CSV report.

My “customer” wanted some more data included in the report: pretty generic stuff like the name of the vSphere cluster (and host, I dunno why?) where the VM was running at the moment of report creation, the name of the vSphere datastore holding the .vmdk file(s), and some of the information that this “customer” puts in virtual machine annotations (not tags… at least not yet 😉 ).

On top of that I was asked to change the layout of the report, so that one line represents information about a single disk (filesystem) of a VM.

At first I was like: “But what is wrong with the old layout!? The one introduced by Alan Renouf, with one VM per line and disk information in columns?”.

Then I realized that creating a “one disk per line” report actually makes perfect sense.
With a layout like this (using any spreadsheet application of your choice) you can group your information by vSphere cluster or by datastore, you can retrieve the “grand total” of disk space used (wasted?) in your virtual infrastructure, and you can even group information “per C:\ drive” to see if your sizing for Windows “system drives” is correct!
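If you prefer to stay in PowerShell rather than a spreadsheet, the same grouping can be sketched with Group-Object and Measure-Object. This is only an illustration: the column names (Cluster, Path, CapacityGB, FreeGB) and the sample rows are my assumptions, not the actual columns produced by the script.

```powershell
# Hypothetical sample rows standing in for the output of Import-Csv on the report.
# Column names and values are invented for illustration only.
$report = @(
    [pscustomobject]@{ VM = 'vm01'; Cluster = 'ClusterA'; Path = 'C:\'; CapacityGB = 40;  FreeGB = 10 }
    [pscustomobject]@{ VM = 'vm01'; Cluster = 'ClusterA'; Path = 'D:\'; CapacityGB = 100; FreeGB = 60 }
    [pscustomobject]@{ VM = 'vm02'; Cluster = 'ClusterB'; Path = 'C:\'; CapacityGB = 40;  FreeGB = 25 }
)

# Used space per cluster. Import-Csv returns every value as a string,
# so the casts to [double] keep the arithmetic safe on real report data.
$perCluster = $report | Group-Object -Property Cluster | ForEach-Object {
    $used = ($_.Group |
        ForEach-Object { [double]$_.CapacityGB - [double]$_.FreeGB } |
        Measure-Object -Sum).Sum
    [pscustomobject]@{ Cluster = $_.Name; UsedGB = $used }
}

# "Grand total" of used space across the whole report.
$grandTotalUsedGB = ($perCluster | Measure-Object -Property UsedGB -Sum).Sum

# Average free space on the Windows system drives only.
$avgFreeOnC = ($report |
    Where-Object { $_.Path -eq 'C:\' } |
    Measure-Object -Property FreeGB -Average).Average
```

Because every row is a single disk, each of these questions is a one-step group or filter; with the old “one VM per line” layout you would first have to unpivot the disk columns.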

W/o much further ado – let’s have a look at the script itself:

The $filter array that I define in Line 94 is just a set of names of the annotation fields that I was asked to put into the report.
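To show the idea without a vCenter connection, here is a minimal sketch of how such a filter can turn annotations into report columns. The field names and the sample annotations are hypothetical, and I am assuming annotations arrive as objects with Name and Value properties; the real script may do this differently.

```powershell
# Hypothetical annotation field names; the real $filter lists whatever
# custom fields the report is supposed to contain.
$filter = @('Owner', 'Department', 'CostCenter')

# Simulated annotations of a single VM as Name/Value pairs (invented values).
$annotations = @(
    [pscustomobject]@{ Name = 'Owner';      Value = 'John Doe' }
    [pscustomobject]@{ Name = 'Backup';     Value = 'Daily' }
    [pscustomobject]@{ Name = 'Department'; Value = 'IT' }
)

# Every field in $filter becomes a column, in the order given;
# a field the VM does not have filled in stays as an empty string.
$columns = [ordered]@{}
foreach ($field in $filter) {
    $hit = $annotations | Where-Object { $_.Name -eq $field } | Select-Object -First 1
    $columns[$field] = if ($hit) { $hit.Value } else { '' }
}
$row = [pscustomobject]$columns
```

Annotations that are not listed in $filter (like “Backup” above) are simply ignored, and every row ends up with the same set of columns, which is what a CSV export needs.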

The real kung-fu happens between Line 139 and Line 145, and this is probably the longest and least readable “one liner” I’ve ever committed (and I don’t really like one liners, alright?).
The thing is, the foreach loop from the original script was taking ages when extended with retrieving annotations etc., so I was looking for a way to speed it up by taking advantage of the way PowerShell streams objects through the pipeline (this is not true parallel processing, but each object moves on to the next stage as soon as it is produced, instead of waiting for the whole collection to be built first).
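For readers who want to see the shape of such a pipeline without the full script, below is a simplified sketch of the “one row per disk” flattening. It runs on simulated data; the property names (Disks, Path, CapacityGB, FreeGB) are stand-ins I chose for illustration and it is not the actual one-liner from the script.

```powershell
# Simulated VMs, each with a nested list of guest disks (all values invented).
$vms = @(
    [pscustomobject]@{ Name = 'vm01'; Cluster = 'ClusterA'; Disks = @(
        [pscustomobject]@{ Path = 'C:\'; CapacityGB = 40;  FreeGB = 10 }
        [pscustomobject]@{ Path = 'D:\'; CapacityGB = 100; FreeGB = 60 }
    ) }
    [pscustomobject]@{ Name = 'vm02'; Cluster = 'ClusterB'; Disks = @(
        [pscustomobject]@{ Path = '/'; CapacityGB = 20; FreeGB = 5 }
    ) }
)

# Each VM is streamed down the pipeline and expanded into one object per disk.
$rows = $vms | ForEach-Object {
    $vm = $_    # keep a reference to the VM; $_ is rebound in the inner block
    $vm.Disks | ForEach-Object {
        [pscustomobject]@{
            VM          = $vm.Name
            Cluster     = $vm.Cluster
            Path        = $_.Path
            CapacityGB  = $_.CapacityGB
            FreeGB      = $_.FreeGB
            PercentFree = [math]::Round(100.0 * $_.FreeGB / $_.CapacityGB, 1)
        }
    }
}

# In a real run the rows would go straight to a file, for example:
# $rows | Export-Csv -Path .\report.csv -NoTypeInformation
```

Per-VM values (cluster, host, annotations) only need to be looked up once in the outer block and can then be repeated on every disk row.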

And it helped… A little…

Now it takes around 30 minutes to put this report together in an “example infrastructure” of 1000 VMs (it was close to one hour before).
But that is still a lot of time, and I really have to try to use Get-View somehow to bring the execution time down to a reasonable level… (Any hints? Please provide them in the comments!)

I owe you some explanation of how I retrieve the datastore name in Line 142 (just because I’m “cheating” here a little).
The data structure $vm.DatastoreIdList is an array of “Managed Object References” for datastores holding VM files.
To obtain a “human readable” name, I reach for the “Name” property of the first element in this array. And as you probably noticed, I do this only once per VM (not per disk!).
There is no issue with this method as long as your VM resides on a single datastore (which is 80% of the cases, I think), but if you (for whatever reason) decided to spread the .vmdk files of your “monster VM” across many datastores, only the name of the first datastore in this array will be included in the report (and I honestly hope this is the datastore where the .vmx file is located).
Matching an OS filesystem (disk) to a .vmdk and then to a datastore is (surprisingly!) not such an easy task (you can find a script of this kind in an excellent post from Arnim van Lieshout), and this routine would unnecessarily complicate my script and result in even longer execution times. So I decided against incorporating it here, especially since (in my opinion at least) the .vmdk to datastore relation is not the most important information we are looking for with this script.
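The “first element only” shortcut and its limitation can be illustrated on simulated data. In the sketch below the lookup table stands in for resolving a datastore reference to its name; the IDs and names are hypothetical, and flagging multi-datastore VMs is just one possible cheap safeguard, not something the script does.

```powershell
# Hypothetical ID-to-name lookup, standing in for asking vCenter
# to resolve a datastore reference (IDs and names are invented).
$datastoreNames = @{
    'Datastore-datastore-101' = 'DS_Gold_01'
    'Datastore-datastore-102' = 'DS_Silver_07'
}

# Simulated "monster VM" whose files are spread over two datastores.
$vm = [pscustomobject]@{
    Name            = 'monster01'
    DatastoreIdList = @('Datastore-datastore-101', 'Datastore-datastore-102')
}

# The shortcut: resolve only the first element, once per VM.
$firstOnly = $datastoreNames[$vm.DatastoreIdList[0]]

# A cheap way to notice when that shortcut hides something:
# flag the VM, or list every datastore name in a single column.
$spansMany = $vm.DatastoreIdList.Count -gt 1
$allNames  = ($vm.DatastoreIdList | ForEach-Object { $datastoreNames[$_] }) -join ';'
```

This still says nothing about which disk lives on which datastore, but a flag like $spansMany at least tells the reader of the report when the single datastore name should not be trusted blindly.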

A sample of the “rearranged” Guest OS disk utilization report might look like this:

As you can see even John Doe himself does not fill all the annotations required by the report 😉

That’s it for this episode. I hope you will find this post useful; feel free to share it and let me know if you have any comments!

Sebastian Baryło

Successfully jumping to conclusions since 2001.