The ultimate VM batch deployment script.

In my first blog post ever, almost three months ago, I shared my good ol’ script for batch deployment of virtual machines.
That script had served me well for quite a few years, but it was somewhat on the crude side of scripting.
It used the New-VM cmdlet sequentially for each line of .csv input, and the method of controlling deployment progress was just a sequence of while loops that tested the status of VMware Tools in each freshly deployed VM.
Not very elegant, but this wasn’t a big issue for me, nor was the sluggishness of synchronous deployment.
I simply used to run this script overnight, and if I had any concerns that the batch would not complete before I came back to work in the morning, I would just split it into separate .csv files and start multiple instances of PowerCLI.

Once that simple script “went public”, I made a commitment to “some people” that I would re-write it to use async tasks (that is, to deploy multiple VMs simultaneously), and I promised to finish before my summer holiday this year ;).

This was also a great opportunity for me to learn more about how to control background jobs, so here we go.

The new script still takes its input (the VMs to be deployed) from a CSV file, which should be saved in the script’s working directory under the hard-coded name of vms2deploy.csv.

Just to remind you, the CSV file should look somewhat like this:
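Something along these lines; this is a hypothetical reconstruction, as only the “dns1” and “datastorecluster” columns are named further in this post, and the remaining headers are my guesses:

```
name,template,oscust,ip,mask,gateway,dns1,cluster,datastorecluster
prodweb01,w2k12r2-tpl,win-spec,10.0.10.21,255.255.255.0,10.0.10.1,10.0.1.10,server-cluster,prod-dsc
devapp01,centos65-tpl,lnx-spec,10.0.20.31,255.255.255.0,10.0.20.1,none,vdi-cluster,dev-dsc
```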

The fields are still pretty self-explanatory, I believe, but I introduced the first improvement here already: as you can see, when deploying a Linux VM you can (in fact you should) put “none” as the value of the “dns1” field.
This is because it is impossible to set the DNS server configuration via an OS Customization Specification for Linux VMs (there is a check in the script that handles this value properly).
Not much else to comment on here, except maybe the “datastorecluster” field: if you’re still not using Datastore Clusters in your environment, you should modify this script to use the Get-Datastore cmdlet instead of Get-DatastoreCluster where necessary (it is only used twice, so this shouldn’t be a big problem 😉 ).
If you do use Datastore Clusters, however, remember to turn Storage DRS (SDRS) on.
It is required for the initial placement of new VMs on datastores, just like DRS is required for the initial placement of new VMs on hosts. If SDRS is not enabled, the script will fail miserably (and silently 😉 ).
Many thanks to @KrazieEyes for finding this out and letting me know via Twitter!

Because of this KB, the script requires PowerCLI 5.5 R1, and because I manipulate OS Customization Specifications during deployment, it needed to be started from a 32-bit PowerCLI window…
Well, not anymore: as pointed out by Shawn Masterson below, VMware did a great job recently and compiled 64-bit versions of the “OS Customization related” cmdlets, which is why the condition in Line 371 is fixed to $true now 😉 .

The general workflow of the script is as follows. In step one the input is sanitized (empty fields and duplicate VM names are eliminated from the CSV file), then the script groups the VMs to be deployed per host cluster and starts a separate background job (so a separate PowerShell process) for each cluster, as sketched below.
I still tend to think in host-cluster categories; if you group your hosts in logical folders instead (which is actually VMware’s recommendation), I encourage you to modify this script to use -Location instead of -Cluster where necessary.
After dispatching the background jobs, the only responsibility of the “main” script is to poll these jobs every 20 seconds and display overall progress. As usual, quite detailed logging is done both for the main script and for each background deployment job.
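A minimal sketch of that dispatch stage might look like this (the variable names, the column names and the $deployScriptBlock handle are my assumptions, not lifted from the actual script):

```powershell
# Sanitize input: drop rows with empty fields and duplicate VM names (column names assumed)
$vms = Import-Csv -Path .\vms2deploy.csv |
    Where-Object { $_.name -and $_.cluster -and $_.template } |
    Sort-Object -Property name -Unique

# Group the VMs per host cluster and dispatch one background job (= one PowerShell process) each
$deployJobs = @{}
foreach ($group in ($vms | Group-Object -Property cluster)) {
    $deployJobs[$group.Name] = Start-Job -ScriptBlock $deployScriptBlock `
        -ArgumentList $vCenterServer, $credential, $group.Group, $logDir
}
```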

This is how the code looks:

Wow! That was long! A new record of 486 lines!

But seriously: the control part of the script is contained between Line 325 and Line 486; this is where the input is sanitized and the background jobs are dispatched.

From Line 413 to the very end, the script only polls these jobs for progress and displays information about it.

All the “deployment magic” happens in a humongous script-block defined between Line 97 and Line 320, so let’s focus on that first.

Because this script-block is started as a separate PowerShell (not PowerCLI!) process, we first have to (and this time I mean it) load the PowerCLI snapin, then connect to our vCenter Server (because a background job doesn’t inherit this connection).
As you can see, the vCenter address and log-on credentials (gathered from the user upon script start-up) are passed to the script-block as parameters. The three remaining parameters are the array of VMs to be deployed in the given cluster and the locations of the log and progress-tracking files. The helper object used to track job progress is defined between Line 100 and Line 110.
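A bare-bones sketch of that preamble, with assumed parameter and counter names (only CUSTSTART is confirmed later in this post):

```powershell
$deployScriptBlock = {
    param($vCenterServer, $credential, $vmList, $logFile, $progressFile)

    # The job runs in a plain PowerShell process, so PowerCLI has to be loaded explicitly
    Add-PSSnapin VMware.VimAutomation.Core

    # ...and the job does not inherit the parent session's vCenter connection either
    Connect-VIServer -Server $vCenterServer -Credential $credential | Out-Null

    # Helper object holding the six progress counters this job dumps to its progress file
    $progress = [pscustomobject]@{
        TOTAL = @($vmList).Count; DEPLOYED = 0; FAILED = 0;
        POWERED_ON = 0; CUSTSTART = 0; CUSTDONE = 0
    }
}
```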

Now, I was trying to “think big” when writing this script, but I’m also a bit of a “control freak”, so I decided against just firing off all the VMs we want to deploy at once. You can of course start 100 or more New-VM tasks asynchronously and let vCenter sort the load out, but well… I prefer to do it in smaller chunks, with cluster capacity in mind if possible. That’s why I’ve defined two while loops that make sure each available host in the cluster is deploying at most one VM at any given moment while this script-block is running. The loops continue until we run out of VMs from the input. You can say this slows the whole process down, but in my opinion it is better to be safe than sorry.
When your cluster runs out of capacity, for example (because you requested too many VMs to be deployed), and DRS is not able to power on any more VMs (I actually created such a condition for testing purposes; you can see that in the screenshots below), the deployment will just stop in a controlled way, without overloading vCenter or anything. This approach also makes controlling the deployment and OS Customization faster and less resource-greedy.
Last but not least, it is really easy to change this code to start more than one VM deployment per host ;).
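Roughly, the throttling could be sketched like this (variable names assumed):

```powershell
# At most one New-VM task per connected host in the cluster at any given moment
$vmHosts = Get-Cluster -Name $clusterName | Get-VMHost |
    Where-Object { $_.ConnectionState -eq 'Connected' }
$chunkSize = @($vmHosts).Count

while (@($vmList).Count -gt 0) {
    # Carve the next chunk off the input queue
    $chunk  = @($vmList | Select-Object -First $chunkSize)
    $vmList = @($vmList | Select-Object -Skip $chunkSize)

    # ...start and track this chunk's New-VM tasks here (see the next listing)...
}
```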

As for deployment control, I use the classic method described by Luc Dekens (who else? 😉 ) ages ago. Between Line 150 and Line 202 a “chunk” of New-VM tasks is started asynchronously, and information about these tasks is saved into a hash table, using the task id as the key and the name of the deployed VM as the value.
Further on (between Line 205 and Line 225), the list of recent tasks is retrieved from vCenter and matched against our saved ids. If there is a match and the task was successful, we power on the machine (to let the OS Customization process begin) and remove the task from our table (removal is also done for all failed tasks).
This part loops until we run out of deployment tasks, then a new “chunk” is started.
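The pattern itself is simple enough to sketch (again, all names here are my assumptions):

```powershell
# Start the whole chunk asynchronously, remembering task id -> VM name
$newVmTasks = @{}
$i = 0
foreach ($vm in $chunk) {
    # One VM per host: walk the host list in step with the chunk
    $task = New-VM -Name $vm.name -Template $vm.template -VMHost $vmHosts[$i++] `
                   -Datastore (Get-DatastoreCluster -Name $vm.datastorecluster) `
                   -OSCustomizationSpec $vm.oscust -RunAsync
    $newVmTasks[$task.Id] = $vm.name
}

# Check vCenter every 10 seconds until every task in the chunk has finished
while ($newVmTasks.Count -gt 0) {
    Start-Sleep -Seconds 10
    foreach ($task in @(Get-Task | Where-Object { $newVmTasks.ContainsKey($_.Id) })) {
        if ($task.State -eq 'Success') {
            # Deployed: power the VM on so OS Customization can kick in
            Get-VM -Name $newVmTasks[$task.Id] | Start-VM -Confirm:$false | Out-Null
            $newVmTasks.Remove($task.Id)
        }
        elseif ($task.State -eq 'Error') {
            # Failed: just drop it from the table
            $newVmTasks.Remove($task.Id)
        }
    }
}
```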

Only after we deploy (and hopefully power on) all requested VMs do we switch to tracking the progress of OS Customization inside the guests. If your batch is on the larger side, there is a good chance that some of the OS Customizations will complete before we even check ;). This part of the script is “inspired” (OK, I almost copy-pasted it completely 😉 ) by an excellent post by VMware’s Vitali Baruh. Although it looks somewhat complicated (defining a script-block inside a script-block to control the main loop…), the idea is not that difficult to grasp.
Basically, for each VM that powered on successfully (we don’t care about the failed ones anymore) we search the vCenter events for “CustomizationStarted”, “CustomizationSucceeded” and “CustomizationFailure” events. The loop repeats every 10 seconds (like all loops in this script-block) until we run out of VMs or a time-out (fixed at 7200 seconds, or two hours) elapses.
I would like to stress that this time-out applies only to the OS Customization part (we all know how many things can go wrong there, right?); by no means will it disrupt the “deploy and power on” part of the script-block.
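A rough sketch of that watch loop; the corresponding vSphere API event types are CustomizationStartedEvent, CustomizationSucceeded and CustomizationFailed, and the variable names below are assumed:

```powershell
# $pendingVms: names of successfully powered-on VMs still awaiting customization results
$pendingVms = [System.Collections.Generic.List[string]]$poweredOnNames
$timer = [System.Diagnostics.Stopwatch]::StartNew()

while ($pendingVms.Count -gt 0 -and $timer.Elapsed.TotalSeconds -lt 7200) {
    Start-Sleep -Seconds 10
    foreach ($name in @($pendingVms)) {
        $events = Get-VIEvent -Entity (Get-VM -Name $name) -Start $deployStart
        if ($events | Where-Object { $_ -is [VMware.Vim.CustomizationSucceeded] -or
                                     $_ -is [VMware.Vim.CustomizationFailed] }) {
            # Finished, one way or the other; stop watching this VM
            $pendingVms.Remove($name) | Out-Null
        }
    }
}
```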

And that’s basically it for the “workhorse” of this script.

I have to admit I cheated a little in the main loop that displays script progress…
As you can see, every major loop inside the script-block dumps the current progress indexes (I define 6 of them between Line 100 and Line 110) to a .csv file inside the script log directory.
The control loop in the main script section picks up these indexes for all dispatched background jobs and estimates overall progress (or at least tries to do so 😉 ).
You’re free to say it is neither the most elegant nor the fastest way to track the progress of background jobs, but it does the trick, and I’m not too worried that a short write sequence every ten seconds will kill your storage subsystem ;).
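Conceptually the hand-off boils down to something like this (file naming, counter names and $grandTotal are assumed):

```powershell
# Inside each background job, after every loop pass: dump the current counters
$progress | Export-Csv -Path $progressFile -NoTypeInformation

# In the main script, on every polling pass: aggregate the counters of all jobs
$counters = Get-ChildItem -Path $logDir -Filter '*progress*.csv' |
    ForEach-Object { Import-Csv -Path $_.FullName }
$finished = ($counters | Measure-Object -Property POWERED_ON -Sum).Sum +
            ($counters | Measure-Object -Property FAILED -Sum).Sum
$percent  = [Math]::Round(100 * $finished / $grandTotal)
```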
In the first stage of the workflow, progress is calculated as the proportion of the sum of VMs that powered on successfully (or failed), across all background jobs, against the grand total of VM deployment requests (from the CSV file), so your PowerCLI window might look like this:

[Screenshot: ultimate1]

Then, if the script detects that OS Customization has started inside any of the background jobs (a CUSTSTART index greater than 0), it switches to displaying progress as the sum of successful and failed customizations compared to the total of successfully powered-on VMs.
You can see that the “current activity” field displayed by the Write-Progress cmdlet changes from “VM deployment in progress” to “VM OS Customization in progress”, roughly as in the sketch below.
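(A sketch only; $custStarted, $custFinished and $poweredOn are assumed names for the aggregated counters.)

```powershell
# Phase one: deployed (or failed) VMs vs. the grand total of requests;
# phase two: finished customizations vs. the VMs that actually powered on
if ($custStarted -gt 0) {
    Write-Progress -Activity 'VM OS Customization in progress' `
                   -Status "$custFinished of $poweredOn done" `
                   -PercentComplete (100 * $custFinished / $poweredOn)
}
else {
    Write-Progress -Activity 'VM deployment in progress' `
                   -Status "$finished of $grandTotal done" `
                   -PercentComplete (100 * $finished / $grandTotal)
}
```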

[Screenshot: ultimate2]

This might have funny effects in situations where you, for example, deploy one “monster VM” in (say) a “server cluster” and a bunch of small VMs in (say) a “vdi cluster”.
Probably many of the small VMs will power on before “the monster”, so your indicated progress will be soaring; then, once the “monster VM” starts OS Customization, the progress will suddenly drop to zero…
In my defense: this approach always shows the “worst case” scenario, so you will never see the progress bar stuck at 100% for hours, and such unexpected behavior can be somewhat amusing while you wait for the jobs to complete ;).

I call this script “ultimate” for two reasons. First of all, as a joke of course :D.
Secondly, I really have no idea how to make it more complex… ah, wait… I could introduce additional safety checks (like free disk space on datastores, or load on host clusters), or I could introduce customization of vCPU and vRAM assignments (so that VMs could be deployed with resources different from the ones fixed in the template)…
The sky is (almost) the limit, and maybe I should come back to this script once again in the future.
(Read: after I’m back from holiday 😉 ).

That’s it for now. I hope you will find this script useful; feel free to share it or provide your feedback!

<Update, August 29th, 2014>
I just noticed that (by mistake) I posted a “very early” version of this script…
While I create a “very advanced hash-table” in Line 395, I did not make any use of it (at least not in the version originally posted) 🙁
Instead I just did a rudimentary query (Line 402 of the “early script”) for all background jobs that are running, which in some cases (like “orphaned” background jobs) can lead to unexpected results.
This was a legacy from the time I struggled a bit with controlling these jobs, so I corrected it, and now you see a really “elegant” way of querying only the jobs we started, between Line 409 and Line 421, as sketched below.
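The difference between the two versions boils down to this (a sketch, reusing the assumed $deployJobs hash-table from earlier):

```powershell
# Early version: grabbed every running job in the session,
# including "orphaned" jobs this script never started
$running = Get-Job | Where-Object { $_.State -eq 'Running' }

# Corrected version: poll only the job handles saved into the hash-table at dispatch time
$running = $deployJobs.Values | Where-Object { $_.State -eq 'Running' }
```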
Enjoy!
</Update>

Sebastian Baryło

Successfully jumping to conclusions since 2001.