When HA kicks (in) and you’re not quite ready…

vSphere HA is definitely one of the top contributors to the success of VMware virtualization.
But have you ever been in a situation where one of your vSphere hosts failed (say, during the night) and vSphere HA did its job perfectly, restarting all 50 affected VMs? Then you come to work (say, “a little late”) only to find the whole gang of application owners, process controllers, incident managers etc. gathered at your desk, demanding that you provide an impact analysis and an improvement plan before lunch today.

I think we’ve all been there 😉

You need to know at least which VMs were restarted by vSphere HA, just to fend all these people off.
Unfortunately, finding these VMs in a DRS-enabled cluster a few hours after the failover took place is (surprise!) not that easy a task.

Especially when your infrastructure is on the smaller side and you don’t have any fancy tools like vCOPs (pardon, vRealize Operations, of course) to help you.

This was exactly my problem when a disaster like that happened to me for the first time.

vSphere Client is not much help: by the time you get to work, the default Events view is probably flooded with everything that happened after the failover. Brute-force checking all VMs in the cluster is… tedious, and the official KB is not quite what you need either.
I mean… “Reviewing FDM logs on master and slave hosts”? Any volunteers for that?

Luckily, we can retrieve the list of affected VMs with a simple PowerCLI one-liner that might look like this:
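
The listing below is a sketch reconstructed from the description in this post, so treat it as an approximation rather than a verified script. It assumes an existing `Connect-VIServer` session and a cluster named “AcceptanceCluster”; the final `Sort-Object` step and the exact set of selected properties are assumptions.

```powershell
# Sketch only: assumes an active Connect-VIServer session and a cluster
# named "AcceptanceCluster"; the final Sort-Object step is an assumption.
Get-Cluster -Name "AcceptanceCluster" |
    Get-VM |
    Get-VIEvent -Types Warning -Start (Get-Date).AddHours(-3) |
    Where-Object { $_.FullFormattedMessage -match "vSphere HA restarted" } |
    Select-Object ObjectName, CreatedTime, FullFormattedMessage |
    Sort-Object CreatedTime
```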

Okay, that’s six lines, not one, but I laid them out like this just for “readability”. Delete all the “CR + LF”s after the pipe symbols and there you go: a canonical one-liner!

To be honest: I’m not very fond of one-liners 🙁 I know they can deliver results sooner than for-each loops (because the PowerShell pipeline streams each object to the next step as soon as it is produced, instead of collecting everything first), but most of the time I have difficulty reading (understanding) them, so I avoid one-liners whenever I don’t need the data as soon as possible.

But hey, I’ve got these guys breathing down my neck here and now, right?

In the example above I’ve assumed you’re already connected to your vCenter and the failover happened in a cluster called “AcceptanceCluster”, so I just grab all the VMs from there and retrieve the warning events registered during the last 3 hours (take into account how late you really were at work and adjust this time window accordingly 😉). Finally, I keep only the events whose description contains the dreaded “vSphere HA restarted” phrase.
In this case the “ObjectName” property is just the name of the restarted VM. Together with the exact timestamp, that is all the information I need, but you can easily extend this pipe further to retrieve any data useful to you.
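
As one possible extension, here is a minimal sketch that writes the same list to a CSV file you can hand over to the crowd at your desk. It assumes an active `Connect-VIServer` session and the “AcceptanceCluster” cluster name; the output file name is a hypothetical example.

```powershell
# Sketch only: assumes an active Connect-VIServer session and a cluster
# named "AcceptanceCluster"; the CSV path is a hypothetical example.
$since = (Get-Date).AddHours(-3)   # adjust to how late you really were

Get-Cluster -Name "AcceptanceCluster" |
    Get-VM |
    Get-VIEvent -Types Warning -Start $since |
    Where-Object { $_.FullFormattedMessage -match "vSphere HA restarted" } |
    Select-Object ObjectName, CreatedTime, FullFormattedMessage |
    Sort-Object CreatedTime |
    Export-Csv -Path ".\ha-restarted-vms.csv" -NoTypeInformation
```

`Export-Csv` is a standard PowerShell cmdlet, so the result opens directly in a spreadsheet with one row per restarted VM.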

I hope you will find this post useful, as always: feel free to share and/or provide your feedback!

Sebastian Baryło

Successfully jumping to conclusions since 2001.