Killing a Stuck or Hung VM

by bunchc on December 4, 2008

As with everything else, there are more than a few ways to do this. We’ll cover them here from least to most painful. All of these require access to the service console, so ESXi users… Sorry.

vmware-cmd stop hard

  1. Log into the service console (SSH, iLO, KVM, etc)
  2. Type “vmware-cmd -l”. This will return a list of all VMs and their paths
  3. Check the VM’s state “vmware-cmd /the/path/from/step/two.vmx getstate”
  4. Let’s try to stop the VM “vmware-cmd /the/path/from/step/two.vmx stop trysoft”
  5. Didn’t work? Time for a larger bat: “vmware-cmd /the/path/from/step/two.vmx stop hard”

This method is the safest, as in step 4 we try to use VMware Tools to gracefully turn the VM off. Failing that, we do the equivalent of pulling the power.
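The escalation in steps 3 through 5 can be sketched as a small shell function. This is only a rough sketch, not a tested tool: the function name stop_vm, the VMWARE_CMD override, and the poll count and pause defaults are all made up for illustration, and it assumes getstate reports the state in a form like “getstate() = off”.

```shell
#!/bin/sh
# Sketch only: try a soft stop first, fall back to a hard stop.
# VMWARE_CMD is a hypothetical override so the function can be dry-run
# against a stub; on the service console it defaults to vmware-cmd.
VMWARE_CMD="${VMWARE_CMD:-vmware-cmd}"

stop_vm() {
    vmx="$1"           # .vmx path as listed by "vmware-cmd -l"
    tries="${2:-12}"   # how many times to poll the state (assumed default)
    pause="${3:-5}"    # seconds between polls (assumed default)

    # Step 4: ask nicely. Output is discarded; the state is checked below.
    "$VMWARE_CMD" "$vmx" stop trysoft >/dev/null 2>&1

    i=0
    while [ "$i" -lt "$tries" ]; do
        # Assumption: getstate prints something like "getstate() = off".
        case "$("$VMWARE_CMD" "$vmx" getstate 2>/dev/null)" in
            *"= off"*) echo "soft"; return 0 ;;
        esac
        sleep "$pause"
        i=$((i + 1))
    done

    # Step 5: the larger bat.
    "$VMWARE_CMD" "$vmx" stop hard >/dev/null 2>&1
    echo "hard"
}
```

Called as “stop_vm /the/path/from/step/two.vmx”, it prints which kind of stop ended up being used.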

vm-support -X

This one is a bit more cumbersome as it generates a bunch of support debugging data as well, but it should be your next step in killing a stuck VM.

  1. Log into the Service Console
  2. We need to get the VMID of the VM that we’re killing “vm-support -x” (that’s a little x)
  3. Kill the VM! “vm-support -X vmid” (that’s a big X)
  4. Answer the prompts, paying special attention to the “ABORT” question; you want to answer yes to that.
  5. In about 5 to 10 minutes, the VM will be off, and there will be a tar file in the directory you ran the command from that can be sent along to support.

Like I said, a bit more cumbersome, but much less likely to have any long-term effects.

vmkload_app -k 9

This method is more awkward still, but it has quite a bit of grace to it.

  1. Log into the Service Console
  2. We need to get the VMID of the VM that we’re killing “vm-support -x” (that’s a little x)
  3. Now we need to get the ‘World ID’ “less /proc/vmware/vm/(the value from #2)/cpu/status”
  4. Look for the ‘Group’ field; it will contain a value like ‘vm.1293’. The number after ‘vm.’ is the World ID
  5. Now hit it with a bat: “/usr/bin/vmware/vmkload_app -k 9 1293”

You will get a warning that you have sent ‘signal 9’ to world 1293 (or your ID) and it should be quite dead.
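If you’d rather not eyeball the status file, steps 3 and 4 can be sketched as a one-line filter. The function name world_id_from_status is made up for illustration, and the sketch assumes only what step 4 says: that the group shows up as ‘vm.’ followed by digits. The exact layout of the status file may differ on your build, so check the number before feeding it to vmkload_app.

```shell
#!/bin/sh
# Sketch only: pull the numeric world ID out of a "vm.NNNN" group value.
# Assumption: the status text on stdin contains the group as "vm." + digits.
world_id_from_status() {
    # Print the digits following "vm." and keep only the first match.
    sed -n 's/.*vm\.\([0-9][0-9]*\).*/\1/p' | head -n 1
}
```

Feed it the status file from step 3 on stdin (“world_id_from_status < /proc/vmware/vm/(the value from #2)/cpu/status”) and it should print just the number, 1293 in the example above.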

kill -9 pid

I hate even putting this here, but sometimes things refuse to die. Please keep in mind that this should ONLY be used as a last resort, as it is like swatting a fly with a Cadillac.

  1. Log into the Service Console
  2. Get the pid “ps auxwww | grep vmx”
  3. Kill it with “kill -9 pid”

Straightforward, but then again so is pulling the pin on a grenade. The results will likely be the same.
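Since a plain “grep vmx” matches every VM on the host (and the grep itself), here is a rough sketch of narrowing step 2 down to one VM before you pull the pin. The function name vmx_pids and the VMX_PATH variable are made up for illustration, and it assumes the VM’s .vmx path appears somewhere on its line in the ps output. It only prints PIDs; the kill is left to you on purpose.

```shell
#!/bin/sh
# Sketch only: read "ps auxwww" output on stdin and print the PID (column 2)
# of every line that mentions the given .vmx path.
vmx_pids() {
    [ -n "$1" ] || return 1
    # The path is handed to awk through the environment rather than as an
    # argument, so the awk process does not match itself in the ps listing.
    VMX_PATH="$1" awk 'index($0, ENVIRON["VMX_PATH"]) > 0 { print $2 }'
}
```

Something like “ps auxwww | vmx_pids /the/path/from/step/two.vmx” should give you the candidates. Compare them against the full ps line before running “kill -9” on anything.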
