This post puts together two of my past posts to solve a ‘common’ problem. Common, that is, if you often leave snapshots running and have no other method of checking on them. The situation is this:
It is 3:45 PM on Friday (what good issue doesn’t happen on a Friday?), you are closing up shop for the weekend, and the phone rings. You let the first call go to voicemail; after all, it is Friday, and it can wait. Moments later the “Bat Phone” rings, and you know what that means: a long Friday night, and likely a long weekend. After a short conversation you find that one of your VM administrators left a snapshot running on their Counter Strike server, and it has filled the datastore. The problem is that the Counter Strike server happens to live on the same datastore as the Payroll web server, the same Payroll web server that makes sure you get paid. Story sound interesting yet? So what do you do when this occurs?
Step 1) Kill the bunk VM. Why? It will likely be useless by the time you get to it anyway, but if it is still running and there is no space left on the datastore, we need to power it off so that it does not interfere with the fix in step 2.
Step 2) Remove the snapshot with extreme prejudice. This is critical as well; it is what got you into this situation in the first place, isn’t it? Removing the snapshot cleanly frees up the datastore space you need, not only to get the Counter Strike server back online, but also to power the remaining crashed VMs back on.
Step 3) Remediate the VM Admin using whatever policy best suits you. I prefer public humiliation and death by firing squad.
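
To make the order of operations concrete, here is a rough sketch of steps 1 and 2 as a recovery plan. It is plain Python working on a made-up inventory and does not talk to vCenter or any real API; the `VM` class, the `plan_recovery` function, the action names, and the VM and datastore names are all hypothetical, invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class VM:
    """Hypothetical inventory record; not a real vSphere object."""
    name: str
    datastore: str
    powered_on: bool
    has_snapshot: bool = False


def plan_recovery(vms, datastore, culprit_name):
    """Return the ordered (action, vm_name) steps for a full datastore."""
    on_datastore = [vm for vm in vms if vm.datastore == datastore]
    culprit = next(vm for vm in on_datastore if vm.name == culprit_name)

    steps = []
    # Step 1: kill the offending VM if it is still running.
    if culprit.powered_on:
        steps.append(("power_off", culprit.name))
    # Step 2: remove the snapshot that filled the datastore.
    if culprit.has_snapshot:
        steps.append(("remove_snapshot", culprit.name))
    # With space freed, bring the culprit and the crashed VMs back up.
    steps.append(("power_on", culprit.name))
    for vm in on_datastore:
        if vm is not culprit and not vm.powered_on:
            steps.append(("power_on", vm.name))
    return steps


if __name__ == "__main__":
    inventory = [
        VM("counter-strike", "datastore1", powered_on=True, has_snapshot=True),
        VM("payroll-web", "datastore1", powered_on=False),
        VM("file-server", "datastore2", powered_on=True),
    ]
    for action, name in plan_recovery(inventory, "datastore1", "counter-strike"):
        print(action, name)
```

Step 3 is deliberately left out of the sketch; that part is between you and your VM admin.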
