How To Read Dumps – ESX Crash Dumps That Is

by bunchc on February 11, 2009

About thirty years ago, in the jungle in South Korea, I spent some time living as a monk. One of the things I learned from these monks was the ancient art of dump reading. Yes! That’s right, I can tell the future by reading the finer texture and smell of a dump.

OK, that’s not true (I’m naught but 26), and I can’t tell the future by reading dumps. I can tell you, however, that learning to read ESX dumps will be good for your future.

What Makes A Dump?

Lots and lots of fiber in your diet. That… and PSODs (Purple Screens of Death). They’ll generate an ESX kernel dump and drop a crash dump file into the /root/ directory, named something like: ‘vmkernel-zdump-<reversed date>.#.#.#’

This file is created on the first reboot following your PSOD and is generated from the contents of your VMKCORE partition. You did make a VMKCORE partition, right? It’s the one labeled ‘fc’. Can’t find it? Sure? Did you look in your sock drawer? OK… well, in that case run “vmkdump -d /dev/sda5”, where /dev/sda5 is the partition reported by esxcfg-dumppart -l.

I Have My Dump, Now What?

You can do a few things. The first is to generate a support bundle and send it off to VMware for analysis (which you should do anyway). However, if you’re like me and can’t wait, you can do the following from the service console:

Here is where the dump hides:

# ls -alh
total 14M
-rw-r--r--    1 root     root          13M Feb  6 04:40 vmkernel-zdump-020609.04.40.1

Let’s extract it:

# vmkdump -l vmkernel-zdump-020609.04.40.1
created file vmkernel-log.1

# ls -alh
-rw-r--r--    1 root     root         186K Feb 11 14:32 vmkernel-log.1
-rw-r--r--    1 root     root          13M Feb  6 04:40 vmkernel-zdump-020609.04.40.1

So there it is… now let’s take a look at the insides:

54:01:08:11.385 cpu15:1166)<6>Debug scsi underrun
54:01:08:11.385 cpu15:1166)<6>Debug scsi underrun
54:01:08:11.385 cpu15:1166)<6>Debug scsi underrun
54:01:08:11.386 cpu15:1166)<6>Debug scsi underrun
54:01:08:11.386 cpu15:1166)<6>Debug scsi underrun
54:06:35:47.637 cpu7:1074)<6>qla24xx_abort_command(0): handle to abort=1457
_[45m_[33;1mVMware ESX Server [Releasebuild-113339]_[0m
Exception type 13 in world 1169:vmm0:197830- @ 0x6ff49b
frame=0x3c47cec ip=0x6ff49b cr2=0x8617c88 cr3=0x3f686000 cr4=0x2660
es=0x3ee64028 ds=0x4028 fs=0x1580000 gs=0x4041
eax=0x2a ebx=0xb3f0f80 ecx=0x9ff47e90 edx=0x50
ebp=0x3c47ed4 esi=0xe edi=0x15806c8 err=0 eflags=0x10286
0:1024/console 1:1196/vmware-vm 2:1200/mks:19783 3:1186/mks:19783
*4:1169/vmm0:1978 5:1161/vmware-vm 6:1170/vmm1:1978 7:1179/mks:19783
8:1176/vmm0:1978 9:1184/vmm1:1978 10:1182/vmware-vm 11:1177/vmm1:1978
12:1162/vmm0:1978 13:1198/vmm1:1978 14:1197/vmm0:1978 15:1039/idle15
@BlueScreen: Exception type 13 in world 1169:vmm0:197830- @ 0x6ff49b
0x3c47ed4:[0x6ff49b]E1000PollTxRing+0x366 stack: 0x7030140, 0xb3f0fb4, 0x0
0x3c47f2c:[0x701474]E1000_PollRings+0x1d7 stack: 0x3ee6a308, 0x704, 0x267d49c0
0x3c47f84:[0x618647]BH_Check+0x2ee stack: 0x1, 0x82000000, 0x85f7d70
0x3c47fd8:[0x62249c]VMKCall+0x147 stack: 0x2d, 0x85f7d70, 0x82000000
0x3c47ffc:[0x67af0b]VMKVMMEnterVMKernel+0x8e stack: 0x0, 0x0, 0x0
VMK uptime: 57:17:09:07.125 TSC: 11937242658207618
Starting coredump to disk Starting coredump to disk Dumping using slot 1 of 1… using slot 1 of 1… log
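
The extracted log runs to 186K, so it helps to jump straight to the crash rather than scroll. A minimal sketch, assuming the log always carries the ‘@BlueScreen:’ and ‘VMK uptime:’ markers seen in the excerpt above; the helper name crash_markers is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print the crash marker lines, with line numbers,
# from a log file (or from stdin when no file is given).
crash_markers() {
    grep -n -e '@BlueScreen:' -e 'VMK uptime:' "$@"
}

# Demo on three lines taken from the excerpt above.
crash_markers <<'EOF'
54:06:35:47.637 cpu7:1074)<6>qla24xx_abort_command(0): handle to abort=1457
@BlueScreen: Exception type 13 in world 1169:vmm0:197830- @ 0x6ff49b
VMK uptime: 57:17:09:07.125 TSC: 11937242658207618
EOF
```

On the host itself you would pass the extracted file instead, as in crash_markers vmkernel-log.1, and then open the log at the reported line numbers.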

The first column is your uptime. The last event before the crash was the aborted handle:

54:06:35:47.637 cpu7:1074)<6>qla24xx_abort_command(0): handle to abort=1457

The uptime of the kernel when the crash occurred is on the second-to-last line:

VMK uptime: 57:17:09:07.125 TSC: 11937242658207618

Reading the stamps as days:hours:minutes:seconds, we can see that there are roughly 3 days and 10.5 hours between the last message and the time of the crash. With a gap that long, those debug scsi underrun messages can basically be ignored.
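
The gap between the two stamps can be checked with a little arithmetic. This is a sketch only, assuming the uptime fields are days:hours:minutes:seconds and dropping the milliseconds; the helper name uptime_to_seconds is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: convert a dd:hh:mm:ss.mmm uptime stamp to whole seconds.
# Assumes the fields are days:hours:minutes:seconds; milliseconds are dropped.
uptime_to_seconds() {
    echo "$1" | awk -F: '{ printf "%d\n", $1 * 86400 + $2 * 3600 + $3 * 60 + int($4) }'
}

last_msg=$(uptime_to_seconds "54:06:35:47.637")   # last log line before the crash
crash=$(uptime_to_seconds "57:17:09:07.125")      # the "VMK uptime" line
gap=$((crash - last_msg))

# Print the gap as days, hours, and minutes.
echo "gap: $((gap / 86400))d $((gap % 86400 / 3600))h $((gap % 3600 / 60))m"
# prints: gap: 3d 10h 33m
```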

Now let’s move on to the backtrace itself:

@BlueScreen: Exception type 13 in world 1169:vmm0:notthemama- @ 0x6ff49b
0x3c47ed4:[0x6ff49b]E1000PollTxRing+0x366 stack: 0x7030140, 0xb3f0fb4, 0x0
0x3c47f2c:[0x701474]E1000_PollRings+0x1d7 stack: 0x3ee6a308, 0x704, 0x267d49c0
0x3c47f84:[0x618647]BH_Check+0x2ee stack: 0x1, 0x82000000, 0x85f7d70
0x3c47fd8:[0x62249c]VMKCall+0x147 stack: 0x2d, 0x85f7d70, 0x82000000
0x3c47ffc:[0x67af0b]VMKVMMEnterVMKernel+0x8e stack: 0x0, 0x0, 0x0

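The backtrace lists the most recent call first. The last function running was E1000PollTxRing, which was called from E1000_PollRings, then BH_Check, then VMKCall, and finally VMKVMMEnterVMKernel. The address in that top frame (0x6ff49b) matches the address on the @BlueScreen line, so that is where the exception hit.

Based on the name of that last function, this host probably crashed due to some type of packet or frame corruption in the Intel E1000 driver in the VM that was running with world ID 1169 in vmm0 named ‘notthemama’.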
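
The function names can be pulled out of a backtrace mechanically. A minimal sketch, assuming every frame has the shape shown above (stack address, then the instruction address in brackets, then Function+offset); the helper name backtrace_functions is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print the function name from each backtrace frame,
# most recent call first. Assumes frames look like
#   0x3c47ed4:[0x6ff49b]E1000PollTxRing+0x366 stack: ...
# Lines that do not match that shape are skipped.
backtrace_functions() {
    sed -n 's/^0x[0-9a-fA-F]*:\[0x[0-9a-fA-F]*\]\([A-Za-z0-9_]*\)+.*/\1/p' "$@"
}

# Demo on the backtrace from this post.
backtrace_functions <<'EOF'
@BlueScreen: Exception type 13 in world 1169:vmm0:notthemama- @ 0x6ff49b
0x3c47ed4:[0x6ff49b]E1000PollTxRing+0x366 stack: 0x7030140, 0xb3f0fb4, 0x0
0x3c47f2c:[0x701474]E1000_PollRings+0x1d7 stack: 0x3ee6a308, 0x704, 0x267d49c0
0x3c47f84:[0x618647]BH_Check+0x2ee stack: 0x1, 0x82000000, 0x85f7d70
0x3c47fd8:[0x62249c]VMKCall+0x147 stack: 0x2d, 0x85f7d70, 0x82000000
0x3c47ffc:[0x67af0b]VMKVMMEnterVMKernel+0x8e stack: 0x0, 0x0, 0x0
EOF
```

The first name printed is the function that was running when the host went purple, which is usually the one worth searching on.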

Thanks for playing along. If you have questions, hit me up in the comments or on Twitter @cody_bunch.

  • http://technodrone.blogspot.com Maish

    Highly educational and very entertaining!!!

  • http://professionalvmware.com professionalvmware

    Thanks!

  • coldblood

    That's really interesting! Really enjoyed reading the article. Thanks for putting in your time and effort for the article!

  • Rudra

    Hi, Looks very good.Thanks. But I am looking for a tutorial/pdf for reading/analysing logs & errors and troubleshoot.Thanks once again.

  • Fuzzi

    My concern here is: what info does it collect from the ESX host ? Is it safe enough to send this dump for analysis to VMware? Should i not be bothered about my core info being sent/shared ?

  • Gaurav

    Yes it is Safe dont worry these logs are not a Security Concern

  • aj203355

    Yeah.. I agree! Highly Educational indeed.. =)
