The Purple Screen of Death, commonly known as PSOD, is something most of us run into sooner or later when running ESXi hosts.
Usually, when we hit a PSOD, we reboot the host (which is a must), then gather the logs and upload them to VMware support for analysis (where I spend a good amount of time going through them).
Why not take a look at the dumps yourself?
Step 1:
I am going to simulate a PSOD on my ESXi host. Do this only on a test host, since the command crashes the host immediately. You need to be logged in to the host over SSH. The command is:
# vsish -e set /reliability/crashMe/Panic 1
When you open the DCUI of the ESXi host, you can see the PSOD.
Step 2:
Sometimes we miss the screenshot of the PSOD. Well, that's alright! If a core dump is configured for the ESXi host, we can extract the dump files to gather the crash logs.
Reboot the host if it is still on the PSOD screen. Once the host is back up, log in to the host over SSH (PuTTY, for example) and go to the core directory. The core directory is where your PSOD logging goes.
# cd /var/core
Then list the files here:
# ls -lh
Step 3:
How do we extract it?
Well, we have a handy extraction script that does the whole job: vmkdump_extract. Run it against the vmkernel-zdump.1 file, like this:
# vmkdump_extract vmkernel-zdump.1
It creates four files:
a) vmkernel-log.1
b) vmkernel-core.1
c) visorFS.tar
d) vmkernel-pci
All we need for analysis is the vmkernel-log.1 file.
Step 4:
Open the vmkernel-log.1 file with the command below:
# less vmkernel-log.1
Jump to the end of the file by pressing Shift+G, then work your way back up with PageUp.
You will come across a line that says @BlueScreen: <event>
In my case, the dump showed:
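If paging through a long log gets tedious, a small helper can point you at the right line. This is only a sketch: find_bluescreen is a name I made up for illustration, and it assumes grep and tail are available in the shell you are using.

```shell
# Hypothetical helper (not part of ESXi): print the line number and text
# of the last @BlueScreen entry in an extracted vmkernel log.
find_bluescreen() {
    # $1: path to the extracted log, e.g. vmkernel-log.1
    # grep -n prefixes each match with "<line number>:"
    grep -n '@BlueScreen:' "$1" | tail -n 1
}
```

Call it as find_bluescreen vmkernel-log.1. The number before the first colon in the output is the line to scroll to in less.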
2015-12-17T20:34:03.603Z cpu3:47209)@BlueScreen: CrashMe
2015-12-17T20:34:03.603Z cpu3:47209)Code start: 0x418021200000 VMK uptime: 0:01:14:16.524>
2015-12-17T20:34:03.603Z cpu3:47209)0x412461a5dc10:[0x41802128d249]PanicvPanicInt@vmkernel#nover+0x575 stack: 0x726f632000000008
2015-12-17T20:34:03.603Z cpu3:47209)0x412461a5dc70:[0x41802128d48d]Panic_NoSave@vmkernel#nover+0x49 stack: 0x412461a5dcd0
2015-12-17T20:34:03.604Z cpu3:47209)0x412461a5dd60:[0x41802157a63b]CrashMeCurrentCore@vmkernel#nover+0x553 stack: 0x100000278
2015-12-17T20:34:03.604Z cpu3:47209)0x412461a5dda0:[0x41802157a8ca]CrashMe_VsiCommandSet@vmkernel#nover+0x13e stack: 0x0
2015-12-17T20:34:03.604Z cpu3:47209)0x412461a5de30:[0x41802160c3c7]VSI_SetInfo@vmkernel#nover+0x2fb stack: 0x41109d630330
2015-12-17T20:34:03.604Z cpu3:47209)0x412461a5dec0:[0x4180217bd7a7]UWVMKSyscallUnpackVSI_Set@<none>#<none>+0xef stack: 0x412461a67000
2015-12-17T20:34:03.604Z cpu3:47209)0x412461a5df00:[0x418021783a47]User_UWVMKSyscallHandler@<none>#<none>+0x243 stack: 0x412461a5df20
2015-12-17T20:34:03.604Z cpu3:47209)0x412461a5df10:[0x4180212aa90d]User_UWVMKSyscallHandler@vmkernel#nover+0x1d stack: 0xffbc0bb8
2015-12-17T20:34:03.604Z cpu3:47209)0x412461a5df20:[0x4180212f2064]gate_entry@vmkernel#nover+0x64 stack: 0x0
- The first line, @BlueScreen:, gives the crash exception, such as Exception 13 or Exception 14. In my case it is CrashMe, which indicates a manually triggered crash.
- VMK uptime shows how long the kernel had been up before the crash.
- The lines after that are the stack trace. This is the information we need to look at to find out why the crash occurred.
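To read the stack lines more easily, you can strip them down to just the function and module names. Below is a minimal sketch; list_frames is a hypothetical name of mine, and the pattern assumes the frame lines look like the ones in my dump (address:[address]function@module#version).

```shell
# Hypothetical helper (not part of ESXi): print "function (module)" for
# every stack frame line in an extracted vmkernel log.
list_frames() {
    # $1: path to the extracted log, e.g. vmkernel-log.1
    # Match ")0x<frame>:[0x<addr>]<function>@<module>#..." and keep only
    # the function name and the module name.
    sed -n 's/^.*)0x[0-9a-fA-F]*:\[0x[0-9a-fA-F]*]\([^@]*\)@\([^#]*\)#.*$/\1 (\2)/p' "$1"
}
```

Run it as list_frames vmkernel-log.1. Lines that are not stack frames, such as the @BlueScreen line itself, are skipped.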
The crash dump varies from crash to crash. The causes can range from hardware errors and driver issues to problems with the ESXi build, and a lot more.
Every dump analysis is different, but the basics are the same.
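Since the starting point is always the @BlueScreen line, it can help to save just that part of the log into its own file before digging in. Here is a rough sketch; save_panic_excerpt and the output file name are my own inventions, and it assumes sed is available in your shell.

```shell
# Hypothetical helper (not part of ESXi): copy everything from the first
# @BlueScreen line to the end of the log into a separate file.
save_panic_excerpt() {
    # $1: extracted log, e.g. vmkernel-log.1
    # $2: output file for the excerpt
    sed -n '/@BlueScreen:/,$p' "$1" > "$2"
}
```

For example, save_panic_excerpt vmkernel-log.1 panic-excerpt.txt leaves you with a short file that holds only the panic message and the lines logged after it.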
So, try analyzing the dumps yourself. However, if you are entitled to VMware support, I will do the job for you.
Cheers!