Nested ESXi Virtualisation

TecScott

Mar 30, 2017
I've seen articles/posts regarding nested ESXi virtualisation but I seem to have an issue with purple screens when writing data to a second hard drive.

If I run the ESXi host on its own, it runs without any issues. However, when I copy data to the nested host, it randomly purple screens, referencing a PCPU lockup:

PCPU 1 locked up. Failed to ack TLB invalidate (at least 1 locked up, PCPU(s): 1).
PCPU(s) did not respond to NMI. Possible hardware problem; contact hardware vendor.

The local debugger after the PSOD shows SCSI aborts:

scsiTaskMgmtCommand:VMK Task: ABORT sn=0x8bddb initiator=0x43024ff900
ahciAbortIO: (curr) HWQD: 4 BusyL: 0 PioL: 0
scsiTaskMgmtCommand:VMK Task VIRT_RESET initiator=0x43024ff900
ahciAbortIO: (curr) HWQD: 4 BusyL: 0 PioL: 0
'Shared': HB at offset 3866624 - Waiting for timed out HB:
[HB state abcdef02 offset 3866624 gen 103 stampUS....
nmp_ThrottleLogForDevice:3863: Cmd 0x2a (0x459a4259a0c0, 2097165) to dev "t10.ATA___QEMU_HARDDISK__________QM00015_________" on path "vmhba1:C0:T1:L0" Failed:
nmp_ThrottleLogForDevice:3872: H:0x5 D:0x22 P:0x0 Invalid sense data: 0x0 0x0 0x0. Act:EVAL. cmdId.initiator=0x403024ff900 CmdSN 0x8bddb
WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "t10.ATA___QEMU_HARDDISK_______QM00015_______" state in doubt; requested fast path state update...
.... CmdSN 0x8bddb from world 2097165 to dev "t10.ATA____QEMU_HARDDISK________QM00015______" failed
After that, the PCPUs stop sending heartbeats, which results in the VM crashing.
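For anyone comparing their own logs against the ones above, here is a minimal Python sketch that pulls the SCSI opcode, target device, and the H:/D:/P: status triple out of the nmp_ThrottleLogForDevice lines. The function names and lookup tables are my own and purely illustrative. 0x2a is the standard SCSI WRITE(10) opcode; reading host status 0x5 as "ABORT" is an assumption based on VMware's usual naming, so check it against VMware's documentation before relying on it.

```python
import re

# Status triple printed by NMP, e.g. "H:0x5 D:0x22 P:0x0".
STATUS_RE = re.compile(
    r"H:(0x[0-9a-fA-F]+)\s+D:(0x[0-9a-fA-F]+)\s+P:(0x[0-9a-fA-F]+)"
)
# Opcode and device, e.g. 'Cmd 0x2a (0x..., 2097165) to dev "t10...."'.
CMD_RE = re.compile(r'Cmd (0x[0-9a-fA-F]+) \(.*?\) to dev "([^"]+)"')

# Partial tables covering only values seen in this thread.
SCSI_OPCODES = {0x2A: "WRITE(10)"}  # standard SCSI opcode
HOST_STATUS = {0x0: "OK", 0x5: "ABORT"}  # assumption: VMware host status naming


def parse_cmd(line):
    """Return the opcode and device from a 'Cmd ... to dev' line, or None."""
    match = CMD_RE.search(line)
    if match is None:
        return None
    opcode = int(match.group(1), 16)
    return {
        "opcode": opcode,
        "opcode_name": SCSI_OPCODES.get(opcode, "unknown"),
        "device": match.group(2),
    }


def parse_status(line):
    """Return host/device/plugin status codes from an 'H: D: P:' line, or None."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    host, device, plugin = (int(value, 16) for value in match.groups())
    return {
        "host": host,
        "host_name": HOST_STATUS.get(host, "unknown"),
        "device": device,
        "plugin": plugin,
    }
```

Fed the two nmp_ThrottleLogForDevice lines above, this reports a WRITE(10) to the QEMU SATA disk that came back with host status 0x5, which fits the picture of writes to the second disk being aborted just before the lockup.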

Anyone encountered a similar issue with nested ESXi?

CPU is set to host, machine type to q35, BIOS to SeaBIOS, and OS type to Linux 5.x - 2.6 Kernel. For the SCSI controller I've tried both VMware PVSCSI and LSI 53C895A.
 
I run the nested ESXi with vHDD attached to SATA, not SCSI. Have you tried that?
Thanks - unfortunately it is already attached via SATA. When attached via SCSI, the disk doesn't appear in ESXi at all.

Current settings:
4 GB RAM, 4 cores, host CPU, SeaBIOS, i440fx (tried q35), VMware PVSCSI SCSI controller (tried default), SATA disk (tried SCSI), vmxnet3 NICs, OS type Other (tried Linux 5.x - 2.6 Kernel).
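Since several settings have been changed one at a time, it may help to walk through the combinations systematically and note which ones PSOD. A rough Python sketch that just enumerates them; the dictionary keys are labels I made up for illustration (not Proxmox config keys), the values are the ones mentioned in this thread, and I'm assuming "default" means the LSI 53C895A controller:

```python
from itertools import product

# Settings varied so far in this thread. Keys are illustrative labels only.
OPTIONS = {
    "machine": ["i440fx", "q35"],
    "scsi_controller": ["VMware PVSCSI", "LSI 53C895A"],  # assumption: LSI is the default
    "disk_bus": ["sata", "scsi"],
    "os_type": ["Other", "Linux 5.x - 2.6 Kernel"],
}


def combinations(options):
    """Return every combination of the given settings as a list of dicts."""
    keys = list(options)
    return [
        dict(zip(keys, values))
        for values in product(*(options[key] for key in keys))
    ]


if __name__ == "__main__":
    # Print a numbered checklist to tick off while testing.
    for number, combo in enumerate(combinations(OPTIONS), start=1):
        print(number, combo)
```

That gives 16 combinations to test under the same copy workload, which makes it easier to tell whether any single setting actually changes the behaviour.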

The server runs fine when idle, but when I copy data to it, it hits the issue and PSODs. I've also tried rebuilding it and creating a new VMFS datastore, neither of which made any difference.
 
What about the CPU settings? I use the settings below and it works for some time. I get a purple screen every month or so, but no disk I/O issues:

1674400895728.png
 
I have more or less the same config, but I get the PSOD regularly, every day. It seems to happen mostly when the systems are idle.
Is there anything I can test to figure out what's causing it?
1681981190708.png

I've tried it with multiple hardware configurations and systems listed in the compatibility matrix, but it's always the same.
This is my hardware config:

1681981296392.png
 
Did you try using UEFI instead of SeaBIOS on the VM?

Also, how much RAM and how many CPU cores does the host have? What else is running on the host?
 
No, I didn't try UEFI. That's definitely something I can try, but to be honest, I doubt it will change anything. Still worth a try.
Concerning RAM/CPU cores: I've tried up to 24 cores and 64 GB of memory, but nothing changed the behavior.
 
