[SOLVED] Host freeze input/output error

jambove · Feb 17, 2021

I installed Proxmox 6.3 on an Intel NUC10I7FNH (64 GB RAM, 1 TB SSD) a week ago and added a Debian VM, and it worked fine. Three days ago I added another Debian VM and then a Windows 10 VM. Ever since, the guests have been halting and the host has been returning an Input/output error for every single command until it freezes completely.

I suspected the only attached SSD (Kingston), so I ran a badblocks test, a smartctl test, a dd read test, and even a memtest. The host passed all of them multiple times. I managed to grab a dmesg log, which I am attaching.

Any help is appreciated.

EDIT: I monitored the host with htop, and CPU usage of the kvm process running the Win10 VM was above 100% many times.

Stefan_R · Feb 22, 2021

The line "nvme nvme0: I/O 503 QID 5 timeout, aborting" clearly indicates to me that the NVMe SSD is experiencing some trouble. badblocks and similar tools are not perfect representations of a VM workload, so they might not trigger the fault, and the kernel logs leave very little room for interpretation.
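For anyone who wants to check a saved dmesg log for the same symptom, here is a minimal sketch in Python. The regular expression is modeled only on the single line quoted above, so it is an assumption that other kernel versions word the message the same way, and the function name find_nvme_timeouts is made up for this example.

```python
import re

# Pattern modeled on the one log line quoted in this thread (assumption:
# other kernels may phrase the NVMe timeout message differently).
NVME_TIMEOUT = re.compile(
    r"nvme (?P<dev>nvme\d+): I/O (?P<tag>\d+) QID (?P<qid>\d+) timeout"
)


def find_nvme_timeouts(log_text):
    """Return (device, io_tag, queue_id) for every NVMe timeout line found."""
    hits = []
    for line in log_text.splitlines():
        match = NVME_TIMEOUT.search(line)
        if match:
            hits.append(
                (match.group("dev"), int(match.group("tag")), int(match.group("qid")))
            )
    return hits


if __name__ == "__main__":
    sample = "nvme nvme0: I/O 503 QID 5 timeout, aborting"
    print(find_nvme_timeouts(sample))  # prints [('nvme0', 503, 5)]
```

Repeated hits, especially across several queue IDs, would point in the same direction as the diagnosis above, but the script only counts matching lines and cannot say whether the drive, the slot, or the firmware is at fault.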

Maybe try a different slot, or update your BIOS, or even the drive's firmware. Also check for any misconfigured PCIe settings in the BIOS setup. Otherwise I'd say faulty hardware.

jambove said:
EDIT: I monitored the host with htop, and CPU usage of the kvm process running the Win10 VM was above 100% many times.

The percentage is a total over all assigned cores, so if your VM is configured with 4 cores, then the theoretical maximum usage stat would be 400%. As to why it spikes up, Windows often does updates or Windows Defender scans in the background, which may use quite a bit of CPU.
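The arithmetic behind that can be sketched in a few lines of Python. The function names are hypothetical, and the 250% reading in the example is an invented figure for illustration, not a value from the attached log.

```python
def max_cpu_percent(cores):
    """Theoretical ceiling of the per-process CPU figure for a VM with `cores` cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return cores * 100


def per_core_average(reported_percent, cores):
    """Average load per assigned core, given the summed percentage shown for the process."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return reported_percent / cores


if __name__ == "__main__":
    print(max_cpu_percent(4))        # prints 400
    print(per_core_average(250, 4))  # prints 62.5 (250 is a made-up reading)
```

So a reading above 100% on a multi-core VM is not a sign of trouble by itself; it only means more than one core's worth of work is being done.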

jambove · Mar 5, 2021

I appreciate the reply, Stefan. It turned out to be a faulty SSD, so a hardware problem. I got a new SSD and all is good for now.
