VM stuck in io-error status (after disconnecting a SCSI-mapped USB drive)

fmaione

Member
Mar 24, 2022
Hello, I'm testing VM and filesystem resilience against USB disconnections and/or USB controller errors that cause disconnections. I know this setup is not advisable, but I'm trying to check whether a ZFS snapshotting strategy can protect the drive's data even when the drive gets disconnected.

While testing a disconnection, the VM became completely unresponsive. How can I recover from the VM status "io-error" when the VM responds neither via SSH nor on the noVNC console?

The whole story
I have a Debian 12 VM in a test environment with two USB drives attached to it:
Code:
qm set 400 -scsi1 /dev/disk/by-id/usb-Go-Infin_ity_1F111111835E-0:0,cache=writeback,backup=0,replicate=0,iothread=1
qm set 400 -scsi2 /dev/disk/by-id/usb-SanDisk_SDSSDA240G_12345678937F-0:0,cache=writeback,backup=0,replicate=0,iothread=1
In the VM I created a ZFS pool on each drive and set up zrepl to take frequent snapshots of the main pool and replicate them to the secondary pool.
To test whether this setup was resilient and protected the data as well as possible, I intentionally disconnected the main USB drive from the Proxmox host, and I ended up with a stuck VM that I had to quit via the QEMU monitor. How can I avoid this? How can I keep the VM responsive even if a USB drive disconnects? Using USB emulation is not an option, since it produces errors very quickly.
 
After some more research I found that:
  • this is normal, intended QEMU/KVM behavior, not something specific to Proxmox;
  • I don't know whether a VM can be resumed after this kind of stop (I found nothing about it, and I lack the knowledge and time to dig through the QEMU sources); perhaps the stop is only meant to prevent further damage and/or to allow some kind of debugging.
However, for my scenario (a VM running a number of services that I don't want frozen just because one HDD used for backups stops working), when attaching the drives as shown above I can add the parameters rerror=report and werror=report to avoid the VM freeze and have the I/O error reported to the guest OS instead.

This has to be done when attaching the drives via qm set, or later by manually editing the VM's .conf file (see: https://pve.proxmox.com/wiki/Manual:_qm.conf), since the option is not available in the GUI.
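As a rough sketch (not tested on a real host), the helper below appends the two options to an existing drive spec and prints the resulting qm set commands for review instead of running them. The function name with_error_reporting is hypothetical; the VM ID and device paths are the ones from the first post. The same two options could instead be appended to the scsi1:/scsi2: lines in /etc/pve/qemu-server/400.conf.

```shell
#!/bin/sh
# Hypothetical helper: append guest error reporting to a Proxmox drive spec.
# With rerror=report / werror=report, QEMU passes failed reads/writes on this
# disk to the guest as I/O errors instead of pausing the whole VM.
with_error_reporting() {
    case "$1" in
        *rerror=*|*werror=*)
            # Avoid emitting duplicate keys if the spec already sets an action.
            echo "spec already sets rerror/werror: $1" >&2
            return 1
            ;;
    esac
    printf '%s,rerror=report,werror=report\n' "$1"
}

VMID=400
DISK1="/dev/disk/by-id/usb-Go-Infin_ity_1F111111835E-0:0,cache=writeback,backup=0,replicate=0,iothread=1"
DISK2="/dev/disk/by-id/usb-SanDisk_SDSSDA240G_12345678937F-0:0,cache=writeback,backup=0,replicate=0,iothread=1"

# Print (rather than run) the commands so they can be checked first.
echo "qm set $VMID -scsi1 $(with_error_reporting "$DISK1")"
echo "qm set $VMID -scsi2 $(with_error_reporting "$DISK2")"
```

Printing instead of executing is a deliberate choice: qm set on a running VM changes its config immediately, so it is worth eyeballing the final strings before pasting them on the PVE host.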

A further note to the Proxmox staff: if you agree, you might consider updating the wiki page on device passthrough with a note about the rerror=report and werror=report options, given that it's the guest OS that handles the device.
 
A further note to the Proxmox staff: if you agree maybe you can consider updating wiki page related to device passthrough inserting a note related to rerror=report and werror=report options, given that it's the guest OS that handles the device.
That page is about "Passthrough Physical Disk to Virtual Machine (VM)". Your rerror=report and werror=report options are not passthrough-specific but apply (I believe) to any VM disk, passed through or not.
 
OK, I see your point, but I still believe a reference to error reporting would be quite useful.

Anyway, during my experiments I managed to recover (so to speak) a VM with these parameters (rerror=report and werror=report) using the following steps:
  1. cause the error (e.g. USB disconnection)
  2. check the error in the VM (e.g. via journalctl)
  3. hibernate the VM
  4. make sure the PVE host has access to the disconnected drive again (e.g. reconnect it, or reboot the PVE host)
  5. resume the VM
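For step 2, a minimal guest-side sketch: it keeps only the kernel log lines that mention an I/O error. The function name io_errors is hypothetical, and matching on the text "I/O error" is an assumption; the exact kernel message wording can differ between kernel versions.

```shell
#!/bin/sh
# Run inside the guest. Hypothetical helper: read kernel log text on stdin
# and print only the lines mentioning an I/O error (case-insensitive).
io_errors() {
    grep -i 'i/o error'
}

# Example usage in the guest (-k: kernel messages only, -b: current boot):
#   journalctl -k -b --no-pager | io_errors
```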
You must hibernate, not suspend. I think this is because suspending does not terminate the kvm process and therefore does not close the file handle that still points at the disconnected drive.
Using ZFS in the guest kept me covered: I had no data loss except for transfers in progress during the disconnection, and the integrity of the guest device was preserved.
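The host-side part of the procedure (steps 3 to 5) could be scripted roughly as below. This is a sketch assuming VM ID 400 and the first USB disk from the qm set example. qm suspend with --todisk 1 is the CLI form of hibernate, and a VM hibernated that way is resumed by the next qm start. The run and recover_vm helpers are hypothetical, and by default the script only prints the qm commands (DRY_RUN=1). If the drive only comes back after a host reboot, run qm start manually after the reboot instead.

```shell
#!/bin/sh
# Host-side sketch of steps 3-5, assuming VM 400 and the first USB disk above.
# DRY_RUN=1 (default) only prints the qm commands; use DRY_RUN=0 on the PVE host.
VMID=${VMID:-400}
DISK=${DISK:-/dev/disk/by-id/usb-Go-Infin_ity_1F111111835E-0:0}
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper wrapping the three host-side steps.
recover_vm() {
    # Step 3: hibernate. RAM state is written to disk and the kvm process
    # exits, presumably releasing the handle on the disconnected device.
    run qm suspend "$VMID" --todisk 1 || return 1

    # Step 4: wait until the host sees the drive again (reconnect it now).
    if [ "$DRY_RUN" != "1" ]; then
        while [ ! -e "$DISK" ]; do
            echo "waiting for $DISK ..." >&2
            sleep 5
        done
    fi

    # Step 5: starting a hibernated VM restores the saved state.
    run qm start "$VMID"
}

recover_vm
```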

Follow at your own risk. I did this with disposable drives and data.
 