Pause virtual machine on I/O error

I think such a great and hidden feature should be enabled by default (if it's safe), with the possibility to disable it per VM. I don't see any need to be able to disable it per disk.
 
Patch for QemuServer.pm
View attachment QemuServer.pm.diff.txt

Patch for qm executable
View attachment qm.diff.txt
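
For context, what the patches effectively add to the generated kvm command line are the werror/rerror drive options. A hand-written sketch for illustration, not taken from the attached diffs; the file path and memory size are placeholders:
Code:
# werror/rerror are real qemu drive options; werror=stop pauses the guest
# instead of reporting the write error to it
kvm -m 512 \
    -drive file=/dev/iscsivg/vm-147-disk-1,if=virtio,index=0,werror=stop,rerror=stop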

I've tested it and it works: disconnecting my iscsi storage and waiting until I/O errors occur pauses my kvm guest, and resuming it after fixing the storage makes it work again without data loss. I only got warnings about tasks hanging too long, but that's due to iscsi timeouts and I think it can be safely ignored.
Patching pve-manager was actually pretty simple (despite me being unable to read perl); the real problem is the interaction between iscsi, lvm and device-mapper. If I restart open-iscsi, my iscsi disk very often gets renamed from sda to sdb, but device-mapper keeps /dev/dm-<n> mapped to sda's major and minor numbers, so even though my iscsi disk is accessible as /dev/sdb, I can't read any lvm device.

1. I got everything running ok
Code:
scsi 7:0:0:100: Direct-Access     IET      VIRTUAL-DISK     0    PQ: 0 ANSI: 4
sd 7:0:0:100: Attached scsi generic sg0 type 0
sd 7:0:0:100: [sda] Very big device. Trying to use READ CAPACITY(16).
sd 7:0:0:100: [sda] 11213398016 512-byte logical blocks: (5.74 TB/5.22 TiB)

Code:
pve220:~/org# dmsetup deps 
iscsivg-vm--147--disk--1: 1 dependencies        : (8, 0)
iscsivg-vm--132--disk--1: 1 dependencies        : (8, 0)

2. I restart open-iscsi

3. the iscsi disk is renamed from sda to sdb:
Code:
scsi 8:0:0:100: Direct-Access     IET      VIRTUAL-DISK     0    PQ: 0 ANSI: 4
sd 8:0:0:100: Attached scsi generic sg0 type 0
sd 8:0:0:100: [sdb] Very big device. Trying to use READ CAPACITY(16).
sd 8:0:0:100: [sdb] 11213398016 512-byte logical blocks: (5.74 TB/5.22 TiB)

Note that this also happened to me when the running open-iscsi daemon reconnected a session because of some connection problem:
Code:
sd 2:0:0:100: [sdb] Very big device. Trying to use READ CAPACITY(16).
 connection1:0: detected conn error (1020)
iscsi: registered transport (iser)
scsi3 : iSCSI Initiator over TCP/IP
sd 3:0:0:100: [sdc] Very big device. Trying to use READ CAPACITY(16).
sd 3:0:0:100: [sdc] 11213398016 512-byte logical blocks: (5.74 TB/5.22 TiB)

4. device-mapper deps are broken, because sda is 8,0 and sdb is 8,16:
Code:
pve220:~/org# parted /dev/iscsivg/vm-139-disk-1
GNU Parted 1.8.8
Using /dev/mapper/iscsivg-vm--139--disk--1
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print                                                            
Error: /dev/mapper/iscsivg-vm--139--disk--1: unrecognised disk label      
(parted) quit
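
The stale mapping can be confirmed by comparing device numbers; a diagnostic sketch (the device and VG names are from my setup):
Code:
# the new device node is now 8,16 ...
ls -l /dev/sdb
# ... but the dm table still references 8:0 (the old sda)
dmsetup table iscsivg-vm--139--disk--1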

I'm not really sure how to fix this.
 
also
Code:
qm list
shows a kvm guest as running even if it is paused; doesn't it check its status using the kvm monitor?
 
P.S. This is a very simple patch that adds werror and rerror options to a drive (just like cache or snapshot); I will post a more complete version that makes werror=stop the default once I test it all.
 
Regarding the iscsi disk getting renamed from sda to sdb on an iscsi restart or reconnect, I can fix that by hand by doing:

Code:
# stop iscsi
/etc/init.d/open-iscsi stop
# remove all device mapper devices
dmsetup remove_all
# start iscsi
/etc/init.d/open-iscsi start
# rescan lvm using pvesm
pvesm list --all

But this only fixes it after it has happened; I didn't find any way to prevent it from happening.

P.S. It only works if no kvm guests are running, because a running guest will hold the dm device open and I won't be able to remove it using dmsetup.
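
Whether something still holds a dm device open can be checked before attempting removal; a sketch (the "open" column is the reference count, anything above 0 cannot be removed):
Code:
# list every dm device together with its open count
dmsetup info -c --noheadings -o name,open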
 
Using multipath seems to fix the iscsi device renumbering: device-mapper maps to the multipath device, and the multipath daemon can handle device name (and therefore minor/major number) changes. But I'll need to test it more tomorrow.
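
A minimal sketch of the multipath setup I'm testing, for /etc/multipath.conf; the WWID and alias are placeholders, not values from a verified working config:
Code:
multipaths {
    multipath {
        # wwid of the iscsi disk, as reported by e.g. scsi_id
        wwid  <wwid-of-iscsi-disk>
        alias iscsidisk
    }
}

LVM would then sit on top of the stable /dev/mapper/iscsidisk name instead of a /dev/sdX node that can change.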
 
P.S. This is a very simple patch that adds werror and rerror options to a drive (just like cache or snapshot); I will post a more complete version that makes werror=stop the default once I test it all.

Well, I guess there is a reason why this is not the default in kvm?
 
also
Code:
qm list
shows a kvm guest as running even if it is paused; doesn't it check its status using the kvm monitor?

Yes, because I don't want to use the monitor to query the status (it seems clumsy and slow). But I haven't found a better way to do it.
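
For what it's worth, the monitor could be queried non-interactively; a sketch, assuming the monitor is exposed on a unix socket (the socket path below is a guess, not the actual qemu-server layout):
Code:
# "info status" prints e.g. "VM status: paused"
echo "info status" | socat - unix-connect:/var/run/qemu-server/147.mon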
 
