Hello all,
Having a problem with the data drive on a Windows Server 2019 VM. It works perfectly fine for anywhere from 30 minutes up to 4 hours, then I start getting vioscsi warnings, and eventually the drive locks up completely, requiring a reboot to come back online.
The system is a Supermicro board with an integrated LSI3008 (used for the HDDs), which has been flashed to IT firmware. It also has an onboard C622 controller running in AHCI mode for an SSD array.
The VM in question has 2 disks: the boot disk is on a ZFS RAID10 pool (SSDs), and the data disk is on a ZFS RAID10 pool (HDDs).
The disk on the SSD array does not seem to have any issues, whereas all the issues are with the large disk on the HDD array.
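In case the HDDs or the LSI3008 themselves are suspect, this is roughly what I plan to check on the host the next time it locks up (a sketch only; the /dev/sda device name is just an example, and I'm assuming the stock mpt3sas driver):
Code:
# kernel messages around the time of the lockup (mpt3sas drives the LSI3008)
journalctl -k --since "-6h" | grep -iE 'mpt3sas|blk_update_request|i/o error|timeout'
# pool state and any read/write/checksum errors on the HDD vdevs
zpool status -v RECpool-zfs
# SMART health of one of the HDDs behind the HBA (device name is an example)
smartctl -a /dev/sda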
The initial entry in event viewer is always:
Code:
Reset to device, \Device\RaidPort1, was issued.
If I leave the system alone, Windows Event Viewer also logs a couple of other entries:
Code:
The IO operation at logical block address 0x8e4fc7618 for Disk 1 (PDO name: \Device\0000002d) was retried.
The system failed to flush data to the transaction log. Corruption may occur in VolumeId: D:, DeviceName: \Device\HarddiskVolume6.
(The I/O device reported an I/O error.)
The problem was really bad with the latest virtio drivers, so I downgraded to version 204, as per: https://github.com/virtio-win/kvm-guest-drivers-windows/issues/623
After the downgrade, the problem did not immediately return within the 30-45 minute window, but it did come back after 2 hours.
Hopefully this is useful data:
Code:
root@pve:~# zfs list
NAME                        USED  AVAIL  REFER  MOUNTPOINT
RECpool-zfs                29.6T  19.4T    96K  /RECpool-zfs
RECpool-zfs/vm-101-disk-0  29.6T  19.4T  29.6T  -
Tnvme-zfs                  1.32M   231G    96K  /Tnvme-zfs
rpool                      60.4G   397G   104K  /rpool
rpool/ROOT                 6.99G   397G    96K  /rpool/ROOT
rpool/ROOT/pve-1           6.99G   397G  6.99G  /
rpool/data                 53.4G   397G    96K  /rpool/data
rpool/data/vm-101-disk-0    152K   397G   152K  -
rpool/data/vm-101-disk-1   53.4G   397G  53.4G  -
root@pve:~# zpool list
NAME          SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
RECpool-zfs  49.1T  29.6T  19.5T        -         -    0%  60%  1.00x  ONLINE  -
Tnvme-zfs     238G  1.32M   238G        -         -    0%   0%  1.00x  ONLINE  -
rpool         472G  60.5G   412G        -         -    7%  12%  1.00x  ONLINE  -
root@pve:~# df -h
Filesystem        Size  Used  Avail  Use%  Mounted on
udev               32G     0    32G    0%  /dev
tmpfs             6.3G  1.6M   6.3G    1%  /run
rpool/ROOT/pve-1  404G  7.0G   397G    2%  /
tmpfs              32G   49M    32G    1%  /dev/shm
tmpfs             5.0M     0   5.0M    0%  /run/lock
Tnvme-zfs         231G  128K   231G    1%  /Tnvme-zfs
rpool             397G  128K   397G    1%  /rpool
rpool/ROOT        397G  128K   397G    1%  /rpool/ROOT
rpool/data        397G  128K   397G    1%  /rpool/data
RECpool-zfs        20T  128K    20T    1%  /RECpool-zfs
/dev/fuse         128M   16K   128M    1%  /etc/pve
tmpfs             6.3G     0   6.3G    0%  /run/user/0
root@pve:~# qm show 101
/usr/bin/kvm -id 101 -name 'ACS,debug-threads=on' -no-shutdown -chardev 'socket,id=qmp,path=/var/run/qemu-server/101.qmp,server=on,wait=off' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/101.pid -daemonize -smbios 'type=1,uuid=c94c05e9-315b-427f-8913-b8d29c474d70' -drive 'if=pflash,unit=0,format=raw,readonly=on,file=/usr/share/pve-edk2-firmware//OVMF_CODE_4M.fd' -drive 'if=pflash,unit=1,id=drive-efidisk0,format=raw,file=/dev/zvol/rpool/data/vm-101-disk-0,size=540672' -smp '16,sockets=1,cores=16,maxcpus=16' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vnc 'unix:/var/run/qemu-server/101.vnc,password=on' -no-hpet -cpu 'kvm64,+aes,enforce,hv_ipi,hv_relaxed,hv_reset,hv_runtime,hv_spinlocks=0x1fff,hv_stimer,hv_synic,hv_time,hv_vapic,hv_vpindex,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep' -m 16384 -object 'iothread,id=iothread-virtioscsi0' -object 'iothread,id=iothread-virtioscsi1' -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'pci-bridge,id=pci.3,chassis_nr=3,bus=pci.0,addr=0x5' -device 'vmgenid,guid=42e96f7f-d469-4ac4-a392-662c8ff902fb' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'vfio-pci,host=0000:67:00.0,id=hostpci0,bus=pci.0,addr=0x10' -device 'VGA,id=vga,bus=pci.0,addr=0x2' -chardev 'socket,path=/var/run/qemu-server/101.qga,server=on,wait=off,id=qga0' -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:ef828b78221' -device 'virtio-scsi-pci,id=virtioscsi0,bus=pci.3,addr=0x1,iothread=iothread-virtioscsi0' -drive 'file=/dev/zvol/rpool/data/vm-101-disk-1,if=none,id=drive-scsi0,discard=on,format=raw,cache=none,aio=io_uring,detect-zeroes=unmap' -device 'scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,rotation_rate=1,bootindex=100' -device 'virtio-scsi-pci,id=virtioscsi1,bus=pci.3,addr=0x2,iothread=iothread-virtioscsi1' -drive 'file=/dev/zvol/RECpool-zfs/vm-101-disk-0,if=none,id=drive-scsi1,format=raw,cache=none,aio=io_uring,detect-zeroes=on' -device 'scsi-hd,bus=virtioscsi1.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi1,id=scsi1' -netdev 'type=tap,id=net0,ifname=tap101i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=B2:B9:CB:0F:BF:9E,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=1024,bootindex=101' -rtc 'driftfix=slew,base=localtime' -machine 'type=pc-i440fx-7.1+pve0' -global 'kvm-pit.lost_tick_policy=discard'
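For readability, the relevant disk lines in the VM config should look roughly like this. This is reconstructed from the kvm command line above, so the storage IDs (local-zfs in particular) are my assumption:
Code:
# /etc/pve/qemu-server/101.conf (relevant lines only; storage IDs assumed)
scsihw: virtio-scsi-single
scsi0: local-zfs:vm-101-disk-1,discard=on,iothread=1,ssd=1
scsi1: RECpool-zfs:vm-101-disk-0,iothread=1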
Package Versions:
Code:
proxmox-ve: 7.3-1 (running kernel: 5.15.83-1-pve)
pve-manager: 7.3-3 (running version: 7.3-3/c3928077)
pve-kernel-helper: 7.3-2
pve-kernel-5.15: 7.3-1
pve-kernel-5.15.83-1-pve: 5.15.83-1
pve-kernel-5.15.74-1-pve: 5.15.74-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-1
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-5
libpve-storage-perl: 7.3-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.3.1-1
proxmox-backup-file-restore: 2.3.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.3
pve-cluster: 7.3-1
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.6-2
pve-ha-manager: 3.5.1
pve-i18n: 2.8-1
pve-qemu-kvm: 7.1.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-2
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.7-pve3
I may try version 215 of the virtio drivers, as suggested here: https://github.com/virtio-win/kvm-guest-drivers-windows/issues/756
Interestingly (or not), "df -h" shows a different capacity for RECpool-zfs than "zfs list" or "zpool list" do, whereas the capacities for rpool match.
I admit I'm not well-versed enough to know whether this means anything, but it did make me wonder.
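If I understand the accounting correctly, df only sees the mounted dataset's own used + available space, while the ~29.6T consumed by the vm-101-disk-0 zvol only shows up in zfs list / zpool list, so a mismatch there may be expected. These are the read-only commands I'd compare to check that (just different views of the same pool, in exact bytes):
Code:
# pool-level vs. dataset-level vs. zvol-level space accounting
zpool get -p size,allocated,free RECpool-zfs
zfs get -p used,available,referenced RECpool-zfs RECpool-zfs/vm-101-disk-0
# what df actually reports for the mounted dataset
df -B1 /RECpool-zfs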
I've seen a couple of posts about downgrading the Proxmox kernel, but I'm not sure how that would ultimately help.
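For reference, this is how I understand booting back into the already-installed 5.15.74-1-pve kernel would work, assuming proxmox-boot-tool on this version supports pinning:
Code:
# list installed kernels, then pin the older one so it is used on the next boot
proxmox-boot-tool kernel list
proxmox-boot-tool kernel pin 5.15.74-1-pve
reboot
# later, to return to the newest installed kernel
proxmox-boot-tool kernel unpin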
This is quite frustrating. I'm really at a loss and not sure where to go from here.
Thank you for any and all help.