"Reset to device, \Device\RaidPort2, was issued" still occurring with 0.1.271 drivers

bazhuge · New Member · Apr 17, 2025
Hi,
I have a Windows Server 2022 guest running on Proxmox 8.3.5. When the guest OS starts to perform some high storage I/O, data get corrupted and Windows logs the warning "Reset to device, \Device\RaidPort2, was issued." The storage I/O is basically data received from remote hosts, because this is an offsite backup server, so it is "high", but it is not a database server or another application server running heavy local tasks.

I found a lot of threads about this kind of problem with previous drivers, which suggested it should be solved by the new 0.1.271.1 version that I installed yesterday. Still, I have the same problem, and the only way to run the server is to switch every disk to SATA.

Any idea? This effectively makes Proxmox unusable as a platform for Windows guests, which seems really strange IMHO :(

Physical storage consists of a pool of local HDDs and a pool of local SSDs, both using ZFS volumes. VM disks (raw format) have the same problem on both pools.
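In case it is useful, the VM disks are ZFS zvols on the host side; something like this (run on the Proxmox host) should list them together with their volume and block sizes:
Bash:
# list the zvols (raw VM disks) with their volume size and block size
zfs list -t volume -o name,volsize,volblocksize,used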

thanks
 
Hi, as @_gabriel mentions, some virtio-win versions were affected by an issue that resulted in "Reset to device, [...] was issued" warnings, but this issue appears to be fixed in 0.1.266. See this thread [1] for more information. There were several reproducers that seemed to trigger the issue quite reliably under affected versions (<0.1.266).

I just tried one such reproducer [2] with virtio-win 0.1.271 and didn't encounter any issues or warnings. Can you double-check that you have 0.1.271 (or, alternatively, 0.1.266) installed? Please post the version displayed in the Device Manager -> Storage Controllers -> Red Hat VirtIO SCSI pass-through controller under Details -> Driver version.

[1] https://forum.proxmox.com/threads/r...device-system-unresponsive.139160/post-726762
[2] https://github.com/virtio-win/kvm-guest-drivers-windows/issues/756#issuecomment-2012089151
 
OK, the 27100 indicates that it is indeed using the latest driver.
Some other versions to try might be 0.1.266 or 0.1.208 (that one didn't seem to be affected by the issue I mentioned).
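If you want to test one of those older versions, the archived ISOs are published upstream; something along these lines should fetch one onto local storage (the exact directory and file names under archive-virtio may differ, so please check the index first):
Code:
# example only: download an archived virtio-win ISO into the default ISO storage
# verify the exact path in the archive-virtio index before running
cd /var/lib/vz/template/iso
wget https://fedorapeople.org/groups/virt/virtio-win/direct-downloads/archive-virtio/virtio-win-0.1.266-1/virtio-win-0.1.266.iso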

The device resets may also be caused by performance issues of the underlying storage. If I understand correctly, you see this in two different ZFS pools? Can you post the output of
Code:
pveversion -v
zpool status
and also the pressure metrics [1] while the VM is running on VirtIO SCSI and the device resets start appearing?
Code:
grep -r '' /proc/pressure/

[1] https://facebookmicrosites.github.io/psi/docs/overview
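If catching the exact moment is difficult, the per-vdev latency of the pools can also be watched live on the host while the backups are running, for example:
Code:
# per-vdev I/O statistics including latency columns, refreshed every 5 seconds
zpool iostat -vl 5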
 
Bash:
pveversion -v
proxmox-ve: 8.4.0 (running kernel: 6.8.12-5-pve)
pve-manager: 8.4.1 (running version: 8.4.1/2a5fa54a8503f96d)
proxmox-kernel-helper: 8.1.1
proxmox-kernel-6.8: 6.8.12-9
proxmox-kernel-6.8.12-9-pve-signed: 6.8.12-9
proxmox-kernel-6.8.12-8-pve-signed: 6.8.12-8
proxmox-kernel-6.8.12-5-pve-signed: 6.8.12-5
proxmox-kernel-6.8.12-4-pve-signed: 6.8.12-4
ceph-fuse: 17.2.7-pve3
corosync: 3.1.9-pve1
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.30-pve2
libproxmox-acme-perl: 1.6.0
libproxmox-backup-qemu0: 1.5.1
libproxmox-rs-perl: 0.3.5
libpve-access-control: 8.2.2
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.1.0
libpve-cluster-perl: 8.1.0
libpve-common-perl: 8.3.1
libpve-guest-common-perl: 5.2.2
libpve-http-server-perl: 5.2.2
libpve-network-perl: 0.11.2
libpve-rs-perl: 0.9.4
libpve-storage-perl: 8.3.6
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.6.0-2
proxmox-backup-client: 3.4.0-1
proxmox-backup-file-restore: 3.4.0-1
proxmox-firewall: 0.7.1
proxmox-kernel-helper: 8.1.1
proxmox-mail-forward: 0.3.2
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.10
pve-cluster: 8.1.0
pve-container: 5.2.6
pve-docs: 8.4.0
pve-edk2-firmware: 4.2025.02-3
pve-esxi-import-tools: 0.7.3
pve-firewall: 5.1.1
pve-firmware: 3.15-3
pve-ha-manager: 4.0.7
pve-i18n: 3.4.2
pve-qemu-kvm: 9.2.0-5
pve-xtermjs: 5.5.0-2
qemu-server: 8.3.12
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.7-pve2



root@vs03:~# zpool status
  pool: local-hdd
 state: ONLINE
  scan: scrub repaired 0B in 16:25:33 with 0 errors on Sun Apr 13 16:49:34 2025
config:

        NAME                                   STATE     READ WRITE CKSUM
        local-hdd                              ONLINE       0     0     0
          raidz2-0                             ONLINE       0     0     0
            ata-ST22000NM002E-3HL113_ZX2898GD  ONLINE       0     0     0
            ata-ST22000NM002E-3HL113_ZX2898H4  ONLINE       0     0     0
            ata-ST22000NM002E-3HL113_ZX289QE8  ONLINE       0     0     0
            ata-ST22000NM002E-3HL113_ZX2898HW  ONLINE       0     0     0
            ata-ST22000NM002E-3HL113_ZX289QAK  ONLINE       0     0     0
            ata-ST22000NM002E-3HL113_ZX289QCX  ONLINE       0     0     0
            ata-ST22000NM002E-3HL113_ZX289PVR  ONLINE       0     0     0
            ata-ST22000NM002E-3HL113_ZX2896JH  ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 00:06:10 with 0 errors on Sun Apr 13 00:30:13 2025
config:

        NAME           STATE     READ WRITE CKSUM
        rpool          ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            nvme1n1p3  ONLINE       0     0     0
            nvme0n1p3  ONLINE       0     0     0

Basically I have the OS disk on rpool (SSDs) and the data disk on the local-hdd pool (HDDs).

performing "grep -r '' /proc/pressure/" during the problem is quite hard, because my pressure is during night, when remote hosts are uploading backups data.

@_gabriel, can you help me better understand that point? I have a SCSI controller of type "VirtIO SCSI single", and the disks are attached to that controller via the SCSI bus.
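For reference, this is how the controller type and the disk assignments can be dumped from the VM config on the host (100 is just a placeholder VMID):
Bash:
# show the SCSI controller type and the disk/bus lines of the VM config
qm config 100 | grep -E '^(scsihw|scsi|sata|ide|virtio)[0-9]*:'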

thanks
 
My requirement is not speed. It is to write data without losing it: if the storage is under stress, I expect lower performance, but not data corruption.

  1. rpool is a mirror over two 2 TB NVMe SSDs
  2. local-hdd is a RAIDZ2, with 2 parity disks, over 8 x 22 TB SATA HDDs

Basically I have the Windows OS volume on rpool and the data volume on local-hdd. My performance requirement is simply to write to disk the data I receive from remote hosts. The connection is a single 1 Gb/s NIC, so the incoming flow cannot be too high.
I also have some other Linux guests without problems.

The host has 128 GB of RAM, with 70% free.
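In case the ZFS ARC is relevant here, this is how I can check how much of that RAM the ARC is currently using (values are in bytes):
Bash:
# current ARC size and configured maximum, from the ZFS kstats
awk '$1 == "size" || $1 == "c_max" {print $1, $3}' /proc/spl/kstat/zfs/arcstats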

Proxmox graphs report a maximum server load of 35.

What else can help?