Random IO Error - Windows Server 2025

dcuadrados · Nov 12, 2025

Hello everyone,

I'm experiencing a random "IO Error" that causes my two Windows Server 2025 Data Center VMs to randomly halt (yellow triangle in Proxmox). A reset/reboot resolves the issue temporarily.

My environment details are below. I suspect a potential conflict with my configuration, possibly related to I/O or the high RAM usage.

Node and Storage

Node: Proxmox VE 9.0.11 on Linux 6.14 kernel.
CPU: Intel Xeon E-2288G (16C).
RAM Usage: High (approx. 89% of 31 GiB).
Storage: ZFS pool built on two 960 GB Samsung NVMe SSDs (S.M.A.R.T. OK, low wearout).
Repo Status: Non production-ready repository enabled.

Windows VM Configuration

Both Windows Server 2025 VMs use the following critical settings:

Setting Value
SCSI Controller VirtIO SCSI single
Disk Image RAW format on ZFS
I/O Settings aio=io_uring, cache=writeback, discard=on, iothread=1, ssd=1
Memory 13 GiB and 6 GiB respectively
Processors Host CPU type
BIOS OVMF (UEFI)

Has anyone encountered this specific IO Error with this configuration (especially VirtIO/ZFS/IO_URING) on recent Proxmox versions?

My apologies, I accidentally posted this twice

fiona · Nov 13, 2025

Hi,
please check the host system logs/journal for any messages around the time of the issue. What does zpool status -v say?

What does the following output when the VM is in IO error state, replacing 123 with the actual ID:

Code:

echo '{"execute": "qmp_capabilities"}{"execute": "query-block"}' | socat - /run/qemu-server/123.qmp

?

dcuadrados · Nov 13, 2025

fiona said:
Hi,
please check the host system logs/journal for any messages around the time of the issue. What does zpool status -v say?

What does the following output when the VM is in IO error state, replacing 123 with the actual ID:

Code:

echo '{"execute": "qmp_capabilities"}{"execute": "query-block"}' | socat - /run/qemu-server/123.qmp

?

Right now there’s no IO Error; if it happens again, I’ll do it again. Anyway, I’ve already downgraded the driver to VirtIO version 0.1.271.

Code:

{"QMP": {"version": {"qemu": {"micro": 2, "minor": 1, "major": 10}, "package": "pve-qemu-kvm_10.1.2-1"}, "capabilities": []}}
{"return": {}}
{"return": [{"device": "", "locked": false, "removable": false, "inserted": {"iops_rd": 0, "detect_zeroes": "off", "active": true, "image": {"virtual-size": 3653632, "filename": "/usr/share/pve-edk2-firmware//OVMF_CODE_4M.secboot.fd", "format": "raw", "actual-size": 3653632, "dirty-flag": false}, "iops_wr": 0, "ro": true, "children": [{"node-name": "#block013", "child": "file"}], "node-name": "pflash0", "backing_file_depth": 0, "drv": "raw", "iops": 0, "bps_wr": 0, "write_threshold": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "cache": {"no-flush": false, "direct": false, "writeback": true}, "file": "/usr/share/pve-edk2-firmware//OVMF_CODE_4M.secboot.fd"}, "qdev": "/machine/system.flash0", "type": "unknown"}, {"device": "", "locked": false, "removable": false, "inserted": {"iops_rd": 0, "detect_zeroes": "on", "active": true, "image": {"backing-image": {"virtual-size": 540672, "filename": "json:{\"driver\": \"raw\", \"size\": 540672, \"file\": {\"driver\": \"file\", \"filename\": \"/var/lib/vz/images/100102/vm-100102-disk-0.raw\"}}", "format": "raw", "actual-size": 664064, "dirty-flag": false}, "virtual-size": 540672, "filename": "json:{\"throttle-group\": \"throttle-drive-efidisk0\", \"driver\": \"throttle\", \"file\": {\"driver\": \"raw\", \"size\": 540672, \"file\": {\"driver\": \"file\", \"filename\": \"/var/lib/vz/images/100102/vm-100102-disk-0.raw\"}}}", "format": "throttle", "actual-size": 664064, "dirty-flag": false}, "iops_wr": 0, "ro": false, "children": [{"node-name": "f41fd0da37cb0538e56f7d2c231d098", "child": "file"}], "node-name": "drive-efidisk0", "backing_file_depth": 1, "drv": "throttle", "iops": 0, "bps_wr": 0, "write_threshold": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "cache": {"no-flush": false, "direct": false, "writeback": true}, "file": "json:{\"throttle-group\": \"throttle-drive-efidisk0\", \"driver\": \"throttle\", \"file\": {\"driver\": \"raw\", \"size\": 540672, \"file\": {\"driver\": \"file\", \"filename\": \"/var/lib/vz/images/100102/vm-100102-disk-0.raw\"}}}"}, "qdev": "/machine/system.flash1", "type": "unknown"}, {"io-status": "ok", "device": "", "locked": false, "removable": true, "qdev": "ide2", "tray_open": true, "type": "unknown"}, {"io-status": "ok", "device": "", "locked": false, "removable": false, "inserted": {"iops_rd": 0, "detect_zeroes": "unmap", "active": true, "image": {"backing-image": {"virtual-size": 161061273600, "filename": "/var/lib/vz/images/100102/vm-100102-disk-1.raw", "format": "raw", "actual-size": 54190236160, "dirty-flag": false}, "virtual-size": 161061273600, "filename": "json:{\"throttle-group\": \"throttle-drive-scsi0\", \"driver\": \"throttle\", \"file\": {\"driver\": \"raw\", \"file\": {\"driver\": \"file\", \"filename\": \"/var/lib/vz/images/100102/vm-100102-disk-1.raw\"}}}", "format": "throttle", "actual-size": 54190236160, "dirty-flag": false}, "iops_wr": 0, "ro": false, "children": [{"node-name": "f289c781e188435fd6f4ab27d19d84b", "child": "file"}], "node-name": "drive-scsi0", "backing_file_depth": 1, "drv": "throttle", "iops": 0, "bps_wr": 0, "write_threshold": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "cache": {"no-flush": false, "direct": false, "writeback": false}, "file": "json:{\"throttle-group\": \"throttle-drive-scsi0\", \"driver\": \"throttle\", \"file\": {\"driver\": \"raw\", \"file\": {\"driver\": \"file\", \"filename\": \"/var/lib/vz/images/100102/vm-100102-disk-1.raw\"}}}"}, "qdev": "scsi0", "type": "unknown"}]}

Taz-Matt · Nov 28, 2025

Hello @dcuadrados ! Since I had upgraded to PVE 9 and at the same time enabled ZFS on my new hosts, I had random io-error on 2 or 3 VMs out of about 20 (all running linux but it happened both on RedHat-based and Debian-based distros). I was able to reproduce the issue at will on one of the VMs with a big file transfer in an application that was cached in RAM and when RAM hit 100%, it failed with io-error each time.

Found out that balooning was "off" on all my VMs. I enabled it and issue was resolved. Not sure if it is related or not to what is happening to your Windows VM but that is my experience of the last week or so and wanted to share in case it helps you or if @fiona can find anything from this information.

Also, maybe it did not happen when I was running PVE 8 because I was not using ZFS on my old nodes and now it is active on the new one so the RAM is fully used at all times for caching.

If it doesn't help, please disregard! Have a nice day!

Whatever · Nov 28, 2025

Check this Post in thread 'Redhat VirtIO developers would like to coordinate with Proxmox devs re: "[vioscsi] Reset to device ... system unresponsive"'
https://forum.proxmox.com/threads/r...device-system-unresponsive.139160/post-814067

Taz-Matt · Nov 30, 2025

Taz-Matt said:
Hello @dcuadrados ! Since I had upgraded to PVE 9 and at the same time enabled ZFS on my new hosts, I had random io-error on 2 or 3 VMs out of about 20 (all running linux but it happened both on RedHat-based and Debian-based distros). I was able to reproduce the issue at will on one of the VMs with a big file transfer in an application that was cached in RAM and when RAM hit 100%, it failed with io-error each time.

Found out that balooning was "off" on all my VMs. I enabled it and issue was resolved. Not sure if it is related or not to what is happening to your Windows VM but that is my experience of the last week or so and wanted to share in case it helps you or if @fiona can find anything from this information.

Also, maybe it did not happen when I was running PVE 8 because I was not using ZFS on my old nodes and now it is active on the new one so the RAM is fully used at all times for caching.

If it doesn't help, please disregard! Have a nice day!

Just want to add that I finally had another incident so it is not fully related to balooning and/or PVE9. This VM had about 75% RAM used also. Just unable to recreate the same scenario so it happens for other reasons. Will continue to monitor and let you know if I figure it out.

fiona · Dec 1, 2025

Hi,

Taz-Matt said:
Since I had upgraded to PVE 9 and at the same time enabled ZFS on my new hosts, I had random io-error on 2 or 3 VMs out of about 20 (all running linux but it happened both on RedHat-based and Debian-based distros). I was able to reproduce the issue at will on one of the VMs with a big file transfer in an application that was cached in RAM and when RAM hit 100%, it failed with io-error each time.

please share the information I asked for in my earlier response in this thread:

fiona said:
Hi,
please check the host system logs/journal for any messages around the time of the issue. What does zpool status -v say?

What does the following output when the VM is in IO error state, replacing 123 with the actual ID:

Code:

echo '{"execute": "qmp_capabilities"}{"execute": "query-block"}' | socat - /run/qemu-server/123.qmp

?

as well as the output of qm config 123 (again replacing 123 with the actual ID).

Taz-Matt · Dec 1, 2025

fiona said:
Hi,

please share the information I asked for in my earlier response in this thread:

as well as the output of qm config 123 (again replacing 123 with the actual ID).

Thanks for the follow-up. It seems to happen less often since I enabled balooning. But for sure I will let you know when it happens again.

Taz-Matt · Dec 2, 2025

@fiona here it is:

Code:

# zpool status -v
  pool: data
 state: ONLINE
config:

    NAME                                               STATE     READ WRITE CKSUM
    data                                               ONLINE       0     0     0
      nvme-eui.3634473052b019680025384500000003-part5  ONLINE       0     0     0
      nvme-eui.3634473052b019770025384500000003-part5  ONLINE       0     0     0

errors: No known data errors

-----

# echo '{"execute": "qmp_capabilities"}{"execute": "query-block"}' | socat - /run/qemu-server/110.qmp
{"QMP": {"version": {"qemu": {"micro": 2, "minor": 1, "major": 10}, "package": "pve-qemu-kvm_10.1.2-4"}, "capabilities": []}}
{"return": {}}
{"return": [{"io-status": "ok", "device": "", "locked": false, "removable": true, "qdev": "ide2", "tray_open": false, "type": "unknown"}, {"io-status": "nospace", "device": "", "locked": false, "removable": false, "inserted": {"iops_rd": 0, "detect_zeroes": "on", "active": true, "image": {"backing-image": {"virtual-size": 34359738368, "filename": "/var/lib/vz/images/110/vm-110-disk-0.qcow2", "cluster-size": 65536, "format": "qcow2", "actual-size": 12256150016, "format-specific": {"type": "qcow2", "data": {"compat": "1.1", "compression-type": "zlib", "lazy-refcounts": false, "refcount-bits": 16, "corrupt": false, "extended-l2": false}}, "dirty-flag": false}, "virtual-size": 34359738368, "filename": "json:{\"throttle-group\": \"throttle-drive-scsi0\", \"driver\": \"throttle\", \"file\": {\"driver\": \"qcow2\", \"file\": {\"driver\": \"file\", \"filename\": \"/var/lib/vz/images/110/vm-110-disk-0.qcow2\"}}}", "cluster-size": 65536, "format": "throttle", "actual-size": 12256150016, "dirty-flag": false}, "iops_wr": 0, "ro": false, "children": [{"node-name": "f537c6f0445eec4b8ee057a90cdb513", "child": "file"}], "node-name": "drive-scsi0", "backing_file_depth": 1, "drv": "throttle", "iops": 0, "bps_wr": 0, "write_threshold": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "cache": {"no-flush": false, "direct": false, "writeback": true}, "file": "json:{\"throttle-group\": \"throttle-drive-scsi0\", \"driver\": \"throttle\", \"file\": {\"driver\": \"qcow2\", \"file\": {\"driver\": \"file\", \"filename\": \"/var/lib/vz/images/110/vm-110-disk-0.qcow2\"}}}"}, "qdev": "scsi0", "type": "unknown"}]}

-----

# qm config 110
agent: 1
boot: order=scsi0;ide2;net0
cores: 2
cpu: host
ide2: none,media=cdrom
memory: 1024
meta: creation-qemu=8.1.2,ctime=1706497241
name: ***somevm***
net0: virtio=02:00:00:xx:xx:xx,bridge=vmbr0,firewall=1
net1: virtio=BC:24:11:xx:xx:xx,bridge=vmbr1,firewall=1
numa: 0
onboot: 1
ostype: l26
scsi0: local:110/vm-110-disk-0.qcow2,format=qcow2,iothread=1,size=32G
scsihw: virtio-scsi-single
smbios1: uuid=e49a324e-57f8-4d96-a11b-b27421a6cbc8
sockets: 1
startup: order=1,up=10
vmgenid: 8b1afcbb-c890-4faf-84d4-b36e8d6103b9

-----

# Logs from /var/log/messages at the moment of the crash, followed by the reboot 14 minutes later (time to detect the downtime, login, take evidences and manually stop and start the VM)

Dec  1 22:30:05 gw1 systemd[1]: Starting dnf makecache...
Dec  1 22:30:06 gw1 dnf[7394]: AlmaLinux 9 - AppStream                          44 kB/s | 4.2 kB     00:00
Dec  1 22:30:06 gw1 dnf[7394]: AlmaLinux 9 - BaseOS                             48 kB/s | 3.8 kB     00:00
Dec  1 22:30:06 gw1 dnf[7394]: AlmaLinux 9 - Extras                             22 kB/s | 3.8 kB     00:00
Dec  1 22:30:06 gw1 dnf[7394]: Elastic repository for 9.x packages              53 kB/s | 1.7 kB     00:00
Dec  1 22:30:06 gw1 dnf[7394]: Extra Packages for Enterprise Linux 9 - x86_64   45 kB/s | 5.9 kB     00:00
Dec  1 22:30:07 gw1 dnf[7394]: Extra Packages for Enterprise Linux 9 - x86_64   48 MB/s |  20 MB     00:00
Dec  1 22:44:03 gw1 kernel: The list of certified hardware and cloud instances for Red Hat Enterprise Linux 9 can be viewed at the Red Hat Ecosystem Catalog, https://catalog.redhat.com.

I barely see any activity on the resources of the VM (CPU, RAM, Network, Disk). Everything is very low and far from any limits. Same VM as last time, this VM was migrated from a PVE 8 node to a PVE 9 node and I never saw those IO errors with any of the migrated VMs (including this one) on PVE 8. Exact same hardware was used on the baremetal server running PVE 8 and PVE 9, if that is of any help.

Thanks a lot!

fiona · Dec 2, 2025

Code:

"io-status": "nospace"

Are you sure your local storage has enough free space?

What do the following say?

Code:

pvesm status
df -h
qemu-img info --output=json /var/lib/vz/images/110/vm-110-disk-0.qcow2

Taz-Matt · Dec 2, 2025

Hello @fiona . Thanks for you answer. Disk space is not even close to be used on that node:

Code:

# pvesm status
Name                       Type     Status     Total (KiB)      Used (KiB) Available (KiB)        %
local                       dir     active      3576569984      1040519040      2536050944   29.09%
[...]

(trimmed the rest of the results as they are remote storages that are not used by this VM like pbs, iscsi... this VM is purely local)

This one looks good too:

Code:

# df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             63G     0   63G   0% /dev
tmpfs            13G  8.1M   13G   1% /run
efivarfs        192K   64K  124K  34% /sys/firmware/efi/efivars
/dev/md3         20G  5.6G   13G  30% /
tmpfs            63G   66M   63G   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           1.0M     0  1.0M   0% /run/credentials/systemd-journald.service
/dev/md2        988M  228M  693M  25% /boot
tmpfs            63G     0   63G   0% /tmp
/dev/md1        511M  176K  511M   1% /boot/efi
data/zd0        3.4T  993G  2.4T  30% /var/lib/vz
/dev/fuse       128M   56K  128M   1% /etc/pve
tmpfs           1.0M     0  1.0M   0% /run/credentials/getty@tty1.service
tmpfs           1.0M     0  1.0M   0% /run/credentials/serial-getty@ttyS1.service
tmpfs            13G  8.0K   13G   1% /run/user/0

And for the details of the vm disk:

Code:

# qemu-img info --output=json /var/lib/vz/images/110/vm-110-disk-0.qcow2
{
    "children": [
        {
            "name": "file",
            "info": {
                "children": [
                ],
                "virtual-size": 34365243392,
                "filename": "/var/lib/vz/images/110/vm-110-disk-0.qcow2",
                "format": "file",
                "actual-size": 12256150016,
                "format-specific": {
                    "type": "file",
                    "data": {
                    }
                },
                "dirty-flag": false
            }
        }
    ],
    "virtual-size": 34359738368,
    "filename": "/var/lib/vz/images/110/vm-110-disk-0.qcow2",
    "cluster-size": 65536,
    "format": "qcow2",
    "actual-size": 12256150016,
    "format-specific": {
        "type": "qcow2",
        "data": {
            "compat": "1.1",
            "compression-type": "zlib",
            "lazy-refcounts": false,
            "refcount-bits": 16,
            "corrupt": false,
            "extended-l2": false
        }
    },
    "dirty-flag": false
}

Thanks again, really appreciate it. Like I said, never happened in the past year and more, with the same VMs, on PVE 8, same hardware (even also different hardware a year ago). It only happened since I moved to PVE 9 and am using local storage over ZFS (instead of local storage without ZFS) about 2 weeks ago.

Ballooning seems to help, but not completely remove the behavior. I am talking about memory because after first searching for disks like you are asking me right now, I asked an AI and after giving some details, it went to memory allocation because memory does have high usage because of ZFS ARC (screenshot). Note that the last issue happened at 21:30 in this graph so nothing shows signs of major changes at that time and there is plenty of RAM left for a VM configured at 1GB RAM and using about 600MB (the VM is basically a linux router).

fiona · Dec 3, 2025

What does qemu-img check /var/lib/vz/images/110/vm-110-disk-0.qcow2 say (while the VM is shut down)? Could you also check your system journal for messages around the time of the issue, i.e. with the journalctl command.

In general, using qcow2 on top of ZFS is not recommended, because it means having duplicate copy-on-write operations. But of course, there's still a bug here, because that should only hurt performance and not lead to issues like this.

Taz-Matt · Dec 4, 2025

Hello @fiona , thanks again for the follow-up. Here is the result of the command with the VM stopped (note, exact same result while running):

Code:

# qemu-img check /var/lib/vz/images/110/vm-110-disk-0.qcow2
No errors were found on the image.
524288/524288 = 100.00% allocated, 0.00% fragmented, 0.00% compressed clusters
Image end offset: 34365243392

Also thanks for letting me know about the qcow2 over ZFS. I did not know about that detail and it was probably the right type on my previous setup without ZFS. What do you recommend? Raw?

Thanks!
Mathieu

Taz-Matt · Dec 4, 2025

Note: I have already converted this specific VM (110) to RAW. I will convert all other QCOW2 running on ZFS to RAW.

fiona · Dec 4, 2025

If you have ZFS, the recommended way is to use ZFS as the storage type, not a Directory type storage on top of ZFS. You then get ZFS volumes as virtual block devices, not just files and have all the advantages ZFS has to offer, like snapshots:
https://pve.proxmox.com/pve-docs/chapter-pvesm.html#_storage_types

Taz-Matt · Dec 4, 2025

Thanks for the info @fiona I will look into that.

dcuadrados · Dec 9, 2025

After downgrading to driver version 0.1.271, my Windows Server 2025 Datacenter VMs have been completely stable. Previously, both machines were crashing at least 1-2 times per week. Since the downgrade, they've been running for over a month without any errors and performance has significantly improved.

Taz-Matt · Dec 9, 2025

This is good to know @dcuadrados . Thanks for the info!

On my end, I still had issues in spite of all the changes I did, they were just less frequent. I decided to start moving VMs around my cluster to reinstall the hosts properly. I decided to use ext4 and keep the raw disk images. Usually, an issue happened at least once every two days. If nothing happens until the weekend, I will also consider it as resolved and never use a mix of ZFS and RAW or QCOW2 images. Either I will go fully into ZFS or stay over EXT4.

Thanks @fiona for all the info and tests, it made me learn a few things that will be helpful in the future.

03984 · Dec 10, 2025

Hello
I can confirm this issue, randomly on 5 servers in OVH, two was intels and three amds, now we have three amds. guest are debian from 11 to 13. one big python app and three mysql and haproxy.

Taz-Matt said:
never happened in the past year and more, with the same VMs, on PVE 8, same hardware (even also different hardware a year ago). It only happened since I moved to PVE 9 and am using local storage over ZFS (instead of local storage without ZFS) about 2 weeks ago.

Same case, but we migrate vms to exact hardware configuration but different host.

fiona · Dec 11, 2025

Hi,

03984 said:
I can confirm this issue, randomly on 5 servers in OVH, two was intels and three amds, now we have three amds. guest are debian from 11 to 13. one big python app and three mysql and haproxy.

please share more details about the storage configuration for the VM disks. Do you also have qcow2 on top of ZFS or something else?

What does the following output when the VM is in IO error state, replacing 123 with the actual ID both times:

Code:

qm config 123
echo '{"execute": "qmp_capabilities"}{"execute": "query-block"}' | socat - /run/qemu-server/123.qmp

Do you also see an "io-status": "nospace" there?

What does pvesm status say?

Random IO Error - Windows Server 2025

New Member

Proxmox Staff Member

New Member

Member

Renowned Member

Member

Proxmox Staff Member

Member

Member

Proxmox Staff Member

Member

Attachments

Proxmox Staff Member

Member

Member

Proxmox Staff Member

Member

New Member

Attachments

Member

New Member

Proxmox Staff Member

We value your privacy