Virtual machine freezes with IO error

uzisuicida · Apr 16, 2026

Hello everyone,

I'm having problems with only one virtual machine. After several days, it displays the error: Status: io-error. I've checked the disk and there's no problem there. It only happens with this machine. After shutting it down and turning it back on, it continues to function normally.

I have no idea what to check.

Thanks!

d.oishi · Apr 16, 2026

Are there any errors in `journalctl` or `dmesg`?
You can gather the VM configuration/status and the storage status with the following commands:

Code:

qm status $VMID --verbose
qm config $VMID
qm showcmd $VMID --pretty
pvesm status

uzisuicida · Apr 16, 2026

Hello d.oishi,

No errors were observed with journalctl or dmesg.
I also don't see any errors with the commands you sent me, unless I don't know how to interpret them; I've attached a file with the output of each command.

Thanks!

cwt · Apr 17, 2026

Your VM is on the local-lvm which is not really suitable for VM storage.

If the error recurs, look for storage-related messages:

Code:

dmesg -T | grep -Ei "error|fail|reset|nvme|sd"

The log indicates slow writes:

Code:

wr_operations: 802447
wr_total_time_ns: 1731275382833

That's around 2 ms per write. And this:

Code:

account_failed: 1
account_invalid: 1

comes directly from the QEMU block-layer and means that there was an IO error.

How did you set up your storage for local-lvm? NVMe? SSD?
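
For reference, the per-write figure above is simply `wr_total_time_ns` divided by `wr_operations`. A minimal sketch of that arithmetic, assuming both counters belong to the same drive in the `qm status $VMID --verbose` output; the helper name `avg_write_ms` is made up for illustration:

```shell
#!/bin/sh
# avg_write_ms OPS TOTAL_NS
# Hypothetical helper: prints the mean time per write in milliseconds.
avg_write_ms() {
    awk -v ops="$1" -v ns="$2" 'BEGIN {
        if (ops <= 0) { print "n/a"; exit }   # avoid dividing by zero
        printf "%.2f\n", ns / ops / 1000000   # ns -> ms
    }'
}

# Values copied from the output quoted above.
avg_write_ms 802447 1731275382833   # prints 2.16
```

This is a mean over the whole period the counters cover, so it can hide short stalls, and a high value on its own does not show that an I/O error occurred.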

Impact · Apr 17, 2026

There does not seem to be a local-lvm. Also why would local-lvm not be suitable? For local I agree.

uzisuicida · Apr 17, 2026

Hello cwt,

Attached is the result of the dmesg grep. It froze again just now, so I turned it off and on again.

Thanks!

cwt · Apr 17, 2026

Yep, typo. Storage is local, not local-lvm.

@uzisuicida: your VM has

Code:

cache.direct=true
no-flush=false

But dmesg shows:

Code:

sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

FUA is Force Unit Access: the write has to reach the storage medium itself, not just the drive's cache, before it is acknowledged.

qcow2 (used by your VM) relies heavily on metadata updates and requires reliable flush operations to keep the filesystem consistent.
If the underlying storage does not support FUA, flush requests may be acknowledged before data is actually written to disk.
This creates a mismatch where the VM assumes data is safely stored, while it may still reside in volatile cache.
Under load or failure conditions, this can lead to data corruption or I/O errors, potentially crashing the VM.

uzisuicida · Apr 17, 2026

Hello cwt,
Do I need to disable the disk cache? Or what should I do in this case? Thank you so much for your help.

Upstairs_Cycle384 · Apr 17, 2026

This creates a mismatch where the VM assumes data is safely stored, while it may still reside in volatile cache. Under load or failure conditions, this can lead to data corruption or I/O errors, potentially crashing the VM.

How would it lead to data corruption or I/O errors under load? When data is requested to be read again, if it's in the cache (and hasn't YET been written to disk), then the cached data would be given back to the requestor.

Where you run into issues is if there's a crash and cached data can't be flushed to disk.

uzisuicida · Apr 17, 2026

cwt said:
Yep, typo. Storage is local, not local-lvm.

@uzisuicida: your VM has

Code:

cache.direct=true
no-flush=false

But dmesg shows:

Code:

sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

FUA is Force Unit Access = write blocks directly on the storage.

qcow2 (used by your VM) relies heavily on metadata updates and requires reliable flush operations to keep the filesystem consistent.
If the underlying storage does not support FUA, flush requests may be acknowledged before data is actually written to disk.
This creates a mismatch where the VM assumes data is safely stored, while it may still reside in volatile cache.
Under load or failure conditions, this can lead to data corruption or I/O errors, potentially crashing the VM.

Hello,

I have other servers with Proxmox, and the same command shows the same thing there, but on those I don't have this problem of a virtual machine freezing and having to be turned off and on.

Thanks.

fiona · Apr 22, 2026

Yes, there was an IO error, but

cwt said:
Code:

account_failed: 1
account_invalid: 1

comes directly from the QEMU block-layer and means that there was an IO error.

this is not what these two values mean:

account_invalid (boolean) – Whether invalid operations are included in the last access statistics (Since 2.5)

account_failed (boolean) – Whether failed operations are included in the latency and last access statistics (Since 2.5)

https://qemu.readthedocs.io/en/mast...f.html#object-QMP-block-core.BlockDeviceStats

cwt · Apr 22, 2026

fiona said:
Yes, there was an IO error, but

this is not what these two values mean:

https://qemu.readthedocs.io/en/mast...f.html#object-QMP-block-core.BlockDeviceStats

Agree.

The IO error is not indicated by account_failed / account_invalid (those are just accounting flags), but by failed_*_operations together with qmpstatus: io-error.
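
A minimal sketch of how those counters could be checked, assuming the `qm status $VMID --verbose` output lists them as `key: value` lines like the excerpts earlier in the thread. The helper name `failed_ops` is made up, and the concrete counter names in the usage comment (such as `failed_wr_operations`) are taken from QEMU's BlockDeviceStats documentation rather than from this thread:

```shell
#!/bin/sh
# failed_ops: hypothetical helper.
# Reads `qm status <vmid> --verbose` text on stdin and prints every
# failed_*_operations counter whose value is greater than zero.
# Exit status is 0 if at least one such counter was found, 1 otherwise.
failed_ops() {
    awk '
        /^[ \t]*failed_[a-z_]+_operations:/ {
            key = $1
            sub(/:$/, "", key)              # drop the trailing colon
            if ($2 + 0 > 0) {
                print key "=" $2
                found = 1
            }
        }
        END { exit found ? 0 : 1 }
    '
}

# Usage (on the Proxmox host):
#   qm status $VMID --verbose | failed_ops
# A line such as "failed_wr_operations=3" would mean failed writes were counted.
```

The accounting flags `account_failed` and `account_invalid` are deliberately not matched, since they say nothing about whether an error happened.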

nils-the-oldone1980 · May 26, 2026

uzisuicida said:
Hello,

I have other servers with Proxmox, and I ran the same command, it shows the same thing, but I don't have this problem of a virtual machine freezing and having to turn it off and on.

Thanks.

Hi uzisuicida,

I am having the same problems with "io-errors" with two of my VMs - no logs from the kernel or anything else.

Did you ever find a solution for your issue?

fiona · May 26, 2026

Hi @nils-the-oldone1980,
Is there enough free space? Can you share the excerpt from the system logs/journal from around the time the issue happened, as well as the output of `pveversion -v` and the VM configuration from `qm config ID`, with ID replaced by the VM's numerical ID?

nils-the-oldone1980 · May 26, 2026

fiona said:
Hi @nils-the-oldone1980,
Is there enough free space? Can you share the excerpt from the system logs/journal from around the time the issue happened, as well as the output of `pveversion -v` and the VM configuration from `qm config ID`, with ID replaced by the VM's numerical ID?

Hi fiona,

Please see the attached output. I've already tried all available "Async IO" mechanisms, disabling "IO thread" entirely, and/or using "VirtIO SCSI" instead of "VirtIO SCSI single". I even tried without KSM. No success: it still locks up with "io error" eventually. I cannot force it; it just happens.

edit: VM 999 locked up at around 05:08 today - cannot say for sure because I am using Nagios for out-of-work-time monitoring. Nagios has a delay of two minutes before reporting not-reachable guest agents.

Question: the next time one of the VMs locks up, which commands should I run and which outputs should I provide to investigate further?

fiona · May 27, 2026

Please share the full journal, not just warnings. Are you using Commvault or NAKIVO as a backup solution? In the log it can be seen that something creates NBD block devices and maybe the issue is related to that.

The output of

Code:

echo '{"execute": "qmp_capabilities"}{"execute": "query-block"}' | socat - /run/qemu-server/123.qmp | jq

would also be interesting. Replace 123 with your VM ID. You might need to install socat and jq first.
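
Once that output is available, the per-drive I/O status can be pulled out of it. A minimal sketch, assuming the `query-block` reply has the usual QMP shape (a `return` array whose entries carry `device` and an optional `io-status`); the function name `io_status` is made up for illustration:

```shell
#!/bin/sh
# io_status: hypothetical helper.
# Reads the QMP replies (one or more JSON documents) on stdin and prints
# "<device><TAB><io-status>" for every drive in the query-block reply.
# Replies without a "return" array (greeting, empty return) print nothing.
io_status() {
    jq -r '.return[]? | [.device, (.["io-status"] // "n/a")] | @tsv'
}

# Usage (replace 123 with the VM ID):
#   echo '{"execute": "qmp_capabilities"}{"execute": "query-block"}' \
#     | socat - /run/qemu-server/123.qmp | io_status
```

According to QEMU's documentation the status is one of `ok`, `failed`, or `nospace`, so a `nospace` value would point back at the free-space question.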

nils-the-oldone1980 · May 28, 2026

fiona said:
Please share the full journal, not just warnings. Are you using Commvault or NAKIVO as a backup solution? In the log it can be seen that something creates NBD block devices and maybe the issue is related to that. [...]

Yes, we are using NAKIVO. But your suggestion to look at the full journal and not just the warnings gave me the solution, I think.

Yesterday, the VM 999 locked up again at around 09:49 CEST. Here are the relevant journal entries:

Code:

May 27 09:49:52 proxkon01 zed[2586197]: eid=43733 class=dio_verify_wr pool='datastore' size=131072 offset=2737142026240 priority=1 err=5 flags=0x200080 bookmark=54:6784:0:984520
May 27 09:49:59 proxkon01 pvedaemon[2574504]: VM 999 qga command failed - VM 999 qga command 'guest-ping' failed - got timeout
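
Entries like the zed line above are easy to miss when only warnings are filtered. A minimal sketch for picking them out of journal text, assuming the default journal line layout shown above (month, day, time, host, then the message); the function name `zed_dio_errors` is made up for illustration:

```shell
#!/bin/sh
# zed_dio_errors: hypothetical helper.
# Reads journal text on stdin and, for every zed event of class
# dio_verify_wr, prints the timestamp, the pool name and the err value.
zed_dio_errors() {
    awk -v q="'" '
        /class=dio_verify_wr/ {
            pool = ""; err = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^pool=/) {
                    pool = $i
                    sub(/^pool=/, "", pool)
                    gsub(q, "", pool)       # strip the quotes around the name
                }
                if ($i ~ /^err=/) {
                    err = $i
                    sub(/^err=/, "", err)
                }
            }
            print $1, $2, $3, "pool=" pool, "err=" err
        }
    '
}

# Usage (on the Proxmox host, current boot):
#   journalctl -b --no-pager | zed_dio_errors
```

On Linux, error number 5 is EIO, the generic I/O error, which matches the `io-error` state reported for the VM.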

Looking up the failure "dio_verify_wr" in combination with "err=5" led me to another thread:

[SOLVED] Thread 'Proxmox 9 - IO error ZFS'

Jan 22, 2026

Hi everyone,

We’ve been deploying several new Proxmox 9 nodes using ZFS as the primary storage, and we’re encountering issues where virtual machines become I/O locked.

When it happens, the VMs are paused with an I/O error. We’re aware this can occur when a host runs out of disk space, but in our case there is plenty of free storage available.
We’ve seen this behavior across multiple hosts, different clusters, and different hardware platforms.

Furthermore, we’ve been running ZFS on Proxmox 8 without any issues, but since these problems started with our Proxmox 9 installations, we’re...

I have disabled Direct I/O on our ZFS storages now using "zfs set direct=disabled datastore" and am very sure that this was the cause. Will let you know the outcome in a day or two.
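
A minimal sketch for checking the setting before and after such a change, assuming an OpenZFS version that has the `direct` dataset property (it arrived together with the Direct I/O feature). The function name `direct_io_state` is made up, and `datastore` is the pool name from this post:

```shell
#!/bin/sh
# direct_io_state DATASET
# Hypothetical helper: prints only the value of the "direct" property
# for the given pool or dataset.
direct_io_state() {
    # -H: no header line, -o value: print just the property value
    zfs get -H -o value direct "$1"
}

# Usage:
#   direct_io_state datastore    # current value, before the change
#   zfs set direct=disabled datastore
#   direct_io_state datastore    # should now report the new value
```

Child datasets and zvols normally inherit the property from the pool's root dataset, but checking the dataset that actually holds the VM disks is the safer test.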

nils-the-oldone1980 · Jun 1, 2026

Hello again,

Since disabling Direct I/O, none of the VMs has had another I/O error, and all of them are running absolutely fine.
