Hi all,
I would like to add another data point to the already reported issue regarding guest disk I/O errors with the Proxmox 7.x host kernel series.
This is related to the existing thread:
io_uring on kernel 7.0.6-2-pve / guest disk I/O errors / EIO / filesystem shutdown
In our case, the affected guest was a Debian-based VM running MariaDB 10.11.14. I am intentionally omitting VM names, internal hostnames, storage names, IP addresses, customer-specific details and internal infrastructure layout.
Summary
- After booting the Proxmox host into the 7.x pve kernel series (including the current 7.0.12-1-pve), many VMs started to show hard guest-side disk I/O errors.
- The first visible application failure was MariaDB/InnoDB refusing to start because ib_logfile0 could not be read.
- A full sequential read test inside the guest later completed successfully, which made the issue look transient rather than like a permanently bad virtual block.
- During a later database restore attempt, guest-side write EIOs appeared again and MariaDB/InnoDB failed again.
- Changing the affected VM disk to aio=threads on the Proxmox side did not fully resolve the issue in our environment.
- After pinning and booting the Proxmox host back to 6.17.13-13-pve, the EIOs have not reappeared so far.
- The issue was not limited to a single guest OS or guest kernel. We observed the same class of errors on Debian 13 guests with both 6.x and 7.x guest kernels, as well as on Ubuntu- and Arch-based guests.
- So far, running a 7.x kernel inside the guest did not change the behavior in a meaningful way. The stronger correlation appears to be with the Proxmox host kernel 7.x series.
Guest-side symptoms
MariaDB initially failed during startup with:
Code:
InnoDB: pread("ib_logfile0") returned -1, operating system error 5
InnoDB: Failed to read log at <offset>: I/O error
InnoDB: Log scan aborted at LSN <value>
InnoDB: Plugin initialization aborted with error Generic error
Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
Unknown/unsupported storage engine: InnoDB
Aborting
The guest kernel showed disk I/O errors on the virtual disk, for example:
Code:
I/O error, dev sda, sector <sector> op 0x0:(READ)
sd <x:x:x>: [sda] Sense Key : Aborted Command [current]
sd <x:x:x>: [sda] Add. Sense: I/O process terminated
A full sequential read of the guest disk later completed without errors:
Code:
dd if=/dev/sda of=/dev/null bs=1M status=progress
This completed successfully, which suggested that the problem was not a stable bad block inside the guest disk image.
However, while trying to restore the database into a freshly initialized MariaDB datadir, the server crashed again and the guest kernel logged write/read EIOs:
Code:
I/O error, dev sda, sector <sector> op 0x1:(WRITE)
sd <x:x:x>: [sda] Sense Key : Aborted Command [current]
sd <x:x:x>: [sda] Add. Sense: I/O process terminated
I/O error, dev sda, sector <sector> op 0x0:(READ)
MariaDB then again failed at InnoDB recovery/startup:
Code:
InnoDB: Log scan aborted at LSN <value>
InnoDB: Plugin initialization aborted with error Generic error
Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
Unknown/unsupported storage engine: InnoDB
Aborting
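To see at a glance whether a guest is getting read errors, write errors or both (as in the restore attempt above), a small filter over the guest kernel log can help. This is a minimal sketch; the function name summarize_disk_eio is hypothetical, and it only matches the "I/O error, dev <dev>, sector <n> op 0x..:(OP)" message shape quoted above, so other error wordings are not counted.

```shell
# Summarise guest-side virtual disk I/O errors from kernel log text on stdin.
# Hypothetical helper: prints one "<device> <OP> <count>" line per device/operation.
summarize_disk_eio() {
  # Keep only the matching "I/O error, dev ..., op 0x..:(OP)" part of each line,
  # reduce it to "<device> <OP>", then count identical pairs.
  grep -oE 'I/O error, dev [a-z0-9]+, sector [0-9]+ op 0x[0-9a-f]+:\([A-Z_]+\)' |
    sed -E 's#^I/O error, dev ([a-z0-9]+), sector [0-9]+ op 0x[0-9a-f]+:\(([A-Z_]+)\)$#\1 \2#' |
    sort | uniq -c |
    awk '{ printf "%s %s %d\n", $2, $3, $1 }'
}
```

Usage inside the guest (as root): journalctl -k -b --no-pager | summarize_disk_eio, or dmesg | summarize_disk_eio. Lines such as sda WRITE 2 next to sda READ 1 would indicate that both directions are affected, not only reads.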
Relevant observations
- The failure surfaced at the application level as MariaDB/InnoDB corruption/failure, but the underlying problem appears to be below MariaDB.
- The guest received real block-level EIOs from its virtual disk. InnoDB was simply the workload that exposed the issue quickly, because database restore/startup generates a lot of synchronous write and redo-log activity.
- A full sequential dd read inside the guest was not sufficient to prove the system was safe. The problem reappeared under a write-heavy database restore workload.
- Changing the VM disk to aio=threads was not sufficient in our case. The VM was fully stopped and started again after changing the disk configuration.
- After booting the Proxmox host back to 6.17.13-13-pve and pinning that kernel, the issue has not reappeared so far.
Sanitized VM disk configuration after mitigation attempt
Code:
scsihw: virtio-scsi-single
scsi0: <storage>:vm-<id>-disk-0,aio=threads,cache=none,iothread=1,size=<size>
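When checking several VMs on a host, it can help to list which disks actually carry an explicit aio= option. A minimal sketch that reads qm config <vmid> output on stdin; the helper name list_disk_aio is hypothetical, it only looks at scsi/virtio/sata/ide disk keys and skips CD-ROM drives. Disks without aio= are reported as "default", i.e. whatever Proxmox VE picks for that storage and cache combination (on current releases generally io_uring, but treat that as an assumption to verify).

```shell
# List the aio= setting of each VM disk from `qm config <vmid>` output on stdin.
# Hypothetical helper; prints "<disk> aio=<mode>" or "<disk> aio=default".
list_disk_aio() {
  awk -F': ' '
    /^(scsi|virtio|sata|ide)[0-9]+: / {
      # CD-ROM drives are not the virtual disks of interest here.
      if ($2 ~ /media=cdrom/) next
      aio = "default"
      # First comma-separated field is the volume; the rest are key=value options.
      n = split($2, opts, ",")
      for (i = 2; i <= n; i++)
        if (opts[i] ~ /^aio=/) aio = substr(opts[i], 5)
      printf "%s aio=%s\n", $1, aio
    }'
}
```

Usage on the Proxmox host: qm config <vmid> | list_disk_aio, which for the configuration above would print scsi0 aio=threads.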
Inside the guest, MariaDB was also configured conservatively during recovery:
Code:
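[mysqld]
# Disable InnoDB native asynchronous I/O (libaio or io_uring, depending on the
# build); InnoDB then issues synchronous I/O from its own thread pool instead.
innodb_use_native_aio=0
max_allowed_packet=1G
# Generous timeouts for long-running restore sessions.
net_read_timeout=600
net_write_timeout=600
wait_timeout=28800
interactive_timeout=28800
# Client-side options for the restore. [mariadb] is read by the server, which
# would reject the client-only option binary-mode, so a client group such as
# [mariadb-client] (read by the mariadb command-line client) is used here.
[mariadb-client]
max_allowed_packet=1G
binary-mode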
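Because innodb_use_native_aio is only read at server startup, it is worth confirming that the running server actually picked it up before starting a restore. A minimal sketch, assuming the mariadb client can connect locally; the helper name native_aio_is_off is hypothetical and only parses the tab-separated output of a SHOW GLOBAL VARIABLES query.

```shell
# Succeeds only if stdin reports innodb_use_native_aio as OFF.
# Expected input: output of
#   mariadb -N -e "SHOW GLOBAL VARIABLES LIKE 'innodb_use_native_aio'"
# Hypothetical helper; missing or ON values make it fail.
native_aio_is_off() {
  awk -F'\t' '
    $1 == "innodb_use_native_aio" { found = 1; off = ($2 == "OFF") }
    END { exit !(found && off) }'
}
```

Usage: mariadb -N -e "SHOW GLOBAL VARIABLES LIKE 'innodb_use_native_aio'" | native_aio_is_off && echo "native AIO disabled".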
Workaround used
Code:
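# Make 6.17.13-13-pve the kernel that is booted by default
proxmox-boot-tool kernel pin 6.17.13-13-pve
# Sync the boot configuration so the pin takes effect
proxmox-boot-tool refresh
reboot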
After the reboot, we checked the running kernel:
Code:
uname -r
which confirmed:
Code:
6.17.13-13-pve
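In case the pin is later removed or a different kernel is booted by hand, a small guard on the host can refuse to proceed (e.g. before starting database VMs) unless the expected kernel is running. A minimal sketch; require_host_kernel is a hypothetical name, and the optional second argument only exists so the check can be exercised without a reboot.

```shell
# Fail unless the host runs the expected kernel version.
# Hypothetical helper; $2 defaults to the live `uname -r`.
require_host_kernel() {
  local expected="$1"
  local running="${2:-$(uname -r)}"
  if [ "$running" != "$expected" ]; then
    echo "host runs $running, expected $expected; not starting stateful guests" >&2
    return 1
  fi
  echo "host kernel OK: $running"
}
```

Usage on the host: require_host_kernel 6.17.13-13-pve && qm start <vmid>.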
Operational recommendation based on our experience
- Do not continue database restores or filesystem repair attempts while the guest is still seeing "I/O error, dev sdX" from the virtual disk.
- Treat any database datadir or filesystem that was written to during those EIO events as potentially inconsistent.
- For stateful workloads, especially databases, consider booting/pinning the last known stable 6.17 pve kernel until this is fully understood or fixed.
- aio=threads may avoid one possible io_uring-related path, but in our case it did not fully prevent guest EIOs while running the affected 7.x host kernel.
- Before trusting a restored database again, perform write-heavy testing, not only sequential read testing.
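The first recommendation can be turned into a simple gate for restore or stress-test runs inside the guest: refuse to start if the current boot's kernel log already contains virtual disk I/O errors, and flag the run if new ones appear while it executes. A minimal sketch; the helper names count_disk_eio and run_if_no_eio and the LOG_CMD override variable are hypothetical, and counting "I/O error, dev" lines is only a coarse signal, not proof that the disk is healthy.

```shell
# Count virtual disk I/O error lines in the kernel log.
# LOG_CMD (hypothetical) defaults to the current boot's kernel log via journalctl.
count_disk_eio() {
  ${LOG_CMD:-journalctl -k -b --no-pager} | grep -c 'I/O error, dev '
}

# Run "$@" only if no disk I/O errors were logged so far this boot.
# Returns 2 if it refused to start, 3 if new errors appeared during the run,
# otherwise the command's own exit status (a failing command is not rechecked).
run_if_no_eio() {
  local before after
  before="$(count_disk_eio)"
  if [ "$before" -gt 0 ]; then
    echo "refusing: $before disk I/O error(s) already logged this boot" >&2
    return 2
  fi
  "$@" || return $?
  after="$(count_disk_eio)"
  if [ "$after" -gt "$before" ]; then
    echo "WARNING: $((after - before)) new disk I/O error(s) during: $*" >&2
    return 3
  fi
}
```

Usage inside the guest (as root, so journalctl can read the kernel log): run_if_no_eio sh -c 'mariadb < dump.sql', where dump.sql is a placeholder for the actual dump file.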
A useful guest-side stress test could be something similar to:
Code:
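# Run on a disposable test disk only. --filename must live on the virtual disk
# under test: on guests where /tmp is a tmpfs (e.g. fresh Debian 13 installs),
# a test file there stays in RAM and never reaches the virtual disk.
# --fdatasync=8 issues an fdatasync() after every 8 writes (database-like syncs).
fio --name=stress-hdd \
  --ioengine=libaio \
  --iodepth=8 \
  --rw=randwrite \
  --bs=4k \
  --direct=1 \
  --fdatasync=8 \
  --size=20G \
  --numjobs=4 \
  --runtime=1800 \
  --time_based \
  --ramp_time=30 \
  --group_reporting \
  --filename=/var/tmp/fio-test \
  --output-format=normal,json \
  --output=fio-result.json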
Please run this only on a disposable test VM or disposable test disk.
Current status
- Host kernel pinned to 6.17.13-13-pve.
- No further guest disk EIOs observed so far after the rollback.
- Database recovery/restoration is being redone from a clean datadir after the host kernel rollback.
- The datadir created or written during the 7.x-kernel EIO events is considered unsafe and is not being reused.
Additional guest OS / guest kernel observations
We were also able to observe the same class of guest-side disk I/O errors on different Linux guests, not only on one specific distribution or guest kernel.
Affected guest environments included:
- Debian 13 guests with 6.x guest kernels
- Debian 13 guests with 7.x guest kernels
- Ubuntu-based guests
- Arch-based guests
So far, using a 7.x kernel inside the guest did not make a relevant difference in our environment. The issue still appeared to depend primarily on the Proxmox host running the affected 7.x pve kernel series.
This is why our current working assumption is that the guest kernel version may influence timing or reproducibility, but is unlikely to be the primary root cause. The stronger correlation in our environment is still:
Code:
Proxmox host kernel 7.x
→ guest receives virtual disk EIO / Aborted Command / I/O process terminated
→ stateful workloads such as MariaDB/InnoDB fail or become inconsistent
After rolling the Proxmox host back to 6.17.13-13-pve, the issue has not reappeared so far, regardless of the tested guest distribution/kernel combination.
I hope this helps with debugging and provides further confirmation that this may not be limited to XFS guests. In our case the most visible failure was MariaDB/InnoDB on a Debian-based guest, with the guest receiving Aborted Command / I/O process terminated / I/O error from the virtual disk.
I am not claiming this is definitively a Proxmox-only bug. However, in our environment the practical mitigation was to move away from the affected 7.x pve host kernel and boot/pin 6.17.13-13-pve.