Additional confirmation: guest disk EIO / MariaDB InnoDB failures with Proxmox 7.x host kernel, mitigated by booting 6.17.13-13-pve

PMXOlli

New Member
Jun 24, 2026
Hi all,

I would like to add another data point to the already reported issue regarding guest disk I/O errors with the Proxmox 7.x host kernel series.

This is related to the existing thread:

io_uring on kernel 7.0.6-2-pve / guest disk I/O errors / EIO / filesystem shutdown

In our case, the affected guest was a Debian-based VM running MariaDB 10.11.14. I am intentionally omitting VM names, internal hostnames, storage names, IP addresses, customer-specific details and internal infrastructure layout.

Summary

  • After booting the Proxmox host into the 7.x pve kernel series (including the current 7.0.12-1-pve), many VMs started to show hard guest-side disk I/O errors.
  • The first visible application failure was MariaDB/InnoDB refusing to start because ib_logfile0 could not be read.
  • A full sequential read test inside the guest later completed successfully, which made the issue look transient rather than a permanently bad virtual block.
  • During a later database restore attempt, guest-side write EIOs appeared again and MariaDB/InnoDB failed again.
  • Changing the affected VM disk to aio=threads on the Proxmox side did not fully resolve the issue in our environment.
  • After pinning and booting the Proxmox host back to 6.17.13-13-pve, the EIOs have not reappeared so far.
  • The issue was not limited to a single guest OS or guest kernel. We observed the same class of errors on Debian 13 guests with both 6.x and 7.x guest kernels, as well as on Ubuntu- and Arch-based guests.
  • So far, running a 7.x kernel inside the guest did not change the behavior in a meaningful way. The stronger correlation appears to be the Proxmox host kernel 7.x series.

Guest-side symptoms

MariaDB initially failed during startup with:

Code:
InnoDB: pread("ib_logfile0") returned -1, operating system error 5
InnoDB: Failed to read log at <offset>: I/O error
InnoDB: Log scan aborted at LSN <value>
InnoDB: Plugin initialization aborted with error Generic error
Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
Unknown/unsupported storage engine: InnoDB
Aborting

The guest kernel showed disk I/O errors on the virtual disk, for example:

Code:
I/O error, dev sda, sector <sector> op 0x0:(READ)
sd <x:x:x>: [sda] Sense Key : Aborted Command [current]
sd <x:x:x>: [sda] Add. Sense: I/O process terminated
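
For anyone who wants to quantify these events rather than eyeball the log, here is a minimal sketch that counts guest kernel I/O errors per device and operation. The function name summarize_io_errors is hypothetical, and the parsing assumes the "I/O error, dev <dev>, sector <n> op 0x..:(OP)" line format shown above; other kernel versions may format the line differently.

```shell
# Hypothetical helper: reads kernel log text on stdin and prints
# "<device> <operation> <count>" for every "I/O error, dev ..." line.
summarize_io_errors() {
  awk '
    /I\/O error, dev / {
      dev = "unknown"; op = "unknown"
      for (i = 1; i <= NF; i++) {
        # The device name follows the word "dev" and ends with a comma.
        if ($i == "dev") { dev = $(i + 1); sub(/,$/, "", dev) }
        # The operation looks like 0x1:(WRITE); keep only the name in parentheses.
        if ($i == "op")  { op = $(i + 1); sub(/^.*\(/, "", op); sub(/\).*$/, "", op) }
      }
      count[dev " " op]++
    }
    END { for (k in count) print k, count[k] }
  ' | sort
}
```

Inside the guest this could be fed from the kernel log, for example with dmesg | summarize_io_errors or journalctl -k | summarize_io_errors.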

A full sequential read of the guest disk later completed without errors:

Code:
dd if=/dev/sda of=/dev/null bs=1M status=progress

This completed successfully, which suggested that the problem was not a stable bad block inside the guest disk image.

However, while trying to restore the database into a freshly initialized MariaDB datadir, the server crashed again and the guest kernel logged write/read EIOs:

Code:
I/O error, dev sda, sector <sector> op 0x1:(WRITE)
sd <x:x:x>: [sda] Sense Key : Aborted Command [current]
sd <x:x:x>: [sda] Add. Sense: I/O process terminated
I/O error, dev sda, sector <sector> op 0x0:(READ)

MariaDB then again failed at InnoDB recovery/startup:

Code:
InnoDB: Log scan aborted at LSN <value>
InnoDB: Plugin initialization aborted with error Generic error
Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
Unknown/unsupported storage engine: InnoDB
Aborting

Relevant observations

  1. At the application level the failure looked like MariaDB/InnoDB corruption, but the underlying problem appears to be below MariaDB.
  2. The guest received real block-level EIOs from its virtual disk. InnoDB was only the workload that exposed the issue quickly because database restore/startup creates a lot of synchronous write and redo-log activity.
  3. A full sequential dd read inside the guest was not sufficient to prove the system was safe. The problem reappeared under write-heavy database restore workload.
  4. Changing the VM disk to aio=threads was not sufficient in our case. The VM was fully stopped and started again after changing the disk configuration.
  5. After booting the Proxmox host back to 6.17.13-13-pve and pinning that kernel, the issue has not reappeared so far.

Sanitized VM disk configuration after mitigation attempt

Code:
scsihw: virtio-scsi-single
scsi0: <storage>:vm-<id>-disk-0,aio=threads,cache=none,iothread=1,size=<size>
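
To check which aio mode a disk line actually carries, a small sketch like the following may help. The function name disk_aio_mode is hypothetical, and it assumes the comma-separated key=value layout of the disk line shown above. It prints "default" when no aio= option is set, meaning Proxmox chooses the mode itself.

```shell
# Hypothetical helper: prints the aio= value of one VM disk config line,
# or "default" if the line has no explicit aio= option.
disk_aio_mode() {
  # Split the option list on commas and keep only the aio= entry.
  mode=$(printf '%s\n' "$1" | tr ',' '\n' | sed -n 's/^aio=//p' | head -n 1)
  printf '%s\n' "${mode:-default}"
}
```

On the host, the disk lines can be taken from the output of qm config <vmid> and passed to the function one at a time.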

Inside the guest, MariaDB was also configured conservatively during recovery:

Code:
[mysqld]
innodb_use_native_aio=0
max_allowed_packet=1G
net_read_timeout=600
net_write_timeout=600
wait_timeout=28800
interactive_timeout=28800

[mariadb]
max_allowed_packet=1G
binary-mode

Workaround used

Code:
proxmox-boot-tool kernel pin 6.17.13-13-pve
proxmox-boot-tool refresh
reboot

After reboot:

Code:
uname -r

confirmed:

Code:
6.17.13-13-pve
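
To make sure a host does not silently boot back into the 7.x series after an update, a check along these lines could be added to monitoring. This is only a sketch: the function name is hypothetical, and the rule "any 7.x pve kernel counts as affected" is an assumption based on what we observed, not a confirmed list of affected versions.

```shell
# Hypothetical helper: succeeds (exit 0) if the given kernel release string
# belongs to the 7.x pve series that we observed the EIOs on.
# Assumption: every 7.*-pve kernel is treated as affected.
is_affected_host_kernel() {
  case "$1" in
    7.*-pve) return 0 ;;
    *)       return 1 ;;
  esac
}
```

On the host it would be called with the running kernel, for example: is_affected_host_kernel "$(uname -r)" && echo "WARNING: running a 7.x pve kernel".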

Operational recommendation based on our experience

  • Do not continue database restores or filesystem repair attempts while the guest is still seeing I/O error, dev sdX from the virtual disk.
  • Treat any database datadir or filesystem that was written to during those EIO events as potentially inconsistent.
  • For stateful workloads, especially databases, consider booting/pinning the last known stable 6.17 pve kernel until this is fully understood or fixed.
  • aio=threads may reduce one possible io_uring-related path, but in our case it did not fully prevent guest EIOs while running the affected 7.x host kernel.
  • Before trusting a restored database again, perform write-heavy testing, not only sequential read testing.

A useful guest-side stress test could be something similar to:

Code:
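# Point --filename at a file on the virtual disk under test;
# on some systems /tmp is a tmpfs (RAM-backed) and would not exercise the disk.
fio --name=stress-hdd \
  --ioengine=libaio \
  --iodepth=8 \
  --rw=randwrite \
  --bs=4k \
  --direct=1 \
  --fdatasync=8 \
  --size=20G \
  --numjobs=4 \
  --runtime=1800 \
  --time_based \
  --ramp_time=30 \
  --group_reporting \
  --filename=/tmp/fio-test \
  --output-format=normal,json \
  --output=fio-result.json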

Please run this only on a disposable test VM or disposable test disk.
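
As a much lighter complement to a long stress run, a quick write-then-compare check can be sketched as below. The function name write_verify and the file naming are hypothetical. The important part is conv=fsync, which makes dd fail if the kernel reports an error while flushing the data to the disk. Note that the read-back for the comparison may be served from the page cache, so this mainly exercises the write path.

```shell
# Hypothetical helper: writes random data into a directory on the disk under
# test, forces it to disk with fsync, and compares it with the source copy.
# $1 = target directory, $2 = size in MiB (default 16).
# Returns 0 if the data matches, non-zero on a write error or mismatch.
write_verify() {
  dir=$1
  size_mb=${2:-16}
  src=$(mktemp) || return 2
  dst=$(mktemp "$dir/write-verify.XXXXXX") || { rm -f "$src"; return 2; }
  # Generate the reference data, write it with fsync, then compare both files.
  dd if=/dev/urandom of="$src" bs=1048576 count="$size_mb" 2>/dev/null &&
    dd if="$src" of="$dst" bs=1048576 conv=fsync 2>/dev/null &&
    cmp -s "$src" "$dst"
  rc=$?
  rm -f "$src" "$dst"
  return "$rc"
}
```

Example call on a disposable test disk mounted at a path of your choice: write_verify /mnt/testdisk 1024 || echo "write verification failed".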

Current status

  • Host kernel pinned to 6.17.13-13-pve.
  • No further guest disk EIOs observed so far after rollback.
  • Database recovery/restoration is being redone from a clean datadir after the host kernel rollback.
  • The datadir created or written during the 7.x-kernel EIO events is considered unsafe and is not being reused.

Additional guest OS / guest kernel observations

We were also able to observe the same class of guest-side disk I/O errors on different Linux guests, not only on one specific distribution or guest kernel.

Affected guest environments included:

  • Debian 13 guests with 6.x guest kernels
  • Debian 13 guests with 7.x guest kernels
  • Ubuntu-based guests
  • Arch-based guests

So far, using a 7.x kernel inside the guest did not make a relevant difference in our environment. The issue still appeared to depend primarily on the Proxmox host running the affected 7.x pve kernel series.

This is why our current working assumption is that the guest kernel version may influence timing or reproducibility, but is unlikely to be the primary root cause. The stronger correlation in our environment is still:

Code:
Proxmox host kernel 7.x
→ guest receives virtual disk EIO / Aborted Command / I/O process terminated
→ stateful workloads such as MariaDB/InnoDB fail or become inconsistent

After rolling the Proxmox host back to 6.17.13-13-pve, the issue has not reappeared so far, regardless of the tested guest distribution/kernel combination.

I hope this helps with debugging and provides another confirmation that this may not be limited to XFS guests. In our case the most visible failure was MariaDB/InnoDB on a Debian-based guest, with the guest receiving Aborted Command / I/O process terminated / I/O error from the virtual disk.

I am not claiming this is definitively a Proxmox-only bug. However, in our environment the practical mitigation was to move away from the affected 7.x pve host kernel and boot/pin 6.17.13-13-pve.
 
Are you also using XFS inside the VM, or another file system?
 
Hi @PMXOlli,
What storage are you using on the host for the virtual disks? Please share the relevant part of the output of cat /etc/pve/storage.cfg.