I/O errors since upgrading to PVE 7.1

Hey,

I recently updated from PVE 7.0 to PVE 7.1, and since then I've been getting I/O issues in my VMs: Docker images download and then turn out to be corrupt, and .deb files get corrupted when upgrading packages inside the VMs.

I thought the issue was my two NVMe boot disks dying, so I had them replaced with new ones, did a fresh install of PVE 7.1, and restored all my VMs from backup (thank you, PBS!). The issues look to be happening again; right now the console is showing buffer I/O errors. It's also happening on VMs running off the HDDs, so it's not the NVMe drives again, and maybe it never was.

I've got two ZFS mirrors.
The first pool, rpool, was created by the PVE installer on the two NVMe drives.
The second pool, hdd, is a mirror of two 4 TB HDDs.

All my disks are data-center grade:
2x WDC CL SN720 SDAQNTW-512G-2000
2x HGST_HUS726T4TALA6L1

ZFS isn't showing any errors even after multiple scrubs, so I'm wondering whether the errors are at the QEMU layer.
I use the VirtIO driver for disks and networking.
SMART passes on all drives.
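For anyone wanting to rule out the storage layer the same way, this is roughly what I ran (pool and device names are examples; substitute your own):

```shell
# Check both pools for read/write/checksum errors after a scrub
zpool status -v rpool
zpool status -v hdd

# Kick off a fresh scrub if the last one is stale
zpool scrub rpool

# Full SMART report for an NVMe and a SATA drive
smartctl -a /dev/nvme0n1
smartctl -a /dev/sda
```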

Is anyone else getting these issues?

Here are screenshots from the consoles; the VMs were started within an hour of each other.
VMs 1 and 2 were started 17 or so hours ago; VM 3 was started recently and also shows errors.

VM 1 (NVMe ZFS Mirror) 14ish hours up
(screenshot attached)

VM 2 (NVMe ZFS Mirror) 14ish hours up
(screenshot attached)

VM 3 (HDD ZFS Mirror), recently started and up 2 hours at this point
(screenshot attached)
 
I think my issue may be related to this potential kernel regression in 5.13:

https://forum.proxmox.com/threads/s...t-raw-on-lvm-on-top-of-drbd.21051/post-431567

Code:
# pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.13.19-1-pve)
pve-manager: 7.1-5 (running version: 7.1-5/6fe299a0)
pve-kernel-5.13: 7.1-4
pve-kernel-helper: 7.1-4
pve-kernel-5.13.19-1-pve: 5.13.19-2
ceph-fuse: 15.2.15-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-14
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.0-3
libpve-storage-perl: 7.0-15
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.14-1
proxmox-backup-file-restore: 2.0.14-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.4-2
pve-cluster: 7.1-2
pve-container: 4.1-2
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-3
pve-ha-manager: 3.3-1
pve-i18n: 2.6-1
pve-qemu-kvm: 6.1.0-2
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-3
smartmontools: 7.2-1
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3
 
I set my 'Async IO' to "threads" and downloaded a large package; it still gets 'Input/output error', so maybe it's not part of that regression.
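For reference, Async IO can also be switched from the CLI instead of the GUI; a sketch assuming VM ID 100 and a VirtIO Block disk on storage local-zfs (all placeholders for your own setup):

```shell
# Show the current disk line, e.g. "virtio0: local-zfs:vm-100-disk-0,size=32G"
qm config 100 | grep virtio0

# Re-set the same disk with aio=threads appended (keep your existing options)
qm set 100 --virtio0 local-zfs:vm-100-disk-0,size=32G,aio=threads
```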
 
Having the same issues here; "threads" doesn't help.

Edit: even with VirtIO
 
I've stayed on kernel 5.11, but I've changed some of the VMs over to using SCSI instead of VirtIO Block. When I download large files off the internet, the VMs using VirtIO error and the ones using SCSI do not, so I think it's an issue with VirtIO Block.

@jeannotp can you test this on your side too?
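If anyone wants to try the same VirtIO Block → SCSI switch from the CLI, this is roughly the procedure (VM ID, storage, and disk names are placeholders; do this with the VM powered off):

```shell
# Detach the VirtIO Block disk; it reappears as unused0 in the config
qm set 100 --delete virtio0

# Use the VirtIO SCSI controller and re-attach the same volume on the SCSI bus
qm set 100 --scsihw virtio-scsi-pci
qm set 100 --scsi0 local-zfs:vm-100-disk-0

# Point the boot order at the new disk
qm set 100 --boot order=scsi0
```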
 
Hello @FingerlessGloves. My OS doesn't seem to like SCSI, and I'm not used to changing these settings, so I just chose SATA and it's fine.
I have a FreeBSD VM which works well with VirtIO, by the way.
 
Hello,

Same problem here. The temporary workaround from karnow98 works.

Code:
wget http://download.proxmox.com/debian/dists/bullseye/pve-no-subscription/binary-amd64/pve-qemu-kvm_6.0.0-4_amd64.deb
dpkg -i pve-qemu-kvm_6.0.0-4_amd64.deb
apt-mark hold pve-qemu-kvm

pveversion after the downgrade:
Code:
proxmox-ve: 7.1-1 (running kernel: 5.13.19-1-pve)
pve-manager: 7.1-6 (running version: 7.1-6/4e61e21c)
pve-kernel-5.13: 7.1-4
pve-kernel-helper: 7.1-4
pve-kernel-5.11: 7.0-10
pve-kernel-5.13.19-1-pve: 5.13.19-3
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.11.22-1-pve: 5.11.22-2
ceph-fuse: 15.2.13-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-14
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.0-3
libpve-storage-perl: 7.0-15
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.1.2-1
proxmox-backup-file-restore: 2.1.2-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.4-3
pve-cluster: 7.1-2
pve-container: 4.1-2
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-3
pve-ha-manager: 3.3-1
pve-i18n: 2.6-2
pve-qemu-kvm: 6.0.0-4
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-4
smartmontools: 7.2-1
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3

Br,
 
I'm not sure if I have the same issue, but it seems like it. Last week I migrated from an SSD to an NVMe SSD, watched it for a couple of days, and then decided to erase the old drive and use it as an additional device. I also ran apt upgrade before replacing the disk.

Now, in my case the error got so bad that the whole Proxmox server basically crashes. I have two VMs (pfSense and Home Assistant) and 20 LXC containers. pfSense starts at boot time and is working fine (although its start time is definitely slower for some reason). But as soon as I start the Home Assistant VM, I get I/O errors on the LVM volumes used by the LXC containers. Since they're on the same physical disk as the OS, the whole disk is put into read-only mode and I'm no longer able to log in through the physical console.

After a lot of trial and error, I removed a passed-through PCIe device (a Coral Edge TPU, to be precise) from the HA VM, and since then I haven't had a crash (it's only been two hours, though).

Since internet/network doesn't work when I shut down Proxmox, I'll leave it be for at least today, but maybe this helps someone else with the same issue.

Again, I'm not sure if this is something else entirely (I don't rule out hardware failure at this point).

Edit: forgot to mention that I removed the PCIe device from the VM, not from a container.
 
FYI: there's a new pve-qemu-kvm package, version 6.1.0-3, at the time of writing available through the pvetest repository. It should address the issue with VirtIO Block on certain storage types/configurations that came in with QEMU 6.1.

https://pve.proxmox.com/wiki/Package_Repositories#sysadmin_test_repo
It works with 6.1.0-3; no more I/O issues.
Thank you.

Code:
wget http://download.proxmox.com/debian/pve/dists/bullseye/pvetest/binary-amd64/pve-qemu-kvm_6.1.0-3_amd64.deb
dpkg -i pve-qemu-kvm_6.1.0-3_amd64.deb
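Note for anyone who pinned the 6.0.0-4 package with the earlier downgrade workaround: release the hold, otherwise apt will keep skipping future pve-qemu-kvm updates.

```shell
# Release the pin set by the earlier downgrade workaround
apt-mark unhold pve-qemu-kvm

# Confirm the installed version
pveversion -v | grep pve-qemu-kvm
```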

@John_Doe Your LXC containers don't use qemu-kvm, so this fix won't apply to them. Sorry.

Br,
 