I/O errors since upgrading to PVE 7.1

Hey,

I recently updated from PVE 7.0 to PVE 7.1, and since then I've been getting I/O issues in my VMs: Docker images download and then turn out to be corrupt, and .deb files get corrupted when upgrading packages inside the VMs.

I thought the issue was my two NVMe boot disks dying, so I had them replaced with new ones, did a fresh install of PVE 7.1, and restored all my VMs from backup (thank you, PBS!). The issues look to be happening again; right now the console is showing buffer I/O errors. It's also happening on VMs running off the HDDs, so it's not the NVMe drives again, and maybe it never was.

I've got two ZFS mirrors.
The first pool, rpool, was created by the PVE installer on the two NVMe drives.
The second pool, hdd, is a mirror of two 4 TB HDDs.

All my disks are data-center grade:
2x WDC CL SN720 SDAQNTW-512G-2000
2x HGST_HUS726T4TALA6L1

ZFS isn't showing any errors even after multiple scrubs, so I'm wondering whether the errors are at the QEMU layer.
I use the VirtIO driver for disks and networking.
SMART passes on all drives.
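For anyone wanting to rule out the storage layer the same way, this is roughly what I ran (pool and device names are examples; substitute your own):

```shell
# Check both pools for read/write/checksum errors after a scrub
zpool status -v rpool
zpool status -v hdd

# Kick off a fresh scrub if the last one is stale
zpool scrub rpool

# Full SMART report for an NVMe and a SATA drive
smartctl -a /dev/nvme0n1
smartctl -a /dev/sda
```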

Is anyone else getting these issues?

Here are screenshots from the consoles; the VMs were started within an hour of each other.
VMs 1 and 2 were started 17 or so hours ago; VM 3 was started recently and also shows errors.

VM 1 (NVMe ZFS Mirror) 14ish hours up
(screenshot attached)

VM 2 (NVMe ZFS Mirror) 14ish hours up
(screenshot attached)

VM 3 (HDD ZFS Mirror), recently started and up 2 hours at this point
(screenshot attached)
 
I think my issue may be related to this potential kernel regression in 5.13:

https://forum.proxmox.com/threads/s...t-raw-on-lvm-on-top-of-drbd.21051/post-431567

Code:
# pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.13.19-1-pve)
pve-manager: 7.1-5 (running version: 7.1-5/6fe299a0)
pve-kernel-5.13: 7.1-4
pve-kernel-helper: 7.1-4
pve-kernel-5.13.19-1-pve: 5.13.19-2
ceph-fuse: 15.2.15-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-14
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.0-3
libpve-storage-perl: 7.0-15
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.14-1
proxmox-backup-file-restore: 2.0.14-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.4-2
pve-cluster: 7.1-2
pve-container: 4.1-2
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-3
pve-ha-manager: 3.3-1
pve-i18n: 2.6-1
pve-qemu-kvm: 6.1.0-2
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-3
smartmontools: 7.2-1
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3
 
I set my 'Async IO' to "threads" and downloaded a large package; it still gets 'Input/output error', so maybe it's not part of that regression.
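For reference, Async IO can also be switched from the CLI instead of the GUI; a sketch assuming VM ID 100 and a VirtIO Block disk on storage local-zfs (all placeholders for your own setup):

```shell
# Show the current disk line, e.g. "virtio0: local-zfs:vm-100-disk-0,size=32G"
qm config 100 | grep virtio0

# Re-set the same disk with aio=threads appended (keep your existing options)
qm set 100 --virtio0 local-zfs:vm-100-disk-0,size=32G,aio=threads
```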
 
Having the same issues here; "threads" doesn't help.

Edit: even with VirtIO
 
I've stayed on kernel 5.11, but I've changed some of the VMs over to using SCSI instead of VirtIO Block. When I download large files off the internet, the VMs using VirtIO error and the ones using SCSI do not, so I think it's an issue with VirtIO Block.

@jeannotp can you test this on your side too?
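If anyone wants to try the same VirtIO Block → SCSI switch from the CLI, this is roughly the procedure (VM ID, storage, and disk names are placeholders; do this with the VM powered off):

```shell
# Detach the VirtIO Block disk; it reappears as unused0 in the config
qm set 100 --delete virtio0

# Use the VirtIO SCSI controller and re-attach the same volume on the SCSI bus
qm set 100 --scsihw virtio-scsi-pci
qm set 100 --scsi0 local-zfs:vm-100-disk-0

# Point the boot order at the new disk
qm set 100 --boot order=scsi0
```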
 
Hello @FingerlessGloves. My OS doesn't seem to like SCSI, and I'm not used to changing these settings, so I just chose SATA and it's fine.
I have a FreeBSD VM which works well with VirtIO, by the way.
 
Hello,

Same problem here. The temporary workaround from karnow98 works.

Code:
wget http://download.proxmox.com/debian/dists/bullseye/pve-no-subscription/binary-amd64/pve-qemu-kvm_6.0.0-4_amd64.deb
dpkg -i pve-qemu-kvm_6.0.0-4_amd64.deb
apt-mark hold pve-qemu-kvm

pveversion after the downgrade:
Code:
proxmox-ve: 7.1-1 (running kernel: 5.13.19-1-pve)
pve-manager: 7.1-6 (running version: 7.1-6/4e61e21c)
pve-kernel-5.13: 7.1-4
pve-kernel-helper: 7.1-4
pve-kernel-5.11: 7.0-10
pve-kernel-5.13.19-1-pve: 5.13.19-3
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.11.22-1-pve: 5.11.22-2
ceph-fuse: 15.2.13-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-14
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.0-3
libpve-storage-perl: 7.0-15
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.1.2-1
proxmox-backup-file-restore: 2.1.2-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.4-3
pve-cluster: 7.1-2
pve-container: 4.1-2
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-3
pve-ha-manager: 3.3-1
pve-i18n: 2.6-2
pve-qemu-kvm: 6.0.0-4
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-4
smartmontools: 7.2-1
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3

Br,
 
I'm not sure if I have the same issue, but it seems like it. Last week I migrated from an SSD to an NVMe SSD, watched it for a couple of days, and then decided to erase the old drive and use it as an additional device. I also ran apt upgrade before replacing the disk.

Now, in my case the error got so bad that the whole Proxmox server basically crashes. I have two VMs (pfSense and Home Assistant) and 20 LXC containers. pfSense starts at boot time and is working fine (although its start time is definitely slower for some reason). But as soon as I start the Home Assistant VM, I get I/O errors on the LVM volumes used by the LXC containers. Since they're on the same physical disk as the OS, the whole disk is put into read-only mode and I'm no longer able to log in through the physical console.

After a lot of trial and error, I removed a passed-through PCIe device (a Coral Edge TPU, to be precise) from the HA VM, and since then I haven't had a crash (it's only been two hours, though).

Since internet/network doesn't work when I shut down Proxmox, I'll leave it be for at least today, but maybe this helps someone else with the same issue.

Again, I'm not sure if this is something else entirely (I don't rule out hardware failure at this point).

Edit: forgot to mention that I removed the PCIe device from the VM, not from a container.
 
FYI: there's a new pve-qemu-kvm package, version 6.1.0-3, at the time of writing available through the pvetest repository. It should address the issue with VirtIO Block on certain storage types/configurations that came in with QEMU 6.1.

https://pve.proxmox.com/wiki/Package_Repositories#sysadmin_test_repo
It works with 6.1.0-3; no more I/O issues.
Thank you.

Code:
wget http://download.proxmox.com/debian/pve/dists/bullseye/pvetest/binary-amd64/pve-qemu-kvm_6.1.0-3_amd64.deb
dpkg -i pve-qemu-kvm_6.1.0-3_amd64.deb
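Note for anyone who pinned the 6.0.0-4 package with the earlier downgrade workaround: release the hold, otherwise apt will keep skipping future pve-qemu-kvm updates.

```shell
# Release the pin set by the earlier downgrade workaround
apt-mark unhold pve-qemu-kvm

# Confirm the installed version
pveversion -v | grep pve-qemu-kvm
```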

@John_Doe Your LXC containers don't use qemu-kvm, so this fix won't apply to them. Sorry.

Br,
 