VM I/O errors on all disks

onixid

Hi, I recently upgraded from Proxmox 6 to 7. I have 1 VM and about 7 LXC containers running on this host.
The VM runs OpenMediaVault with 4 passed-through 3TB WD Red disks and another 12TB WD Black connected through USB.
Today I noticed that my network shares were really slow, so I checked the OMV VM and discovered I/O errors on all the disks, even the brand-new 12TB one:

Code:
[413289.620177] EXT4-fs warning (device md127): ext4_end_bio:349: I/O error 10 writing to inode 72089625 starting block 1377092879)
[413289.620401] EXT4-fs warning (device md127): ext4_end_bio:349: I/O error 10 writing to inode 72089625 starting block 1377092623)
[413289.620672] EXT4-fs warning (device md127): ext4_end_bio:349: I/O error 10 writing to inode 72089625 starting block 1377092335)
[413289.620895] EXT4-fs warning (device md127): ext4_end_bio:349: I/O error 10 writing to inode 72089625 starting block 1377092079)
[413289.621122] EXT4-fs warning (device md127): ext4_end_bio:349: I/O error 10 writing to inode 72089625 starting block 1377091584)
[413289.621238] EXT4-fs warning (device md127): ext4_end_bio:349: I/O error 10 writing to inode 72089625 starting block 1377091577)
[415979.193504] blk_update_request: I/O error, dev vdc, sector 2561749528 op 0x1:(WRITE) flags 0x0 phys_seg 61 prio class 0
[415979.196641] blk_update_request: I/O error, dev vdc, sector 2561750016 op 0x1:(WRITE) flags 0x0 phys_seg 65 prio class 0
[415979.197571] blk_update_request: I/O error, dev vdc, sector 2561750536 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
[415979.198623] blk_update_request: I/O error, dev vdc, sector 2561750552 op 0x1:(WRITE) flags 0x0 phys_seg 61 prio class 0
[415979.199583] blk_update_request: I/O error, dev vdc, sector 2561751040 op 0x1:(WRITE) flags 0x0 phys_seg 65 prio class 0
[415979.200678] blk_update_request: I/O error, dev vdc, sector 2561751560 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
[415979.201701] blk_update_request: I/O error, dev vdc, sector 2561751576 op 0x1:(WRITE) flags 0x0 phys_seg 61 prio class 0
[415979.202722] blk_update_request: I/O error, dev vdc, sector 2561752064 op 0x1:(WRITE) flags 0x0 phys_seg 65 prio class 0
[415979.203610] blk_update_request: I/O error, dev vdc, sector 2561752584 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
[415979.204614] blk_update_request: I/O error, dev vdc, sector 2561752600 op 0x1:(WRITE) flags 0x0 phys_seg 61 prio class 0
[415987.268192] blk_update_request: I/O error, dev vdb, sector 4099882560 op 0x1:(WRITE) flags 0x0 phys_seg 56 prio class 0
[415987.269386] blk_update_request: I/O error, dev vdb, sector 4099883008 op 0x1:(WRITE) flags 0x0 phys_seg 70 prio class 0
[415987.270351] blk_update_request: I/O error, dev vdb, sector 4099883568 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
[415987.271278] blk_update_request: I/O error, dev vdb, sector 4099883584 op 0x1:(WRITE) flags 0x0 phys_seg 56 prio class 0
[415987.732732] blk_update_request: I/O error, dev vda, sector 4895432704 op 0x1:(WRITE) flags 0x0 phys_seg 128 prio class 0
[415987.733908] blk_update_request: I/O error, dev vda, sector 4895433728 op 0x1:(WRITE) flags 0x0 phys_seg 128 prio class 0
[415987.734936] blk_update_request: I/O error, dev vda, sector 4895434752 op 0x1:(WRITE) flags 0x0 phys_seg 31 prio class 0
[415987.927613] blk_update_request: I/O error, dev vdd, sector 4895432704 op 0x1:(WRITE) flags 0x0 phys_seg 128 prio class 0
[415987.928717] blk_update_request: I/O error, dev vdd, sector 4895433728 op 0x1:(WRITE) flags 0x0 phys_seg 128 prio class 0
[416032.007461] blk_update_request: I/O error, dev vda, sector 4379576800 op 0x1:(WRITE) flags 0x0 phys_seg 68 prio class 0
[416032.009257] blk_update_request: I/O error, dev vda, sector 4379577344 op 0x1:(WRITE) flags 0x0 phys_seg 58 prio class 0
[416032.010790] blk_update_request: I/O error, dev vda, sector 4379577808 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
[416032.011669] blk_update_request: I/O error, dev vda, sector 4379577824 op 0x1:(WRITE) flags 0x0 phys_seg 68 prio class 0
[416032.012558] blk_update_request: I/O error, dev vda, sector 4379578368 op 0x1:(WRITE) flags 0x0 phys_seg 72 prio class 0
[416032.013533] blk_update_request: I/O error, dev vda, sector 4379579072 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
[416032.014630] blk_update_request: I/O error, dev vda, sector 4379579088 op 0x1:(WRITE) flags 0x0 phys_seg 38 prio class 0
[416032.015524] blk_update_request: I/O error, dev vda, sector 4379579392 op 0x1:(WRITE) flags 0x0 phys_seg 88 prio class 0
[416032.016408] blk_update_request: I/O error, dev vda, sector 4379580096 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
[416032.017554] blk_update_request: I/O error, dev vda, sector 4379580112 op 0x1:(WRITE) flags 0x0 phys_seg 38 prio class 0
[416548.442209] EXT4-fs warning (device md127): ext4_end_bio:349: I/O error 10 writing to inode 72089625 starting block 1094829952)
[416548.442222] Buffer I/O error on device md127, logical block 1094829801
[416548.443247] Buffer I/O error on device md127, logical block 1094829802
[416548.444437] Buffer I/O error on device md127, logical block 1094829803
[416548.446862] Buffer I/O error on device md127, logical block 1094829804
[416548.447871] Buffer I/O error on device md127, logical block 1094829805
[416548.448611] Buffer I/O error on device md127, logical block 1094829806
[416548.449310] Buffer I/O error on device md127, logical block 1094829807
[416548.449977] Buffer I/O error on device md127, logical block 1094829808
[416548.450694] Buffer I/O error on device md127, logical block 1094829809
[416548.451294] Buffer I/O error on device md127, logical block 1094829810
[416548.452166] EXT4-fs warning (device md127): ext4_end_bio:349: I/O error 10 writing to inode 72089625 starting block 1094829696)
[416548.452397] EXT4-fs warning (device md127): ext4_end_bio:349: I/O error 10 writing to inode 72089625 starting block 1094829528)
[416548.452618] EXT4-fs warning (device md127): ext4_end_bio:349: I/O error 10 writing to inode 72089625 starting block 1094829272)
[416548.452863] EXT4-fs warning (device md127): ext4_end_bio:349: I/O error 10 writing to inode 72089625 starting block 1094828986)
[416548.453088] EXT4-fs warning (device md127): ext4_end_bio:349: I/O error 10 writing to inode 72089625 starting block 1223793150)
[416548.453315] EXT4-fs warning (device md127): ext4_end_bio:349: I/O error 10 writing to inode 72089625 starting block 1223792894)
[416548.454021] EXT4-fs warning (device md127): ext4_end_bio:349: I/O error 10 writing to inode 72089625 starting block 640372608)
[416548.454253] EXT4-fs warning (device md127): ext4_end_bio:349: I/O error 10 writing to inode 72089625 starting block 640372545)
[416548.454501] EXT4-fs warning (device md127): ext4_end_bio:349: I/O error 10 writing to inode 72089625 starting block 640372289)
[441221.485209] blk_update_request: I/O error, dev vdd, sector 2759243088 op 0x1:(WRITE) flags 0x0 phys_seg 86 prio class 0
[441221.488351] blk_update_request: I/O error, dev vdd, sector 2759243776 op 0x1:(WRITE) flags 0x0 phys_seg 40 prio class 0
[441221.489100] blk_update_request: I/O error, dev vdd, sector 2759244096 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
[441221.489862] blk_update_request: I/O error, dev vdd, sector 2759244112 op 0x1:(WRITE) flags 0x0 phys_seg 85 prio class 0
[454841.456051] blk_update_request: I/O error, dev vde, sector 23281261272 op 0x1:(WRITE) flags 0x0 phys_seg 113 prio class 0
[454841.459244] EXT4-fs warning (device vde1): ext4_end_bio:349: I/O error 10 writing to inode 363726718 starting block 2910157661)
[454841.459255] EXT4-fs warning (device vde1): ext4_end_bio:349: I/O error 10 writing to inode 363726718 starting block 2910157772)
[454841.459282] blk_update_request: I/O error, dev vde, sector 23281262176 op 0x1:(WRITE) flags 0x4000 phys_seg 254 prio class 0
[454841.459835] Buffer I/O error on device vde1, logical block 2910157405
[454841.462156] Buffer I/O error on device vde1, logical block 2910157406
[454841.463603] Buffer I/O error on device vde1, logical block 2910157407
[454841.465096] Buffer I/O error on device vde1, logical block 2910157408
[454841.466715] Buffer I/O error on device vde1, logical block 2910157409
[454841.468061] Buffer I/O error on device vde1, logical block 2910157410
[454841.469534] Buffer I/O error on device vde1, logical block 2910157411
[454841.470388] Buffer I/O error on device vde1, logical block 2910157412
[454841.471189] Buffer I/O error on device vde1, logical block 2910157413
[454841.472272] Buffer I/O error on device vde1, logical block 2910157414
[454841.486661] EXT4-fs warning (device vde1): ext4_end_bio:349: I/O error 10 writing to inode 363726719 starting block 2910158028)
[454846.579262] blk_update_request: I/O error, dev vde, sector 23281099800 op 0x1:(WRITE) flags 0x0 phys_seg 27 prio class 0
[454846.580206] EXT4-fs warning (device vde1): ext4_end_bio:349: I/O error 10 writing to inode 363726729 starting block 2910137477)
[454846.580212] EXT4-fs warning (device vde1): ext4_end_bio:349: I/O error 10 writing to inode 363726729 starting block 2910137502)
[454846.580231] blk_update_request: I/O error, dev vde, sector 23281100016 op 0x1:(WRITE) flags 0x4000 phys_seg 254 prio class 0
[454846.582065] Buffer I/O error on device vde1, logical block 2910137221
[454846.582992] Buffer I/O error on device vde1, logical block 2910137222
[454846.583926] Buffer I/O error on device vde1, logical block 2910137223
[454846.584697] Buffer I/O error on device vde1, logical block 2910137224
[454846.585543] Buffer I/O error on device vde1, logical block 2910137225
[454846.586573] Buffer I/O error on device vde1, logical block 2910137226
[454846.587367] Buffer I/O error on device vde1, logical block 2910137227
[454846.588141] Buffer I/O error on device vde1, logical block 2910137228
[454846.588861] Buffer I/O error on device vde1, logical block 2910137229
[454846.589843] Buffer I/O error on device vde1, logical block 2910137230
[454846.679274] EXT4-fs warning (device vde1): ext4_end_bio:349: I/O error 10 writing to inode 363726730 starting block 2910137758)
[454890.033586] blk_update_request: I/O error, dev vde, sector 8844595184 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
[454890.034461] EXT4-fs warning (device vde1): ext4_end_bio:349: I/O error 10 writing to inode 337383482 starting block 1105574400)
[454890.034475] blk_update_request: I/O error, dev vde, sector 8844595200 op 0x1:(WRITE) flags 0x4000 phys_seg 254 prio class 0
[454890.081856] EXT4-fs warning (device vde1): ext4_end_bio:349: I/O error 10 writing to inode 337383482 starting block 1105574657)
[454890.082167] Buffer I/O error on device vde1, logical block 1105574144
[454890.083020] Buffer I/O error on device vde1, logical block 1105574145
[454890.083927] Buffer I/O error on device vde1, logical block 1105574146
[454890.084724] Buffer I/O error on device vde1, logical block 1105574147
[454890.085560] Buffer I/O error on device vde1, logical block 1105574148
[454890.086413] Buffer I/O error on device vde1, logical block 1105574149
[454890.087190] Buffer I/O error on device vde1, logical block 1105574150
[454890.087959] Buffer I/O error on device vde1, logical block 1105574151
[454890.088672] Buffer I/O error on device vde1, logical block 1105574152
[454890.089426] Buffer I/O error on device vde1, logical block 1105574153
[502429.263152] blk_update_request: I/O error, dev vdc, sector 4371581952 op 0x1:(WRITE) flags 0x0 phys_seg 128 prio class 0
[502429.265592] blk_update_request: I/O error, dev vdc, sector 4371582976 op 0x1:(WRITE) flags 0x0 phys_seg 128 prio class 0
[502429.266449] blk_update_request: I/O error, dev vdc, sector 4371584000 op 0x1:(WRITE) flags 0x0 phys_seg 128 prio class 0
[502429.267305] blk_update_request: I/O error, dev vdc, sector 4371585024 op 0x1:(WRITE) flags 0x0 phys_seg 128 prio class 0
[502429.268211] blk_update_request: I/O error, dev vdc, sector 4371586048 op 0x1:(WRITE) flags 0x0 phys_seg 128 prio class 0
[502429.269015] blk_update_request: I/O error, dev vdc, sector 4371587072 op 0x1:(WRITE) flags 0x0 phys_seg 127 prio class 0
[513402.941444] blk_update_request: I/O error, dev vda, sector 1401241600 op 0x1:(WRITE) flags 0x0 phys_seg 87 prio class 0
[513402.942215] blk_update_request: I/O error, dev vda, sector 1401242296 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
[513402.943011] blk_update_request: I/O error, dev vda, sector 1401242312 op 0x1:(WRITE) flags 0x0 phys_seg 39 prio class 0
[513402.943751] blk_update_request: I/O error, dev vda, sector 1401242624 op 0x1:(WRITE) flags 0x0 phys_seg 94 prio class 0
[513402.944439] blk_update_request: I/O error, dev vda, sector 1401243424 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
[513402.945132] blk_update_request: I/O error, dev vda, sector 1401243456 op 0x1:(WRITE) flags 0x0 phys_seg 23 prio class 0
[513840.231868] blk_update_request: I/O error, dev vdc, sector 965668864 op 0x1:(WRITE) flags 0x0 phys_seg 128 prio class 0
[513840.233526] blk_update_request: I/O error, dev vdc, sector 965669888 op 0x1:(WRITE) flags 0x0 phys_seg 116 prio class 0
[513840.234270] blk_update_request: I/O error, dev vdc, sector 965670912 op 0x1:(WRITE) flags 0x0 phys_seg 8 prio class 0
[513840.235015] blk_update_request: I/O error, dev vdc, sector 965670976 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
[513840.235761] blk_update_request: I/O error, dev vdc, sector 965670992 op 0x1:(WRITE) flags 0x0 phys_seg 118 prio class 0

lsblk output:
Code:
NAME    MAJ:MIN RM  SIZE RO TYPE   MOUNTPOINT
loop0     7:0    0  200G  0 loop   /srv/dev-disk-by-label-Array/CCTV1
loop1     7:1    0  200G  0 loop   /srv/dev-disk-by-label-Array/CCTV2
loop2     7:2    0  200G  0 loop   /srv/dev-disk-by-label-Array/CCTV3
loop3     7:3    0  200G  0 loop   /srv/dev-disk-by-label-Array/CCTV4
loop4     7:4    0  200G  0 loop   /srv/dev-disk-by-label-Array/CCTV5
sda       8:0    0    6G  0 disk
├─sda1    8:1    0    4G  0 part   /
├─sda2    8:2    0    1K  0 part
└─sda5    8:5    0    2G  0 part   [SWAP]
sr0      11:0    1  580M  0 rom
vda     254:0    0  2.7T  0 disk
└─md127   9:127  0  5.5T  0 raid10 /srv/dev-disk-by-label-Array
vdb     254:16   0  2.7T  0 disk
└─md127   9:127  0  5.5T  0 raid10 /srv/dev-disk-by-label-Array
vdc     254:32   0  2.7T  0 disk
└─md127   9:127  0  5.5T  0 raid10 /srv/dev-disk-by-label-Array
vdd     254:48   0  2.7T  0 disk
└─md127   9:127  0  5.5T  0 raid10 /srv/dev-disk-by-label-Array
vde     254:64   0 10.9T  0 disk
└─vde1  254:65   0 10.9T  0 part   /srv/dev-disk-by-label-WD12

I honestly doubt that all the disks are failing at once, even the brand new one, so my fear is that there's something wrong with the VM, or something got messed up with the upgrade to Proxmox 7.
Can anyone give me some hints on what could be happening here?
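For what it's worth, one way to rule out genuine hardware failure would be to check the passthrough disks' SMART data directly from the Proxmox host rather than inside the VM (just a sketch; the by-id path is one of my WD Reds, repeat for each disk):

Code:
# run on the Proxmox host, not inside the OMV VM
smartctl -a /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N1JP0EY0
# check the host kernel log for real ATA/USB errors around the same timestamps
dmesg -T | grep -iE 'ata[0-9]|i/o error|usb' | tail -n 50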
 
please also show pveversion -v, the VM config and the hypervisor logs from around the same time..
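(For reference, that information can be collected on the host roughly like this; <vmid> stands in for the VM's ID:)

Code:
pveversion -v
cat /etc/pve/qemu-server/<vmid>.conf
journalctl --since "1 hour ago"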
 
please also show pveversion -v, the VM config and the hypervisor logs from around the same time..

pveversion -v:

Code:
root@pve:~# pveversion -v
proxmox-ve: 7.0-2 (running kernel: 5.11.22-3-pve)
pve-manager: 7.0-10 (running version: 7.0-10/d2f465d3)
pve-kernel-5.11: 7.0-6
pve-kernel-helper: 7.0-6
pve-kernel-5.4: 6.4-5
pve-kernel-5.11.22-3-pve: 5.11.22-6
pve-kernel-5.4.128-1-pve: 5.4.128-1
pve-kernel-5.4.124-1-pve: 5.4.124-2
ceph-fuse: 14.2.21-1
corosync: 3.1.2-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
libjs-extjs: 7.0.0-1
libknet1: 1.21-pve1
libproxmox-acme-perl: 1.2.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-5
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-2
libpve-storage-perl: 7.0-9
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.7-1
proxmox-backup-file-restore: 2.0.7-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-6
pve-cluster: 7.0-3
pve-container: 4.0-8
pve-docs: 7.0-5
pve-edk2-firmware: 3.20200531-1
pve-firewall: 4.2-2
pve-firmware: 3.2-4
pve-ha-manager: 3.3-1
pve-i18n: 2.4-1
pve-qemu-kvm: 6.0.0-2
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-11
smartmontools: 7.2-pve2
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.0.5-pve1
root@pve:~#

VM Config:

Code:
root@pve:/etc/pve/qemu-server# cat 100.conf
boot: order=scsi0;ide2;net0
cores: 2
ide2: local:iso/openmediavault_5.5.11-amd64.iso,media=cdrom
memory: 4096
name: OMV
net0: virtio=7A:5C:C0:93:8A:F7,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: l26
scsi0: local:100/vm-100-disk-0.qcow2,size=6G
scsihw: virtio-scsi-pci
smbios1: uuid=be80ed6c-d528-45e0-8bdf-8f3b96d158f5
sockets: 1
virtio1: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N1JP0EY0,size=2930266584K
virtio2: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N4UEJS7U,size=2930266584K
virtio3: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7PE7034,size=2930266584K
virtio4: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7PE74ZU,size=2930266584K
virtio5: /dev/disk/by-id/usb-WD_Elements_25A3_35504B3258353242-0:0,size=11444192M
vmgenid: 2ac6e010-0881-4df0-8986-21bdc43fd3a6

[PENDING]
sockets: 2

PVE syslog, but I see no reference to that VM id:
Code:
Aug 11 03:20:00 pve systemd[1]: Starting Proxmox VE replication runner...
Aug 11 03:20:02 pve systemd[1]: pvesr.service: Succeeded.
Aug 11 03:20:02 pve systemd[1]: Finished Proxmox VE replication runner.
Aug 11 03:20:02 pve systemd[1]: pvesr.service: Consumed 1.142s CPU time.
Aug 11 03:20:13 pve kernel: [749491.787596] audit: type=1400 audit(1628644813.449:126729): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-103_</var/lib/lxc>" name="/run/systemd/unit-root/proc/" pid=1383
80 comm="(d-logind)" fstype="proc" srcname="proc" flags="rw, nosuid, nodev, noexec"
Aug 11 03:20:13 pve kernel: [749491.823951] audit: type=1400 audit(1628644813.485:126730): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-103_</var/lib/lxc>" name="/run/systemd/unit-root/proc/" pid=1383
85 comm="(d-logind)" fstype="proc" srcname="proc" flags="rw, nosuid, nodev, noexec"
Aug 11 03:20:13 pve kernel: [749491.860138] audit: type=1400 audit(1628644813.521:126731): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-103_</var/lib/lxc>" name="/run/systemd/unit-root/proc/" pid=1383
89 comm="(d-logind)" fstype="proc" srcname="proc" flags="rw, nosuid, nodev, noexec"
Aug 11 03:20:13 pve kernel: [749491.896529] audit: type=1400 audit(1628644813.557:126732): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-103_</var/lib/lxc>" name="/run/systemd/unit-root/proc/" pid=1383
93 comm="(d-logind)" fstype="proc" srcname="proc" flags="rw, nosuid, nodev, noexec"
Aug 11 03:20:13 pve kernel: [749491.931841] audit: type=1400 audit(1628644813.593:126733): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-103_</var/lib/lxc>" name="/run/systemd/unit-root/proc/" pid=1383
97 comm="(d-logind)" fstype="proc" srcname="proc" flags="rw, nosuid, nodev, noexec"
Aug 11 03:20:39 pve kernel: [749517.812425] audit: type=1400 audit(1628644839.473:126734): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-103_</var/lib/lxc>" name="/run/systemd/unit-root/proc/" pid=1385
30 comm="(d-logind)" fstype="proc" srcname="proc" flags="rw, nosuid, nodev, noexec"
Aug 11 03:20:39 pve kernel: [749517.850357] audit: type=1400 audit(1628644839.513:126735): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-103_</var/lib/lxc>" name="/run/systemd/unit-root/proc/" pid=1385
34 comm="(d-logind)" fstype="proc" srcname="proc" flags="rw, nosuid, nodev, noexec"
Aug 11 03:20:39 pve kernel: [749517.886472] audit: type=1400 audit(1628644839.549:126736): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-103_</var/lib/lxc>" name="/run/systemd/unit-root/proc/" pid=1385
38 comm="(d-logind)" fstype="proc" srcname="proc" flags="rw, nosuid, nodev, noexec"
Aug 11 03:20:39 pve kernel: [749517.922345] audit: type=1400 audit(1628644839.585:126737): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-103_</var/lib/lxc>" name="/run/systemd/unit-root/proc/" pid=1385
42 comm="(d-logind)" fstype="proc" srcname="proc" flags="rw, nosuid, nodev, noexec"
Aug 11 03:20:39 pve kernel: [749517.953789] audit: type=1400 audit(1628644839.617:126738): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-103_</var/lib/lxc>" name="/run/systemd/unit-root/proc/" pid=1385
46 comm="(d-logind)" fstype="proc" srcname="proc" flags="rw, nosuid, nodev, noexec"
Aug 11 03:21:00 pve systemd[1]: Starting Proxmox VE replication runner...
Aug 11 03:21:02 pve systemd[1]: pvesr.service: Succeeded.
Aug 11 03:21:02 pve systemd[1]: Finished Proxmox VE replication runner.
Aug 11 03:21:02 pve systemd[1]: pvesr.service: Consumed 1.274s CPU time.
Aug 11 03:21:05 pve kernel: [749543.828476] audit: type=1400 audit(1628644865.489:126739): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-103_</var/lib/lxc>" name="/run/systemd/unit-root/proc/" pid=1387
15 comm="(d-logind)" fstype="proc" srcname="proc" flags="rw, nosuid, nodev, noexec"
Aug 11 03:21:05 pve kernel: [749543.865424] audit: type=1400 audit(1628644865.525:126740): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-103_</var/lib/lxc>" name="/run/systemd/unit-root/proc/" pid=1387
19 comm="(d-logind)" fstype="proc" srcname="proc" flags="rw, nosuid, nodev, noexec"
Aug 11 03:21:05 pve kernel: [749543.902842] audit: type=1400 audit(1628644865.565:126741): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-103_</var/lib/lxc>" name="/run/systemd/unit-root/proc/" pid=1387
23 comm="(d-logind)" fstype="proc" srcname="proc" flags="rw, nosuid, nodev, noexec"
Aug 11 03:21:05 pve kernel: [749543.938081] audit: type=1400 audit(1628644865.601:126742): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-103_</var/lib/lxc>" name="/run/systemd/unit-root/proc/" pid=1387
27 comm="(d-logind)" fstype="proc" srcname="proc" flags="rw, nosuid, nodev, noexec"
Aug 11 03:21:05 pve kernel: [749543.970092] audit: type=1400 audit(1628644865.633:126743): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-103_</var/lib/lxc>" name="/run/systemd/unit-root/proc/" pid=1387
31 comm="(d-logind)" fstype="proc" srcname="proc" flags="rw, nosuid, nodev, noexec"
Aug 11 03:21:31 pve kernel: [749569.843288] audit: type=1400 audit(1628644891.505:126744): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-103_</var/lib/lxc>" name="/run/systemd/unit-root/proc/" pid=1388
47 comm="(d-logind)" fstype="proc" srcname="proc" flags="rw, nosuid, nodev, noexec"
Aug 11 03:21:31 pve kernel: [749569.880429] audit: type=1400 audit(1628644891.541:126745): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-103_</var/lib/lxc>" name="/run/systemd/unit-root/proc/" pid=1388
51 comm="(d-logind)" fstype="proc" srcname="proc" flags="rw, nosuid, nodev, noexec"
Aug 11 03:21:31 pve kernel: [749569.917275] audit: type=1400 audit(1628644891.577:126746): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-103_</var/lib/lxc>" name="/run/systemd/unit-root/proc/" pid=1388
55 comm="(d-logind)" fstype="proc" srcname="proc" flags="rw, nosuid, nodev, noexec"
Aug 11 03:21:31 pve kernel: [749569.950935] audit: type=1400 audit(1628644891.613:126747): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-103_</var/lib/lxc>" name="/run/systemd/unit-root/proc/" pid=1388
59 comm="(d-logind)" fstype="proc" srcname="proc" flags="rw, nosuid, nodev, noexec"
Aug 11 03:21:31 pve kernel: [749569.987017] audit: type=1400 audit(1628644891.649:126748): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-103_</var/lib/lxc>" name="/run/systemd/unit-root/proc/" pid=1388
63 comm="(d-logind)" fstype="proc" srcname="proc" flags="rw, nosuid, nodev, noexec"
 
can you try adding ',aio=native' to the virtio drives in the VM config, and then do a stop and start of the VM?
 
can you try adding ',aio=native' to the virtio drives in the VM config, and then do a stop and start of the VM?
You mean adding it like this, and only to the following devices?
Code:
virtio1: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N1JP0EY0,size=2930266584K,aio=native
virtio2: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N4UEJS7U,size=2930266584K,aio=native
virtio3: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7PE7034,size=2930266584K,aio=native
virtio4: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7PE74ZU,size=2930266584K,aio=native
virtio5: /dev/disk/by-id/usb-WD_Elements_25A3_35504B3258353242-0:0,size=11444192M,aio=native
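
And I assume the equivalent from the host shell would be something along these lines (one qm set per disk, then a full stop/start rather than a reboot inside the guest):

Code:
qm set 100 --virtio1 /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N1JP0EY0,size=2930266584K,aio=native
# ...repeat for virtio2 to virtio5...
qm stop 100 && qm start 100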
 
yes
 

After the change, the RAID 10 array is no longer getting mounted:

Code:
Aug 11 12:31:54 nas01 monit[904]: Lookup for '/srv/dev-disk-by-label-Array' filesystem failed  -- not found in /proc/self/mounts
Aug 11 12:31:54 nas01 monit[904]: Filesystem '/srv/dev-disk-by-label-Array' not mounted
Aug 11 12:31:54 nas01 monit[904]: 'filesystem_srv_dev-disk-by-label-Array' unable to read filesystem '/srv/dev-disk-by-label-Array' state
Aug 11 12:31:55 nas01 postfix/smtp[1013]: C045327A5: replace: header Subject: Monitoring restart -- Does not exist filesystem_srv: Subject: [nas01.XXXXXX] Monitoring restart -- Does not exist filesystem_srv_dev-disk-by-label-Array
Aug 11 12:32:25 nas01 monit[904]: Filesystem '/srv/dev-disk-by-label-Array' not mounted
Aug 11 12:32:25 nas01 monit[904]: 'filesystem_srv_dev-disk-by-label-Array' unable to read filesystem '/srv/dev-disk-by-label-Array' state
Aug 11 12:32:25 nas01 monit[904]: 'filesystem_srv_dev-disk-by-label-Array' trying to restart
Aug 11 12:32:25 nas01 monit[904]: 'mountpoint_srv_dev-disk-by-label-Array' status failed (1) -- /srv/dev-disk-by-label-Array is not a mountpoint
 
I tried to revert the change, but the system is still not mounting the RAID, and I had 6TB of data in it...
 

Update: I had to manually restart the md RAID and it then worked fine, so I switched the VM off again and reapplied the aio flag to the virtio devices.
Once rebooted, the array started up properly.
I will keep monitoring the situation for about 24 hours and see how it goes.
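
In case someone else ends up in the same spot: restarting the md array from inside the OMV guest looks roughly like this (a sketch, not the exact steps I ran; the device names are the ones from the lsblk output above, adjust to your setup):

Code:
# inside the OMV VM: re-assemble the array from its member disks and remount it
mdadm --assemble /dev/md127 /dev/vda /dev/vdb /dev/vdc /dev/vdd
# or let mdadm find the members on its own: mdadm --assemble --scan
mount /dev/md127 /srv/dev-disk-by-label-Array
cat /proc/mdstat   # verify the array is active and not degraded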
 
I have the same issue, but so far only on one VM.


I have tried setting aio=native with no cache, but this makes things even worse: the VM completely freezes.

Any other ideas on how to resolve the issue?
Thanks in advance.
 
After upgrading several Proxmox 6 boxes to Proxmox 7.1 I've got exactly the same problem,

but only on CentOS 6 KVM virtual machines.

CentOS 7 and CentOS 8 / AlmaLinux 8 KVM virtual servers are not having this problem.

The CentOS 6 KVM virtual servers randomly go read-only after throwing disk I/O errors, and journald stops inside the VM.

This is an urgent problem.

I will add further screenshots and PVE server info in another post.

Current Proxmox node systems:

4x 2TB SSDs, ZFS file system in ZFS RAID 10; the KVM virtual servers use the ext4 file system internally.
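
(Side note: once a guest has remounted its filesystem read-only after these errors, a plain remount may not be enough; recovery inside the affected VM is roughly along these lines. Just a sketch, not the exact steps used here:)

Code:
# inside the affected guest: confirm the root filesystem went read-only
dmesg | grep -i 'read-only'
# a remount can work if the underlying I/O errors have stopped
mount -o remount,rw /
# otherwise force a filesystem check on the next boot and reboot
touch /forcefsck
reboot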
 
I will add other screenshots and pve server info in another post..
Code:
root@r5:~# pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.13.19-1-pve)
pve-manager: 7.1-5 (running version: 7.1-5/6fe299a0)
pve-kernel-5.13: 7.1-4
pve-kernel-helper: 7.1-4
pve-kernel-5.4: 6.4-7
pve-kernel-5.13.19-1-pve: 5.13.19-2
pve-kernel-5.4.143-1-pve: 5.4.143-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: 0.8.36+pve1
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-14
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.0-3
libpve-storage-perl: 7.0-15
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.14-1
proxmox-backup-file-restore: 2.0.14-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.4-2
pve-cluster: 7.1-2
pve-container: 4.1-2
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-3
pve-ha-manager: 3.3-1
pve-i18n: 2.6-1
pve-qemu-kvm: 6.1.0-2
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-3
smartmontools: 7.2-pve2
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3

Code:
 cat /etc/pve/qemu-server/137.conf
bootdisk: virtio0
cores: 4
cpu: host
ide2: none,media=cdrom
memory: 4096
name: xxx.xxxx.com
net0: virtio=DA:AA:99:95:1A:F4,bridge=vmbr0
numa: 1
onboot: 1
ostype: l26
smbios1: uuid=182c525f-da14-4513-89dd-41753807b6a1
sockets: 1
virtio0: local-zfs:vm-137-disk-0,size=400G
 
On my side only Ubuntu VMs are affected, but I think this is a shot in the dark; how could this be related to the guest OS distribution?

Here are the details about my hypervisor:

Code:
root@proxmox-node-1.home.lan:~# pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.13.19-1-pve)
pve-manager: 7.1-6 (running version: 7.1-6/4e61e21c)
pve-kernel-5.13: 7.1-4
pve-kernel-helper: 7.1-4
pve-kernel-5.11: 7.0-10
pve-kernel-5.13.19-1-pve: 5.13.19-3
pve-kernel-5.11.22-7-pve: 5.11.22-12
ceph-fuse: 15.2.15-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-14
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.0-3
libpve-storage-perl: 7.0-15
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
openvswitch-switch: 2.15.0+ds1-2
proxmox-backup-client: 2.1.2-1
proxmox-backup-file-restore: 2.1.2-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.4-3
pve-cluster: 7.1-2
pve-container: 4.1-2
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-3
pve-ha-manager: 3.3-1
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.0-2
pve-xtermjs: 4.12.0-1
pve-zsync: 2.2
qemu-server: 7.1-4
smartmontools: 7.2-pve2
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3
root@proxmox-node-1.home.lan:~#

Code:
root@proxmox-node-1.home.lan:~# cat /etc/pve/qemu-server/111.conf
#SERV%3A GITLAB, POSTGRESQL, LLDP, NRPE, SNMP, PUPPET, POSTFIX
#IP%3A 192.168.0.21
#VLAN%3A 30
#PAT%3A NONE
agent: 1
boot: order=virtio0;ide2;net0
cores: 2
cpu: host
ide2: none,media=cdrom
memory: 3072
name: behemoth.home.lan
net0: virtio=12:34:56:69:ED:70,bridge=vmbr1,tag=30
numa: 0
ostype: l26
scsihw: virtio-scsi-pci
serial0: socket
smbios1: uuid=21bcc491-27a8-4195-8772-1373b7db83e3
sockets: 1
virtio0: zpool-ssd-01:vm-111-disk-0,cache=writeback,size=24G
vmgenid: 7d086a51-865b-4bf9-891c-f444a0d17fc1
root@proxmox-node-1.home.lan:~#

I even tried moving the VM to different physical disks in another ZFS pool, but that doesn't help.
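
(For reference, the disk move itself can be done from the GUI or roughly like this on the host; "other-pool" is just a placeholder for the target storage:)

Code:
# move the VM's disk to another storage while keeping the VM config intact
qm move_disk 111 virtio0 other-pool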
 
I noticed the same behaviour over the last few weeks. I have two nodes in the cluster, but so far I have only seen it on one of them. The only difference is that this one keeps its volumes in a ZFS pool, so that was my first suspicion. But the OP doesn't seem to use ZFS...
EDIT: Another difference between my nodes is NVMe vs. SATA SSD; the NVMe one with ZFS is the node showing the errors.

I already tried a few things, like aio=native or even changing the filesystems in the VMs from BTRFS to ext4, with no change. It looks like this only occurs under heavy I/O load, e.g. a backup to PBS or a BTRFS scrub.
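
(For reproducing it, a BTRFS scrub inside one of the VMs is an easy way to generate that kind of load; a sketch, with "/data" as a placeholder mount point:)

Code:
btrfs scrub start -B /data    # -B runs in the foreground until the scrub finishes
btrfs scrub status /data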

Code:
root@apollon:~# pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.13.19-1-pve)
pve-manager: 7.1-5 (running version: 7.1-5/6fe299a0)
pve-kernel-5.13: 7.1-4
pve-kernel-helper: 7.1-4
pve-kernel-5.13.19-1-pve: 5.13.19-2
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-14
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.0-3
libpve-storage-perl: 7.0-15
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.14-1
proxmox-backup-file-restore: 2.0.14-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.4-2
pve-cluster: 7.1-2
pve-container: 4.1-2
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-3
pve-ha-manager: 3.3-1
pve-i18n: 2.6-1
pve-qemu-kvm: 6.1.0-2
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-3
smartmontools: 7.2-pve2
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3

Code:
root@apollon:/etc/pve/qemu-server# cat 303.conf
agent: 1,fstrim_cloned_disks=1
balloon: 0
bios: ovmf
boot: order=virtio1
cores: 2
cpu: host
cpuunits: 512
efidisk0: local-lvm:vm-303-disk-0,size=4M
machine: q35
memory: 4096
name: test
net0: virtio=1A:B5:3C:B0:B9:3D,bridge=vmbr103,firewall=1
numa: 0
onboot: 1
ostype: l26
scsi0: /dev/disk/by-id/usb-WD_Elements_25A3_3551474738333746-0:0,aio=native,backup=0,replicate=0,size=11444192M
scsi1: /dev/disk/by-id/usb-WD_Elements_25A3_355147474A554B46-0:0,aio=native,backup=0,replicate=0,size=11444192M
scsihw: virtio-scsi-pci
smbios1: uuid=b234879d-e40d-4407-a043-f33476470440
sockets: 1
virtio1: local-lvm:vm-303-disk-1,aio=native,discard=on,size=20G
virtio2: WD_Black:vm-303-disk-0,aio=native,discard=on,size=100G
vmgenid: 81201fb1-f858-4b93-8f83-04818a2a248e

There are also two other VMs with this error, but their config is quite similar to this one, just without the USB passthrough.


EDIT2:
I have now moved one VM completely to my LVM SSD storage, and no more errors occur:

Code:
agent: 1,fstrim_cloned_disks=1
balloon: 0
bios: ovmf
boot: order=virtio1
cores: 2
cpu: host
efidisk0: local-lvm:vm-301-disk-0,size=128K
machine: q35
memory: 2048
name: DMZ-entry
net0: virtio=xx:xx:xx:xx:xx:xx,bridge=vmbr101,firewall=1
numa: 0
onboot: 1
ostype: l26
scsihw: virtio-scsi-pci
sockets: 1
startup: order=2,up=60
virtio0: local-lvm:vm-301-disk-2,discard=on,size=20G
virtio1: local-lvm:vm-301-disk-1,discard=on,size=20G
virtio3: local-lvm:vm-301-disk-3,discard=on,size=10G
virtio4: local-lvm:vm-301-disk-4,discard=on,size=10G
 
This morning I tried to limit the network and disk speed to 1 MB/s, but the problem still persists.
From my point of view it is not about heavy load as such, but about the amount of data being written to the disk.
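
(For reference, limits of that kind can be set per device on the host; a sketch reusing the disk and NIC from the config I posted earlier, with the limits in MB/s:)

Code:
qm set 111 --virtio0 zpool-ssd-01:vm-111-disk-0,cache=writeback,size=24G,mbps_rd=1,mbps_wr=1
qm set 111 --net0 virtio=12:34:56:69:ED:70,bridge=vmbr1,tag=30,rate=1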


The GitLab installation is one huge package that contains everything related to this software.

Just to make sure the error doesn't come back after a few minutes of idling, I rebooted the server, logged in and started iotop to see how the system behaves. I sporadically saw disk reads by different GitLab processes at the maximum speed (5 MB/s, I increased the limit this time), but after 5 minutes of normal work, without this kind of huge write, the error has not appeared so far.

 
So maybe the last test we can do is to simulate many small writes, just to check whether the error appears or not.

Because so far, as mentioned above, the issue only comes up when trying to store/write a big piece of data.
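
A quick way to generate that many-small-writes load inside the VM would be fio (just a sketch, the parameters are arbitrary):

Code:
# many small random writes with direct I/O, to mimic the suspected pattern
fio --name=smallwrites --directory=/var/tmp --rw=randwrite --bs=4k \
    --size=1G --numjobs=4 --iodepth=16 --ioengine=libaio --direct=1 --group_reporting
# one large sequential write for comparison, which is what has triggered the errors so far
fio --name=bigwrite --directory=/var/tmp --rw=write --bs=1M --size=8G --direct=1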
 

Hey, my problem is now solved!

Solution: switch all disks from VirtIO (Block) to VirtIO SCSI.

Detach each disk from the VM and reattach it using SCSI.

Set up the new boot order for the SCSI disk.

Out of 20 KVM virtual servers, all running CentOS 6, only one suffered some minor data loss, and that has since been restored.

No problems with SCSI so far...
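
For anyone who prefers the CLI, the same detach/reattach should be doable with qm set; a rough sketch based on the 137.conf I posted earlier:

Code:
# detach the VirtIO block disk (the volume stays in the config as an 'unusedX' entry)
qm set 137 --delete virtio0
# reattach the same volume as a SCSI disk on the VirtIO SCSI controller
qm set 137 --scsihw virtio-scsi-pci
qm set 137 --scsi0 local-zfs:vm-137-disk-0,size=400G
# point the boot order at the new disk, then stop and start the VM
qm set 137 --boot 'order=scsi0;ide2;net0'
qm stop 137 && qm start 137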
 
