Sporadic Buffer I/O error on device vda1 inside guest, RAW on LVM on top of DRBD

Would anyone have any idea where I should look?
Just a hunch: can you edit the VM disk and switch the disk's Async IO mode to threads (for cache = writeback/writethrough) or native (for cache = off, none or directsync), respectively?
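For reference, the same switch can be done from the CLI; a minimal sketch with placeholder names (note that qm set replaces the whole drive string, so keep your existing options in it):
Code:
# sketch: re-set the drive string with the new Async IO mode
# <vmid>, <storage>:<volume> are placeholders; keep your other drive options
qm set <vmid> --virtio0 <storage>:<volume>,cache=writethrough,aio=threads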

We're currently looking into a potential regression with the new 5.13 kernel and io_uring, mainly (but potentially not limited to) with VirtIO Block or SATA as the bus controller. Interestingly, we can mainly reproduce such symptoms with Windows guests, but that may just be an easier way to trigger it, or really a different issue. Anyhow, that's why it'd be interesting to see if switching the Async IO mode helps you.

Possibly also worth a try, but potentially not as straightforward as the Async IO mode switch: use SCSI as the disk bus (detach the disk and re-attach it via edit with SCSI). The device names in the VM will change from /dev/vdX to /dev/sdX, though.
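Roughly, via CLI that would look like this (again a sketch with placeholder names; detaching leaves the volume as an unusedX entry, which you then re-attach on a SCSI slot):
Code:
# sketch: detach the VirtIO disk, then re-attach the same volume as SCSI
# best done with the VM powered off
qm set <vmid> --delete virtio1           # volume shows up as unused0 afterwards
qm set <vmid> --scsi1 <storage>:<volume>,cache=writethrough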
 
On it! Thank you.

Would you like any other information?
You are right, the VM is configured like this:
Code:
#172.X.Y.Z
agent: 1
balloon: 0
boot: cd
bootdisk: virtio0
cores: 4
cpu: host
ide2: none,media=cdrom
memory: 20480
name: MED-BDD-5
net0: virtio=XX:XX:XX:XX:XX:XX,bridge=vmbr0
numa: 0
ostype: l26
scsihw: virtio-scsi-pci
smbios1: uuid=0ae67392-7061-4eb8-adce-c902a7a56e62
sockets: 2
startup: up=20
vga: std
virtio0: MD3200i-1-RAID5-2:vm-226-disk-0,cache=writethrough,size=50G
virtio1: MD3200i-1-RAID1-1:vm-226-disk-0,backup=0,cache=writethrough,mbps_rd=80,mbps_wr=80,size=480G
virtio2: MD3200i-1-RAID1-2:vm-226-disk-0,backup=0,cache=writethrough,mbps_rd=80,mbps_wr=80,size=550G
virtio3: MD3200i-1-RAID1-3:vm-226-disk-0,backup=0,cache=writethrough,mbps_rd=80,mbps_wr=80,size=550G
virtio4: MD3200i-1-RAID5-1:vm-226-disk-0,backup=0,cache=writethrough,mbps_rd=80,mbps_wr=80,size=500G

Yes, I even went as far as reducing the bandwidth with the new option in PVE 7. No change at all :(

EDIT: I'm rebooting a node onto the latest kernel first.
 
pveversion -v
Code:
proxmox-ve: 7.1-1 (running kernel: 5.13.19-1-pve)
pve-manager: 7.1-5 (running version: 7.1-5/6fe299a0)
pve-kernel-5.13: 7.1-4
pve-kernel-helper: 7.1-4
pve-kernel-5.11: 7.0-10
pve-kernel-5.4: 6.4-7
pve-kernel-5.13.19-1-pve: 5.13.19-2
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.4.143-1-pve: 5.4.143-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph: 16.2.6-pve2
ceph-fuse: 16.2.6-pve2
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-14
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.0-3
libpve-storage-perl: 7.0-15
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.14-1
proxmox-backup-file-restore: 2.0.14-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.4-2
pve-cluster: 7.1-2
pve-container: 4.1-2
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-3
pve-ha-manager: 3.3-1
pve-i18n: 2.6-1
pve-qemu-kvm: 6.1.0-2
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-3
smartmontools: 7.2-pve2
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3

Changed the VM config to:
Code:
agent: 1
balloon: 0
boot: cd
bootdisk: virtio0
cores: 4
cpu: host
ide2: none,media=cdrom
memory: 20480
name: MED-BDD-5
net0: virtio=46:58:91:52:2A:F7,bridge=vmbr0
numa: 0
ostype: l26
scsihw: virtio-scsi-pci
smbios1: uuid=0ae67392-7061-4eb8-adce-c902a7a56e62
sockets: 2
startup: up=20
vga: std
virtio0: MD3200i-1-RAID5-2:vm-226-disk-0,aio=native,size=50G
virtio1: MD3200i-1-RAID1-1:vm-226-disk-0,aio=native,backup=0,size=480G
virtio2: MD3200i-1-RAID1-2:vm-226-disk-0,aio=native,backup=0,size=550G
virtio3: MD3200i-1-RAID1-3:vm-226-disk-0,aio=native,backup=0,size=550G
virtio4: MD3200i-1-RAID5-1:vm-226-disk-0,aio=native,backup=0,size=500G

Test is running, 170G already transferred. So far so good: no I/O error yet, writing at ~100 MB/s. Let's wait for the transfer to finish and check whether I have corrupted files or I/O errors.
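For the record, this is roughly how I check for the errors inside the guest (just grepping the kernel log for the messages from the thread title):
Code:
# look for new buffer I/O errors in the guest kernel log
dmesg -T | grep -i 'buffer i/o error'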

Thank you for your "hunch", t.lamprecht.

EDIT: Fri 19 Nov 2021 10:36:16 AM CET: at 326G, I/O errors started to appear.

Switching to the SCSI disk bus.
 
New config with the SCSI disk bus:
Code:
agent: 1
balloon: 0
boot: order=scsi0;ide2
cores: 4
cpu: host
ide2: none,media=cdrom
memory: 20480
name: MED-BDD-5
net0: virtio=46:58:91:52:2A:F7,bridge=vmbr0
net1: virtio=3E:30:4B:D0:17:29,bridge=vmbr0,link_down=1,tag=21
numa: 0
ostype: l26
scsi0: MD3200i-1-RAID5-2:vm-226-disk-0,aio=native,size=50G
scsi1: MD3200i-1-RAID1-1:vm-226-disk-0,aio=native,backup=0,size=480G
scsi2: MD3200i-1-RAID1-2:vm-226-disk-0,aio=native,backup=0,size=550G
scsi3: MD3200i-1-RAID1-3:vm-226-disk-0,aio=native,backup=0,size=550G
scsi4: MD3200i-1-RAID5-1:vm-226-disk-0,aio=native,backup=0,size=500G
smbios1: uuid=0ae67392-7061-4eb8-adce-c902a7a56e62
sockets: 2
startup: up=20
vga: std

Test is running. Same incoming speed: 107-115 MB/s, with some dips to 60 MB/s.
In fact, the speed is now around 62 MB/s instead of 100.

UPDATE:
Fri 19 Nov 2021 01:31:51 PM CET -> 539GB transferred, no I/O error, steady 60 MB/s.
I hope the loss of performance does not hurt read/write I/O on the database too much.

Fri 19 Nov 2021 02:15:22 PM CET -> 676G transferred, no I/O error, steady 60 MB/s.

Copy successful. No I/O errors, no corrupted files. BUT ... I/O is slow now and this server (a MariaDB slave) is not catching up with the master anymore. :/
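For reference, this is roughly how I watch the replication lag on the slave (classic MariaDB replication; credentials omitted):
Code:
# seconds the slave is behind the master (NULL means replication stopped)
mysql -e 'SHOW SLAVE STATUS\G' | grep -E 'Seconds_Behind_Master|Slave_SQL_Running'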

Disk activity during the test:
Code:
# iostat 1
Linux 4.19.0-18-amd64 (MED-BDD-5)       11/19/2021      _x86_64_        (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          16.69    0.00    8.56    9.14    0.02   65.58

Device             tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sdd             500.58      5657.36     11433.94   25504125   51545680
sde             152.32      2375.04     10227.19   10706984   46105524
sdc             324.25      3149.08      1301.26   14196480    5866240
sdb               0.05         1.86         0.00       8392          0
sda               2.82        64.98        23.63     292923     106508
dm-0              2.25        42.71         3.69     192545      16632
dm-1              0.04         0.98         0.02       4396         92
dm-2              1.10        16.95        20.12      76429      90716
dm-3              0.06         0.50         0.03       2234        122
dm-4              0.03         0.48         0.00       2185         12
dm-5           1146.21     11176.87     22962.45   50386781  103517720

The data folder is on an LVM volume (dm-5) spanning sdb/c/d/e. They are Dell (Seagate) 15k SAS 600GB drives.
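To double-check which physical disks sit behind dm-5, something like this works:
Code:
# device tree: which sdX devices back each dm-X mapper device
lsblk
# physical devices behind each logical volume
lvs -o +devices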

I switched those disks to cache=writeback,aio=threads for this test.
It was a little slower with no cache and aio=native.
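If it helps to compare the cache/aio combinations more systematically, here is a minimal fio sketch (path, size and job name are placeholders; run it against the data volume inside the guest):
Code:
# sequential 1M writes with direct I/O, roughly matching the transfer workload
fio --name=seqwrite --filename=/path/on/dm-5/fio-test.bin \
    --rw=write --bs=1M --size=4G --direct=1 \
    --ioengine=libaio --iodepth=16
rm /path/on/dm-5/fio-test.bin   # remove the test file afterwards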

EDIT: By the way, I'm ready to give you SSH access if that can help.
 
