I'm facing a similar, if not identical, problem here, except with newer versions. I figured I'd not necro the old thread, but if the moderation team prefers, please merge as you see fit.
Here's my pveversion output:
Bash:
# pveversion -v
proxmox-ve: 7.4-1 (running kernel: 5.15.107-2-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-5.15: 7.4-3
pve-kernel-5.13: 7.1-9
pve-kernel-5.15.107-2-pve: 5.15.107-2
pve-kernel-5.15.74-1-pve: 5.15.74-1
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-2-pve: 5.13.19-4
ceph-fuse: 14.2.21-1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: 0.8.36+pve2
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-3
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-1
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.6
libpve-storage-perl: 7.4-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
openvswitch-switch: 2.15.0+ds1-2+deb11u4
proxmox-backup-client: 2.4.2-1
proxmox-backup-file-restore: 2.4.2-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.1-1
proxmox-widget-toolkit: 3.7.0
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-2
pve-firewall: 4.3-2
pve-firmware: 3.6-5
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.11-pve1
On the storage front, I have MD RAID 6 on my VM host:
Bash:
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 7.3T 0 disk
├─sda1 8:1 0 29.8G 0 part
│ └─md0 9:0 0 29.8G 0 raid1 /boot
├─sda2 8:2 0 238.4G 0 part
│ └─md1 9:1 0 238.3G 0 raid1 [SWAP]
└─sda3 8:3 0 7T 0 part
└─md2 9:2 0 21T 0 raid6 /
sdb 8:16 0 7.3T 0 disk
├─sdb1 8:17 0 29.8G 0 part
│ └─md0 9:0 0 29.8G 0 raid1 /boot
├─sdb2 8:18 0 238.4G 0 part
│ └─md1 9:1 0 238.3G 0 raid1 [SWAP]
└─sdb3 8:19 0 7T 0 part
└─md2 9:2 0 21T 0 raid6 /
sdc 8:32 0 7.3T 0 disk
├─sdc1 8:33 0 29.8G 0 part
│ └─md0 9:0 0 29.8G 0 raid1 /boot
├─sdc2 8:34 0 238.4G 0 part
│ └─md1 9:1 0 238.3G 0 raid1 [SWAP]
└─sdc3 8:35 0 7T 0 part
└─md2 9:2 0 21T 0 raid6 /
sdd 8:48 0 7.3T 0 disk
├─sdd1 8:49 0 29.8G 0 part
│ └─md0 9:0 0 29.8G 0 raid1 /boot
├─sdd2 8:50 0 238.4G 0 part
│ └─md1 9:1 0 238.3G 0 raid1 [SWAP]
└─sdd3 8:51 0 7T 0 part
└─md2 9:2 0 21T 0 raid6 /
sde 8:64 0 7.3T 0 disk
├─sde1 8:65 0 29.8G 0 part
│ └─md0 9:0 0 29.8G 0 raid1 /boot
├─sde2 8:66 0 238.4G 0 part
│ └─md1 9:1 0 238.3G 0 raid1 [SWAP]
└─sde3 8:67 0 7T 0 part
└─md2 9:2 0 21T 0 raid6 /
sr0 11:0 1 1024M 0 rom
sr1 11:1 1 1024M 0 rom
For the VM, this is the config:
I've tried a combination of other settings on this drive (`aio=native` with no cache, `aio=threads` with no cache, `aio=native` with `cache=writeback`) and none of them resolve the problem.
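For reference, this is roughly how I've been switching between those combinations (the VMID `100` and the volume name here are placeholders for my actual ones):

```shell
# Placeholder VMID and volume name; cycle the aio/cache options on the disk.
# aio accepts native | threads | io_uring; cache accepts none | writeback | writethrough
qm set 100 --scsi0 local-lvm:vm-100-disk-0,aio=native,cache=none
qm set 100 --scsi0 local-lvm:vm-100-disk-0,aio=threads,cache=none
qm set 100 --scsi0 local-lvm:vm-100-disk-0,aio=native,cache=writeback
```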
The problem manifests as such:
1. Boot the VM guest; things seem to start okay. Once loaded, `mount` shows the volumes mounted in read-only mode.
2. A little later, this pops up in `dmesg`, and the volumes get remounted `rw`:
Code:
[ 278.541093] EXT4-fs (sda1): mounted filesystem with ordered data mode. Quota mode: journalled.
[ 279.131221] EXT4-fs (sda1): re-mounted. Quota mode: journalled.
[ 299.986377] EXT4-fs (sda1): re-mounted. Quota mode: journalled.
3. I attempt some disk writes (downloading a large file, for example) and multiple errors start to show in `dmesg`. I've included two iterations, but they repeat several times:
Code:
[ 649.287317] sd 2:0:0:0: [sda] tag#129 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=98s
[ 649.287374] sd 2:0:0:0: [sda] tag#129 Sense Key : Aborted Command [current]
[ 649.287380] sd 2:0:0:0: [sda] tag#129 Add. Sense: I/O process terminated
[ 649.287389] sd 2:0:0:0: [sda] tag#129 CDB: Write(16) 8a 00 00 00 00 08 35 f4 08 00 00 00 0a 00 00 00
[ 649.287397] I/O error, dev sda, sector 35264923648 op 0x1:(WRITE) flags 0x4000 phys_seg 20 prio class 2
[ 649.291416] sd 2:0:0:0: [sda] tag#130 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=98s
[ 649.291454] sd 2:0:0:0: [sda] tag#130 Sense Key : Aborted Command [current]
[ 649.291457] sd 2:0:0:0: [sda] tag#130 Add. Sense: I/O process terminated
[ 649.291459] sd 2:0:0:0: [sda] tag#130 CDB: Write(16) 8a 00 00 00 00 08 35 f4 12 00 00 00 0a 00 00 00
[ 649.291461] I/O error, dev sda, sector 35264926208 op 0x1:(WRITE) flags 0x4000 phys_seg 20 prio class 2
4. Some more Buffer I/O error messages come up, and eventually the filesystem is remounted read-only:
Code:
[ 649.329506] Buffer I/O error on device sda1, logical block 4408115202
[ 649.335991] Buffer I/O error on device sda1, logical block 4408115203
[ 649.338448] Buffer I/O error on device sda1, logical block 4408115204
[ 649.340823] Buffer I/O error on device sda1, logical block 4408115205
[ 649.341301] EXT4-fs warning (device sda1): ext4_end_bio:343: I/O error 10 writing to inode 68681852 starting block 4408119424)
[ 649.343738] Buffer I/O error on device sda1, logical block 4408115206
[ 649.346230] Buffer I/O error on device sda1, logical block 4408115207
[ 649.348517] Buffer I/O error on device sda1, logical block 4408115208
[ 649.350862] Buffer I/O error on device sda1, logical block 4408115209
[ 649.370613] EXT4-fs warning (device sda1): ext4_end_bio:343: I/O error 10 writing to inode 68681852 starting block 4408121472)
[ 649.891043] Aborting journal on device sda1-8.
[ 649.907671] EXT4-fs error (device sda1): ext4_journal_check_start:83: comm rsync: Detected aborted journal
[ 650.002353] EXT4-fs warning (device sda1): ext4_end_bio:343: I/O error 10 writing to inode 68681852 starting block 4408124608)
[ 650.002505] EXT4-fs error (device sda1): ext4_journal_check_start:83: comm kworker/u64:0: Detected aborted journal
[ 650.092468] EXT4-fs warning (device sda1): ext4_end_bio:343: I/O error 10 writing to inode 68681852 starting block 4408125659)
[ 650.092543] EXT4-fs warning (device sda1): ext4_end_bio:343: I/O error 10 writing to inode 68681852 starting block 4408125696)
[ 650.213881] EXT4-fs warning (device sda1): ext4_end_bio:343: I/O error 10 writing to inode 68681852 starting block 4408126683)
[ 650.327463] EXT4-fs warning (device sda1): ext4_end_bio:343: I/O error 10 writing to inode 68681852 starting block 4408127707)
[ 650.327471] EXT4-fs warning (device sda1): ext4_end_bio:343: I/O error 10 writing to inode 68681852 starting block 4408127744)
[ 650.331726] EXT4-fs warning (device sda1): ext4_end_bio:343: I/O error 10 writing to inode 68681852 starting block 4408128731)
[ 650.370287] EXT4-fs error (device sda1): ext4_journal_check_start:83: comm kworker/u64:13: Detected aborted journal
[ 650.796762] Buffer I/O error on dev sda1, logical block 1341685760, lost sync page write
[ 650.801202] JBD2: I/O error when updating journal superblock for sda1-8.
[ 650.884901] Buffer I/O error on dev sda1, logical block 0, lost sync page write
[ 650.888583] EXT4-fs (sda1): I/O error while writing superblock
[ 650.888721] EXT4-fs (sda1): previous I/O error to superblock detected
[ 650.890747] EXT4-fs (sda1): Remounting filesystem read-only
[ 650.972869] Buffer I/O error on dev sda1, logical block 0, lost sync page write
[ 650.976495] EXT4-fs (sda1): previous I/O error to superblock detected
[ 651.060622] Buffer I/O error on dev sda1, logical block 0, lost sync page write
[ 651.064169] EXT4-fs (sda1): I/O error while writing superblock
[ 651.064227] EXT4-fs (sda1): I/O error while writing superblock
[ 651.066978] EXT4-fs (sda1): failed to convert unwritten extents to written extents -- potential data loss! (inode 68681852, error -30)
[ 651.069236] EXT4-fs (sda1): ext4_writepages: jbd2_start: 27 pages, ino 68681852; err -30
[ 651.073927] EXT4-fs (sda1): failed to convert unwritten extents to written extents -- potential data loss! (inode 68681852, error -30)
[ 651.082278] EXT4-fs (sda1): failed to convert unwritten extents to written extents -- potential data loss! (inode 68681852, error -30)
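I don't think the large download is special: a sustained write like the following should trigger the same errors inside the guest (the target path is a placeholder; point it at a file on the affected filesystem):

```shell
# Sustained sequential write to exercise the failing write path.
# TARGET is a placeholder; override it with a path on the affected filesystem.
TARGET="${TARGET:-/tmp/ddtest.bin}"
dd if=/dev/zero of="$TARGET" bs=1M count=64 conv=fsync status=progress
```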
I've read a couple of links from various sources suggesting the problem is in QEMU and was supposedly fixed in pve-qemu-kvm 6.1.0-3 / qemu 7.2, but I'm already on pve-qemu-kvm 7.2.0-8.
Can someone please recommend further tests, or suggest how I might address this issue?
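I'm happy to run host-side checks and post the output if it helps; these are the ones I'd start with (device names per the lsblk output above):

```shell
# All run on the PVE host, not in the guest.
mdadm --detail /dev/md2              # array state, failed/rebuilding members
cat /proc/mdstat                     # quick sync/resync status
smartctl -a /dev/sda                 # repeat for sdb..sde
dmesg -T | grep -iE 'md2|ata|i/o error'
```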
Thanks!