On a 12-node cluster, a backup issue caused several CTs' RBD images to become read-only.
Issue: backups caused RBD images to become read-only
On all nodes, a backup job ran at the time of the issue. The backup jobs on two nodes (`g04s01` and `g04s07`) failed.
On `g04s07`, the following was logged at 05:02:20:
Code:
INFO: Error: error trying to connect: error connecting to https://10.1.3.16:8007/ - tcp connect error: Connection timed out (os error 110) (CT 4017)
This implies that PBS was busy - which is plausible, as all nodes in the cluster write to it simultaneously. Unfortunately, no disk statistics are available.
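For future incidents it may help to record I/O statistics on the PBS host, so that load can be confirmed rather than inferred. A minimal sketch, assuming a Debian-based PBS host and the stock `sysstat` package (the config path below is the Debian default, not something taken from this cluster):
Code:
# install sysstat and enable its periodic collector
apt install sysstat
sed -i 's/ENABLED="false"/ENABLED="true"/' /etc/default/sysstat
systemctl enable --now sysstat
# afterwards, replay per-disk utilisation around the backup window
sar -d -s 05:00:00 -e 05:10:00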
On that same node, the following was logged at 05:03:35:
Code:
Removing snap: 100% complete...done.2023-09-03T05:04:15.286+0200 7fdb1df2a700 -1 librbd::Operations: update notification timed-out (CT 4018)
After this, backups of other CTs on that same node failed with:
Code:
INFO: including mount point rootfs ('/') in backup
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
syncfs '/' failed - Input/output error
fsfreeze '/proc/409165/root//' failed - Input/output error
Creating snap: 10% complete...
Creating snap: 100% complete...done.
fsthaw '/proc/409165/root//' failed - Invalid argument
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: sysfs write failed
umount: /mnt/vzsnap0/: not mounted.
command 'umount -l -d /mnt/vzsnap0/' failed: exit code 32
ERROR: Backup of VM 4023 failed - can't map rbd volume vm-4023-disk-0@vzdump: rbd: sysfs write failed
INFO: Failed at 2023-09-03 05:04:41
These timestamps can be correlated with the first RBD image errors:
Code:
05:00:46 rbd: rbd1: encountered watch error: -107 (first error)
05:04:07 blk_update_request: I/O error, dev rbd11, sector 5350480 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0 (first I/O error)
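For reference, these kernel-side entries can be pulled from the journal and lined up with the vzdump task log timestamps; a minimal sketch (the time window matches this incident, the grep pattern is just a guess at the relevant messages):
Code:
# kernel messages from the backup window, filtered to rbd/block-layer errors
journalctl -k --since "2023-09-03 05:00" --until "2023-09-03 05:10" \
    | grep -E 'rbd|blk_update_request'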
On `g04s01`, the following was logged, starting over half an hour later at 05:39:10:
Code:
INFO: Starting Backup of VM 4184 (lxc)
INFO: Backup started at 2023-09-03 05:39:10
INFO: status = running
INFO: CT Name: c04dns-sec03
INFO: including mount point rootfs ('/') in backup
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
syncfs '/' failed - Input/output error
fsfreeze '/proc/311288/root//' failed - Input/output error
Creating snap: 10% complete...
Creating snap: 100% complete...done.
fsthaw '/proc/311288/root//' failed - Invalid argument
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: sysfs write failed
umount: /mnt/vzsnap0/: not mounted.
command 'umount -l -d /mnt/vzsnap0/' failed: exit code 32
ERROR: Backup of VM 4184 failed - can't map rbd volume vm-4184-disk-0@vzdump: rbd: sysfs write failed
INFO: Failed at 2023-09-03 05:39:12
INFO: Starting Backup of VM 4198 (lxc)
INFO: Backup started at 2023-09-03 05:39:12
INFO: status = running
INFO: CT Name: c04smtp-dane10
INFO: including mount point rootfs ('/') in backup
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
syncfs '/' failed - Input/output error
2023-09-03T05:43:33.405+0200 7f49ccfc1700 -1 librbd::ImageWatcher: 0x7f49b0007960 image watch failed: 139954462218688, (107) Transport endpoint is not connected
2023-09-03T05:43:33.405+0200 7f49ccfc1700 -1 librbd::Watcher: 0x7f49b0007960 handle_error: handle=139954462218688: (107) Transport endpoint is not connected
2023-09-03T05:45:04.133+0200 7f49cd7c2700 -1 librbd::ImageWatcher: 0x7f49b0007960 image watch failed: 139954529243872, (107) Transport endpoint is not connected
2023-09-03T05:45:04.133+0200 7f49cd7c2700 -1 librbd::Watcher: 0x7f49b0007960 handle_error: handle=139954529243872: (107) Transport endpoint is not connected
2023-09-03T05:48:13.801+0200 7f49ccfc1700 -1 librbd::ManagedLock: 0x7f49b4000dc8 send_reacquire_lock: aborting reacquire due to invalid watch handle
2023-09-03T05:52:37.528+0200 7f49ccfc1700 -1 librbd::ManagedLock: 0x7f49b4000dc8 send_acquire_lock: watcher not registered - delaying request
2023-09-03T05:54:37.441+0200 7f49ccfc1700 -1 librbd::ImageWatcher: 0x7f49b0007960 image watch failed: 139954462234000, (107) Transport endpoint is not connected
2023-09-03T05:54:37.441+0200 7f49ccfc1700 -1 librbd::Watcher: 0x7f49b0007960 handle_error: handle=139954462234000: (107) Transport endpoint is not connected
Creating snap: 10% complete...2023-09-03T05:58:48.367+0200 7f49cd7c2700 -1 librbd::ImageWatcher: 0x7f49b0007960 image watch failed: 139954529309488, (107) Transport endpoint is not connected
2023-09-03T05:58:48.367+0200 7f49cd7c2700 -1 librbd::Watcher: 0x7f49b0007960 handle_error: handle=139954529309488: (107) Transport endpoint is not connected
Creating snap: 100% complete...done.
2023-09-03T06:02:25.929+0200 7f49cd7c2700 -1 librbd::ImageWatcher: 0x7f49b0007960 image watch failed: 139954529309488, (107) Transport endpoint is not connected
2023-09-03T06:02:25.929+0200 7f49cd7c2700 -1 librbd::Watcher: 0x7f49b0007960 handle_error: handle=139954529309488: (107) Transport endpoint is not connected
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: sysfs write failed
umount: /mnt/vzsnap0/: not mounted.
command 'umount -l -d /mnt/vzsnap0/' failed: exit code 32
ERROR: Backup of VM 4198 failed - can't map rbd volume vm-4198-disk-0@vzdump: rbd: sysfs write failed
INFO: Failed at 2023-09-03 06:02:26
INFO: Backup job finished with errors
TASK ERROR: job errors
On all other nodes, backup jobs were successful.
This caused several RBD images to become read-only.
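Watch error `-107` (`ENOTCONN`, "Transport endpoint is not connected") is what the RBD client reports when its watch on the image header is lost, which commonly happens when the Ceph cluster blocklists an unresponsive client. Whether that happened here can be checked from any node; a sketch, with `<pool>` as a placeholder and one of the affected images from the logs above:
Code:
# list client addresses currently blocklisted by the cluster (Pacific syntax)
ceph osd blocklist ls
# show the registered watchers of an affected image
rbd status <pool>/vm-4023-disk-0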
Consequence: a single node in the cluster is unable to mount RBD images after they became read-only
On all nodes except for `g04s07` (one of the two nodes on which the backup job failed), the read-only issue was solved by restarting the CTs. On the 'broken' node, messages such as the following are logged when CTs start:
Code:
Sep 03 05:00:46 g04s07 kernel: rbd: rbd1: encountered watch error: -107
Sep 03 05:01:07 g04s07 kernel: rbd: rbd3: encountered watch error: -107
Sep 03 05:01:33 g04s07 kernel: rbd: rbd5: encountered watch error: -107
Sep 03 05:02:01 g04s07 kernel: rbd: rbd9: capacity 8589934592 features 0x3d
Sep 03 05:02:20 g04s07 kernel: rbd: rbd1: encountered watch error: -107
Sep 03 05:02:20 g04s07 kernel: EXT4-fs (rbd9): write access unavailable, skipping orphan cleanup
Sep 03 05:02:20 g04s07 kernel: EXT4-fs (rbd9): mounted filesystem without journal. Opts: noload. Quota mode: none.
Sep 03 05:02:54 g04s07 kernel: rbd: rbd3: encountered watch error: -107
Sep 03 05:02:57 g04s07 kernel: rbd: rbd9: capacity 8589934592 features 0x3d
Sep 03 05:02:57 g04s07 kernel: EXT4-fs (rbd9): write access unavailable, skipping orphan cleanup
Sep 03 05:02:57 g04s07 kernel: EXT4-fs (rbd9): mounted filesystem without journal. Opts: noload. Quota mode: none.
Sep 03 05:04:04 g04s07 kernel: rbd: rbd1: encountered watch error: -107
Sep 03 05:04:06 g04s07 kernel: rbd: rbd5: encountered watch error: -107
Sep 03 05:04:07 g04s07 kernel: rbd: rbd11: write at objno 653 565248~4096 result -108
Sep 03 05:04:07 g04s07 kernel: rbd: rbd11: write result -108
Sep 03 05:04:07 g04s07 kernel: blk_update_request: I/O error, dev rbd11, sector 5350480 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
Sep 03 05:04:07 g04s07 kernel: EXT4-fs warning (device rbd11): ext4_end_bio:344: I/O error 10 writing to inode 395524 starting block 668811)
Sep 03 05:04:07 g04s07 kernel: Buffer I/O error on device rbd11, logical block 668810
Sep 03 05:04:07 g04s07 kernel: rbd: rbd16: write at objno 1058 3334144~28672 result -108
Sep 03 05:04:07 g04s07 kernel: rbd: rbd16: write result -108
Sep 03 05:04:07 g04s07 kernel: blk_update_request: I/O error, dev rbd16, sector 8673648 op 0x1:(WRITE) flags 0x800 phys_seg 7 prio class 0
Sep 03 05:04:07 g04s07 kernel: Aborting journal on device rbd16-8.
Sep 03 05:04:07 g04s07 kernel: rbd: rbd16: write at objno 1056 0~4096 result -108
Sep 03 05:04:07 g04s07 kernel: rbd: rbd16: write result -108
Sep 03 05:04:07 g04s07 kernel: blk_update_request: I/O error, dev rbd16, sector 8650752 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
Sep 03 05:04:07 g04s07 kernel: Buffer I/O error on dev rbd16, logical block 1081344, lost sync page write
Sep 03 05:04:07 g04s07 kernel: JBD2: Error -5 detected when updating journal superblock for rbd16-8.
Sep 03 05:04:08 g04s07 kernel: rbd: rbd13: write at objno 9 159744~4096 result -108
Manually mounting one of the affected CTs on the broken node fails as well:
Code:
root@g04s07:~# pct mount 4027
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: sysfs write failed
mounting container failed
can't map rbd volume vm-4027-disk-0: rbd: sysfs write failed
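`rbd: sysfs write failed` only says that the kernel refused the map request; the actual reason usually ends up in the kernel log. It may also be worth checking for stale mappings left behind by the failed backup before mapping again; a sketch (the device name is an example taken from the logs above):
Code:
# the kernel log usually carries the real reason for the failed map
dmesg | tail
# list images still mapped on this node
rbd showmapped
# force-unmap a stale mapping from the incident, then try the map again
rbd unmap -o force /dev/rbd11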
When a CT from this node is migrated to any other node (including `g04s01`, on which the backup job also failed), it starts successfully after logging the following:
Code:
Sep 03 13:08:32 g04s05 kernel: rbd: rbd63: breaking header lock owned by client207783332
Sep 03 13:08:32 g04s05 kernel: rbd: rbd63: breaking object map lock owned by client207783332
Sep 03 13:08:32 g04s05 kernel: rbd: rbd63: capacity 8589934592 features 0x3d
Sep 03 13:08:32 g04s05 kernel: EXT4-fs warning (device rbd63): ext4_multi_mount_protect:326: MMP interval 42 higher than expected, please wait.
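The `breaking header lock` messages show the receiving node taking over the stale exclusive lock still held by the dead client from `g04s07`. The same state can be inspected, and if necessary cleared, by hand; a sketch with placeholders (take the lock ID and locker from the `rbd lock ls` output):
Code:
# show the current exclusive-lock holder of the image
rbd lock ls <pool>/vm-4027-disk-0
# remove a stale lock manually
rbd lock rm <pool>/vm-4027-disk-0 <lock-id> <locker>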
When the now-running CT is migrated from a working node back to the broken node `g04s07`, it again fails to start with the same errors as before.
Why is node `g04s07` unable to mount RBD images? There are no significant differences between the nodes: same kernel version (`5.15.74-1-pve`), same `librbd1` version (`16.2.9`), same `pveversion -v` output (apart from some cleaned-up kernels), and mostly the same hardware.
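Since the software versions match, a next step could be to compare the Ceph client state and the network path of `g04s07` with those of a working node; a sketch (the node IP, monitor IP and jumbo-frame MTU are examples, not values from this cluster):
Code:
# look for libceph session/socket errors that only appear on the broken node
dmesg | grep -i libceph
# check whether this node's Ceph network address is on the cluster blocklist
ceph osd blocklist ls | grep <node-ceph-ip>
# rule out an MTU mismatch on the Ceph network (assuming 9000-byte jumbo frames)
ping -M do -s 8972 -c 3 <ceph-mon-ip>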
--
Code:
root@g04s07:~# pveversion -v
proxmox-ve: 7.3-1 (running kernel: 5.15.74-1-pve)
pve-manager: 7.3-3 (running version: 7.3-3/c3928077)
pve-kernel-5.15: 7.2-14
pve-kernel-helper: 7.2-14
pve-kernel-5.15.74-1-pve: 5.15.74-1
pve-kernel-5.15.64-1-pve: 5.15.64-1
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph: 16.2.9-pve1
ceph-fuse: 16.2.9-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-1
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-5
libpve-storage-perl: 7.3-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.7-1
proxmox-backup-file-restore: 2.2.7-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.0-1
proxmox-widget-toolkit: 3.5.3
pve-cluster: 7.3-1
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.5-6
pve-ha-manager: 3.5.1
pve-i18n: 2.8-1
pve-qemu-kvm: 7.1.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-1
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.6-pve1