Backups caused RBD images to become read-only, and a single node in the cluster was unable to mount RBD images after they became read-only

On a 12-node cluster, a backup issue caused several CTs' RBD images to become read-only.

Issue: backups caused RBD images to become read-only

On all nodes, a backup job ran at the time of the issue. The backup jobs on two nodes (`g04s01` and `g04s07`) failed.

On `g04s07`, the following was logged at 05:02:20:

Code:
INFO: Error: error trying to connect: error connecting to https://10.1.3.16:8007/ - tcp connect error: Connection timed out (os error 110) (CT 4017)

This implies that PBS was busy - which is likely as all nodes in the cluster are writing simultaneously. Unfortunately, no disk statistics are available.
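A quick way to see whether the PBS host accepts connections at all (and whether connects hang) is a plain TCP check from the PVE node. A sketch; the address 10.1.3.16:8007 is taken from the error above, and the 5-second cap is an arbitrary choice:

```shell
# TCP reachability check against the PBS API port (address from the log above).
# timeout(1) caps the wait; bash's /dev/tcp pseudo-device does the connect.
if timeout 5 bash -c 'exec 3<>/dev/tcp/10.1.3.16/8007' 2>/dev/null; then
    echo "PBS port reachable"
else
    echo "PBS port unreachable or connect timed out"
fi
```

Running this in a loop during the backup window would show whether the timeouts are sporadic (overloaded PBS) or persistent (network problem).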

On that same node, the following was logged at 05:03:35:

Code:
Removing snap: 100% complete...done.2023-09-03T05:04:15.286+0200 7fdb1df2a700 -1 librbd::Operations: update notification timed-out (CT 4018)

After this, backups of other CTs on that same node failed with:

Code:
INFO: including mount point rootfs ('/') in backup
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
syncfs '/' failed - Input/output error
fsfreeze '/proc/409165/root//' failed - Input/output error
Creating snap: 10% complete...
Creating snap: 100% complete...done.
fsthaw '/proc/409165/root//' failed - Invalid argument
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: sysfs write failed
umount: /mnt/vzsnap0/: not mounted.
command 'umount -l -d /mnt/vzsnap0/' failed: exit code 32
ERROR: Backup of VM 4023 failed - can't map rbd volume vm-4023-disk-0@vzdump: rbd: sysfs write failed
INFO: Failed at 2023-09-03 05:04:41

These timestamps can be correlated with the first RBD image errors:

Code:
05:00:46 rbd: rbd1: encountered watch error: -107 (first error)
05:04:07 blk_update_request: I/O error, dev rbd11, sector 5350480 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0 (first I/O error)
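The kernel messages for the backup window can be pulled out of the journal in one pass. A sketch (the `journalctl` flags are standard; the time window brackets the failures above):

```shell
# On the affected node, kernel messages for the backup window:
#   journalctl -k --since "2023-09-03 05:00" --until "2023-09-03 06:10" \
#       | grep -E 'rbd|blk_update_request'
# Self-contained demo of the same filter over lines quoted in this post:
grep -E 'watch error|I/O error' <<'EOF'
Sep 03 05:00:46 g04s07 kernel: rbd: rbd1: encountered watch error: -107
Sep 03 05:02:01 g04s07 kernel: rbd: rbd9: capacity 8589934592 features 0x3d
Sep 03 05:04:07 g04s07 kernel: blk_update_request: I/O error, dev rbd11, sector 5350480
EOF
```

Watch errors (`-107`, `ENOTCONN`) preceding the first write error (`-108`, `ESHUTDOWN`) is the pattern that points at lost watcher sessions rather than at the disks themselves.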

On `g04s01`, the following was logged, starting over half an hour later at 05:39:10:

Code:
INFO: Starting Backup of VM 4184 (lxc)
INFO: Backup started at 2023-09-03 05:39:10
INFO: status = running
INFO: CT Name: c04dns-sec03
INFO: including mount point rootfs ('/') in backup
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
syncfs '/' failed - Input/output error
fsfreeze '/proc/311288/root//' failed - Input/output error
Creating snap: 10% complete...
Creating snap: 100% complete...done.
fsthaw '/proc/311288/root//' failed - Invalid argument
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: sysfs write failed
umount: /mnt/vzsnap0/: not mounted.
command 'umount -l -d /mnt/vzsnap0/' failed: exit code 32
ERROR: Backup of VM 4184 failed - can't map rbd volume vm-4184-disk-0@vzdump: rbd: sysfs write failed
INFO: Failed at 2023-09-03 05:39:12
INFO: Starting Backup of VM 4198 (lxc)
INFO: Backup started at 2023-09-03 05:39:12
INFO: status = running
INFO: CT Name: c04smtp-dane10
INFO: including mount point rootfs ('/') in backup
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
syncfs '/' failed - Input/output error
2023-09-03T05:43:33.405+0200 7f49ccfc1700 -1 librbd::ImageWatcher: 0x7f49b0007960 image watch failed: 139954462218688, (107) Transport endpoint is not connected
2023-09-03T05:43:33.405+0200 7f49ccfc1700 -1 librbd::Watcher: 0x7f49b0007960 handle_error: handle=139954462218688: (107) Transport endpoint is not connected
2023-09-03T05:45:04.133+0200 7f49cd7c2700 -1 librbd::ImageWatcher: 0x7f49b0007960 image watch failed: 139954529243872, (107) Transport endpoint is not connected
2023-09-03T05:45:04.133+0200 7f49cd7c2700 -1 librbd::Watcher: 0x7f49b0007960 handle_error: handle=139954529243872: (107) Transport endpoint is not connected
2023-09-03T05:48:13.801+0200 7f49ccfc1700 -1 librbd::ManagedLock: 0x7f49b4000dc8 send_reacquire_lock: aborting reacquire due to invalid watch handle
2023-09-03T05:52:37.528+0200 7f49ccfc1700 -1 librbd::ManagedLock: 0x7f49b4000dc8 send_acquire_lock: watcher not registered - delaying request
2023-09-03T05:54:37.441+0200 7f49ccfc1700 -1 librbd::ImageWatcher: 0x7f49b0007960 image watch failed: 139954462234000, (107) Transport endpoint is not connected
2023-09-03T05:54:37.441+0200 7f49ccfc1700 -1 librbd::Watcher: 0x7f49b0007960 handle_error: handle=139954462234000: (107) Transport endpoint is not connected
Creating snap: 10% complete...2023-09-03T05:58:48.367+0200 7f49cd7c2700 -1 librbd::ImageWatcher: 0x7f49b0007960 image watch failed: 139954529309488, (107) Transport endpoint is not connected
2023-09-03T05:58:48.367+0200 7f49cd7c2700 -1 librbd::Watcher: 0x7f49b0007960 handle_error: handle=139954529309488: (107) Transport endpoint is not connected
Creating snap: 100% complete...done.
2023-09-03T06:02:25.929+0200 7f49cd7c2700 -1 librbd::ImageWatcher: 0x7f49b0007960 image watch failed: 139954529309488, (107) Transport endpoint is not connected
2023-09-03T06:02:25.929+0200 7f49cd7c2700 -1 librbd::Watcher: 0x7f49b0007960 handle_error: handle=139954529309488: (107) Transport endpoint is not connected
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: sysfs write failed
umount: /mnt/vzsnap0/: not mounted.
command 'umount -l -d /mnt/vzsnap0/' failed: exit code 32
ERROR: Backup of VM 4198 failed - can't map rbd volume vm-4198-disk-0@vzdump: rbd: sysfs write failed
INFO: Failed at 2023-09-03 06:02:26
INFO: Backup job finished with errors
TASK ERROR: job errors

On all other nodes, backup jobs were successful.

This caused several RBD images to become read-only.

Consequence: a single node in the cluster is unable to mount RBD images after they became read-only

On all nodes except for `g04s07` (one of the two nodes on which the backup job failed), the read-only issue was solved by restarting the CTs. On the 'broken' node, messages such as the following are logged when CTs start:

Code:
Sep 03 05:00:46 g04s07 kernel: rbd: rbd1: encountered watch error: -107
Sep 03 05:01:07 g04s07 kernel: rbd: rbd3: encountered watch error: -107
Sep 03 05:01:33 g04s07 kernel: rbd: rbd5: encountered watch error: -107
Sep 03 05:02:01 g04s07 kernel: rbd: rbd9: capacity 8589934592 features 0x3d
Sep 03 05:02:20 g04s07 kernel: rbd: rbd1: encountered watch error: -107
Sep 03 05:02:20 g04s07 kernel: EXT4-fs (rbd9): write access unavailable, skipping orphan cleanup
Sep 03 05:02:20 g04s07 kernel: EXT4-fs (rbd9): mounted filesystem without journal. Opts: noload. Quota mode: none.
Sep 03 05:02:54 g04s07 kernel: rbd: rbd3: encountered watch error: -107
Sep 03 05:02:57 g04s07 kernel: rbd: rbd9: capacity 8589934592 features 0x3d
Sep 03 05:02:57 g04s07 kernel: EXT4-fs (rbd9): write access unavailable, skipping orphan cleanup
Sep 03 05:02:57 g04s07 kernel: EXT4-fs (rbd9): mounted filesystem without journal. Opts: noload. Quota mode: none.
Sep 03 05:04:04 g04s07 kernel: rbd: rbd1: encountered watch error: -107
Sep 03 05:04:06 g04s07 kernel: rbd: rbd5: encountered watch error: -107
Sep 03 05:04:07 g04s07 kernel: rbd: rbd11: write at objno 653 565248~4096 result -108
Sep 03 05:04:07 g04s07 kernel: rbd: rbd11: write result -108
Sep 03 05:04:07 g04s07 kernel: blk_update_request: I/O error, dev rbd11, sector 5350480 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
Sep 03 05:04:07 g04s07 kernel: EXT4-fs warning (device rbd11): ext4_end_bio:344: I/O error 10 writing to inode 395524 starting block 668811)
Sep 03 05:04:07 g04s07 kernel: Buffer I/O error on device rbd11, logical block 668810
Sep 03 05:04:07 g04s07 kernel: rbd: rbd16: write at objno 1058 3334144~28672 result -108
Sep 03 05:04:07 g04s07 kernel: rbd: rbd16: write result -108
Sep 03 05:04:07 g04s07 kernel: blk_update_request: I/O error, dev rbd16, sector 8673648 op 0x1:(WRITE) flags 0x800 phys_seg 7 prio class 0
Sep 03 05:04:07 g04s07 kernel: Aborting journal on device rbd16-8.
Sep 03 05:04:07 g04s07 kernel: rbd: rbd16: write at objno 1056 0~4096 result -108
Sep 03 05:04:07 g04s07 kernel: rbd: rbd16: write result -108
Sep 03 05:04:07 g04s07 kernel: blk_update_request: I/O error, dev rbd16, sector 8650752 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
Sep 03 05:04:07 g04s07 kernel: Buffer I/O error on dev rbd16, logical block 1081344, lost sync page write
Sep 03 05:04:07 g04s07 kernel: JBD2: Error -5 detected when updating journal superblock for rbd16-8.
Sep 03 05:04:08 g04s07 kernel: rbd: rbd13: write at objno 9 159744~4096 result -108

Code:
root@g04s07:~# pct mount 4027
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: sysfs write failed
mounting container failed
can't map rbd volume vm-4027-disk-0: rbd: sysfs write failed
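When `pct mount` fails like this, one hedged first step is to ask Ceph which clients still hold watches or locks on the image. `rbd status` and `rbd lock list` are standard librbd tools; the pool name below is a placeholder, and the sample watcher line is approximate:

```shell
# Pool/image names are assumptions; adjust to your storage configuration:
#   rbd status <pool>/vm-4027-disk-0      # lists current watchers
#   rbd lock list <pool>/vm-4027-disk-0   # lists exclusive-lock holders
# Demo: extract the client ID from a watcher line of the shape rbd prints
# (207783332 matches the lock owner seen in the migration log further down):
echo 'watcher=10.1.3.17:0/1234567 client.207783332 cookie=94618494' \
    | grep -o 'client\.[0-9]*'
```

A watcher belonging to a client session that no longer exists on the node would explain why mapping fails there while other nodes can break the lock and proceed.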

When migrating a CT from this node to any other node (including `g04s01`, on which the backup job also failed), it starts successfully after logging the following:

Code:
Sep 03 13:08:32 g04s05 kernel: rbd: rbd63: breaking header lock owned by client207783332
Sep 03 13:08:32 g04s05 kernel: rbd: rbd63: breaking object map lock owned by client207783332
Sep 03 13:08:32 g04s05 kernel: rbd: rbd63: capacity 8589934592 features 0x3d
Sep 03 13:08:32 g04s05 kernel: EXT4-fs warning (device rbd63): ext4_multi_mount_protect:326: MMP interval 42 higher than expected, please wait.

When migrating the started CT from a working node back to the broken node `g04s07`, the CT no longer starts, with the same errors as before.
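The "breaking header lock owned by client207783332" messages suggest the healthy nodes clear the stale session automatically on map, while `g04s07` cannot. A hedged manual equivalent, as a command fragment only (pool/image names and the address are placeholders; `blocklist` is the Pacific-era spelling, older releases use `blacklist`):

```shell
# All of these need a working Ceph admin keyring; names are placeholders:
#   rbd lock list <pool>/vm-4184-disk-0
#   rbd lock remove <pool>/vm-4184-disk-0 <lock-id> client.207783332
#   ceph osd blocklist ls
#   ceph osd blocklist add <addr:port/nonce-of-the-stale-client>
```

Blocklisting the stale client forces the OSDs to drop its session, after which a fresh map should be able to acquire the exclusive lock.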

Why is node `g04s07` unable to mount RBD images? There are no significant differences between the nodes: same kernel version (`5.15.74-1-pve`), same `librbd1` version (`16.2.9`), same `pveversion -v` output (apart from some cleaned-up kernels), and mostly the same hardware.
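To rule out drift between the broken node and a working one mechanically, the version output can be diffed directly. A sketch (hostnames are from this cluster; root SSH between nodes is assumed):

```shell
# Compare package state between the broken node and a working one:
#   diff <(ssh g04s07 pveversion -v) <(ssh g04s01 pveversion -v)
# The same technique on canned input, showing what "no drift" looks like:
diff <(printf 'ceph: 16.2.9-pve1\nlibrbd1: 16.2.9\n') \
     <(printf 'ceph: 16.2.9-pve1\nlibrbd1: 16.2.9\n') \
  && echo "no package differences"
```

An empty diff narrows the problem to per-node state (stale kernel RBD client sessions, sysfs state, network) rather than software versions.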

--

Code:
root@g04s07:~# pveversion -v
proxmox-ve: 7.3-1 (running kernel: 5.15.74-1-pve)
pve-manager: 7.3-3 (running version: 7.3-3/c3928077)
pve-kernel-5.15: 7.2-14
pve-kernel-helper: 7.2-14
pve-kernel-5.15.74-1-pve: 5.15.74-1
pve-kernel-5.15.64-1-pve: 5.15.64-1
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph: 16.2.9-pve1
ceph-fuse: 16.2.9-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-1
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-5
libpve-storage-perl: 7.3-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.7-1
proxmox-backup-file-restore: 2.2.7-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.0-1
proxmox-widget-toolkit: 3.5.3
pve-cluster: 7.3-1
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.5-6
pve-ha-manager: 3.5.1
pve-i18n: 2.8-1
pve-qemu-kvm: 7.1.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-1
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.6-pve1
