Hello,
<TLDR>
It seems that PVE, LXC or even Ceph changes ext4's mmp_update_interval dynamically. Why, when and how does it do that?
</TLDR>
Full details below:
In a PVE 8.1 cluster with Ceph 18.2.1 storage, I had a situation yesterday where a privileged LXC (ID 200) with a 4.2 TB ext4 disk as mp0 somehow got stuck just after a backup, while the backup snapshot was being deleted. The snapshot got removed from the storage, but it was still listed in the CT config in "deleted" state.
These are the last backup log lines:
Code:
INFO: Duration: 10188.46s
INFO: End Time: Thu Jul 31 01:59:51 2025
INFO: adding notes to backup
INFO: cleanup temporary 'vzdump' snapshot
Removing snap: 100% complete...done.
2025-07-31T02:14:57.974+0200 7f930e8086c0 -1 librbd::ImageWatcher: 0x7f92fc007550 image watch failed: 140269269914816, (107) Transport endpoint is not connected
2025-07-31T02:14:57.974+0200 7f930e8086c0 -1 librbd::Watcher: 0x7f92fc007550 handle_error: handle=140269269914816: (107) Transport endpoint is not connected
Every other backup seemed to be OK; there were no Ceph errors or warnings in the logs, etc.
The real issue came next: the node became "greyed out" because pvestatd couldn't refresh state info, as the lxc-info processes for CT 200 got stuck and the CT itself was completely unresponsive. While I was reviewing logs, the whole host hung (no ping, no IPMI console, fully frozen), as if the kernel had got completely stuck in an I/O deadlock (which may happen, since the LXC uses the Ceph KRBD kernel-mode driver to access the storage).
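For reference, before the host froze completely, the stuck lxc-info processes showed the typical symptoms of an I/O deadlock. A generic way to spot such processes in uninterruptible sleep ("D" state), not specific to my node:
Code:
# list processes stuck in uninterruptible sleep ("D"), typical of an I/O deadlock
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /^D/'
# check for kernel hung-task warnings
dmesg | grep -i 'blocked for more than'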
The host had to be power reset and it booted OK. Then, when trying to start the CT, this showed up in the journal and on the console:
Code:
kernel: EXT4-fs warning (device rbd1): ext4_multi_mount_protect:328: MMP interval 720 higher than expected, please wait
This essentially means that the kernel will wait 720 seconds times 4 (48 minutes!) before allowing access to that CT disk. Meanwhile, the CT was stuck in "start" state but obviously wasn't working. If left in that state long enough, pvestatd would become unresponsive again and the whole node would turn "grey". I didn't wait long enough to check whether the whole host would fully freeze again.
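For reference, the MMP fields stored in the ext4 superblock can be checked while the image is mapped outside of Proxmox (a sketch; the pool/image name is just an example, adjust to your storage):
Code:
# map the CT disk manually (example pool/image name)
rbd map <pool>/vm-200-disk-0
# dump the superblock header and look at the MMP fields
dumpe2fs -h /dev/rbd0 | grep -i mmp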
To sort it out, I killed the CT, manually mapped the RBD image and reduced mmp_update_interval:
Code:
tune2fs -E mmp_update_interval=30 /dev/rbd0
After that, I ran fsck.ext4 and no issues were found, so I simply started the LXC. It took the expected 30 seconds I had set for mmp_update_interval and booted perfectly fine. Curious about it, I did an orderly shutdown of the CT, mapped the RBD image again and checked mmp_update_interval: it was set to "5" instead of the value of "30" I had manually set.
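This is roughly how the value can be read back (again with example device/image names; in my case the image mapped to /dev/rbd0):
Code:
# after an orderly shutdown of the CT, map the image again (example name)
rbd map <pool>/vm-200-disk-0
# read back the MMP update interval from the superblock
tune2fs -l /dev/rbd0 | grep -i 'mmp update interval'
rbd unmap /dev/rbd0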
Why, when and how does mmp_update_interval get changed?