Hello all,
I have encountered a problem today that I had not encountered before.
I have Ceph installed on 3 Proxmox nodes and created a Debian 12 virtual machine. I then added a virtual disk from Ceph and used it within the vm. I am not sure whether the following occurred during a backup, where the host interacts with /dev/rbd devices; however, I figured I would post it here in case others run into the same problem.
*note on rbd devices in the Proxmox host - I assume correct behaviour is that they are added and then removed during a backup process? So what I am about to post will likely be an edge case (or maybe due to PEBKAC).
Steps to reproduce
- With a working ceph cluster, create a vm that uses lvm (Debian12 was my install)
- After vm installation, add another hard disk to the vm from the ceph pool
- Within the vm, run pvcreate /dev/vdb and then add it to the volume group with vgextend {vgname} /dev/vdb
- On the host machine, check with lvs
Bash:
root@pve-08:~# lvs
WARNING: Couldn't find device with uuid SgMrg4-IQ1F-3sLc-R020-NZqi-FgTJ-D2jUFP.
WARNING: VG lvm is missing PV SgMrg4-IQ1F-3sLc-R020-NZqi-FgTJ-D2jUFP (last written to /dev/vdb).
Mistakes were made
As I am a hobbyist in a home lab environment, I took it upon myself to force remove the missing devices on the host without fully understanding the error. I then discovered the vm was broken and attempted to fix that as well.
Needless data loss ensued (don't worry, it was only a big lancache and I still have an older version somewhere).
Fixing the problem
Somehow the problem occurred because LVM on the host detected a device with LVM on it: /dev/rbd0 was detected on the host as being LVM and gave a warning (as I stated above, I am pretty sure this should not happen, and I had not encountered it before).
Rather than repeating the mistakes I made, simply getting the pve host to ignore these devices is enough.
Change the following in /etc/lvm/lvm.conf on all Proxmox nodes (at the very bottom):
Code:
devices {
# added by pve-manager to avoid scanning ZFS zvols
global_filter=["r|/dev/zd.*|"]
}
to the following:
Code:
devices {
# added by pve-manager to avoid scanning ZFS zvols (and RBD vols)
global_filter=["r|/dev/zd.*|","r|/dev/rbd.*|"]
}
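For anyone wondering why appending a second entry is enough: as far as I understand the lvm.conf documentation, each filter entry is an action letter (a = accept, r = reject) followed by a regex between delimiters, the first entry that matches a device path decides, and a path that matches nothing is accepted. The following is only a rough shell model of that behaviour, not LVM's own code. lvm_filter_decision is a made-up helper name, and grep -E only approximates LVM's regex engine.

```shell
#!/bin/sh
# Rough model of LVM filter evaluation (assumption: first match wins,
# unmatched paths are accepted). lvm_filter_decision is a hypothetical
# helper for illustration only; it is not part of LVM.
lvm_filter_decision() {
    dev_path=$1
    shift
    for pattern in "$@"; do
        action=${pattern%"${pattern#?}"}   # first character: "a" or "r"
        regex=${pattern#??}                # drop action and opening delimiter
        regex=${regex%?}                   # drop closing delimiter
        if printf '%s\n' "$dev_path" | grep -Eq -- "$regex"; then
            echo "$action"
            return 0
        fi
    done
    echo a                                 # nothing matched: accept
}

# Same two entries as the modified global_filter above.
for dev in /dev/sda3 /dev/zd16 /dev/rbd0; do
    printf '%s -> %s\n' "$dev" \
        "$(lvm_filter_decision "$dev" 'r|/dev/zd.*|' 'r|/dev/rbd.*|')"
done
```

With the two entries above, /dev/zd16 and /dev/rbd0 come back as r (ignored by the host's LVM) while /dev/sda3 comes back as a, so the host's own physical volumes are still scanned.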
Conclusion
I hope this helps anyone who gets caught by this in the future, and I look forward to any comments correcting what I have written here (I will try to update this post if anything changes as a result of the comments).
Regards