Running a Proxmox cluster inside lxd containers is actually rather neat. You can pass through the kvm device from the host:
Bash:
lxc config device add proxmox1 kvm unix-char source=/dev/kvm
Then VMs inside Proxmox run at full speed, without the overhead of nested virtualization. Furthermore, live migration of running VMs between cluster nodes "just works" (including local storage migration).
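For example, a live migration including local storage can be kicked off from the CLI on the source node with qm migrate (the VM ID 100 and target node name proxmox2 here are placeholders, not from my setup):
Bash:
# Hypothetical VM ID and target node; --with-local-disks also moves local storage
qm migrate 100 proxmox2 --online --with-local-disks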
However, you do need to set the containers to privileged mode because of the corosync problem mentioned in this issue. That's a shame, because Ubuntu have fixed it in their corosync package: see LP#1918735. EDIT: the fix was upstreamed, but it's hidden behind a flag, allow_knet_handle_fallback. You can get corosync to start by adding
Code:
system {
    allow_knet_handle_fallback: yes
}
to /etc/corosync/corosync.conf. It then gets overwritten, but you can fix that by copying the (edited) file to /etc/pve/corosync.conf. With this setting, a Proxmox cluster node works inside an unprivileged lxd container, yay!
Unfortunately, you can't run Proxmox CTs (containers) when Proxmox itself is running in a lxd container, because Proxmox needs to manipulate loopback devices on the host to set up the container's filesystem.
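A quick way to see the limitation from inside the lxd container (attaching loop devices needs the host's /dev/loop-control, which an unprivileged container can't reach):
Bash:
# Inside the unprivileged container this can't attach anything useful,
# because loop devices belong to the host
losetup -f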
And whilst you can run Ceph mons and managers, you can't run Ceph OSDs (even in a privileged container): Ceph wants to directly manage LVM on the host (*), and that's too risky to allow.
Bash:
# On the host
lvcreate --name ceph1a --size 4G /dev/vg0
lxc config device add proxmox1 a unix-block \
    source=/dev/mapper/vg0-ceph1a path=/dev/sda
# Inside the container
pveceph osd create /dev/sda
...
Use of uninitialized value in hash element at /usr/share/perl5/PVE/Diskmanage.pm line 455, <DATA> line 960.
/dev/mapper/control: open failed: Operation not permitted
Failure to communicate with kernel device-mapper driver.
Check that device-mapper is available in the kernel.
Incompatible libdevmapper 1.02.185 (2022-05-18) and kernel driver (unknown version).
/dev/mapper/control: open failed: Operation not permitted
Failure to communicate with kernel device-mapper driver.
Check that device-mapper is available in the kernel.
Incompatible libdevmapper 1.02.185 (2022-05-18) and kernel driver (unknown version).
unable to get device info for '/dev/sda'
What you can do instead is use lxd to run full-fat VMs for additional Proxmox nodes. These nodes can be used for Proxmox CTs, Ceph OSDs, and anything else which needs direct access to block devices (e.g. ZFS replication). This works fine.
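As a rough sketch of that setup (the image, instance name, sizes and device names are all illustrative; lxc launch --vm gives you a full VM, and raw host block devices can be attached to it as disk devices):
Bash:
# Launch a VM instead of a container
lxc launch images:debian/12 proxmox7 --vm -c limits.cpu=4 -c limits.memory=8GiB
# Carve out a block device on the host and attach it as a disk; inside the VM
# it shows up as a normal block device, so pveceph osd create works as usual
lvcreate --name ceph7a --size 4G /dev/vg0
lxc config device add proxmox7 osd1 disk source=/dev/vg0/ceph7a
Proxmox then gets installed inside the VM in the usual way, and the attached disk behaves like any other local block device.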
(*) It appears that bluestore OSDs configure LVM inside the block device itself. Here is Proxmox running inside a VM where I gave it four additional block devices:
Code:
root@proxmox6:~# pvs
  PV         VG                                        Fmt  Attr PSize  PFree
  /dev/sdb   ceph-b558e89a-bd13-496b-ab85-7df08e9e9a9b lvm2 a--  <4.00g    0
  /dev/sdc   ceph-3388bbde-cd81-485f-b994-51fc82af43ab lvm2 a--  <4.00g    0
  /dev/sdd   ceph-1307d475-eda5-4797-bcc2-6d013a3e4a6c lvm2 a--  <4.00g    0
  /dev/sde   ceph-34e41bf4-9f08-4e3a-a55a-a7116e671f5d lvm2 a--  <4.00g    0
root@proxmox6:~# vgs
  VG                                         #PV #LV #SN Attr   VSize  VFree
  ceph-1307d475-eda5-4797-bcc2-6d013a3e4a6c    1   1   0 wz--n- <4.00g     0
  ceph-3388bbde-cd81-485f-b994-51fc82af43ab    1   1   0 wz--n- <4.00g     0
  ceph-34e41bf4-9f08-4e3a-a55a-a7116e671f5d    1   1   0 wz--n- <4.00g     0
  ceph-b558e89a-bd13-496b-ab85-7df08e9e9a9b    1   1   0 wz--n- <4.00g     0
root@proxmox6:~# lvs
  LV                                             VG                                         Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  osd-block-0842c3a9-413d-4c8a-a5b7-9cf1a3bf0aa1 ceph-1307d475-eda5-4797-bcc2-6d013a3e4a6c  -wi-ao---- <4.00g
  osd-block-78259d5f-f136-48f3-b1a5-054fbd19668a ceph-3388bbde-cd81-485f-b994-51fc82af43ab  -wi-ao---- <4.00g
  osd-block-4990e664-e086-4741-b03d-0a18997fa9b9 ceph-34e41bf4-9f08-4e3a-a55a-a7116e671f5d  -wi-ao---- <4.00g
  osd-block-67a9fe12-3a75-4d01-8e72-769a8af9d14a ceph-b558e89a-bd13-496b-ab85-7df08e9e9a9b  -wi-ao---- <4.00g