After updating my 4-node cluster today, I can no longer start any of my CTs.
The Corosync cluster and Ceph both show as healthy.
I created a new unprivileged CT after the updates and it works fine.
I hope there's a way to fix this without having to rebuild the cluster...
I get an error when running fsck...
pct fsck 4000
fsck from util-linux 2.33.1
fsck.ext2: Unable to resolve 'rbd:Ceph-CT/vm-4000-disk-0:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/Ceph-CT.keyring'
command 'fsck -a -l 'rbd:Ceph-CT/vm-4000-disk-0:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/Ceph-CT.keyring'' failed: exit code 8
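From what I can tell, pct fsck just hands the volume's "path" straight to fsck, and for an RBD-backed volume that path is the rbd: spec above rather than a block device, so fsck.ext2 has nothing it can resolve. One way to see what fsck is being given (volume ID guessed from the error message, so adjust as needed):

# show the path/spec Proxmox reports for this volume (volume ID assumed from the error above)
pvesm path Ceph-CT:vm-4000-disk-0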
ceph status
  cluster:
    id:     e43a583a-2e95-46df-af6b-58574ce1187d
    health: HEALTH_OK

  services:
    mon: 4 daemons, quorum pve11,pve12,pve13,pve14 (age 35m)
    mgr: pve11(active, since 94m), standbys: pve14, pve12, pve13
    osd: 16 osds: 16 up (since 36m), 16 in (since 4w)

  data:
    pools:   2 pools, 512 pgs
    objects: 508.84k objects, 1.8 TiB
    usage:   5.2 TiB used, 24 TiB / 29 TiB avail
    pgs:     512 active+clean

  io:
    client: 5.3 KiB/s wr, 0 op/s rd, 0 op/s wr
Last line in /var/log/ceph/ceph.log - all the recent lines look the same:
2019-11-09 18:17:23.837652 mgr.pve11 (mgr.19724137) 2956 : cluster [DBG] pgmap v2981: 512 pgs: 512 active+clean; 1.8 TiB data, 5.2 TiB used, 24 TiB / 29 TiB avail; 6.3 KiB/s wr, 1 op/s
pvecm status
Quorum information
------------------
Date: Sat Nov 9 18:08:58 2019
Quorum provider: corosync_votequorum
Nodes: 4
Node ID: 0x00000001
Ring ID: 1.1f8
Quorate: Yes
Votequorum information
----------------------
Expected votes: 4
Highest expected: 4
Total votes: 4
Quorum: 3
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
0x00000001 1 10.10.1.11 (local)
0x00000002 1 10.10.1.12
0x00000003 1 10.10.1.13
0x00000004 1 10.10.1.14
Package Versions
proxmox-ve: 6.0-2 (running kernel: 5.0.21-4-pve)
pve-manager: 6.0-11 (running version: 6.0-11/2140ef37)
pve-kernel-helper: 6.0-11
pve-kernel-5.0: 6.0-10
pve-kernel-5.0.21-4-pve: 5.0.21-8
pve-kernel-5.0.21-3-pve: 5.0.21-7
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph: 14.2.4-pve1
ceph-fuse: 14.2.4-pve1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-3
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-6
libpve-guest-common-perl: 3.0-2
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.0-9
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-8
pve-cluster: 6.0-7
pve-container: 3.0-10
pve-docs: 6.0-8
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-4
pve-ha-manager: 3.0-2
pve-i18n: 2.0-3
pve-qemu-kvm: 4.0.1-4
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-13
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve2
Update
I found out how to get my CTs running again: I had to remove the four lxc.mount.entry lines I had added previously when converting the CT from privileged to unprivileged (a sketch for scripting this edit follows the configs below).
nano /etc/pve/lxc/[container # from proxmox gui].conf
Before
arch: amd64
cores: 1
hostname: Ion-LMN001
memory: 2560
net0: name=eth0,bridge=vmbr0,firewall=1,gw=192.168.2.3,hwaddr=BE:7F:04:FE:E7:F2,ip=192.168.4.1/16,type=veth
onboot: 1
ostype: ubuntu
rootfs: local-lvm:vm-4001-disk-0,size=20G
swap: 256
unprivileged: 1
lxc.mount.entry: /dev/random dev/random none bind,ro 0 0
lxc.mount.entry: /dev/urandom dev/urandom none bind,ro 0 0
lxc.mount.entry: /dev/random var/spool/postfix/dev/random none bind,ro 0 0
lxc.mount.entry: /dev/urandom var/spool/postfix/dev/urandom none bind,ro 0 0
After
arch: amd64
cores: 1
hostname: Ion-LMN001
memory: 2560
net0: name=eth0,bridge=vmbr0,firewall=1,gw=192.168.2.3,hwaddr=BE:7F:04:FE:E7:F2,ip=192.168.4.1/16,type=veth
onboot: 1
ostype: ubuntu
rootfs: local-lvm:vm-4001-disk-0,size=20G
swap: 256
unprivileged: 1
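In case it helps anyone with more containers to fix, roughly the following could script the same edit per CT. It's only a sketch (the CT ID is an example) and it drops every raw lxc.mount.entry line, which here happens to match exactly the four entries shown above:

CTID=4001                                              # example CT ID, adjust per container
cp /etc/pve/lxc/${CTID}.conf /root/${CTID}.conf.bak    # keep a backup outside /etc/pve
grep -v '^lxc\.mount\.entry:' /root/${CTID}.conf.bak > /etc/pve/lxc/${CTID}.conf   # rewrite the config without those lines
pct config ${CTID}                                     # sanity-check the remaining config
pct start ${CTID}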
I'm still getting the error when running fsck... (Maybe this is normal?)
pct fsck 4000
fsck from util-linux 2.33.1
fsck.ext2: Unable to resolve 'rbd:Ceph-CT/vm-4000-disk-0:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/Ceph-CT.keyring'
command 'fsck -a -l 'rbd:Ceph-CT/vm-4000-disk-0:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/Ceph-CT.keyring'' failed: exit code 8
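If the filesystem ever really needs a check, my understanding is that the RBD image has to be mapped to a block device first and fsck run against that, with the CT stopped; an untested sketch using the pool/image/keyring from the error above:

pct stop 4000                                  # nothing may have the image open during fsck
DEV=$(rbd map Ceph-CT/vm-4000-disk-0 -c /etc/pve/ceph.conf --id admin --keyring /etc/pve/priv/ceph/Ceph-CT.keyring)
fsck -f "$DEV"                                 # rbd map prints the mapped device, e.g. /dev/rbd0
rbd unmap "$DEV"
pct start 4000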