On a small 3-node cluster running 16 VMs, I did an upgrade from 4.1-22 to 4.2-2. Each node also hosts one Ceph OSD. The cluster (Proxmox VE and Ceph) was completely healthy before the upgrade started. I upgraded node-by-node, and before finishing one node and starting the next I waited until Ceph reported "HEALTH_OK" again. Before the first upgrade, all VMs were moved from node01 and node02 to node03. I upgraded node01 and node02, moved the VMs from node03 to node01 and node02, and then upgraded node03. Afterwards I moved the VMs for node03 back to node03. I logged in to ALL VMs to check that everything was OK, but on 2 of the VMs I got filesystem errors. After rebooting, those VMs can't boot at all and fsck doesn't help: the Ceph RBD images of those 2 VMs are corrupted (input/output errors), so I have to restore them from backup. Ceph was already running 0.94.6, so no Ceph version change, but the kernel was upgraded of course.
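For reference, the per-node sequence looked roughly like the sketch below; the VMID and target node are just placeholders, not the exact commands I ran:

Code:
# live-migrate the VMs off the node that is about to be upgraded (example VMID/target)
qm migrate 101 node03 --online

# upgrade and reboot the node
apt-get update && apt-get dist-upgrade
reboot

# after the node is back, wait for Ceph to settle before touching the next node
watch ceph -s    # proceed only once the status shows HEALTH_OK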
Ceph config:
Code:
[global]
auth client required = cephx
auth cluster required = cephx
auth service required = cephx
auth supported = cephx
cluster network = 192.168.121.0/24
filestore xattr use omap = true
fsid = 2f5c1777-0fe5-483b-9f6a-c7d4a874c62a
keyring = /etc/pve/priv/$cluster.$name.keyring
osd journal size = 5120
osd pool default min size = 1
public network = 192.168.111.0/24
[osd]
keyring = /var/lib/ceph/osd/ceph-$id/keyring
osd max backfills = 1
osd recovery max active = 1
[mon.1]
host = node02
mon addr = 192.168.111.131:6789
[mon.0]
host = node01
mon addr = 192.168.111.130:6789
[mon.2]
host = node03
mon addr = 192.168.111.132:6789
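(For completeness: the health checks between node upgrades were plain status queries along the lines of the sketch below; the pool name "rbd" is only an example, the pool holding the VM images may be named differently here.)

Code:
ceph -s                          # overall cluster status, wait for HEALTH_OK
ceph mon stat                    # confirm all three mons (node01-03) are in quorum
ceph osd pool get rbd size       # replica count of the pool holding the VM images
ceph osd pool get rbd min_size   # corresponds to "osd pool default min size = 1" above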
Ceph crush:
Code:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable straw_calc_version 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host node01 {
	id -2		# do not change unnecessarily
	# weight 0.360
	alg straw
	hash 0	# rjenkins1
	item osd.0 weight 0.360
}
host node02 {
	id -3		# do not change unnecessarily
	# weight 0.360
	alg straw
	hash 0	# rjenkins1
	item osd.1 weight 0.360
}
host node03 {
	id -4		# do not change unnecessarily
	# weight 0.360
	alg straw
	hash 0	# rjenkins1
	item osd.2 weight 0.360
}
root default {
	id -1		# do not change unnecessarily
	# weight 1.080
	alg straw
	hash 0	# rjenkins1
	item node01 weight 0.360
	item node02 weight 0.360
	item node03 weight 0.360
}

# rules
rule replicated_ruleset {
	ruleset 0
	type replicated
	min_size 1
	max_size 10
	step take default
	step chooseleaf firstn 0 type host
	step emit
}

# end crush map
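(In case it helps with reproducing: the crush map above was dumped and decompiled roughly as sketched below; the file names are arbitrary.)

Code:
ceph osd getcrushmap -o crushmap.bin        # fetch the compiled crush map
crushtool -d crushmap.bin -o crushmap.txt   # decompile to the text shown above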