We have a 7-node Ceph cluster which is running fine.
The Ceph network uses two 10G switches and Mellanox ConnectX-4 cards. We use bond_mode active-backup.
Each node has PCI slots to add three more dual-port ConnectX cards.
So I am considering a mesh network to eliminate the...
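A quick sanity check on the port count, as a rough sketch only. It assumes a full mesh (one direct point-to-point link from every node to every other node) built solely on the three extra dual-port cards; the function name and constants are made up for illustration, and the node and card counts come from the post above.

```python
def full_mesh_requirements(nodes):
    """Return (ports needed per node, total point-to-point links) for a full mesh."""
    ports_per_node = nodes - 1               # one direct link to every other node
    total_links = nodes * (nodes - 1) // 2   # each link is shared by two nodes
    return ports_per_node, total_links


NODES = 7            # cluster size from the post
SPARE_SLOTS = 3      # free PCI slots per node, from the post
PORTS_PER_CARD = 2   # dual-port cards, from the post

ports_needed, links = full_mesh_requirements(NODES)
ports_available = SPARE_SLOTS * PORTS_PER_CARD

print(f"ports needed per node: {ports_needed}, available: {ports_available}")
print(f"direct links (cables) in total: {links}")
print("fits" if ports_available >= ports_needed else "does not fit")
```

Under those assumptions the numbers only just work out: every port on the new cards would be used, with nothing spare for redundancy per link.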
We are getting about 10 crashes per month.
The last two have this in ceph crash info:
"assert_msg": "/mnt/npool/tlamprecht/pve-ceph/ceph-14.2.5/src/common/ceph_time.h: In function 'ceph::time_detail::timespan ceph::to_timespan(ceph::time_detail::signedspan)' thread 7fa02c5f7700 time...
# lxc-start -n 121 -l DEBUG -o /tmp/lxc-ID.log
lxc-start: 121: lxccontainer.c: wait_on_daemonized_start: 865 No such file or directory - Failed to receive the container state
lxc-start: 121: tools/lxc_start.c: main: 329 The container failed to start
lxc-start: 121: tools/lxc_start.c: main: 332...
This shows up in dmesg each time I try to start the container:
[Mon Dec 23 04:45:33 2019] EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null)
[Mon Dec 23 04:45:33 2019] lxc-start: segfault at 50 ip 00007f14f406ef8b sp 00007fffa3588ba0 error 4 in...