Hi.
Before somebody calls me silly ... I know this trick should not be done on production systems ... but I needed some of the features of 5.0, so I jumped ship early.
Anyway, I've got 3 systems with Proxmox 5 beta 1 & Ceph running as a small cluster, and everything was OK.
Some time later, around when the 5.0 final was released, I added a fourth system to this cluster and started getting some odd behaviour.
Essentially I've got a Ceph monitor running on 2 of the original 3 machines (I didn't want to hinder the performance of the third since it has no disks and works purely as a workhorse). Now I get these problems:
- when I try to add a monitor on the fourth machine, I get an error in a loop telling me that it can't get keys
- when I try to migrate a VM from the second machine to the fourth it goes OK, but when I try to migrate from the third to the fourth it fails with "Error: migration aborted (duration 00:00:00): Can't connect to destination address using public key"
I shall add that:
- the fourth machine got 4 OSDs created on it without any hiccup; those are part of the cluster and all is singing and dancing.
- the fourth machine has a ceph monitor service running (id: 3), but it's not included in ceph.conf and everything is OK as well ... when I try to include this monitor in ceph.conf, the whole Ceph cluster just grinds to a halt - even "ceph -s" fails to run. When I try to create a monitor it gives me a lot of errors but then seems to pass through; a monitor with id 4 is created, yet it gets included in ceph.conf as id 3 (wtf?), and Ceph grinds to a halt ... replacing 3 with 4 in ceph.conf does not help either ...
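In case it matters, this is roughly what I intend to try to clean up the half-created monitor before re-adding it - just a sketch, assuming the default cluster name "ceph" and the mon id 3 mentioned above:
Code:
# on the fourth node: stop the half-created monitor and clear its leftovers
systemctl stop ceph-mon@3
ceph mon remove 3                  # drop it from the monmap in case it ever joined
rm -rf /var/lib/ceph/mon/ceph-3    # stale monitor data dir (default cluster name assumed)
# then remove the corresponding [mon.3] section / mon host entry from /etc/pve/ceph.conf by hand
# and let Proxmox recreate the monitor cleanly:
pveceph createmon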
The map looks this way:
1 - just 1 VM on local storage
2 - 4 VMs on Ceph RBD + 4 OSDs + mon
3 - 1 VM on Ceph RBD + 4 OSDs + mon
4 - just 4 OSDs (monitor service running but not part of ceph.conf)
I'm about to add a 5th machine here, but I would like to iron all of this out before adding more problems to the mix.
I shall add that yes I did:
Code:
ssh-keyscan -t rsa proxmox-dl180-8bay-1 proxmox-dl180-14bay-1 proxmox-dl180-14bay-2 proxmox-dl180-14bay-3 >> /etc/pve/priv/known_hosts
# proxmox-dl180-14bay-3:22 SSH-2.0-OpenSSH_7.4p1 Debian-10+deb9u1
# proxmox-dl180-8bay-1:22 SSH-2.0-OpenSSH_7.4p1 Debian-10+deb9u1
# proxmox-dl180-14bay-1:22 SSH-2.0-OpenSSH_7.4p1 Debian-10+deb9u1
# proxmox-dl180-14bay-2:22 SSH-2.0-OpenSSH_7.4p1 Debian-10+deb9u1
Edit:
I've managed to fix the migration problem. The third machine did not have /etc/ssh/ssh_known_hosts as a symlink pointing to the known_hosts file in the pve dir ... it was just a regular file. Why it ended up this way I don't know.
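For anyone hitting the same thing, this is roughly how I checked and restored it - a sketch, assuming the standard Proxmox layout where /etc/ssh/ssh_known_hosts should be a symlink to /etc/pve/priv/known_hosts:
Code:
# check whether ssh_known_hosts is the expected symlink
ls -l /etc/ssh/ssh_known_hosts
# if it is a plain file, move it aside and restore the symlink
mv /etc/ssh/ssh_known_hosts /etc/ssh/ssh_known_hosts.bak
ln -s /etc/pve/priv/known_hosts /etc/ssh/ssh_known_hosts
# alternatively, let Proxmox regenerate the certificates/known_hosts links itself:
pvecm updatecerts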