Hi,
We had a pretty stable 5-node Proxmox cluster running "pve-manager/6.3-3/eee5f901 (running kernel: 5.4.78-2-pve)".
Things went bad right after one of the external Ceph monitors died; after reinstalling it, its Ceph version went from 14.2.9 to 14.2.16.
So, for consistency, all the Ceph nodes (including the other two monitors) were upgraded to 14.2.16.
Now the situation is like this:
- one node, for some reason, still has the Luminous client (not sure why, since all PVE hosts are running "6.3-3/eee5f901" and were updated at the same time from the same repositories). This PVE host is the only one that can still list RBDs and run the VMs.
- the other 4 PVE nodes, which have the Nautilus client, can no longer list RBDs or run VMs.
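For reference, this is roughly how I checked which Ceph client each node has (package names are those shipped by the standard Proxmox/Ceph repos, so this is an assumption about your setup):

```shell
# Check the installed Ceph client packages on each PVE node
dpkg -l ceph-common librbd1 | grep '^ii'

# Ask the rbd client itself which release it is
rbd --version
```

On the odd node this reports a 12.2.x (Luminous) client, while the other four report 14.2.16 (Nautilus).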
The only temporary workaround to make these PVE nodes connect to the external Ceph and start VMs was to copy ceph.conf and the keyring into the /etc/ceph directory. (This was just a desperate measure to get the critical VMs back online.)
The message displayed in the UI when trying to list the disk images from the nodes with the Nautilus client is:
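Concretely, the workaround was roughly this on each affected node (the keyring filename depends on the storage ID, so treat the names below as placeholders for my setup):

```shell
# Desperate workaround: give the node a plain /etc/ceph config so the
# rbd client can find the monitors and credentials on its own,
# bypassing the PVE-managed keyring in /etc/pve/priv/ceph
mkdir -p /etc/ceph
cp /etc/pve/priv/ceph/<storage_id>.keyring /etc/ceph/ceph.client.admin.keyring
# plus a minimal /etc/ceph/ceph.conf copied from a Ceph node,
# containing the mon_host addresses of the external cluster
```

With those two files in place the Nautilus-client nodes could list RBDs and start VMs again.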
"rbd error: rbd: listing images failed: (2) No such file or directory (500)"
Any thoughts about this weird behaviour?
Is the new Nautilus client not working with the usual way of adding the external Ceph keyring in "/etc/pve/priv/ceph"?
Please let me know your thoughts, I am just out of ideas...
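For context, my understanding is that PVE expects the keyring for an external RBD storage at /etc/pve/priv/ceph/&lt;storage_id&gt;.keyring, matching the storage ID in /etc/pve/storage.cfg. The storage ID and monitor addresses below are examples, not my real values:

```
# /etc/pve/storage.cfg (example storage ID "ceph-ext")
rbd: ceph-ext
        monhost 10.0.0.1 10.0.0.2 10.0.0.3
        pool rbd
        content images
        username admin

# keyring then expected at:
# /etc/pve/priv/ceph/ceph-ext.keyring
```

That layout worked fine with the Luminous client; the question is whether the Nautilus client still picks it up the same way.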
Thank you so much !
Leo