Hey all,
My cluster consists of:
- 5x physical servers
- storage is ceph
- each server has 4x HDDs and 1x SSD (all OSDs)
- each node has PVE installed on a 6th SAS drive
- I have an HDD pool and a separate SSD pool
During testing, I had no issues with creating, migrating, or anything else. In fact, I was quite impressed with how smoothly everything was going. But things changed once we put production-level VMs on it. You may have read my previous posts, and I greatly appreciate all your help, but I am now facing another very odd issue.
My issues now are:
- node 1 was completely removed from the crush map
- the OSD view doesn't show any OSD on that node
- I cannot create a new VM on it, or on any other node now
- however, I had a VM on that node that worked fine
- I was able to migrate the VM onto another node just fine
I was watching syslog throughout, and the only thing logged when creating a VM is that the task is starting; there is no other information. The migration, by contrast, was logged normally, and the VM is working on another node. Even now, as I write this, the VM still hasn't been created, and I've been typing this for at least 20 minutes.
My question is, where else can I look for logging information here?
I am baffled that the OSDs don't appear and that node 1 isn't in the Ceph CRUSH map, and even more so that the VM on it was working just fine.
I am tempted to do a complete reinstall of PVE, zap all the disks, and start afresh because of the issues we've been experiencing, but that's not really a solution; it's more of a stab in the dark, so to speak. I tried searching, but found nothing recent, and I figured the pre-2018 posts are too old to apply.
I would really appreciate any help here - please!