Hello,
I had a problem yesterday with a cluster node whose ZFS root disk filled up with a bunch of snapshots, and the rpool ended up 100% full. The pool was monitored, but it happened so fast that we couldn't react in time.
- The VMs residing on that storage became unresponsive. Other VMs using Ceph storage kept working correctly.
- I could SSH into the server and also connect to the web interface, either directly on that node or through another node in the cluster.
- The information on the status pages was updating properly, except for the VMs on the ZFS storage. I could not list the contents of any storage.
- I could not issue any command to any VM on that server: start/stop, migrate, remove snapshots, etc.
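In case it helps, this is roughly the kind of thing I would expect to need on the ZFS side to find and free space (the snapshot and dataset names below are just placeholders, not my actual layout):

```shell
# Show where the space went, including space held by snapshots (USEDSNAP column)
zfs list -o space -r rpool

# List snapshots sorted by how much space destroying each would free
zfs list -t snapshot -r rpool -o name,used -s used

# Destroy a specific snapshot to free space (placeholder name)
zfs destroy rpool/data/vm-100-disk-0@example-snap

# A safeguard I've seen suggested: an empty dataset with a reservation,
# so the pool can never be filled completely and there is always room
# to delete snapshots later (size and name are just examples)
zfs create -o refreservation=8G -o mountpoint=none rpool/reserved
```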
Is all this expected?
I've only hit a similar issue once before, but back then the storage was LVM and it was a single node. That time, simply freeing a couple of GB and waiting a few minutes got the node and the VMs working again.
Thanks!