Local Disk Locking Up

manitc

Jan 4, 2022
I've got a peculiar issue I just ran into. I have a 3-node cluster set up with CephFS. I created an Ubuntu desktop VM, added a new raw drive on local storage, and mounted it to a folder. The OS drive is located on the CephFS.

At first everything appears to work just fine. However, after I pull a bunch of files through git and write large amounts of data to the raw local drive, the disk locks up within a few minutes. All disk operations fail; even a simple command like "ls" or "touch" just hangs. The OS is still running and the Ceph drive is still readable and writable. I tried this on multiple nodes and get the same failure.

On one of the nodes, a physical disk in the Ceph cluster has failed, but the overall Ceph health status reports OK. I don't think a single bad drive in the Ceph cluster could cause a local drive to fail. I'm going to replace the drive next time I'm at the datacenter, but I wanted to see if anyone has run into the same issue. Where can I see logs for VM disk I/O errors?
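Inside the guest, stalled or failing I/O usually leaves traces in the kernel log (`dmesg` or `journalctl -k`), and on the Proxmox host the system journal (`journalctl`) is a reasonable place to check as well. Below is a minimal sketch, not a Proxmox tool, that filters saved kernel-log lines for message fragments that typically accompany hung or failing block I/O. The regex patterns, the function name `find_disk_problems`, and the file names in the comments are illustrative assumptions; exact kernel wording varies between kernel versions.

```python
import re
import sys
from typing import Iterable, List

# Kernel message fragments that commonly accompany stalled or failing block I/O.
# ASSUMPTION: these patterns follow typical Linux kernel wording; the exact text
# differs between kernel versions, so treat them as a starting point only.
PATTERNS = [
    re.compile(r"task \S+ blocked for more than \d+ seconds"),  # hung-task detector
    re.compile(r"I/O error, dev \S+"),                           # block-layer I/O errors
    re.compile(r"Buffer I/O error on dev"),                      # buffer-cache errors
    re.compile(r"EXT4-fs (error|warning)"),                      # ext4 filesystem complaints
]


def find_disk_problems(lines: Iterable[str]) -> List[str]:
    """Return the log lines (without trailing newline) that match any pattern."""
    return [line.rstrip("\n") for line in lines if any(p.search(line) for p in PATTERNS)]


if __name__ == "__main__":
    # Usage (file names are hypothetical):
    #   journalctl -k --no-pager > kern.log
    #   python3 scan_disk_log.py kern.log
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            for hit in find_disk_problems(fh):
                print(hit)
```

Reading from a saved file rather than calling `journalctl` directly keeps the helper usable on both the guest and the host, and on logs copied off a VM after it has locked up.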

Version: Proxmox VE 7.1-4
VM OS: Ubuntu 22.04 Desktop, Ubuntu 18.04 Desktop

Thanks,

Manit
 
I've got a Windows 10 VM with the same setup in the cluster: the OS is on the local drive and a secondary drive is on Ceph. The OS keeps locking up.

On the Ubuntu desktop VMs, I moved the disks to local storage. I'll run some tests today and update with my findings.
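One way to make those tests repeatable is a small write probe like the hypothetical sketch below. It assumes the data drive is mounted at a directory you pass in, writes fixed-size blocks with an `fsync` after each one, and reports the blocks whose write plus `fsync` took longer than a threshold. If the disk hangs completely, the probe itself will block in `fsync`, which is also a useful signal. The function name, scratch-file name, block size, and threshold are all illustrative assumptions, not Proxmox settings.

```python
import os
import sys
import time
from typing import List, Tuple


def probe_write_latency(directory: str, chunk_mb: int = 4, chunks: int = 16,
                        slow_threshold_s: float = 5.0) -> List[Tuple[int, float]]:
    """Write `chunks` blocks of `chunk_mb` MiB to a scratch file in `directory`,
    fsync after each block, and return (index, seconds) for every block whose
    write + fsync took longer than `slow_threshold_s`."""
    path = os.path.join(directory, "write_probe.tmp")  # hypothetical scratch file name
    block = os.urandom(chunk_mb * 1024 * 1024)        # incompressible test data
    slow: List[Tuple[int, float]] = []
    try:
        with open(path, "wb") as fh:
            for i in range(chunks):
                start = time.monotonic()
                fh.write(block)
                fh.flush()
                os.fsync(fh.fileno())  # force the data through to the virtual disk
                elapsed = time.monotonic() - start
                if elapsed > slow_threshold_s:
                    slow.append((i, elapsed))
    finally:
        # Clean up the scratch file even if a write fails.
        if os.path.exists(path):
            os.remove(path)
    return slow


if __name__ == "__main__":
    # Example (mount point is hypothetical): python3 write_probe.py /mnt/data
    for target in sys.argv[1:]:
        print(target, probe_write_latency(target))
```

Running the same probe against a directory on the Ceph-backed OS disk and one on the local disk gives a side-by-side comparison under identical load.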
 
I moved all drives to local storage and am still seeing intermittent freezes/lockups on the drive.