Hello all,
I have just set up an external Ceph cluster with the help of an outside specialist.
It is configured as follows:
16 OSDs
1 pool
32 PGs
7.1TiB free storage
on 4 nodes, each with:
64 GB RAM
12-core processors
NVMe SSDs and regular SSDs only
Cluster network: 10G
Link to the Proxmox nodes: 1G
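As a rough sanity check on the link speeds listed above, here is a small Python sketch of the theoretical throughput ceiling per link. The function name is made up for illustration, and the numbers ignore protocol overhead and any Ceph replication traffic, so real-world values will be lower.

```python
def link_ceiling_mb_per_s(link_gbit_per_s: float) -> float:
    """Theoretical payload ceiling in MB/s (decimal units), ignoring all overhead."""
    bits_per_second = link_gbit_per_s * 1_000_000_000
    return bits_per_second / 8 / 1_000_000


# Link speeds taken from the configuration above; labels are illustrative.
for name, gbit in [("Proxmox <-> Ceph (1G)", 1), ("Ceph cluster net (10G)", 10)]:
    print(f"{name}: at most {link_ceiling_mb_per_s(gbit):.0f} MB/s")
```

So a single VM talking to Ceph over the 1G link cannot exceed roughly 125 MB/s, no matter how fast the NVMe disks behind it are.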
Our Proxmox cluster is configured as follows:
4 servers, each with:
64 GB RAM
12-core processors
512 GB SSDs in RAID 1 as main disks
Cluster network: 1G
Proxmox version 6.4-8
I have now moved two VMs to the Ceph cluster for testing.
One Windows VM & one Linux VM.
I have run disk tests on both of them.
The Windows VM does not hang, but the disk usage remains constant at 100%.
The Linux VM hangs completely when I try to test the speed with the "dd" command, and the kernel reports a hung task (the message that mentions /proc/sys/kernel/hung_task_timeout_secs).
Caching is not enabled on either VM.
Unfortunately, the Ceph specialist is not familiar with Proxmox and can't help me there.
Does anyone know what could be causing this?
What kind of data can I provide for troubleshooting?
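Unless something else is more useful, I could gather the output of the usual status commands with a small Python sketch like the one below. The command list is only my assumption of what is relevant, and the helper name `collect` is made up; the `ceph` commands need a host with access to the Ceph cluster, and `pveversion` exists only on the Proxmox nodes, so anything missing on a given host is skipped.

```python
import shutil
import subprocess

# Assumed set of read-only status commands; adjust to whatever is actually needed.
COMMANDS = [
    ["ceph", "-s"],
    ["ceph", "health", "detail"],
    ["ceph", "osd", "df", "tree"],
    ["ceph", "osd", "pool", "ls", "detail"],
    ["ceph", "osd", "perf"],
    ["pveversion", "-v"],
]


def collect(commands=COMMANDS, which=shutil.which, timeout=30):
    """Run each command whose binary is installed; return {command line: output}."""
    report = {}
    for cmd in commands:
        line = " ".join(cmd)
        if which(cmd[0]) is None:
            # Binary is not installed on this host, so note that and move on.
            report[line] = "skipped: binary not found on this host"
            continue
        try:
            done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
            report[line] = done.stdout + done.stderr
        except (subprocess.TimeoutExpired, OSError) as exc:
            report[line] = f"failed: {exc}"
    return report
```

Calling `collect()` on a Ceph node and on a Proxmox node and printing the returned dictionary would give one block of output per command that I can paste here.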
Thanks in advance and kind regards,
Fabian L.