Currently running a full-mesh 3-node Proxmox Ceph cluster with Resource Pools.
Created individual accounts and have them log in using the Proxmox VE authentication realm.
Each VE account has PVEAdmin permissions to the Ceph pools and Networking...
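For reference, a minimal sketch of that account setup from the CLI. The user and pool names are placeholders, and the ACL syntax is the newer "pveum acl modify" form (older releases use "pveum aclmod"):

# create a user in the Proxmox VE authentication realm (@pve) and set a password
pveum user add alice@pve
pveum passwd alice@pve
# grant PVEAdmin on a resource pool (pool name is hypothetical)
pveum acl modify /pool/ceph-pool1 --users alice@pve --roles PVEAdmin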
Use consumer SSDs at your peril. They fail and fail spectacularly.
The only real solution is enterprise SSDs with PLP (power-loss protection) and high endurance ratings.
Don't know if it matters, but the latest firmware for the Dell HBA330 is 16.17.01.00 A08. May or may not help your issue.
No issues on 16-drive bay R730s in production.
Prior to migrating, you may need to regenerate the initramfs to include all the drivers. Had to do this when migrating to Hyper-V.
So, run the following as root:
# -f overwrite existing images, -v verbose, -N disable host-only mode (pulls in all drivers), --regenerate-all rebuild the initramfs for every installed kernel
dracut -fv -N --regenerate-all
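If you want to sanity-check that the rebuilt initramfs actually contains the drivers you need, lsinitrd (part of dracut) can list its contents. The path and module names below are just examples (Hyper-V modules; adjust for your distro and target hypervisor):

lsinitrd /boot/initrd.img-$(uname -r) | grep -i hv_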
Yes, the no-subscription repos are fine as long as you have Linux SME skills.
Always have a Plan B, which is backups.
I highly recommend you test updates on a separate server/cluster before pushing to production.
Ditch the PERC HBA-mode drama and swap it for a Dell HBA330, a true IT/HBA-mode storage controller.
Your future self will thank you. Plus, HBA330s are very cheap to get. Update to the latest firmware from dell.com/support
Just as Proxmox Backup Server supports namespaces for hierarchical backups on the same backup pool, does Proxmox VE also support namespaces for creating VMs/CTs on the same node/cluster?
I really, really do NOT want to stand up an...
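For context, on the PVE side a PBS namespace is (as far as I know) just an option on the storage definition, so different tenants can share one datastore. Everything below (IDs, addresses, names) is a made-up example:

# point a PVE storage entry at a specific namespace inside a PBS datastore
pvesm add pbs pbs-tenant-a --server 192.0.2.10 --datastore backups --namespace tenant-a --username backup@pbs --fingerprint <PBS-cert-fingerprint>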
Hopefully you have backups.
I strongly recommend using a pure IT/HBA-mode storage controller. Use software-defined storage (ZFS, LVM, Ceph) to handle your storage needs.
I use an LSI3008 IT-mode storage controller (Dell HBA330) in production...
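As a minimal sketch of what "let software-defined storage handle it" looks like with ZFS on drives behind the HBA (device paths and storage IDs are placeholders):

# create a mirrored pool straight on the disks the HBA exposes
zpool create -o ashift=12 tank mirror /dev/disk/by-id/scsi-DISK1 /dev/disk/by-id/scsi-DISK2
# register it in Proxmox as VM/CT storage
pvesm add zfspool tank-vm --pool tank --content images,rootdir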
Seriously, ditch the PERC HBA-mode drama and get a Dell HBA330, which is a true IT/HBA-mode controller. It uses the much simpler mpt3sas driver. Be sure to update to the latest firmware at dell.com/support
Super cheap to get and no more drama! LOL!
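Quick sanity check that the controller really is bound to mpt3sas (output wording varies a bit by lspci version):

lspci -nnk | grep -A3 -i sas3008
# the "Kernel driver in use:" line should say mpt3sas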
While it's true that 3 nodes is the bare minimum for Ceph, losing a node and depending on the other 2 to pick up the slack would make me nervous. For best practice, start with 5 nodes. With Ceph, more nodes/OSDs = more IOPS.
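Part of why the 3-node case makes me nervous: with the default replicated size 3 / min_size 2, losing one node leaves every PG with zero spare replicas until it comes back. Worth checking your pools (pool name is a placeholder):

ceph osd pool get ceph-vm size       # replica count, typically 3
ceph osd pool get ceph-vm min_size   # writes block below this, typically 2
ceph status                          # watch for undersized/degraded PGs after a node failure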
As been...
Seems the Dell P570F is nothing more than a Dell R740xd.
I would get a Dell R740xd to future-proof it and avoid vendor lock-in.
Make sure you get the NVMe version of the R740xd; otherwise you'll get an R740xd with a PERC, which...
I use this, https://fohdeesha.com/docs/perc.html, to flash 12th-gen Dell PERCs to IT-mode with no issues in production.
Don't skip any steps and take your time. Don't forget to flash the BIOS/UEFI boot ROMs so you can still boot Proxmox off drives behind the controller.
I've used none/noop on Linux guests since, like, forever on virtualization platforms. That includes VMware and Proxmox in production with no issues. Per that RH article, I don't use iSCSI/SR-IOV/passthrough. I let the hypervisor's I/O scheduler figure...
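For reference, checking and persisting that inside a Linux guest looks roughly like this (device name is an example; a udev rule is one common way to persist it, not the only way):

cat /sys/block/sda/queue/scheduler            # active scheduler shown in [brackets]
echo none > /sys/block/sda/queue/scheduler    # set it live, as root
# persist across reboots, e.g. in /etc/udev/rules.d/60-io-scheduler.rules:
# ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="none"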
Lack of power-loss protection (PLP) on those SSDs is the primary reason for horrible IOPS. Read other posts on why PLP is important for SSDs.
I get IOPS in the low thousands on a 7-node Ceph cluster using 10K RPM SAS drives on 16-drive bay...
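If you want to see the PLP effect yourself, the classic tell is a single-job 4k sync-write test with fio. The device path is a placeholder, and this writes to it, so only aim it at a scratch disk:

fio --name=plp-test --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based

Drives without PLP have to flush every sync write to NAND and typically crater on this test; PLP drives can ack from their protected cache and stay far higher.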