Hello! I have a 10-node hyperconverged cluster. Each node has 4 OSDs (40 OSDs in total). I have 2 different questions:
QUESTION 1 - OSD REPLACEMENT (with identical SSD)
Since I need to replace an SSD (one OSD has crashed 3 times in recent months, so I prefer to replace it with a brand new one)...
To answer my own question: the procedure worked perfectly. However, you must take into account that, once the disk of the VM is GPT, you can no longer expand it with growpart: you have to do it like this
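Roughly, the sequence looks like this (just a sketch rather than my exact commands: it assumes the disk is /dev/sda, the partition to grow is number 1 and it holds an ext4 filesystem, so adapt device names and numbers to your VM):

sgdisk -e /dev/sda (moves the backup GPT header to the new end of the enlarged disk)
parted /dev/sda resizepart 1 100% (grows partition 1 to the end of the disk; confirm the warning if the partition is mounted)
partprobe /dev/sda (asks the kernel to re-read the partition table)
resize2fs /dev/sda1 (grows the ext4 filesystem online)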
I can confirm that on the latest PVE 6.4, as well as on PVE 5.4, a pve-firewall restart sets them all back to 1 and fixes the problem (and it can be safely run in production, since it doesn't interrupt networking; I've run it dozens of times on hosts running tens of mission-critical VMs).
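For anyone who wants to check their own nodes, these are the commands I mean (assuming a standard PVE install where pve-firewall runs as a systemd service):

sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables (shows the current values; both must be 1 for the firewall to see bridged traffic)
systemctl restart pve-firewall (re-applies the rules and sets both values back to 1)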
Unfortunately I...
Hi, I have a growing Debian 9 Linux VM that I need to expand to 3 TB. I've expanded it from the Proxmox interface and it correctly reports the 3 TB size. But I'm unable to grow the partition beyond 2 TB because the virtual disk has an MBR partition table. The VM is mission critical, so I must carefully plan the operation...
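(For context on why 2 TB is a hard wall: MBR stores partition start and size as 32-bit sector counts, so with 512-byte sectors the maximum addressable size is 2^32 × 512 bytes = 2 TiB.)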
I apologise, the monitoring system was reporting incorrect I/O data because the format of the iostat output it uses changed between PVE 5 and PVE 6. So false alarm, PVE 6 with Ceph Octopus works perfectly!
I mark the topic as [SOLVED]
Hi Fabian, the first OSD restart did take 5 to 10 minutes, during which the upgraded OSDs were down and the CPU on the node was very high... but then, once all the OSDs came back up on all nodes and Ceph went back to HEALTH_OK, I assumed that the upgrade process was complete. Did I assume wrong? Also...
Hi! Last night we upgraded our production 9-node cluster from PVE 5.4 to PVE 6.4 and from Ceph 12 to 14 and then to 15 (Octopus), following the official tutorials. Everything went smoothly and all running VMs stayed online during the upgrade, so we're very happy with the operation. Now the...
Thank you for all your considerations. I'll go for the cheaper 10 Gb/s switch for this small cluster (the unmanaged 5-port one).
Regarding 960/970 Pro SSDs, I've built much bigger clusters (10 nodes, hundreds of VMs) with Ceph using only Samsung NVMe prosumer drives and I'm really satisfied with...
For this particular cluster, which is very small (3 nodes), I was considering an 8-port managed 10 Gb/s switch (Netgear XS708T) vs a 5-port unmanaged one (Netgear XS505M). The latter is cheaper and has 4.8 microseconds of latency, while the former costs more and has 2.8 microseconds.
I've...
I just downloaded the datasheets :)
Ok guys, I googled and thought about it a bit (maybe I should have done it sooner). As I told you, the two switches differ by 2 microseconds. Then I figured out that:
A SATA SSD has a read latency of not less than about 30 µs (30 microseconds --> 30000...
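Just to put numbers on it (my own back-of-the-envelope figures, not from the datasheets): even taking ~30 µs as a floor for the drive's read latency, the 2 µs gap between the switches is about 2/30 ≈ 7% of that, and Ceph adds its own per-request software overhead on top (typically hundreds of microseconds), so in practice the switch difference gets lost in the noise.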
Hi! I'm comparing 2 switches with 10 Gb/s ports, to build a small 3-node cluster with PVE+Ceph. Ceph will run on NVMe drives. One switch has a latency of 2.8 microseconds at 10 Gb/s, while the other (which is cheaper) has 4.8 microseconds at 10 Gb/s.
Does this difference matter at all...
Unfortunately the problem is still there: every couple of months I have to shut down and restart the machine to free up the huge amount of memory it consumes (instead of the 4 GB allocated)
Many thanks, it worked! Here are the steps that I followed, for anyone else interested:
rbd ls -l <poolname> (to get the snapshot name)
rbd showmapped (just to check that the snapshot is not already mapped)
mkdir <mountpoint>
rbd map --read-only <poolname>/<diskname>@<snapname>
rbd showmapped (just...
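For completeness, the remaining commands were along these lines (a sketch rather than my exact commands, assuming the mapped device shows up as /dev/rbd0 and the filesystem is ext4; check rbd showmapped for the real device name):

mount -o ro,noload /dev/rbd0 <mountpoint> (noload skips the ext4 journal replay, which would fail on a read-only device)
umount <mountpoint> (when you're done)
rbd unmap <poolname>/<diskname>@<snapname> (releases the mapping)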
Thank you Alwin! Could you please tell me the steps to map it read-only on the Proxmox VE node?
I think you already answered "yes, this is safe". It was just a warning not to point me to an emergency hack if it involved any risk, since I have a plan B ;)
Yep, you're perfectly right, upgrade to...
Hi!
I have a big LXC container (2 TB) running on Proxmox VE 5.4 with Ceph Luminous storage. The container is running mission-critical services and cannot be stopped. Two days ago I took a snapshot from the Proxmox interface (you can see it, named "pre_upgrade_tls", in the screenshot below).
Now...
Hi all, I'm reopening this thread after several months because I recently added new nodes to our production cluster (Proxmox VE 5.4-13) and finally figured out what silently disables the firewall (by setting net.bridge.bridge-nf-call-iptables and net.bridge.bridge-nf-call-ip6tables to zero): it's...
Thanks for the advice regarding the PG increase. Regarding 10 GbE, at the moment I only have 200 Mb/s going through each Ethernet port, with peaks of no more than 1.5 Gb/s when I add OSDs or rebalance, so 10 Gb/s seems adequate to me (my workload is mainly web-hosting VMs). Since I...
I have another question: in addition to the PG increase, I have to add 10 new OSDs to the cluster (each one is a 2 TB NVMe drive, so it's 20 TB of storage to add).
Should I add the new OSDs first, wait for the rebalancing to finish and then increase the PGs, or the other way around?
At the moment the usage of...
Thank you very much for sharing your procedure: the main steps are the same ones I had identified, but I hadn't thought about disabling scrubbing. I will then proceed with a first step of +32 PGs and, based on I/O and network saturation (luckily I have a very well-tuned Zabbix monitoring them in real...
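For reference, the commands involved are roughly these (a sketch; <poolname> and the target PG count are placeholders, and on Luminous pgp_num has to be raised to match pg_num by hand, while Nautilus and later adjust it automatically):

ceph osd set noscrub
ceph osd set nodeep-scrub
ceph osd pool set <poolname> pg_num <new_pg_count>
ceph osd pool set <poolname> pgp_num <new_pg_count>
ceph osd unset noscrub
ceph osd unset nodeep-scrub (re-enable scrubbing once the rebalancing has finished)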