Search results

  1.

    OSD replacement and addition, minimizing rebalances

    Hello! I have a 10-node hyperconverged cluster. Each node has 4 OSDs (40 OSDs in total). I have 2 different questions: QUESTION 1 - OSD REPLACEMENT (with identical SSD) Since I need to replace an SSD (one OSD has crashed 3 times in recent months, so I prefer to replace it with a brand-new one)... (see the OSD-replacement sketch after these results)
  2.

    [SOLVED] Convert VM disk from MBR to GPT (to expand it beyond 2 TB)

    I'll answer myself: the procedure worked perfectly. However, you must take into account that, once the VM's disk is GPT, you can no longer expand it with growpart; you have to do it like this (see the resize sketch after these results)
  3.

    [SOLVED] PVE Firewall not filtering anything

    I can confirm that in the latest PVE 6.4, as well as in PVE 5.4, a pve-firewall restart sets them all to 1 and fixes the problem (and can be safely run in production, since it doesn't interrupt networking; I've run it dozens of times on hosts running tens of mission-critical VMs). Unfortunately I... (see the sysctl sketch after these results)
  4.

    [SOLVED] Convert VM disk from MBR to GPT (to expand it beyond 2 TB)

    Hi, I have a growing Debian 9 Linux VM that I need to expand to 3 TB. I've expanded it from the Proxmox interface and it correctly reports a 3 TB size. But I'm unable to grow the partition beyond 2 TB because the virtual disk has an MBR partition table. The VM is mission-critical, so I must carefully plan the operation...
  5.

    [SOLVED] Huge increase in I/O load on NVMe disks (at equal VM load) after upgrade from Ceph 12 to 15

    I apologise: the monitoring system was reporting incorrect I/O data because the iostat output it uses changed between PVE 5 and PVE 6. So it was a false alarm; PVE 6 with Ceph Octopus works perfectly! I'm marking the topic as [SOLVED]
  6.

    [SOLVED] Huge increase in I/O load on NVMe disks (at equal VM load) after upgrade from Ceph 12 to 15

    Hi Fabian, the first OSD restart did take 5 to 10 minutes, during which the upgraded OSDs were down and the CPU load on the node was very high... but then, once all the OSDs came back up on all nodes and Ceph returned to HEALTH_OK, I assumed that the upgrade process was complete. Did I assume wrong? Also... (see the upgrade-check sketch after these results)
  7.

    [SOLVED] Huge increase in I/O load on NVMe disks (at equal VM load) after upgrade from Ceph 12 to 15

    Hi! Last night we upgraded our production 9-node cluster from PVE 5.4 to PVE 6.4 and from Ceph 12 to 14 and then to 15 (Octopus), following the official tutorials. Everything went smoothly and all running VMs stayed online during the upgrade, so we're very happy with the operation. Now the...
  8.

    10 Gb/s switch latency (microseconds difference): does it matter?

    As I wrote above, Samsung 960 Pro and Samsung 970 Pro
  9.

    10 Gb/s switch latency (microseconds difference): does it matter?

    Thank you for all your considerations. I'll go for the cheaper 10 Gb/s switch for this small cluster (the unmanaged 5-port one). Regarding the 960/970 Pro SSDs, I've built much bigger clusters (10 nodes, hundreds of VMs) with Ceph using only Samsung prosumer NVMe drives and I'm really satisfied with...
  10.

    10 Gb/s switch latency (microseconds difference): does it matter?

    For this particular cluster, which is very small (3 nodes), I was considering an 8-port managed 10 Gb/s switch (Netgear XS708T) vs a 5-port unmanaged one (Netgear XS505M). The latter is cheaper and has 4.8 microseconds of latency, while the former costs more and has 2.8 microseconds. I've...
  11.

    10 Gb/s switch latency (microseconds difference): does it matter?

    I just downloaded the datasheets :) Ok guys, I googled and thought about it a bit (maybe I should have done it sooner). As I told you, the two switches differ by 2 microseconds. Then I figured out that: a SATA SSD has a read latency of not less than 30 µs (30 microseconds --> 30000... (see the worked comparison after these results)
  12.

    10 Gb/s switch latency (microseconds difference): does it matter?

    Hi! I'm comparing 2 switches with 10 Gb/s ports, to build a small 3-node cluster with PVE+Ceph. Ceph will run on NVMe drives. One switch has a latency of 2.8 microseconds at 10 Gb/s, while the other (which is cheaper) has 4.8 microseconds at 10 Gb/s. Does this difference matter at all...
  13.

    KVM process allocating 15x the RAM limit configured for the VM

    Unfortunately the problem is still there: every couple of months I have to shut down and restart the machine to free up the tons of memory it consumes (instead of the 4 GB allocated)
  14.

    [SOLVED] Mount a Ceph snapshot in a running LXC container

    Many thanks, it worked! Here are the steps that I followed, for anyone else interested: rbd ls -l <poolname> (to get the snapshot name) rbd showmapped (just to check that the snapshot is not already mapped) mkdir <mountpoint> rbd map --read-only <poolname>/<diskname>@<snapshotname> rbd showmapped (just... (see the consolidated sketch after these results)
  15.

    [SOLVED] Mount a Ceph snapshot in a running LXC container

    Thank you Alwin! Could you please tell me the steps to map it read-only on the Proxmox VE node? I think you already answered "yes, this is safe"; my caveat was only meant to make sure you weren't pointing me to an emergency hack that involved any risk, since I have a plan B ;) Yep, you're perfectly right, upgrade to...
  16.

    [SOLVED] Mount a Ceph snapshot in a running LXC container

    Hi! I have a big LXC container (2 TB) running on Proxmox VE 5.4 with Ceph Luminous storage. The container runs mission-critical services and cannot be stopped. Two days ago I took a snapshot from the Proxmox interface (you can see it named "pre_upgrade_tls" in the screenshot below). Now...
  17.

    [SOLVED] PVE Firewall not filtering anything

    Hi all, I'm reopening this thread after several months because I recently added new nodes to our production cluster (Proxmox VE 5.4-13) and finally figured out what silently disables the firewall (by setting net.bridge.bridge-nf-call-iptables and net.bridge.bridge-nf-call-ip6tables to zero): it's...
  18.

    [PVE+Ceph] Increasing PG count on a production cluster

    Thanks for the advice regarding the PG increase. Regarding 10 GbE: at the moment I only have about 200 Mb/s going through each Ethernet port, with peaks no higher than 1.5 Gb/s when I add OSDs or rebalance, so 10 Gb/s seems adequate to me (my workload is mostly web-hosting VMs). Since I...
  19.

    [PVE+Ceph] Increasing PG count on a production cluster

    I have another question: in addition to the PG increase, I have to add 10 new OSDs to the cluster (each one is a 2 TB NVMe drive, so it's 20 TB of storage to add). Should I add the new OSDs first, wait for the rebalancing and then add the new PGs, or the other way around? At the moment the usage of...
  20.

    [PVE+Ceph] Increasing PG count on a production cluster

    Thank you very much for sharing your procedure: the main steps are the same ones I had identified, but I hadn't thought about disabling scrubbing. I will proceed with a first step of +32 PGs and, based on I/O and network saturation (luckily I have a very well-tuned Zabbix monitoring them in real... (see the PG-increase sketch after these results)
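
Command sketches

For the OSD-replacement question in result 1, here is a minimal sketch of one way to swap the failing SSD while keeping data movement small, roughly the upstream Ceph "replace an OSD" flow. The OSD id 12 and the device /dev/sdX are hypothetical placeholders; adapt them to your cluster and check each step against the Ceph documentation for your release:

    ceph osd set norebalance                            # hold off data movement while the disk is swapped
    ceph osd set noout                                  # don't auto-mark the stopped OSD out
    systemctl stop ceph-osd@12                          # run on the node hosting the failing OSD
    ceph osd destroy 12 --yes-i-really-mean-it          # keeps the id and CRUSH position, drops auth/keys
    ceph-volume lvm create --osd-id 12 --data /dev/sdX  # recreate OSD 12 on the new SSD
    ceph osd unset noout
    ceph osd unset norebalance                          # now only OSD 12's PGs backfill

Because the OSD keeps its id and CRUSH weight, the rest of the cluster does not reshuffle; only the replaced OSD is refilled.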
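
For the MBR-to-GPT threads (results 2 and 4), the exact steps the poster used are cut off, but a commonly used sequence inside the guest looks roughly like the following. The device and partition names (/dev/sda, partition 1, ext4) are assumptions; test on a clone or with a fresh backup first, and note that a BIOS-booted VM may additionally need a BIOS boot partition for GRUB after the conversion:

    sgdisk --mbrtogpt /dev/sda         # rewrite the MBR partition table as GPT
    sgdisk -e /dev/sda                 # move the backup GPT header to the new end of the enlarged disk
    parted /dev/sda resizepart 1 100%  # grow partition 1 past the old 2 TB MBR limit
    partprobe /dev/sda                 # make the kernel re-read the partition table
    resize2fs /dev/sda1                # grow the ext4 filesystem (xfs_growfs for XFS)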
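
For the firewall threads (results 3 and 17), the symptom is that the bridge-netfilter sysctls end up at 0, so bridged VM traffic never reaches the iptables rules that pve-firewall generates. A quick check and fix along the lines the poster describes:

    sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables
    # if either value is 0, the firewall rules are being bypassed; set them back...
    sysctl -w net.bridge.bridge-nf-call-iptables=1
    sysctl -w net.bridge.bridge-nf-call-ip6tables=1
    # ...or restart the firewall service, which re-applies them (safe in production per the thread)
    systemctl restart pve-firewall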
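
For the Ceph 12-to-15 upgrade thread (results 5 to 7), HEALTH_OK alone does not prove that every daemon runs the new release or that the upgrade was finalised. A quick verification, assuming the usual mon, then mgr, then OSD upgrade order was followed:

    ceph versions                              # every mon, mgr and OSD should report the Octopus build
    ceph osd dump | grep require_osd_release   # should read "octopus" once the final step of the upgrade guide was run
    ceph health detail                         # surfaces any remaining upgrade-related warnings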
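
For the switch-latency threads (results 8 to 12), the poster's back-of-the-envelope comparison works out roughly as follows; the storage figures are order-of-magnitude assumptions, not measurements:

    per-hop switch difference:                 4.8 µs - 2.8 µs = 2 µs
    typical NVMe read latency:                 on the order of 100 µs
    typical Ceph write on NVMe (3x replicas):  on the order of 1000 µs (about 1 ms)

Even at a few switch traversals per I/O, the extra ~2 µs per hop stays well under 1% of the end-to-end storage latency, which is why the cheaper switch is a reasonable choice for this small cluster.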
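
For the Ceph-snapshot threads (results 14 to 16), the steps in result 14 are truncated; collected into one read-only sequence, with hypothetical names (pool ceph-vm, image vm-105-disk-0, mountpoint /mnt/snap) around the snapshot "pre_upgrade_tls" mentioned in the thread, and the /dev/rbdX device taken from rbd showmapped:

    rbd ls -l ceph-vm                          # find the image and its snapshot name
    rbd showmapped                             # confirm the snapshot is not already mapped
    rbd map --read-only ceph-vm/vm-105-disk-0@pre_upgrade_tls
    rbd showmapped                             # note the /dev/rbdX device that was just created
    mkdir -p /mnt/snap
    mount -o ro /dev/rbd0 /mnt/snap            # LXC volumes normally carry the filesystem directly on the RBD image
    # ...copy out whatever is needed, then clean up:
    umount /mnt/snap
    rbd unmap /dev/rbd0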
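
For the PG-count threads (results 18 to 20), a minimal sketch of one +32 step with scrubbing paused, as discussed in the thread. The pool name and target pg_num are placeholders; on Luminous pgp_num has to be raised by hand, while Nautilus and later adjust it automatically:

    ceph osd set noscrub
    ceph osd set nodeep-scrub                   # keep scrubs out of the way during the data movement
    ceph osd pool set <poolname> pg_num 544     # hypothetical target: previous pg_num + 32
    ceph osd pool set <poolname> pgp_num 544    # needed explicitly on Luminous
    # wait for "ceph -s" to return to HEALTH_OK (watching I/O and network load), then repeat in +32 steps
    ceph osd unset noscrub
    ceph osd unset nodeep-scrub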
