Recent content by Greatsamps

  1. Windows VM Migration Issue

    Just to add to this: we have a dedicated 1 Gb/s network for corosync. We then have two LACP-bonded 10 Gb/s network cards in each host, which have been split into VLANs. We have a couple for Ceph and one for public traffic, which at the time in question would have had zero traffic on it. We have defined...
  2. Windows VM Migration Issue

    Hi gang, so, this weekend's problem. We did some maintenance on one of our nodes this weekend. We have a cluster with Ceph running. It appeared that everything went OK, but today we have had reports from customers that their applications crashed around the time of the migration. One VM that we...
  3. Maximum number of VMs per Proxmox server?

    Well, 3000 is quite a lot! Our use case is as follows. We run a SaaS application that interfaces with a third-party application which is very badly behaved! As such, we need to keep it very tightly constrained resource-wise. We actually have the application packaged up into a Docker container for...
  4. Maximum number of VMs per Proxmox server?

    Just wondering if there has been any change in this in recent releases?
  5. Dell R430

    Anyone using these? Any issues?
  6. Total Cluster Failure

    Thanks for your further replies. I have checked the switches and they do seem to be working with 802.3ad: [LN-LD4-SW-S6720]dis lacp stat eth-trunk 16 Eth-Trunk16's PDU statistic is: Port...
  7. Total Cluster Failure

    So I did a full set of pings. In the short-term run I can see max values above 6 ms, but in the long-term one these appear to vanish. Looking at it, it would seem that using this link would cause issues for corosync. I also did a "jumbo-frame" ping between all the servers and could confirm that it is working...
  8. Total Cluster Failure

    So running iperf, things look OK: root@ld4-pve-n2:~# iperf -c 172.16.2.21 Client connecting to 172.16.2.21, TCP port 5001 TCP window size: 325 KByte (default) [ 3] local...
  9. Total Cluster Failure

    OK, so an update for you. I have changed the primary corosync interface over to a standalone 1 Gb one. For now it is still on the same switch, as changing that will require a DC visit, but I figured it would be better off like this. For a start, the retransmit messages have stopped, and there is...
  10. Total Cluster Failure

    I have been taking a look at this in more depth, and I am starting to wonder if there is an issue with the overall network setup. There are a lot of corosync entries such as this: Jan 1 10:12:59 ld4-pve-n6 corosync[2200]: [KNET ] link: host: 3 link: 2 is down Jan 1 10:12:59 ld4-pve-n6...
  11. Total Cluster Failure

    Thanks for the replies. I am starting to realize that the way it has been set up is madness. We don't have the resources to dedicate six switches to this (2 x public, 2 x cluster, 2 x storage), but we could put the cluster on its own single switch, maybe with a VLAN as a backup? I would not want to be in...
  12. Total Cluster Failure

    New node pveversion -v root@ln-pve-n8:~# pveversion -v proxmox-ve: 6.3-1 (running kernel: 5.4.78-2-pve) pve-manager: 6.3-3 (running version: 6.3-3/eee5f901) pve-kernel-5.4: 6.3-3 pve-kernel-helper: 6.3-3 pve-kernel-5.4.78-2-pve: 5.4.78-2 pve-kernel-5.4.73-1-pve: 5.4.73-1 ceph-fuse...
  13. Total Cluster Failure

    Hi, so the cluster was originally set up by some "consultants", but I have reason to question their abilities. Every node is identical. We have a Mellanox two-port 10 Gb network card in place, configured as an LACP bond. That then has a single vmbr bridge on it, which is VLAN-aware. We...
  14. Total Cluster Failure

    Hi, thanks for your detailed reply. I am pretty much of the same mindset as you, in that something meant to improve uptime should never behave like this. We have been running VPS hosting for 11 years, the past 7 of those with Hyper-V, and have never had an issue like this. Windows gets its fair share...
  15. Total Cluster Failure

    So I tried again, making sure the IP addresses/VLANs were correct, and had exactly the same problem. I have quite the list of issues with Proxmox now, every single one of which is disrupting my clients by having their VMs randomly go offline. I am now at a crossroads. Do I shell out several...
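The corosync entries quoted in the "overall network setup" post follow a regular pattern, so one way to check whether the link flaps cluster around particular peers is simply to count them. This is a minimal sketch, assuming log lines of the exact form shown in that post (`[KNET ] link: host: N link: M is down`); the regex and the function name `count_link_downs` are illustrative assumptions, not part of corosync or Proxmox.

```python
import re
from collections import Counter

# Matches the KNET message quoted in the post, e.g.
# "Jan 1 10:12:59 ld4-pve-n6 corosync[2200]: [KNET ] link: host: 3 link: 2 is down"
# The spacing inside "[KNET ]" is assumed from that single example, so it is matched loosely.
KNET_DOWN = re.compile(r"\[KNET\s*\]\s+link: host: (\d+) link: (\d+) is down")


def count_link_downs(lines):
    """Count 'is down' events per (host, link) pair in corosync log lines.

    Hypothetical helper for illustration; lines that do not match are ignored.
    """
    counts = Counter()
    for line in lines:
        match = KNET_DOWN.search(line)
        if match:
            host, link = int(match.group(1)), int(match.group(2))
            counts[(host, link)] += 1
    return counts
```

The lines could come from the corosync journal on each node (for example the output of `journalctl -u corosync`, assuming corosync logs to the systemd journal under that unit name). If one `(host, link)` pair dominates the counts, that points at a single peer or link rather than at the whole network.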
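The short-term versus long-term ping comparison comes down to looking at the maximum round-trip time rather than the average, since occasional spikes are what hurt corosync. This is a minimal sketch that pulls the `time=... ms` values out of Linux (iputils) `ping` reply lines and reports min/avg/max plus how many replies exceeded a threshold. The 6 ms default is only the figure mentioned in that post, not a documented corosync limit, and `summarize_ping` is a hypothetical helper name.

```python
import re
from statistics import mean

# Matches the per-reply line printed by iputils ping on Linux, e.g.
# "64 bytes from 172.16.2.21: icmp_seq=1 ttl=64 time=0.345 ms"
REPLY_TIME = re.compile(r"time=([\d.]+) ms")


def summarize_ping(lines, threshold_ms=6.0):
    """Return (min, avg, max, replies_over_threshold) in ms, or None if no replies.

    threshold_ms is an assumed cut-off taken from the forum post, not a corosync setting.
    """
    rtts = []
    for line in lines:
        match = REPLY_TIME.search(line)
        if match:
            rtts.append(float(match.group(1)))
    if not rtts:
        return None
    over = sum(1 for rtt in rtts if rtt > threshold_ms)
    return min(rtts), mean(rtts), max(rtts), over
```

Running it over the output of a short run and a much longer run to the same peer makes the comparison repeatable. The same parser also works on a jumbo-frame test such as `ping -M do -s 8972 <peer>`, assuming a 9000-byte MTU (8972 bytes of payload plus 28 bytes of IP and ICMP headers).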
