Recent content by Ch@rlus

  1. C

    "Too Many open files" on Proxmox API + GUI (with KVM)

    Thanks! I added the following at the end of /etc/security/limits.conf:

        root soft nofile 9000
        root hard nofile 65000
        * soft nofile 9000
        * hard nofile 65000

    Now I do get 9000 when running "ulimit -n". I'll keep you posted if the problem re-appears, and I'll try to use lsof to find...
  2. C

    "Too Many open files" on Proxmox API + GUI (with KVM)

    Hey guys, since we updated our PVE cluster to PVE6, we've been getting quite a lot of API + GUI errors with this message: "RPCEnvironment init request failed: Unable to load access control list: Too many open files". These nodes only host KVM VMs, and most of the threads I've found on the subject...
  3. C

    Loss + Consider r2q change

    Yes, of course, it was a "hard" reboot (stop/start of the VM inside proxmox).
  4. C

    Loss + Consider r2q change

    Thanks for your reply :) We ended up changing the network cards to fix these issues => Our first "internal" ping tests were actually wrong, and went via the host's network card. We switched to Intel X520s, which are much more stable. EDIT: (We had tried to reboot the VM after the QoS...
  5. C

    Loss + Consider r2q change

    Hey guys! We recently expanded one of our clusters with large servers. These servers are currently handling about 100-150 KVM VMs and are running great, except for one detail: we have some random loss, lasting 1-2 min, on some VMs, temporarily, and then everything comes back online...
  6. C

    [SOLVED] cfs lock 'domain-ha' timeout & master old timestamp

    Okay, I'll mark this topic as "resolved". After restarting these services, everything seems to be running smoothly again:

        systemctl restart pve-cluster
        systemctl restart pvedaemon
        systemctl restart pveproxy
        systemctl restart pvestatd
  7. C

    [SOLVED] cfs lock 'domain-ha' timeout & master old timestamp

    Hmm, after more investigation, I found something, and I'm not sure if it's "normal" behaviour. It seems that I'm not able to "touch" the directory /etc/pve/priv/lock/domain-ha/ from the "master" node compute07. But I'm able to touch this directory from one of the other nodes (and this node...
  8. C

    [SOLVED] cfs lock 'domain-ha' timeout & master old timestamp

    In addition, here's the listing of the /etc/pve/priv/lock directory. Except for compute004, which is powered off, all the locks seem to be active, with correct date and time values:

        drwx------ 2 root www-data 0 Oct 28 19:48 domain-ha
        drwx------ 2 root www-data 0 Oct 28 19:48 ha_agent_compute001_lock...
  9. C

    [SOLVED] cfs lock 'domain-ha' timeout & master old timestamp

    Hey guys! We're having 2 issues on our cluster, which are likely related. For the past few hours, most of our HA tasks have been failing with the error 'update resource failed: error with cfs lock 'domain-ha': got lock request timeout', and our ha-manager reports that the master is old timestamp - dead...
  10. C

    8 nodes cluster disaster

    Yep, I tried it, and it works fine (well, it seems to). Both rings also show as "working" when I run "corosync-cfgtool -s":

        Printing ring status.
        Local node ID 5
        RING ID 0
            id = 10.3.16.12
            status = ring 0 active with no faults
        RING ID 1
            id = 10.3.17.12
            status = ring 1...
  11. C

    8 nodes cluster disaster

    Hey guys, we're facing an issue with one of our Proxmox clusters. We have an 8-node setup that completely fenced for "no reason" (I assume there's a good reason for this, but not an obvious one). This cluster has a 2-ring corosync network, and both rings are independent networks, fully monitored =>...
  12. C

    Bayesian filter across all cluster

    Thanks for your reply :) Is there any downside to syncing these databases? It looks like my first MX server is getting much more email than my 2 others (although they all have the same priority), and therefore seems to have a better spam detection ratio. Or maybe, is there anything I could do to...
  13. C

    Bayesian filter across all cluster

    Hey guys! I'm facing an issue on a fairly recent Proxmox Mail Gateway installation. We have a really simple installation with 3 servers, configured as a cluster directly in Proxmox. It looks like the Bayesian filter is not synced across all the nodes in the cluster. I'm receiving tons of...
  14. C

    Iscsi storage lost and HA issue

    Thanks for your reply. That's what I wanted to be sure of. I'll take the necessary measures to make sure that the iSCSI links are redundant, but something can always happen and break them. I'll probably see if I can implement an auto-fence on my node if it detects iSCSI links down for X seconds...
  15. C

    Iscsi storage lost and HA issue

    Hi guys, I recently built a Proxmox 5.1 cluster with 5 identical HP servers. All these servers are connected through iSCSI to a storage bay, with a multipath configuration. Everything is running great, but I wanted to test several scenarios, and had an issue with the following one: What...
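
As a rough illustration for the "Too Many open files" entries at the top of this list, the sketch below parses the four nofile lines quoted in that post and prints them next to the limits of the running process. The names LIMITS_SNIPPET, parse_nofile_limits and current_nofile_limits are made up for this example and are not part of Proxmox. The parser only handles numeric soft/hard values, which is a simplification of the full limits.conf syntax.

```python
import resource

# The four lines quoted in the post, as appended to /etc/security/limits.conf.
LIMITS_SNIPPET = """\
root soft nofile 9000
root hard nofile 65000
* soft nofile 9000
* hard nofile 65000
"""


def parse_nofile_limits(text):
    """Return {(domain, type): value} for numeric 'nofile' entries.

    Simplified on purpose: entries such as 'unlimited' or the '-' type
    are skipped rather than interpreted.
    """
    limits = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) != 4:
            continue
        domain, kind, item, value = fields
        if item == "nofile" and kind in ("soft", "hard") and value.isdigit():
            limits[(domain, kind)] = int(value)
    return limits


def current_nofile_limits():
    """Return the (soft, hard) open-file limits of this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    configured = parse_nofile_limits(LIMITS_SNIPPET)
    soft, hard = current_nofile_limits()
    print("configured in snippet:", configured)
    print("this process: soft=%s hard=%s" % (soft, hard))
```

limits.conf is applied by pam_limits when a session starts, so a new login is generally needed before the values show up in "ulimit -n". Whether the PVE daemons themselves pick up these values is not something this sketch checks.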