Search results

  1. Proxmox with 48 nodes

    The 2690v4 is actually 22 cores, and you're neglecting the power draw of the fans, GPUs, hard drives, etc. You're right that you need fewer than half as many, but the power consumption is not that different between the two workloads. Also, when you factor in the price of the used E5-2690v4 and the new...
  2. Proxmox with 48 nodes

    Yes, latency is the same for 40 Gig and 10 Gig, but with dual ports and VLANs for traffic separation, I thought I would be OK. As for why: the hardware was reclaimed, and that is what we have... You're right about Broadwell, but again, it's what we have, and I'm not so sure about the cost savings. We...
  3. Proxmox with 48 nodes

    I know that in the past the recommended maximum number of nodes in a cluster was 32, but is this still the case? My boxes are all dual E5-2690v4 with dual 40 Gig Ethernet. I would like to have one cluster with 48 nodes, but is that a bad idea? Should I go with two clusters of 24 nodes?
  4. bash sleep 10 is very inconsistent on VM

    I think you may be right: the .py script just tails the log, greps for a line, and shows the difference since the last one. Any idea why sending a line to syslog every 10 seconds would be an issue, but writing the same log line to a file every 10 seconds would not? #!/usr/bin/env python3 import argparse, os...
  5. bash sleep 10 is very inconsistent on VM

    I shared the script; it's very simple, just sleep 10 and a write to syslog. The host nodes are all AMD EPYC 7763 64-Core Processor, 512G RAM, Linux virt1 6.14.11-2-pve #1 SMP PREEMPT_DYNAMIC PMX 6.14.11-2. The VMs range from 4 cores/1 socket to 8 cores/1 socket, 6.8.0-79-generic #79-Ubuntu...
  6. bash sleep 10 is very inconsistent on VM

    Running Proxmox 9.0.10 with an Ubuntu 24.04 guest. I have been working with voipmonitor support staff, who say my issues are caused by my clock. I tested a simple script that SHOULD run every 10 seconds, but does not. This is on the weekend, when my load is almost zero. Any ideas? #!/bin/bash while...
  7. How do I remove duplicate unknown monitor, manager, and MDS?

    I tried to delete and recreate the service, but the unknown (?) service came back when I created a new one for virt2. Any ideas on how to clean this up?
  8. IOMMU 4 NVIDIA GPUs with NCCL

    I have a VM with four exported 3090 GPUs. The GPUs work, and I can run things like gpuburn, but when I try to train my models with NCCL, I run into errors. I don't have an ACS option in the BIOS of my Supermicro H12SSL (I believe it's off now, so no option), but I do have IOMMU on so I can export the cards to...
  9. Dual 3080 GPUs work in a single VM, but not if I split them to have one each in two VMs.

    A bit more info: the 4 GPUs are on two x16 slots, each bifurcated into two x8 slots, one per GPU. When I boot without pcie_acs_override=downstream, the first two cards are in group 13 and the last two in group 49, so that won't work with 4 VMs each using 1 card. With the...
  10. Dual 3080 GPUs work in a single VM, but not if I split them to have one each in two VMs.

    Sorry, you're right, I was not clear. It is still in the same state as in the original post: I can start one VM or the other, but not both if the GPUs are not in the same VM. I am using pcie_acs_override and see each of them in a different group: /sys/kernel/iommu_groups/48/devices/0000:81:00.0...
  11. Dual 3080 GPUs work in a single VM, but not if I split them to have one each in two VMs.

    Thanks, they are now in different groups (with pcie_acs_override), but without it they are not. Yes, plenty of RAM; it's something with passthrough.
  12. Dual 3080 GPUs work in a single VM, but not if I split them to have one each in two VMs.

    System setup: Proxmox 8.0.4, Supermicro H12SSL, 1 Nvidia 4090, 3 Nvidia 3080, machine q35. virt101: 3080, PCI device 0000:02:00. virt103: 4090, PCI device 0000:01:00. I had virt105 with two 3080s, PCI devices 0000:81:00 and 0000:82:00. Everything works great with this setup; I shut down 105, cloned...
  13. Palo Alto Networks VM

    Figured it out, you need to add a serial port. :)
  14. Palo Alto Networks VM

    Ever get past this? I am seeing the same issue.
  15. CEPH monitor cannot be deleted when the node fails and goes offline !

    root@virt01:/var/lib/ceph# pveceph createmon --monid virt01 --mon-address 10.0.0.101
    monitor 'virt01' already exists
  16. CEPH monitor cannot be deleted when the node fails and goes offline !

    I am having a similar problem: Proxmox sees a monitor, but it has been removed by Ceph:
    root@virt01:/var/lib/ceph# pveceph mon destroy virt01
    no such monitor id 'virt01'
    root@virt01:/var/lib/ceph# ceph mon remove virt01
    mon.virt01 does not exist or has already been removed...
  17. Dual Nvidia 3080 GPUs work on same VM, but not if I have 2 VMs with 1 3080 GPU on each.

    Yep, they are both in group 49. I'm going to try a few BIOS settings; if that does not work, I found https://gitlab.com/Queuecumber/linux-acs-override, but I'd rather see if I can do this without a custom kernel.
  18. Dual Nvidia 3080 GPUs work on same VM, but not if I have 2 VMs with 1 3080 GPU on each.

    I have 2 Nvidia 3080s, on PCI 0000:01:00.0 and 0000:02:00.0. If I put them both on one VM, with x-vga=on and multifunction=on, it works, and nvidia-smi shows two GPUs. However, if I start vm1 with one of the GPUs, say 0000:01:00.0, it starts fine; if I then try to start vm2 with GPU 0000:02:00.0...
  19. Easy way to enable ceph authentication on a 21 server cluster without auth?

    I currently have a 21-server Ceph cluster with 105 OSDs, and I need to enable Ceph authentication because Kubernetes can't mount a Ceph volume without auth! I have looked at https://docs.ceph.com/en/latest/rados/configuration/auth-config-ref/. Is that the only way with Proxmox, or is there an...
  20. Problem to configure network for guest with tagged and untagged vlan

    I also believe this is correct; I have not been able to create tagged and untagged VLANs on the same bridge.
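Result 3 asks whether one 48-node cluster is a bad idea compared with two clusters of 24 nodes. Part of that trade-off is plain arithmetic. The sketch below assumes corosync's default of one vote per node and a simple-majority quorum, floor(n/2) + 1, and compares how many node failures each layout survives. It does not model corosync's messaging overhead, which is generally the reason given for keeping clusters smaller, so treat it as one input rather than an answer.

```python
"""Quorum arithmetic for one 48-node cluster versus two 24-node clusters.

Assumes one vote per node and a simple majority: quorum = floor(votes / 2) + 1.
"""


def quorum(votes: int) -> int:
    """Smallest number of votes that keeps the cluster quorate."""
    if votes < 1:
        raise ValueError("a cluster needs at least one vote")
    return votes // 2 + 1


def tolerated_failures(votes: int) -> int:
    """Nodes that can fail before the remaining ones lose quorum."""
    return votes - quorum(votes)


if __name__ == "__main__":
    for layout, sizes in {"one cluster": [48], "two clusters": [24, 24]}.items():
        for nodes in sizes:
            print(f"{layout}: {nodes} nodes, quorum {quorum(nodes)}, "
                  f"survives {tolerated_failures(nodes)} failed nodes")
```

Under these assumptions, 48 nodes need 25 votes and survive 23 failures. Each 24-node cluster needs 13 votes and survives 11 failures, but a failure in one cluster never affects the other's quorum.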
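Result 4 mentions a .py script that tails the log, greps for a line, and prints the time since the previous match. That script is truncated in this listing, so what follows is only a minimal sketch of the same idea, not the poster's code. It assumes each syslog line begins with an RFC 3339 timestamp (as far as we know, the format rsyslog writes on recent Ubuntu releases). It also uses a hypothetical tag, `sleeptest`.

```python
import re
from datetime import datetime
from typing import Iterable, Iterator, Optional, Tuple

# Assumed line layout: "<RFC 3339 timestamp> <host> <tag>: <message>", e.g.
# "2025-09-20T10:00:00.123456+00:00 vm1 sleeptest: tick" ("sleeptest" is hypothetical).
LEADING_TOKEN = re.compile(r"^(\S+)\s")


def deltas(lines: Iterable[str], marker: str) -> Iterator[Tuple[str, float]]:
    """Yield (timestamp, seconds since the previous match) for lines containing marker."""
    previous: Optional[datetime] = None
    for line in lines:
        if marker not in line:
            continue
        match = LEADING_TOKEN.match(line)
        if match is None:
            continue
        try:
            stamp = datetime.fromisoformat(match.group(1))
        except ValueError:
            # Not an ISO timestamp, e.g. the older "Sep 20 10:00:00" syslog format.
            continue
        if previous is not None:
            yield match.group(1), (stamp - previous).total_seconds()
        previous = stamp
```

For a one-off check, `deltas(open("/var/log/syslog"), "sleeptest")` can read the file as it stands. Any gap well above 10 seconds marks a late tick.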
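Results 5 and 6 describe a bash loop that should fire every 10 seconds but does not. The original script is truncated here, so this Python sketch only illustrates how to measure the problem. It times each tick on the guest's monotonic clock and compares two loops:

- a naive `sleep(period)` loop, where every late wake-up pushes all later ticks back;
- a loop that sleeps until fixed deadlines.

Because it relies on the guest's own clock, it cannot detect whether that clock itself runs fast or slow. That would need an external reference.

```python
import time
from typing import List


def measure(period: float, iterations: int, drift_free: bool = False) -> List[float]:
    """Run `iterations` ticks and return the measured interval between ticks.

    drift_free=False mimics `while true; do ...; sleep 10; done`: each sleep
    starts only after the previous tick, so delays accumulate.
    drift_free=True sleeps until the next absolute deadline, so one late
    wake-up does not shift every later tick.
    """
    start = time.monotonic()
    ticks = [start]
    deadline = start
    for _ in range(iterations):
        if drift_free:
            deadline += period
            time.sleep(max(0.0, deadline - time.monotonic()))
        else:
            time.sleep(period)
        ticks.append(time.monotonic())
    return [later - earlier for earlier, later in zip(ticks, ticks[1:])]


def worst_deviation(intervals: List[float], period: float) -> float:
    """Largest absolute difference between a measured interval and the target."""
    return max(abs(interval - period) for interval in intervals)
```

On the VM, `worst_deviation(measure(10.0, 6), 10.0)` would report the largest miss over one minute. A large value with near-zero load would support the clock explanation from support.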
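Result 8 reports NCCL errors with four passed-through GPUs, while single-GPU tools like gpuburn work. One diagnostic often tried in VMs (an assumption here, not something the thread confirms) is to turn on NCCL's logging and disable its GPU peer-to-peer transport, which relies on direct PCIe access between the cards. `NCCL_DEBUG` and `NCCL_P2P_DISABLE` are real NCCL environment variables. The helper name below is hypothetical.

```python
import os
from typing import Dict, Mapping, Optional


def nccl_env(disable_p2p: bool = True, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment (or of `base`) with NCCL diagnostics turned on."""
    env = dict(os.environ if base is None else base)
    env["NCCL_DEBUG"] = "INFO"  # NCCL logs the transports it picks and where it fails
    if disable_p2p:
        # Stop NCCL from using direct GPU-to-GPU (peer-to-peer) transfers.
        env["NCCL_P2P_DISABLE"] = "1"
    return env
```

For example, `subprocess.call(["torchrun", "--nproc_per_node=4", "train.py"], env=nccl_env())`, where `train.py` is a hypothetical training script. If training succeeds with P2P disabled, peer-to-peer access across the passthrough becomes the likely suspect.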
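Results 9 to 12 and 17 come down to which IOMMU group each GPU lands in. The sketch below reads /sys/kernel/iommu_groups (the path quoted in result 10) and flags groups that hold functions from more than one of the GPUs you want to split across VMs. GPU addresses are given without the function suffix, so a card's own audio function (.1) sharing its group is not reported. The `root` parameter exists only so the code can be pointed at a test directory.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List


def iommu_groups(root: str = "/sys/kernel/iommu_groups") -> Dict[str, List[str]]:
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups: Dict[str, List[str]] = defaultdict(list)
    # Layout: <root>/<group>/devices/<domain:bus:slot.function>
    for device in Path(root).glob("*/devices/*"):
        groups[device.parent.parent.name].append(device.name)
    return {group: sorted(devices) for group, devices in groups.items()}


def shared_groups(groups: Dict[str, List[str]], gpus: Iterable[str]) -> Dict[str, List[str]]:
    """Groups holding functions from more than one of the given GPU slots."""
    wanted = set(gpus)
    conflicts = {}
    for group, devices in groups.items():
        slots = {device.rsplit(".", 1)[0] for device in devices} & wanted
        if len(slots) > 1:
            conflicts[group] = sorted(slots)
    return conflicts


if __name__ == "__main__":
    for group, devices in sorted(iommu_groups().items(), key=lambda item: int(item[0])):
        print(group, " ".join(devices))
```

A non-empty result from `shared_groups(iommu_groups(), ["0000:81:00", "0000:82:00"])` means those two cards cannot go to different VMs as the groups stand.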
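Results 15 and 16 describe Proxmox still listing a monitor that Ceph has already dropped from its monmap. Proxmox keeps its Ceph configuration in /etc/pve/ceph.conf, and a leftover `[mon.<id>]` section there is one plausible source of the stale entry (an assumption, not confirmed in the thread). This sketch compares the `[mon.*]` sections in that file with the monitor names in the output of `ceph mon dump --format json`. It assumes that JSON has a top-level `mons` list whose entries carry a `name`.

```python
import configparser
import json
from typing import Set


def conf_monitors(ceph_conf_text: str) -> Set[str]:
    """Monitor IDs that still have a [mon.<id>] section in ceph.conf."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(ceph_conf_text)
    return {name.split(".", 1)[1] for name in parser.sections() if name.startswith("mon.")}


def monmap_monitors(mon_dump_json: str) -> Set[str]:
    """Monitor names in the monmap, assuming a top-level "mons" list of {"name": ...}."""
    return {mon["name"] for mon in json.loads(mon_dump_json).get("mons", [])}


def stale_monitors(ceph_conf_text: str, mon_dump_json: str) -> Set[str]:
    """Monitors configured in ceph.conf that the monmap no longer contains."""
    return conf_monitors(ceph_conf_text) - monmap_monitors(mon_dump_json)
```

Usage would look like `stale_monitors(Path("/etc/pve/ceph.conf").read_text(), subprocess.check_output(["ceph", "mon", "dump", "--format", "json"], text=True))`. The sketch only reports candidates and does not edit anything.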
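Results 12 and 18 describe GPUs that work together in one VM but not when split across two. The thread points at shared IOMMU groups as the cause, but either way each VM ends up needing its own `hostpciN` entry. This sketch only builds the `qm set` commands for a given VM-to-GPU mapping. A few assumptions apply:

- the VM IDs and addresses in the test come from the posts;
- `pcie=1` assumes the q35 machine type mentioned in result 12;
- as far as we know, omitting the function suffix is how Proxmox passes through all functions of a device.

```python
import shlex
from typing import Dict, List


def hostpci_commands(assignments: Dict[int, List[str]], primary_gpu: bool = True) -> List[str]:
    """Build one `qm set` command per VM from a {vmid: [PCI slot, ...]} mapping.

    Slots are given without a function suffix (e.g. "0000:01:00") so the GPU
    and its audio function are passed through together.
    """
    commands = []
    for vmid in sorted(assignments):
        args = ["qm", "set", str(vmid)]
        for index, address in enumerate(assignments[vmid]):
            options = [address, "pcie=1"]  # pcie=1 assumes the q35 machine type
            if primary_gpu and index == 0:
                options.append("x-vga=1")  # mark the first card as the VM's primary GPU
            args += [f"--hostpci{index}", ",".join(options)]
        commands.append(shlex.join(args))
    return commands
```

Generating the commands does not get around a shared IOMMU group. If both cards sit in one group, the second VM will still fail to start.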
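Result 19 asks how to turn on Ceph authentication. The Ceph page linked there documents three `[global]` options, `auth_cluster_required`, `auth_service_required` and `auth_client_required`, which are set to `cephx` to require authentication. The sketch below only rewrites those three options in a copy of ceph.conf text (on Proxmox, /etc/pve/ceph.conf). It does not create keyrings or restart daemons, which the linked procedure covers. Note also that `configparser` drops comments and original indentation when it writes the file back.

```python
import configparser
import io

# The three options the Ceph auth configuration reference describes for cephx.
CEPHX_OPTIONS = {
    "auth_cluster_required": "cephx",
    "auth_service_required": "cephx",
    "auth_client_required": "cephx",
}


def enable_cephx(ceph_conf_text: str) -> str:
    """Return ceph.conf text with the cephx options set in [global]."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(ceph_conf_text)
    if not parser.has_section("global"):
        parser.add_section("global")
    for key, value in CEPHX_OPTIONS.items():
        # Ceph accepts "auth cluster required" as another spelling of
        # "auth_cluster_required"; drop it so two values cannot disagree.
        parser.remove_option("global", key.replace("_", " "))
        parser.set("global", key, value)
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()
```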