Hello Proxmox Team,
We operate a Proxmox (8.2.2) VM cluster with 20 nodes and an Enterprise Subscription.
We run our (external) Proxmox Ceph storage on 5 additional nodes, as we do not want to mix “hyperconverged storage” with the VM nodes.
The oldest CPU types are:
-> 24 x Intel(R) Xeon(R) CPU E5-2620 v3
-> AMD EPYC 7282 16-Core Processor
The newest CPU types are:
-> 128 x Intel(R) Xeon(R) Platinum 8358 CPU
-> AMD EPYC 7543 32-Core Processor
For universal compatibility between all nodes, with maximum CPU flag support, we use the vCPU type x86-64-v3.
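For anyone who wants to double-check that every host really covers this baseline, here is a minimal Python sketch. It compares the `flags` line of `/proc/cpuinfo` against the x86-64-v3 feature level. The flag names are my assumption of how the kernel spells them (for example `abm` for LZCNT and `pni` for SSE3), and the function name `missing_v3_flags` is made up for this post, so please treat it as a rough check and not as an official tool.

```python
# Flags as they are commonly named in /proc/cpuinfo (assumed mapping).
# x86-64-v2 level: CMPXCHG16B, LAHF/SAHF, POPCNT, SSE3, SSE4.1, SSE4.2, SSSE3
X86_64_V2 = {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}
# x86-64-v3 adds: AVX, AVX2, BMI1, BMI2, F16C, FMA, LZCNT (abm), MOVBE, XSAVE
X86_64_V3 = X86_64_V2 | {
    "avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave",
}


def missing_v3_flags(cpuinfo_text):
    """Return the sorted x86-64-v3 flags absent from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            present = set(line.split(":", 1)[1].split())
            return sorted(X86_64_V3 - present)
    # No 'flags' line found: report everything as missing.
    return sorted(X86_64_V3)
```

On a node this could be called as `missing_v3_flags(open("/proc/cpuinfo").read())`; an empty list would suggest the host covers the level.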
We also use NVIDIA vGPU.
For compatibility reasons, the nodes with NVIDIA vGPU run on kernel 6.5 (pinned),
because the current official NVIDIA host driver (NVIDIA-GRID-Linux-KVM-550.54.16-550.54.15-551.78) does not compile cleanly under kernel 6.8.
We do not want to use unofficial patches such as:
https://gitlab.com/polloloco/vgpu-proxmox/-/commit/e5ca18869437439390daf18d0ae2f355e728fc29
On Friday we carried out maintenance work on the infrastructure and live-migrated various VMs
from the node with kernel 6.8 to the node with kernel 6.5 and vGPU.
About 30 minutes after the migration, the Debian VMs froze and the Windows VMs crashed.
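To illustrate the kind of pre-check we would now like to have before such a migration, here is a small Python sketch. It only compares the kernel series of the source and target node and reports the direction. The function names and the example release strings are hypothetical, and whether a "downgrade" migration is actually unsupported is exactly the question we are asking, so the sketch does not assume an answer.

```python
def kernel_series(release):
    """Reduce a release string such as '6.8.4-2-pve' to (major, minor)."""
    major, minor = release.split(".")[:2]
    return int(major), int(minor)


def migration_direction(source_release, target_release):
    """Classify a live migration by the kernel series of both nodes."""
    src = kernel_series(source_release)
    dst = kernel_series(target_release)
    if dst < src:
        return "downgrade"  # e.g. 6.8 -> 6.5, the case that failed for us
    if dst > src:
        return "upgrade"
    return "same"


# Illustrative release strings, as `uname -r` might report them.
print(migration_direction("6.8.4-2-pve", "6.5.13-5-pve"))
```

The release strings could be collected with `uname -r` on each node before the migration is started.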
This frustrated us a lot, as we were already dealing with specific infrastructure-related issues and were relying on basic Proxmox features.
We understand that the Proxmox team develops in very short cycles, but it would be very helpful to have a simple overview in the Proxmox Wiki of exactly which feature combinations are recommended.
Proxmox Release | Kernel | Kernel | Live Migration (from 6.8 -> 6.5) | Live Migration (from 6.5 -> 6.8) | vGPU Host Support (Kernel 6.5) |
8.2.2           | 6.5.x  | 6.8.x  | unstable?                        | stable?                          | stable (535 & 550)             |

Guest OS        | vCPU Type        | Virtual Storage                   | IO Thread | Virtual Network | Guest Drivers      |
Windows 2022    | x86-64-v3 / host | VirtIO SCSI single / VirtIO Block | Enabled   | VirtIO          | virtio-win-0.1.240 |
Linux Debian 12 | x86-64-v3 / host | VirtIO SCSI single / VirtIO Block | Enabled   | VirtIO          | Linux Kernel 6.1.x |