Proxmox Wiki: Status Overview for Live Migration and vGPU Support?

Hello Proxmox Team,

We operate a Proxmox (8.2.2) VM cluster with 20 nodes and an Enterprise Subscription.
We run our (external) Proxmox Ceph storage on 5 additional nodes, as we do not want to mix "hyperconverged storage" with the VM nodes.

The oldest CPU types are:
-> 24x Intel(R) Xeon(R) CPU E5-2620 v3
-> AMD EPYC 7282 16-Core Processor

The newest CPU types are:
-> 128 x Intel(R) Xeon(R) Platinum 8358 CPU
-> AMD EPYC 7543 32-Core Processor

For universal compatibility between all nodes, with maximum CPU flag support, we use the vCPU model: x86-64-v3
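For illustration only, a minimal sketch of how that vCPU model could be set per VM via the Proxmox VE API; the host, node name, VM ID and token below are placeholders, not our real values:

```python
# Sketch (not our actual tooling): set the emulated CPU model of a guest to
# x86-64-v3 via the Proxmox VE API, authenticated with an API token.
import requests

PVE_HOST = "https://pve-node1.example.com:8006"  # hypothetical node
TOKEN = "root@pam!automation=00000000-0000-0000-0000-000000000000"  # placeholder
HEADERS = {"Authorization": f"PVEAPIToken={TOKEN}"}

def set_cpu_model(node: str, vmid: int, model: str = "x86-64-v3") -> None:
    """Set the vCPU model so the VM can live-migrate between mixed CPU generations."""
    resp = requests.post(
        f"{PVE_HOST}/api2/json/nodes/{node}/qemu/{vmid}/config",
        headers=HEADERS,
        data={"cpu": model},
        verify=True,  # point this at your CA bundle if you use self-signed certs
    )
    resp.raise_for_status()

if __name__ == "__main__":
    set_cpu_model("pve-node1", 101)
```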

We also use NVIDIA vGPU.

For compatibility reasons, the nodes with NVIDIA vGPU run on a pinned 6.5 kernel,
as the current official NVIDIA host driver (NVIDIA-GRID-Linux-KVM-550.54.16-550.54.15-551.78) does not compile cleanly under kernel 6.8.

We do not want to use unofficial patches such as:
https://gitlab.com/polloloco/vgpu-proxmox/-/commit/e5ca18869437439390daf18d0ae2f355e728fc29
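For reference, the pinning itself is a one-liner with proxmox-boot-tool; a small sketch of how it could be wrapped for automation (the kernel version string is a placeholder and has to match what `proxmox-boot-tool kernel list` reports on the node):

```python
# Sketch only: pin the 6.5 kernel on the vGPU nodes with proxmox-boot-tool,
# wrapped in Python so it can be driven from configuration management.
import subprocess

PINNED_KERNEL = "6.5.13-6-pve"  # placeholder; check "proxmox-boot-tool kernel list"

def pin_kernel(version: str) -> None:
    """Keep the given kernel as the default boot entry across future upgrades."""
    subprocess.run(["proxmox-boot-tool", "kernel", "pin", version], check=True)

if __name__ == "__main__":
    pin_kernel(PINNED_KERNEL)
```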

On Friday we carried out maintenance work on the infrastructure and migrated various VMs
from the node with kernel 6.8 to the node with kernel 6.5 & vGPU via live migration.
After about 30 minutes, the Debian VMs froze and the Windows VMs crashed.

This frustrated us a lot, as we were dealing with specific infrastructure-related issues and relied on basic Proxmox features.

The Proxmox team develops in very short cycles, but it would be very helpful to have a simple overview in the Proxmox Wiki of exactly which features are recommended.

| Proxmox Release | Kernel | Kernel | Live Migration (6.8 -> 6.5) | Live Migration (6.5 -> 6.8) | vGPU Host Support (Kernel 6.5) |
|---|---|---|---|---|---|
| 8.2.2 | 6.5.x | 6.8.x | unstable? | stable? | stable (535 & 550) |

| Guest OS | vCPU Type | Virtual Storage | IO Thread | Virtual Network | Guest Drivers |
|---|---|---|---|---|---|
| Windows 2022 | x86-64-v3 / host | VirtIO SCSI Single / VirtIO Block | Enabled | VirtIO | virtio-win-0.1.240 |
| Linux Debian 12 | x86-64-v3 / host | VirtIO SCSI Single / VirtIO Block | Enabled | VirtIO | Linux kernel 6.1.x |
 
My question would be: what happens if you migrate from kernel 6.5 to 6.5?
 
Since we're all guessing here: Proxmox's advice is to use the same CPU (family) for a production cluster. Migration to a newer PVE version should work, but vice versa might not. And kernel 6.8 has been unlucky for a lot of people and might not be representative of a general rule.
 
@floh8

No further migration; at that time, migrating the VMs from nodes with kernel 6.8 to nodes with kernel 6.5 was necessary.

@Kingneutron

This is not about opening a support ticket, but about increasing transparency in the wiki and generally having an overview of which features are tested in which release statuses and how well.
After all, the Proxmox user community is large enough to determine a good reference value.

@leesteken

If you regularly update your system from the Enterprise Repository, you automatically receive the packages for the latest Proxmox 8.2.2 release with kernel 6.8.
Installing or updating Proxmox must not turn into Russian roulette.
 
Sorry, but it is Russian roulette, even if Proxmox can't do anything about it.
 
@floh8

Why should it?

The kernel releases come from Canonical, the userland from the Debian team, and the Proxmox-specific patches from the Proxmox team.

For quality control, a handful of VMs are operated on 2 nodes with shared storage and various scenarios can be tested in advance via API access using CI/CD tools.
These evaluations can then be automatically uploaded to the wiki page.

Ansible / Puppet / Chef / SaltStack, whatever (see the sketch below):
-> Update node 1, update node 2
-> Live-migrate VM 1 from node 1 to node 2
-> Check system availability / VM log events every 10 minutes via a monitoring tool
-> Repeat this test scenario every 2-3 hours
-> If no sporadic error has occurred after 24 hours, push a change to the wiki page with the entry "Live Migration (Kernel X -> Kernel X) = stable"
-> Automatically release the package changes in the Proxmox Enterprise Repository.
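To make this a bit more concrete, a minimal sketch of what such a test loop could look like against the Proxmox VE API; node names, VM ID, token and guest IP are placeholders, and error handling, TLS details and the wiki push are left out:

```python
# Minimal sketch of the test loop described above, assuming two lab nodes and
# an API token. All names, IDs and addresses below are placeholders.
import time
import socket
import requests

PVE = "https://pve-lab1.example.com:8006"                                # hypothetical lab node
HEADERS = {"Authorization": "PVEAPIToken=ci@pve!migrate-test=<secret>"}  # placeholder token
VMID, GUEST_IP = 101, "10.0.0.50"                                        # test VM and its address

def live_migrate(src: str, dst: str) -> None:
    """Start an online migration and wait for the migration task to finish."""
    r = requests.post(f"{PVE}/api2/json/nodes/{src}/qemu/{VMID}/migrate",
                      headers=HEADERS, data={"target": dst, "online": 1})
    r.raise_for_status()
    upid = r.json()["data"]
    while True:
        s = requests.get(f"{PVE}/api2/json/nodes/{src}/tasks/{upid}/status",
                         headers=HEADERS).json()["data"]
        if s["status"] == "stopped":
            assert s.get("exitstatus") == "OK", f"migration failed: {s}"
            return
        time.sleep(5)

def guest_reachable() -> bool:
    """Very crude availability check: can we open TCP port 22 on the guest?"""
    try:
        socket.create_connection((GUEST_IP, 22), timeout=5).close()
        return True
    except OSError:
        return False

if __name__ == "__main__":
    for i in range(8):                      # bounce the VM back and forth for ~24 h
        src, dst = ("lab1", "lab2") if i % 2 == 0 else ("lab2", "lab1")
        live_migrate(src, dst)
        for _ in range(18):                 # check every 10 min for ~3 hours
            assert guest_reachable(), "guest became unreachable after migration"
            time.sleep(600)
    print("no sporadic errors after 24 h -> mark 'Live Migration = stable' in the wiki")
```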
 
If Proxmox had more resources, they would do more testing, but they do not go that way. XCP-ng goes more in this direction, but they work with kernel 4.x. Simply put, Proxmox is not the maintainer of these packages and therefore trusts the know-how of the others and the power of the community. I would like Proxmox to do more testing; I would pay a higher price for that.
 
For quality control, a handful of VMs are operated on 2 nodes with shared storage and various scenarios can be tested in advance via API access using CI/CD tools.
These evaluations can then be automatically uploaded to the wiki page.
In general, I concur, yet the cross product of all possible systems that led to problems in the past is just too darn big. You would need various generations of hardware and different kernel versions ...
 
Well ...

In order to make the whole topic a little more concrete, I have set up an API-capable wiki under the new domain: https://proxmox-stack.com.

We could make 2 nodes from our lab environment permanently available for this project.

Now only two parts are missing: the "boot environment" part, to switch the physical hosts between different Proxmox release states at the ZFS level, and the CI/CD automation part.
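A rough sketch of the ZFS side of that idea, assuming the root filesystem lives in a dedicated dataset per boot environment; pool and dataset names are placeholders, and the bootloader side (proxmox-boot-tool / ESP sync) is deliberately omitted here:

```python
# Sketch of the ZFS "boot environment" mechanism, wrapped in Python for the
# planned CI/CD tooling. Dataset names (rpool/ROOT/pve-1 etc.) are assumptions.
import subprocess

def zfs(*args: str) -> None:
    subprocess.run(["zfs", *args], check=True)

def create_boot_environment(source: str, name: str) -> str:
    """Snapshot the current root dataset and clone it as a new boot environment."""
    snap = f"{source}@{name}"
    be = f"rpool/ROOT/{name}"
    zfs("snapshot", snap)
    zfs("clone", snap, be)
    return be

def activate_boot_environment(be: str) -> None:
    """Point the pool's bootfs property at the chosen environment for the next boot."""
    subprocess.run(["zpool", "set", f"bootfs={be}", "rpool"], check=True)

if __name__ == "__main__":
    be = create_boot_environment("rpool/ROOT/pve-1", "pve-8.2.2-kernel-6.8")
    activate_boot_environment(be)
```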
 
