Windows Server 2022 VM Dies when other VMs are started

HawHoo

New Member
Apr 1, 2024
Hi,
Newbie here,
I am testing PVE for a production environment and trying to understand PVE better.

I have a test environment running and ran into a strange situation.
PVE-02, a Core i5 (6 cores, 16 GB RAM), has
VM201 Linux (4 sockets, 1 core, 4 GB mem) on local ZFS
VM202 Linux (4 sockets, 1 core, 4 GB mem) on local ZFS

running.

If I live migrate
VM300 Windows Server 2022 (4 sockets, 1 core, 6 GB mem), QCOW2 on an NFS share,
from PVE-03 to PVE-02,
VM300 (and all other VMs) will run fine.

If I shut down and then try to restart VM300 on PVE-02, it will try to boot and then turn off right after getting to the Windows boot screen.
If I shut down VM201 (or VM202), I can start VM300 and it seems to run fine.
As soon as I start VM201 again, VM300 will be turned off without warning or error message. (Very scary for production.)

I suspected resource issues, but since I can live migrate, that does not seem to be the case.

If I migrate VM300 back to PVE-03 and start it, it will run fine (after a repair), and I can then live migrate VM300 to PVE-02 again.
This scenario is reliably repeatable.

I'd like to understand why this is happening
 
Not trying to answer your question (as pointed out by dietmar, this is most likely a RAM constraint on the node; add to this how poorly a Windows 2022 VM runs on a single core), but why do all your VMs have a 4-socket configuration? AFAIK no 4-socket Intel i5 motherboard exists; correct me if I'm wrong.
Maybe your PVE-03 node is on some other architecture (it's unclear what effect this would have on live migration), but that still doesn't explain the first two VMs.
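If the 4-socket layout was unintentional, it can be changed from the CLI as well as the GUI. A minimal sketch, assuming a hypothetical VMID of 300 and that you want a single socket with 4 cores (adjust to your setup); the change only takes effect on the next full VM start:

Code:
# show the current CPU topology of the VM (hypothetical VMID 300)
qm config 300 | grep -E '^(sockets|cores)'
# switch to 1 socket with 4 cores
qm set 300 --sockets 1 --cores 4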
 
@dietmar Thanks for getting back to me.
dietmar said: "I guess you run out of memory on PVE-02."
Indeed, I did. Reducing VM300 to 4 GB of memory fixed the issue. What I do not understand is why I was able to migrate.
There was a message on the console screen that the HV ran out of memory and killed a task.
A message on the task screen would be incredibly helpful in this scenario. Maybe an enhancement?
Or should I be looking somewhere else for these error messages?
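For anyone hitting the same thing: when the kernel's OOM killer terminates a VM process, the evidence lands in the node's kernel log rather than in the PVE task log. A minimal sketch of where one might look on the node (the exact message wording varies by kernel version):

Code:
# kernel messages from the current boot, filtered for OOM activity
journalctl -k -b | grep -iE 'out of memory|oom'
# alternatively, the kernel ring buffer with human-readable timestamps
dmesg -T | grep -i oom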

Questions:
1) Why was I able to live migrate VM300 to the HV, but not start VM300 on this HV? Does a startup require SIGNIFICANTLY more memory?
(I had 1 GB of memory free with all VMs running.)
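For reference, a rough way to check how much memory the node really has available before starting the VM (standard Linux commands, not PVE-specific; the KSM counter is only meaningful if KSM sharing has kicked in):

Code:
# overall memory situation on the node
free -h
# number of pages currently deduplicated by KSM (0 if KSM is not sharing anything)
cat /sys/kernel/mm/ksm/pages_sharing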
 
HawHoo said: "Or should I be looking somewhere else for these error messages?"
While running a cluster in a production environment, you'll definitely want to set up some form of monitoring service. It should be somewhat independent of the main system, so as to have as much redundancy as possible. PVE provides most of the metrics you're going to want to feed into this service, and there are many options available. This should be part of your trial/testing setup, so you can work out which approach fits your needs.
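For example, an external poller could pull node metrics straight from the PVE API. A minimal sketch, assuming the node name matches the hostname (the node status object includes memory, CPU load and uptime):

Code:
# query memory, CPU and uptime of the local node via the PVE API
pvesh get /nodes/$(hostname)/status --output-format json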
 
