[SOLVED] Windows VM shutdown/stop since migration to new Server

mircsicz

Hi all,

8 weeks ago I migrated a client from an HPE Gen8 DL360 to a new Gen10 machine... Since then the server randomly shuts down or stops, luckily only at night!

There is no cluster. On that machine there are 2 Linux VMs and 4 Win2012r2 VMs, and it's always the same two Windows servers giving me that issue...

There's absolutely no mention of the issue in the logs, or maybe I don't know what to look for.

So please give me a hand with this issue... From my perspective the machine is fully up to date:

Code:
root@pve:~# pveversion -v
proxmox-ve: 7.0-2 (running kernel: 5.11.22-3-pve)
pve-manager: 7.0-13 (running version: 7.0-13/7aa7e488)
pve-kernel-helper: 7.1-2
pve-kernel-5.11: 7.0-8
pve-kernel-5.11.22-5-pve: 5.11.22-10
ceph-fuse: 15.2.13-pve1
corosync: 3.1.5-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve1
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-10
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-3
libpve-storage-perl: 7.0-12
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.11-1
proxmox-backup-file-restore: 2.0.11-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-6
pve-cluster: 7.0-3
pve-container: 4.1-1
pve-docs: 7.0-5
pve-edk2-firmware: 3.20210831-1
pve-firewall: 4.2-4
pve-firmware: 3.3-2
pve-ha-manager: 3.3-1
pve-i18n: 2.5-1
pve-qemu-kvm: 6.0.0-4
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-16
smartmontools: 7.2-1
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.0.5-pve1
 
Hi,
Check the CPU flags.
If you are using the "host" CPU model, you need to enable these flags manually...
If no model is declared, activate all the CPU flags that fit this machine (do not enable the AMD flags if you're using an Intel CPU ^^)
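For example, the model and the flags can also be set on the CLI instead of the GUI checkboxes. Just a sketch, with a placeholder VMID (102) and two common Intel-side flags picked as examples:

Code:
# set a named CPU model and explicitly enable extra flags for VM 102
# (flags are separated by ';', so the value needs quoting in the shell)
qm set 102 --cpu 'cputype=kvm64,flags=+pcid;+spec-ctrl'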
 
THX for the hint, but they are the same on all of the machines. As there's no cluster, what would be the suggested model?

Reply to myself:

Here's what could be interpreted as best practice:

In short, if you care about live migration and moving VMs between nodes, leave the kvm64 default. If you don’t care about live migration or have a homogeneous cluster where all nodes have the same CPU, set the CPU type to host, as in theory this will give your guests maximum performance.
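To see what the affected guests are currently using (and to optionally fall back to the default), something like this should do - just a sketch with a placeholder VMID:

Code:
# show the configured CPU model of a guest (no output means the kvm64 default)
qm config 102 | grep '^cpu:'

# switch back to the migration-safe default model
qm set 102 --cpu kvm64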
 

Attachments

  • Bildschirmfoto 2021-10-25 um 07.27.38.png (66.8 KB)
  • Bildschirmfoto 2021-10-25 um 07.25.54.png (65.1 KB)
Hey,

You're right about live migration ;) Indeed, kvm & kvm64 are the recommended CPU models for a heterogeneous cluster.

BUT, if you use a defined CPU model you don't need to enable flags, they're activated by the selected model. If your model is "host", the flags need to be declared manually :)

Did you solve your problem?
 
Hi hi, I had one server running for a week now (but on host mode without flags), and last night after 7 days of uptime I was so f...ing happy... and this morning it crashed again...

EDIT: see the attached screenshot for what I changed the flags to ;-)
 

Attachments

  • Bildschirmfoto 2021-11-04 um 00.06.55.png (27.6 KB)
Even with those flags those two servers keep turning off randomly.

I'm ready to buy some support to get it fixed once and for all!
 

Attachments

  • Bildschirmfoto 2021-11-15 um 09.42.51.png (8.9 KB)
it's not quite clear to me what is shutting down - a VM? the VMs? the hypervisor host? in any case we'd need logs and more information ;)
 
@fabian THX for the reply.

I thought I was clear, seems I'm not... It's 2 VMs that randomly shut down at night. The host is totally fine!

Problem is I can't find any log entry on the host mentioning the shutdown.


Code:
root@pve:~# find /var/log/pve/tasks/ -name *qmstop*
/var/log/pve/tasks/C/UPID:pve:00053D49:000B45C5:6116A77C:qmstop:101:root@pam:
/var/log/pve/tasks/2/UPID:pve:000C50A9:0226E07B:611C2EA2:qmstop:101:root@pam:
/var/log/pve/tasks/5/UPID:pve:0035CC66:0012A142:6116DC15:qmstop:101:root@pam:
/var/log/pve/tasks/5/UPID:pve:0001F66F:000070E7:6116AD85:qmstop:101:root@pam:
/var/log/pve/tasks/E/UPID:pve:000B1FBC:0226A677:611C2E0E:qmstop:100:root@pam:
/var/log/pve/tasks/0/UPID:pve:000B615B:0226BA1D:611C2E40:qmstop:100:root@pam:
/var/log/pve/tasks/3/UPID:pve:003B8E4C:284447A5:617DC113:qmstop:104:root@pam:
/var/log/pve/tasks/3/UPID:pve:003F93DE:0009226B:6116A203:qmstop:102:root@pam:
/var/log/pve/tasks/A/UPID:pve:000256E6:0000985C:6116ADEA:qmstop:100:root@pam:

I'm pretty sure it's Windows just turning off! These are all the qmstop runs I could find...
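Side note for reading these task files: the fifth colon-separated field of a UPID is the task start time as hex epoch seconds, so the stop times can be decoded without opening each file. A small sketch using one of the UPIDs above:

Code:
# decode the start time of the most recent qmstop task of VM 104 (hex epoch is field 5)
upid='UPID:pve:003B8E4C:284447A5:617DC113:qmstop:104:root@pam:'
date -d "@$((16#$(echo "$upid" | cut -d: -f5)))"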

102 & 104 are the machines shutting down randomly:

Code:
root@pve:~# find /var/log/pve/tasks/ -name *qmstart*|egrep "102|104"
/var/log/pve/tasks/C/UPID:pve:003B8823:284444EE:617DC10C:qmstart:104:root@pam:
/var/log/pve/tasks/1/UPID:pve:002F0262:05C9FA85:61257E41:qmstart:104:root@pam:
/var/log/pve/tasks/1/UPID:pve:000D33F7:000959B9:6116A291:qmstart:102:root@pam:
/var/log/pve/tasks/6/UPID:pve:000CFEBB:022713DC:611C2F26:qmstart:104:root@pam:
/var/log/pve/tasks/6/UPID:pve:00114BEE:0BF22E6D:61354156:qmstart:102:root@pam:
/var/log/pve/tasks/D/UPID:pve:002D93A9:000A3E7D:6116C69D:qmstart:104:root@pam:
/var/log/pve/tasks/D/UPID:pve:0034B0B6:0E305CD5:613AFF3D:qmstart:104:root@pam:
/var/log/pve/tasks/8/UPID:pve:001153E6:24529210:6173A838:qmstart:104:root@pam:
/var/log/pve/tasks/8/UPID:pve:003B53D5:25DB3F16:61779578:qmstart:104:root@pam:
/var/log/pve/tasks/8/UPID:pve:000A7EB0:24AB3188:61748B18:qmstart:102:root@pam:
/var/log/pve/tasks/8/UPID:pve:0039F50C:2A86F315:61838A78:qmstart:102:root@pam:
/var/log/pve/tasks/8/UPID:pve:0004CC34:2A591345:61831508:qmstart:104:root@pam:
/var/log/pve/tasks/2/UPID:pve:0000CF74:00001225:6116AC92:qmstart:102:root@pam:
/var/log/pve/tasks/2/UPID:pve:003D97C3:1E7DC336:6164BAA2:qmstart:102:root@pam:
/var/log/pve/tasks/2/UPID:pve:00174920:15383941:614CFEE2:qmstart:104:root@pam:
/var/log/pve/tasks/2/UPID:pve:000D1B66:0C70977C:613684F2:qmstart:104:root@pam:
/var/log/pve/tasks/5/UPID:pve:002BFD26:2A02155C:61823655:qmstart:104:root@pam:
/var/log/pve/tasks/5/UPID:pve:000726BC:08923D9A:612C9DA5:qmstart:104:root@pam:
/var/log/pve/tasks/5/UPID:pve:001E50DA:07D3CF4C:612AB625:qmstart:102:root@pam:
/var/log/pve/tasks/5/UPID:pve:0031A71F:090C6EBA:612DD675:qmstart:104:root@pam:
/var/log/pve/tasks/E/UPID:pve:002F8869:26E9E28A:617A4A4E:qmstart:104:root@pam:
/var/log/pve/tasks/E/UPID:pve:000BD8CF:1686E2C4:6150579E:qmstart:102:root@pam:
/var/log/pve/tasks/E/UPID:pve:0022B925:302F6545:6192067E:qmstart:104:root@pam:
/var/log/pve/tasks/0/UPID:pve:003BAE24:28445F56:617DC150:qmstart:104:root@pam:
/var/log/pve/tasks/0/UPID:pve:003D4653:2C911495:6188C320:qmstart:104:root@pam:
/var/log/pve/tasks/0/UPID:pve:00287215:01D4A57E:611B5C20:qmstart:102:root@pam:
/var/log/pve/tasks/0/UPID:pve:003B6C3E:142ED0C9:614A5770:qmstart:104:root@pam:
/var/log/pve/tasks/F/UPID:pve:0027B47D:2DF41591:618C4FEF:qmstart:104:root@pam:
/var/log/pve/tasks/F/UPID:pve:00016106:2D9F6449:618B771F:qmstart:104:root@pam:
/var/log/pve/tasks/F/UPID:pve:001BAE02:29E1D288:6181E3BF:qmstart:102:root@pam:
/var/log/pve/tasks/4/UPID:pve:00082766:0224ED0B:611C29A4:qmstart:102:root@pam:
/var/log/pve/tasks/4/UPID:pve:000C7BC0:22C62D88:616FB174:qmstart:104:root@pam:
/var/log/pve/tasks/3/UPID:pve:0021407D:16BF0574:6150E753:qmstart:104:root@pam:
/var/log/pve/tasks/3/UPID:pve:002A957E:0A6ABF21:61315743:qmstart:104:root@pam:
/var/log/pve/tasks/3/UPID:pve:0032B16D:0970A012:612ED6F3:qmstart:104:root@pam:
/var/log/pve/tasks/3/UPID:pve:001375F1:2DEACFE8:618C3833:qmstart:102:root@pam:
/var/log/pve/tasks/9/UPID:pve:002EF066:1FACA79C:6167C209:qmstart:104:root@pam:
/var/log/pve/tasks/9/UPID:pve:003F9E13:1777B8E4:6152C029:qmstart:104:root@pam:
/var/log/pve/tasks/9/UPID:pve:003249FA:1A7DDF83:615A7D79:qmstart:104:root@pam:
/var/log/pve/tasks/9/UPID:pve:00130111:109A33F4:61412CE9:qmstart:104:root@pam:
/var/log/pve/tasks/9/UPID:pve:002DEB62:2348D331:6170FFE9:qmstart:102:root@pam:
/var/log/pve/tasks/9/UPID:pve:00210A9D:2557C0A4:617644D9:qmstart:102:root@pam:
/var/log/pve/tasks/A/UPID:pve:0020D84C:2F88AA4E:61905BAA:qmstart:104:root@pam:
/var/log/pve/tasks/A/UPID:pve:002D157D:0002B97E:6116919A:qmstart:102:root@pam:
 
well, I'd check the logs inside the VMs and the hypervisor journal for anything out of the ordinary (journalctl -b --since ... for example). if a VM is shutting down, it should be visible in the qmeventd logs. if it crashes, you should see a message about that in the journal as well.
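for example (just a sketch, adjust the time window and filters to the night in question):

Code:
# hypervisor journal of the current boot, starting yesterday
journalctl -b --since yesterday

# kernel messages only, filtered for OOM / kill traces
journalctl -k -b | grep -iE 'oom|killed process'

# qmeventd logs a cleanup entry whenever a guest process goes away
journalctl -b -u qmeventd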
 
Thx for pointing out "journalctl"

found it:


Code:
Oct 28 05:21:27 pve kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=qemu.slice,mems_allowed=0,global_oom,task_memcg=/qemu.slice/104.scope,task=kvm,pid=3888245,uid=0
Oct 28 05:21:27 pve kernel: Out of memory: Killed process 3888245 (kvm) total-vm:18476872kB, anon-rss:16814900kB, file-rss:3968kB, shmem-rss:4kB, UID:0 pgtables:34116kB oom_score_adj:0
Oct 28 05:21:27 pve systemd[1]: 104.scope: A process of this unit has been killed by the OOM killer.
Oct 28 05:21:27 pve kernel:  zd96: p1 p2
Oct 28 05:21:29 pve kernel: oom_reaper: reaped process 3888245 (kvm), now anon-rss:0kB, file-rss:36kB, shmem-rss:4kB
Oct 28 05:21:30 pve kernel:  zd112: p1 p2
Oct 28 05:21:30 pve kernel: vmbr0: port 9(tap104i0) entered disabled state
Oct 28 05:21:30 pve kernel: vmbr0: port 9(tap104i0) entered disabled state
Oct 28 05:21:30 pve systemd[1]: 104.scope: Succeeded.
Oct 28 05:21:30 pve systemd[1]: 104.scope: Consumed 13h 36min 45.712s CPU time.
Oct 28 05:21:31 pve qmeventd[2345577]: Starting cleanup for 104
Oct 28 05:21:31 pve qmeventd[2345577]: Finished cleanup for 104

So the system is running out of memory and the OOM killer kills the VM.

What I don't get is why: the VMs have a total of 60GB assigned, out of the 96GB available on the host...
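I guess I need the full memory picture of the host to figure that out; ZFS is in use here, so the ARC also takes its share of the 96GB beyond what is assigned to the VMs. A quick sketch of what I'd look at:

Code:
# overall host memory
free -h

# current ZFS ARC size and its upper limit (host RAM outside of the VMs)
awk '/^size|^c_max/ {printf "%-8s %6.1f GiB\n", $1, $3/2^30}' /proc/spl/kstat/zfs/arcstats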
 
Is there a way to find out why only two of the seven VMs were killed?

Code:
root@pve:~# qm list
      VMID NAME                 STATUS     MEM(MB)    BOOTDISK(GB) PID
       100 pbx                  running    1024             16.00 754029
       101 mgr                  running    1024              7.00 823598
       102 dc                   running    16384            55.00 1275637
       103 fs                   running    12288            55.00 676650
       104 ex                   running    16384            64.00 2275647
       105 ms                   running    8192             50.00 2096741
       198 xmr8                 running    4096              8.00 3304511
 
the OOM killer tries to find appropriate candidate processes (by looking at memory usage, OOM scores, etc). it should print a summary of the state when it is triggered, that might give you a clue. but most likely they were the two VMs which used the most RAM at that point. basically the idea is "if I have to kill a process to get free memory, I kill something that gives me a lot of memory so I don't have to kill many processes"
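if you want to see what the OOM killer saw, something along these lines should do (just a sketch; on PVE the QEMU guest processes are simply named "kvm"):

Code:
# the full report the kernel wrote when the killer was triggered
journalctl -k | grep -A 40 'invoked oom-killer'

# resident memory and current OOM score of every running guest process
for pid in $(pgrep -x kvm); do
    printf 'pid=%-8s rss_kB=%-10s oom_score=%s\n' "$pid" \
        "$(awk '/^VmRSS/ {print $2}' /proc/"$pid"/status)" \
        "$(cat /proc/"$pid"/oom_score)"
done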
 
THX again, yes those two are the only VMs with 16GB of RAM assigned, all the others have less...
 
Hey,

If it's really necessary to have 16GB RAM in these 2 VMs, maybe try the ballooning technology?
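A sketch of what that would look like for one of the two guests (VMID 104 as an example): keep the 16GB ceiling but let the balloon shrink the guest towards a lower minimum when the host is under memory pressure. The Windows guests need the virtio balloon driver installed for this to have any effect.

Code:
# let VM 104 float between 8GB and 16GB depending on host memory pressure
qm set 104 --memory 16384 --balloon 8192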
 
