VM was shut down for unknown reasons

IsThisThingOn

Normally I would not think much about a VM shutting down, but this VM has a GPO policy in place, so there is no shutdown button.
That made me curious to find out what caused the shutdown.

Proxmox webGUI:
Here I see nothing of interest in the task list. The VM was shut down for a backup, but that was a few days ago. And no, it can't still be shut down from that, because people used it in the meantime (the VM has an SMB share that was in use).
The only tasks between the backup and now are multiple "update package database" tasks, all with "TASK OK".

Windows itself showed me this in cmd:

Code:
C:\Windows\system32>wevtutil qe System /q:"*[System[(EventID=1074)]]" /f:text /c:1
Event[0]:
  Log Name: System
  Source: User32
  Date: 2025-03-16T01:00:03.0410000Z
  Event ID: 1074
  Task: N/A
  Level: Informationen
  Opcode: N/A
  Keyword: Klassisch
  User: S-1-5-18
  User Name: NT-AUTORITÄT\SYSTEM
  Computer: winbau
  Description:
Vom Prozess "qemu-ga.exe" wurde auf Anforderung des Benutzers "NT-AUTORITÄT\SYSTEM" das Ereignis "Ausschalten" für den Computer "printshare" aus folgendem Grund initiiert: "Anderer Grund (geplant)"
 Ursachencode: "0x80000000"
 Herunterfahrtyp: "Ausschalten"
 Kommentar: ""

which makes me believe it was some kind of QEMU bug. Any ideas how to troubleshoot this further?
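
In case it helps anyone else chasing an unexpected shutdown: a variation of the same query also pulls the "unexpected shutdown" (6008) and Kernel-Power (41) events, newest first; the event count is just an example, adjust as needed.

Code:
REM newest 5 shutdown-related events: 1074 (shutdown initiated), 6008 (unexpected shutdown), 41 (Kernel-Power)
wevtutil qe System /q:"*[System[(EventID=1074 or EventID=6008 or EventID=41)]]" /f:text /c:5 /rd:true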
 
could it be that somebody pressed shutdown in the PVE UI? (or triggered it via the PVE API/CLI?)

if the guest agent is configured and installed, it will be used when shutting down a VM from the PVE side
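
as a quick sanity check (VM 100 here only as an example id), you can verify from the host whether the agent option is set and whether the agent responds:

Bash:
# is the guest agent option enabled in the VM config?
qm config 100 | grep agent
# does the agent inside the running VM answer?
qm agent 100 ping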

EDIT: ah sorry, only saw now that you already looked at the task list


maybe look into/post the journal from the time the shutdown happened?
 
Do I understand you correctly that you also don't think it came from the GUI/API, because it isn't in the PVE task list?

Regarding the Windows Event Viewer, it gets a little strange and I am somewhat confused about what happened here.

Looking at the output above, one could assume that qemu-ga.exe somehow triggered a scheduled shutdown. That was at 01:00.

But I see event log entries until 03:27.
Then there is nothing for hours, until 08:14 when I started the system manually.
And even stranger, there is an error entry at 08:14 saying "the system was shut down unexpectedly at 03:00".

I have a hard time figuring out what really happened, since these times make no sense at all to me.

How can the system be running at 03:27, when it supposedly crashed at 03:00?
Why did it run at least until 03:27, when the qemu-ga.exe issued a shutdown command at 01:00?
And why did the qemu-ga.exe even trigger a shutdown command in the first place?
 
from the excerpt of the log i guess that you're german speaking ("Herunterfahrtyp"), so you're probably in the GMT+2 timezone? then that would somewhat fit, because the timestamp:


Date: 2025-03-16T01:00:03.0410000Z
is in UTC time, which would correspond to around 03:00 in GMT+2

Do I understand you correctly that you also don't think it came from the GUI/API, because it isn't in the PVE task list?
yes, i'd assume that the shutdown would be logged as a task, though there are ways to talk to the guest agent without our tooling/API, so i asked if there was something in the host logs around that time
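
(for completeness, the task history can also be listed on the CLI, e.g. something like the following, with 100 just as an example vmid)

Bash:
# recent tasks on this node, narrowed down to VM 100
pvenode task list --vmid 100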


as for
But I see event log entries until 03:27.

maybe there was some process stalling the shutdown for ~30 minutes? (some disk operations, etc.?)

i'm really just poking in the dark here without more logs ;)

EDIT:

also, i'm just realizing the log you posted is from March? is the time way off in the guest, or is that maybe the wrong log?
 
also, i'm just realizing the log you posted is from March? is the time way off in the guest, or is that maybe the wrong log?
Haha, oh my gosh, sorry, you are right. I just looked at the 16th and assumed September; that log has nothing to do with this shutdown :D

i'm really just poking in the dark here without more logs ;)
Logs from where?
Maybe I am not educated enough, but I did not see anything worthwhile in the Event Viewer.
 
Logs from where?
as i wrote, the journal from the host would be interesting. you can get that with 'journalctl', e.g. all the logs since the last boot would be 'journalctl -b', or if you want to specify a date you can do something like 'journalctl --since 2025-09-15'
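
for example (the timestamps below are only placeholders, adjust them to the window around the shutdown):

Bash:
# everything since the last boot
journalctl -b
# or just the window around the shutdown
journalctl --since "2025-09-16 03:00" --until "2025-09-16 04:00"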
 
Cheers mate!

Now this looks more interesting:

Bash:
Sep 16 02:31:18 pve smartd[839]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 76 to 77
Sep 16 02:31:18 pve smartd[839]: Device: /dev/sdc [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 63 to 62
Sep 16 02:31:18 pve smartd[839]: Device: /dev/sdc [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 37 to 38
Sep 16 03:01:18 pve smartd[839]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 77 to 76
Sep 16 03:01:18 pve smartd[839]: Device: /dev/sdc [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 62 to 63
Sep 16 03:01:18 pve smartd[839]: Device: /dev/sdc [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 38 to 37
Sep 16 03:10:01 pve CRON[1507086]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Sep 16 03:10:01 pve CRON[1507087]: (root) CMD (test -e /run/systemd/system || SERVICE_MODE=1 /sbin/e2scrub_all -A -r)
Sep 16 03:10:01 pve CRON[1507086]: pam_unix(cron:session): session closed for user root
Sep 16 03:17:01 pve CRON[1508387]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Sep 16 03:17:01 pve CRON[1508388]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Sep 16 03:17:01 pve CRON[1508387]: pam_unix(cron:session): session closed for user root
Sep 16 03:31:03 pve kernel: CPU 0/KVM invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Sep 16 03:31:03 pve kernel: CPU: 0 PID: 949080 Comm: CPU 0/KVM Tainted: P           O       6.8.12-13-pve #1
Sep 16 03:31:03 pve kernel: Hardware name: HPE ProLiant MicroServer Gen10 Plus/ProLiant MicroServer Gen10 Plus, BIOS U48 05/16/2025
Sep 16 03:31:03 pve kernel: Call Trace:
Sep 16 03:31:03 pve kernel:  <TASK>
Sep 16 03:31:03 pve kernel:  dump_stack_lvl+0x76/0xa0
Sep 16 03:31:03 pve kernel:  dump_stack+0x10/0x20
Sep 16 03:31:03 pve kernel:  dump_header+0x49/0x210
Sep 16 03:31:03 pve kernel:  oom_kill_process+0x110/0x240
Sep 16 03:31:03 pve kernel:  out_of_memory+0x26e/0x560
Sep 16 03:31:03 pve kernel:  __alloc_pages+0x10ce/0x1320
Sep 16 03:31:03 pve kernel:  alloc_pages_mpol+0x91/0x1f0
Sep 16 03:31:03 pve kernel:  vma_alloc_folio+0x64/0xd0
Sep 16 03:31:03 pve kernel:  do_wp_page+0x6eb/0xc10
Sep 16 03:31:03 pve kernel:  __handle_mm_fault+0xba9/0xf70
Sep 16 03:31:03 pve kernel:  handle_mm_fault+0x18d/0x380
Sep 16 03:31:03 pve kernel:  __get_user_pages+0x14f/0x730
Sep 16 03:31:03 pve kernel:  get_user_pages_unlocked+0xe8/0x370
Sep 16 03:31:03 pve kernel:  hva_to_pfn+0xb6/0x540 [kvm]
Sep 16 03:31:03 pve kernel:  __gfn_to_pfn_memslot+0xb5/0x150 [kvm]
Sep 16 03:31:03 pve kernel:  kvm_faultin_pfn+0x123/0x670 [kvm]
Sep 16 03:31:03 pve kernel:  kvm_tdp_page_fault+0x11c/0x170 [kvm]
Sep 16 03:31:03 pve kernel:  kvm_mmu_do_page_fault+0x1b4/0x1f0 [kvm]
Sep 16 03:31:03 pve kernel:  kvm_mmu_page_fault+0x90/0x700 [kvm]
Sep 16 03:31:03 pve kernel:  ? skip_emulated_instruction+0xc9/0x230 [kvm_intel]
Sep 16 03:31:03 pve kernel:  ? __check_object_size+0x6a/0x300
Sep 16 03:31:03 pve kernel:  ? vmx_vmexit+0x79/0xe0 [kvm_intel]
Sep 16 03:31:03 pve kernel:  ? vmx_vmexit+0x73/0xe0 [kvm_intel]
Sep 16 03:31:03 pve kernel:  handle_ept_violation+0xeb/0x440 [kvm_intel]
Sep 16 03:31:03 pve kernel:  ? vmx_vcpu_enter_exit+0x88/0x450 [kvm_intel]
Sep 16 03:31:03 pve kernel:  vmx_handle_exit+0x1f5/0x960 [kvm_intel]
Sep 16 03:31:03 pve kernel:  kvm_arch_vcpu_ioctl_run+0x8f6/0x1680 [kvm]
Sep 16 03:31:03 pve kernel:  ? kvm_on_user_return+0x78/0xd0 [kvm]
Sep 16 03:31:03 pve kernel:  kvm_vcpu_ioctl+0x2a9/0x820 [kvm]
Sep 16 03:31:03 pve kernel:  ? do_syscall_64+0x8d/0x170
Sep 16 03:31:03 pve kernel:  ? fire_user_return_notifiers+0x37/0x80
Sep 16 03:31:03 pve kernel:  ? syscall_exit_to_user_mode+0x86/0x260
Sep 16 03:31:03 pve kernel:  ? do_syscall_64+0x8d/0x170
Sep 16 03:31:03 pve kernel:  __x64_sys_ioctl+0xa0/0xf0
Sep 16 03:31:03 pve kernel:  x64_sys_call+0xa71/0x2480
Sep 16 03:31:03 pve kernel:  do_syscall_64+0x81/0x170
Sep 16 03:31:03 pve kernel:  ? do_syscall_64+0x8d/0x170
Sep 16 03:31:03 pve kernel:  ? kvm_on_user_return+0x78/0xd0 [kvm]
Sep 16 03:31:03 pve kernel:  ? fire_user_return_notifiers+0x37/0x80
Sep 16 03:31:03 pve kernel:  ? syscall_exit_to_user_mode+0x86/0x260
Sep 16 03:31:03 pve kernel:  ? flush_tlb_func+0x216/0x260
Sep 16 03:31:03 pve kernel:  ? __pfx_flush_tlb_func+0x10/0x10
Sep 16 03:31:03 pve kernel:  ? __flush_smp_call_function_queue+0x9f/0x450
Sep 16 03:31:03 pve kernel:  ? irqentry_exit_to_user_mode+0x7b/0x260
Sep 16 03:31:03 pve kernel:  ? irqentry_exit+0x43/0x50
Sep 16 03:31:03 pve kernel:  entry_SYSCALL_64_after_hwframe+0x78/0x80
Sep 16 03:31:03 pve kernel: RIP: 0033:0x7758987b7d5b
Sep 16 03:31:03 pve kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 >
Sep 16 03:31:03 pve kernel: RSP: 002b:0000775894e77ee0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Sep 16 03:31:03 pve kernel: RAX: ffffffffffffffda RBX: 00005b3d87527e50 RCX: 00007758987b7d5b
Sep 16 03:31:03 pve kernel: RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000016
Sep 16 03:31:03 pve kernel: RBP: 000000000000ae80 R08: 0000000000000000 R09: 0000000000000000
Sep 16 03:31:03 pve kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
Sep 16 03:31:03 pve kernel: R13: 0000000000000001 R14: 0000000000000177 R15: 0000000000000000
Sep 16 03:31:03 pve kernel:  </TASK>
Sep 16 03:31:03 pve kernel: Mem-Info:
Sep 16 03:31:03 pve kernel: active_anon:1433818 inactive_anon:4348386 isolated_anon:0
                             active_file:3363 inactive_file:8900 isolated_file:0
                             unevictable:768 dirty:13 writeback:3
                             slab_reclaimable:86977 slab_unreclaimable:188828
                             mapped:14938 shmem:11164 pagetables:15704
                             sec_pagetables:12422 bounce:0
                             kernel_misc_reclaimable:0
                             free:76000 free_pcp:90 free_cma:0
Sep 16 03:31:03 pve kernel: Node 0 active_anon:5735272kB inactive_anon:17393544kB active_file:13452kB inactive_file:35600kB unevictable:3072kB is>
Sep 16 03:31:03 pve kernel: Node 0 DMA free:11264kB boost:0kB min:28kB low:40kB high:52kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0>
Sep 16 03:31:03 pve kernel: lowmem_reserve[]: 0 1771 31865 31865 31865
Sep 16 03:31:03 pve kernel: Node 0 DMA32 free:126780kB boost:2048kB min:5800kB low:7612kB high:9424kB reserved_highatomic:0KB active_anon:53432kB>
Sep 16 03:31:03 pve kernel: lowmem_reserve[]: 0 0 30093 30093 30093
Sep 16 03:31:03 pve kernel: Node 0 Normal free:166464kB boost:188128kB min:251924kB low:282736kB high:313548kB reserved_highatomic:0KB active_ano>
Sep 16 03:31:03 pve kernel: lowmem_reserve[]: 0 0 0 0 0
Sep 16 03:31:03 pve kernel: Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11264kB
Sep 16 03:31:03 pve kernel: Node 0 DMA32: 360*4kB (UME) 399*8kB (UME) 483*16kB (UE) 451*32kB (UME) 374*64kB (UME) 213*128kB (UE) 116*256kB (UE) 2>
Sep 16 03:31:03 pve kernel: Node 0 Normal: 5031*4kB (UME) 1135*8kB (UME) 5766*16kB (UME) 1370*32kB (UME) 6*64kB (M) 1*128kB (M) 0*256kB 0*512kB 0>
Sep 16 03:31:03 pve kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Sep 16 03:31:03 pve kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Sep 16 03:31:03 pve kernel: 23543 total pagecache pages
Sep 16 03:31:03 pve kernel: 0 pages in swap cache
Sep 16 03:31:03 pve kernel: Free swap  = 0kB
Sep 16 03:31:03 pve kernel: Total swap = 0kB
Sep 16 03:31:03 pve kernel: 8349266 pages RAM
Sep 16 03:31:03 pve kernel: 0 pages HighMem/MovableOnly
Sep 16 03:31:03 pve kernel: 171152 pages reserved
Sep 16 03:31:03 pve kernel: 0 pages hwpoisoned
Sep 16 03:31:03 pve kernel: Tasks state (memory values in pages):
Sep 16 03:31:03 pve kernel: [  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
Sep 16 03:31:03 pve kernel: [    490]     0   490    10340     1857      256     1600         1   114688        0          -250 systemd-journal
Sep 16 03:31:03 pve kernel: [    505]     0   505     7205      928      608      320         0    77824        0         -1000 systemd-udevd

[... task list shortened for the forum ...]
Sep 16 03:31:03 pve kernel: [1510966]     0 1510966     1368      256        0      256         0    53248        0             0 sleep
Sep 16 03:31:03 pve kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=qemu.slice,mems_allowed=0,global_oom,task_memcg=/qemu.slic>
Sep 16 03:31:03 pve kernel: Out of memory: Killed process 946755 (kvm) total-vm:9241212kB, anon-rss:8422388kB, file-rss:2560kB, shmem-rss:0kB, UI>
Sep 16 03:31:03 pve kernel:  zd48: p1 p2
Sep 16 03:31:03 pve systemd[1]: 100.scope: A process of this unit has been killed by the OOM killer.
Sep 16 03:31:03 pve systemd[1]: 100.scope: Failed with result 'oom-kill'.
Sep 16 03:31:03 pve systemd[1]: 100.scope: Consumed 1h 22min 33.790s CPU time.
Sep 16 03:31:03 pve kernel: fwbr100i0: port 2(tap100i0) entered disabled state
Sep 16 03:31:03 pve kernel: tap100i0 (unregistering): left allmulticast mode
Sep 16 03:31:03 pve kernel: fwbr100i0: port 2(tap100i0) entered disabled state
Sep 16 03:31:03 pve kernel:  zd32: p1
Sep 16 03:31:04 pve qmeventd[1511060]: Starting cleanup for 100
Sep 16 03:31:04 pve kernel: fwbr100i0: port 1(fwln100i0) entered disabled state
Sep 16 03:31:04 pve kernel: vmbr0: port 3(fwpr100p0) entered disabled state
Sep 16 03:31:04 pve kernel: fwln100i0 (unregistering): left allmulticast mode
Sep 16 03:31:04 pve kernel: fwln100i0 (unregistering): left promiscuous mode
Sep 16 03:31:04 pve kernel: fwbr100i0: port 1(fwln100i0) entered disabled state
Sep 16 03:31:04 pve kernel: fwpr100p0 (unregistering): left allmulticast mode
Sep 16 03:31:04 pve kernel: fwpr100p0 (unregistering): left promiscuous mode
Sep 16 03:31:04 pve kernel: vmbr0: port 3(fwpr100p0) entered disabled state
Sep 16 03:31:04 pve qmeventd[1511060]: Finished cleanup for 100
Sep 16 03:31:18 pve smartd[839]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 76 to 75
Sep 16 03:31:19 pve smartd[839]: Device: /dev/sdc [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 63 to 62

The SMART temps for sda and sdb seem unreasonably high. Sure, these are SSDs, but 75 degrees?
Bash:
smartctl -A /dev/sda | grep Temperature
194 Temperature_Celsius     0x0022   077   065   000    Old_age   Always       -       23

The second problem seems to have been an out-of-memory issue.
I probably was too generous with the ARC ;)
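
For anyone reading along, checking the current ARC size against its limit looks roughly like this (a sketch, assuming ZFS on Linux; values are in bytes):

Bash:
# current ARC size ("size") and configured maximum ("c_max")
awk '/^size|^c_max/ {print $1, $3}' /proc/spl/kstat/zfs/arcstats
# overall memory and swap situation on the host
free -h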
 
The SMART temps for sda and sdb seem unreasonably high. Sure, these are SSDs, but 75 degrees?
sadly, SMART data is not really standardized and some vendors use offsets or different units/formats for the temperature. better temperature monitoring can be done by loading the 'drivetemp' module with `modprobe drivetemp` and using the `sensors` binary from the `lm-sensors` package
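
roughly along these lines (the /etc/modules entry is just one way to make the module load persistent):

Bash:
# install the sensors tooling and load the drivetemp module
apt install lm-sensors
modprobe drivetemp
# load drivetemp automatically on boot
echo drivetemp >> /etc/modules
# read out the drive temperatures
sensors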

The second problem seems to have been an out-of-memory issue.
I probably was too generous with the ARC ;)
yep seems like the vm was "simply" killed because of an out-of-memory situation
 
Thanks for your great support!
Awesome how good Proxmox support is, even for the community subscription.

I thought I knew how the ARC works and that your new default is way, way too small. Turns out once again that you pros know what you are doing and I suffered from the Dunning-Kruger effect.
 
Thanks for your great support!
Awesome how good Proxmox support is, even for the community subscription.

thanks for the feedback! always nice to hear that users/customers are happy :)
just fyi: if you have a subscription, you can enter it in your forum account under 'account details' and then you get a 'subscriber' badge here, so we (and others) know that you have a subscription

I thought I knew how the ARC works and that your new default is way, way too small. Turns out once again that you pros know what you are doing and I suffered from the Dunning-Kruger effect.
no worries, we try to have sane and safe defaults, but of course they don't fit for every use case so experimenting can make sense (ideally at the start when using PVE in a test environment)
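
(for anyone tuning the ARC later: a sketch of how a limit can be set, with 4 GiB purely as an example value)

Bash:
# cap the ARC at 4 GiB at runtime (example value)
echo $((4 * 1024 * 1024 * 1024)) > /sys/module/zfs/parameters/zfs_arc_max
# make the limit persistent across reboots
echo "options zfs zfs_arc_max=$((4 * 1024 * 1024 * 1024))" > /etc/modprobe.d/zfs.conf
update-initramfs -u -k all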
 