VM down

Hi,
I have a PVE server with two VMs.
Tonight at about 23:40, VM 100 shut down and I had to start it again manually.
In the logs I found the following:


Apr 07 23:35:18 pve smartd[2326]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 121 to 122
Apr 07 23:35:18 pve smartd[2326]: Device: /dev/sdc [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 74 to 70
Apr 07 23:35:18 pve smartd[2326]: Device: /dev/sdd [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 74 to 70
Apr 07 23:35:20 pve pmxcfs[2728]: [dcdb] notice: data verification successful
Apr 07 23:42:20 pve kernel: zfs invoked oom-killer: gfp_mask=0x42dc0(GFP_KERNEL|__GFP_NOWARN|__GFP_COMP|__GFP_ZERO), order=2, oom_score_adj=0
Apr 07 23:42:21 pve kernel: CPU: 7 PID: 586461 Comm: zfs Tainted: P IO 5.13.19-2-pve #1
Apr 07 23:42:21 pve kernel: Hardware name: Dell Inc. PowerEdge R440/04JN2K, BIOS 2.9.3 09/23/2020
Apr 07 23:42:21 pve kernel: Call Trace:
Apr 07 23:42:21 pve kernel: dump_stack+0x7d/0x9c
Apr 07 23:42:21 pve kernel: dump_header+0x4f/0x1f6
Apr 07 23:42:21 pve kernel: oom_kill_process.cold+0xb/0x10
Apr 07 23:42:21 pve kernel: out_of_memory+0x1cf/0x530
Apr 07 23:42:21 pve kernel: __alloc_pages_slowpath.constprop.0+0xc96/0xd80
Apr 07 23:42:21 pve kernel: __alloc_pages+0x30e/0x330
Apr 07 23:42:21 pve kernel: kmalloc_large_node+0x45/0xb0
Apr 07 23:42:21 pve kernel: __kmalloc_node+0x276/0x300
Apr 07 23:42:21 pve kernel: spl_kmem_alloc_impl+0xb5/0x100 [spl]
Apr 07 23:42:21 pve kernel: spl_kmem_zalloc+0x19/0x20 [spl]
Apr 07 23:42:21 pve kernel: zfsdev_ioctl+0x2d/0xe0 [zfs]
Apr 07 23:42:21 pve kernel: __x64_sys_ioctl+0x91/0xc0
Apr 07 23:42:21 pve kernel: do_syscall_64+0x61/0xb0
Apr 07 23:42:21 pve kernel: ? asm_exc_page_fault+0x8/0x30
Apr 07 23:42:21 pve kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
Apr 07 23:42:21 pve kernel: RIP: 0033:0x7f6d2d606cc7
Apr 07 23:42:21 pve kernel: Code: 00 00 00 48 8b 05 c9 91 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 99 91 0c 00 f7 d8 64 89 01 48
Apr 07 23:42:21 pve kernel: RSP: 002b:00007ffe9f31ee58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Apr 07 23:42:21 pve kernel: RAX: ffffffffffffffda RBX: 00007ffe9f31ee80 RCX: 00007f6d2d606cc7
Apr 07 23:42:21 pve kernel: RDX: 00007ffe9f31ee80 RSI: 0000000000005a12 RDI: 0000000000000003
Apr 07 23:42:21 pve kernel: RBP: 00007ffe9f31ee70 R08: 00007f6d2cfde010 R09: 0000000000000000
Apr 07 23:42:21 pve kernel: R10: 0000000000000022 R11: 0000000000000246 R12: 000055f388b28320
Apr 07 23:42:21 pve kernel: R13: 00007ffe9f31ee80 R14: 000055f388b28320 R15: 000055f388b2a100
Apr 07 23:42:21 pve kernel: Mem-Info:
Apr 07 23:42:21 pve kernel: active_anon:2200324 inactive_anon:830795 isolated_anon:0 active_file:23764 inactive_file:125399 isolated_file:0 unevictable:39226 dirty:0 writeback:132 slab_reclaimable:97751 slab_unreclaimable:644177 mapped:23610 shmem:17222 pagetables:14714 bounce:0 free:306451 free_pcp:32 free_cma:0
Apr 07 23:42:21 pve kernel: Node 0 active_anon:8801296kB inactive_anon:3323180kB active_file:95056kB inactive_file:501596kB unevictable:156904kB isolated(anon):0kB isolated(file):0kB mapped:94440kB dirty:0kB writeback:528kB shmem:68888kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 43008kB writeback_tmp:0kB kernel_stack:9268kB pagetables:58856kB all_unreclaimable? no
Apr 07 23:42:21 pve kernel: Node 0 DMA free:11264kB min:32kB low:44kB high:56kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15980kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Apr 07 23:42:21 pve kernel: lowmem_reserve[]: 0 1409 31528 31528 31528
Apr 07 23:42:21 pve kernel: Node 0 DMA32 free:121960kB min:3020kB low:4460kB high:5900kB reserved_highatomic:0KB active_anon:640012kB inactive_anon:218852kB active_file:428kB inactive_file:72kB unevictable:0kB writepending:0kB present:1566664kB managed:1501128kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Apr 07 23:42:21 pve kernel: lowmem_reserve[]: 0 0 30119 30119 30119
Apr 07 23:42:21 pve kernel: Node 0 Normal free:1092580kB min:64528kB low:95368kB high:126208kB reserved_highatomic:2048KB active_anon:8161284kB inactive_anon:3104328kB active_file:95356kB inactive_file:501284kB unevictable:156904kB writepending:592kB present:31457280kB managed:30849432kB mlocked:153832kB bounce:0kB free_pcp:128kB local_pcp:0kB free_cma:0kB
Apr 07 23:42:21 pve kernel: lowmem_reserve[]: 0 0 0 0 0
Apr 07 23:42:21 pve kernel: Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11264kB
Apr 07 23:42:21 pve kernel: Node 0 DMA32: 1904*4kB (UME) 539*8kB (UME) 509*16kB (ME) 548*32kB (UME) 664*64kB (UME) 247*128kB (UM) 38*256kB (UM) 1*512kB (M) 0*1024kB 0*2048kB 0*4096kB = 121960kB
Apr 07 23:42:21 pve kernel: Node 0 Normal: 261649*4kB (UME) 5503*8kB (UEH) 8*16kB (H) 5*32kB (H) 1*64kB (H) 1*128kB (H) 1*256kB (H) 0*512kB 1*1024kB (H) 0*2048kB 0*4096kB = 1092380kB
Apr 07 23:42:21 pve kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Apr 07 23:42:21 pve kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Apr 07 23:42:21 pve kernel: 170582 total pagecache pages
Apr 07 23:42:21 pve kernel: 0 pages in swap cache
Apr 07 23:42:21 pve kernel: Swap cache stats: add 0, delete 0, find 0/0
Apr 07 23:42:21 pve kernel: Free swap = 0kB
Apr 07 23:42:21 pve kernel: Total swap = 0kB
Apr 07 23:42:21 pve kernel: 8259981 pages RAM
Apr 07 23:42:21 pve kernel: 0 pages HighMem/MovableOnly
Apr 07 23:42:21 pve kernel: 168501 pages reserved
Apr 07 23:42:21 pve kernel: 0 pages hwpoisoned
Apr 07 23:42:21 pve kernel: Tasks state (memory values in pages):
Apr 07 23:42:21 pve kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Apr 07 23:42:21 pve kernel: [ 1131] 0 1131 10073 1415 102400 0 -250 systemd-journal
Apr 07 23:42:21 pve kernel: [ 1165] 0 1165 5766 935 61440 0 -1000 systemd-udevd
Apr 07 23:42:21 pve kernel: [ 2219] 0 2219 2873 132 61440 0 0 iscsid
Apr 07 23:42:21 pve kernel: [ 2220] 0 2220 2999 2958 65536 0 -17 iscsid
Apr 07 23:42:21 pve kernel: [ 2309] 103 2309 1960 777 61440 0 0 rpcbind
Apr 07 23:42:21 pve kernel: [ 2312] 102 2312 2047 900 49152 0 -900 dbus-daemon
Apr 07 23:42:21 pve kernel: [ 2316] 0 2316 37728 427 61440 0 0 lxcfs
Apr 07 23:42:21 pve kernel: [ 2318] 0 2318 272369 707 184320 0 0 pve-lxc-syscall
Apr 07 23:42:21 pve kernel: [ 2324] 0 2324 55185 1142 77824 0 0 rsyslogd
Apr 07 23:42:21 pve kernel: [ 2326] 0 2326 2959 1168 69632 0 0 smartd
Apr 07 23:42:21 pve kernel: [ 2338] 0 2338 1742 524 53248 0 0 ksmtuned
Apr 07 23:42:21 pve kernel: [ 2368] 0 2368 1051 320 40960 0 0 qmeventd
Apr 07 23:42:21 pve kernel: [ 2383] 0 2383 3446 1207 77824 0 0 systemd-logind
Apr 07 23:42:21 pve kernel: [ 2385] 0 2385 543 266 40960 0 -1000 watchdog-mux
Apr 07 23:42:21 pve kernel: [ 2387] 0 2387 59428 1065 77824 0 0 zed
Apr 07 23:42:21 pve kernel: [ 2390] 0 2390 1446 353 49152 0 0 agetty
Apr 07 23:42:21 pve kernel: [ 2418] 0 2418 1137 355 49152 0 0 lxc-monitord
Apr 07 23:42:21 pve kernel: [ 2481] 0 2481 3323 1382 65536 0 -1000 sshd
Apr 07 23:42:21 pve kernel: [ 2514] 101 2514 4743 654 61440 0 0 chronyd
Apr 07 23:42:21 pve kernel: [ 2517] 101 2517 2695 473 61440 0 0 chronyd
Apr 07 23:42:21 pve kernel: [ 2572] 0 2572 200148 730 188416 0 0 rrdcached
Apr 07 23:42:21 pve kernel: [ 2728] 0 2728 211206 16327 479232 0 0 pmxcfs
Apr 07 23:42:21 pve kernel: [ 2791] 0 2791 9997 626 73728 0 0 master
Apr 07 23:42:21 pve kernel: [ 2793] 106 2793 10149 753 69632 0 0 qmgr
Apr 07 23:42:21 pve kernel: [ 2806] 0 2806 140033 41696 409600 0 0 corosync
Apr 07 23:42:21 pve kernel: [ 2807] 0 2807 1671 553 57344 0 0 cron
Apr 07 23:42:21 pve kernel: [ 2844] 0 2844 69893 21859 303104 0 0 pve-firewall
Apr 07 23:42:21 pve kernel: [ 2847] 0 2847 70689 23594 323584 0 0 pvestatd
Apr 07 23:42:21 pve kernel: [ 2849] 0 2849 576 144 40960 0 0 bpfilter_umh
Apr 07 23:42:21 pve kernel: [ 2989] 0 2989 83176 24296 360448 0 0 pvescheduler
Apr 07 23:42:21 pve kernel: [ 3133] 0 3133 88288 30348 409600 0 0 pvedaemon
Apr 07 23:42:21 pve kernel: [ 3145] 0 3145 84579 24407 376832 0 0 pve-ha-crm
Apr 07 23:42:21 pve kernel: [ 3147] 33 3147 88653 32860 430080 0 0 pveproxy
Apr 07 23:42:21 pve kernel: [ 3153] 33 3153 20874 14422 212992 0 0 spiceproxy
Apr 07 23:42:21 pve kernel: [ 3155] 0 3155 84499 24423 368640 0 0 pve-ha-lrm
Apr 07 23:42:21 pve kernel: [ 11908] 0 11908 3688606 1652521 27303936 0 0 kvm
Apr 07 23:42:21 pve kernel: [2815376] 0 2815376 95776 37173 483328 0 0 pvedaemon worke
Apr 07 23:42:21 pve kernel: [2815760] 0 2815760 90898 32631 438272 0 0 pvedaemon worke
Apr 07 23:42:21 pve kernel: [3367198] 0 3367198 96109 37777 487424 0 0 pvedaemon worke
Apr 07 23:42:21 pve kernel: [ 869273] 0 869273 3468891 1760194 24383488 0 0 kvm
Apr 07 23:42:21 pve kernel: [1652431] 33 1652431 20937 13207 208896 0 0 spiceproxy work
Apr 07 23:42:21 pve kernel: [1658440] 33 1658440 91747 33617 434176 0 0 pveproxy worker
Apr 07 23:42:21 pve kernel: [1658441] 33 1658441 88721 31883 405504 0 0 pveproxy worker
Apr 07 23:42:21 pve kernel: [1658442] 33 1658442 91747 33555 434176 0 0 pveproxy worker
Apr 07 23:42:21 pve kernel: [3421696] 0 3421696 20035 448 57344 0 0 pvefw-logger
Apr 07 23:42:21 pve kernel: [3696519] 106 3696519 10064 1347 69632 0 0 pickup
Apr 07 23:42:21 pve kernel: [ 570701] 0 570701 1326 128 49152 0 0 sleep
Apr 07 23:42:21 pve kernel: [ 586461] 0 586461 2156 543 57344 0 0 zfs
Apr 07 23:42:21 pve kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=pvestatd.service,mems_allowed=0,global_oom,task_memcg=/qemu.slice/100.scope,task=kvm,pid=869273,uid=0
Apr 07 23:42:21 pve kernel: Out of memory: Killed process 869273 (kvm) total-vm:13875564kB, anon-rss:7034732kB, file-rss:6040kB, shmem-rss:4kB, UID:0 pgtables:23812kB oom_score_adj:0
Apr 07 23:42:21 pve systemd[1]: 100.scope: A process of this unit has been killed by the OOM killer.
Apr 07 23:42:21 pve kernel: oom_reaper: reaped process 869273 (kvm), now anon-rss:0kB, file-rss:36kB, shmem-rss:4kB
Apr 07 23:42:23 pve kernel: zd16: p1 p2
Apr 07 23:42:23 pve kernel: vmbr0: port 2(tap100i0) entered disabled state
Apr 07 23:42:23 pve kernel: vmbr0: port 2(tap100i0) entered disabled state
Apr 07 23:42:23 pve kernel: vmbr1: port 3(tap100i1) entered disabled state
Apr 07 23:42:23 pve kernel: vmbr1: port 3(tap100i1) entered disabled state
Apr 07 23:42:24 pve systemd[1]: Stopping LVM event activation on device 230:18...
Apr 07 23:42:24 pve kernel: vmbr4: port 2(tap100i2) entered disabled state
Apr 07 23:42:24 pve kernel: vmbr4: port 2(tap100i2) entered disabled state
Apr 07 23:42:24 pve kernel: vmbr3: port 2(tap100i3) entered disabled state
Apr 07 23:42:24 pve kernel: vmbr3: port 2(tap100i3) entered disabled state
Apr 07 23:42:24 pve kernel: vmbr5: port 2(tap100i4) entered disabled state
Apr 07 23:42:24 pve kernel: vmbr5: port 2(tap100i4) entered disabled state
Apr 07 23:42:24 pve systemd[1]: 100.scope: Succeeded.
Apr 07 23:42:24 pve systemd[1]: 100.scope: Consumed 2w 1d 6h 28min 5.244s CPU time.
Apr 07 23:42:24 pve lvm[589251]: pvscan[589251] /dev/zd16p2 excluded by filters: device is rejected by filter config.
Apr 07 23:42:24 pve systemd[1]: lvm2-pvscan@230:18.service: Succeeded.
Apr 07 23:42:24 pve systemd[1]: Stopped LVM event activation on device 230:18.
Apr 07 23:42:25 pve qmeventd[589228]: Starting cleanup for 100
Apr 07 23:42:25 pve qmeventd[589228]: Finished cleanup for 100
Apr 07 23:44:09 pve postfix/qmgr[2793]: D9F871B497: from=<root@pve.locandarossa.com>, size=9098, nrcpt=1 (queue active)
Apr 07 23:44:15 pve postfix/smtp[593262]: D9F871B497: to=<allarmi@fanscomputer.it>, relay=none, delay=164418, delays=164412/0.09/6.1/0, dsn=4.4.3, status=deferred (Host or domain name not found. Name service error for name=fanscomputer.it type=MX: Host not found, try again)


Can you help me to interpret the log?
 
The lines "zfs invoked oom-killer" and "Out of memory: Killed process 869273 (kvm)" give me the impression that the ZFS ARC (filesystem cache) plus the memory assigned to the VMs no longer fit in the host's RAM, and that the kvm process of VM 100 (with a total-vm of about 13.2 GiB) was chosen by the OOM killer to free memory. The host also has no swap (Total swap = 0kB), so there was nothing to fall back on once physical memory was exhausted.
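
If that is what happened, a common mitigation on Proxmox hosts is to cap the ARC so it cannot compete with guest memory. A minimal sketch, assuming the roughly 31.5 GiB of host RAM shown in the log and an example limit of 8 GiB (the value has to be adapted to your VM sizing; zfs_arc_max itself is a standard OpenZFS module parameter):

# check the current ARC size and its configured maximum (c_max)
arc_summary | grep -i "arc size"
cat /sys/module/zfs/parameters/zfs_arc_max

# example only: limit the ARC to 8 GiB (8 * 1024^3 = 8589934592 bytes)
# note: this creates/overwrites /etc/modprobe.d/zfs.conf
echo "options zfs zfs_arc_max=8589934592" > /etc/modprobe.d/zfs.conf
update-initramfs -u -k all

# the limit applies after a reboot; it can also be changed at runtime by
# writing the value to /sys/module/zfs/parameters/zfs_arc_max

By default the ARC may grow to about half of the physical RAM, and with two kvm processes of roughly 6.3 and 6.7 GiB resident plus no swap, an allocation spike is enough to trigger the OOM killer as seen above.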
Also note that Postfix is trying to deliver a notification mail to allarmi@fanscomputer.it but keeps deferring it because it cannot resolve an MX record for the domain (status=deferred, "Host or domain name not found").
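
For the mail problem, the deferred messages and the DNS resolution can be checked directly on the host; a short sketch using standard Postfix tools (dig needs the dnsutils package if it is not already installed):

# list queued mails together with the deferral reason
postqueue -p

# check whether the host can resolve an MX record for the recipient domain
dig MX fanscomputer.it

# once DNS/relay is fixed, retry delivery of everything in the queue
postqueue -f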
 
