VM gets killed every week

UHL

New Member
Mar 15, 2022
Hi,

I have 3 HA servers, each with 64GB RAM. One of the nodes runs a KVM guest that keeps crashing every week, and the timing seems quite random.

Each server has only 1GB of swap, which is full most of the time. After reading some articles and posts here in the forum, I temporarily disabled swap and changed vm.swappiness from 60 to 0.

How can I increase the existing swap? I cannot tell from my fstab where the current swap resides. This is what my fstab contains:

Code:
UUID="dcb41b35-ad9c-4285-81ff-f5db6f1c1477" / ext4 defaults 0 0
UUID="a9d2b55e-4eb5-45cb-8cfb-33d30a6dc1fe" swap swap defaults 0 0
UUID="100cb204-d80f-41fa-ba6a-1f84c77291a8" /var/lib/vz ext4 defaults 0 0
LABEL=EFI_SYSPART /boot/efi vfat defaults 0 0

How can I safely avoid my VMs being killed, as in:

Bash:
Mar 14 07:51:24 n03-sxb-pve01 pvestatd[1414]: got timeout
Mar 14 07:51:30 n03-sxb-pve01 pvestatd[1414]: got timeout
Mar 14 07:51:35 n03-sxb-pve01 pvestatd[1414]: got timeout
Mar 14 07:51:55 n03-sxb-pve01 ceph-mon[2564337]: 2022-03-14T07:51:55.399+0100 7f282b163700 -1 mon.n03-sxb-pve01@1(peon) e6 get_health_metrics reporting 1 slow ops, oldest is mgrbeacon mgr.n01-sxb-pve01(4a5d5cdc-fc64-4ac2-8e14-6cae6ade627a,149366128, , 0)
Mar 14 07:52:01 n03-sxb-pve01 systemd[1]: Starting Proxmox VE replication runner...
Mar 14 07:52:05 n03-sxb-pve01 pvestatd[1414]: status update time (46.405 seconds)
Mar 14 07:52:05 n03-sxb-pve01 kernel: libceph: osd0 (1)172.17.1.3:6819 socket closed (con state OPEN)
Mar 14 07:52:05 n03-sxb-pve01 kernel: libceph: osd0 (1)172.17.1.3:6819 socket closed (con state OPEN)
Mar 14 07:52:06 n03-sxb-pve01 ceph-mon[2564337]: 2022-03-14T07:52:06.774+0100 7f282b163700 -1 mon.n03-sxb-pve01@1(peon) e6 get_health_metrics reporting 1 slow ops, oldest is mgrbeacon mgr.n03-sxb-pve01(4a5d5cdc-fc64-4ac2-8e14-6cae6ade627a,147684202, , 0)
Mar 14 07:52:07 n03-sxb-pve01 pvestatd[1414]: got timeout
Mar 14 07:52:08 n03-sxb-pve01 kernel: ms_dispatch invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Mar 14 07:52:08 n03-sxb-pve01 kernel: CPU: 0 PID: 2564379 Comm: ms_dispatch Tainted: P           O      5.4.114-1-pve #1
Mar 14 07:52:08 n03-sxb-pve01 kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./E3C246D4U2-2T, BIOS L2.02K 12/18/2019
Mar 14 07:52:08 n03-sxb-pve01 kernel: Call Trace:
Mar 14 07:52:08 n03-sxb-pve01 kernel:  dump_stack+0x6d/0x8b
Mar 14 07:52:08 n03-sxb-pve01 kernel:  dump_header+0x4f/0x1e1
Mar 14 07:52:08 n03-sxb-pve01 kernel:  oom_kill_process.cold.33+0xb/0x10
Mar 14 07:52:08 n03-sxb-pve01 kernel:  out_of_memory+0x1ad/0x490
Mar 14 07:52:08 n03-sxb-pve01 kernel:  __alloc_pages_slowpath+0xd40/0xe30
Mar 14 07:52:08 n03-sxb-pve01 kernel:  ? __switch_to_asm+0x34/0x70
Mar 14 07:52:08 n03-sxb-pve01 kernel:  __alloc_pages_nodemask+0x2df/0x330
Mar 14 07:52:08 n03-sxb-pve01 kernel:  alloc_pages_current+0x81/0xe0
Mar 14 07:52:08 n03-sxb-pve01 kernel:  __page_cache_alloc+0x6a/0xa0
Mar 14 07:52:08 n03-sxb-pve01 kernel:  pagecache_get_page+0xbe/0x2e0
Mar 14 07:52:08 n03-sxb-pve01 kernel:  filemap_fault+0x783/0xa70
Mar 14 07:52:08 n03-sxb-pve01 kernel:  ? unlock_page_memcg+0x12/0x20
Mar 14 07:52:08 n03-sxb-pve01 kernel:  ? page_add_file_rmap+0x131/0x190
Mar 14 07:52:08 n03-sxb-pve01 kernel:  ? filemap_map_pages+0x28d/0x3b0
Mar 14 07:52:08 n03-sxb-pve01 kernel:  ext4_filemap_fault+0x31/0x50
Mar 14 07:52:08 n03-sxb-pve01 kernel:  __do_fault+0x3c/0x130
Mar 14 07:52:08 n03-sxb-pve01 kernel:  __handle_mm_fault+0xe73/0x1290
Mar 14 07:52:08 n03-sxb-pve01 kernel:  handle_mm_fault+0xc9/0x1f0
Mar 14 07:52:08 n03-sxb-pve01 kernel:  __do_page_fault+0x233/0x4c0
Mar 14 07:52:08 n03-sxb-pve01 kernel:  ? kvm_on_user_return+0x6f/0xa0 [kvm]
Mar 14 07:52:08 n03-sxb-pve01 kernel:  do_page_fault+0x2c/0xe0
Mar 14 07:52:08 n03-sxb-pve01 kernel:  page_fault+0x34/0x40
Mar 14 07:52:08 n03-sxb-pve01 kernel: RIP: 0033:0x7f283153fb00
Mar 14 07:52:08 n03-sxb-pve01 kernel: Code: Bad RIP value.
Mar 14 07:52:08 n03-sxb-pve01 kernel: RSP: 002b:00007f2828959e38 EFLAGS: 00010246
Mar 14 07:52:08 n03-sxb-pve01 kernel: RAX: 00007f282895a09f RBX: 000055ae657e89c8 RCX: 000055ae677543e8
Mar 14 07:52:08 n03-sxb-pve01 kernel: RDX: 0000000000000005 RSI: 000000000000ffff RDI: 00007f282895a09f
Mar 14 07:52:08 n03-sxb-pve01 kernel: RBP: 00007f282895a160 R08: 0000000000000003 R09: 0000000000000000
Mar 14 07:52:08 n03-sxb-pve01 kernel: R10: 00000000622ee618 R11: 00007ffc603ad080 R12: 00007f282895a310
Mar 14 07:52:08 n03-sxb-pve01 kernel: R13: 00007f282895a2c0 R14: 00007f282895a420 R15: 000000000bb85990
Mar 14 07:52:08 n03-sxb-pve01 kernel: Mem-Info:
Mar 14 07:52:08 n03-sxb-pve01 kernel: active_anon:13728851 inactive_anon:1934415 isolated_anon:0
 active_file:305 inactive_file:400 isolated_file:32
 unevictable:37141 dirty:0 writeback:0 unstable:0
 slab_reclaimable:33928 slab_unreclaimable:315332
 mapped:46431 shmem:174016 pagetables:41293 bounce:0
 free:197235 free_pcp:2764 free_cma:0
Mar 14 07:52:08 n03-sxb-pve01 kernel: Node 0 active_anon:54915404kB inactive_anon:7737660kB active_file:976kB inactive_file:1160kB unevictable:148564kB isolated(anon):0kB isolated(file):136kB mapped:185592kB dirty:0kB writeback:0kB shmem:696064kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 5681152kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Mar 14 07:52:08 n03-sxb-pve01 kernel: Node 0 DMA free:15872kB min:124kB low:152kB high:180kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15888kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Mar 14 07:52:08 n03-sxb-pve01 kernel: lowmem_reserve[]: 0 1663 63991 63991 63991
Mar 14 07:52:08 n03-sxb-pve01 kernel: Node 0 DMA32 free:262852kB min:13992kB low:17488kB high:20984kB active_anon:1288864kB inactive_anon:177560kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:1818040kB managed:1751088kB mlocked:0kB kernel_stack:16kB pagetables:2040kB bounce:0kB free_pcp:3336kB local_pcp:108kB free_cma:0kB
Mar 14 07:52:08 n03-sxb-pve01 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Mar 14 07:52:08 n03-sxb-pve01 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Mar 14 07:52:08 n03-sxb-pve01 kernel: 202886 total pagecache pages
Mar 14 07:52:08 n03-sxb-pve01 kernel: 25258 pages in swap cache
Mar 14 07:52:08 n03-sxb-pve01 kernel: Swap cache stats: add 1664493, delete 1639212, find 618428448/618802672
Mar 14 07:52:08 n03-sxb-pve01 kernel: Free swap  = 0kB
Mar 14 07:52:08 n03-sxb-pve01 kernel: Total swap = 1047548kB
Mar 14 07:52:08 n03-sxb-pve01 kernel: 16709388 pages RAM
Mar 14 07:52:08 n03-sxb-pve01 kernel: 0 pages HighMem/MovableOnly
Mar 14 07:52:08 n03-sxb-pve01 kernel: 309792 pages reserved
Mar 14 07:52:08 n03-sxb-pve01 kernel: 0 pages cma reserved
Mar 14 07:52:08 n03-sxb-pve01 kernel: 0 pages hwpoisoned
Mar 14 07:52:08 n03-sxb-pve01 kernel: Tasks state (memory values in pages):
Mar 14 07:52:08 n03-sxb-pve01 kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Mar 14 07:52:08 n03-sxb-pve01 kernel: [    693]   110   693     1752      461    57344       52             0 rpcbind
Mar 14 07:52:08 n03-sxb-pve01 kernel: [    836]   105   836      628      428    40960        1             0 nscd
Mar 14 07:52:08 n03-sxb-pve01 kernel: [    837]     0   837    37717      328    65536       39             0 lxcfs
Mar 14 07:52:08 n03-sxb-pve01 kernel: [    838]     0   838     1722       39    53248       22             0 iscsid
Mar 14 07:52:08 n03-sxb-pve01 kernel: [    839]     0   839     1848     1241    53248        0           -17 iscsid
Mar 14 07:52:08 n03-sxb-pve01 kernel: [    841]   104   841     2217      496    61440       27          -900 dbus-daemon
Mar 14 07:52:08 n03-sxb-pve01 kernel: [    845]     0   845     4858      670    73728       85             0 systemd-logind
Mar 14 07:52:08 n03-sxb-pve01 kernel: [    850]     0   850    56454      451    90112       29             0 rsyslogd
Mar 14 07:52:08 n03-sxb-pve01 kernel: [    855]     0   855     4689      895    81920      864             0 ceph-crash
Mar 14 07:52:08 n03-sxb-pve01 kernel: [    863]     0   863      568      170    45056       19             0 none
Mar 14 07:52:08 n03-sxb-pve01 kernel: [    868]     0   868      535      306    36864        7         -1000 watchdog-mux
Mar 14 07:52:08 n03-sxb-pve01 kernel: [    879]     0   879    68958       13    86016       38             0 pve-lxc-syscall
Mar 14 07:52:08 n03-sxb-pve01 kernel: [    890]     0   890     3128      838    57344       93             0 smartd
Mar 14 07:52:08 n03-sxb-pve01 kernel: [    891]     0   891     1022      344    45056        1             0 qmeventd
Mar 14 07:52:08 n03-sxb-pve01 kernel: [    898]     0   898     1677      183    57344       13             0 ksmtuned
Mar 14 07:52:08 n03-sxb-pve01 kernel: [    903]     0   903     3941      679    69632       83         -1000 sshd
Mar 14 07:52:08 n03-sxb-pve01 kernel: [    907]     0   907     1823      119    53248       61             0 lxc-monitord
Mar 14 07:52:08 n03-sxb-pve01 kernel: [    933]     0   933      640      109    40960       19             0 agetty
Mar 14 07:52:08 n03-sxb-pve01 kernel: [    952]     0   952   146357      406   172032       48             0 rrdcached
Mar 14 07:52:08 n03-sxb-pve01 kernel: [    975]     0   975   191990    16614   458752        0             0 pmxcfs
Mar 14 07:52:08 n03-sxb-pve01 kernel: [   1185]     0  1185    10867      386    81920      179             0 master
Mar 14 07:52:08 n03-sxb-pve01 kernel: [   1187]   109  1187    10884      525    77824       94             0 qmgr
Mar 14 07:52:08 n03-sxb-pve01 kernel: [   1200]     0  1200     2113      443    49152       21             0 cron
Mar 14 07:52:08 n03-sxb-pve01 kernel: [   1401]     0  1401    69064    12270   307200     9297             0 pve-firewall
Mar 14 07:52:08 n03-sxb-pve01 kernel: [   1414]     0  1414    76534    28652   385024     2415             0 pvestatd
Mar 14 07:52:08 n03-sxb-pve01 kernel: [   1560]     0  1560    89252    24507   442368     6369             0 pvedaemon
Mar 14 07:52:08 n03-sxb-pve01 kernel: [   1568]     0  1568    84895     3056   372736    21359             0 pve-ha-crm
Mar 14 07:52:08 n03-sxb-pve01 kernel: [   1570]    33  1570    89001    30887   442368        0             0 pveproxy
Mar 14 07:52:08 n03-sxb-pve01 kernel: [   1576]    33  1576    17632    12495   184320        0             0 spiceproxy
Mar 14 07:52:08 n03-sxb-pve01 kernel: [   1578]     0  1578    84796    12368   385024    11825             0 pve-ha-lrm
Mar 14 07:52:08 n03-sxb-pve01 kernel: [   1845]     0  1845   971105   355138  4104192    73858             0 kvm
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 129258]   109 129258    10925      459    77824      154             0 tlsmgr
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 176722]     0 176722  3098814  2235837 19984384    89952             0 kvm
Mar 14 07:52:08 n03-sxb-pve01 kernel: [2718225]     0 2718225     2362      475    49152       45             0 cron
Mar 14 07:52:08 n03-sxb-pve01 kernel: [2718226]     0 2718226      596      142    45056        5             0 sh
Mar 14 07:52:08 n03-sxb-pve01 kernel: [2718227]     0 2718227     6986     2928    90112      228             0 python3
Mar 14 07:52:08 n03-sxb-pve01 kernel: [1558940]   101 1558940    23270      352    90112        8             0 systemd-timesyn
Mar 14 07:52:08 n03-sxb-pve01 kernel: [1558948]     0 1558948    76940    52504   638976      449             0 systemd-journal
Mar 14 07:52:08 n03-sxb-pve01 kernel: [1559278]     0 1559278     5653      617    61440       27         -1000 systemd-udevd
Mar 14 07:52:08 n03-sxb-pve01 kernel: [1304259]     0 1304259  2621177  2195869 18841600    27943             0 kvm
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 113012]     0 113012  2118960  1053347 10506240    32986             0 kvm
Mar 14 07:52:08 n03-sxb-pve01 kernel: [2564075] 64045 2564075    99017     5575   258048      657             0 ceph-mds
Mar 14 07:52:08 n03-sxb-pve01 kernel: [2564337] 64045 2564337   417262   265104  2797568      593             0 ceph-mon
Mar 14 07:52:08 n03-sxb-pve01 kernel: [2564476] 64045 2564476   126618    34096   548864     4535             0 ceph-mgr
Mar 14 07:52:08 n03-sxb-pve01 kernel: [2564542] 64045 2564542  1390627  1041218 10100736      743             0 ceph-osd
Mar 14 07:52:08 n03-sxb-pve01 kernel: [2564857] 64045 2564857  1208145   871591  8744960     1238             0 ceph-osd
Mar 14 07:52:08 n03-sxb-pve01 kernel: [2565076] 64045 2565076  1167600   852324  8433664      388             0 ceph-osd
Mar 14 07:52:08 n03-sxb-pve01 kernel: [2648460]     0 2648460   140110    41262   401408        0             0 corosync
Mar 14 07:52:08 n03-sxb-pve01 kernel: [2648634]     0 2648634    25163      511    77824        0             0 zed
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 556592]     0 556592    91949    27034   442368     4952             0 pvedaemon worke
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 562234]     0 562234    91939    26771   442368     5043             0 pvedaemon worke
Mar 14 07:52:08 n03-sxb-pve01 kernel: [4178836]     0 4178836 10501700  9535421 77312000     1089             0 kvm
Mar 14 07:52:08 n03-sxb-pve01 kernel: [3344157]     0 3344157    91370    25231   434176     5564             0 pvedaemon worke
Mar 14 07:52:08 n03-sxb-pve01 kernel: [4122985]    33 4122985    19639    12949   192512        0             0 spiceproxy work
Mar 14 07:52:08 n03-sxb-pve01 kernel: [4122987]     0 4122987    21543      146    65536        0             0 pvefw-logger
Mar 14 07:52:08 n03-sxb-pve01 kernel: [4122996]    33 4122996    92045    32330   446464        0             0 pveproxy worker
Mar 14 07:52:08 n03-sxb-pve01 kernel: [4122997]    33 4122997    92046    32020   446464        0             0 pveproxy worker
Mar 14 07:52:08 n03-sxb-pve01 kernel: [4122998]    33 4122998    92045    31924   446464        0             0 pveproxy worker
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 158515]   109 158515    10856      376    86016        0             0 pickup
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 172233]     0 172233     1305       86    49152        0             0 sleep
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 172355]     0 172355    53789    10760   196608        0             0 pvesr
Mar 14 07:52:08 n03-sxb-pve01 kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/qemu.slice/117.scope,task=kvm,pid=4178836,uid=0
Mar 14 07:52:08 n03-sxb-pve01 kernel: Out of memory: Killed process 4178836 (kvm) total-vm:42006800kB, anon-rss:38141680kB, file-rss:0kB, shmem-rss:4kB, UID:0 pgtables:75500kB oom_score_adj:0

Thank you
 
hi,

Code:
kernel: ms_dispatch invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0

looks like the OOM killer is being called because you're running out of memory on your nodes?
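
To see where the memory went, the task table in the OOM report can be summed per process name. The snippet below is only a rough sketch: the function name `oom_rss_by_name` is made up for illustration, and it assumes 4 KiB pages (the usual x86_64 default) and the journal line layout shown in your log. Keep in mind that RSS can count pages shared between processes (for example KSM-merged guest memory) more than once, so treat the totals as an upper bound.

```shell
# Sum the "rss" column (in pages) of an OOM task table per process name.
# Reads journal/syslog lines on stdin, prints "<GiB> GiB  <name>", largest first.
# Assumption: 4096-byte pages; field order is uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name.
oom_rss_by_name() {
    awk '
        /kernel: \[ *[0-9]+\]/ {
            # Drop the timestamp, host and "[pid]" prefix so the fields start at uid.
            sub(/^.*kernel: \[ *[0-9]+\] */, "")
            name = $8
            for (i = 9; i <= NF; i++) name = name " " $i   # names such as "pveproxy worker"
            rss[name] += $4
        }
        END {
            for (n in rss)
                printf "%.1f GiB  %s\n", rss[n] * 4096 / (1024 * 1024 * 1024), n
        }
    ' | sort -rn
}

# Usage (as root on the affected node):
#   journalctl -k --since "2022-03-14 07:52" --until "2022-03-14 07:53" | oom_rss_by_name
```

By my arithmetic on the table you posted, the five kvm processes add up to roughly 58.7 GiB of RSS and the three ceph-osd processes to roughly 10.5 GiB, against about 63.7 GiB of RAM (16709388 pages), so the node does look overcommitted.
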
Quote:
Now in my fstab I got this:
How can I increase existing swap, I do not see a location where current swap resides from fstab

you can check the output from swapon to see the enabled swap devices.
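
For example, `swapon --show` lists the active swap areas, and the same information is in /proc/swaps. The helper below is a small sketch (the name `swap_summary` is just for illustration) that condenses /proc/swaps into one line per swap area. It assumes the usual column order of that file: Filename, Type, Size, Used, Priority, with sizes in KiB.

```shell
# Print one summary line per swap area from a /proc/swaps-style file.
# Argument: path to read (defaults to /proc/swaps). Sizes in that file are in KiB.
swap_summary() {
    awk 'NR > 1 {
        pct = ($3 > 0) ? 100 * $4 / $3 : 0
        printf "%s (%s): %d MiB total, %.0f%% used\n", $1, $2, int($3 / 1024), pct
    }' "${1:-/proc/swaps}"
}

# Usage:
#   swapon --show     # util-linux view of the same data
#   swap_summary      # condensed view of /proc/swaps
```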

you can also check the UUID of your swap device in the fstab (UUID="a9d2b55e-4eb5-45cb-8cfb-33d30a6dc1fe" swap swap defaults 0 0):
Code:
ls -al /dev/disk/by-uuid/ | grep a9d2b55e

to increase the available swap you can create a new swap partition and use the swapon command
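
A minimal sketch of those two steps, assuming you already have an empty partition for it. The device name /dev/sdX3 is a placeholder, and the `run` helper and `DRY_RUN` variable are my own illustration, not Proxmox tooling. By default the function only prints the commands, because `mkswap` destroys whatever is on the target.

```shell
# Print the commands by default; set DRY_RUN=0 (as root) to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Turn a partition into swap and enable it right away.
# Argument: block device, e.g. the hypothetical /dev/sdX3.
add_swap_partition() {
    dev="$1"
    run mkswap "$dev"     # writes a swap signature (wipes existing data) and prints the new UUID
    run swapon "$dev"     # enables the swap area until the next reboot
    run swapon --show     # confirm that it is active
}

# Usage:
#   add_swap_partition /dev/sdX3               # dry run, prints the three commands
#   DRY_RUN=0 add_swap_partition /dev/sdX3     # really do it
```

To keep the new swap after a reboot, add a line for it to /etc/fstab in the same form as your existing swap entry, using the UUID that `mkswap` prints. Note that more swap will probably only delay the OOM kills if the guests and Ceph together need more memory than the node has.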
 
