Hi,
I have 3 HA servers, each with 64 GB of RAM. One of the nodes runs a KVM guest that keeps crashing roughly every week; when it happens seems quite random.
Each server has only 1 GB of swap, which is full most of the time. After reading some articles and threads here in the forum, I temporarily disabled swap and changed vm.swappiness from 60 to 0.
How can I increase the existing swap? I cannot tell from fstab where the current swap resides. This is my fstab:
Code:
UUID="dcb41b35-ad9c-4285-81ff-f5db6f1c1477" / ext4 defaults 0 0
UUID="a9d2b55e-4eb5-45cb-8cfb-33d30a6dc1fe" swap swap defaults 0 0
UUID="100cb204-d80f-41fa-ba6a-1f84c77291a8" /var/lib/vz ext4 defaults 0 0
LABEL=EFI_SYSPART /boot/efi vfat defaults 0 0
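The swap entry in fstab is given by UUID, so the underlying device is not visible directly. Below is a minimal sketch of how it might be looked up, assuming a Linux host where udev populates /dev/disk/by-uuid; the function names `swap_entries` and `resolve` are made up for illustration. The swap areas currently in use are also listed in /proc/swaps.

```python
import os


def swap_entries(fstab_text):
    """Return the device field of every fstab line whose type is 'swap'."""
    found = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        # fstab fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[2] == "swap":
            found.append(fields[0])
    return found


def resolve(spec):
    """Map a UUID=... spec to a block device path, or None if unknown.

    Assumption: /dev/disk/by-uuid/<uuid> is a symlink to the device node.
    Specs that are not UUID-based are returned unchanged.
    """
    if spec.upper().startswith("UUID="):
        uuid = spec.split("=", 1)[1].strip('"')
        link = os.path.join("/dev/disk/by-uuid", uuid)
        return os.path.realpath(link) if os.path.exists(link) else None
    return spec


# Example with the swap line from the fstab above
sample = 'UUID="a9d2b55e-4eb5-45cb-8cfb-33d30a6dc1fe" swap swap defaults 0 0'
print(swap_entries(sample))
```

On the node itself, passing the contents of /etc/fstab to `swap_entries` and each result to `resolve` should print the partition that backs the swap, provided the UUID link exists.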
How can I safely prevent my VMs from being killed, as in this log:
Bash:
Mar 14 07:51:24 n03-sxb-pve01 pvestatd[1414]: got timeout
Mar 14 07:51:30 n03-sxb-pve01 pvestatd[1414]: got timeout
Mar 14 07:51:35 n03-sxb-pve01 pvestatd[1414]: got timeout
Mar 14 07:51:55 n03-sxb-pve01 ceph-mon[2564337]: 2022-03-14T07:51:55.399+0100 7f282b163700 -1 mon.n03-sxb-pve01@1(peon) e6 get_health_metrics reporting 1 slow ops, oldest is mgrbeacon mgr.n01-sxb-pve01(4a5d5cdc-fc64-4ac2-8e14-6cae6ade627a,149366128, , 0)
Mar 14 07:52:01 n03-sxb-pve01 systemd[1]: Starting Proxmox VE replication runner...
Mar 14 07:52:05 n03-sxb-pve01 pvestatd[1414]: status update time (46.405 seconds)
Mar 14 07:52:05 n03-sxb-pve01 kernel: libceph: osd0 (1)172.17.1.3:6819 socket closed (con state OPEN)
Mar 14 07:52:05 n03-sxb-pve01 kernel: libceph: osd0 (1)172.17.1.3:6819 socket closed (con state OPEN)
Mar 14 07:52:06 n03-sxb-pve01 ceph-mon[2564337]: 2022-03-14T07:52:06.774+0100 7f282b163700 -1 mon.n03-sxb-pve01@1(peon) e6 get_health_metrics reporting 1 slow ops, oldest is mgrbeacon mgr.n03-sxb-pve01(4a5d5cdc-fc64-4ac2-8e14-6cae6ade627a,147684202, , 0)
Mar 14 07:52:07 n03-sxb-pve01 pvestatd[1414]: got timeout
Mar 14 07:52:08 n03-sxb-pve01 kernel: ms_dispatch invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Mar 14 07:52:08 n03-sxb-pve01 kernel: CPU: 0 PID: 2564379 Comm: ms_dispatch Tainted: P O 5.4.114-1-pve #1
Mar 14 07:52:08 n03-sxb-pve01 kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./E3C246D4U2-2T, BIOS L2.02K 12/18/2019
Mar 14 07:52:08 n03-sxb-pve01 kernel: Call Trace:
Mar 14 07:52:08 n03-sxb-pve01 kernel: dump_stack+0x6d/0x8b
Mar 14 07:52:08 n03-sxb-pve01 kernel: dump_header+0x4f/0x1e1
Mar 14 07:52:08 n03-sxb-pve01 kernel: oom_kill_process.cold.33+0xb/0x10
Mar 14 07:52:08 n03-sxb-pve01 kernel: out_of_memory+0x1ad/0x490
Mar 14 07:52:08 n03-sxb-pve01 kernel: __alloc_pages_slowpath+0xd40/0xe30
Mar 14 07:52:08 n03-sxb-pve01 kernel: ? __switch_to_asm+0x34/0x70
Mar 14 07:52:08 n03-sxb-pve01 kernel: __alloc_pages_nodemask+0x2df/0x330
Mar 14 07:52:08 n03-sxb-pve01 kernel: alloc_pages_current+0x81/0xe0
Mar 14 07:52:08 n03-sxb-pve01 kernel: __page_cache_alloc+0x6a/0xa0
Mar 14 07:52:08 n03-sxb-pve01 kernel: pagecache_get_page+0xbe/0x2e0
Mar 14 07:52:08 n03-sxb-pve01 kernel: filemap_fault+0x783/0xa70
Mar 14 07:52:08 n03-sxb-pve01 kernel: ? unlock_page_memcg+0x12/0x20
Mar 14 07:52:08 n03-sxb-pve01 kernel: ? page_add_file_rmap+0x131/0x190
Mar 14 07:52:08 n03-sxb-pve01 kernel: ? filemap_map_pages+0x28d/0x3b0
Mar 14 07:52:08 n03-sxb-pve01 kernel: ext4_filemap_fault+0x31/0x50
Mar 14 07:52:08 n03-sxb-pve01 kernel: __do_fault+0x3c/0x130
Mar 14 07:52:08 n03-sxb-pve01 kernel: __handle_mm_fault+0xe73/0x1290
Mar 14 07:52:08 n03-sxb-pve01 kernel: handle_mm_fault+0xc9/0x1f0
Mar 14 07:52:08 n03-sxb-pve01 kernel: __do_page_fault+0x233/0x4c0
Mar 14 07:52:08 n03-sxb-pve01 kernel: ? kvm_on_user_return+0x6f/0xa0 [kvm]
Mar 14 07:52:08 n03-sxb-pve01 kernel: do_page_fault+0x2c/0xe0
Mar 14 07:52:08 n03-sxb-pve01 kernel: page_fault+0x34/0x40
Mar 14 07:52:08 n03-sxb-pve01 kernel: RIP: 0033:0x7f283153fb00
Mar 14 07:52:08 n03-sxb-pve01 kernel: Code: Bad RIP value.
Mar 14 07:52:08 n03-sxb-pve01 kernel: RSP: 002b:00007f2828959e38 EFLAGS: 00010246
Mar 14 07:52:08 n03-sxb-pve01 kernel: RAX: 00007f282895a09f RBX: 000055ae657e89c8 RCX: 000055ae677543e8
Mar 14 07:52:08 n03-sxb-pve01 kernel: RDX: 0000000000000005 RSI: 000000000000ffff RDI: 00007f282895a09f
Mar 14 07:52:08 n03-sxb-pve01 kernel: RBP: 00007f282895a160 R08: 0000000000000003 R09: 0000000000000000
Mar 14 07:52:08 n03-sxb-pve01 kernel: R10: 00000000622ee618 R11: 00007ffc603ad080 R12: 00007f282895a310
Mar 14 07:52:08 n03-sxb-pve01 kernel: R13: 00007f282895a2c0 R14: 00007f282895a420 R15: 000000000bb85990
Mar 14 07:52:08 n03-sxb-pve01 kernel: Mem-Info:
Mar 14 07:52:08 n03-sxb-pve01 kernel: active_anon:13728851 inactive_anon:1934415 isolated_anon:0
active_file:305 inactive_file:400 isolated_file:32
unevictable:37141 dirty:0 writeback:0 unstable:0
slab_reclaimable:33928 slab_unreclaimable:315332
mapped:46431 shmem:174016 pagetables:41293 bounce:0
free:197235 free_pcp:2764 free_cma:0
Mar 14 07:52:08 n03-sxb-pve01 kernel: Node 0 active_anon:54915404kB inactive_anon:7737660kB active_file:976kB inactive_file:1160kB unevictable:148564kB isolated(anon):0kB isolated(file):136kB mapped:185592kB dirty:0kB writeback:0kB shmem:696064kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 5681152kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Mar 14 07:52:08 n03-sxb-pve01 kernel: Node 0 DMA free:15872kB min:124kB low:152kB high:180kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15888kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Mar 14 07:52:08 n03-sxb-pve01 kernel: lowmem_reserve[]: 0 1663 63991 63991 63991
Mar 14 07:52:08 n03-sxb-pve01 kernel: Node 0 DMA32 free:262852kB min:13992kB low:17488kB high:20984kB active_anon:1288864kB inactive_anon:177560kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:1818040kB managed:1751088kB mlocked:0kB kernel_stack:16kB pagetables:2040kB bounce:0kB free_pcp:3336kB local_pcp:108kB free_cma:0kB
Mar 14 07:52:08 n03-sxb-pve01 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Mar 14 07:52:08 n03-sxb-pve01 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Mar 14 07:52:08 n03-sxb-pve01 kernel: 202886 total pagecache pages
Mar 14 07:52:08 n03-sxb-pve01 kernel: 25258 pages in swap cache
Mar 14 07:52:08 n03-sxb-pve01 kernel: Swap cache stats: add 1664493, delete 1639212, find 618428448/618802672
Mar 14 07:52:08 n03-sxb-pve01 kernel: Free swap = 0kB
Mar 14 07:52:08 n03-sxb-pve01 kernel: Total swap = 1047548kB
Mar 14 07:52:08 n03-sxb-pve01 kernel: 16709388 pages RAM
Mar 14 07:52:08 n03-sxb-pve01 kernel: 0 pages HighMem/MovableOnly
Mar 14 07:52:08 n03-sxb-pve01 kernel: 309792 pages reserved
Mar 14 07:52:08 n03-sxb-pve01 kernel: 0 pages cma reserved
Mar 14 07:52:08 n03-sxb-pve01 kernel: 0 pages hwpoisoned
Mar 14 07:52:08 n03-sxb-pve01 kernel: Tasks state (memory values in pages):
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 693] 110 693 1752 461 57344 52 0 rpcbind
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 836] 105 836 628 428 40960 1 0 nscd
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 837] 0 837 37717 328 65536 39 0 lxcfs
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 838] 0 838 1722 39 53248 22 0 iscsid
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 839] 0 839 1848 1241 53248 0 -17 iscsid
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 841] 104 841 2217 496 61440 27 -900 dbus-daemon
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 845] 0 845 4858 670 73728 85 0 systemd-logind
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 850] 0 850 56454 451 90112 29 0 rsyslogd
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 855] 0 855 4689 895 81920 864 0 ceph-crash
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 863] 0 863 568 170 45056 19 0 none
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 868] 0 868 535 306 36864 7 -1000 watchdog-mux
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 879] 0 879 68958 13 86016 38 0 pve-lxc-syscall
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 890] 0 890 3128 838 57344 93 0 smartd
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 891] 0 891 1022 344 45056 1 0 qmeventd
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 898] 0 898 1677 183 57344 13 0 ksmtuned
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 903] 0 903 3941 679 69632 83 -1000 sshd
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 907] 0 907 1823 119 53248 61 0 lxc-monitord
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 933] 0 933 640 109 40960 19 0 agetty
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 952] 0 952 146357 406 172032 48 0 rrdcached
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 975] 0 975 191990 16614 458752 0 0 pmxcfs
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 1185] 0 1185 10867 386 81920 179 0 master
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 1187] 109 1187 10884 525 77824 94 0 qmgr
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 1200] 0 1200 2113 443 49152 21 0 cron
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 1401] 0 1401 69064 12270 307200 9297 0 pve-firewall
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 1414] 0 1414 76534 28652 385024 2415 0 pvestatd
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 1560] 0 1560 89252 24507 442368 6369 0 pvedaemon
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 1568] 0 1568 84895 3056 372736 21359 0 pve-ha-crm
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 1570] 33 1570 89001 30887 442368 0 0 pveproxy
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 1576] 33 1576 17632 12495 184320 0 0 spiceproxy
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 1578] 0 1578 84796 12368 385024 11825 0 pve-ha-lrm
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 1845] 0 1845 971105 355138 4104192 73858 0 kvm
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 129258] 109 129258 10925 459 77824 154 0 tlsmgr
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 176722] 0 176722 3098814 2235837 19984384 89952 0 kvm
Mar 14 07:52:08 n03-sxb-pve01 kernel: [2718225] 0 2718225 2362 475 49152 45 0 cron
Mar 14 07:52:08 n03-sxb-pve01 kernel: [2718226] 0 2718226 596 142 45056 5 0 sh
Mar 14 07:52:08 n03-sxb-pve01 kernel: [2718227] 0 2718227 6986 2928 90112 228 0 python3
Mar 14 07:52:08 n03-sxb-pve01 kernel: [1558940] 101 1558940 23270 352 90112 8 0 systemd-timesyn
Mar 14 07:52:08 n03-sxb-pve01 kernel: [1558948] 0 1558948 76940 52504 638976 449 0 systemd-journal
Mar 14 07:52:08 n03-sxb-pve01 kernel: [1559278] 0 1559278 5653 617 61440 27 -1000 systemd-udevd
Mar 14 07:52:08 n03-sxb-pve01 kernel: [1304259] 0 1304259 2621177 2195869 18841600 27943 0 kvm
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 113012] 0 113012 2118960 1053347 10506240 32986 0 kvm
Mar 14 07:52:08 n03-sxb-pve01 kernel: [2564075] 64045 2564075 99017 5575 258048 657 0 ceph-mds
Mar 14 07:52:08 n03-sxb-pve01 kernel: [2564337] 64045 2564337 417262 265104 2797568 593 0 ceph-mon
Mar 14 07:52:08 n03-sxb-pve01 kernel: [2564476] 64045 2564476 126618 34096 548864 4535 0 ceph-mgr
Mar 14 07:52:08 n03-sxb-pve01 kernel: [2564542] 64045 2564542 1390627 1041218 10100736 743 0 ceph-osd
Mar 14 07:52:08 n03-sxb-pve01 kernel: [2564857] 64045 2564857 1208145 871591 8744960 1238 0 ceph-osd
Mar 14 07:52:08 n03-sxb-pve01 kernel: [2565076] 64045 2565076 1167600 852324 8433664 388 0 ceph-osd
Mar 14 07:52:08 n03-sxb-pve01 kernel: [2648460] 0 2648460 140110 41262 401408 0 0 corosync
Mar 14 07:52:08 n03-sxb-pve01 kernel: [2648634] 0 2648634 25163 511 77824 0 0 zed
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 556592] 0 556592 91949 27034 442368 4952 0 pvedaemon worke
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 562234] 0 562234 91939 26771 442368 5043 0 pvedaemon worke
Mar 14 07:52:08 n03-sxb-pve01 kernel: [4178836] 0 4178836 10501700 9535421 77312000 1089 0 kvm
Mar 14 07:52:08 n03-sxb-pve01 kernel: [3344157] 0 3344157 91370 25231 434176 5564 0 pvedaemon worke
Mar 14 07:52:08 n03-sxb-pve01 kernel: [4122985] 33 4122985 19639 12949 192512 0 0 spiceproxy work
Mar 14 07:52:08 n03-sxb-pve01 kernel: [4122987] 0 4122987 21543 146 65536 0 0 pvefw-logger
Mar 14 07:52:08 n03-sxb-pve01 kernel: [4122996] 33 4122996 92045 32330 446464 0 0 pveproxy worker
Mar 14 07:52:08 n03-sxb-pve01 kernel: [4122997] 33 4122997 92046 32020 446464 0 0 pveproxy worker
Mar 14 07:52:08 n03-sxb-pve01 kernel: [4122998] 33 4122998 92045 31924 446464 0 0 pveproxy worker
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 158515] 109 158515 10856 376 86016 0 0 pickup
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 172233] 0 172233 1305 86 49152 0 0 sleep
Mar 14 07:52:08 n03-sxb-pve01 kernel: [ 172355] 0 172355 53789 10760 196608 0 0 pvesr
Mar 14 07:52:08 n03-sxb-pve01 kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/qemu.slice/117.scope,task=kvm,pid=4178836,uid=0
Mar 14 07:52:08 n03-sxb-pve01 kernel: Out of memory: Killed process 4178836 (kvm) total-vm:42006800kB, anon-rss:38141680kB, file-rss:0kB, shmem-rss:4kB, UID:0 pgtables:75500kB oom_score_adj:0
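To see which processes held the most memory when the OOM killer fired, the task table in the log above can be ranked by resident set size. This is a minimal sketch, assuming 4 KiB pages and the column order shown in the table's header line; `ROW`, `PAGE_KIB` and `top_rss` are illustrative names, not part of any Proxmox or kernel tool. As a sanity check, the killed kvm process shows an rss of 9535421 pages, which at 4 KiB per page is 38141684 kB, matching the anon-rss plus shmem-rss reported in the final kill line.

```python
import re

PAGE_KIB = 4  # assumption: 4 KiB pages (the usual x86-64 page size)

# Columns: [pid] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
ROW = re.compile(
    r"\[\s*(?P<pid>\d+)\]\s+\d+\s+\d+\s+\d+\s+(?P<rss>\d+)\s+\d+\s+"
    r"(?P<swapents>\d+)\s+-?\d+\s+(?P<name>.+)$"
)


def top_rss(log_text, n=5):
    """Return the n largest tasks as (name, pid, rss in GiB), biggest first."""
    rows = []
    for line in log_text.splitlines():
        m = ROW.search(line)
        if m:
            gib = int(m["rss"]) * PAGE_KIB / 1024**2  # pages -> KiB -> GiB
            rows.append((m["name"].strip(), int(m["pid"]), round(gib, 2)))
    return sorted(rows, key=lambda r: r[2], reverse=True)[:n]


# Example with two rows copied from the log above
sample = """\
Mar 14 07:52:08 n03-sxb-pve01 kernel: [2564542] 64045 2564542 1390627 1041218 10100736 743 0 ceph-osd
Mar 14 07:52:08 n03-sxb-pve01 kernel: [4178836] 0 4178836 10501700 9535421 77312000 1089 0 kvm
"""
for name, pid, gib in top_rss(sample):
    print(f"{name:10s} pid={pid:<8d} rss={gib} GiB")
```

Feeding the whole journal excerpt to `top_rss` gives a quick view of how the memory is split between the kvm guests and the Ceph daemons on this node.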
Thank you