Error VM Backup

xFede46

Member
Dec 8, 2022
13
0
6
Hi,
i have a problem with the backup of a specific VM (id 102)
Here you can see the log:

Code:
102: 2023-09-07 00:05:58 INFO: Starting Backup of VM 102 (qemu)
102: 2023-09-07 00:05:58 INFO: status = running
102: 2023-09-07 00:05:58 INFO: backup mode: stop
102: 2023-09-07 00:05:58 INFO: ionice priority: 7
102: 2023-09-07 00:05:58 INFO: VM Name: xxxxxxxxxx
102: 2023-09-07 00:05:58 INFO: include disk 'scsi0' 'VM_Pool1:vm-102-disk-1' 100G
102: 2023-09-07 00:05:58 INFO: include disk 'scsi1' 'VM_Pool1:vm-102-disk-3' 500G
102: 2023-09-07 00:05:58 INFO: exclude disk 'scsi2' 'xxxxxxxxx:0.0.1.scsi-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' (backup=no)
102: 2023-09-07 00:05:58 INFO: include disk 'efidisk0' 'VM_Pool1:vm-102-disk-0' 1M
102: 2023-09-07 00:05:58 INFO: include disk 'tpmstate0' 'VM_Pool1:vm-102-disk-2' 4M
102: 2023-09-07 00:05:58 INFO: stopping virtual guest
102: 2023-09-07 00:06:07 INFO: creating Proxmox Backup Server archive 'vm/102/2023-09-06T22:05:58Z'
102: 2023-09-07 00:06:07 INFO: starting kvm to execute backup task
102: 2023-09-07 00:06:09 INFO: attaching TPM drive to QEMU for backup
102: 2023-09-07 00:06:11 INFO: started backup task 'ed1de8bb-fc5c-4f45-bcf7-0eeb47298ca9'
102: 2023-09-07 00:06:11 INFO: resuming VM again after 13 seconds
102: 2023-09-07 00:06:11 INFO: efidisk0: dirty-bitmap status: created new
102: 2023-09-07 00:06:11 INFO: scsi0: dirty-bitmap status: created new
102: 2023-09-07 00:06:11 INFO: scsi1: dirty-bitmap status: created new
102: 2023-09-07 00:06:11 INFO: tpmstate0-backup: dirty-bitmap status: created new
102: 2023-09-07 00:06:16 INFO:   0% (332.5 MiB of 600.0 GiB) in 5s, read: 66.5 MiB/s, write: 53.7 MiB/s
102: 2023-09-07 00:08:25 INFO:   1% (6.3 GiB of 600.0 GiB) in 2m 14s, read: 47.6 MiB/s, write: 29.4 MiB/s
102: 2023-09-07 00:08:31 INFO:   2% (12.9 GiB of 600.0 GiB) in 2m 20s, read: 1.1 GiB/s, write: 31.3 MiB/s
102: 2023-09-07 00:08:41 INFO:   3% (19.8 GiB of 600.0 GiB) in 2m 30s, read: 713.2 MiB/s, write: 44.0 MiB/s
102: 2023-09-07 00:08:44 INFO:   4% (24.6 GiB of 600.0 GiB) in 2m 33s, read: 1.6 GiB/s, write: 30.7 MiB/s
102: 2023-09-07 00:08:47 INFO:   5% (30.8 GiB of 600.0 GiB) in 2m 36s, read: 2.1 GiB/s, write: 21.3 MiB/s
102: 2023-09-07 00:09:05 INFO:   6% (36.5 GiB of 600.0 GiB) in 2m 54s, read: 326.4 MiB/s, write: 45.6 MiB/s
102: 2023-09-07 00:09:12 INFO:   7% (43.0 GiB of 600.0 GiB) in 3m 1s, read: 940.0 MiB/s, write: 8.6 MiB/s
102: 2023-09-07 00:09:18 INFO:   8% (48.3 GiB of 600.0 GiB) in 3m 7s, read: 913.3 MiB/s, write: 4.7 MiB/s
102: 2023-09-07 00:09:31 INFO:   9% (54.2 GiB of 600.0 GiB) in 3m 20s, read: 463.1 MiB/s, write: 28.9 MiB/s
102: 2023-09-07 00:09:45 INFO:  10% (60.4 GiB of 600.0 GiB) in 3m 34s, read: 450.6 MiB/s, write: 23.7 MiB/s
102: 2023-09-07 00:09:54 INFO:  11% (69.4 GiB of 600.0 GiB) in 3m 43s, read: 1.0 GiB/s, write: 28.9 MiB/s
102: 2023-09-07 00:09:57 INFO:  12% (74.4 GiB of 600.0 GiB) in 3m 46s, read: 1.7 GiB/s, write: 38.7 MiB/s
102: 2023-09-07 00:10:07 INFO:  13% (78.4 GiB of 600.0 GiB) in 3m 56s, read: 410.8 MiB/s, write: 46.0 MiB/s
102: 2023-09-07 00:10:35 INFO:  14% (84.2 GiB of 600.0 GiB) in 4m 24s, read: 211.1 MiB/s, write: 19.1 MiB/s
102: 2023-09-07 00:10:55 INFO:  15% (90.5 GiB of 600.0 GiB) in 4m 44s, read: 325.8 MiB/s, write: 35.8 MiB/s
102: 2023-09-07 00:10:58 INFO:  16% (101.6 GiB of 600.0 GiB) in 4m 47s, read: 3.7 GiB/s, write: 0 B/s
102: 2023-09-07 00:11:01 INFO:  18% (112.1 GiB of 600.0 GiB) in 4m 50s, read: 3.5 GiB/s, write: 0 B/s
102: 2023-09-07 00:11:04 INFO:  20% (123.1 GiB of 600.0 GiB) in 4m 53s, read: 3.7 GiB/s, write: 0 B/s
102: 2023-09-07 00:11:22 INFO:  21% (126.0 GiB of 600.0 GiB) in 5m 11s, read: 170.2 MiB/s, write: 58.0 MiB/s
102: 2023-09-07 00:11:51 INFO:  22% (133.1 GiB of 600.0 GiB) in 5m 40s, read: 248.8 MiB/s, write: 44.8 MiB/s
102: 2023-09-07 00:12:05 INFO:  23% (138.2 GiB of 600.0 GiB) in 5m 54s, read: 372.9 MiB/s, write: 38.6 MiB/s
102: 2023-09-07 00:12:23 INFO:  24% (144.4 GiB of 600.0 GiB) in 6m 12s, read: 355.1 MiB/s, write: 74.2 MiB/s
102: 2023-09-07 00:12:43 INFO:  25% (150.2 GiB of 600.0 GiB) in 6m 32s, read: 295.0 MiB/s, write: 78.0 MiB/s
102: 2023-09-07 00:13:04 INFO:  26% (156.3 GiB of 600.0 GiB) in 6m 53s, read: 297.3 MiB/s, write: 77.7 MiB/s
102: 2023-09-07 00:13:23 INFO:  27% (162.0 GiB of 600.0 GiB) in 7m 12s, read: 309.9 MiB/s, write: 76.8 MiB/s
102: 2023-09-07 00:13:44 INFO:  28% (168.2 GiB of 600.0 GiB) in 7m 33s, read: 301.9 MiB/s, write: 82.7 MiB/s
102: 2023-09-07 00:14:08 INFO:  29% (174.5 GiB of 600.0 GiB) in 7m 57s, read: 268.5 MiB/s, write: 76.7 MiB/s
102: 2023-09-07 00:14:21 INFO:  30% (180.5 GiB of 600.0 GiB) in 8m 10s, read: 468.6 MiB/s, write: 125.5 MiB/s
102: 2023-09-07 00:14:40 INFO:  31% (186.3 GiB of 600.0 GiB) in 8m 29s, read: 314.9 MiB/s, write: 77.1 MiB/s
102: 2023-09-07 00:15:00 INFO:  32% (192.3 GiB of 600.0 GiB) in 8m 49s, read: 305.4 MiB/s, write: 78.2 MiB/s
102: 2023-09-07 00:15:13 INFO:  33% (198.1 GiB of 600.0 GiB) in 9m 2s, read: 459.1 MiB/s, write: 129.8 MiB/s
102: 2023-09-07 00:15:31 INFO:  34% (204.3 GiB of 600.0 GiB) in 9m 20s, read: 353.3 MiB/s, write: 113.8 MiB/s
102: 2023-09-07 00:15:45 INFO:  35% (210.7 GiB of 600.0 GiB) in 9m 34s, read: 464.6 MiB/s, write: 78.9 MiB/s
102: 2023-09-07 00:16:00 INFO:  36% (216.0 GiB of 600.0 GiB) in 9m 49s, read: 364.0 MiB/s, write: 66.9 MiB/s
102: 2023-09-07 00:16:25 INFO:  37% (222.4 GiB of 600.0 GiB) in 10m 14s, read: 260.6 MiB/s, write: 33.9 MiB/s
102: 2023-09-07 00:16:45 INFO:  38% (228.2 GiB of 600.0 GiB) in 10m 34s, read: 298.8 MiB/s, write: 42.4 MiB/s
102: 2023-09-07 00:17:18 INFO:  39% (234.0 GiB of 600.0 GiB) in 11m 7s, read: 180.1 MiB/s, write: 45.9 MiB/s
102: 2023-09-07 00:17:47 INFO:  40% (240.5 GiB of 600.0 GiB) in 11m 36s, read: 229.1 MiB/s, write: 43.3 MiB/s
102: 2023-09-07 00:19:18 INFO:  41% (246.0 GiB of 600.0 GiB) in 13m 7s, read: 62.0 MiB/s, write: 27.3 MiB/s
102: 2023-09-07 00:20:00 INFO:  42% (252.1 GiB of 600.0 GiB) in 13m 49s, read: 149.7 MiB/s, write: 69.7 MiB/s
102: 2023-09-07 00:20:48 INFO:  43% (258.0 GiB of 600.0 GiB) in 14m 37s, read: 124.9 MiB/s, write: 66.9 MiB/s
102: 2023-09-07 00:21:26 INFO:  44% (264.9 GiB of 600.0 GiB) in 15m 15s, read: 186.9 MiB/s, write: 54.5 MiB/s
102: 2023-09-07 00:24:04 INFO:  45% (270.0 GiB of 600.0 GiB) in 17m 53s, read: 32.9 MiB/s, write: 26.4 MiB/s
102: 2023-09-07 00:26:58 INFO:  46% (276.1 GiB of 600.0 GiB) in 20m 47s, read: 35.8 MiB/s, write: 19.7 MiB/s
102: 2023-09-07 00:27:35 INFO:  47% (282.1 GiB of 600.0 GiB) in 21m 24s, read: 165.8 MiB/s, write: 46.7 MiB/s
102: 2023-09-07 00:28:10 INFO:  48% (288.1 GiB of 600.0 GiB) in 21m 59s, read: 176.8 MiB/s, write: 69.5 MiB/s
102: 2023-09-07 00:28:21 INFO:  49% (296.6 GiB of 600.0 GiB) in 22m 10s, read: 789.1 MiB/s, write: 65.8 MiB/s
102: 2023-09-07 00:28:24 INFO:  51% (307.6 GiB of 600.0 GiB) in 22m 13s, read: 3.7 GiB/s, write: 0 B/s
102: 2023-09-07 00:28:27 INFO:  53% (318.8 GiB of 600.0 GiB) in 22m 16s, read: 3.7 GiB/s, write: 0 B/s
102: 2023-09-07 00:28:30 INFO:  54% (329.8 GiB of 600.0 GiB) in 22m 19s, read: 3.6 GiB/s, write: 0 B/s
102: 2023-09-07 00:28:33 INFO:  56% (340.6 GiB of 600.0 GiB) in 22m 22s, read: 3.6 GiB/s, write: 0 B/s
102: 2023-09-07 00:28:36 INFO:  57% (347.2 GiB of 600.0 GiB) in 22m 25s, read: 2.2 GiB/s, write: 0 B/s
102: 2023-09-07 00:28:39 INFO:  58% (348.6 GiB of 600.0 GiB) in 22m 28s, read: 477.3 MiB/s, write: 61.3 MiB/s
102: 2023-09-07 00:28:57 INFO:  59% (354.2 GiB of 600.0 GiB) in 22m 46s, read: 317.8 MiB/s, write: 79.1 MiB/s
102: 2023-09-07 00:29:14 INFO:  60% (360.1 GiB of 600.0 GiB) in 23m 3s, read: 352.2 MiB/s, write: 81.6 MiB/s
102: 2023-09-07 00:29:31 INFO:  61% (366.2 GiB of 600.0 GiB) in 23m 20s, read: 367.8 MiB/s, write: 76.7 MiB/s
102: 2023-09-07 00:29:45 INFO:  62% (372.1 GiB of 600.0 GiB) in 23m 34s, read: 436.0 MiB/s, write: 87.1 MiB/s
102: 2023-09-07 00:30:01 INFO:  63% (378.1 GiB of 600.0 GiB) in 23m 50s, read: 384.0 MiB/s, write: 88.8 MiB/s
102: 2023-09-07 00:30:23 INFO:  64% (384.1 GiB of 600.0 GiB) in 24m 12s, read: 279.3 MiB/s, write: 72.5 MiB/s
102: 2023-09-07 00:30:38 INFO:  65% (390.2 GiB of 600.0 GiB) in 24m 27s, read: 414.1 MiB/s, write: 102.7 MiB/s
102: 2023-09-07 00:30:51 INFO:  66% (396.2 GiB of 600.0 GiB) in 24m 40s, read: 474.2 MiB/s, write: 97.5 MiB/s
102: 2023-09-07 00:31:02 INFO:  67% (402.2 GiB of 600.0 GiB) in 24m 51s, read: 555.6 MiB/s, write: 130.2 MiB/s
102: 2023-09-07 00:31:14 INFO:  68% (408.1 GiB of 600.0 GiB) in 25m 3s, read: 507.3 MiB/s, write: 142.3 MiB/s
102: 2023-09-07 00:31:25 INFO:  69% (414.1 GiB of 600.0 GiB) in 25m 14s, read: 554.2 MiB/s, write: 131.6 MiB/s
102: 2023-09-07 00:31:35 INFO:  70% (420.4 GiB of 600.0 GiB) in 25m 24s, read: 646.4 MiB/s, write: 130.4 MiB/s
102: 2023-09-07 00:31:44 INFO:  71% (426.1 GiB of 600.0 GiB) in 25m 33s, read: 652.4 MiB/s, write: 105.8 MiB/s
102: 2023-09-07 00:31:48 INFO:  72% (433.2 GiB of 600.0 GiB) in 25m 37s, read: 1.8 GiB/s, write: 17.0 MiB/s
102: 2023-09-07 00:31:51 INFO:  74% (444.2 GiB of 600.0 GiB) in 25m 40s, read: 3.7 GiB/s, write: 0 B/s
102: 2023-09-07 00:31:54 INFO:  75% (455.1 GiB of 600.0 GiB) in 25m 43s, read: 3.7 GiB/s, write: 1.3 MiB/s
102: 2023-09-07 00:31:57 INFO:  76% (457.1 GiB of 600.0 GiB) in 25m 46s, read: 658.7 MiB/s, write: 217.3 MiB/s
102: 2023-09-07 00:32:04 INFO:  77% (463.3 GiB of 600.0 GiB) in 25m 53s, read: 908.6 MiB/s, write: 69.7 MiB/s
102: 2023-09-07 00:32:07 INFO:  79% (474.0 GiB of 600.0 GiB) in 25m 56s, read: 3.6 GiB/s, write: 0 B/s
102: 2023-09-07 00:32:10 INFO:  80% (484.9 GiB of 600.0 GiB) in 25m 59s, read: 3.6 GiB/s, write: 0 B/s
102: 2023-09-07 00:32:13 INFO:  82% (495.8 GiB of 600.0 GiB) in 26m 2s, read: 3.6 GiB/s, write: 0 B/s
102: 2023-09-07 00:32:16 INFO:  83% (501.7 GiB of 600.0 GiB) in 26m 5s, read: 1.9 GiB/s, write: 62.7 MiB/s
102: 2023-09-07 00:32:35 INFO:  84% (504.1 GiB of 600.0 GiB) in 26m 24s, read: 131.6 MiB/s, write: 68.0 MiB/s
102: 2023-09-07 00:33:24 ERROR: VM 102 not running
102: 2023-09-07 00:33:24 INFO: aborting backup job
102: 2023-09-07 00:33:24 ERROR: VM 102 not running
102: 2023-09-07 00:33:24 INFO: resuming VM again
102: 2023-09-07 00:33:24 INFO: restarting vm
102: 2023-09-07 00:33:35 ERROR: Backup of VM 102 failed - VM 102 not running

Can you help me to understand the cause of the error?

This is the syslog of the host of this VM:

Code:
Sep 07 00:32:06 whitestar kernel: x86/split lock detection: #AC: CPU 0/KVM/691841 took a split_lock trap at address: 0x778da844
Sep 07 00:32:08 whitestar kernel: sd 10:0:0:2: [sdb] Unit Not Ready
Sep 07 00:32:08 whitestar kernel: sd 10:0:0:2: [sdb] Sense Key : Illegal Request [current]
Sep 07 00:32:08 whitestar kernel: sd 10:0:0:2: [sdb] Add. Sense: Logical unit not supported
Sep 07 00:32:08 whitestar kernel: sd 10:0:0:2: [sdb] Read Capacity(16) failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Sep 07 00:32:08 whitestar kernel: sd 10:0:0:2: [sdb] Sense Key : Illegal Request [current]
Sep 07 00:32:08 whitestar kernel: sd 10:0:0:2: [sdb] Add. Sense: Logical unit not supported
Sep 07 00:32:08 whitestar kernel: sd 10:0:0:2: [sdb] Read Capacity(10) failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Sep 07 00:32:08 whitestar kernel: sd 10:0:0:2: [sdb] Sense Key : Illegal Request [current]
Sep 07 00:32:08 whitestar kernel: sd 10:0:0:2: [sdb] Add. Sense: Logical unit not supported
Sep 07 00:32:16 whitestar kernel: split_lock_warn: 45 callbacks suppressed
Sep 07 00:32:16 whitestar kernel: x86/split lock detection: #AC: CPU 1/KVM/691842 took a split_lock trap at address: 0x778da886

Inside the syslog of this host i find a lot of split lock detection but every VM works correctly, should i be worried about these warnings?

This is the pveversion:
Code:
proxmox-ve: 7.4-1 (running kernel: 5.15.107-2-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-5.15: 7.4-3
pve-kernel-5.15.107-2-pve: 5.15.107-2
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx4
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-3
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-1
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.6
libpve-storage-perl: 7.4-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.2-1
proxmox-backup-file-restore: 2.4.2-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.7.0
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-2
pve-firewall: 4.3-2
pve-firmware: 3.6-5
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.11-pve1

This is the qm.conf:
Code:
agent: 1
bios: ovmf
boot: order=scsi0
cores: 6
cpu: host
efidisk0: VM_Pool1:vm-102-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
machine: pc-q35-7.2
memory: 65536
meta: creation-qemu=7.2.0,ctime=1687274110
name: xxxxxxxxxxxxxxxxxxx
net0: virtio=1E:29:19:94:17:06,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: win11
scsi0: VM_Pool1:vm-102-disk-1,cache=writeback,discard=on,iothread=1,size=100G,ssd=1
scsi1: VM_Pool1:vm-102-disk-3,cache=writeback,discard=on,iothread=1,size=500G,ssd=1
scsi2: xxxxxxxxxxxxxxxxxxx:0.0.1.scsi-xxxxxxxxxxxxxxxxxxx,backup=0,cache=writeback,discard=on,iothread=1,size=1T
scsihw: virtio-scsi-single
smbios1: uuid=ac98e8ce-fcfe-492e-a601-6af7a6c67e2d
sockets: 1
tpmstate0: VM_Pool1:vm-102-disk-2,size=4M,version=v2.0
vga: virtio
vmgenid: eb852b6c-55a7-4e2e-96d6-0af5107ff783

Thanks in advice
 
Last edited:
Hi,
the split lock messages should only be warnings (such instructions are slow, see: https://lwn.net/Articles/810317/) but not cause the crash of the VM (timing-wise they happen a bit earlier). Is there anything in the journal/syslog a bit later, i.e. between what you posted and the time of 2023-09-07 00:33:24 ERROR: VM 102 not running? Does the crash happen every time for this VM?
 
Hi,

thanks for the support, this is the log you requested.



Code:
00:33:17: split_lock_warn: 31 callbacks suppressed

00:33:17: x86/split lock detection: #AC: CPU 3/KVM/691844 took a split_lock trap at address: 0x778da844

00:33:18: sd 10:0:0:2: [sdb] Unit Not Ready

00:33:18: sd 10:0:0:2: [sdb] Sense Key : Illegal Request [current]

00:33:18: sd 10:0:0:2: [sdb] Add. Sense: Logical unit not supported

00:33:18: sd 10:0:0:2: [sdb] Read Capacity(16) failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK

00:33:18: sd 10:0:0:2: [sdb] Sense Key : Illegal Request [current]

00:33:18: sd 10:0:0:2: [sdb] Add. Sense: Logical unit not supported

00:33:18: sd 10:0:0:2: [sdb] Read Capacity(10) failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK

00:33:18: sd 10:0:0:2: [sdb] Sense Key : Illegal Request [current]

00:33:18: sd 10:0:0:2: [sdb] Add. Sense: Logical unit not supported

00:33:20: x86/split lock detection: #AC: CPU 3/KVM/691844 took a split_lock trap at address: 0x778da844

00:33:23: split_lock_warn: 9 callbacks suppressed

00:33:23: x86/split lock detection: #AC: CPU 0/KVM/691841 took a split_lock trap at address: 0x778da886

00:33:23: kthreadd invoked oom-killer: gfp_mask=0x40cc0(GFP_KERNEL|__GFP_COMP), order=1, oom_score_adj=0

00:33:23: CPU: 27 PID: 2 Comm: kthreadd Tainted: P           O      5.15.107-2-pve #1

00:33:23: Hardware name: Supermicro Super Server/X12SPL-F, BIOS 1.2 05/03/2022

00:33:23: Call Trace:

00:33:23:  <TASK>

00:33:23:  dump_stack_lvl+0x4a/0x63

00:33:23:  dump_stack+0x10/0x16

00:33:23:  dump_header+0x53/0x225

00:33:23:  oom_kill_process.cold+0xb/0x10

00:33:23:  out_of_memory+0x1dc/0x530

00:33:23:  __alloc_pages_slowpath.constprop.0+0xd68/0xe80

00:33:23:  __alloc_pages+0x30c/0x320

00:33:23:  alloc_pages+0x9e/0x1e0

00:33:23:  allocate_slab+0x34d/0x470

00:33:23:  ___slab_alloc+0x635/0x850

00:33:23:  ? copy_process+0x1fd/0x1cc0

00:33:23:  ? update_load_avg+0x82/0x640

00:33:23:  kmem_cache_alloc_node+0x2b0/0x310

00:33:23:  ? copy_process+0x1fd/0x1cc0

00:33:23:  copy_process+0x1fd/0x1cc0

00:33:23:  kernel_clone+0x9d/0x3f0

00:33:23:  ? __schedule+0x356/0x1740

00:33:23:  ? sysvec_apic_timer_interrupt+0x4e/0x90

00:33:23:  kernel_thread+0x55/0x70

00:33:23:  ? set_kthread_struct+0x50/0x50

00:33:23:  kthreadd+0x284/0x2e0

00:33:24:  ? kthread_is_per_cpu+0x30/0x30

00:33:24:  ret_from_fork+0x1f/0x30

00:33:24:  </TASK>

00:33:24: Mem-Info:

00:33:24: active_anon:11347803 inactive_anon:10932760 isolated_anon:0

 active_file:1653 inactive_file:47778 isolated_file:0

 unevictable:38071 dirty:95 writeback:96

 slab_reclaimable:413428 slab_unreclaimable:1568332

 mapped:22229 shmem:16602 pagetables:59079 bounce:0

 kernel_misc_reclaimable:0

 free:1404075 free_pcp:1274 free_cma:0

00:33:24: Node 0 active_anon:45391212kB inactive_anon:43731040kB active_file:6612kB inactive_file:191112kB unevictable:152284kB isolated(anon):0kB isolated(file):0kB mapped:88916kB dirty:380kB writeback:384kB shmem:66408kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 14401536kB writeback_tmp:0kB kernel_stack:14064kB pagetables:236316kB all_unreclaimable? no

00:33:24: Node 0 DMA free:11264kB min:4kB low:16kB high:28kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB

00:33:24: lowmem_reserve[]: 0 1518 128339 128339 128339

00:33:24: Node 0 DMA32 free:509728kB min:796kB low:2348kB high:3900kB reserved_highatomic:2048KB active_anon:67036kB inactive_anon:1020240kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:1683456kB managed:1615084kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB

00:33:24: lowmem_reserve[]: 0 0 126820 126820 126820

00:33:24: Node 0 Normal free:5127528kB min:195800kB low:325664kB high:455528kB reserved_highatomic:2048KB active_anon:45324176kB inactive_anon:42710800kB active_file:6612kB inactive_file:191112kB unevictable:152284kB writepending:764kB present:132120576kB managed:129871408kB mlocked:152284kB bounce:0kB free_pcp:6440kB local_pcp:0kB free_cma:0kB

00:33:24: lowmem_reserve[]: 0 0 0 0 0

00:33:24: Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11264kB

00:33:24: Node 0 DMA32: 838*4kB (UM) 189*8kB (UM) 326*16kB (UM) 250*32kB (UM) 296*64kB (UM) 259*128kB (UM) 233*256kB (UM) 216*512kB (UM) 151*1024kB (UM) 20*2048kB (UME) 18*4096kB (MH) = 509728kB

00:33:24: Node 0 Normal: 1272794*4kB (UH) 4843*8kB (UH) 8*16kB (UH) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 5130048kB

00:33:24: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB

00:33:24: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB

00:33:24: 70804 total pagecache pages

00:33:24: 0 pages in swap cache

00:33:24: Swap cache stats: add 0, delete 0, find 0/0

00:33:24: Free swap  = 0kB

00:33:24: Total swap = 0kB

00:33:24: 33455006 pages RAM

00:33:24: 0 pages HighMem/MovableOnly

00:33:24: 579543 pages reserved

00:33:24: 0 pages hwpoisoned

00:33:24: Tasks state (memory values in pages):

00:33:24: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name

00:33:24: [   1191]     0  1191    35519     1115   307200        0          -250 systemd-journal

00:33:24: [   1397]     0  1397     5661      806    73728        0         -1000 systemd-udevd

00:33:24: [   3816]   103  3816     1975      467    53248        0             0 rpcbind

00:33:24: [   3819]   102  3819     2070      542    53248        0          -900 dbus-daemon

00:33:24: [   3826]     0  3826    37752      299    57344        0         -1000 lxcfs

00:33:24: [   3829]     0  3829    69589      407    86016        0             0 pve-lxc-syscall

00:33:24: [   3832]     0  3832     1757      338    53248        0             0 ksmtuned

00:33:24: [   3838]     0  3838    55199      771    73728        0             0 rsyslogd

00:33:24: [   3840]     0  3840     1067      379    49152        0             0 qmeventd

00:33:24: [   3846]     0  3846     2873      715    61440        0             0 smartd

00:33:24: [   3849]     0  3849     3462      775    69632        0             0 systemd-logind

00:33:24: [   3852]     0  3852    59448      760    81920        0             0 zed

00:33:24: [   4125]     0  4125     2081      372    53248        0             0 lxc-monitord

00:33:24: [   4148]     0  4148     2888      132    57344        0             0 iscsid

00:33:24: [   4150]     0  4150     3290     3282    69632        0           -17 iscsid

00:33:24: [   4161]     0  4161     3339     1164    65536        0         -1000 sshd

00:33:24: [   4174]   101  4174     4758      632    61440        0             0 chronyd

00:33:24: [   4176]   101  4176     2710      488    61440        0             0 chronyd

00:33:24: [   4197]     0  4197     1461      424    49152        0             0 agetty

00:33:24: [   4286]     0  4286   200163      765   184320        0             0 rrdcached

00:33:24: [   4379]     0  4379   154589    16637   434176        0             0 pmxcfs

00:33:24: [   4476]     0  4476    10013      619    77824        0             0 master

00:33:24: [   4478]   106  4478    10090      651    77824        0             0 qmgr

00:33:24: [   4483]     0  4483   139382    41045   397312        0             0 corosync

00:33:24: [   4484]     0  4484     1686      577    53248        0             0 cron

00:33:24: [   4519]     0  4519    69591    23071   274432        0             0 pve-firewall

00:33:24: [   4521]     0  4521    69528    23981   319488        0             0 pvestatd

00:33:24: [   4524]     0  4524      591      140    49152        0             0 bpfilter_umh

00:33:24: [   4547]     0  4547    87845    31354   393216        0             0 pvedaemon

00:33:24: [   4732]    33  4732    88181    32841   430080        0             0 pveproxy

00:33:24: [   4739]    33  4739    19233    13566   196608        0             0 spiceproxy

00:33:24: [  51233]     0 51233    83547    26039   344064        0             0 pvescheduler

00:33:24: [ 960186]     0 960186  2688486  2118009 18231296        0             0 kvm

00:33:24: [ 691669]     0 691669  2530063  2117464 18604032        0             0 kvm

00:33:24: [1817032]     0 1817032  3659715  3159033 26677248        0             0 kvm

00:33:24: [2897458]     0 2897458  2693854  2110177 18710528        0             0 kvm

00:33:24: [2070067]   106 2070067    11032     1100    73728        0             0 tlsmgr

00:33:24: [1825557]   106 1825557    10080      724    69632        0             0 pickup

00:33:24: [1864842]     0 1864842    87946    27972   376832        0             0 task UPID:white

00:33:24: [1869447]    33 1869447    91430    34631   425984        0             0 pveproxy worker

00:33:24: [1869448]    33 1869448    90503    33851   421888        0             0 pveproxy worker

00:33:24: [1869450]    33 1869450    19299    12586   176128        0             0 spiceproxy work

00:33:24: [1869453]     0 1869453    20050      356    57344        0             0 pvefw-logger

00:33:24: [1869487]     0 1869487  1891089  1588767 13860864        0             0 kvm

00:33:24: [3308693]     0 3308693     2868      905    53248        0             0 swtpm

00:33:24: [3308698]     0 3308698 17248895 16850501 136638464        0             0 kvm

00:33:24: [3143145]     0 3143145    90011    32935   409600        0             0 pvedaemon worke

00:33:24: [3158146]     0 3158146    90047    32866   413696        0             0 pvedaemon worke

00:33:24: [ 305275]    33 305275    91406    34663   425984        0             0 pveproxy worker

00:33:24: [1525995]     0 1525995    89983    31843   401408        0             0 task UPID:white

00:33:24: [1526534]     0 1526534    82738    27086   380928        0             0 qm

00:33:24: [1531098]     0 1531098    89948    32324   409600        0             0 pvedaemon worke

00:33:24: [1552808]     0 1552808     1341      106    53248        0             0 sleep

00:33:24: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/qemu.slice/102.scope,task=kvm,pid=3308698,uid=0

00:33:24: Out of memory: Killed process 3308698 (kvm) total-vm:68995580kB, anon-rss:67394524kB, file-rss:7476kB, shmem-rss:4kB, UID:0 pgtables:133436kB oom_score_adj:0

00:33:24 whitestar systemd[1]: 102.scope: A process of this unit has been killed by the OOM killer.

00:33:24 whitestar pvedaemon[3158146]: <root@pam> end task UPID:whitestar:001748EB:00E8B4E1:64F8FDFE:vncproxy:102:root@pam: OK

00:33:24 whitestar pvescheduler[1864842]: VM 102 qmp command failed - VM 102 not running

00:33:24 whitestar pvescheduler[1864842]: VM 102 qmp command failed - VM 102 not running

00:33:24 whitestar pvescheduler[1864842]: VM 102 qmp command failed - VM 102 not running

00:33:24 whitestar qm[1624279]: <root@pam> starting task UPID:whitestar:0018C8F0:00E8CA1F:64F8FE34:qmstart:102:root@pam:

00:33:24 whitestar qm[1624304]: start VM 102: UPID:whitestar:0018C8F0:00E8CA1F:64F8FE34:qmstart:102:root@pam:

00:33:24 whitestar systemd[1]: 102.scope: Succeeded.

00:33:24 whitestar systemd[1]: Stopped 102.scope.

00:33:24 whitestar systemd[1]: 102.scope: Consumed 28min 38.429s CPU time.

00:33:26: oom_reaper: reaped process 3308698 (kvm), now anon-rss:0kB, file-rss:52kB, shmem-rss:4kB

00:33:26: x86/split lock detection: #AC: CPU 1/KVM/691842 took a split_lock trap at address: 0x778da844

00:33:26:  sda: sda1 sda2

00:33:27: x86/split lock detection: #AC: CPU 1/KVM/691842 took a split_lock trap at address: 0x778da886

00:33:34:  zd32: p1 p2 p3 p4 p5 p6

00:33:34 whitestar qmeventd[3840]: read: Connection reset by peer

00:33:34: fwbr102i0: port 2(tap102i0) entered disabled state

00:33:34:  zd144: p1 p2 p3 p4

00:33:34: fwbr102i0: port 2(tap102i0) entered disabled state

00:33:34 whitestar systemd[1]: Started 102.scope.

00:33:34 whitestar systemd-udevd[1625653]: Using default interface naming scheme 'v247'.

00:33:34 whitestar systemd-udevd[1625653]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.

00:33:35 whitestar qmeventd[1625640]: Starting cleanup for 102

00:33:35 whitestar qmeventd[1625640]: trying to acquire lock...

00:33:35: device tap102i0 entered promiscuous mode

00:33:35: fwbr102i0: port 1(fwln102i0) entered disabled state

00:33:35: vmbr0: port 8(fwpr102p0) entered disabled state

00:33:35: device fwln102i0 left promiscuous mode

00:33:35: fwbr102i0: port 1(fwln102i0) entered disabled state

00:33:35: device fwpr102p0 left promiscuous mode

00:33:35: vmbr0: port 8(fwpr102p0) entered disabled state

00:33:35 whitestar systemd-udevd[1625653]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.

00:33:35 whitestar systemd-udevd[1625651]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.

00:33:35 whitestar systemd-udevd[1625651]: Using default interface naming scheme 'v247'.

00:33:35: vmbr0: port 8(fwpr102p0) entered blocking state

00:33:35: vmbr0: port 8(fwpr102p0) entered disabled state

00:33:35 whitestar systemd-udevd[1625653]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.

00:33:35: device fwpr102p0 entered promiscuous mode

00:33:35: vmbr0: port 8(fwpr102p0) entered blocking state

00:33:35: vmbr0: port 8(fwpr102p0) entered forwarding state

00:33:35: fwbr102i0: port 1(fwln102i0) entered blocking state

00:33:35: fwbr102i0: port 1(fwln102i0) entered disabled state

00:33:35: device fwln102i0 entered promiscuous mode

00:33:35: fwbr102i0: port 1(fwln102i0) entered blocking state

00:33:35: fwbr102i0: port 1(fwln102i0) entered forwarding state

00:33:35: fwbr102i0: port 2(tap102i0) entered blocking state

00:33:35: fwbr102i0: port 2(tap102i0) entered disabled state

00:33:35: fwbr102i0: port 2(tap102i0) entered blocking state

00:33:35: fwbr102i0: port 2(tap102i0) entered forwarding state

00:33:35 whitestar qmeventd[1625640]:  OK

00:33:35 whitestar qmeventd[1625640]: vm still running

00:33:35 whitestar qm[1624279]: <root@pam> end task UPID:whitestar:0018C8F0:00E8CA1F:64F8FE34:qmstart:102:root@pam: OK

00:33:35 whitestar pvescheduler[1864842]: ERROR: Backup of VM 102 failed - VM 102 not running
 
Code:
00:33:24 whitestar systemd[1]: 102.scope: A process of this unit has been killed by the OOM killer.
OOM means out-of-memory, i.e. the host didn't have enough RAM for all processes. That's why the VM was killed.
 
Hi,
thanks again for the very quick reply, the host has 128gb of ram and 50% is normally free during daily usage.
Here there is a screenshot of the memory usage graph, how can i reduce the ram usage during backup?
1694084817726.png
EDIT: the backup job does the backup to a pbs remote server of 6 VMs, another member of the cluster does the same job and it doesn't run out of memory (same hardware, different VMs).
 
Hi,
thanks again for the very quick reply, the host has 128gb of ram and 50% is normally free during daily usage.
Here there is a screenshot of the memory usage graph, how can i reduce the ram usage during backup?
View attachment 55148
EDIT: the backup job does the backup to a pbs remote server of 6 VMs, another member of the cluster does the same job and it doesn't run out of memory (same hardware, different VMs).
At the time the backup started for this VM, the memory usage was already pretty high. Mechanisms to reduce memory usage are KSM and ballooning: https://pve.proxmox.com/wiki/Dynamic_Memory_Management

If you upgrade to Proxmox VE 8, (or specifically pve-qemu-kvm >= 8.0.2-5) you can also benefit from an improvement to reduce memory consumption after backup (this helps if you backup multiple VMs after each other): https://git.proxmox.com/?p=pve-qemu.git;a=commit;h=5f9cb29c3a7d88c7b8f0e1865b6c1651380cc1c8
 
Hi,
both KSM and balooning are already activated on the host and on the VMs, i'm not going to upgrade to PVE 8 because the environment is a pre-production environment so its services are not mission-critical but are very important for us.
Do you think that splitting the backup job can help?
 
Hi,
both KSM and balooning are already activated on the host and on the VMs, i'm not going to upgrade to PVE 8 because the environment is a pre-production environment so its services are not mission-critical but are very important for us.
Proxmox VE will only receive important bug fixes, meaning the patch I mentioned will not be backported. Eventually, you should upgrade, because Proxmox VE 7's support will end July 2024. When the time comes and you decide to do so, make sure you have tested backups and follow the upgrade guide: https://pve.proxmox.com/wiki/Upgrade_from_7_to_8

Do you think that splitting the backup job can help?
I don't see how that would help, because the backups on the same node are already done sequentially.
 
Ok, i will do some testing with PVE 8 and then i will update.
I would like to thank you very much for the support that you gave me.
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!