Hey guys,
Currently, all backup jobs that include containers fail, but only on one of our three nodes:
Code:
ERROR: Backup of VM 100 failed - can't map rbd volume vm-100-disk-0@vzdump: rbd: map failed: (12) Cannot allocate memory
The normal QEMU VMs are not affected, as the screenshot below shows:
The pool on which all VMs and CTs run is a Ceph pool spanning all three Proxmox cluster nodes. The backup target used to be an NFS storage on another server and is now a CephFS on a separate storage cluster; the problem occurred with both target types.
One individual log of the backup process:
Code:
INFO: starting new backup job: vzdump --mode snapshot --exclude 134,109,201,141,127,115,116,112,111,110,103 --compress zstd --storage backup-ceph --all 1 --mailto root@domain.de --mailnotification failure --quiet 1
INFO: skip external VMs: 102, 105, 108, 114, 117, 121, 122, 123, 129, 130, 136, 139, 219
INFO: Starting Backup of VM 100 (lxc)
INFO: Backup started at 2022-01-25 04:30:02
INFO: status = running
INFO: CT Name: calc.domain.de
INFO: including mount point rootfs ('/') in backup
INFO: backup mode: snapshot
INFO: bandwidth limit: 500000 KB/s
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
Creating snap: 10% complete...
Creating snap: 100% complete...done.
In some cases useful info is found in syslog - try "dmesg | tail".
umount: /mnt/vzsnap0/: not mounted.
command 'umount -l -d /mnt/vzsnap0/' failed: exit code 32
ERROR: Backup of VM 100 failed - can't map rbd volume vm-100-disk-0@vzdump: rbd: map failed: (12) Cannot allocate memory
INFO: Failed at 2022-01-25 04:30:06
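Following the hint in the log itself, my next step is to look at the kernel log and at which RBD devices are still mapped on the failing node; stale `vzdump` snapshot mappings or leftover `/mnt/vzsnap0` mounts from earlier failed runs would show up here (a diagnostic sketch of what I plan to check, not output I have yet):

```shell
# Kernel messages around the failed map attempt ("dmesg | tail" as the log suggests)
dmesg | tail -n 20

# RBD devices currently mapped on this node; leftover vzdump snapshot
# mappings from previous failed backups would be listed here
command -v rbd >/dev/null 2>&1 && rbd showmapped || echo "rbd CLI not found"

# Any stale snapshot mount points left behind by vzdump
mount | grep vzsnap || echo "no vzsnap mounts"
```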
Memory usage looks normal and is about the same on all three nodes; each node has 62 GiB of RAM. There are roughly 30 CTs and 6 VMs in the whole cluster.
Code:
root@adelie:~# free -h
               total        used        free      shared  buff/cache   available
Mem:            62Gi        18Gi       417Mi       632Mi        43Gi        43Gi
Swap:           13Gi       1.1Gi        12Gi
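Since `free` shows 43Gi available, plain memory exhaustion seems unlikely; one hypothesis I still want to rule out is kernel memory fragmentation, where userspace sees plenty of "available" memory but the kernel cannot find contiguous pages for the krbd mapping. A quick check I plan to run on the failing node (sketch; the compaction step is commented out since it needs root and touches the live system):

```shell
# /proc/buddyinfo lists free memory blocks per order (column n = free
# blocks of 2^n contiguous pages); zeros in the high orders on the
# failing node would point at fragmentation rather than real pressure
cat /proc/buddyinfo

# If fragmentation shows up, ask the kernel to compact memory and
# retry the backup afterwards (run as root):
# echo 1 > /proc/sys/vm/compact_memory
```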
Does anybody have an idea? I am still doing further analysis, but I am stumped for now...
Other info:
Code:
root@adelie:~# pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.13.19-3-pve)
pve-manager: 7.1-10 (running version: 7.1-10/6ddebafe)