Hey guys,
Currently, all backup jobs that include containers fail, but only on one of our three nodes:
Code:
ERROR: Backup of VM 100 failed - can't map rbd volume vm-100-disk-0@vzdump: rbd: map failed: (12) Cannot allocate memory
The normal QEMU VMs are not affected, as the screenshot below shows:
The pool on which all VMs and CTs run is a Ceph pool spanning all three Proxmox cluster nodes. The backup target used to be an NFS storage on another server and is now a CephFS on a separate storage cluster; the problem occurred with both target types.
One individual log of the backup process:
Code:
INFO: starting new backup job: vzdump --mode snapshot --exclude 134,109,201,141,127,115,116,112,111,110,103 --compress zstd --storage backup-ceph --all 1 --mailto root@domain.de --mailnotification failure --quiet 1
INFO: skip external VMs: 102, 105, 108, 114, 117, 121, 122, 123, 129, 130, 136, 139, 219
INFO: Starting Backup of VM 100 (lxc)
INFO: Backup started at 2022-01-25 04:30:02
INFO: status = running
INFO: CT Name: calc.domain.de
INFO: including mount point rootfs ('/') in backup
INFO: backup mode: snapshot
INFO: bandwidth limit: 500000 KB/s
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
Creating snap: 10% complete...
Creating snap: 100% complete...done.
In some cases useful info is found in syslog - try "dmesg | tail".
umount: /mnt/vzsnap0/: not mounted.
command 'umount -l -d /mnt/vzsnap0/' failed: exit code 32
ERROR: Backup of VM 100 failed - can't map rbd volume vm-100-disk-0@vzdump: rbd: map failed: (12) Cannot allocate memory
INFO: Failed at 2022-01-25 04:30:06
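Following the hint in the log itself, my next step is to look at the kernel log and at which RBD devices are still mapped on the failing node; stale `vzdump` snapshot mappings or leftover `/mnt/vzsnap0` mounts from earlier failed runs would show up here (a diagnostic sketch of what I plan to check, not output I have yet):

```shell
# Kernel messages around the failed map attempt ("dmesg | tail" as the log suggests)
dmesg | tail -n 20

# RBD devices currently mapped on this node; leftover vzdump snapshot
# mappings from previous failed backups would be listed here
command -v rbd >/dev/null 2>&1 && rbd showmapped || echo "rbd CLI not found"

# Any stale snapshot mount points left behind by vzdump
mount | grep vzsnap || echo "no vzsnap mounts"
```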
Memory usage looks normal and is about the same on all three nodes; each node has 62 GiB of RAM. There are roughly 30 CTs and 6 VMs in the whole cluster.
Code:
root@adelie:~# free -h
               total        used        free      shared  buff/cache   available
Mem:            62Gi        18Gi       417Mi       632Mi        43Gi        43Gi
Swap:           13Gi       1.1Gi        12Gi
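Since `free` shows 43Gi available, plain memory exhaustion seems unlikely; one hypothesis I still want to rule out is kernel memory fragmentation, where userspace sees plenty of "available" memory but the kernel cannot find contiguous pages for the krbd mapping. A quick check I plan to run on the failing node (sketch; the compaction step is commented out since it needs root and touches the live system):

```shell
# /proc/buddyinfo lists free memory blocks per order (column n = free
# blocks of 2^n contiguous pages); zeros in the high orders on the
# failing node would point at fragmentation rather than real pressure
cat /proc/buddyinfo

# If fragmentation shows up, ask the kernel to compact memory and
# retry the backup afterwards (run as root):
# echo 1 > /proc/sys/vm/compact_memory
```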
Does anybody have an idea? I am still doing further analysis, but I am stumped for now...
Other info:
Code:
root@adelie:~# pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.13.19-3-pve)
pve-manager: 7.1-10 (running version: 7.1-10/6ddebafe)