Proxmox Backup Server eats up all the RAM and becomes unresponsive

Andrei_W4TL

New Member
Jul 8, 2024
Hey,

1728028193649.png

When performing a sync job to a local NAS, the RAM fills up completely after a few hours (without any swap being used), and the PBS has to be rebooted.

My setup is like this:

- Backup daily to BK-NAS-01
- weekly SYNC to BK-NAS-02
- weekly connect offsite storage and perform sync job from BK-NAS-02 to BK-NAS-ROTATE

Does this make sense? PBS is running inside PVE as VM.

Greetings
Andrei
 
Hi,
there were some memory leaks in combination with CIFS which got fixed with Proxmox kernel 6.8.8-3. Please try upgrading the kernel to a newer version and see if the issue persists.
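To check whether a machine already runs a kernel at or above the version named here, the release string can be compared numerically. This is a minimal sketch, assuming the release string has the form `6.8.8-3-pve` (as printed by `uname -r` on Proxmox kernels); the helper names are made up for illustration.

```python
import platform
import re


def kernel_tuple(release: str) -> tuple[int, ...]:
    """Return the leading numeric fields of a kernel release string.

    Assumption: the string starts with "MAJOR.MINOR.PATCH-ABI",
    e.g. "6.8.8-3-pve". Other formats raise ValueError.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


# Version named in the post above as containing the CIFS leak fixes.
FIXED_IN = kernel_tuple("6.8.8-3")


def has_cifs_leak_fix(release: str) -> bool:
    """True if the release is at or above 6.8.8-3 (numeric comparison)."""
    return kernel_tuple(release) >= FIXED_IN


if __name__ == "__main__":
    running = platform.release()
    try:
        print(running, "fix included:", has_cifs_leak_fix(running))
    except ValueError as err:
        # Non-Proxmox release formats are outside this sketch's assumption.
        print(err)
```

Comparing tuples of integers rather than raw strings matters here, because a string comparison would sort `6.8.12` before `6.8.8`.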
 
Unfortunately that did not help, or only partially. The RAM is not full yet, but the machine is lagging and has a very high load:

root@vm-backup-01:~# w
22:30:03 up 13:42, 2 users, load average: 29.55, 24.67, 21.49
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT


top output:
1728073870634.png

root@vm-backup-01:~# cat /proc/pressure/memory
some avg10=75.82 avg60=86.98 avg300=88.00 total=7939588640
full avg10=75.69 avg60=86.22 avg300=87.38 total=7934958862
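For readers unfamiliar with this file: in the kernel's pressure stall information (PSI), the `some` line is the share of time at least one task was stalled on memory, `full` is the share of time all non-idle tasks were stalled at once, the `avg*` fields are percentages over 10, 60 and 300 seconds, and `total` is cumulative stall time in microseconds. The sketch below parses the output shown above; the 10% alert threshold is an arbitrary assumption for illustration, not a recommended value.

```python
SAMPLE = """\
some avg10=75.82 avg60=86.98 avg300=88.00 total=7939588640
full avg10=75.69 avg60=86.22 avg300=87.38 total=7934958862
"""

# Assumption: an arbitrary threshold chosen only for this illustration.
FULL_AVG60_ALERT_PERCENT = 10.0


def parse_psi(text: str) -> dict[str, dict[str, float]]:
    """Parse /proc/pressure/* content into {"some": {...}, "full": {...}}."""
    result: dict[str, dict[str, float]] = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


psi = parse_psi(SAMPLE)
# "total" is reported in microseconds.
full_stall_seconds = psi["full"]["total"] / 1_000_000
under_pressure = psi["full"]["avg60"] > FULL_AVG60_ALERT_PERCENT

print(f"full avg60: {psi['full']['avg60']}%")
print(f"cumulative full stall: {full_stall_seconds:.0f} s")
print("memory pressure alert:", under_pressure)
```

On a live system the same function could be fed the contents of `/proc/pressure/memory` instead of `SAMPLE`.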

root@vm-backup-01:~# slabtop -o -s c | head -n15
Active / Total Objects (% used) : 2465790 / 2564307 (96.2%)
Active / Total Slabs (% used) : 71416 / 71416 (100.0%)
Active / Total Caches (% used) : 328 / 425 (77.2%)
Active / Total Size (% used) : 1149024.04K / 1173907.69K (97.9%)
Minimum / Average / Maximum Object : 0.01K / 0.46K / 16.25K

OBJS     ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
1334844 1333830  99%    0.57K  47673       28    762768K radix_tree_node
254504   241035  94%    1.09K   8776       29    280832K nfs_inode_cache
60320     57385  95%    0.25K   1885       32     15080K kmalloc-rnd-14-256
303960   246612  81%    0.04K   2980      102     11920K lsm_inode_cache
46368     46368 100%    0.14K   1656       28      6624K kernfs_node_cache
439         436  99%    9.88K    153        3      4896K task_struct
7072       2936  41%    0.50K    221       32      3536K kmalloc-rnd-14-512
101         101 100%   16.25K    101        1      3232K cifs_request
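As a rough worked example, the slab figures above can be put next to the VM's configured memory. The sketch assumes the `memory: 36000` value from the `qm config` output in this post is in MiB and that the `K` suffix in slabtop means KiB; it is an arithmetic illustration, not a diagnosis.

```python
SLABTOP_ROWS = """\
1334844 1333830 99% 0.57K 47673 28 762768K radix_tree_node
254504 241035 94% 1.09K 8776 29 280832K nfs_inode_cache
60320 57385 95% 0.25K 1885 32 15080K kmalloc-rnd-14-256
303960 246612 81% 0.04K 2980 102 11920K lsm_inode_cache
46368 46368 100% 0.14K 1656 28 6624K kernfs_node_cache
439 436 99% 9.88K 153 3 4896K task_struct
7072 2936 41% 0.50K 221 32 3536K kmalloc-rnd-14-512
101 101 100% 16.25K 101 1 3232K cifs_request
"""


def cache_sizes_kib(rows: str) -> dict[str, int]:
    """Map each slab cache name to its CACHE SIZE column in KiB."""
    sizes: dict[str, int] = {}
    for line in rows.strip().splitlines():
        fields = line.split()
        # Columns: OBJS ACTIVE USE OBJ_SIZE SLABS OBJ/SLAB CACHE_SIZE NAME
        cache_size, name = fields[6], fields[7]
        sizes[name] = int(cache_size.rstrip("K"))
    return sizes


sizes = cache_sizes_kib(SLABTOP_ROWS)
listed_kib = sum(sizes.values())

total_slab_kib = 1173907.69      # from the "Active / Total Size" line
vm_memory_kib = 36000 * 1024     # assumption: qm "memory" is in MiB
slab_share = total_slab_kib / vm_memory_kib

print(f"listed caches: {listed_kib} KiB")
print(f"radix_tree_node share of listed: {sizes['radix_tree_node'] / listed_kib:.0%}")
print(f"all slab vs. VM memory: {slab_share:.1%}")
```

Under these assumptions the slab caches add up to only a few percent of the VM's memory, so whatever is occupying the rest would have to be looked for elsewhere.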


on the host:
root@:~# qm config 101
agent: 1
balloon: 0
boot: order=scsi0;ide2;net0
cores: 9
cpu: x86-64-v2-AES
ide2: none,media=cdrom
memory: 36000
meta: creation-qemu=8.1.5,ctime=1719643553
name: VM-BACKUP-01
net0: virtio=BC:24:11:2D:F1:A3,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: l26
scsi0: local-lvm:vm-101-disk-0,iothread=1,size=50G
scsihw: virtio-scsi-single
smbios1: uuid=63aa39c1-eac7-4610-b5cd-4c38df344488
sockets: 2
vmgenid: 7a717444-e697-411b-8d47-cad66d37ca04
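To put the load average from the `w` output in context: Linux counts tasks in uninterruptible sleep (typically waiting on I/O) as well as runnable tasks, so a high load does not necessarily mean busy CPUs. The sketch below relates the 1-minute load to the vCPU count, assuming the guest sees sockets × cores vCPUs; only the relevant config lines are copied.

```python
# Relevant lines copied from the qm config output above.
QM_CONFIG = """\
cores: 9
memory: 36000
sockets: 2
"""

LOAD_AVG_1MIN = 29.55  # first load average value from the `w` output


def parse_qm_config(text: str) -> dict[str, str]:
    """Parse "key: value" lines into a dictionary of strings."""
    config: dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(": ")
        config[key] = value
    return config


config = parse_qm_config(QM_CONFIG)
# Assumption: no separate "vcpus" limit is set, so vCPUs = sockets * cores.
vcpus = int(config["sockets"]) * int(config["cores"])
load_per_vcpu = LOAD_AVG_1MIN / vcpus

print(f"{vcpus} vCPUs, 1-minute load per vCPU: {load_per_vcpu:.2f}")
```

A per-vCPU load above 1 together with the memory stall figures shown earlier would be consistent with tasks queuing on I/O rather than on CPU, though that reading is an interpretation and not something the output proves.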
 
"PBS is running inside PVE as VM."

Perhaps there's a basic storage issue here?
  • Are these set up as an NFS mount in the PBS filesystem, which is then set as a PBS datastore? I've seen testing that shows this is the very worst way to mount PBS storage, by a wide margin.
  • Or are you talking about NFS mounts to the PVE cluster using the Storage feature, with the virtual disk then stored on the NFS? Of course this doesn't work optimally, but testing shows it works OK.
If it's the first option, an NFS mount inside the filesystem to trick PBS into using it, well, that might be a commonly done thing, but it's not a good idea. I could point you to some very long and contentious threads about NFS mounts in PBS.
 

You're right about that, the storage inside the PBS is mounted as nfs/cifs:

root@vm-backup-01:~# cat /etc/fstab
# <file system> <mount point> <type> <options> <dump> <pass>
/dev/pbs/root / ext4 errors=remount-ro 0 1
/dev/pbs/swap none swap sw 0 0
proc /proc proc defaults 0 0
192.168.200.185:/backup /mnt/bk-nas-01 nfs defaults 0 0
192.168.200.186:/mnt/BKP-NAS-02/ /mnt/bk-nas-02 nfs defaults 0 0
//192.168.207.199/BACKUP/ /mnt/bk-nas-rotate cifs username=xxx,password=xxxxxx,vers=1.0,uid=34,gid=34 0 0

The .199 one is the one that freezes the PBS when doing a sync, so I guess the problem lies with CIFS.
The NAS I use for cold storage does not support NFS, so I had to use CIFS.
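The fstab above can be summarised programmatically to see which mount points sit on network filesystems. This is a small sketch; the set of filesystem types treated as "network" is an assumption chosen for this example, and the sample lines are copied from the post with the credentials left masked as posted.

```python
FSTAB = """\
# <file system> <mount point> <type> <options> <dump> <pass>
/dev/pbs/root / ext4 errors=remount-ro 0 1
/dev/pbs/swap none swap sw 0 0
proc /proc proc defaults 0 0
192.168.200.185:/backup /mnt/bk-nas-01 nfs defaults 0 0
192.168.200.186:/mnt/BKP-NAS-02/ /mnt/bk-nas-02 nfs defaults 0 0
//192.168.207.199/BACKUP/ /mnt/bk-nas-rotate cifs username=xxx,password=xxxxxx,vers=1.0,uid=34,gid=34 0 0
"""

# Assumption: the types considered network filesystems in this sketch.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smb3"}


def network_mounts(fstab_text: str) -> list[tuple[str, str, str]]:
    """Return (mount point, type, source) for each network filesystem entry."""
    mounts = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 3:
            continue  # not a complete fstab entry
        source, mount_point, fs_type = fields[:3]
        if fs_type in NETWORK_FS_TYPES:
            mounts.append((mount_point, fs_type, source))
    return mounts


for mount_point, fs_type, source in network_mounts(FSTAB):
    print(f"{mount_point:<20} {fs_type:<5} {source}")
```

All three datastore paths in this setup come out as network mounts, which matches the description in the post: two over NFS and the rotating offsite one over CIFS.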
 
Look, I don't mean to tell you that your build is wrong. Lots of other people are making such systems work. Poorly.
So those folks will tell you that NFS mounts are ok.

@Der Harry, on the other hand, is here to tell you, and prove, that PBS .chunks folders should not be mounted that way.
https://forum.proxmox.com/threads/datastore-performance-tester-for-pbs.148694/
(Their team wrote a testing script that you might want to try against your setup. It's not meant for diagnosis, but could be used that way if you are careful.)

My own experience is entirely with NFS, and you indicate it's your CIFS mount that's the problem. I think Der Harry's research covered that too.
From their Conclusion: "avoid nfs and samba like the plague - whatever tutorial you read just ignore it - samba is the worst you can use"

I note that as recently as 2 days ago, Proxmox staff handed out advice on how to build this exact 'NFS mounted in the OS and then used as PBS Datastore' thing.
 

Even if you say the build is wrong, I'd accept that since I know I may not have chosen the right path to solve this. I'm looking to improve my build so I'm thankful for any feedback.

I just want to set the record straight, since I think I may have used the wrong terminology.

The (offsite) datastore is CIFS. The other two NAS units, which are also datastores, are NFS.

This is my "Storage" of the PBS:

1728196555821.png

and the datastores are the ones mounted above (from fstab).

Now my question is: What is the right Datastore solution actually? Should it be a local Disk-Array on the PBS instance? So no "remote" storage?

/e Thanks to everyone contributing to this thread. I'm thankful for every post; it helps me learn something new and may help others who encounter the same problem.

/e2: I think my biggest issue is that I misunderstood the whole process and made a weird setup. I'm backing up to NAS 1 and then syncing that to the other NAS. This is surely a hell of a bottleneck too.

I'll redo my PBS server: install it on bare metal and attach the disks directly to that machine. That way it backs up directly to the local disks and not over the network to the NAS, and then I'll sync that to the NAS. I just watched this video and it showed me my mistakes: https://www.youtube.com/watch?v=33ubleU4OFc


/e3: Rebuilt the whole setup, now I see how this was meant to be used :D
 