Linux completely hung during Proxmox VM backup to NAS

bignay2000

On Feb 29 at 22:00, a backup of a VM to an NFS share (hosted by a FreeNAS Scale VM on a different Proxmox host) caused the whole computer (BeeLink GTR7Pro, AMD Ryzen 9 7940HS), running the latest Proxmox, to hang.

I was on vacation when this occurred. When I came back home on Mar 03, the computer was powered on but there was no HDMI output. Uptime Kuma monitoring shows all VMs went offline right at 22:00. I had to hold the power button down to power cycle the machine. The machine then booted normally.

It seems that there should at least be an ERROR in the journal log, but nothing is logged between the start of the backup and the next boot.

Linux gtr7pro 6.5.13-1-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.13-1 (2024-02-05T13:50Z) x86_64 GNU/Linux

Journalctl logs:

Code:
Feb 29 21:17:01 gtr7pro CRON[48401]: pam_unix(cron:session): session closed for user root
Feb 29 21:21:17 gtr7pro pmxcfs[1470]: [dcdb] notice: data verification successful
Feb 29 22:00:01 gtr7pro pvescheduler[55063]: <root@pam> starting task UPID:gtr7pro:0000D718:001F0612:65E144B1:vzdump::root@pam:
Feb 29 22:00:01 gtr7pro pvescheduler[55064]: INFO: starting new backup job: vzdump --node gtr7pro --mode snapshot --mailnotification failure --all 1 --notes-template '{{guestname}}' --prune-backups 'keep-last=2' --quiet 1 --storage nas.abc123.net --mailto abc123@abc123.com --compress zstd
Feb 29 22:00:01 gtr7pro pvescheduler[55064]: INFO: Starting Backup of VM 106 (qemu)
Feb 29 22:00:01 gtr7pro nfsrahead[55071]: setting /mnt/pve/nas.abc123.net/dump readahead to 128
Feb 29 22:00:03 gtr7pro kernel: hrtimer: interrupt took 5190 ns
-- Boot e152907bcc1d4de29fd9855eff243e57 --
Mar 03 18:12:38 gtr7pro kernel: Linux version 6.5.13-1-pve (build@proxmox) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC PMX 6.5.13-1 (2024-02-05T13:50Z) ()
Mar 03 18:12:38 gtr7pro kernel: Command line: initrd=\EFI\proxmox\6.5.13-1-pve\initrd.img-6.5.13-1-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs
Mar 03 18:12:38 gtr7pro kernel: KERNEL supported cpus:
Mar 03 18:12:38 gtr7pro kernel:   Intel GenuineIntel
Mar 03 18:12:38 gtr7pro kernel:   AMD AuthenticAMD
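
To make the gap in the excerpt above easier to see, here is a minimal sketch that pulls out the last lines logged before each "-- Boot ... --" marker in a saved journal dump. The function name `last_lines_before_reboot` and the `sample` text are illustrative assumptions, not part of any Proxmox tooling. On the host itself, `journalctl -b -1 -p err` should list error-priority messages from the previous boot, although a hard hang can stop the machine before anything useful reaches the journal.

```python
import re

# Matches the separator journalctl prints between boots, e.g.
# "-- Boot e152907bcc1d4de29fd9855eff243e57 --"
BOOT_MARKER = re.compile(r"^-- Boot [0-9a-f]+ --$")


def last_lines_before_reboot(journal_text, count=3):
    """Return, for each boot marker, up to `count` lines logged just before it."""
    lines = [ln for ln in journal_text.splitlines() if ln.strip()]
    tails = []
    for i, line in enumerate(lines):
        if BOOT_MARKER.match(line.strip()):
            tails.append(lines[max(0, i - count):i])
    return tails


# Shortened copy of the excerpt above (hypothetical input for the sketch).
sample = """\
Feb 29 22:00:01 gtr7pro nfsrahead[55071]: setting /mnt/pve/nas.abc123.net/dump readahead to 128
Feb 29 22:00:03 gtr7pro kernel: hrtimer: interrupt took 5190 ns
-- Boot e152907bcc1d4de29fd9855eff243e57 --
Mar 03 18:12:38 gtr7pro kernel: Linux version 6.5.13-1-pve (build@proxmox)
"""

for tail in last_lines_before_reboot(sample, count=2):
    print("\n".join(tail))
```

For the excerpt in this post, the last entry before the reboot is the `hrtimer` notice at 22:00:03, two seconds after the backup of VM 106 started.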



I confirmed that I am on the latest release:

Code:
root@gtr7pro:~# apt update
Hit:1 http://security.debian.org bookworm-security InRelease
Hit:2 http://ftp.us.debian.org/debian bookworm InRelease
Hit:3 http://ftp.us.debian.org/debian bookworm-updates InRelease
Hit:4 http://download.proxmox.com/debian/pve bookworm InRelease
Hit:5 http://download.proxmox.com/debian/ceph-quincy bookworm InRelease
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
All packages are up to date.
root@gtr7pro:~# apt dist-upgrade
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
The following packages were automatically installed and are no longer required:
  proxmox-kernel-6.5.11-6-pve-signed proxmox-kernel-6.5.11-7-pve-signed
Use 'apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.


Code:
root@gtr7pro:~# df -h
Filesystem                                    Size  Used Avail Use% Mounted on
udev                                           46G     0   46G   0% /dev
tmpfs                                         9.1G  1.8M  9.1G   1% /run
rpool/ROOT/pve-1                              1.2T  503G  666G  44% /
tmpfs                                          46G   63M   45G   1% /dev/shm
tmpfs                                         5.0M     0  5.0M   0% /run/lock
efivarfs                                      128K   28K   96K  23% /sys/firmware/efi/efivars
rpool                                         666G  128K  666G   1% /rpool
rpool/ROOT                                    666G  128K  666G   1% /rpool/ROOT
rpool/data                                    666G  128K  666G   1% /rpool/data
/dev/fuse                                     128M   32K  128M   1% /etc/pve
nas.abc123.net:/mnt/nvme/vmbackups  912G  239G  674G  27% /mnt/pve/nas.abc123.net
tmpfs                                         9.1G     0  9.1G   0% /run/user/1000

Code:
root@gtr7pro:~# pveversion -v
proxmox-ve: 8.1.0 (running kernel: 6.5.13-1-pve)
pve-manager: 8.1.4 (running version: 8.1.4/ec5affc9e41f1d79)
proxmox-kernel-helper: 8.1.0
pve-kernel-6.2: 8.0.5
proxmox-kernel-6.5.13-1-pve-signed: 6.5.13-1
proxmox-kernel-6.5: 6.5.13-1
proxmox-kernel-6.5.11-8-pve-signed: 6.5.11-8
proxmox-kernel-6.5.11-7-pve-signed: 6.5.11-7
proxmox-kernel-6.5.11-6-pve-signed: 6.5.11-6
proxmox-kernel-6.2.16-20-pve: 6.2.16-20
proxmox-kernel-6.2: 6.2.16-20
pve-kernel-6.2.16-3-pve: 6.2.16-3
ceph-fuse: 17.2.7-pve2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.1
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.1.0
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.5
libpve-network-perl: 0.9.5
libpve-rs-perl: 0.8.8
libpve-storage-perl: 8.0.5
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve4
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.1.4-1
proxmox-backup-file-restore: 3.1.4-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.1.3
pve-cluster: 8.0.5
pve-container: 5.0.8
pve-docs: 8.1.3
pve-edk2-firmware: 4.2023.08-4
pve-firewall: 5.0.3
pve-firmware: 3.9-2
pve-ha-manager: 4.0.3
pve-i18n: 3.2.0
pve-qemu-kvm: 8.1.5-3
pve-xtermjs: 5.3.0-3
qemu-server: 8.0.10
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.2-pve2
 
It crashed again at 22:00. This time I had a monitor hooked up and got this picture of the crash.
 

Attachments

  • GTR7Pro_Proxmox_backup_crash.jpg
Proxmox appears not to be handling the "unable to activate storage" error gracefully, and/or it is retrying too quickly.
Maybe adding a 5-second wait before each retry would help?

Code:
Mar 03 22:01:23 gtr7pro pvestatd[1647]: got timeout
Mar 03 22:01:23 gtr7pro pvestatd[1647]: unable to activate storage 'nas.hivetechnologies.net' - directory '/mnt/pve/nas.hivetechnologies.net' does not exist or is unreachable
Mar 03 22:01:33 gtr7pro pvestatd[1647]: got timeout
Mar 03 22:01:33 gtr7pro pvestatd[1647]: unable to activate storage 'nas.hivetechnologies.net' - directory '/mnt/pve/nas.hivetechnologies.net' does not exist or is unreachable
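
The retry-with-a-wait idea suggested above can be sketched roughly as follows. This only illustrates the pattern and is not how pvestatd is implemented. The function `wait_for_storage` and its parameters are hypothetical names chosen for the example. One caveat: a plain `os.path.isdir` check can itself block on a hung NFS mount, so a real check would also need its own timeout.

```python
import os
import time


def wait_for_storage(path, attempts=3, delay_seconds=5.0,
                     is_reachable=os.path.isdir, sleep=time.sleep):
    """Check `path` up to `attempts` times, pausing between failed checks.

    Returns True as soon as the check succeeds, False if every attempt fails.
    `is_reachable` and `sleep` are injectable so the logic can be tried
    without a real mount or real waiting.
    """
    for attempt in range(1, attempts + 1):
        if is_reachable(path):
            return True
        if attempt < attempts:
            # Back off before the next try instead of retrying immediately.
            sleep(delay_seconds)
    return False


# Demo with a fake probe that succeeds on the third try; no real sleeping.
outcomes = iter([False, False, True])
ok = wait_for_storage(
    "/mnt/pve/nas.abc123.net",
    is_reachable=lambda _path: next(outcomes),
    sleep=lambda _seconds: None,
)
print(ok)  # True
```

Giving up after a bounded number of attempts, rather than looping forever, lets the caller log one clear failure and skip the backup.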
 

Attachments

  • 20240303_gtr7pro_crash_backup_VM.log
Update: I have not been able to reproduce this issue on Proxmox Virtual Environment 8.2.2 / Linux gtr7pro 6.8.4-3-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.4-3 (2024-05-02T11:55Z) x86_64 GNU/Linux.
 
