[SOLVED] Backup gets stuck/fails without any error message

KirikParty

New Member
Apr 8, 2024
I am very new to Proxmox and Linux and hoping someone can help me debug an issue I have been having for the last 2 weeks.

I have a Proxmox server with 3 VMs and 1 CT. I have a SATA SSD connected to the server that only holds daily backups, ISOs, CT templates, etc.
I have configured a backup job to run every day at 1 AM that backs up all 3 VMs (100, 101, 105) and the CT (104).

When the backup task reaches the CT, it gets stuck at "create storage snapshot 'vzdump'". It stays stuck there for more than 5-6 hours and nothing happens, and there is no error message of any kind. I have included a few logs below; reading them, I cannot see anything that stands out as an issue.
Below is the log from /var/log/vzdump/lxc-104.log:

Code:
2024-04-08 01:02:08 INFO: Starting Backup of VM 104 (lxc)
2024-04-08 01:02:08 INFO: status = running
2024-04-08 01:02:08 INFO: CT Name: Jellyfin
2024-04-08 01:02:08 INFO: including mount point rootfs ('/') in backup
2024-04-08 01:02:08 INFO: backup mode: snapshot
2024-04-08 01:02:08 INFO: ionice priority: 7
2024-04-08 01:02:08 INFO: create storage snapshot 'vzdump'


The CT in question is a privileged Jellyfin container built from the Debian 12 CT template. Both the backup and the container had been working correctly for the past year, but have been having issues for the last 2 weeks.
Jellyfin itself has some issues (playback issues) that might have started at the same time as the backup issues, but I don't know how they could be connected. There were no config or permission changes to anything, and it was all working correctly for the last year.

The only thing that changes regularly is that I update both the Jellyfin CT and PVE every week using apt.
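For reference, I pulled the recent update history from the standard apt log on the host to see what changed around the time the issues started (default Debian log location, nothing Proxmox-specific):

Code:
# list recent apt runs on the PVE host (what was upgraded and when)
grep -E 'Start-Date|Upgrade:' /var/log/apt/history.log | tail -n 20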

The journalctl output from around 1 AM is below.

Code:
Apr 08 01:00:02 pve pvescheduler[93908]: <root@pam> starting task UPID:pve:00016ED5:0029E22E:6612B4F2:vzdump::root@pam:
Apr 08 01:00:02 pve pvescheduler[93909]: INFO: starting new backup job: vzdump 100 101 105 104 --notes-template '{{guestname}}' --compress zstd --qu>
Apr 08 01:00:04 pve pvescheduler[93909]: INFO: Starting Backup of VM 100 (qemu)
Apr 08 01:00:56 pve pvescheduler[93909]: INFO: Finished Backup of VM 100 (00:00:54)
Apr 08 01:00:56 pve pvescheduler[93909]: INFO: Starting Backup of VM 101 (qemu)
Apr 08 01:02:08 pve pvescheduler[93909]: INFO: Finished Backup of VM 101 (00:01:12)
Apr 08 01:02:08 pve pvescheduler[93909]: INFO: Starting Backup of VM 104 (lxc)
Apr 08 01:17:01 pve CRON[96051]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Apr 08 01:17:01 pve CRON[96052]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Apr 08 01:17:01 pve CRON[96051]: pam_unix(cron:session): session closed for user root
Apr 08 01:18:03 pve systemd[1]: Starting fstrim.service - Discard unused blocks on filesystems from /etc/fstab...
Apr 08 01:18:15 pve fstrim[96168]: /mnt/local-disk: 416.1 GiB (446806654976 bytes) trimmed on /dev/nvme2n1p1
Apr 08 01:18:15 pve fstrim[96168]: /boot/efi: 1021.6 MiB (1071276032 bytes) trimmed on /dev/nvme1n1p2
Apr 08 01:18:15 pve fstrim[96168]: /: 84.7 GiB (90944671744 bytes) trimmed on /dev/pve/root
Apr 08 01:18:15 pve systemd[1]: fstrim.service: Deactivated successfully.
Apr 08 01:18:15 pve systemd[1]: Finished fstrim.service - Discard unused blocks on filesystems from /etc/fstab.


I woke up at 6 AM, saw that it was still stuck, and tried to revive it by killing the backup process etc., but nothing helped. The only thing I can do is connect via SSH and reboot the whole server.

Code:
Apr 08 06:11:53 pve pvedaemon[1783]: <root@pam> successful auth for user 'root@pam'
Apr 08 06:13:22 pve pveproxy[82517]: proxy detected vanished client connection
Apr 08 06:13:43 pve pvedaemon[1784]: <root@pam> starting task UPID:pve:0001FDCB:00469A1F:6612FE77:aptupdate::root@pam:
Apr 08 06:13:45 pve pvedaemon[130507]: update new package list: /var/lib/pve-manager/pkgupdates
Apr 08 06:13:46 pve pvedaemon[1784]: <root@pam> end task UPID:pve:0001FDCB:00469A1F:6612FE77:aptupdate::root@pam: OK
Apr 08 06:17:01 pve CRON[131451]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Apr 08 06:17:01 pve CRON[131452]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Apr 08 06:17:01 pve CRON[131451]: pam_unix(cron:session): session closed for user root
Apr 08 06:25:01 pve CRON[132368]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Apr 08 06:25:01 pve CRON[132369]: (root) CMD (test -x /usr/sbin/anacron || { cd / && run-parts --report /etc/cron.daily; })
Apr 08 06:25:01 pve CRON[132368]: pam_unix(cron:session): session closed for user root
Apr 08 06:44:05 pve pvedaemon[1783]: <root@pam> successful auth for user 'root@pam'
Apr 08 06:44:33 pve sshd[134648]: Accepted password for root from 10.0.0.23 port 37016 ssh2
Apr 08 06:44:33 pve sshd[134648]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Apr 08 06:44:33 pve systemd[1]: Created slice user-0.slice - User Slice of UID 0.
Apr 08 06:44:33 pve systemd[1]: Starting user-runtime-dir@0.service - User Runtime Directory /run/user/0...
Apr 08 06:44:33 pve systemd-logind[1408]: New session 33 of user root.
Apr 08 06:44:33 pve systemd[1]: Finished user-runtime-dir@0.service - User Runtime Directory /run/user/0.
Apr 08 06:44:33 pve systemd[1]: Starting user@0.service - User Manager for UID 0...
Apr 08 06:44:33 pve (systemd)[134651]: pam_unix(systemd-user:session): session opened for user root(uid=0) by (uid=0)
Apr 08 06:44:33 pve systemd[134651]: Queued start job for default target default.target.
Apr 08 06:44:33 pve systemd[134651]: Created slice app.slice - User Application Slice.
Apr 08 06:44:33 pve systemd[134651]: Reached target paths.target - Paths.
Apr 08 06:44:33 pve systemd[134651]: Reached target timers.target - Timers.
Apr 08 06:44:33 pve systemd[134651]: Listening on dirmngr.socket - GnuPG network certificate management daemon.
Apr 08 06:44:33 pve systemd[134651]: Listening on gpg-agent-browser.socket - GnuPG cryptographic agent and passphrase cache (access for web browsers>
Apr 08 06:44:33 pve systemd[134651]: Listening on gpg-agent-extra.socket - GnuPG cryptographic agent and passphrase cache (restricted).
Apr 08 06:44:33 pve systemd[134651]: Listening on gpg-agent-ssh.socket - GnuPG cryptographic agent (ssh-agent emulation).
Apr 08 06:44:33 pve systemd[134651]: Listening on gpg-agent.socket - GnuPG cryptographic agent and passphrase cache.
Apr 08 06:44:33 pve systemd[134651]: Reached target sockets.target - Sockets.
Apr 08 06:44:33 pve systemd[134651]: Reached target basic.target - Basic System.
Apr 08 06:44:33 pve systemd[134651]: Reached target default.target - Main User Target.
Apr 08 06:44:33 pve systemd[134651]: Startup finished in 142ms.
Apr 08 06:44:33 pve systemd[1]: Started user@0.service - User Manager for UID 0.
Apr 08 06:44:33 pve systemd[1]: Started session-33.scope - Session 33 of User root.
Apr 08 06:44:33 pve sshd[134648]: pam_env(sshd:session): deprecated reading of user environment enabled
Apr 08 06:45:23 pve pvedaemon[1783]: <root@pam> successful auth for user 'root@pam'
Apr 08 06:46:04 pve pveproxy[82518]: proxy detected vanished client connection
Apr 08 06:46:29 pve pveproxy[82519]: proxy detected vanished client connection
Apr 08 06:46:34 pve pveproxy[82518]: proxy detected vanished client connection
Apr 08 06:46:34 pve pveproxy[82518]: proxy detected vanished client connection
Apr 08 06:47:04 pve pveproxy[82518]: proxy detected vanished client connection
Apr 08 06:49:00 pve pveproxy[82518]: problem with client ::ffff:10.0.0.23; Connection reset by peer
Apr 08 06:49:03 pve pveproxy[82518]: proxy detected vanished client connection
Apr 08 06:50:01 pve pveproxy[82518]: problem with client ::ffff:10.0.0.23; Connection reset by peer
Apr 08 06:50:29 pve pveproxy[82517]: proxy detected vanished client connection
Apr 08 06:50:29 pve pveproxy[82518]: proxy detected vanished client connection
Apr 08 06:51:25 pve pct[135458]: requesting reboot of CT 104: UPID:pve:00021122:004A0D9C:6613074D:vzreboot:104:root@pam:
Apr 08 06:51:25 pve pct[135457]: <root@pam> starting task UPID:pve:00021122:004A0D9C:6613074D:vzreboot:104:root@pam:
Apr 08 06:52:39 pve pveproxy[82517]: proxy detected vanished client connection
Apr 08 06:53:59 pve pveproxy[82517]: problem with client ::ffff:10.0.0.23; Connection reset by peer
Apr 08 06:54:29 pve pveproxy[82517]: proxy detected vanished client connection
Apr 08 06:54:29 pve pveproxy[82519]: problem with client ::ffff:10.0.0.23; Connection reset by peer
Apr 08 06:54:58 pve pveproxy[82519]: proxy detected vanished client connection
Apr 08 07:03:30 pve pveproxy[82519]: proxy detected vanished client connection
Apr 08 07:04:24 pve sshd[136935]: Accepted password for root from 10.0.0.23 port 39112 ssh2
Apr 08 07:04:24 pve sshd[136935]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Apr 08 07:04:24 pve systemd-logind[1408]: New session 35 of user root.
Apr 08 07:04:24 pve systemd[1]: Started session-35.scope - Session 35 of User root.
Apr 08 07:04:24 pve sshd[136935]: pam_env(sshd:session): deprecated reading of user environment enabled
Apr 08 07:04:26 pve sshd[134648]: pam_unix(sshd:session): session closed for user root
Apr 08 07:04:26 pve systemd[1]: session-33.scope: Deactivated successfully.
Apr 08 07:04:26 pve systemd[1]: session-33.scope: Consumed 2.591s CPU time.
Apr 08 07:04:26 pve systemd-logind[1408]: Session 33 logged out. Waiting for processes to exit.
Apr 08 07:04:26 pve systemd-logind[1408]: Removed session 33.
Apr 08 07:05:49 pve pveproxy[82517]: proxy detected vanished client connection
Apr 08 07:06:13 pve pct[137163]: <root@pam> starting task UPID:pve:000217CC:004B68B7:66130AC5:vzstop:104:root@pam:
Apr 08 07:06:13 pve pct[137164]: stopping CT 104: UPID:pve:000217CC:004B68B7:66130AC5:vzstop:104:root@pam:
Apr 08 07:15:39 pve pveproxy[82517]: proxy detected vanished client connection
Apr 08 07:15:43 pve pveproxy[82518]: proxy detected vanished client connection
Apr 08 07:15:43 pve pveproxy[82518]: proxy detected vanished client connection
Apr 08 07:15:49 pve pveproxy[82518]: proxy detected vanished client connection
Apr 08 07:15:49 pve pveproxy[82519]: proxy detected vanished client connection
Apr 08 07:15:51 pve pveproxy[82517]: proxy detected vanished client connection
Apr 08 07:16:09 pve pveproxy[82517]: proxy detected vanished client connection
Apr 08 07:16:13 pve pveproxy[82518]: proxy detected vanished client connection
Apr 08 07:16:13 pve pveproxy[82519]: proxy detected vanished client connection
Apr 08 07:16:15 pve pveproxy[82519]: proxy detected vanished client connection
Apr 08 07:16:17 pve pveproxy[82517]: proxy detected vanished client connection
Apr 08 07:16:17 pve pveproxy[82518]: proxy detected vanished client connection
Apr 08 07:17:01 pve CRON[138420]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Apr 08 07:17:01 pve CRON[138421]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Apr 08 07:17:01 pve CRON[138420]: pam_unix(cron:session): session closed for user root
Apr 08 07:17:12 pve pveproxy[82518]: proxy detected vanished client connection
Apr 08 07:17:14 pve pveproxy[82518]: proxy detected vanished client connection




Please let me know if there are any other logs I should be looking at. Any help is greatly appreciated.
 
Hi,
please share the container configuration (pct config <VMID>) of the container in question. Do you have a FUSE mountpoint inside the container? Please note that in that case the use of stop mode backups is recommended.
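For example, something along these lines should show whether any FUSE filesystem is mounted inside the container, and what a one-off stop mode test backup could look like (the backup storage name below is a placeholder for your setup):

Code:
# check for FUSE mounts inside the container (CT 104 in this thread)
pct exec 104 -- sh -c 'mount | grep -i fuse'

# one-off stop mode backup of the container for testing; replace <backup-storage>
# with the name of the storage your backups normally go to
vzdump 104 --mode stop --storage <backup-storage> --compress zstd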
 
Hi,
please share the container configuration (pct config <VMID>) of the container in question. Do you have a FUSE mountpoint inside the container? Please note that in that case the use of stop mode backups is recommended.

Please see below. I don't think I am using a FUSE mountpoint.

Code:
root@pve:~# cat /etc/pve/lxc/104.conf
arch: amd64
cores: 2
features: nesting=1
hostname: Jellyfin
memory: 4096
net0: name=eth0,bridge=vmbr0,gw=10.0.0.1,hwaddr=c6:53:77:5a:b7:ce,ip=10.0.0.108/24,type=veth
onboot: 1
ostype: debian
protection: 1
rootfs: local-lvm:vm-104-disk-0,size=16G
startup: order=3
swap: 512
lxc.cgroup2.devices.allow: c 226:0 rwm
lxc.cgroup2.devices.allow: c 226:128 rwm
lxc.mount.entry: /dev/dri/card0 dev/dri/card0 none bind,optional,create=file
lxc.mount.entry: /dev/dri/renderD128 dev/dri/renderD128 none bind,optional,create=file


[vzdump]
#vzdump backup snapshot
arch: amd64
cores: 2
features: nesting=1
hostname: Jellyfin
memory: 4096
net0: name=eth0,bridge=vmbr0,gw=10.0.0.1,hwaddr=c6:53:77:5a:b7:ce,ip=10.0.0.108/24,type=veth
onboot: 1
ostype: debian
protection: 1
rootfs: local-lvm:vm-104-disk-0,size=16G
snapstate: prepare
snaptime: 1712502128
startup: order=3
swap: 512
lxc.cgroup2.devices.allow: c 226:0 rwm
lxc.cgroup2.devices.allow: c 226:128 rwm
lxc.mount.entry: /dev/dri/card0 dev/dri/card0 none bind,optional,create=file
lxc.mount.entry: /dev/dri/renderD128 dev/dri/renderD128 none bind,optional,create=file
 
Then AFAIK you shouldn't be seeing any of these: proxy detected vanished client connection

Did you have a cluster in the past?

What does this show:
Code:
cat /etc/pve/corosync.conf
Strange, I never had a cluster.
Please see the output below:
Code:
root@pve:~# cat /etc/pve/corosync.conf
cat: /etc/pve/corosync.conf: No such file or directory
root@pve:~#
 
Are you experiencing problems with WebUI (GUI) response/refresh? Possibly this would explain those messages?
 
Please see below. I don't think I am using a FUSE mountpoint.

Code:
root@pve:~# cat /etc/pve/lxc/104.conf
arch: amd64
cores: 2
features: nesting=1
hostname: Jellyfin
memory: 4096
net0: name=eth0,bridge=vmbr0,gw=10.0.0.1,hwaddr=c6:53:77:5a:b7:ce,ip=10.0.0.108/24,type=veth
onboot: 1
ostype: debian
protection: 1
rootfs: local-lvm:vm-104-disk-0,size=16G
startup: order=3
swap: 512
lxc.cgroup2.devices.allow: c 226:0 rwm
lxc.cgroup2.devices.allow: c 226:128 rwm
lxc.mount.entry: /dev/dri/card0 dev/dri/card0 none bind,optional,create=file
lxc.mount.entry: /dev/dri/renderD128 dev/dri/renderD128 none bind,optional,create=file


[vzdump]
#vzdump backup snapshot
arch: amd64
cores: 2
features: nesting=1
hostname: Jellyfin
memory: 4096
net0: name=eth0,bridge=vmbr0,gw=10.0.0.1,hwaddr=c6:53:77:5a:b7:ce,ip=10.0.0.108/24,type=veth
onboot: 1
ostype: debian
protection: 1
rootfs: local-lvm:vm-104-disk-0,size=16G
snapstate: prepare
snaptime: 1712502128
startup: order=3
swap: 512
lxc.cgroup2.devices.allow: c 226:0 rwm
lxc.cgroup2.devices.allow: c 226:128 rwm
lxc.mount.entry: /dev/dri/card0 dev/dri/card0 none bind,optional,create=file
lxc.mount.entry: /dev/dri/renderD128 dev/dri/renderD128 none bind,optional,create=file
Please post the output of ps auxwf while the vzdump backup hangs; this might tell us more about what is going on.

Edit: Also have a look at this thread https://forum.proxmox.com/threads/c...t-vzdump-when-doing-a-snapshot-backup.117949/
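Roughly, something like the following should help narrow it down; the LV name on local-lvm follows the usual naming pattern and may differ, and the pct delsnapshot line is only relevant if a stale 'vzdump' snapshot entry (snapstate: prepare) is left in the config afterwards:

Code:
# show the vzdump worker and its child processes in the process tree
ps auxwf | grep -B 2 -A 5 vzdump

# list processes in uninterruptible sleep (D state), which usually point at hung I/O
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /D/'

# check whether a snapshot LV for the container's disk was created on local-lvm
lvs -a | grep vm-104-disk-0

# remove a stale 'vzdump' snapshot entry once the CT is no longer locked
pct delsnapshot 104 vzdump --force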
 
Are you experiencing problems with WebUI (GUI) response/refresh? Possibly this would explain those messages?
Yes, I am. This happens after the CT backup hangs. The UI is very slow and sometimes unresponsive. I have to log in via SSH and reboot the server for it to work properly.
 
If your LXC is stopped & then you start the backup, does the backup complete correctly?
 
If your LXC is stopped & then you start the backup, does the backup complete correctly?
I will have to try that. Even when the LXC is running, clicking stop or shutdown in the UI doesn't work; I usually get a timeout error or an "unable to acquire lock" error. The only way to stop the LXC is to use --skiplock or to kill the process via SSH.
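For reference, the workaround I end up using looks roughly like this (VMID 104 in my case; the PID is whatever lxc-info reports):

Code:
# force-stop the container when a normal shutdown/stop times out
pct stop 104 --skiplock

# if even that hangs, find the container's lxc-start PID and kill it
lxc-info -n 104 -p
kill -9 <PID>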

Last night, I moved the LXC backup to a different time and a separate task, and it completed successfully without any issues.
I will see whether it completes correctly again tonight.

Will report back.
 
Please post the output of ps auxwf while the vzdump backup hangs; this might tell us more about what is going on.

Edit: Also have a look at this thread https://forum.proxmox.com/threads/c...t-vzdump-when-doing-a-snapshot-backup.117949/
I will, and report back. As mentioned in the reply above, I moved the LXC backup to a different time and a separate backup task, and it completed correctly last night.

I am not sure if Jellyfin was doing something at the same time as the backup that could cause the backup to fail? Does that ever happen?
 
Even when the LXC is running, clicking stop or shutdown in the UI doesn't work; I usually get a timeout error or an "unable to acquire lock" error. The only way to stop the LXC is to use --skiplock or to kill the process via SSH.
Well, this is reason enough for why your backup freezes/fails.
You're going to have to first work out how to get this LXC "in order" & stable.
Can you shut down this LXC gracefully - from within the LXC - with a simple CLI shutdown command?
 
Well, this is reason enough for why your backup freezes/fails.
You're going to have to first work out how to get this LXC "in order" & stable.
Can you shut down this LXC gracefully - from within the LXC - with a simple CLI shutdown command?
Thank you.
There was an update to the Proxmox kernel and LXC packages yesterday. After the update and a reboot, everything is working correctly now.
I can gracefully shut down the LXC, and backups complete successfully.
I cannot replicate the issue anymore. I will monitor for a few more days.
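In case anyone else runs into this, these are the commands I used to confirm what is running after the update (output omitted here):

Code:
# running kernel after the reboot
uname -r

# versions of the PVE packages, including the kernel and lxc-pve
pveversion -v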

Appreciate your help.
 
