VM freezes when backing up another VM on the same node to PBS

We are currently facing a critical issue that occurred yesterday, exactly when the daily backup (all VMs in the cluster) was triggered.
As soon as the backup started, one of our production VMs froze (the service was no longer available).
We found that all I/O was blocked: as soon as a service requested I/O, it hung indefinitely.
There was no slow I/O, it simply hung. Only a hard stop and start of the VM recovered it.
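
For illustration only (not from the original incident): a minimal sketch of how the blocked I/O can be confirmed from inside such a guest, assuming a shell is still responsive because the needed binaries are already cached.
Code:
# List tasks stuck in uninterruptible sleep (state D); these correspond to the hung_task messages below
ps -eo pid,stat,wchan:32,comm | awk 'NR==1 || $2 ~ /^D/'

# Probe whether new direct I/O still completes; with fully blocked I/O this times out instead of returning
timeout 10 dd if=/dev/zero of=/tmp/io-probe bs=4k count=1 oflag=direct \
  && echo "I/O ok" || echo "I/O blocked or timed out"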

This resulted in various kernel messages on the affected VM's console:
Code:
CentOS Linux 7 (Core)
Kernel 3.10.0-1160.71.1.el7.x86_64 on an x86_64
localhost login: [128757.960415] INFO: task mariadb:1870 blocked for more than 120 seconds.
[128757.960643] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[128757.961155] INFO: task mariadb:1956 blocked for more than 120 seconds.
[128757.961309] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[128757.961626] INFO: task mariadb:1981 blocked for more than 120 seconds.
[128757.961805] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[128757.962066] INFO: task zabbix_server:2007 blocked for more than 120 seconds.
[128757.962257] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[128757.962543] INFO: task postgres:2908 blocked for more than 120 seconds.
[128757.962708] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[128877.960722] INFO: task mariadb:1870 blocked for more than 120 seconds.
[128877.960893] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[128877.961475] INFO: task mariadb:1956 blocked for more than 120 seconds.
[128877.961697] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[128877.961994] INFO: task mariadb:1981 blocked for more than 120 seconds.
[128877.962173] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[128877.962422] INFO: task zabbix_server:2007 blocked for more than 120 seconds.
[128877.962638] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[128877.962872] INFO: task postgres:2908 blocked for more than 120 seconds.
[128877.963041] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
(There might be some typos, as I typed this by hand.)

We managed to reproduce the freeze on vm-1112 by backing up vm-1102 at 20:08 this evening and found some details:
  • the freeze occurs when a specific VM (ID 1102, see logs below) is backed up to our PBS host
  • VM 1102 is in the stopped state (before and after the backup; it is turned off the whole time)
  • the VM that freezes is on the same cluster node (ID 1112, see logs below)
  • the host node itself seems to freeze for some time, as there are unexpected messages regarding network interfaces in the journal (see below)
Some more info about our cluster:
  • both VMs use our Ceph cluster as storage; the storage pool (called ssd_pool, see logs below) consists of 6 SSD OSDs, distributed across our 3 nodes
  • PBS runs on a separate bare-metal host; the backup datastore sits on a ZFS mirror of two 8 TB SAS SSDs
  • the Ceph/cluster network is separate; there is no VM or backup traffic on this network
  • all servers (PVE nodes, PBS host) are HPE G10 servers; the storage controller is configured in IT mode, so the hosts (i.e. ZFS/Ceph) can access the drives directly
  • there is no high I/O delay during backups
  • qm suspend 1112 && qm resume 1112 does not unfreeze I/O (see the reproduction sketch below)
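
A sketch of how the reproduction looks from the CLI (the vzdump parameters are taken from the journal below; whether the backup is started from the GUI or the shell makes no difference):
Code:
# On Svr01: back up the stopped VM 1102 to PBS (same parameters as the manual test job)
vzdump 1102 --notes-template '{{guestname}} test' --storage pbs-Svr07 --node Svr01 --notification-mode auto --mode snapshot --remove 0

# While the backup runs, vm-1112 on the same node freezes; suspend/resume does not help
qm suspend 1112 && qm resume 1112

# Only a hard stop and start recovers the frozen guest
qm stop 1112 && qm start 1112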
Some more info about the problem:
  • there is no entry for the scheduled backups in the task log for Svr01; Svr02 and Svr03 are shown, but the entry for Svr01 is missing even though the backups are created
  • during the freeze of the VM (yesterday, not when reproducing the problem), the host/node (Svr01) itself also seemed to freeze, as the switch where the node's network cables terminate reported the ports as going down
  • the guest agent is enabled and running on the freezing VM. However, backing up that VM itself in snapshot mode to PBS does not cause a freeze, even though the guest agent issues fs-freeze and fs-thaw commands (see the agent sketch below)
  • another freeze occurred a few weeks ago. As it only affected a non-production VM, we did not investigate further. It involved a different VM on a different node, so the problem seems to affect the whole cluster
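
Since fs-freeze/fs-thaw were an early suspect, here is a minimal sketch of how the agent path can be exercised by hand (standard qm guest commands; this is illustrative and was not part of the recorded test run):
Code:
# Check that the guest agent on the freezing VM responds
qm guest cmd 1112 ping

# Query, freeze and immediately thaw the guest filesystems; this is what vzdump
# triggers via the agent when backing up a running VM in snapshot mode
qm guest cmd 1112 fsfreeze-status
qm guest cmd 1112 fsfreeze-freeze
qm guest cmd 1112 fsfreeze-thaw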

Journal of the PVE node (Svr01) while reproducing the issue:
Code:
Jan 18 20:08:09 Svr01 pvedaemon[2192084]: <mnih@pve> starting task UPID:Svr01:00237358:0138A183:65A97719:vzdump:1102:mnih@pve:
Jan 18 20:08:09 Svr01 pvedaemon[2323288]: INFO: starting new backup job: vzdump 1102 --notes-template '{{guestname}} test' --storage pbs-Svr07 --node Svr01 --notification-mode auto --mode snapshot --remove 0
Jan 18 20:08:09 Svr01 pvedaemon[2323288]: INFO: Starting Backup of VM 1102 (qemu)
Jan 18 20:08:10 Svr01 systemd[1]: Started 1102.scope.
Jan 18 20:08:11 Svr01 kernel: tap1102i0: entered promiscuous mode
Jan 18 20:08:11 Svr01 kernel: vmbr0v680: port 3(fwpr1102p0) entered blocking state
Jan 18 20:08:11 Svr01 kernel: vmbr0v680: port 3(fwpr1102p0) entered disabled state
Jan 18 20:08:11 Svr01 kernel: fwpr1102p0: entered allmulticast mode
Jan 18 20:08:11 Svr01 kernel: fwpr1102p0: entered promiscuous mode
Jan 18 20:08:11 Svr01 kernel: vmbr0v680: port 3(fwpr1102p0) entered blocking state
Jan 18 20:08:11 Svr01 kernel: vmbr0v680: port 3(fwpr1102p0) entered forwarding state
Jan 18 20:08:11 Svr01 kernel: fwbr1102i0: port 1(fwln1102i0) entered blocking state
Jan 18 20:08:11 Svr01 kernel: fwbr1102i0: port 1(fwln1102i0) entered disabled state
Jan 18 20:08:11 Svr01 kernel: fwln1102i0: entered allmulticast mode
Jan 18 20:08:11 Svr01 kernel: fwln1102i0: entered promiscuous mode
Jan 18 20:08:11 Svr01 kernel: fwbr1102i0: port 1(fwln1102i0) entered blocking state
Jan 18 20:08:11 Svr01 kernel: fwbr1102i0: port 1(fwln1102i0) entered forwarding state
Jan 18 20:08:11 Svr01 kernel: fwbr1102i0: port 2(tap1102i0) entered blocking state
Jan 18 20:08:11 Svr01 kernel: fwbr1102i0: port 2(tap1102i0) entered disabled state
Jan 18 20:08:11 Svr01 kernel: tap1102i0: entered allmulticast mode
Jan 18 20:08:11 Svr01 kernel: fwbr1102i0: port 2(tap1102i0) entered blocking state
Jan 18 20:08:11 Svr01 kernel: fwbr1102i0: port 2(tap1102i0) entered forwarding state
Jan 18 20:08:11 Svr01 kernel: tap1102i1: entered promiscuous mode
Jan 18 20:08:11 Svr01 kernel: vmbr0v580: port 3(fwpr1102p1) entered blocking state
Jan 18 20:08:11 Svr01 kernel: vmbr0v580: port 3(fwpr1102p1) entered disabled state
Jan 18 20:08:11 Svr01 kernel: fwpr1102p1: entered allmulticast mode
Jan 18 20:08:11 Svr01 kernel: fwpr1102p1: entered promiscuous mode
Jan 18 20:08:11 Svr01 kernel: vmbr0v580: port 3(fwpr1102p1) entered blocking state
Jan 18 20:08:11 Svr01 kernel: vmbr0v580: port 3(fwpr1102p1) entered forwarding state
Jan 18 20:08:11 Svr01 kernel: fwbr1102i1: port 1(fwln1102i1) entered blocking state
Jan 18 20:08:11 Svr01 kernel: fwbr1102i1: port 1(fwln1102i1) entered disabled state
Jan 18 20:08:11 Svr01 kernel: fwln1102i1: entered allmulticast mode
Jan 18 20:08:11 Svr01 kernel: fwln1102i1: entered promiscuous mode
Jan 18 20:08:11 Svr01 kernel: fwbr1102i1: port 1(fwln1102i1) entered blocking state
Jan 18 20:08:11 Svr01 kernel: fwbr1102i1: port 1(fwln1102i1) entered forwarding state
Jan 18 20:08:11 Svr01 kernel: fwbr1102i1: port 2(tap1102i1) entered blocking state
Jan 18 20:08:11 Svr01 kernel: fwbr1102i1: port 2(tap1102i1) entered disabled state
Jan 18 20:08:11 Svr01 kernel: tap1102i1: entered allmulticast mode
Jan 18 20:08:11 Svr01 kernel: fwbr1102i1: port 2(tap1102i1) entered blocking state
Jan 18 20:08:11 Svr01 kernel: fwbr1102i1: port 2(tap1102i1) entered forwarding state
Jan 18 20:08:12 Svr01 kernel: tap1102i2: entered promiscuous mode
Jan 18 20:08:12 Svr01 kernel: vmbr0v590: port 2(fwpr1102p2) entered blocking state
Jan 18 20:08:12 Svr01 kernel: vmbr0v590: port 2(fwpr1102p2) entered disabled state
Jan 18 20:08:12 Svr01 kernel: fwpr1102p2: entered allmulticast mode
Jan 18 20:08:12 Svr01 kernel: fwpr1102p2: entered promiscuous mode
Jan 18 20:08:12 Svr01 kernel: vmbr0v590: port 2(fwpr1102p2) entered blocking state
Jan 18 20:08:12 Svr01 kernel: vmbr0v590: port 2(fwpr1102p2) entered forwarding state
Jan 18 20:08:12 Svr01 kernel: fwbr1102i2: port 1(fwln1102i2) entered blocking state
Jan 18 20:08:12 Svr01 kernel: fwbr1102i2: port 1(fwln1102i2) entered disabled state
Jan 18 20:08:12 Svr01 kernel: fwln1102i2: entered allmulticast mode
Jan 18 20:08:12 Svr01 kernel: fwln1102i2: entered promiscuous mode
Jan 18 20:08:12 Svr01 kernel: fwbr1102i2: port 1(fwln1102i2) entered blocking state
Jan 18 20:08:12 Svr01 kernel: fwbr1102i2: port 1(fwln1102i2) entered forwarding state
Jan 18 20:08:12 Svr01 kernel: fwbr1102i2: port 2(tap1102i2) entered blocking state
Jan 18 20:08:12 Svr01 kernel: fwbr1102i2: port 2(tap1102i2) entered disabled state
Jan 18 20:08:12 Svr01 kernel: tap1102i2: entered allmulticast mode
Jan 18 20:08:12 Svr01 kernel: fwbr1102i2: port 2(tap1102i2) entered blocking state
Jan 18 20:08:12 Svr01 kernel: fwbr1102i2: port 2(tap1102i2) entered forwarding state
Jan 18 20:08:52 Svr01 pmxcfs[1879]: [status] notice: received log
Jan 18 20:09:02 Svr01 corosync[1911]:   [TOTEM ] Retransmit List: 1445b9
Jan 18 20:09:29 Svr01 pmxcfs[1879]: [status] notice: received log
Jan 18 20:09:30 Svr01 sshd[2324422]: Accepted publickey for root from xxx.xxx.xxx.xxx port 44960 ssh2: RSA SHA256:Dm+KcoGNqZYkml8RCuj37dTWw5FrFRXOi/8YnN8BZ+A
Jan 18 20:09:30 Svr01 sshd[2324422]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Jan 18 20:09:30 Svr01 systemd-logind[1501]: New session 110 of user root.
Jan 18 20:09:30 Svr01 systemd[1]: Started session-110.scope - Session 110 of User root.
Jan 18 20:09:30 Svr01 sshd[2324422]: pam_env(sshd:session): deprecated reading of user environment enabled
Jan 18 20:10:16 Svr01 sshd[2324422]: Received disconnect from xxx.xxx.xxx.xxx port 44960:11: disconnected by user
Jan 18 20:10:16 Svr01 sshd[2324422]: Disconnected from user root xxx.xxx.xxx.xxx port 44960
Jan 18 20:10:16 Svr01 sshd[2324422]: pam_unix(sshd:session): session closed for user root
Jan 18 20:10:16 Svr01 systemd-logind[1501]: Session 110 logged out. Waiting for processes to exit.
Jan 18 20:10:16 Svr01 systemd[1]: session-110.scope: Deactivated successfully.
Jan 18 20:10:16 Svr01 systemd-logind[1501]: Removed session 110.
Jan 18 20:10:16 Svr01 pmxcfs[1879]: [status] notice: received log
Jan 18 20:10:16 Svr01 pmxcfs[1879]: [status] notice: received log
Jan 18 20:10:16 Svr01 sshd[2324989]: Accepted publickey for root from xxx.xxx.xxx.xxx port 46386 ssh2: RSA SHA256:Dm+KcoGNqZYkml8RCuj37dTWw5FrFRXOi/8YnN8BZ+A
Jan 18 20:10:16 Svr01 sshd[2324989]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Jan 18 20:10:16 Svr01 systemd-logind[1501]: New session 111 of user root.
Jan 18 20:10:16 Svr01 systemd[1]: Started session-111.scope - Session 111 of User root.
Jan 18 20:10:16 Svr01 sshd[2324989]: pam_env(sshd:session): deprecated reading of user environment enabled
Jan 18 20:10:33 Svr01 sshd[2324989]: Received disconnect from xxx.xxx.xxx.xxx port 46386:11: disconnected by user
Jan 18 20:10:33 Svr01 sshd[2324989]: Disconnected from user root xxx.xxx.xxx.xxx port 46386
Jan 18 20:10:33 Svr01 sshd[2324989]: pam_unix(sshd:session): session closed for user root
Jan 18 20:10:33 Svr01 systemd[1]: session-111.scope: Deactivated successfully.
Jan 18 20:10:33 Svr01 systemd-logind[1501]: Session 111 logged out. Waiting for processes to exit.
Jan 18 20:10:33 Svr01 systemd-logind[1501]: Removed session 111.
Jan 18 20:10:33 Svr01 pmxcfs[1879]: [status] notice: received log
Jan 18 20:10:59 Svr01 pmxcfs[1879]: [status] notice: received log
Jan 18 20:10:59 Svr01 sshd[2325444]: Accepted publickey for root from xxx.xxx.xxx.xxx port 36074 ssh2: RSA SHA256:Dm+KcoGNqZYkml8RCuj37dTWw5FrFRXOi/8YnN8BZ+A
Jan 18 20:10:59 Svr01 sshd[2325444]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Jan 18 20:10:59 Svr01 systemd-logind[1501]: New session 112 of user root.
Jan 18 20:10:59 Svr01 systemd[1]: Started session-112.scope - Session 112 of User root.
Jan 18 20:10:59 Svr01 sshd[2325444]: pam_env(sshd:session): deprecated reading of user environment enabled
Jan 18 20:11:12 Svr01 kernel: tap1102i0: left allmulticast mode
Jan 18 20:11:12 Svr01 kernel: fwbr1102i0: port 2(tap1102i0) entered disabled state
Jan 18 20:11:12 Svr01 kernel: fwbr1102i0: port 1(fwln1102i0) entered disabled state
Jan 18 20:11:12 Svr01 kernel: vmbr0v680: port 3(fwpr1102p0) entered disabled state
Jan 18 20:11:12 Svr01 kernel: fwln1102i0 (unregistering): left allmulticast mode
Jan 18 20:11:12 Svr01 kernel: fwln1102i0 (unregistering): left promiscuous mode
Jan 18 20:11:12 Svr01 kernel: fwbr1102i0: port 1(fwln1102i0) entered disabled state
Jan 18 20:11:12 Svr01 kernel: fwpr1102p0 (unregistering): left allmulticast mode
Jan 18 20:11:12 Svr01 kernel: fwpr1102p0 (unregistering): left promiscuous mode
Jan 18 20:11:12 Svr01 kernel: vmbr0v680: port 3(fwpr1102p0) entered disabled state
Jan 18 20:11:12 Svr01 kernel: tap1102i1: left allmulticast mode
Jan 18 20:11:12 Svr01 kernel: fwbr1102i1: port 2(tap1102i1) entered disabled state
Jan 18 20:11:12 Svr01 kernel: fwbr1102i1: port 1(fwln1102i1) entered disabled state
Jan 18 20:11:12 Svr01 kernel: vmbr0v580: port 3(fwpr1102p1) entered disabled state
Jan 18 20:11:12 Svr01 kernel: fwln1102i1 (unregistering): left allmulticast mode
Jan 18 20:11:12 Svr01 kernel: fwln1102i1 (unregistering): left promiscuous mode
Jan 18 20:11:12 Svr01 kernel: fwbr1102i1: port 1(fwln1102i1) entered disabled state
Jan 18 20:11:12 Svr01 kernel: fwpr1102p1 (unregistering): left allmulticast mode
Jan 18 20:11:12 Svr01 kernel: fwpr1102p1 (unregistering): left promiscuous mode
Jan 18 20:11:12 Svr01 kernel: vmbr0v580: port 3(fwpr1102p1) entered disabled state
Jan 18 20:11:13 Svr01 kernel: tap1102i2: left allmulticast mode
Jan 18 20:11:13 Svr01 kernel: fwbr1102i2: port 2(tap1102i2) entered disabled state
Jan 18 20:11:13 Svr01 kernel: fwbr1102i2: port 1(fwln1102i2) entered disabled state
Jan 18 20:11:13 Svr01 kernel: vmbr0v590: port 2(fwpr1102p2) entered disabled state
Jan 18 20:11:13 Svr01 kernel: fwln1102i2 (unregistering): left allmulticast mode
Jan 18 20:11:13 Svr01 kernel: fwln1102i2 (unregistering): left promiscuous mode
Jan 18 20:11:13 Svr01 kernel: fwbr1102i2: port 1(fwln1102i2) entered disabled state
Jan 18 20:11:13 Svr01 kernel: fwpr1102p2 (unregistering): left allmulticast mode
Jan 18 20:11:13 Svr01 kernel: fwpr1102p2 (unregistering): left promiscuous mode
Jan 18 20:11:13 Svr01 kernel: vmbr0v590: port 2(fwpr1102p2) entered disabled state
Jan 18 20:11:13 Svr01 qmeventd[1506]: read: Connection reset by peer
Jan 18 20:11:13 Svr01 systemd[1]: 1102.scope: Deactivated successfully.
Jan 18 20:11:13 Svr01 systemd[1]: 1102.scope: Consumed 5min 20.331s CPU time.
Jan 18 20:11:14 Svr01 qmeventd[2325704]: Starting cleanup for 1102
Jan 18 20:11:14 Svr01 qmeventd[2325704]: Finished cleanup for 1102
Jan 18 20:11:14 Svr01 pvedaemon[2323288]: INFO: Finished Backup of VM 1102 (00:03:05)
Jan 18 20:11:14 Svr01 pvedaemon[2323288]: INFO: Backup job finished successfully
Jan 18 20:11:14 Svr01 postfix/pickup[2312866]: 51D234E55E: uid=0 from=<root>
Jan 18 20:11:14 Svr01 pvedaemon[2192084]: <mnih@pve> end task UPID:Svr01:00237358:0138A183:65A97719:vzdump:1102:mnih@pve: OK
Jan 18 20:11:14 Svr01 postfix/cleanup[2325839]: 51D234E55E: message-id=<20240118191114.51D234E55E@Svr01.local>
Jan 18 20:11:14 Svr01 postfix/qmgr[1875]: 51D234E55E: from=<root@Svr01.local>, size=13596, nrcpt=1 (queue active)
Jan 18 20:11:15 Svr01 postfix/smtp[2325841]: 51D234E55E: to=<mnih@somecompany.com>, relay=somecompany-com.mail.protection.outlook.com[aaa.bbb.ccc.ddd]:25, delay=0.74, delays=0.01/0.01/0.14/0.58, dsn=2.6.0, status=sent (250 2.6.0 <20240118191114.51D234E55E@Svr01.local> [InternalId=35377645623599, Hostname=someserver.eurprd07.prod.outlook.com] 21908 bytes in 0.150, 142.055 KB/sec Queued mail for delivery)
 
Additional info

Config of the VM that freezes (vm-1112):
Code:
agent: 1
bootdisk: scsi0
cores: 6
memory: 4096
name: vm-1112
net0: e1000=CA:C1:C3:0A:11:F2,bridge=vmbr0,firewall=1,tag=550
numa: 0
ostype: l26
scsi0: ssd_pool:vm-1112-disk-0,aio=threads,discard=on,iothread=1,size=42G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=0be06eea-e163-4b7c-a56b-225969e8ae4d
sockets: 2
Config of the VM being backed up (vm-1102):
Code:
bootdisk: scsi0
cores: 3
ide2: none,media=cdrom
memory: 8192
name: vm-1102
net0: virtio=6A:68:E9:85:65:52,bridge=vmbr0,firewall=1,tag=680
net1: virtio=46:06:C9:7C:F0:7C,bridge=vmbr0,firewall=1,tag=580
net2: virtio=72:24:77:89:6D:13,bridge=vmbr0,firewall=1,tag=590
numa: 0
ostype: l26
scsi0: ssd_pool:vm-1102-disk-0,discard=on,size=200G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=4729ac55-453a-4fef-866a-702737a81afc
sockets: 2
vmgenid: 880e6312-fda6-492e-a976-14e57fbb4b0c
Backup log:
Code:
INFO: starting new backup job: vzdump 1102 --notes-template '{{guestname}} test' --storage pbs-Svr07 --node Svr01 --notification-mode auto --mode snapshot --remove 0
INFO: Starting Backup of VM 1102 (qemu)
INFO: Backup started at 2024-01-18 20:08:09
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: VM Name: vm-1102
INFO: include disk 'scsi0' 'ssd_pool:vm-1102-disk-0' 200G
INFO: creating Proxmox Backup Server archive 'vm/1102/2024-01-18T19:08:09Z'
INFO: starting kvm to execute backup task
INFO: started backup task 'a56b088d-e24f-4195-84ee-a7da0740aed9'
INFO: scsi0: dirty-bitmap status: created new
INFO:   3% (7.0 GiB of 200.0 GiB) in 3s, read: 2.3 GiB/s, write: 0 B/s
INFO:   4% (9.8 GiB of 200.0 GiB) in 6s, read: 949.3 MiB/s, write: 0 B/s
INFO:   5% (11.0 GiB of 200.0 GiB) in 9s, read: 390.7 MiB/s, write: 0 B/s
INFO:   6% (12.1 GiB of 200.0 GiB) in 12s, read: 386.7 MiB/s, write: 0 B/s
INFO:   7% (14.0 GiB of 200.0 GiB) in 17s, read: 391.2 MiB/s, write: 0 B/s
INFO:   8% (16.3 GiB of 200.0 GiB) in 23s, read: 392.0 MiB/s, write: 0 B/s
INFO:   9% (18.2 GiB of 200.0 GiB) in 28s, read: 389.6 MiB/s, write: 0 B/s
INFO:  10% (20.1 GiB of 200.0 GiB) in 33s, read: 396.0 MiB/s, write: 0 B/s
INFO:  11% (22.1 GiB of 200.0 GiB) in 38s, read: 403.2 MiB/s, write: 0 B/s
INFO:  12% (24.0 GiB of 200.0 GiB) in 43s, read: 395.2 MiB/s, write: 0 B/s
INFO:  13% (26.4 GiB of 200.0 GiB) in 49s, read: 396.0 MiB/s, write: 0 B/s
INFO:  14% (29.8 GiB of 200.0 GiB) in 52s, read: 1.1 GiB/s, write: 0 B/s
INFO:  20% (40.9 GiB of 200.0 GiB) in 55s, read: 3.7 GiB/s, write: 0 B/s
INFO:  22% (45.0 GiB of 200.0 GiB) in 58s, read: 1.4 GiB/s, write: 0 B/s
INFO:  23% (46.4 GiB of 200.0 GiB) in 1m 1s, read: 496.0 MiB/s, write: 0 B/s
INFO:  24% (48.4 GiB of 200.0 GiB) in 1m 6s, read: 395.2 MiB/s, write: 0 B/s
INFO:  26% (52.2 GiB of 200.0 GiB) in 1m 10s, read: 970.0 MiB/s, write: 0 B/s
INFO:  31% (63.5 GiB of 200.0 GiB) in 1m 13s, read: 3.8 GiB/s, write: 0 B/s
INFO:  37% (74.7 GiB of 200.0 GiB) in 1m 16s, read: 3.7 GiB/s, write: 0 B/s
INFO:  42% (85.7 GiB of 200.0 GiB) in 1m 19s, read: 3.7 GiB/s, write: 0 B/s
INFO:  48% (96.9 GiB of 200.0 GiB) in 1m 22s, read: 3.7 GiB/s, write: 0 B/s
INFO:  54% (108.2 GiB of 200.0 GiB) in 1m 25s, read: 3.8 GiB/s, write: 0 B/s
INFO:  59% (119.0 GiB of 200.0 GiB) in 1m 28s, read: 3.6 GiB/s, write: 0 B/s
INFO:  65% (130.1 GiB of 200.0 GiB) in 1m 31s, read: 3.7 GiB/s, write: 0 B/s
INFO:  70% (140.7 GiB of 200.0 GiB) in 1m 34s, read: 3.5 GiB/s, write: 0 B/s
INFO:  75% (150.1 GiB of 200.0 GiB) in 1m 37s, read: 3.1 GiB/s, write: 0 B/s
INFO:  76% (152.2 GiB of 200.0 GiB) in 1m 43s, read: 346.7 MiB/s, write: 0 B/s
INFO:  77% (154.1 GiB of 200.0 GiB) in 1m 48s, read: 397.6 MiB/s, write: 0 B/s
INFO:  78% (156.0 GiB of 200.0 GiB) in 1m 53s, read: 396.0 MiB/s, write: 0 B/s
INFO:  79% (158.3 GiB of 200.0 GiB) in 1m 59s, read: 396.0 MiB/s, write: 0 B/s
INFO:  80% (160.3 GiB of 200.0 GiB) in 2m 4s, read: 396.0 MiB/s, write: 0 B/s
INFO:  81% (162.2 GiB of 200.0 GiB) in 2m 9s, read: 397.6 MiB/s, write: 0 B/s
INFO:  82% (164.3 GiB of 200.0 GiB) in 2m 14s, read: 423.2 MiB/s, write: 0 B/s
INFO:  83% (166.2 GiB of 200.0 GiB) in 2m 19s, read: 388.8 MiB/s, write: 0 B/s
INFO:  84% (168.1 GiB of 200.0 GiB) in 2m 24s, read: 383.2 MiB/s, write: 0 B/s
INFO:  85% (170.4 GiB of 200.0 GiB) in 2m 30s, read: 397.3 MiB/s, write: 0 B/s
INFO:  86% (172.3 GiB of 200.0 GiB) in 2m 35s, read: 400.0 MiB/s, write: 0 B/s
INFO:  87% (174.3 GiB of 200.0 GiB) in 2m 40s, read: 398.4 MiB/s, write: 0 B/s
INFO:  88% (176.1 GiB of 200.0 GiB) in 2m 43s, read: 630.7 MiB/s, write: 0 B/s
INFO:  89% (178.1 GiB of 200.0 GiB) in 2m 48s, read: 400.0 MiB/s, write: 0 B/s
INFO:  91% (182.1 GiB of 200.0 GiB) in 2m 52s, read: 1.0 GiB/s, write: 0 B/s
INFO:  94% (188.1 GiB of 200.0 GiB) in 2m 55s, read: 2.0 GiB/s, write: 0 B/s
INFO:  97% (195.1 GiB of 200.0 GiB) in 2m 58s, read: 2.3 GiB/s, write: 0 B/s
INFO: 100% (200.0 GiB of 200.0 GiB) in 3m, read: 2.4 GiB/s, write: 0 B/s
INFO: backup is sparse: 146.76 GiB (73%) total zero data
INFO: backup was done incrementally, reused 200.00 GiB (100%)
INFO: transferred 200.00 GiB in 180 seconds (1.1 GiB/s)
INFO: stopping kvm after backup task
INFO: adding notes to backup
INFO: Finished Backup of VM 1102 (00:03:05)
INFO: Backup finished at 2024-01-18 20:11:14
INFO: Backup job finished successfully
INFO: notified via target `mail-to-root`
TASK OK
Journal on the PBS host during the backup:
Code:
Jan 18 20:08:12 Svr07 proxmox-backup-proxy[2969]: starting new backup on datastore 'pbspool' from ::ffff:xxx.xxx.xxx.xxx: "vm/1102/2024-01-18T19:08:09Z"
Jan 18 20:08:12 Svr07 proxmox-backup-proxy[2969]: download 'index.json.blob' from previous backup.
Jan 18 20:08:12 Svr07 proxmox-backup-proxy[2969]: register chunks in 'drive-scsi0.img.fidx' from previous backup.
Jan 18 20:08:12 Svr07 proxmox-backup-proxy[2969]: download 'drive-scsi0.img.fidx' from previous backup.
Jan 18 20:08:12 Svr07 proxmox-backup-proxy[2969]: created new fixed index 1 ("vm/1102/2024-01-18T19:08:09Z/drive-scsi0.img.fidx")
Jan 18 20:08:12 Svr07 proxmox-backup-proxy[2969]: add blob "/ssdpool/pbspool/vm/1102/2024-01-18T19:08:09Z/qemu-server.conf.blob" (393 bytes, comp: 393)
Jan 18 20:08:21 Svr07 proxmox-backup-proxy[2969]: error during snapshot file listing: 'unable to load blob '"/ssdpool/pbspool/vm/1102/2024-01-18T19:08:09Z/index.json.blob"' - No such file or directory (os error 2)'
Jan 18 20:11:11 Svr07 proxmox-backup-proxy[2969]: Upload statistics for 'drive-scsi0.img.fidx'
Jan 18 20:11:11 Svr07 proxmox-backup-proxy[2969]: UUID: a89ac3fa56bf4aa6bb8bb5caaee6e5f1
Jan 18 20:11:11 Svr07 proxmox-backup-proxy[2969]: Checksum: feec2b3fa46161b988c44feb5e5918c676fc1b816759c5d159eef71695c029b7
Jan 18 20:11:11 Svr07 proxmox-backup-proxy[2969]: Size: 214748364800
Jan 18 20:11:11 Svr07 proxmox-backup-proxy[2969]: Chunk count: 51200
Jan 18 20:11:11 Svr07 proxmox-backup-proxy[2969]: Upload size: 4194304 (0%)
Jan 18 20:11:11 Svr07 proxmox-backup-proxy[2969]: Duplicates: 51199+1 (100%)
Jan 18 20:11:11 Svr07 proxmox-backup-proxy[2969]: Compression: 0%
Jan 18 20:11:11 Svr07 proxmox-backup-proxy[2969]: successfully closed fixed index 1
Jan 18 20:11:11 Svr07 proxmox-backup-proxy[2969]: add blob "/ssdpool/pbspool/vm/1102/2024-01-18T19:08:09Z/index.json.blob" (329 bytes, comp: 329)
Jan 18 20:11:11 Svr07 proxmox-backup-[2969]: Svr07 proxmox-backup-proxy[2969]: syncing filesystem
Jan 18 20:11:11 Svr07 proxmox-backup-proxy[2969]: successfully finished backup
Jan 18 20:11:11 Svr07 proxmox-backup-proxy[2969]: backup finished successfully
Jan 18 20:11:11 Svr07 proxmox-backup-proxy[2969]: TASK OK
Jan 18 20:11:14 Svr07 proxmox-backup-proxy[2969]: Upload backup log to datastore 'pbspool', root namespace vm/1102/2024-01-18T19:08:09Z/client.log.blob
Jan 18 20:11:14 Svr07 proxmox-backup-[2969]: Svr07 proxmox-backup-proxy[2969]: write rrd data back to disk
Jan 18 20:11:14 Svr07 proxmox-backup-[2969]: Svr07 proxmox-backup-proxy[2969]: starting rrd data sync
Jan 18 20:11:14 Svr07 proxmox-backup-[2969]: Svr07 proxmox-backup-proxy[2969]: rrd journal successfully committed (33 files in 0.008 seconds)
(Despite the error, PBS verified the backup successfully.)
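
A minimal sketch of how such a verification can be started from the PBS host's CLI, assuming the standard proxmox-backup-manager tool (the verification above may just as well have been run from the GUI):
Code:
# On the PBS host (Svr07): confirm the datastore and verify all snapshots in it
proxmox-backup-manager datastore list
proxmox-backup-manager verify pbspool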
pveversion -v
Code:
proxmox-ve: 8.1.0 (running kernel: 6.5.11-7-pve)
pve-manager: 8.1.3 (running version: 8.1.3/b46aac3b42da5d15)
proxmox-kernel-helper: 8.1.0
pve-kernel-6.2: 8.0.5
proxmox-kernel-6.5: 6.5.11-7
proxmox-kernel-6.5.11-7-pve-signed: 6.5.11-7
proxmox-kernel-6.5.11-4-pve-signed: 6.5.11-4
proxmox-kernel-6.2.16-20-pve: 6.2.16-20
proxmox-kernel-6.2: 6.2.16-20
pve-kernel-6.2.16-3-pve: 6.2.16-3
ceph: 17.2.7-pve1
ceph-fuse: 17.2.7-pve1
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.0.7
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.1.0
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.5
libpve-network-perl: 0.9.5
libpve-rs-perl: 0.8.7
libpve-storage-perl: 8.0.5
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve4
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.1.2-1
proxmox-backup-file-restore: 3.1.2-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.2
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.1.3
pve-cluster: 8.0.5
pve-container: 5.0.8
pve-docs: 8.1.3
pve-edk2-firmware: 4.2023.08-2
pve-firewall: 5.0.3
pve-firmware: 3.9-1
pve-ha-manager: 4.0.3
pve-i18n: 3.1.5
pve-qemu-kvm: 8.1.2-6
pve-xtermjs: 5.3.0-3
qemu-server: 8.0.10
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.2-pve1
Ceph status:
Code:
  cluster:
    id:     9c3b2a84-c61b-4b81-b942-6e0e8827ced0
    health: HEALTH_OK
  services:
    mon: 3 daemons, quorum Svr01,Svr02,Svr03 (age 2d)
    mgr: Svr02(active, since 2d), standbys: Svr01, Svr03
    mds: 1/1 daemons up, 2 standby
    osd: 18 osds: 18 up (since 2d), 18 in (since 11w)
  data:
    volumes: 1/1 healthy
    pools:   4 pools, 897 pgs
    objects: 325.17k objects, 1.2 TiB
    usage:   3.7 TiB used, 33 TiB / 37 TiB avail
    pgs:     897 active+clean
  io:
    client:   150 MiB/s rd, 192 MiB/s wr, 541 op/s rd, 1.14k op/s wr
 
