Disk becomes read-only after failed backup

dorianim
Hey there,

I recently set up my first Proxmox Backup Server, and I am amazed at how easy and straightforward that was.
However, I have a strange problem: whenever a backup fails for some reason (e.g. an internet outage), one of my disks gets remounted read-only, and write access can only be restored by shutting the VM down and booting it up again.

Any idea on how to troubleshoot this?
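
So far I can only confirm the state from inside the VM with something like this (a sketch; assuming the affected filesystem is the root one):

# check which filesystems are currently mounted read-only
grep ' ro,' /proc/mounts
# look for the kernel remounting read-only or reporting I/O errors
dmesg -T | grep -iE 'remount|read-only|i/o error'
# remounting read-write has not worked for me; only a full shutdown and start helps
mount -o remount,rw /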

Best Regards,
Dorian
 
We just experienced the exact same thing; we never saw it before on PBS 1.x, but we updated to PBS 2.x last week.

The backup failed with "backup write data failed: command error: protocol canceled", I suspect because of some network timeout (the PBS is at another location).
That does happen sometimes, but it never caused a read-only filesystem in the VM before, which is a more serious problem. All backups are in snapshot mode, and the PVE cluster was also recently updated to the latest version 7.

Is this a known problem? Is there anything we can do to prevent it from happening?
 
Hi,
please share the full backup task log from both PVE and PBS, and the output of pveversion -v and proxmox-backup-manager versions --verbose, respectively. Please also share the VM configuration (qm config <ID>) and tell us what kind of storage the affected disk is on.

Is there anything interesting in /var/log/syslog on the host or in the corresponding log in the VM around the time the issue happened?
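
For example, on the PVE node (replace <ID> with the VM's ID and adjust the time window to when the failure happened):

pveversion -v
qm config <ID>
journalctl --since "<start>" --until "<end>"    # or inspect /var/log/syslog directly

And on the PBS host:

proxmox-backup-manager versions --verbose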
 
@fiona
Below are the details that you asked for.

pveversion -v

proxmox-ve: 7.2-1 (running kernel: 5.4.128-1-pve)
pve-manager: 7.2-7 (running version: 7.2-7/d0dd0e85)
pve-kernel-helper: 7.2-12
pve-kernel-5.15: 7.2-10
pve-kernel-5.4: 6.4-18
pve-kernel-5.3: 6.1-6
pve-kernel-5.15.53-1-pve: 5.15.53-1
pve-kernel-5.15.39-4-pve: 5.15.39-4
pve-kernel-5.4.189-2-pve: 5.4.189-2
pve-kernel-5.4.128-1-pve: 5.4.128-2
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.10-1-pve: 5.3.10-1
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: 0.8.36+pve1
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-2
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-3
libpve-storage-perl: 7.2-8
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.5-1
proxmox-backup-file-restore: 2.2.5-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-2
pve-container: 4.2-2
pve-docs: 7.2-2
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-5
pve-firmware: 3.5-1
pve-ha-manager: 3.4.0
pve-i18n: 2.7-2
pve-qemu-kvm: 7.0.0-3
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.5-pve1


proxmox-backup-manager versions --verbose

proxmox-backup 2.2-1 running kernel: 5.15.39-4-pve
proxmox-backup-server 2.2.5-1 running version: 2.2.5
pve-kernel-5.15 7.2-10
pve-kernel-helper 7.2-12
pve-kernel-5.4 6.4-19
pve-kernel-5.15.39-4-pve 5.15.39-4
pve-kernel-5.4.195-1-pve 5.4.195-1
pve-kernel-5.4.162-1-pve 5.4.162-2
pve-kernel-5.4.65-1-pve 5.4.65-1
ifupdown2 3.1.0-1+pmx3
libjs-extjs 7.0.0-1
proxmox-backup-docs 2.2.5-1
proxmox-backup-client 2.2.5-1
proxmox-mini-journalreader 1.2-1
proxmox-widget-toolkit 3.5.1
pve-xtermjs 4.16.0-1
smartmontools 7.2-pve3
zfsutils-linux 2.1.5-pve1


qm config 804

boot: cdn
bootdisk: virtio0
cores: 2
description: 85.17.206.30%XXXXXXXXXXX / xxxxxxxxxxxxxxxxx
ide2: zfs2-iso:iso/ubuntu-18.04.3-live-server-amd64.iso,media=cdrom
memory: 16192
name: newxoip-live
net0: virtio=92:F0:96:FA:B5:5E,bridge=vmbr0
net1: virtio=AA:46:FA:D7:D2:12,bridge=vmbr1
numa: 0
onboot: 1
ostype: l26
scsihw: virtio-scsi-pci
smbios1: uuid=d021a26e-a589-43a4-872f-670d4bea2464
sockets: 2
virtio0: Truenas:804/vm-804-disk-0.raw,size=250G
vmgenid: bcdae75e-9f85-4892-8b89-d1607ef0c259

On the server pm8 /var/log/syslog
# Below is an extract from the syslog. The only log entry that is directly relevant is the one referencing "/mnt/datastore/pbs1tank/vm/804/"; I have included the entries around that line to show that other activities were running without any hassle in the same or similar timeslots. This is the same for both servers. I have placed a # at the start of lines that I think are not related to the issue (I might not be correct with this assumption).

Sep 15 00:05:36 pbs1 proxmox-backup-proxy[2723]: add blob "/mnt/datastore/pbs1tank/vm/804/2022-09-14T22:05:35Z/qemu-server.conf.blob" (390 bytes, comp: 390)
#Sep 15 00:07:01 pbs1 CRON[1886954]: (root) CMD ("/root/.acme.sh"/acme.sh --cron --home "/root/.acme.sh" > /dev/null)
#Sep 15 00:07:42 pbs1 proxmox-backup-proxy[2723]: Upload statistics for 'drive-virtio0.img.fidx'
#Sep 15 00:07:42 pbs1 proxmox-backup-proxy[2723]: UUID: 182cfad06ce14fe686aa3cb1de615e04
#Sep 15 00:07:42 pbs1 proxmox-backup-proxy[2723]: Checksum: 47b61996b6871f1821cb61815282c6e11d46f285ea47aae670d5c14b93838274
#Sep 15 00:07:42 pbs1 proxmox-backup-proxy[2723]: Size: 483183820800
#Sep 15 00:07:42 pbs1 proxmox-backup-proxy[2723]: Chunk count: 115200
#Sep 15 00:07:42 pbs1 proxmox-backup-proxy[2723]: Upload size: 9592373248 (1%)
#Sep 15 00:07:42 pbs1 proxmox-backup-proxy[2723]: Duplicates: 112913+2 (98%)
#Sep 15 00:07:42 pbs1 proxmox-backup-proxy[2723]: Compression: 22%
#Sep 15 00:07:42 pbs1 proxmox-backup-proxy[2723]: successfully closed fixed index 1
#Sep 15 00:07:42 pbs1 proxmox-backup-proxy[2723]: add blob "/mnt/datastore/pbs1tank/vm/204/2022-09-14T21:21:28Z/index.json.blob" (326 bytes, comp: 326)
#Sep 15 00:07:42 pbs1 proxmox-backup-proxy[2723]: successfully finished backup
#Sep 15 00:07:42 pbs1 proxmox-backup-proxy[2723]: backup finished successfully
#Sep 15 00:07:42 pbs1 proxmox-backup-proxy[2723]: TASK OK

On the server pbs1 /var/log/syslog

#Sep 15 01:00:13 pbs1 systemd[1]: proxmox-backup-daily-update.service: Succeeded.
#Sep 15 01:00:13 pbs1 systemd[1]: Finished Daily Proxmox Backup Server update and maintenance activities.
#Sep 15 01:00:13 pbs1 systemd[1]: proxmox-backup-daily-update.service: Consumed 1.046s CPU time.
#Sep 15 01:02:21 pbs1 proxmox-backup-proxy[2723]: starting new backup on datastore 'pbs1tank': "vm/3302/2022-09-14T23:02:20Z"
#Sep 15 01:02:21 pbs1 proxmox-backup-proxy[2723]: download 'index.json.blob' from previous backup.
#Sep 15 01:02:21 pbs1 proxmox-backup-proxy[2723]: register chunks in 'drive-scsi0.img.fidx' from previous backup.
#Sep 15 01:02:21 pbs1 proxmox-backup-proxy[2723]: download 'drive-scsi0.img.fidx' from previous backup.
#Sep 15 01:02:21 pbs1 proxmox-backup-proxy[2723]: starting new backup on datastore 'pbs1tank': "vm/805/2022-09-14T23:02:18Z"
#Sep 15 01:02:21 pbs1 proxmox-backup-proxy[2723]: download 'index.json.blob' from previous backup.
#Sep 15 01:02:21 pbs1 proxmox-backup-proxy[2723]: register chunks in 'drive-scsi0.img.fidx' from previous backup.
#Sep 15 01:02:21 pbs1 proxmox-backup-proxy[2723]: download 'drive-scsi0.img.fidx' from previous backup.
#Sep 15 01:02:22 pbs1 proxmox-backup-proxy[2723]: created new fixed index 1 ("vm/805/2022-09-14T23:02:18Z/drive-scsi0.img.fidx")
#Sep 15 01:02:22 pbs1 proxmox-backup-proxy[2723]: created new fixed index 1 ("vm/3302/2022-09-14T23:02:20Z/drive-scsi0.img.fidx")
#Sep 15 01:02:22 pbs1 proxmox-backup-proxy[2723]: add blob "/mnt/datastore/pbs1tank/vm/805/2022-09-14T23:02:18Z/qemu-server.conf.blob" (408 bytes, comp: 408)
#Sep 15 01:02:22 pbs1 proxmox-backup-proxy[2723]: add blob "/mnt/datastore/pbs1tank/vm/3302/2022-09-14T23:02:20Z/qemu-server.conf.blob" (317 bytes, comp: 317)
Sep 15 01:02:25 pbs1 proxmox-backup-proxy[2723]: POST /fixed_chunk: 400 Bad Request: error reading a body from connection: timed out
Sep 15 01:02:25 pbs1 proxmox-backup-proxy[2723]: POST /fixed_chunk: 400 Bad Request: error reading a body from connection: timed out
Sep 15 01:02:25 pbs1 proxmox-backup-proxy[2723]: POST /fixed_chunk: 400 Bad Request: error reading a body from connection: timed out
Sep 15 01:02:25 pbs1 proxmox-backup-proxy[2723]: POST /fixed_chunk: 400 Bad Request: error reading a body from connection: timed out
Sep 15 01:02:25 pbs1 proxmox-backup-proxy[2723]: POST /fixed_chunk: 400 Bad Request: error reading a body from connection: timed out
Sep 15 01:02:25 pbs1 proxmox-backup-proxy[2723]: backup failed: connection error: timed out
Sep 15 01:02:25 pbs1 proxmox-backup-proxy[2723]: removing failed backup
Sep 15 01:02:25 pbs1 proxmox-backup-proxy[2723]: POST /fixed_chunk: 400 Bad Request: error reading a body from connection: timed out
Sep 15 01:02:25 pbs1 proxmox-backup-proxy[2723]: removing backup snapshot "/mnt/datastore/pbs1tank/vm/804/2022-09-14T22:05:35Z"
Sep 15 01:02:25 pbs1 proxmox-backup-proxy[2723]: TASK ERROR: connection error: timed out
Sep 15 01:02:51 pbs1 proxmox-backup-proxy[2723]: Upload statistics for 'drive-scsi0.img.fidx'
#Sep 15 01:02:51 pbs1 proxmox-backup-proxy[2723]: UUID: 064d694e1d384e22a68ccb33213e500f
#Sep 15 01:02:51 pbs1 proxmox-backup-proxy[2723]: Checksum: ed4ac3ef63fe16401890d45812c46a358ba13892be9ba931c67418acaa04e429
#Sep 15 01:02:51 pbs1 proxmox-backup-proxy[2723]: Size: 1782579200
#Sep 15 01:02:51 pbs1 proxmox-backup-proxy[2723]: Chunk count: 425
#Sep 15 01:02:51 pbs1 proxmox-backup-proxy[2723]: Upload size: 1602224128 (89%)
#Sep 15 01:02:51 pbs1 proxmox-backup-proxy[2723]: Duplicates: 43+4 (11%)
#Sep 15 01:02:51 pbs1 proxmox-backup-proxy[2723]: Compression: 16%
#Sep 15 01:02:51 pbs1 proxmox-backup-proxy[2723]: successfully closed fixed index 1
#Sep 15 01:02:51 pbs1 proxmox-backup-proxy[2723]: add blob "/mnt/datastore/pbs1tank/vm/3302/2022-09-14T23:02:20Z/index.json.blob" (328 bytes, comp: 328)
#Sep 15 01:02:51 pbs1 proxmox-backup-proxy[2723]: successfully finished backup
#Sep 15 01:02:51 pbs1 proxmox-backup-proxy[2723]: backup finished successfully
#Sep 15 01:02:51 pbs1 proxmox-backup-proxy[2723]: TASK OK

# Below is the extract from the backup log that is included in the notification email.

804: 2022-09-15 00:05:35 INFO: Starting Backup of VM 804 (qemu)
804: 2022-09-15 00:05:35 INFO: status = running
804: 2022-09-15 00:05:35 INFO: VM Name: newxoip-live
804: 2022-09-15 00:05:35 INFO: include disk 'virtio0' 'Truenas:804/vm-804-disk-0.raw' 250G
804: 2022-09-15 00:05:36 INFO: backup mode: snapshot
804: 2022-09-15 00:05:36 INFO: ionice priority: 7
804: 2022-09-15 00:05:36 INFO: creating Proxmox Backup Server archive 'vm/804/2022-09-14T22:05:35Z'
804: 2022-09-15 00:05:36 INFO: started backup task '7387c5da-dcc5-499f-80ff-77b28572c80d'
804: 2022-09-15 00:05:36 INFO: resuming VM again
804: 2022-09-15 00:05:36 INFO: virtio0: dirty-bitmap status: OK (58.5 GiB of 250.0 GiB dirty)
804: 2022-09-15 00:05:36 INFO: using fast incremental mode (dirty-bitmap), 58.5 GiB dirty of 250.0 GiB total
804: 2022-09-15 00:05:39 INFO: 0% (112.0 MiB of 58.5 GiB) in 3s, read: 37.3 MiB/s, write: 37.3 MiB/s
804: 2022-09-15 00:06:24 INFO: 1% (624.0 MiB of 58.5 GiB) in 48s, read: 11.4 MiB/s, write: 11.4 MiB/s
804: 2022-09-15 00:07:27 INFO: 2% (1.2 GiB of 58.5 GiB) in 1m 51s, read: 9.1 MiB/s, write: 9.1 MiB/s
804: 2022-09-15 00:08:28 INFO: 3% (1.8 GiB of 58.5 GiB) in 2m 52s, read: 10.0 MiB/s, write: 10.0 MiB/s
804: 2022-09-15 00:09:25 INFO: 4% (2.4 GiB of 58.5 GiB) in 3m 49s, read: 10.5 MiB/s, write: 10.5 MiB/s
804: 2022-09-15 00:10:20 INFO: 5% (2.9 GiB of 58.5 GiB) in 4m 44s, read: 10.6 MiB/s, write: 10.6 MiB/s
804: 2022-09-15 00:11:26 INFO: 6% (3.5 GiB of 58.5 GiB) in 5m 50s, read: 9.2 MiB/s, write: 9.2 MiB/s
804: 2022-09-15 00:12:39 INFO: 7% (4.1 GiB of 58.5 GiB) in 7m 3s, read: 8.2 MiB/s, write: 8.2 MiB/s
804: 2022-09-15 00:13:49 INFO: 8% (4.7 GiB of 58.5 GiB) in 8m 13s, read: 8.7 MiB/s, write: 8.7 MiB/s
804: 2022-09-15 00:14:53 INFO: 9% (5.3 GiB of 58.5 GiB) in 9m 17s, read: 9.4 MiB/s, write: 9.4 MiB/s
804: 2022-09-15 00:15:52 INFO: 10% (5.9 GiB of 58.5 GiB) in 10m 16s, read: 9.9 MiB/s, write: 9.9 MiB/s
804: 2022-09-15 00:16:54 INFO: 11% (6.4 GiB of 58.5 GiB) in 11m 18s, read: 9.7 MiB/s, write: 9.7 MiB/s
804: 2022-09-15 00:17:50 INFO: 12% (7.0 GiB of 58.5 GiB) in 12m 14s, read: 10.7 MiB/s, write: 10.7 MiB/s
804: 2022-09-15 00:18:56 INFO: 13% (7.6 GiB of 58.5 GiB) in 13m 20s, read: 9.3 MiB/s, write: 9.3 MiB/s
804: 2022-09-15 00:19:57 INFO: 14% (8.2 GiB of 58.5 GiB) in 14m 21s, read: 9.8 MiB/s, write: 9.8 MiB/s
804: 2022-09-15 00:21:16 INFO: 15% (8.8 GiB of 58.5 GiB) in 15m 40s, read: 7.6 MiB/s, write: 7.6 MiB/s
804: 2022-09-15 00:22:37 INFO: 16% (9.4 GiB of 58.5 GiB) in 17m 1s, read: 7.4 MiB/s, write: 7.4 MiB/s
804: 2022-09-15 00:23:41 INFO: 17% (10.0 GiB of 58.5 GiB) in 18m 5s, read: 9.8 MiB/s, write: 9.8 MiB/s
804: 2022-09-15 00:24:34 INFO: 18% (10.6 GiB of 58.5 GiB) in 18m 58s, read: 11.0 MiB/s, write: 11.0 MiB/s
804: 2022-09-15 00:25:29 INFO: 19% (11.1 GiB of 58.5 GiB) in 19m 53s, read: 10.8 MiB/s, write: 10.8 MiB/s
804: 2022-09-15 00:26:21 INFO: 20% (11.7 GiB of 58.5 GiB) in 20m 45s, read: 11.4 MiB/s, write: 11.4 MiB/s
804: 2022-09-15 00:27:30 INFO: 21% (12.3 GiB of 58.5 GiB) in 21m 54s, read: 8.8 MiB/s, write: 8.8 MiB/s
804: 2022-09-15 00:28:41 INFO: 22% (12.9 GiB of 58.5 GiB) in 23m 5s, read: 8.5 MiB/s, write: 8.5 MiB/s
804: 2022-09-15 00:29:47 INFO: 23% (13.5 GiB of 58.5 GiB) in 24m 11s, read: 8.8 MiB/s, write: 8.8 MiB/s
804: 2022-09-15 00:30:59 INFO: 24% (14.0 GiB of 58.5 GiB) in 25m 23s, read: 8.3 MiB/s, write: 8.3 MiB/s
804: 2022-09-15 00:32:09 INFO: 25% (14.6 GiB of 58.5 GiB) in 26m 33s, read: 8.7 MiB/s, write: 8.7 MiB/s
804: 2022-09-15 00:33:15 INFO: 26% (15.2 GiB of 58.5 GiB) in 27m 39s, read: 8.9 MiB/s, write: 8.9 MiB/s
804: 2022-09-15 00:34:24 INFO: 27% (15.8 GiB of 58.5 GiB) in 28m 48s, read: 8.8 MiB/s, write: 8.8 MiB/s
804: 2022-09-15 00:35:43 INFO: 28% (16.4 GiB of 58.5 GiB) in 30m 7s, read: 7.6 MiB/s, write: 7.6 MiB/s
804: 2022-09-15 00:36:59 INFO: 29% (17.0 GiB of 58.5 GiB) in 31m 23s, read: 7.8 MiB/s, write: 7.8 MiB/s
804: 2022-09-15 00:37:59 INFO: 30% (17.6 GiB of 58.5 GiB) in 32m 23s, read: 10.2 MiB/s, write: 10.2 MiB/s
804: 2022-09-15 00:39:13 INFO: 31% (18.1 GiB of 58.5 GiB) in 33m 37s, read: 8.0 MiB/s, write: 8.0 MiB/s
804: 2022-09-15 00:40:19 INFO: 32% (18.7 GiB of 58.5 GiB) in 34m 43s, read: 9.2 MiB/s, write: 9.2 MiB/s
804: 2022-09-15 00:41:19 INFO: 33% (19.3 GiB of 58.5 GiB) in 35m 43s, read: 9.9 MiB/s, write: 9.9 MiB/s
804: 2022-09-15 00:42:22 INFO: 34% (19.9 GiB of 58.5 GiB) in 36m 46s, read: 9.7 MiB/s, write: 9.7 MiB/s
804: 2022-09-15 00:43:25 INFO: 35% (20.5 GiB of 58.5 GiB) in 37m 49s, read: 9.6 MiB/s, write: 9.6 MiB/s
804: 2022-09-15 00:44:28 INFO: 36% (21.1 GiB of 58.5 GiB) in 38m 52s, read: 9.2 MiB/s, write: 9.2 MiB/s
804: 2022-09-15 00:45:33 INFO: 37% (21.7 GiB of 58.5 GiB) in 39m 57s, read: 9.2 MiB/s, write: 9.2 MiB/s
804: 2022-09-15 01:02:18 INFO: 37% (21.9 GiB of 58.5 GiB) in 56m 42s, read: 220.1 KiB/s, write: 220.1 KiB/s
804: 2022-09-15 01:02:18 ERROR: backup write data failed: command error: protocol canceled
804: 2022-09-15 01:02:18 INFO: aborting backup job
804: 2022-09-15 01:02:18 INFO: resuming VM again
804: 2022-09-15 01:02:18 ERROR: Backup of VM 804 failed - backup write data failed: command error: protocol canceled

If there is anything else you would like from me, please let me know and I will get it to you.

Thanks in advance for the help and advice.
Lawrence
 

@fiona
I found another thread that mentioned how to view the storage config. Here it is.

cat /etc/pve/storage.cfg
dir: local
path /var/lib/vz
content vztmpl,iso,backup

lvmthin: local-lvm
thinpool data
vgname pve
content images,rootdir

nfs: zfs2-pm8backup
export /mnt/zfs2-evo/pm8backup
path /mnt/pve/zfs2-pm8backup
server 10.0.1.50
content snippets,iso,backup,rootdir,vztmpl,images
prune-backups keep-last=3

nfs: zfs2-pm6backup
export /mnt/zfs2-evo/pm6backup
path /mnt/pve/zfs2-pm6backup
server 10.0.1.50
content backup
prune-backups keep-last=3

nfs: Truenas
export /mnt/tank/proxmox
path /mnt/pve/Truenas
server 10.0.1.8
content iso,snippets,rootdir,images,vztmpl

pbs: pbs1
datastore pbs1tank
server pbs1.enem.nl
content backup
fingerprint 47:9b:e7:56:a2:10:b8:bd:2c:c8:2d:d5:b3:6b:fa:e1:11:41:f0:69:ad:53:8e:ed:35:67:08:aa:53:6a:af:62
prune-backups keep-all=1
username xxxxxxxxx@pbs

nfs: zfs2-iso
export /mnt/zfs2-evo/iso
path /mnt/pve/zfs2-iso
server 10.0.1.50
content iso
prune-backups keep-all=1

nfs: zfs2-pmevobackup
export /mnt/zfs2-evo/pm2backup
path /mnt/pve/zfs2-pmevobackup
server 10.0.1.50
content backup
prune-backups keep-all=1

Lawrence
 
@fiona
Below are the details that you asked for.
Thanks :)

qm config 804 (quoted from the post above)
virtio0: Truenas:804/vm-804-disk-0.raw,size=250G
Since the disk is on a network storage, could it be that the network failure also affected communicating with the disk?

On the server pm8 /var/log/syslog (excerpt quoted from the post above)

On the server pbs1 /var/log/syslog (excerpt quoted from the post above)
I was actually thinking about the syslog on the Proxmox VE host. Sorry, should've been clearer :/
Can you also check the logs within the VM?
 
Network storage: theoretically yes, but the storage is on the LAN, while the PBS is on the other side of the country.
Also, we have other VMs running on the same storage that kept running (and backing up) without problems.


There's one thing that stood out in the PVE syslog: an old NFS share that was removed some time ago but was still in the storage config:

Sep 15 00:47:56 pm8 pvestatd[1308]: unable to activate storage 'zfs2-pm9backup' - directory '/mnt/pve/zfs2-pm9backup' does not exist or is unreachable

I've seen strange things with missing storages and backups in the past, but could this really be causing a VM to go read-only?
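
In the meantime we will clean up the stale entry; as far as I understand, something like this on the PVE node should do it (a sketch using our storage name and the pvesm tool):

# disable the stale storage first and check whether the pvestatd errors stop
pvesm set zfs2-pm9backup --disable 1
# or remove the entry from the storage configuration entirely
pvesm remove zfs2-pm9backup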


I also checked the syslog in the VM, but there is nothing to see there.


Another thing I just noticed in the backup log:

804: 2022-09-15 00:05:36 INFO: creating Proxmox Backup Server archive 'vm/804/2022-09-14T22:05:35Z'
804: 2022-09-15 00:05:36 INFO: started backup task '7387c5da-dcc5-499f-80ff-77b28572c80d'
804: 2022-09-15 00:05:36 INFO: resuming VM again
.
.
.
804: 2022-09-15 00:45:33 INFO: 37% (21.7 GiB of 58.5 GiB) in 39m 57s, read: 9.2 MiB/s, write: 9.2 MiB/s
804: 2022-09-15 01:02:18 INFO: 37% (21.9 GiB of 58.5 GiB) in 56m 42s, read: 220.1 KiB/s, write: 220.1 KiB/s
804: 2022-09-15 01:02:18 ERROR: backup write data failed: command error: protocol canceled
804: 2022-09-15 01:02:18 INFO: aborting backup job
804: 2022-09-15 01:02:18 INFO: resuming VM again
804: 2022-09-15 01:02:18 ERROR: Backup of VM 804 failed - backup write data failed: command error: protocol canceled

What does this mean? Was the VM resumed at the start of the job, or only after the failure an hour later?
 
proxmox-ve: 7.2-1 (running kernel: 5.4.128-1-pve)
pve-manager: 7.2-7 (running version: 7.2-7/d0dd0e85)
pve-kernel-helper: 7.2-12
pve-kernel-5.15: 7.2-10
pve-kernel-5.4: 6.4-18
pve-kernel-5.3: 6.1-6
pve-kernel-5.15.53-1-pve: 5.15.53-1
pve-kernel-5.15.39-4-pve: 5.15.39-4
pve-kernel-5.4.189-2-pve: 5.4.189-2
pve-kernel-5.4.128-1-pve: 5.4.128-2

I don't know if this could be relevant for your particular problem, but did you pin that kernel for a specific reason (if so, which one?), or did you simply never reboot the PVE host after the major upgrade from 6 to 7? If the latter, it might be a good idea to do it now.
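
A quick way to compare the running kernel with what is installed (just a sketch):

uname -r    # currently running kernel
dpkg -l 'pve-kernel-*' | awk '/^ii/ {print $2, $3}'    # installed pve-kernel packages
# if a newer kernel is installed but not running, a reboot will pick it up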
 
Network storage: theoretically yes, but the storage is on the LAN, while the PBS is on the other side of the country.
Also, we have other VMs running on the same storage that kept running (and backing up) without problems.
Ok, so we can rule that out :)
I also checked the syslog in the VM, but there is nothing to see there.
Except for the disk going read-only I suppose? Or was that only visible from the outside?

Another thing I just noticed in the backup log:

804: 2022-09-15 00:05:36 INFO: started backup task '7387c5da-dcc5-499f-80ff-77b28572c80d'
804: 2022-09-15 00:05:36 INFO: resuming VM again
.
.
.
804: 2022-09-15 01:02:18 INFO: aborting backup job
804: 2022-09-15 01:02:18 INFO: resuming VM again

What does this mean? Was the VM resumed at the start of the job, or only after the failure an hour later?
For snapshot mode, the VM is briefly stopped so the backup operation can start, and then it is resumed.

But the second resume might be a good hint. We do issue the cont command to resume the VM unconditionally in the error case. This is relevant for stop mode backups (so the VM will be started after the error), and I'd expect it to be fine when the VM is already running too. But I'm not entirely sure, so I'll test around a bit.
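
You can check the resulting run state from the PVE host, for example (804 being the VM from this thread):

qm status 804    # running / stopped
qm status 804 --verbose    # also includes the QMP status (e.g. paused)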
 
Except for the disk going read-only I suppose? Or was that only visible from the outside?
Yes, only because of processes suddenly failing.

I don't know if this could be relevant for your particular problem, but did you pin that kernel for a specific reason (if so, which one?), or did you simply never reboot the PVE host after the major upgrade from 6 to 7? If the latter, it might be a good idea to do it now.

Turns out it actually was never rebooted, yes; no particular reason besides inexperience. We will do that ASAP, thanks for the remark!
 
Yes, only because of processes suddenly failing.
804: 2022-09-15 00:45:33 INFO: 37% (21.7 GiB of 58.5 GiB) in 39m 57s, read: 9.2 MiB/s, write: 9.2 MiB/s
804: 2022-09-15 01:02:18 INFO: 37% (21.9 GiB of 58.5 GiB) in 56m 42s, read: 220.1 KiB/s, write: 220.1 KiB/s
804: 2022-09-15 01:02:18 ERROR: backup write data failed: command error: protocol canceled
I think the following might have happened: when the backup connection is gone, the VM can't do any writes to blocks that have not yet been backed up; it has to wait for the old block to be uploaded first. And waiting 15+ minutes for writes will of course cause some processes to fail. Technically, the disk would not be read-only (as seen from the guest or QEMU), but in practice it is "read-only" for those 15+ minutes, of course ;)
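
If that is what happened, the guest kernel should have logged hung-task warnings rather than real I/O errors. Inside the VM, something like:

# kernel warnings about tasks stuck in uninterruptible sleep during the stall
dmesg -T | grep -i 'blocked for more than'
journalctl -k | grep -i 'blocked for more than'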

There are already some enhancement requests (here and here) that would improve the situation once implemented.
 
The first one looks very similar. I didn't know the VM's write speed becomes tied to the PBS connection; is that the case even in snapshot mode?
This would make using a remote PBS less attractive for busy VMs.

Concerning your theory: yes. I guess the VM OS stayed in that state until the reboot, with some kind of kernel panic.
 
This issue has now happened again today. What I would like to understand and get feedback on: so far, this issue has only appeared on a VM running Ubuntu 22.x. Could there be an issue with Ubuntu's btrfs that contributes to this happening?

Thanks
Lawrence
 
Hi,
This issue has now happened again today. What I would like to understand and get feedback on: so far, this issue has only appeared on a VM running Ubuntu 22.x. Could there be an issue with Ubuntu's btrfs that contributes to this happening?

Thanks
Lawrence
Do you get the same "ERROR: backup write data failed: command error: protocol canceled" error? The error itself should not depend on the file system used inside the guest (the backup happens at the block level).
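
To quickly check whether all the failed backups show the same error, you can search the recent task logs on the PVE node (a sketch; the task logs are kept below /var/log/pve/tasks/):

grep -rl 'protocol canceled' /var/log/pve/tasks/ 2>/dev/null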
 
