Hi, I have a 2-node cluster on 7.1-7.
One of the VMs fails its nightly backup:
104: 2021-12-08 04:07:09 INFO: status = running
104: 2021-12-08 04:07:09 INFO: VM Name: sidious-Kuberneties
104: 2021-12-08 04:07:09 INFO: include disk 'scsi0' 'vms-ssd:vm-104-disk-0' 128G
104: 2021-12-08 04:07:09 INFO: backup mode: snapshot
104: 2021-12-08 04:07:09 INFO: ionice priority: 7
104: 2021-12-08 04:07:09 INFO: creating vzdump archive '/mnt/v01/dump/vzdump-qemu-104-2021_12_08-04_07_09.vma.zst'
104: 2021-12-08 04:07:09 INFO: started backup task 'bf86277f-2eb2-4f3e-9203-59eeb4493cfd'
104: 2021-12-08 04:07:09 INFO: resuming VM again
104: 2021-12-08 04:07:13 INFO: 2% (2.9 GiB of 128.0 GiB) in 3s, read: 999.3 MiB/s, write: 52.7 MiB/s
104: 2021-12-08 04:07:27 INFO: 3% (3.9 GiB of 128.0 GiB) in 17s, read: 71.7 MiB/s, write: 66.7 MiB/s
104: 2021-12-08 04:07:52 INFO: 4% (5.1 GiB of 128.0 GiB) in 42s, read: 50.3 MiB/s, write: 43.9 MiB/s
104: 2021-12-08 04:08:12 INFO: 5% (6.5 GiB of 128.0 GiB) in 1m 2s, read: 67.8 MiB/s, write: 64.5 MiB/s
104: 2021-12-08 04:08:28 INFO: 6% (7.7 GiB of 128.0 GiB) in 1m 18s, read: 79.3 MiB/s, write: 62.4 MiB/s
104: 2021-12-08 04:08:42 INFO: 7% (9.0 GiB of 128.0 GiB) in 1m 32s, read: 94.8 MiB/s, write: 72.8 MiB/s
104: 2021-12-08 04:08:57 INFO: 8% (10.3 GiB of 128.0 GiB) in 1m 47s, read: 85.8 MiB/s, write: 82.6 MiB/s
104: 2021-12-08 04:09:09 INFO: 9% (11.5 GiB of 128.0 GiB) in 1m 59s, read: 109.1 MiB/s, write: 93.7 MiB/s
104: 2021-12-08 04:09:24 INFO: 10% (12.9 GiB of 128.0 GiB) in 2m 14s, read: 93.5 MiB/s, write: 49.6 MiB/s
104: 2021-12-08 04:09:35 INFO: 11% (14.6 GiB of 128.0 GiB) in 2m 25s, read: 161.5 MiB/s, write: 74.5 MiB/s
104: 2021-12-08 04:09:40 INFO: 12% (15.5 GiB of 128.0 GiB) in 2m 30s, read: 177.1 MiB/s, write: 48.2 MiB/s
104: 2021-12-08 04:09:52 INFO: 13% (16.7 GiB of 128.0 GiB) in 2m 42s, read: 99.3 MiB/s, write: 67.1 MiB/s
104: 2021-12-08 04:10:05 INFO: 14% (18.1 GiB of 128.0 GiB) in 2m 55s, read: 116.8 MiB/s, write: 64.3 MiB/s
104: 2021-12-08 04:10:10 INFO: 15% (19.5 GiB of 128.0 GiB) in 3m, read: 285.6 MiB/s, write: 48.8 MiB/s
104: 2021-12-08 04:10:13 INFO: 16% (21.2 GiB of 128.0 GiB) in 3m 3s, read: 553.4 MiB/s, write: 76.9 MiB/s
104: 2021-12-08 04:10:16 INFO: 18% (23.4 GiB of 128.0 GiB) in 3m 6s, read: 768.5 MiB/s, write: 64.8 MiB/s
104: 2021-12-08 04:10:21 INFO: 19% (24.5 GiB of 128.0 GiB) in 3m 11s, read: 229.1 MiB/s, write: 40.5 MiB/s
104: 2021-12-08 04:10:31 INFO: 20% (25.6 GiB of 128.0 GiB) in 3m 21s, read: 110.1 MiB/s, write: 100.7 MiB/s
104: 2021-12-08 04:10:31 ERROR: job failed with err -5 - Input/output error
104: 2021-12-08 04:10:31 INFO: aborting backup job
104: 2021-12-08 04:10:31 INFO: resuming VM again
104: 2021-12-08 04:10:34 ERROR: Backup of VM 104 failed - job failed with err -5 - Input/output error
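In case it helps, this is roughly what I was going to check next on the node, to see whether the kernel or the SSD behind vms-ssd logged anything when the backup aborted (/dev/sdX is just a placeholder for whichever disk actually backs the pool):
# kernel messages around the time the backup job died
journalctl -k --since "2021-12-08 04:05" --until "2021-12-08 04:15"
# SMART health and error log of the underlying SSD (placeholder device name)
smartctl -a /dev/sdX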
When I try to migrate it to the other node, it fails with:
drive-scsi0: transferred 25.5 GiB of 128.0 GiB (19.92%) in 4m 11s
drive-scsi0: Cancelling block job
drive-scsi0: Done.
2021-12-08 09:27:18 ERROR: online migrate failure - block job (mirror) error: drive-scsi0: 'mirror' has been cancelled
2021-12-08 09:27:18 aborting phase 2 - cleanup resources
2021-12-08 09:27:18 migrate_cancel
2021-12-08 09:27:23 ERROR: migration finished with problems (duration 00:04:24)
TASK ERROR: migration problems
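Both jobs die at roughly the same point (~25.5 GiB into the 128 GiB disk), so I suspect a read error at a specific spot on the source disk. This is the kind of read test I was thinking of to try to reproduce it outside of backup/migration (the zvol path is my assumption based on the storage name, so it may need adjusting):
# read the whole zvol and see whether/where it throws an I/O error
# (path assumed from the vms-ssd storage name; check with: ls /dev/zvol/)
dd if=/dev/zvol/vms-ssd/vm-104-disk-0 of=/dev/null bs=1M status=progress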
This is the config of the VM:
root@pve:~# qm config 104
boot: order=scsi0;ide2;net0
cores: 4
ide2: none,media=cdrom
memory: 12288
name: sidious-Kuberneties
net0: virtio=26:4C:E1:E4:E2:E1,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: l26
scsi0: vms-ssd:vm-104-disk-0,size=128G
scsihw: virtio-scsi-pci
smbios1: uuid=8390c86e-c755-4f84-90c9-1ef3681c33aa
sockets: 2
vmgenid: 888a22d7-1ee5-4f5e-9e73-1009ec4710e1
vms-ssd is a ZFS pool
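In case it matters, this is how I plan to check the pool itself for errors (assuming the ZFS pool is actually named vms-ssd like the storage):
# current pool state plus any read/write/checksum error counters
zpool status -v vms-ssd
# full scrub so ZFS re-reads and verifies every block
zpool scrub vms-ssd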
PVE Versions:
root@pve:~# pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.13.19-1-pve)
pve-manager: 7.1-7 (running version: 7.1-7/df5740ad)
pve-kernel-helper: 7.1-6
pve-kernel-5.13: 7.1-5
pve-kernel-5.11: 7.0-10
pve-kernel-5.4: 6.4-7
pve-kernel-5.3: 6.1-6
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.13.19-1-pve: 5.13.19-3
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.4.143-1-pve: 5.4.143-1
pve-kernel-5.4.140-1-pve: 5.4.140-1
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.10-1-pve: 5.3.10-1
ceph: 16.2.6-pve2
ceph-fuse: 16.2.6-pve2
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: 0.8.36+pve1
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-14
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.0-4
libpve-storage-perl: 7.0-15
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.1.2-1
proxmox-backup-file-restore: 2.1.2-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-4
pve-cluster: 7.1-2
pve-container: 4.1-2
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-3
pve-ha-manager: 3.3-1
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.0-3
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-4
smartmontools: 7.2-pve2
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3
I have 6 VMs and 2 containers split evenly across the 2 nodes, and this is the only one having an issue.
I'm not sure what or where to look.