We have a three-node Proxmox cluster. An NFS share (VMstore) is used to store the VM disks, and an NFS ISO volume has also been added to the cluster.
There are currently three VMs on this cluster. When we migrate a VM from one node to another, the migration sometimes fails with a message that the nfsiso or VMstore storage is not online. However, when we check with df -h, there is no issue with the NFS volume, and if we retry the migration it works. I am wondering how the migration worked the second time without anything being changed.
Proxmox version is 6.1-3.
Logs are attached below. Can anyone help with this?
2020-05-13 05:06:26 starting migration of VM 101 to node 'ascchypsrv3' (172.22.176.53)
2020-05-13 05:06:26 starting VM 101 on remote node 'ascchypsrv3'
2020-05-13 05:06:28 storage 'nfsiso' is not online
2020-05-13 05:06:28 ERROR: online migrate failure - command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=ascchypsrv3' root@172.22.176.53 qm start 101 --skiplock --migratedfrom ascchypsrv1 --migration_type secure --stateuri unix --machine pc-i440fx-4.1+pve1' failed: exit code 255
2020-05-13 05:06:28 aborting phase 2 - cleanup resources
2020-05-13 05:06:28 migrate_cancel
2020-05-13 05:06:29 ERROR: migration finished with problems (duration 00:00:04)
TASK ERROR: migration problems
ping vigyaan-scn.issdc.gov.in
root@ascchypsrv1:~# qm migrate 102 ascchypsrv2
can't migrate running VM without --online
root@ascchypsrv1:~# qm migrate 102 ascchypsrv2 --online
2020-05-13 05:27:55 starting migration of VM 102 to node 'ascchypsrv2' (172.22.176.52)
2020-05-13 05:27:55 starting VM 102 on remote node 'ascchypsrv2'
2020-05-13 05:27:58 storage 'nfsiso' is not online
2020-05-13 05:27:58 ERROR: online migrate failure - command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=ascchypsrv2' root@172.22.176.52 qm start 102 --skiplock --migratedfrom ascchypsrv1 --migration_type secure --stateuri unix --machine pc-i440fx-4.1+pve1' failed: exit code 255
2020-05-13 05:27:58 aborting phase 2 - cleanup resources
2020-05-13 05:27:58 migrate_cancel
2020-05-13 05:27:58 ERROR: migration finished with problems (duration 00:00:04)
migration problems
root@ascchypsrv1:~# qm migrate 102 ascchypsrv2 --online
2020-05-13 05:28:09 starting migration of VM 102 to node 'ascchypsrv2' (172.22.176.52)
2020-05-13 05:28:10 starting VM 102 on remote node 'ascchypsrv2'
2020-05-13 05:28:11 start remote tunnel
2020-05-13 05:28:12 ssh tunnel ver 1
2020-05-13 05:28:12 starting online/live migration on unix:/run/qemu-server/102.migrate
2020-05-13 05:28:12 migrate_set_speed: 8589934592
2020-05-13 05:28:12 migrate_set_downtime: 0.1
2020-05-13 05:28:12 set migration_caps
2020-05-13 05:28:12 set cachesize: 2147483648
2020-05-13 05:28:12 start migrate command to unix:/run/qemu-server/102.migrate
2020-05-13 05:28:13 migration status: active (transferred 100662522, remaining 14132170752), total 17197506560)
2020-05-13 05:28:13 migration xbzrle cachesize: 2147483648 transferred 0 pages 0 cachemiss 0 overflow 0
2020-05-13 05:28:14 migration status: active (transferred 187400409, remaining 9360424960), total 17197506560)
2020-05-13 05:28:14 migration xbzrle cachesize: 2147483648 transferred 0 pages 0 cachemiss 0 overflow 0
2020-05-13 05:28:15 migration status: active (transferred 256946683, remaining 1051828224), total 17197506560)
2020-05-13 05:28:15 migration xbzrle cachesize: 2147483648 transferred 0 pages 0 cachemiss 0 overflow 0
2020-05-13 05:28:16 migration status: active (transferred 374636235, remaining 925507584), total 17197506560)
2020-05-13 05:28:16 migration xbzrle cachesize: 2147483648 transferred 0 pages 0 cachemiss 0 overflow 0
2020-05-13 05:28:17 migration status: active (transferred 492379103, remaining 786104320), total 17197506560)
2020-05-13 05:28:17 migration xbzrle cachesize: 2147483648 transferred 0 pages 0 cachemiss 0 overflow 0
2020-05-13 05:28:18 migration status: active (transferred 610169860, remaining 662179840), total 17197506560)
2020-05-13 05:28:18 migration xbzrle cachesize: 2147483648 transferred 0 pages 0 cachemiss 0 overflow 0
2020-05-13 05:28:19 migration status: active (transferred 728040474, remaining 542912512), total 17197506560)
2020-05-13 05:28:19 migration xbzrle cachesize: 2147483648 transferred 0 pages 0 cachemiss 0 overflow 0
2020-05-13 05:28:20 migration status: active (transferred 845673488, remaining 410640384), total 17197506560)
2020-05-13 05:28:20 migration xbzrle cachesize: 2147483648 transferred 0 pages 0 cachemiss 0 overflow 0
2020-05-13 05:28:21 migration status: active (transferred 962757250, remaining 224559104), total 17197506560)
2020-05-13 05:28:21 migration xbzrle cachesize: 2147483648 transferred 0 pages 0 cachemiss 0 overflow 0
2020-05-13 05:28:22 migration status: active (transferred 1080373938, remaining 107171840), total 17197506560)
2020-05-13 05:28:22 migration xbzrle cachesize: 2147483648 transferred 0 pages 0 cachemiss 0 overflow 0
2020-05-13 05:28:23 migration speed: 1489.45 MB/s - downtime 98 ms
2020-05-13 05:28:23 migration status: completed
2020-05-13 05:28:25 migration finished successfully (duration 00:00:16)
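To see how often this happens and which storage is reported each time, the task logs can be scanned for the "is not online" line. The following is only a minimal sketch in Python: the function name `offline_events` is made up for illustration, and the line format is assumed to match the log lines pasted above.

```python
import re

# Matches task-log lines such as:
#   2020-05-13 05:06:28 storage 'nfsiso' is not online
# (format taken from the logs in this post; other versions may differ)
OFFLINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"storage '(?P<storage>[^']+)' is not online"
)


def offline_events(log_text):
    """Return a list of (timestamp, storage) for each 'is not online' line."""
    events = []
    for line in log_text.splitlines():
        match = OFFLINE_RE.match(line.strip())
        if match:
            events.append((match.group("ts"), match.group("storage")))
    return events


if __name__ == "__main__":
    sample = (
        "2020-05-13 05:27:55 starting VM 102 on remote node 'ascchypsrv2'\n"
        "2020-05-13 05:27:58 storage 'nfsiso' is not online\n"
        "2020-05-13 05:27:58 aborting phase 2 - cleanup resources\n"
    )
    for ts, storage in offline_events(sample):
        print(ts, storage)
```

Running this over several failed task logs would show whether the failures cluster around one storage or one target node.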
auto lo
iface lo inet loopback
iface ens3f0 inet manual
iface ens3f1 inet manual
iface ens2f0 inet manual
iface ens2f1 inet manual
iface eno1 inet manual
iface eno2 inet manual
auto bond0
iface bond0 inet manual
bond-slaves ens2f0 ens3f0
bond-miimon 100
bond-mode active-backup
#1Gig bond
auto bond1
iface bond1 inet manual
bond-slaves eno1 eno2
bond-miimon 100
bond-mode active-backup
#10Gig bond
auto vmbr0
iface vmbr0 inet static
address 172.22.176.51
netmask 255.255.255.0
gateway 172.22.176.3
bridge-ports bond0
bridge-stp off
bridge-fd 0
#1Gig for managment
auto vmbr1
iface vmbr1 inet manual
bridge-ports bond1
bridge-stp off
bridge-fd 0
#10Gig for data/vm
root@ascchypsrv1:~# cat /etc/pve/storage.cfg
dir: local
path /var/lib/vz
content vztmpl,backup,iso
lvmthin: local-lvm
thinpool data
vgname pve
content images,rootdir
nfs: idsnasccVMStore
export /ifs/data/sonas/idsnascc-vmstore
path /mnt/pve/idsnasccVMStore
server vigyaan-scn.issdc.gov.in
content images
nfs: nfsiso
export /ifs/data/sonas/nfsiso
path /mnt/pve/nfsiso
server vigyaan-scn.issdc.gov.in
content iso
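Both NFS storages are defined by hostname rather than IP address, and in the failed runs above the "not online" message appears 2 to 3 seconds after the remote start. One thing that could be measured on the target node is how long name resolution and a TCP connection to the NFS server take. This is only a sketch to collect timings, not a confirmed diagnosis: the helper names `timed` and `probe` are made up, and port 2049 (the standard NFS port) is assumed to be the relevant one here.

```python
import socket
import sys
import time


def timed(fn, *args):
    """Run fn(*args) and return (elapsed_seconds, result, error)."""
    start = time.monotonic()
    try:
        result, error = fn(*args), None
    except OSError as exc:  # covers DNS failures and connection timeouts
        result, error = None, exc
    return time.monotonic() - start, result, error


def probe(host, port=2049, timeout=5.0):
    """Time DNS resolution and a TCP connect to host:port."""
    dns_s, infos, dns_err = timed(
        socket.getaddrinfo, host, port, 0, socket.SOCK_STREAM
    )
    if dns_err is not None:
        return {"dns_s": dns_s, "dns_error": str(dns_err)}

    address = infos[0][4][:2]  # (ip, port) of the first resolved address

    def connect():
        with socket.create_connection(address, timeout=timeout):
            return True

    tcp_s, _, tcp_err = timed(connect)
    return {
        "dns_s": dns_s,
        "tcp_s": tcp_s,
        "tcp_error": str(tcp_err) if tcp_err else None,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python3 probe.py vigyaan-scn.issdc.gov.in
    for _ in range(10):
        print(probe(sys.argv[1]))
        time.sleep(1)
```

If the DNS time occasionally spikes to a second or more on one node, that would be worth mentioning in this thread; if it is consistently small, the cause is probably elsewhere.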
root@ascchypsrv1:~# pveversion
pve-manager/6.1-3/37248ce6 (running kernel: 5.3.10-1-pve)
root@ascchypsrv1:~# corosync
May 13 08:32:03 notice [MAIN ] Corosync Cluster Engine 3.0.2 starting up
May 13 08:32:03 info [MAIN ] Corosync built-in features: dbus monitoring watchdog systemd xmlconf snmp pie relro bindnow
root@ascchypsrv1:~# corosync -version
Corosync Cluster Engine, version '3.0.2'
Copyright (c) 2006-2018 Red Hat, Inc.
root@ascchypsrv1:~#
root@ascchypsrv1:~# pvesm status
Name Type Status Total Used Available %
idsnasccVMStore nfs active 2147483648 278856704 1868626944 12.99%
local dir active 98559220 1957156 91552516 1.99%
local-lvm lvmthin active 449990656 0 449990656 0.00%
nfsiso nfs active 104857600 49415168 55442432 47.13%
root@ascchypsrv1:~# pvecm status
Cluster information
-------------------
Name: idsnascc
Config Version: 3
Transport: knet
Secure auth: on
Quorum information
------------------
Date: Wed May 13 08:33:08 2020
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000001
Ring ID: 1.54
Quorate: Yes
Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
0x00000001 1 172.22.176.51 (local)
0x00000002 1 172.22.176.52
0x00000003 1 172.22.176.53
root@ascchypsrv1:~#