Hello,
I've installed a clustered Proxmox environment, version 5.2-2:
Code:
root@srv-pve1:~# pveversion -v
proxmox-ve: 5.2-2 (running kernel: 4.15.18-7-pve)
pve-manager: 5.2-9 (running version: 5.2-9/4b30e8f9)
pve-kernel-4.15: 5.2-10
pve-kernel-4.15.18-7-pve: 4.15.18-27
pve-kernel-4.15.17-1-pve: 4.15.17-9
ceph: 12.2.8-pve1
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-40
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-30
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-2
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-20
pve-cluster: 5.0-30
pve-container: 2.0-28
pve-docs: 5.2-8
pve-firewall: 3.0-14
pve-firmware: 2.0-5
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.2-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-36
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.11-pve1~bpo1
The cluster is based on three identical nodes, running the exact same software versions on the same physical hardware.
The cluster is healthy:
Code:
root@srv-pve1:~# pvecm status
Quorum information
------------------
Date: Wed Oct 17 07:57:12 2018
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000001
Ring ID: 1/64
Quorate: Yes
Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
0x00000001 1 10.0.0.101 (local)
0x00000002 1 10.0.0.102
0x00000003 1 10.0.0.103
The storage is a Ceph cluster, which is also healthy:
Code:
root@srv-pve1:~# ceph-brag
{
"cluster_creation_date": "2018-10-11 17:51:26.259055",
"uuid": "dd52bfc1-5409-4730-8f3a-72637478418a",
"components_count": {
"num_data_bytes": 20857718616,
"num_mons": 3,
"num_pgs": 768,
"num_mdss": 0,
"num_pools": 2,
"num_osds": 18,
"num_bytes_total": 72002146295808,
"num_objects": 5911
},
"crush_types": [
{
"count": 6,
"type": "host"
},
{
"count": 2,
"type": "root"
},
{
"count": 18,
"type": "devices"
}
],
"ownership": {},
"pool_metadata": [
{
"type": 1,
"id": 6,
"size": 3
},
{
"type": 1,
"id": 8,
"size": 2
}
],
"sysinfo": {
"kernel_types": [
{
"count": 18,
"type": "#1 SMP PVE 4.15.18-27 (Wed, 10 Oct 2018 10:50:11 +0200)"
}
],
"cpu_archs": [
{
"count": 18,
"arch": "x86_64"
}
],
"cpus": [
{
"count": 18,
"cpu": "Intel(R) Xeon(R) CPU E5-2440 0 @ 2.40GHz"
}
],
"kernel_versions": [
{
"count": 18,
"version": "4.15.18-7-pve"
}
],
"ceph_versions": [
{
"count": 18,
"version": "12.2.8(6f01265ca03a6b9d7f3b7f759d8894bb9dbb6840)"
}
],
"os_info": [
{
"count": 18,
"os": "Linux"
}
],
"distros": []
}
}
For whatever reason, I cannot live-migrate a VM:
Code:
2018-10-17 07:48:19 starting migration of VM 101 to node 'srv-pve2' (192.168.1.102)
2018-10-17 07:48:19 copying disk images
2018-10-17 07:48:19 starting VM 101 on remote node 'srv-pve2'
2018-10-17 07:48:21 ERROR: online migrate failure - unable to detect remote migration address
2018-10-17 07:48:21 aborting phase 2 - cleanup resources
2018-10-17 07:48:21 migrate_cancel
2018-10-17 07:48:22 ERROR: migration finished with problems (duration 00:00:04)
TASK ERROR: migration problems
I suspected DNS issues, since SSH session initiation between the nodes is longish (~1.5 s), so I added host entries to the /etc/hosts file, but to no avail:
Code:
root@srv-pve1:~# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.1.101 srv-pve1.mydomain.local srv-pve1 pvelocalhost
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
10.0.0.101 srv-pve1-private
10.0.0.102 srv-pve2-private
10.0.0.103 srv-pve3-private
192.168.1.101 srv-pve1
192.168.1.102 srv-pve2
192.168.1.103 srv-pve3
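In case it helps, here is how I've been checking name resolution on each node; as I understand it, the migration code derives the target address from a lookup of the node's name. This is just a diagnostic sketch, and the 10.0.0.0/24 migration-network value is only my guess based on the corosync ring addresses above:

```shell
# On each node, check that the node's own name resolves to a single,
# correct address (this is what the target address is derived from):
getent hosts "$(hostname)"

# Sanity check: the hosts file is present and readable.
test -r /etc/hosts

# The migration network can also be pinned explicitly in
# /etc/pve/datacenter.cfg (10.0.0.0/24 is an assumption on my part,
# taken from the 10.0.0.x corosync addresses shown earlier):
#   migration: secure,network=10.0.0.0/24
```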
DuckDuckGo has failed me so far, and I cannot find a reasonable explanation for this issue.
Does anyone have an idea?