Testing live migration on a 4-node quorate cluster. It does not happen in 100% of cases, but it is reproducible. I migrate a VM from one node to another and get this:
Code:
task started by HA resource agent
Oct 09 22:04:22 starting migration of VM 100 to node 'virt2n2-la' (38.102.250.229)
Oct 09 22:04:22 copying disk images
Oct 09 22:04:22 starting VM 100 on remote node 'virt2n2-la'
Oct 09 22:04:24 starting ssh migration tunnel
Oct 09 22:04:24 starting online/live migration on localhost:60000
Oct 09 22:04:24 migrate_set_speed: 8589934592
Oct 09 22:04:24 migrate_set_downtime: 0.1
Oct 09 22:04:26 migration status: active (transferred 240337644, remaining 61005824), total 2156601344)
Oct 09 22:04:26 migration xbzrle cachesize: 134217728 transferred 0 pages 0 cachemiss 0 overflow 0
Oct 09 22:04:28 migration speed: 512.00 MB/s - downtime 11 ms
Oct 09 22:04:28 migration status: completed
Oct 09 22:04:29 ERROR: unable to find configuration file for VM 100 - no such machine
Oct 09 22:04:29 ERROR: command '/usr/bin/ssh -o 'BatchMode=yes' root@38.102.250.229 qm resume 100 --skiplock' failed: exit code 2
Oct 09 22:04:31 ERROR: migration finished with problems (duration 00:00:09)
TASK ERROR: migration problems
The VM is on the target node, but in a paused state.
When the migration is successful, I see the following sequence:
Code:
virt2n2-la HA 100 Migrate
Executing HA migrate for VM 100 to node virt2n1-la
TASK OK
virt2n1-la VM 100 Start
Running as unit 100.scope.
libust[43261/43261]: Error: Error cancelling global ust listener thread: No such process (in lttng_ust_exit() at lttng-ust-comm.c:1592)
libust[43261/43261]: Error: Error cancelling local ust listener thread: No such process (in lttng_ust_exit() at lttng-ust-comm.c:1601)
migration listens on tcp:[localhost]:60000
TASK OK
virt2n2-la VM 100 Migrate
task started by HA resource agent
Oct 09 22:03:06 starting migration of VM 100 to node 'virt2n1-la' (38.102.250.228)
Oct 09 22:03:06 copying disk images
Oct 09 22:03:06 starting VM 100 on remote node 'virt2n1-la'
Oct 09 22:03:07 starting ssh migration tunnel
Oct 09 22:03:08 starting online/live migration on localhost:60000
Oct 09 22:03:08 migrate_set_speed: 8589934592
Oct 09 22:03:08 migrate_set_downtime: 0.1
Oct 09 22:03:10 migration status: active (transferred 239552889, remaining 60329984), total 2156601344)
Oct 09 22:03:10 migration xbzrle cachesize: 134217728 transferred 0 pages 0 cachemiss 0 overflow 0
Oct 09 22:03:12 migration speed: 512.00 MB/s - downtime 19 ms
Oct 09 22:03:12 migration status: completed
Oct 09 22:03:15 migration finished successfully (duration 00:00:09)
TASK OK
virt2n1-la VM 100 Resume
TASK OK
When the migration fails to complete, I get:
Code:
virt2n1-la HA 100 Migrate
Executing HA migrate for VM 100 to node virt2n2-la
TASK OK
virt2n1-la VM 100 Migrate
task started by HA resource agent
Oct 09 22:04:22 starting migration of VM 100 to node 'virt2n2-la' (38.102.250.229)
Oct 09 22:04:22 copying disk images
Oct 09 22:04:22 starting VM 100 on remote node 'virt2n2-la'
Oct 09 22:04:24 starting ssh migration tunnel
Oct 09 22:04:24 starting online/live migration on localhost:60000
Oct 09 22:04:24 migrate_set_speed: 8589934592
Oct 09 22:04:24 migrate_set_downtime: 0.1
Oct 09 22:04:26 migration status: active (transferred 240337644, remaining 61005824), total 2156601344)
Oct 09 22:04:26 migration xbzrle cachesize: 134217728 transferred 0 pages 0 cachemiss 0 overflow 0
Oct 09 22:04:28 migration speed: 512.00 MB/s - downtime 11 ms
Oct 09 22:04:28 migration status: completed
Oct 09 22:04:29 ERROR: unable to find configuration file for VM 100 - no such machine
Oct 09 22:04:29 ERROR: command '/usr/bin/ssh -o 'BatchMode=yes' root@38.102.250.229 qm resume 100 --skiplock' failed: exit code 2
Oct 09 22:04:31 ERROR: migration finished with problems (duration 00:00:09)
TASK ERROR: migration problems
virt2n2-la VM 100 Start
Running as unit 100.scope.
libust[41997/41997]: Error: Error cancelling global ust listener thread: No such process (in lttng_ust_exit() at lttng-ust-comm.c:1592)
libust[41997/41997]: Error: Error cancelling local ust listener thread: No such process (in lttng_ust_exit() at lttng-ust-comm.c:1601)
migration listens on tcp:[localhost]:60000
TASK OK
No resume, as you can see.
I can run resume manually and the VM starts working, but that is not an acceptable solution.
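In case it helps anyone hitting the same state, this is a sketch of the manual recovery I mean (the VM ID, node, and the --skiplock flag are taken from the failed task log above; run it on the target node):

```shell
# On the target node (virt2n2-la in my case): check the VM state.
# With --verbose, a missed resume shows up as "qmpstatus: paused".
qm status 100 --verbose

# Resume manually, skipping the migration lock - the same command the
# migration task tried (and failed) to run over ssh.
qm resume 100 --skiplock
```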
Code:
VM Config
bootdisk: virtio0
cores: 2
cpu: host
hotplug: disk,network
ide2: none,media=cdrom
localtime: 1
memory: 2048
name: monitor1n1-la.siptalk.com
net0: virtio=D2:2B:82:35:92:EC,bridge=vmbr0,firewall=1,queues=8
net1: virtio=B2:72:6B:66:44:E4,bridge=vmbr1,firewall=1
numa: 0
ostype: l26
protection: 1
scsihw: virtio-scsi-pci
smbios1: uuid=f38594bd-b5b0-4253-8a95-f192ec4cfdf7
sockets: 2
virtio0: ceph:vm-100-disk-1,discard=on,backup=no,iothread=on,size=32G
Cluster config (corosync.conf):
Code:
logging {
  debug: on
  to_syslog: yes
}

nodelist {
  node {
    nodeid: 4
    quorum_votes: 1
    ring0_addr: virt2n4-la
    ring1_addr: virt2n4-la-int
  }
  node {
    nodeid: 2
    quorum_votes: 1
    ring0_addr: virt2n2-la
    ring1_addr: virt2n2-la-int
  }
  node {
    nodeid: 3
    quorum_votes: 1
    ring0_addr: virt2n3-la
    ring1_addr: virt2n3-la-int
  }
  node {
    nodeid: 1
    quorum_votes: 1
    ring0_addr: virt2n1-la
    ring1_addr: virt2n1-la-int
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: virt2-jcluster
  config_version: 4
  ip_version: ipv4
  rrp_mode: passive
  secauth: on
  version: 2
  interface {
    mcastaddr: 239.192.170.222
    mcastport: 5405
    ringnumber: 0
    ttl: 1
  }
  interface {
    mcastaddr: 239.192.170.223
    mcastport: 5405
    ringnumber: 1
    ttl: 1
  }
}
pvecm status
Code:
Quorum information
------------------
Date: Fri Oct 9 22:28:04 2015
Quorum provider: corosync_votequorum
Nodes: 4
Node ID: 0x00000002
Ring ID: 368
Quorate: Yes
Votequorum information
----------------------
Expected votes: 4
Highest expected: 4
Total votes: 4
Quorum: 3
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
0x00000001 1 38.102.250.228
0x00000002 1 38.102.250.229 (local)
0x00000003 1 38.102.250.231
0x00000004 1 38.102.250.232
Syslog on virt2n1-la (source node of this migration):
Code:
Oct 9 22:04:11 virt2n1-la pvedaemon[42548]: <root@pam> starting task UPID:virt2n1-la:0000AAB9:000DB8F1:56189C4B:hamigrate:100:root@pam:
Oct 9 22:04:22 virt2n1-la pve-ha-lrm[43783]: <root@pam> starting task UPID:virt2n1-la:0000AB08:000DBD8B:56189C56:qmigrate:100:root@pam:
Oct 9 22:04:23 virt2n1-la pmxcfs[1569]: [status] notice: received log
Oct 9 22:04:23 virt2n1-la pve-ha-lrm[43783]: Task still active, waiting
Oct 9 22:04:24 virt2n1-la pmxcfs[1569]: [status] notice: received log
Oct 9 22:04:24 virt2n1-la pve-ha-lrm[43783]: Task still active, waiting
Oct 9 22:04:25 virt2n1-la pve-ha-lrm[43783]: Task still active, waiting
Oct 9 22:04:26 virt2n1-la pve-ha-lrm[43783]: Task still active, waiting
Oct 9 22:04:27 virt2n1-la pve-ha-lrm[43783]: Task still active, waiting
Oct 9 22:04:28 virt2n1-la pve-ha-lrm[43783]: Task still active, waiting
Oct 9 22:04:29 virt2n1-la kernel: [ 9018.495736] fwbr100i0: port 2(tap100i0) entered disabled state
Oct 9 22:04:29 virt2n1-la kernel: [ 9018.497498] fwbr100i0: port 1(fwln100i0) entered disabled state
Oct 9 22:04:29 virt2n1-la kernel: [ 9018.497707] vmbr0: port 3(fwpr100p0) entered disabled state
Oct 9 22:04:29 virt2n1-la kernel: [ 9018.497862] device fwln100i0 left promiscuous mode
Oct 9 22:04:29 virt2n1-la kernel: [ 9018.497498] fwbr100i0: port 1(fwln100i0) entered disabled state
Oct 9 22:04:29 virt2n1-la kernel: [ 9018.497707] vmbr0: port 3(fwpr100p0) entered disabled state
Oct 9 22:04:29 virt2n1-la kernel: [ 9018.497862] device fwln100i0 left promiscuous mode
Oct 9 22:04:29 virt2n1-la kernel: [ 9018.497863] fwbr100i0: port 1(fwln100i0) entered disabled state
Oct 9 22:04:29 virt2n1-la kernel: [ 9018.498158] device fwpr100p0 left promiscuous mode
Oct 9 22:04:29 virt2n1-la kernel: [ 9018.498160] vmbr0: port 3(fwpr100p0) entered disabled state
Oct 9 22:04:29 virt2n1-la kernel: [ 9018.688578] fwbr100i1: port 2(tap100i1) entered disabled state
Oct 9 22:04:29 virt2n1-la kernel: [ 9018.690246] fwbr100i1: port 1(fwln100i1) entered disabled state
Oct 9 22:04:29 virt2n1-la kernel: [ 9018.690449] vmbr1: port 2(fwpr100p1) entered disabled state
Oct 9 22:04:29 virt2n1-la kernel: [ 9018.690611] device fwln100i1 left promiscuous mode
Oct 9 22:04:29 virt2n1-la kernel: [ 9018.690612] fwbr100i1: port 1(fwln100i1) entered disabled state
Oct 9 22:04:29 virt2n1-la kernel: [ 9018.690965] device fwpr100p1 left promiscuous mode
Oct 9 22:04:29 virt2n1-la kernel: [ 9018.690966] vmbr1: port 2(fwpr100p1) entered disabled state
Oct 9 22:04:29 virt2n1-la pve-ha-lrm[43783]: Task still active, waiting
Oct 9 22:04:30 virt2n1-la pve-ha-lrm[43783]: Task still active, waiting
Oct 9 22:04:31 virt2n1-la ntpd[7311]: Deleting interface #111 fwpr100p1, fe80::20ff:3fff:fe3f:8696#123, interface stats: received=0, sent=0, dropped=0, active_time=80 secs
Oct 9 22:04:31 virt2n1-la ntpd[7311]: Deleting interface #110 fwln100i0, fe80::f8a4:70ff:fe6b:a1a3#123, interface stats: received=0, sent=0, dropped=0, active_time=80 secs
Oct 9 22:04:31 virt2n1-la ntpd[7311]: Deleting interface #109 tap100i1, fe80::f430:43ff:fe85:eda5#123, interface stats: received=0, sent=0, dropped=0, active_time=80 secs
Oct 9 22:04:31 virt2n1-la ntpd[7311]: Deleting interface #108 fwbr100i0, fe80::e8f2:d8ff:fe73:5140#123, interface stats: received=0, sent=0, dropped=0, active_time=80 secs
Oct 9 22:04:31 virt2n1-la ntpd[7311]: Deleting interface #107 tap100i0, fe80::8467:feff:fe10:c470#123, interface stats: received=0, sent=0, dropped=0, active_time=80 secs
Oct 9 22:04:31 virt2n1-la ntpd[7311]: Deleting interface #106 fwbr100i1, fe80::f0bb:17ff:fe40:f487#123, interface stats: received=0, sent=0, dropped=0, active_time=80 secs
Oct 9 22:04:31 virt2n1-la ntpd[7311]: Deleting interface #105 fwpr100p0, fe80::b043:31ff:fe57:5778#123, interface stats: received=0, sent=0, dropped=0, active_time=80 secs
Oct 9 22:04:31 virt2n1-la ntpd[7311]: Deleting interface #104 fwln100i1, fe80::e89c:2bff:fe2d:ebd7#123, interface stats: received=0, sent=0, dropped=0, active_time=80 secs
Oct 9 22:04:31 virt2n1-la ntpd[7311]: peers refreshed
Oct 9 22:04:31 virt2n1-la pve-ha-lrm[43784]: migration problems
Oct 9 22:04:31 virt2n1-la pve-ha-lrm[43783]: <root@pam> end task UPID:virt2n1-la:0000AB08:000DBD8B:56189C56:qmigrate:100:root@pam: migration problems
Syslog on virt2n2-la (target node of this migration):
Code:
Oct 9 22:04:23 virt2n2-la qm[41978]: <root@pam> starting task UPID:virt2n2-la:0000A3FE:000CA705:56189C57:qmstart:100:root@pam:
Oct 9 22:04:23 virt2n2-la qm[41982]: start VM 100: UPID:virt2n2-la:0000A3FE:000CA705:56189C57:qmstart:100:root@pam:
Oct 9 22:04:23 virt2n2-la kernel: [ 8298.986081] device tap100i0 entered promiscuous mode
Oct 9 22:04:23 virt2n2-la kernel: [ 8298.999154] IPv6: ADDRCONF(NETDEV_UP): fwln100i0: link is not ready
Oct 9 22:04:23 virt2n2-la kernel: [ 8299.001568] IPv6: ADDRCONF(NETDEV_CHANGE): fwln100i0: link becomes ready
Oct 9 22:04:23 virt2n2-la kernel: [ 8299.003493] device fwln100i0 entered promiscuous mode
Oct 9 22:04:23 virt2n2-la kernel: [ 8299.003693] fwbr100i0: port 1(fwln100i0) entered forwarding state
Oct 9 22:04:23 virt2n2-la kernel: [ 8299.003698] fwbr100i0: port 1(fwln100i0) entered forwarding state
Oct 9 22:04:23 virt2n2-la kernel: [ 8299.005755] device fwpr100p0 entered promiscuous mode
Oct 9 22:04:23 virt2n2-la kernel: [ 8299.005845] vmbr0: port 2(fwpr100p0) entered forwarding state
Oct 9 22:04:23 virt2n2-la kernel: [ 8299.005852] vmbr0: port 2(fwpr100p0) entered forwarding state
Oct 9 22:04:23 virt2n2-la kernel: [ 8299.005852] vmbr0: port 2(fwpr100p0) entered forwarding state
Oct 9 22:04:23 virt2n2-la kernel: [ 8299.007633] fwbr100i0: port 2(tap100i0) entered forwarding state
Oct 9 22:04:23 virt2n2-la kernel: [ 8299.007651] fwbr100i0: port 2(tap100i0) entered forwarding state
Oct 9 22:04:24 virt2n2-la kernel: [ 8299.327133] device tap100i1 entered promiscuous mode
Oct 9 22:04:24 virt2n2-la kernel: [ 8299.339725] IPv6: ADDRCONF(NETDEV_UP): fwln100i1: link is not ready
Oct 9 22:04:24 virt2n2-la kernel: [ 8299.342323] IPv6: ADDRCONF(NETDEV_CHANGE): fwln100i1: link becomes ready
Oct 9 22:04:24 virt2n2-la kernel: [ 8299.344602] device fwln100i1 entered promiscuous mode
Oct 9 22:04:24 virt2n2-la kernel: [ 8299.344817] fwbr100i1: port 1(fwln100i1) entered forwarding state
Oct 9 22:04:24 virt2n2-la kernel: [ 8299.344835] fwbr100i1: port 1(fwln100i1) entered forwarding state
Oct 9 22:04:24 virt2n2-la kernel: [ 8299.346551] device fwpr100p1 entered promiscuous mode
Oct 9 22:04:24 virt2n2-la kernel: [ 8299.346782] vmbr1: port 2(fwpr100p1) entered forwarding state
Oct 9 22:04:24 virt2n2-la kernel: [ 8299.346798] vmbr1: port 2(fwpr100p1) entered forwarding state
Oct 9 22:04:24 virt2n2-la kernel: [ 8299.348441] fwbr100i1: port 2(tap100i1) entered forwarding state
Oct 9 22:04:24 virt2n2-la kernel: [ 8299.348448] fwbr100i1: port 2(tap100i1) entered forwarding state
Oct 9 22:04:24 virt2n2-la qm[41978]: <root@pam> end task UPID:virt2n2-la:0000A3FE:000CA705:56189C57:qmstart:100:root@pam: OK
Oct 9 22:04:28 virt2n2-la ntpd[4117]: Listen normally on 108 tap100i0 fe80::4032:98ff:fe87:eeac UDP 123
Oct 9 22:04:28 virt2n2-la ntpd[4117]: Listen normally on 109 fwln100i1 fe80::4c3a:93ff:fe81:443b UDP 123
Oct 9 22:04:28 virt2n2-la ntpd[4117]: Listen normally on 110 tap100i1 fe80::7051:11ff:fed2:566e UDP 123
Oct 9 22:04:28 virt2n2-la ntpd[4117]: Listen normally on 111 fwln100i0 fe80::7c20:dcff:fe33:9bc9 UDP 123
Oct 9 22:04:28 virt2n2-la ntpd[4117]: Listen normally on 112 fwpr100p1 fe80::9861:84ff:fef2:fa66 UDP 123
Oct 9 22:04:28 virt2n2-la ntpd[4117]: Listen normally on 113 fwbr100i1 fe80::1c87:bfff:fee4:4276 UDP 123
Oct 9 22:04:28 virt2n2-la ntpd[4117]: Listen normally on 114 fwbr100i0 fe80::94f8:f6ff:fe65:89e6 UDP 123
Oct 9 22:04:28 virt2n2-la ntpd[4117]: Listen normally on 115 fwpr100p0 fe80::5411:a6ff:fee1:a128 UDP 123
Oct 9 22:04:28 virt2n2-la ntpd[4117]: peers refreshed
Also, when I run pvecm status, I get this in syslog (not sure if it is relevant):
Code:
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] IPC credentials authenticated (2261-61669-27)
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] connecting to client [61669]
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Oct 9 22:48:44 virt2n2-la corosync[2261]: [MAIN ] connection created
Oct 9 22:48:44 virt2n2-la corosync[2261]: [CMAP ] lib_init_fn: conn=0x562af4146f30
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] IPC credentials authenticated (2261-61669-28)
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] connecting to client [61669]
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Oct 9 22:48:44 virt2n2-la corosync[2261]: [MAIN ] connection created
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QUORUM] lib_init_fn: conn=0x562af4148530
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QUORUM] got quorum_type request on 0x562af4148530
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] IPC credentials authenticated (2261-61669-29)
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] connecting to client [61669]
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Oct 9 22:48:44 virt2n2-la corosync[2261]: [MAIN ] connection created
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] IPC credentials authenticated (2261-61669-30)
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] connecting to client [61669]
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Oct 9 22:48:44 virt2n2-la corosync[2261]: [MAIN ] connection created
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QUORUM] got quorate request on 0x562af4148530
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QUORUM] got trackstart request on 0x562af4148530
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QUORUM] sending initial status to 0x562af4148530
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QUORUM] sending quorum notification to 0x562af4148530, length = 64
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QUORUM] got trackstop request on 0x562af4148530
Oct 9 22:48:44 virt2n2-la corosync[2261]: [VOTEQ ] got getinfo request on 0x562af414bdc0 for node 2
Oct 9 22:48:44 virt2n2-la corosync[2261]: [VOTEQ ] getinfo response error: 1
Oct 9 22:48:44 virt2n2-la corosync[2261]: [VOTEQ ] got getinfo request on 0x562af414bdc0 for node 1
Oct 9 22:48:44 virt2n2-la corosync[2261]: [VOTEQ ] getinfo response error: 1
Oct 9 22:48:44 virt2n2-la corosync[2261]: [VOTEQ ] got getinfo request on 0x562af414bdc0 for node 2
Oct 9 22:48:44 virt2n2-la corosync[2261]: [VOTEQ ] getinfo response error: 1
Oct 9 22:48:44 virt2n2-la corosync[2261]: [VOTEQ ] got getinfo request on 0x562af414bdc0 for node 3
Oct 9 22:48:44 virt2n2-la corosync[2261]: [VOTEQ ] getinfo response error: 1
Oct 9 22:48:44 virt2n2-la corosync[2261]: [VOTEQ ] got getinfo request on 0x562af414bdc0 for node 4
Oct 9 22:48:44 virt2n2-la corosync[2261]: [VOTEQ ] getinfo response error: 1
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] HUP conn (2261-61669-27)
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] qb_ipcs_disconnect(2261-61669-27) state:2
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] epoll_ctl(del): Bad file descriptor (9)
Oct 9 22:48:44 virt2n2-la corosync[2261]: [MAIN ] cs_ipcs_connection_closed()
Oct 9 22:48:44 virt2n2-la corosync[2261]: [CMAP ] exit_fn for conn=0x562af4146f30
Oct 9 22:48:44 virt2n2-la corosync[2261]: [MAIN ] cs_ipcs_connection_destroyed()
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] Free'ing ringbuffer: /dev/shm/qb-cmap-response-2261-61669-27-header
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] Free'ing ringbuffer: /dev/shm/qb-cmap-event-2261-61669-27-header
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] Free'ing ringbuffer: /dev/shm/qb-cmap-request-2261-61669-27-header
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] HUP conn (2261-61669-28)
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] qb_ipcs_disconnect(2261-61669-28) state:2
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] epoll_ctl(del): Bad file descriptor (9)
Oct 9 22:48:44 virt2n2-la corosync[2261]: [MAIN ] cs_ipcs_connection_closed()
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QUORUM] lib_exit_fn: conn=0x562af4148530
Oct 9 22:48:44 virt2n2-la corosync[2261]: [MAIN ] cs_ipcs_connection_destroyed()
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] Free'ing ringbuffer: /dev/shm/qb-quorum-response-2261-61669-28-header
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] Free'ing ringbuffer: /dev/shm/qb-quorum-event-2261-61669-28-header
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] Free'ing ringbuffer: /dev/shm/qb-quorum-request-2261-61669-28-header
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] HUP conn (2261-61669-29)
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] qb_ipcs_disconnect(2261-61669-29) state:2
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] epoll_ctl(del): Bad file descriptor (9)
Oct 9 22:48:44 virt2n2-la corosync[2261]: [MAIN ] cs_ipcs_connection_closed()
Oct 9 22:48:44 virt2n2-la corosync[2261]: [MAIN ] cs_ipcs_connection_destroyed()
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] Free'ing ringbuffer: /dev/shm/qb-cfg-response-2261-61669-29-header
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] Free'ing ringbuffer: /dev/shm/qb-cfg-event-2261-61669-29-header
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] Free'ing ringbuffer: /dev/shm/qb-cfg-request-2261-61669-29-header
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] HUP conn (2261-61669-30)
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] qb_ipcs_disconnect(2261-61669-30) state:2
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] epoll_ctl(del): Bad file descriptor (9)
Oct 9 22:48:44 virt2n2-la corosync[2261]: [MAIN ] cs_ipcs_connection_closed()
Oct 9 22:48:44 virt2n2-la corosync[2261]: [MAIN ] cs_ipcs_connection_destroyed()
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] Free'ing ringbuffer: /dev/shm/qb-votequorum-response-2261-61669-30-header
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] Free'ing ringbuffer: /dev/shm/qb-votequorum-event-2261-61669-30-header
Oct 9 22:48:44 virt2n2-la corosync[2261]: [QB ] Free'ing ringbuffer: /dev/shm/qb-votequorum-request-2261-61669-30-header
I tried changing hardware and options on the VM, and I tested migration both with and without HA; it is reproducible in both scenarios.
I don't remember these issues when I tested 3.4.