Good day.
I have a two-node cluster based on pve-manager/3.1-21/93bf03d4 (running kernel: 2.6.32-26-pve), with storage configured using DRBD replication and a GFS2 filesystem. Offline migration works well, but online migration fails with this error:
Code:
Feb 17 16:55:36 starting migration of CT 100 to node 'srv1' (192.168.0.1)
Feb 17 16:55:36 container is running - using online migration
Feb 17 16:55:36 container data is on shared storage 'ssd-replica'
Feb 17 16:55:36 start live migration - suspending container
Feb 17 16:55:46 # vzctl --skiplock chkpnt 100 --suspend
Feb 17 16:55:36 Setting up checkpoint...
Feb 17 16:55:36 suspend...
Feb 17 16:55:46 Can not suspend container: Interrupted system call
Feb 17 16:55:46 Error: timed out (10 seconds).
Feb 17 16:55:46 Error: Unfrozen tasks (no more than 10): see dmesg output.
Feb 17 16:55:46 ERROR: Failed to suspend container: Checkpointing failed
Feb 17 16:55:46 aborting phase 1 - cleanup resources
Feb 17 16:55:46 start final cleanup
Feb 17 16:55:46 ERROR: migration aborted (duration 00:00:11): Failed to suspend container: Checkpointing failed
TASK ERROR: migration aborted
In syslog:
Code:
Feb 17 16:55:35 srv2 pvedaemon[3761]: <root@pam> starting task UPID:srv2:00003DA1:0003340B:530206C7:vzmigrate:100:root@pam:
Feb 17 16:55:39 srv2 pvedaemon[3761]: command '/usr/sbin/vzctl exec 100 /bin/cat /proc/net/dev' failed: exit code 8
Feb 17 16:55:42 srv2 pvestatd[4090]: command '/usr/sbin/vzctl exec 100 /bin/cat /proc/net/dev' failed: exit code 8
Feb 17 16:55:45 srv2 pvestatd[4090]: command '/usr/sbin/vzctl exec 100 /bin/cat /proc/net/dev' failed: exit code 8
Feb 17 16:55:45 srv2 pvestatd[4090]: status update time (6.070 seconds)
Feb 17 16:55:46 srv2 kernel: CPT ERR: ffff880bedae9000,100 :timed out (10 seconds).
Feb 17 16:55:46 srv2 kernel: CPT ERR: ffff880bedae9000,100 :Unfrozen tasks (no more than 10): see dmesg output.
Feb 17 16:55:46 srv2 kernel: saslauthd D ffff880beb8a71a0 0 15636 15598 100 0x00800004
Feb 17 16:55:46 srv2 kernel: ffff880c26ad1dd8 0000000000000082 0000000000000000 ffff880beb8a71a0
Feb 17 16:55:46 srv2 kernel: 0000000126ad1e48 0000000000000000 0000000000000000 ffffffffa05f2720
Feb 17 16:55:46 srv2 kernel: 0000000000000286 00000001001b3cc5 ffff880c26ad1fd8 ffff880c26ad1fd8
Feb 17 16:55:46 srv2 kernel: Call Trace:
Feb 17 16:55:46 srv2 kernel: [<ffffffff8109b40e>] ? prepare_to_wait+0x4e/0x80
Feb 17 16:55:46 srv2 kernel: [<ffffffffa05e5ea3>] dlm_posix_lock+0x193/0x360 [dlm]
Feb 17 16:55:46 srv2 kernel: [<ffffffff8109b440>] ? autoremove_wake_function+0x0/0x40
Feb 17 16:55:46 srv2 kernel: [<ffffffffa081d599>] gfs2_lock+0x79/0xf0 [gfs2]
Feb 17 16:55:46 srv2 kernel: [<ffffffff811f2ff3>] vfs_lock_file+0x23/0x40
Feb 17 16:55:46 srv2 kernel: [<ffffffff811f3693>] fcntl_setlk+0x143/0x2f0
Feb 17 16:55:46 srv2 kernel: [<ffffffff811b3c67>] sys_fcntl+0xc7/0x550
Feb 17 16:55:46 srv2 kernel: [<ffffffff81543b65>] ? page_fault+0x25/0x30
Feb 17 16:55:46 srv2 kernel: [<ffffffff8100b182>] system_call_fastpath+0x16/0x1b
Feb 17 16:55:46 srv2 pvedaemon[15777]: migration aborted
Feb 17 16:55:46 srv2 pvedaemon[3761]: <root@pam> end task UPID:srv2:00003DA1:0003340B:530206C7:vzmigrate:100:root@pam: migration aborted
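To make the trace easier to read, here is a small Python sketch that pulls the unfrozen task and its call-trace frames out of syslog lines like the ones above. It is my own helper, not part of Proxmox or vzctl: the names `summarize`, `TASK_RE` and `FRAME_RE` are made up for illustration, and the regexes assume the task-dump layout printed by this 2.6.32 kernel. Frames marked with "?" are skipped, since the kernel flags those as unreliable. On this log it appears to show saslauthd in state D (uninterruptible sleep), waiting in an fcntl lock call that goes through gfs2_lock into dlm_posix_lock.

```python
import re

# Task-state line, e.g.
# "... kernel: saslauthd D ffff880beb8a71a0 0 15636 15598 100 0x00800004"
# (assumed layout: name, state, address, free stack, PID, parent PID, ...)
TASK_RE = re.compile(
    r"kernel: (\S+)\s+([A-Z])\s+[0-9a-f]{16}\s+\d+\s+(\d+)\s+(\d+)"
)

# Call-trace frame, e.g.
# "[<ffffffffa081d599>] gfs2_lock+0x79/0xf0 [gfs2]"
# A leading "?" marks a frame the kernel considers unreliable.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def summarize(lines):
    """Return one dict per dumped task with its reliable (symbol, module) frames."""
    tasks = []
    current = None
    for line in lines:
        task = TASK_RE.search(line)
        if task:
            current = {
                "name": task.group(1),
                "state": task.group(2),
                "pid": int(task.group(3)),
                "frames": [],
            }
            tasks.append(current)
            continue
        frame = FRAME_RE.search(line)
        # Keep only frames that belong to a task and are not marked "?".
        if frame and current is not None and not frame.group(1):
            current["frames"].append((frame.group(2), frame.group(3)))
    return tasks


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python cpt_trace.py < /var/log/syslog
    for t in summarize(sys.stdin):
        print(f"{t['name']} (pid {t['pid']}) state {t['state']}")
        for symbol, module in t["frames"]:
            print(f"  {symbol}" + (f" [{module}]" if module else ""))
```

The module name in brackets is the useful part here: it shows which kernel module ([dlm], [gfs2]) the blocked task was in when the checkpoint timed out.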
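If I run "/usr/sbin/vzctl exec 100 /bin/cat /proc/net/dev" directly in the console, I get the following output:
Code: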
root@srv2:/var/log# /usr/sbin/vzctl exec 100 /bin/cat /proc/net/dev
Inter-| Receive | Transmit
face |bytes packets errs drop fifo frame compressed multicast|bytes packets errs drop fifo colls carrier compressed
lo: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
venet0: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
root@srv2:/var/log#
Please help me find and fix the problem. Thank you!