When I migrate or restore a VM, I see that kworker uses 99% IO (per iotop), and other VMs on the destination node sometimes get stuck.
All running VMs on the destination node show this exception (see the dmesg trace below):
I use HW RAID1 with LVM-thin pools.
Hardware: R540 - Silver 4114
Raid: Perc H730
pveversion -v
Code:
# pveversion -v
proxmox-ve: 5.3-1 (running kernel: 4.15.18-10-pve)
pve-manager: 5.3-9 (running version: 5.3-9/ba817b29)
pve-kernel-4.15: 5.3-1
pve-kernel-4.15.18-10-pve: 4.15.18-32
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-45
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-37
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-2
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-22
pve-cluster: 5.0-33
pve-container: 2.0-34
pve-docs: 5.3-2
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-17
pve-firmware: 2.0-6
pve-ha-manager: 2.0-6
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 3.10.1-1
qemu-server: 5.0-46
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.12-pve1~bpo1
I think the problem may be with dd, because when I use live migration with local disks, I see the disks copying at good speed and kworker stays idle.
iotop:
Code:
Total DISK READ : 21.26 K/s | Total DISK WRITE : 84.38 M/s
Actual DISK READ: 21.26 K/s | Actual DISK WRITE: 85.33 M/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
4173 be/4 root 0.00 B 0.00 B 0.00 % 99.27 % [kworker/u80:2]
5091 be/4 root 89.00 K 0.00 B 0.00 % 87.92 % vgs --separator : --noheadings --units b --options vg_name,vg_size,vg_free,lv_count
2647 be/4 root 0.00 B 0.00 B 0.00 % 11.58 % [kworker/u82:0]
39193 be/4 root 4.00 K 0.00 B 0.00 % 4.34 % [kworker/u80:1]
4830 be/4 root 0.00 B 0.00 B 0.00 % 0.03 % [kworker/u81:1]
635 be/3 root 0.00 B 36.00 K 0.00 % 0.03 % [jbd2/dm-1-8]
4986 be/4 root 0.00 B 0.00 B 0.00 % 0.01 % [kworker/u81:4]
37696 be/4 root 0.00 B 0.00 B 0.00 % 0.01 % [kworker/u81:2]
4739 be/4 root 0.00 B 0.00 B 0.00 % 0.00 % [kworker/u82:3]
4352 be/4 root 0.00 B 929.50 M 0.00 % 0.00 % dd of=/dev/pve-hdd/vm-123-disk-2 conv=sparse bs=64k
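The dd line above is the restore writer. A rough way to check whether its buffered writes (and the resulting page-cache writeback) are what keeps kworker busy is to compare a buffered write against an O_DIRECT write to a scratch thin LV. This is only a sketch: /dev/pve-hdd/test is a hypothetical scratch volume you can safely overwrite, and the sizes are arbitrary.
Code:
# buffered write, like the restore path above -- data goes through the page cache
dd if=/dev/zero of=/dev/pve-hdd/test bs=64k count=100000 status=progress

# O_DIRECT write -- bypasses the page cache, so kworker writeback is out of the picture
dd if=/dev/zero of=/dev/pve-hdd/test oflag=direct bs=64k count=100000 status=progress
If the direct variant sustains good speed while the buffered one stalls with kworker at 99% IO, that points at writeback into the thin pool rather than raw device speed.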
perf report:
Code:
+ 14.02% 0.00% kworker/u80:2 [kernel.kallsyms] [k] ret_from_fork
+ 14.02% 0.00% kworker/u80:2 [kernel.kallsyms] [k] kthread
+ 14.02% 0.00% kworker/u80:2 [kernel.kallsyms] [k] worker_thread
+ 14.02% 0.00% kworker/u80:2 [kernel.kallsyms] [k] process_one_work
+ 14.02% 0.00% kworker/u80:2 [kernel.kallsyms] [k] do_worker
+ 14.02% 0.00% kworker/u80:2 [kernel.kallsyms] [k] process_prepared
+ 14.02% 0.01% kworker/u80:2 [kernel.kallsyms] [k] process_prepared_mapping
+ 13.60% 0.06% kworker/u80:2 [kernel.kallsyms] [k] inc_remap_and_issue_cell
+ 12.99% 0.02% kworker/u80:2 [kernel.kallsyms] [k] remap_and_issue
+ 12.93% 0.00% kworker/u80:2 [kernel.kallsyms] [k] issue
+ 12.86% 0.17% kworker/u80:2 [kernel.kallsyms] [k] generic_make_request
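For reference, a profile like the one above can be captured system-wide during a migration with something along these lines (assuming perf from linux-tools is installed):
Code:
# sample all CPUs with call graphs for 30 seconds while the migration runs,
# then browse the kworker call chains interactively
perf record -a -g -- sleep 30
perf report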
From host dmesg:
Code:
[Sat Feb 9 02:57:43 2019] INFO: task dd:29806 blocked for more than 120 seconds.
[Sat Feb 9 02:57:43 2019] Tainted: P O 4.15.18-10-pve #1
[Sat Feb 9 02:57:43 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Feb 9 02:57:43 2019] dd D 0 29806 29784 0x00000000
[Sat Feb 9 02:57:43 2019] Call Trace:
[Sat Feb 9 02:57:43 2019] __schedule+0x3e0/0x870
[Sat Feb 9 02:57:43 2019] schedule+0x36/0x80
[Sat Feb 9 02:57:43 2019] io_schedule+0x16/0x40
[Sat Feb 9 02:57:43 2019] wait_on_page_bit_common+0xf3/0x190
[Sat Feb 9 02:57:43 2019] ? page_cache_tree_insert+0xe0/0xe0
[Sat Feb 9 02:57:43 2019] __filemap_fdatawait_range+0xfa/0x160
[Sat Feb 9 02:57:43 2019] filemap_write_and_wait+0x4d/0x90
[Sat Feb 9 02:57:43 2019] __blkdev_put+0x7a/0x210
[Sat Feb 9 02:57:43 2019] ? fsnotify+0x259/0x440
[Sat Feb 9 02:57:43 2019] blkdev_put+0x4c/0xd0
[Sat Feb 9 02:57:43 2019] blkdev_close+0x34/0x70
[Sat Feb 9 02:57:43 2019] __fput+0xea/0x220
[Sat Feb 9 02:57:43 2019] ____fput+0xe/0x10
[Sat Feb 9 02:57:43 2019] task_work_run+0x9d/0xc0
[Sat Feb 9 02:57:43 2019] exit_to_usermode_loop+0xc4/0xd0
[Sat Feb 9 02:57:43 2019] do_syscall_64+0xf4/0x130
[Sat Feb 9 02:57:43 2019] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[Sat Feb 9 02:57:43 2019] RIP: 0033:0x7f3fec42ccf0
[Sat Feb 9 02:57:43 2019] RSP: 002b:00007ffd46e71a08 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[Sat Feb 9 02:57:43 2019] RAX: 0000000000000000 RBX: 00007f3fec904698 RCX: 00007f3fec42ccf0
[Sat Feb 9 02:57:43 2019] RDX: 00007ffd46e71a70 RSI: 00007ffd46e71a70 RDI: 0000000000000001
[Sat Feb 9 02:57:43 2019] RBP: 0000000000000000 R08: 0000000000006000 R09: 0000000000004000
[Sat Feb 9 02:57:43 2019] R10: 00000000000005e1 R11: 0000000000000246 R12: ffffffffffffffff
[Sat Feb 9 02:57:43 2019] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[Sat Feb 9 02:57:43 2019] INFO: task systemd-udevd:33283 blocked for more than 120 seconds.
[Sat Feb 9 02:57:43 2019] Tainted: P O 4.15.18-10-pve #1
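If the buffered-write theory holds, a possible (untested) workaround is capping how much dirty page cache the kernel may accumulate, so dd cannot queue gigabytes of writeback behind the thin pool before the final flush at blkdev close. The values below are purely illustrative, not tuned recommendations:
Code:
# illustrative caps, not tuned values -- background writeback starts earlier
# and the final flush on close has far less to drain
sysctl -w vm.dirty_bytes=268435456             # hard cap on dirty pages: 256 MiB
sysctl -w vm.dirty_background_bytes=67108864   # start background flush at 64 MiB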
Monitoring during migration: