VM start problem with virtio-net

Mitya

I have a cluster of 3 almost identical nodes with Ceph storage. At some point it stopped starting new VMs with virtio network interfaces. If I change virtio to e1000, everything works fine.
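For reference, switching the NIC model is just a config change, something like this (VMID 106 and the bridge/VLAN tag are inferred from the tap106i0 / vmbr0v3506 names in the log below; adjust to your setup, and re-specify macaddr= from the VM config if you want to keep the guest's MAC):
Code:
# temporary workaround: use e1000 instead of virtio
qm set 106 -net0 e1000,bridge=vmbr0,tag=3506
# switch back to virtio once the issue is fixed
qm set 106 -net0 virtio,bridge=vmbr0,tag=3506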

On failure it prints this to the host's dmesg:
[4299319.801973] device tap106i0 entered promiscuous mode
[4299319.860393] vmbr0v3506: port 2(tap106i0) entered forwarding state
[4299319.860400] vmbr0v3506: port 2(tap106i0) entered forwarding state
[4299319.876536] kvm: page allocation failure: order:4, mode:0x1040d0
[4299319.876543] CPU: 27 PID: 4712 Comm: kvm Tainted: G O-------------- 3.10.0-3-pve #1
[4299319.876546] Hardware name: Supermicro H8DGU/H8DGU, BIOS 3.5 11/25/2013
[4299319.876549] 0000000000000000 ffff8801bbdf58f8 ffffffff816119de ffff8801bbdf5988
[4299319.876562] ffffffff81141f18 0000000000000000 00000000ffffffff ffff8801bbdf5928
[4299319.876570] ffffffff81144676 00000040bbdf5958 0000000400000010 001040d00000001f
[4299319.876580] Call Trace:
[4299319.876591] [<ffffffff816119de>] dump_stack+0x19/0x1b
[4299319.876598] [<ffffffff81141f18>] warn_alloc_failed+0xf8/0x160
[4299319.876604] [<ffffffff81144676>] ? drain_local_pages+0x16/0x20
[4299319.876609] [<ffffffff8114608b>] __alloc_pages_nodemask+0x94b/0xb70
[4299319.876617] [<ffffffff81182fb8>] alloc_pages_current+0xb8/0x190
[4299319.876622] [<ffffffff81140f9e>] __get_free_pages+0xe/0x50
[4299319.876628] [<ffffffff8118ee59>] kmalloc_order_trace+0x39/0xb0
[4299319.876636] [<ffffffffa07bcf79>] vhost_net_open+0x29/0x1b0 [vhost_net]
[4299319.876642] [<ffffffff813a15af>] misc_open+0xaf/0x1c0
[4299319.876648] [<ffffffff811ae3ab>] chrdev_open+0x9b/0x1b0
[4299319.876655] [<ffffffff811a7553>] do_dentry_open+0x213/0x2c0
[4299319.876659] [<ffffffff811ae310>] ? cdev_put+0x30/0x30
[4299319.876663] [<ffffffff811a7635>] finish_open+0x35/0x50
[4299319.876668] [<ffffffff811b904e>] do_last+0x6fe/0xf10
[4299319.876673] [<ffffffff811b52e8>] ? inode_permission+0x18/0x50
[4299319.876678] [<ffffffff811b5398>] ? link_path_walk+0x78/0x890
[4299319.876683] [<ffffffff811b9917>] path_openat+0xb7/0x4b0
[4299319.876689] [<ffffffff8161d1ba>] ? do_page_fault+0x1a/0x70
[4299319.876693] [<ffffffff811ba3f1>] do_filp_open+0x41/0xa0
[4299319.876699] [<ffffffff811c6b13>] ? __alloc_fd+0xd3/0x120
[4299319.876704] [<ffffffff811a89a4>] do_sys_open+0xf4/0x1e0
[4299319.876709] [<ffffffff8161d1ba>] ? do_page_fault+0x1a/0x70
[4299319.876714] [<ffffffff81060a30>] ? task_stopped_code+0x60/0x60
[4299319.876719] [<ffffffff811a8ab2>] SyS_open+0x22/0x30
[4299319.876724] [<ffffffff81621b59>] system_call_fastpath+0x16/0x1b
[4299319.876726] Mem-Info:
[4299319.876728] Node 0 DMA per-cpu:
[4299319.876731] CPU 0: hi: 0, btch: 1 usd: 0
[4299319.876732] CPU 1: hi: 0, btch: 1 usd: 0
[4299319.876734] CPU 2: hi: 0, btch: 1 usd: 0
[4299319.876735] CPU 3: hi: 0, btch: 1 usd: 0
[4299319.876737] CPU 4: hi: 0, btch: 1 usd: 0
[4299319.876738] CPU 5: hi: 0, btch: 1 usd: 0
[4299319.876740] CPU 6: hi: 0, btch: 1 usd: 0
[4299319.876741] CPU 7: hi: 0, btch: 1 usd: 0
[4299319.876743] CPU 8: hi: 0, btch: 1 usd: 0
[4299319.876744] CPU 9: hi: 0, btch: 1 usd: 0
[4299319.876746] CPU 10: hi: 0, btch: 1 usd: 0
[4299319.876747] CPU 11: hi: 0, btch: 1 usd: 0
[4299319.876749] CPU 12: hi: 0, btch: 1 usd: 0
[4299319.876750] CPU 13: hi: 0, btch: 1 usd: 0
[4299319.876752] CPU 14: hi: 0, btch: 1 usd: 0
[4299319.876753] CPU 15: hi: 0, btch: 1 usd: 0
[4299319.876755] CPU 16: hi: 0, btch: 1 usd: 0
[4299319.876756] CPU 17: hi: 0, btch: 1 usd: 0
[4299319.876758] CPU 18: hi: 0, btch: 1 usd: 0
[4299319.876759] CPU 19: hi: 0, btch: 1 usd: 0
[4299319.876761] CPU 20: hi: 0, btch: 1 usd: 0
[4299319.876762] CPU 21: hi: 0, btch: 1 usd: 0
[4299319.876764] CPU 22: hi: 0, btch: 1 usd: 0
[4299319.876765] CPU 23: hi: 0, btch: 1 usd: 0
[4299319.876767] CPU 24: hi: 0, btch: 1 usd: 0
[4299319.876768] CPU 25: hi: 0, btch: 1 usd: 0
[4299319.876770] CPU 26: hi: 0, btch: 1 usd: 0
[4299319.876771] CPU 27: hi: 0, btch: 1 usd: 0
[4299319.876773] CPU 28: hi: 0, btch: 1 usd: 0
[4299319.876774] CPU 29: hi: 0, btch: 1 usd: 0
[4299319.876776] CPU 30: hi: 0, btch: 1 usd: 0
[4299319.876777] CPU 31: hi: 0, btch: 1 usd: 0
[4299319.876779] Node 0 DMA32 per-cpu:
[4299319.876781] CPU 0: hi: 186, btch: 31 usd: 0
[4299319.876783] CPU 1: hi: 186, btch: 31 usd: 0
[4299319.876784] CPU 2: hi: 186, btch: 31 usd: 0
[4299319.876786] CPU 3: hi: 186, btch: 31 usd: 0
[4299319.876788] CPU 4: hi: 186, btch: 31 usd: 0
[4299319.876789] CPU 5: hi: 186, btch: 31 usd: 0
[4299319.876791] CPU 6: hi: 186, btch: 31 usd: 0
[4299319.876793] CPU 7: hi: 186, btch: 31 usd: 0
[4299319.876794] CPU 8: hi: 186, btch: 31 usd: 0
[4299319.876796] CPU 9: hi: 186, btch: 31 usd: 0
[4299319.876797] CPU 10: hi: 186, btch: 31 usd: 0
[4299319.876799] CPU 11: hi: 186, btch: 31 usd: 0
[4299319.876800] CPU 12: hi: 186, btch: 31 usd: 0
[4299319.876802] CPU 13: hi: 186, btch: 31 usd: 0
[4299319.876804] CPU 14: hi: 186, btch: 31 usd: 0
[4299319.876805] CPU 15: hi: 186, btch: 31 usd: 0
[4299319.876807] CPU 16: hi: 186, btch: 31 usd: 0
[4299319.876808] CPU 17: hi: 186, btch: 31 usd: 0
[4299319.876810] CPU 18: hi: 186, btch: 31 usd: 0
[4299319.876812] CPU 19: hi: 186, btch: 31 usd: 0
[4299319.876813] CPU 20: hi: 186, btch: 31 usd: 0
[4299319.876815] CPU 21: hi: 186, btch: 31 usd: 0
[4299319.876816] CPU 22: hi: 186, btch: 31 usd: 0
[4299319.876818] CPU 23: hi: 186, btch: 31 usd: 0
[4299319.876820] CPU 24: hi: 186, btch: 31 usd: 0
[4299319.876821] CPU 25: hi: 186, btch: 31 usd: 0
[4299319.876823] CPU 26: hi: 186, btch: 31 usd: 0
[4299319.876824] CPU 27: hi: 186, btch: 31 usd: 0
[4299319.876826] CPU 28: hi: 186, btch: 31 usd: 0
[4299319.876827] CPU 29: hi: 186, btch: 31 usd: 0
[4299319.876829] CPU 30: hi: 186, btch: 31 usd: 0
[4299319.876831] CPU 31: hi: 186, btch: 31 usd: 0
[4299319.876832] Node 0 Normal per-cpu:
[4299319.876834] CPU 0: hi: 186, btch: 31 usd: 0
[4299319.876836] CPU 1: hi: 186, btch: 31 usd: 0
[4299319.876837] CPU 2: hi: 186, btch: 31 usd: 0
[4299319.876839] CPU 3: hi: 186, btch: 31 usd: 0
[4299319.876841] CPU 4: hi: 186, btch: 31 usd: 171
[4299319.876842] CPU 5: hi: 186, btch: 31 usd: 0
[4299319.876844] CPU 6: hi: 186, btch: 31 usd: 0
[4299319.876845] CPU 7: hi: 186, btch: 31 usd: 0
[4299319.876847] CPU 8: hi: 186, btch: 31 usd: 0
[4299319.876848] CPU 9: hi: 186, btch: 31 usd: 0
[4299319.876850] CPU 10: hi: 186, btch: 31 usd: 0
[4299319.876852] CPU 11: hi: 186, btch: 31 usd: 0
[4299319.876853] CPU 12: hi: 186, btch: 31 usd: 0
[4299319.876855] CPU 13: hi: 186, btch: 31 usd: 0
[4299319.876857] CPU 14: hi: 186, btch: 31 usd: 0
[4299319.876858] CPU 15: hi: 186, btch: 31 usd: 0
[4299319.876861] CPU 16: hi: 186, btch: 31 usd: 0
[4299319.876863] CPU 17: hi: 186, btch: 31 usd: 0
[4299319.876864] CPU 18: hi: 186, btch: 31 usd: 0
[4299319.876866] CPU 19: hi: 186, btch: 31 usd: 0
[4299319.876868] CPU 20: hi: 186, btch: 31 usd: 0
[4299319.876869] CPU 21: hi: 186, btch: 31 usd: 0
[4299319.876871] CPU 22: hi: 186, btch: 31 usd: 0
[4299319.876872] CPU 23: hi: 186, btch: 31 usd: 0
[4299319.876874] CPU 24: hi: 186, btch: 31 usd: 0
[4299319.876875] CPU 25: hi: 186, btch: 31 usd: 0
[4299319.876877] CPU 26: hi: 186, btch: 31 usd: 0
[4299319.876878] CPU 27: hi: 186, btch: 31 usd: 0
[4299319.876880] CPU 28: hi: 186, btch: 31 usd: 0
[4299319.876881] CPU 29: hi: 186, btch: 31 usd: 0
[4299319.876883] CPU 30: hi: 186, btch: 31 usd: 0
[4299319.876884] CPU 31: hi: 186, btch: 31 usd: 0
[4299319.876886] Node 2 Normal per-cpu:
[4299319.876888] CPU 0: hi: 186, btch: 31 usd: 0
[4299319.876889] CPU 1: hi: 186, btch: 31 usd: 0
[4299319.876891] CPU 2: hi: 186, btch: 31 usd: 0
[4299319.876892] CPU 3: hi: 186, btch: 31 usd: 0
[4299319.876894] CPU 4: hi: 186, btch: 31 usd: 0
[4299319.876896] CPU 5: hi: 186, btch: 31 usd: 0
[4299319.876897] CPU 6: hi: 186, btch: 31 usd: 0
[4299319.876899] CPU 7: hi: 186, btch: 31 usd: 0
[4299319.876900] CPU 8: hi: 186, btch: 31 usd: 0
[4299319.876902] CPU 9: hi: 186, btch: 31 usd: 0
[4299319.876903] CPU 10: hi: 186, btch: 31 usd: 0
[4299319.876905] CPU 11: hi: 186, btch: 31 usd: 0
[4299319.876906] CPU 12: hi: 186, btch: 31 usd: 0
[4299319.876908] CPU 13: hi: 186, btch: 31 usd: 0
[4299319.876910] CPU 14: hi: 186, btch: 31 usd: 0
[4299319.876911] CPU 15: hi: 186, btch: 31 usd: 0
[4299319.876912] CPU 16: hi: 186, btch: 31 usd: 0
[4299319.876914] CPU 17: hi: 186, btch: 31 usd: 0
[4299319.876915] CPU 18: hi: 186, btch: 31 usd: 0
[4299319.876917] CPU 19: hi: 186, btch: 31 usd: 0
[4299319.876918] CPU 20: hi: 186, btch: 31 usd: 0
[4299319.876920] CPU 21: hi: 186, btch: 31 usd: 0
[4299319.876921] CPU 22: hi: 186, btch: 31 usd: 0
[4299319.876923] CPU 23: hi: 186, btch: 31 usd: 0
[4299319.876924] CPU 24: hi: 186, btch: 31 usd: 0
[4299319.876926] CPU 25: hi: 186, btch: 31 usd: 0
[4299319.876927] CPU 26: hi: 186, btch: 31 usd: 0
[4299319.876929] CPU 27: hi: 186, btch: 31 usd: 0
[4299319.876930] CPU 28: hi: 186, btch: 31 usd: 0
[4299319.876932] CPU 29: hi: 186, btch: 31 usd: 0
[4299319.876934] CPU 30: hi: 186, btch: 31 usd: 0
[4299319.876935] CPU 31: hi: 186, btch: 31 usd: 0
[4299319.876939] active_anon:1025392 inactive_anon:341457 isolated_anon:0
[4299319.876939] active_file:5427207 inactive_file:8157689 isolated_file:0
[4299319.876939] unevictable:19425 dirty:39 writeback:0 unstable:0
[4299319.876939] free:140799 slab_reclaimable:414546 slab_unreclaimable:231099
[4299319.876939] mapped:164977 shmem:61029 pagetables:8071 bounce:0
[4299319.876939] free_cma:0
[4299319.876943] Node 0 DMA free:15872kB min:20kB low:24kB high:28kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15956kB managed:15872kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[4299319.876949] lowmem_reserve[]: 0 3408 32077 32077
[4299319.876952] Node 0 DMA32 free:207580kB min:4772kB low:5964kB high:7156kB active_anon:193400kB inactive_anon:218352kB active_file:460236kB inactive_file:1992384kB unevictable:7500kB isolated(anon):0kB isolated(file):0kB present:3652160kB managed:3490152kB mlocked:7500kB dirty:0kB writeback:0kB mapped:20096kB shmem:9300kB slab_reclaimable:280372kB slab_unreclaimable:108556kB kernel_stack:1480kB pagetables:2416kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[4299319.876958] lowmem_reserve[]: 0 0 28668 28668
[4299319.876961] Node 0 Normal free:132092kB min:40156kB low:50192kB high:60232kB active_anon:3005712kB inactive_anon:700200kB active_file:9998116kB inactive_file:12160688kB unevictable:49284kB isolated(anon):0kB isolated(file):0kB present:29884416kB managed:29356872kB mlocked:49284kB dirty:104kB writeback:0kB mapped:394348kB shmem:141208kB slab_reclaimable:889348kB slab_unreclaimable:578844kB kernel_stack:8592kB pagetables:19976kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[4299319.876966] lowmem_reserve[]: 0 0 0 0
[4299319.876969] Node 2 Normal free:207652kB min:45156kB low:56444kB high:67732kB active_anon:902456kB inactive_anon:447276kB active_file:11250476kB inactive_file:18477684kB unevictable:20916kB isolated(anon):0kB isolated(file):0kB present:33538048kB managed:33012120kB mlocked:20916kB dirty:68kB writeback:0kB mapped:245464kB shmem:93608kB slab_reclaimable:488464kB slab_unreclaimable:236996kB kernel_stack:2320kB pagetables:9892kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[4299319.876975] lowmem_reserve[]: 0 0 0 0
[4299319.876978] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (R) 3*4096kB (M) = 15872kB
[4299319.876990] Node 0 DMA32: 22728*4kB (UEMR) 7515*8kB (UER) 3460*16kB (UER) 34*32kB (UER) 2*64kB (R) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 207608kB
[4299319.877001] Node 0 Normal: 22471*4kB (UEM) 5085*8kB (UEM) 70*16kB (UM) 34*32kB (UEM) 7*64kB (M) 1*128kB (M) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 133348kB
[4299319.877013] Node 2 Normal: 51348*4kB (UEM) 176*8kB (UEM) 4*16kB (UE) 6*32kB (UE) 1*64kB (R) 0*128kB 1*256kB (R) 1*512kB (R) 1*1024kB (R) 0*2048kB 0*4096kB = 208912kB
[4299319.877026] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[4299319.877028] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[4299319.877030] 13666441 total pagecache pages
[4299319.877032] 18240 pages in swap cache
[4299319.877034] Swap cache stats: add 48253, delete 30013, find 1421126/1423282
[4299319.877035] Free swap = 60818904kB
[4299319.877036] Total swap = 60948476kB
[4299320.173555] 16773119 pages RAM
[4299320.173559] 299931 pages reserved
[4299320.173560] 1817893 pages shared
[4299320.173561] 14859963 pages non-shared
[4299320.185620] vmbr0v3506: port 2(tap106i0) entered disabled state
[4299320.185947] vmbr0v3506: port 2(tap106i0) entered disabled state

Versions:
root@vmc2-rnd:/mnt/pve/backup/images# pveversion -v
proxmox-ve-2.6.32: 3.2-129 (running kernel: 3.10.0-3-pve)
pve-manager: 3.2-4 (running version: 3.2-4/e24a91c1)
pve-kernel-2.6.32-30-pve: 2.6.32-130
pve-kernel-3.10.0-3-pve: 3.10.0-11
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.5-1
pve-cluster: 3.0-12
qemu-server: 3.1-16
pve-firmware: 1.1-3
libpve-common-perl: 3.0-18
libpve-access-control: 3.0-11
libpve-storage-perl: 3.0-19
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-6
vzctl: 4.0-1pve5
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.7-8
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.2-1

And this seems like a fix: http://permalink.gmane.org/gmane.linux.kernel.virtualization/22089
Or maybe not...
 
Hi, maybe this one:
http://comments.gmane.org/gmane.comp.emulators.kvm.devel/121828

BTW, what is your current memory usage? Are you near full? (# free -m ?)
Hi. Yes.

And no, there's only one small VM running on this host (I'm trying to start a second one), and most memory is used as disk cache, which we can consider effectively free.
Code:
# free -m
             total       used       free     shared    buffers     cached
Mem:         64348      62972       1375          0        166      52064
-/+ buffers/cache:      10741      53606
Swap:        59519        125      59394
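As a stopgap (not a fix), I suppose one could drop the page cache and trigger compaction right before starting the VM, so that a contiguous order-4 (64 KiB) block is more likely to be available:
Code:
sync
echo 3 > /proc/sys/vm/drop_caches     # drop clean page cache, dentries and inodes
echo 1 > /proc/sys/vm/compact_memory  # ask the kernel to compact memory (needs CONFIG_COMPACTION)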
 
Maybe not directly. But in 3.16.1 the same technique is used: if kmalloc() doesn't succeed, try vmalloc(), and use kvfree() instead of its own vhost_net_kvfree(). Compare the allocation of memory for the variable n at the beginning of vhost_net_open() in https://git.kernel.org/cgit/linux/k...lain/drivers/vhost/net.c?id=refs/tags/v3.16.1 and https://git.kernel.org/cgit/linux/k...lain/drivers/vhost/net.c?id=refs/tags/v3.10.1

And no, I didn't try memtest, but the bug is reproducible on all 3 servers of the cluster (not on every VM start, but very likely). It's funny to watch HA try to start the VM on different nodes until it succeeds. :)
 

Did you find the commit?
I can try to apply it to the Proxmox 3.10 kernel (which is based on the RHEL7 kernel, so it already has a lot of backported patches, but not this one).
 
OK, I found it:

vhost-net: extend device allocation to vmalloc

https://git.kernel.org/cgit/linux/k...c?id=23cc5a991c7a9fb7e6d6550e65cee4f4173111c5

and also this fix
vhost-net: don't open-code kvfree
https://git.kernel.org/cgit/linux/k...c?id=d04257b07f2362d4eb550952d5bf5f4241a8046d

The commit says:

"vhost-net: extend device allocation to vmalloc
Michael Mueller provided a patch to reduce the size ofvhost-net structure as some allocations could fail undermemory pressure/fragmentation. We are still left withhigh order allocations though."


So it seems that you have memory pressure or fragmentation.
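A quick way to check is /proc/buddyinfo; the columns after the zone name are the counts of free blocks of order 0..10 (4 KiB .. 4 MiB), and the failing allocation above is order 4, i.e. a contiguous 64 KiB block:
Code:
cat /proc/buddyinfo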

I'll try to see if I can apply the patches.
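For reference, roughly what I'm going to try (paths are just examples):
Code:
# in a mainline git checkout, export the upstream commit as a patch
git format-patch -1 23cc5a991c7a --stdout > vhost-net-vmalloc.patch
# in the unpacked pve-kernel 3.10 source tree, dry-run it first
patch -p1 --dry-run < ../vhost-net-vmalloc.patch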
 
It doesn't apply on the current kernel:

Code:
Hunk #1 succeeded at 18 with fuzz 1 (offset 1 line).
Hunk #2 FAILED at 700.
Hunk #3 succeeded at 842 (offset 14 lines).
1 out of 3 hunks FAILED -- saving rejects to file drivers/vhost/net.c.rej

Sorry, it seems too risky to backport it ourselves to the Red Hat kernel. I think it's better to wait for the next RHEL7 kernel update.
 
Could you show the current fully-patched net.c? I'm sure I can adapt the patch to it. It should be easy...
I'll try tomorrow with the current CentOS kernel :)
 
What's the status on this? I'm also currently affected by this bug and am using e1000 NIC for now.
 
A patch was submitted to pve-devel and I'm testing the fixed kernel, successfully so far. It's still not committed to PVE's git.
Actually, e1000 shouldn't be affected. If it is, then that's another bug.
 
Hi,
Is it stable since you applied the patch?
Yes, even after I raised memory usage and ran all the load that caused the problem, several times:
Code:
free -m
             total       used       free     shared    buffers     cached
Mem:         64348      63147       1201          0        150      26045
-/+ buffers/cache:      36951      27396
Swap:        59519        500      59019
 
