virtio net crashing after upgrade to proxmox 2.0

opty

Renowned Member
Apr 5, 2012
Hello,

My Gentoo guest is crashing on network init since I upgraded my host node to Proxmox 2.0.

The kernel on the guest is 3.2.1.

Switching back to E1000 lets the guest boot up.
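For reference, switching the NIC model can be done from the host roughly like this (a sketch only; the VM ID 101, the bridge name and the MAC are placeholders, adjust to your setup):

Code:
# on the Proxmox host, for a hypothetical VM ID 101
qm set 101 -net0 e1000,bridge=vmbr0
# or edit /etc/pve/qemu-server/101.conf and change the net0 model, e.g. from
#   net0: virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0
# to
#   net0: e1000=AA:BB:CC:DD:EE:FF,bridge=vmbr0

The guest then sees an emulated Intel e1000 NIC instead of the virtio one, which is what lets it boot here.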

Here is the kernel dump with virtio-net:

Apr 5 08:23:17 backupext kernel: [ 69.960350] INFO: rcu_sched detected stall on CPU 3 (t=15000 jiffies)
Apr 5 08:23:17 backupext kernel: [ 69.960350] Pid: 5399, comm: ip Not tainted 3.2.1-gentoo-r2 #1
Apr 5 08:23:17 backupext kernel: [ 69.960350] Call Trace:
Apr 5 08:23:17 backupext kernel: [ 69.960350] <IRQ> [<ffffffff81089cdd>] ? __rcu_pending+0x1ed/0x400
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff8108a147>] ? rcu_check_callbacks+0x57/0x110
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff81049b3f>] ? update_process_times+0x3f/0x80
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff810677f2>] ? tick_nohz_handler+0x72/0xd0
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff8101c113>] ? smp_apic_timer_interrupt+0x63/0xa0
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff814beb4b>] ? apic_timer_interrupt+0x6b/0x70
Apr 5 08:23:17 backupext kernel: [ 69.960350] <EOI> [<ffffffff8128c40a>] ? virtqueue_get_buf+0x7a/0xc0
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff812e9606>] ? virtnet_send_command.clone.28+0x226/0x250
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff811f7ca2>] ? sg_init_table+0x22/0x50
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff812e9717>] ? virtnet_set_rx_mode+0xe7/0x2f0
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff813793c8>] ? dev_set_rx_mode+0x28/0x40
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff81379499>] ? __dev_open+0xb9/0xf0
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff8137970a>] ? __dev_change_flags+0x9a/0x180
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff813798a0>] ? dev_change_flags+0x20/0x70
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff81385ce6>] ? do_setlink+0x1c6/0x9d0
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff81462b57>] ? inet6_fill_ifla6_attrs+0x237/0x2c0
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff814209ed>] ? snmp_fold_field+0x4d/0x70
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff81206c0d>] ? nla_parse+0x2d/0xd0
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff81387398>] ? rtnl_newlink+0x348/0x530
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff81462bd0>] ? inet6_fill_ifla6_attrs+0x2b0/0x2c0
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff8136c6ac>] ? __alloc_skb+0x7c/0x170
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff81386aa0>] ? __rtnl_unlock+0x20/0x20
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff81393009>] ? netlink_rcv_skb+0x99/0xc0
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff81386998>] ? rtnetlink_rcv+0x18/0x20
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff81392909>] ? netlink_unicast+0x299/0x2e0
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff81392b68>] ? netlink_sendmsg+0x218/0x340
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff8136332b>] ? sock_sendmsg+0xab/0xe0
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff8109d560>] ? __alloc_pages_nodemask+0x110/0x6f0
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff8109d560>] ? __alloc_pages_nodemask+0x110/0x6f0
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff810966e8>] ? find_get_page+0x18/0xa0
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff81098578>] ? filemap_fault+0x98/0x480
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff8136332b>] ? sock_sendmsg+0xab/0xe0
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff811e864c>] ? cpumask_any_but+0x2c/0x40
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff81365265>] ? move_addr_to_kernel+0x55/0x80
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff8136f966>] ? verify_iovec+0x66/0xe0
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff813637fe>] ? __sys_sendmsg+0x3be/0x3d0
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff81026a39>] ? do_page_fault+0x199/0x420
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff810b6573>] ? do_brk+0x283/0x370
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff813660a4>] ? sys_sendmsg+0x44/0x80
Apr 5 08:23:17 backupext kernel: [ 69.960350] [<ffffffff814be0bb>] ? system_call_fastpath+0x16/0x1b





Any idea?

thanks
 
The same kernel seems to work with the virtio driver on another upgraded server; maybe something failed during the upgrade process on the other server?
 
In my case:
1. With one or two virtio disks on the guest (l26), virtio-net comes up and the system boots normally.
2. With a third virtio disk added to the guest, I can't install the OS (SL 6.2), virtio-net can't come up, and I get an error like this:

Stack:
Call Trace:
Code: 89 04 24 48 8b 7b 18 8b 95 24 ff ff ff e8 f7 ba f2 ff 85 c0 78 43 48 8b 7b
18 4c 8d 65 bc e8 06 b6 f2 ff eb 06 0f 1f 40 00 f3 90 <48> 8b 7b 18 4c 89 e6 e8

3. With a fourth virtio disk added to the guest (l26), I can't install the OS (SL 6.2) and the system won't boot.
4. With a fifth virtio disk added to the guest (l26), I can't install the OS (SL 6.2) and the system won't boot.

(see the attached screenshot: error.PNG)

Well, any idea?
Thanks
 
Hi guys,

could you open the monitor on your VM, type "info pci", and post the result?
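(If it helps, the monitor can be reached from the "Monitor" tab of the VM in the web GUI, or from the host shell roughly like this; VM ID 100 is just an example:)

Code:
# on the Proxmox host, open the QEMU monitor of VM 100 (example ID)
qm monitor 100
# then at the monitor prompt:
info pci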

thanks


Update: I have just tested with CentOS 6.2 and Debian 6, no problem with 5 virtio disks and 4 virtio NICs....
 
Hello, here it is

Code:
# info pci
  Bus  0, device   0, function 0:
    Host bridge: PCI device 8086:1237
      id ""
  Bus  0, device   1, function 0:
    ISA bridge: PCI device 8086:7000
      id ""
  Bus  0, device   1, function 1:
    IDE controller: PCI device 8086:7010
      BAR4: I/O at 0xc140 [0xc14f].
      id ""
  Bus  0, device   1, function 2:
    USB controller: PCI device 8086:7020
      IRQ 11.
      BAR4: I/O at 0xc100 [0xc11f].
      id ""
  Bus  0, device   1, function 3:
    Bridge: PCI device 8086:7113
      IRQ 9.
      id ""
  Bus  0, device   2, function 0:
    VGA controller: PCI device 1013:00b8
      BAR0: 32 bit prefetchable memory at 0xfc000000 [0xfdffffff].
      BAR1: 32 bit memory at 0xfebf0000 [0xfebf0fff].
      BAR6: 32 bit memory at 0xffffffffffffffff [0x0000fffe].
      id ""
  Bus  0, device  10, function 0:
    SCSI controller: PCI device 1af4:1001
      IRQ 10.
      BAR0: I/O at 0xc000 [0xc03f].
      BAR1: 32 bit memory at 0xfebf1000 [0xfebf1fff].
      id "virtio0"
  Bus  0, device  11, function 0:
    SCSI controller: PCI device 1af4:1001
      IRQ 11.
      BAR0: I/O at 0xc040 [0xc07f].
      BAR1: 32 bit memory at 0xfebf2000 [0xfebf2fff].
      id "virtio1"
  Bus  0, device  12, function 0:
    SCSI controller: PCI device 1af4:1001
      IRQ 11.
      BAR0: I/O at 0xc080 [0xc0bf].
      BAR1: 32 bit memory at 0xfebf3000 [0xfebf3fff].
      id "virtio2"
  Bus  0, device  13, function 0:
    SCSI controller: PCI device 1af4:1001
      IRQ 10.
      BAR0: I/O at 0xc0c0 [0xc0ff].
      BAR1: 32 bit memory at 0xfebf4000 [0xfebf4fff].
      id "virtio3"
  Bus  0, device  18, function 0:
    Ethernet controller: PCI device 1af4:1000
      IRQ 10.
      BAR0: I/O at 0xc120 [0xc13f].
      BAR1: 32 bit memory at 0xfebf5000 [0xfebf5fff].
      BAR6: 32 bit memory at 0xffffffffffffffff [0x0000fffe].
      id "net0"
BTW, I'll try to boot up with other kernels on the host and the guest.

UPDATE: the same problem happens with:
- a netbooted 3.2.13 kernel from OVH (the host is a server hosted at OVH) in place of 2.6.32-10-pve on the host
- kernel 3.1.6 instead of 3.2.1 on the Gentoo guest
 
I have 4 virtio disks and 1 virtio-net. If I remove one virtio disk, the virtio-net no longer crashes!!!
 
But if I add a fifth virtio disk, I get a udev timeout on that fifth disk, as if I were limited to 4 virtio devices regardless of the device type.
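For what it's worth, the virtio devices the guest actually enumerates can be checked from inside the guest, e.g. (just a generic sanity check, nothing specific to this bug):

Code:
# inside the guest: list the virtio PCI devices and the virtio bus
lspci | grep -i virtio
ls /sys/bus/virtio/devices/
# wait for udev to finish processing pending events (180s timeout)
udevadm settle --timeout=180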
 
More tests show that I can have any number of working virtio-net interfaces as long as I have no more than 3 virtio disks... So here are the tested combinations that do (not) work:
- 3 virtio disks + 1 virtio-net = OK
- 3 virtio disks + 2 virtio-net = OK
- 3 virtio disks + 1 scsi (lsi) disk + 1 virtio-net = OK
- 3 virtio disks + 1 scsi (lsi) disk + 2 virtio-net = OK
- 4 virtio disks + 1 virtio-net = NOK (hang at net init on the virtio-net)
- 4 virtio disks + 1 e1000 = OK
- 4 virtio disks + 1 e1000 + 1 virtio-net = NOK (hang at net init on the virtio-net)
- 5 virtio disks + 1 e1000 = NOK (udevadm settle timeout on disk N°5, which becomes unusable)
- 5 virtio disks + 2 virtio-net = NOK (udevadm settle timeout on disk N°5 + hang on the virtio-net)
- 5 virtio disks + 3 virtio-net = NOK (udev settle timeout on disk N°5 + hang on the first virtio-net)

With the latest Proxmox 1.9 I was able to have 4 virtio disks and at least 1 virtio-net.
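For clarity, a failing "4 virtio disks + 1 virtio-net" setup corresponds to a VM config along these lines (a rough sketch of /etc/pve/qemu-server/<vmid>.conf; the VM ID 101, the storage name "local", sizes and MAC are placeholders):

Code:
# /etc/pve/qemu-server/101.conf (sketch, placeholder values)
ostype: l26
memory: 2048
sockets: 1
cores: 2
bootdisk: virtio0
virtio0: local:101/vm-101-disk-1.raw,cache=none
virtio1: local:101/vm-101-disk-2.raw,cache=none
virtio2: local:101/vm-101-disk-3.raw,cache=none
virtio3: local:101/vm-101-disk-4.raw,cache=none
net0: virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0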
 
hi, spirit

this is my case
env: centos 6.2
pve: pve-manager/2.0/ff6cd700

vm hardware:
1 system disk (virtio0) + 3 virtio disks + 1 virtio-net (net0) = NOK (hang at net init on the virtio-net)

Code:
# ps aux|grep kvm
root 2025 0.0 0.0 0 0 ? S 10:07 0:00 [kvm-irqfd-clean]
root 7808 100 1.0 8595160 679624 ? Sl 10:45 5:18 /usr/bin/kvm -id 100 -chardev socket,id=monitor,path=/var/run/qemu-server/100.mon,server,nowait -mon chardev=monitor,mode=readline -vnc unix:/var/run/qemu-server/100.vnc,x509,password -pidfile /var/run/qemu-server/100.pid -daemonize -usbdevice tablet -name centos-6.2 -smp sockets=1,cores=4 -nodefaults -boot menu=on -vga cirrus -localtime -k en-us -drive file=/dev/disk5/vm-100-disk-1,if=none,id=drive-virtio3,aio=native,cache=none -device virtio-blk-pci,drive=drive-virtio3,id=virtio3,bus=pci.0,addr=0xd -drive file=/dev/disk3/vm-100-disk-1,if=none,id=drive-virtio1,aio=native,cache=none -device virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb -drive if=none,id=drive-ide2,media=cdrom,aio=native -device ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200 -drive file=/dev/disk2/vm-100-disk-1,if=none,id=drive-virtio0,aio=native,cache=none -device virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=102 -drive file=/dev/disk4/vm-100-disk-1,if=none,id=drive-virtio2,aio=native,cache=none -device virtio-blk-pci,drive=drive-virtio2,id=virtio2,bus=pci.0,addr=0xc -m 8192 -netdev type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge,vhost=on -device virtio-net-pci,mac=6A:A3:E9:EA:51:17,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300
root 7825 0.0 0.0 0 0 ? S 10:45 0:00 [kvm-pit-wq]



Hang at net init on the virtio-net (see the attached screenshot: error.PNG)

Thanks!
 

Hi,
I found some bugzillas here:

[abrt] kernel: BUG: soft lockup - CPU#0 stuck for 185s!
https://bugzilla.redhat.com/show_bug.cgi?id=794478#c4
[PATCH 0/5 V5] Avoid soft lockup message when KVM is stopped by host
https://lkml.org/lkml/2011/12/5/490



As a workaround, could you try adding "nosoftlockup" to the kernel parameters in your grub.cfg?
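(For a CentOS 6 or other legacy-GRUB guest that means appending it to the kernel line in /boot/grub/grub.conf; with GRUB2 it goes into /etc/default/grub and grub.cfg is regenerated. Roughly, with the kernel image path and root device being placeholders:)

Code:
# legacy GRUB: append to the kernel line in /boot/grub/grub.conf
kernel /boot/vmlinuz-3.2.1-gentoo-r2 root=/dev/vda1 nosoftlockup
# GRUB2: add it to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then:
grub-mkconfig -o /boot/grub/grub.cfg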


UPDATE:
"Good" news: I can reproduce the bug with CentOS 6.2.
Just add 5 virtio disks, without any virtio NIC.
4 virtio disks boot fine; it's stuck on the 5th disk.
I'll try to investigate more.
(see the attached screenshot: Capture du 2012-04-08 12:02:00.png)
 
With 5 virtio disks I get a timeout when udev starts, but no errors or delays in the initrd. In fact I never had a problem in the initrd; for me the problems happen either in udev for the disks or in the network init script for virtio-net.
 
"nosoftlockup" in kernel command line does not do anything but since this not a "soft lock up" but a "CPU Stall" from RCU_SHED, I'm not very surprised
 
Hi,
I have tested upgrading my Debian squeeze (2.6.32 kernel) to Debian wheezy (3.2 kernel, udev 175).
And now I have the same problem: it hangs on the 5th virtio disk.

But I can boot Debian wheezy with the 2.6.32 kernel and udev 175.

So it must be a kernel problem and not udev.

I'll try to post a message to the KVM dev mailing list.
 
Hello, I've just compiled kernel 2.6.32 on my Gentoo guest and I can confirm that the problem does not happen with that kernel version.

So the following guest kernel versions have problems with Proxmox 2.0 (and its current KVM) when using 5 or more virtio disks, or 4 virtio disks and 1 or more virtio-net:
- 3.0.17
- 3.1.6
- 3.2.1
- 3.2.12

These kernels were working with Proxmox 1.9
 
