Random crashing version 3.3

Discussion in 'Proxmox VE: Installation and configuration' started by NSW, Dec 17, 2014.

  1. NSW

    NSW New Member

    Joined:
    Jul 19, 2011
    Messages:
    17
    Likes Received:
    0
    Hi,

    I am getting some random crashing with both a new install and an updated install. I've searched and can't really find any good information on it. If someone out there has any ideas, I would greatly appreciate the help. This node is part of a small cluster that was being updated from 3.2-4 to 3.3-5.

    Below is the most recent crash that I have actually been able to capture.

    Code:
    ------------[ cut here ]------------
    kernel BUG at net/core/skbuff.c:2717!
    invalid opcode: 0000 [#1] SMP 
    last sysfs file: /sys/kernel/uevent_seqnum
    CPU 0 
    Modules linked in: netconsole ip_set vzethdev vznetdev pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop simfs vzrst nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 vzcpt nf_conntrack vzdquota vzmon vzdev ip6t_REJECT ip6table_mangle ip6table_filter ip6_tables xt_length xt_hl xt_tcpmss xt_TCPMSS iptable_mangle iptable_filter xt_multiport xt_limit vhost_net xt_dscp tun macvtap macvlan nfnetlink_log nfnetlink ipt_REJECT kvm_amd ip_tables kvm dlm configfs vzevent ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc bonding 8021q garp ipv6 fuse snd_pcsp snd_pcm snd_page_alloc snd_timer snd serio_raw i2c_piix4 fam15h_power k10temp amd64_edac_mod edac_mce_amd edac_core soundcore shpchp ext3 mbcache jbd sg ata_generic pata_acpi mpt2sas raid_class usb_storage igb i2c_algo_bit bnx2 pata_atiixp i2c_core dca scsi_transport_sas ahci [last unloaded: scsi_wait_scan]
    
    Pid: 7620, comm: kvm veid: 0 Not tainted 2.6.32-34-pve #1 042stab094_7 Supermicro H8DGU-LN4/H8DGU-LN4
    RIP: 0010:[<ffffffff81472c49>]  [<ffffffff81472c49>] skb_segment+0x709/0x740
    RSP: 0018:ffff8800282037f0  EFLAGS: 00010212
    RAX: 0000000000000000 RBX: ffff8804246cbe40 RCX: ffff88042bba0d40
    RDX: 000000000000004d RSI: ffff88042d135882 RDI: ffff880423962882
    RBP: ffff8800282038a0 R08: 0000000000000000 R09: 0000000000000000
    R10: ffff880423962800 R11: 0000000000000000 R12: 00000000000005ee
    R13: 0000000000000000 R14: ffff880429ea3e80 R15: 000000000000004d
    FS:  00007f18615a2900(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 000000000e9f3000 CR3: 000000100a715000 CR4: 00000000000407f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process kvm (pid: 7620, veid: 0, threadinfo ffff88100a7a0000, task ffff8810208e0d30)
    Stack:
     ffff880028203820 ffffffff8105bda8 0000000100000000 000000820000003c
    <d> 000000000000003c 0000000000000046 0000000000000000 0000000000000046
    <d> 0100880000000000 ffffffffffffffba ffff88042d286580 0000000000000000
    Call Trace:
     <IRQ> 
     [<ffffffff8105bda8>] ? task_rq_lock+0x58/0xa0
     [<ffffffff814c3e51>] tcp_tso_segment+0xf1/0x320
     [<ffffffff81471ce7>] ? __kfree_skb+0x47/0xa0
     [<ffffffff814eb2e1>] inet_gso_segment+0x111/0x2e0
     [<ffffffff8147f037>] skb_mac_gso_segment+0xa7/0x290
     [<ffffffff8147f278>] __skb_gso_segment+0x58/0xc0
     [<ffffffff8147f2f3>] skb_gso_segment+0x13/0x20
     [<ffffffff8147f391>] dev_hard_start_xmit+0x91/0x5f0
     [<ffffffff8149e19a>] sch_direct_xmit+0x16a/0x1d0
     [<ffffffff8147fbd8>] dev_queue_xmit+0x208/0x300
     [<ffffffff8151f780>] ? __br_forward+0x0/0xd0
     [<ffffffff8151f48b>] br_dev_queue_push_xmit+0x7b/0xc0
     [<ffffffff8151f528>] br_forward_finish+0x58/0x60
     [<ffffffff8151f82b>] __br_forward+0xab/0xd0
     [<ffffffff8151f3ee>] deliver_clone+0x3e/0x60
     [<ffffffff8151f780>] ? __br_forward+0x0/0xd0
     [<ffffffff8151f722>] br_flood+0x82/0xe0
     [<ffffffff8151facc>] br_flood_forward+0x1c/0x20
     [<ffffffff81520c60>] br_handle_frame_finish+0x330/0x370
     [<ffffffff81520e4a>] br_handle_frame+0x1aa/0x250
     [<ffffffff8147ffdf>] __netif_receive_skb+0x24f/0x770
     [<ffffffff81480648>] netif_receive_skb+0x58/0x60
     [<ffffffff81480848>] napi_gro_complete+0xc8/0x150
     [<ffffffff81480ad3>] dev_gro_receive+0x203/0x320
     [<ffffffff8152f358>] vlan_gro_common+0x1b8/0x260
     [<ffffffff8152f882>] vlan_gro_receive+0x82/0xa0
     [<ffffffffa00772de>] igb_receive_skb+0x2e/0x50 [igb]
     [<ffffffffa0081cdf>] igb_poll+0x74f/0x1370 [igb]
     [<ffffffff81060b4d>] ? enqueue_task_fair+0xdd/0x1f0
     [<ffffffff81058c96>] ? enqueue_task+0x66/0x80
     [<ffffffff814810b1>] net_rx_action+0x1a1/0x3b0
     [<ffffffff81014d79>] ? read_tsc+0x9/0x20
     [<ffffffff8107d24b>] __do_softirq+0x11b/0x260
     [<ffffffff8100c4cc>] call_softirq+0x1c/0x30
     [<ffffffff81010235>] do_softirq+0x75/0xb0
     [<ffffffff8107d525>] irq_exit+0xc5/0xd0
     [<ffffffff81563f92>] do_IRQ+0x72/0xe0
     [<ffffffff8100bb13>] ret_from_intr+0x0/0x11
     <EOI> 
     [<ffffffff811bd6f0>] ? sys_ioctl+0x0/0x80
     [<ffffffff8100b182>] ? system_call_fastpath+0x16/0x1b
    Code: c5 fc ff ff 41 8b 87 d4 00 00 00 49 03 87 d8 00 00 00 48 83 78 18 00 75 1b 48 89 48 18 e9 f8 fe ff ff f0 ff 81 ec 00 00 
    And here is my pveversion -v output:

    Code:
    proxmox-ve-2.6.32: 3.3-139 (running kernel: 2.6.32-34-pve)
    pve-manager: 3.3-5 (running version: 3.3-5/bfebec03)
    pve-kernel-2.6.32-32-pve: 2.6.32-136
    pve-kernel-2.6.32-34-pve: 2.6.32-139
    lvm2: 2.02.98-pve4
    clvm: 2.02.98-pve4
    corosync-pve: 1.4.7-1
    openais-pve: 1.1.4-3
    libqb0: 0.11.1-2
    redhat-cluster-pve: 3.2.0-2
    resource-agents-pve: 3.9.2-4
    fence-agents-pve: 4.0.10-1
    pve-cluster: 3.0-15
    qemu-server: 3.3-3
    pve-firmware: 1.1-3
    libpve-common-perl: 3.0-19
    libpve-access-control: 3.0-15
    libpve-storage-perl: 3.0-25
    pve-libspice-server1: 0.12.4-3
    vncterm: 1.1-8
    vzctl: 4.0-1pve6
    vzprocps: 2.0.11-2
    vzquota: 3.1-2
    pve-qemu-kvm: 2.1-10
    ksm-control-daemon: 1.1-1
    glusterfs-client: 3.5.2-1
    Thanks in advance for any advice or info you can provide.
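
    For anyone hitting the same oops: the trace dies in skb_segment() on the GSO path, with traffic arriving through the igb driver and a VLAN and being forwarded across the bridge. A workaround worth testing (a sketch only, assuming the physical NIC under the bridge is eth0; adjust the interface name to your setup) is to disable the offloads involved and see whether the crashes stop:

    Code:
    # show the current offload settings on the NIC behind the bridge
    ethtool -k eth0

    # temporarily disable GRO/GSO/TSO, the features exercised in the crash trace
    ethtool -K eth0 gro off gso off tso off

    If that keeps the box up, the settings can be made persistent with post-up lines in /etc/network/interfaces.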
     
  2. udo

    udo Well-Known Member
    Proxmox Subscriber

    Joined:
    Apr 22, 2009
    Messages:
    5,835
    Likes Received:
    159
    Hi,
    don't know if it helps, but I see you use a Supermicro MB. I had one Supermicro mainboard where pve suddenly rebooted until my colleague flashed a new BIOS.

    Unfortunately, Supermicro doesn't document which issues are solved by which update (but they do write that you should only update the BIOS if your issue is BIOS-related!!). Other companies, like Asus, provide much better information. This is the reason why I avoid Supermicro these days.
    Nevertheless, you can try a BIOS update...

    Udo
     
  3. deludi

    deludi New Member

    Joined:
    Oct 14, 2013
    Messages:
    26
    Likes Received:
    0
    Hi nsw,

    Udo is correct.
    We have had major problems with the Supermicro X9SCM-F boards (total hypervisor crash when starting a big copy). It turned out that one of the NICs had issues with Linux. This was not a Proxmox issue; we managed to reproduce the error on Debian and CentOS with plain KVM.
    Solution: we replaced the boards with Supermicro Atom C2750 boards, which have been rock solid since the replacement.
    The X9SCM-F boards are now used for our OmniOS + napp-it storage boxes and are stable without any problem. The illumos driver for that NIC is better than the Linux one.

    Regards,

    Dirk Adamsky
     
  4. NSW

    NSW New Member

    Joined:
    Jul 19, 2011
    Messages:
    17
    Likes Received:
    0
    Udo and Dirk,

    Thanks for the input. I've checked the BIOS on both and they are up to date. What's weird is that the other identical server, still running 3.2-4, has had no crashing issues at all. It's only the updated boxes that are crashing. I may look at rolling back the kernel to see if that helps at all.

    On the Supermicro BIOS issue, I can agree. They tell you nothing about the updates or what they fix. We went with Supermicro because we got a good deal on them and they had a good AMD option. We are running the AS-2022G-URF chassis, which uses the H8DGU-F board.
     
  5. udo

    udo Well-Known Member
    Proxmox Subscriber

    Joined:
    Apr 22, 2009
    Messages:
    5,835
    Likes Received:
    159
    Hi,
    you can try the 2.6.32-33 kernel (I also had trouble on an AMD system with 2.6.32-28 through -32). Both 2.6.32-27 and 2.6.32-33 run stable for me.


    Udo
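
    Installing one of those kernels alongside the current one is a plain apt operation; a minimal sketch, following the package naming visible in the pveversion output above (the exact 2.6.32-33 package name is an assumption based on that pattern), with the older kernel then picked from the GRUB menu at boot:

    Code:
    # install the older kernel package alongside the current one
    apt-get update
    apt-get install pve-kernel-2.6.32-33-pve

    # after rebooting into it via the GRUB menu, confirm which kernel is running
    uname -r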
     
  6. NSW

    NSW New Member

    Joined:
    Jul 19, 2011
    Messages:
    17
    Likes Received:
    0
    Reporting back: no luck with the other kernels. Still crashing with the same error on both boxes. I even tried pushing one of the servers up to the pvetest repo and updating, with no effect. Updated the firmware on the interconnecting switch, still no luck. I guess the last option I have is to downgrade/reinstall to 3.2-4, which is running without a single problem. *sigh* Going to miss the NoVNC console. :(

    Anyway, thanks for all the input, Udo. I'll keep working on this with a test system, and hopefully a newer version down the line will run without crashing daily. If anyone has any other ideas, I'm open to input.
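
    For reference, moving a 3.x node to the pvetest repository, as tried above, is a one-line apt sources change; a minimal sketch, assuming Proxmox VE 3.x on Debian wheezy (verify the repository line against the Proxmox wiki before relying on it):

    Code:
    # add the pvetest repository and upgrade (the file name is arbitrary)
    echo "deb http://download.proxmox.com/debian wheezy pvetest" > /etc/apt/sources.list.d/pvetest.list
    apt-get update
    apt-get dist-upgrade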
     