Strange kernel log

jdw

Hello,

I have a few Proxmox 1.9 servers running in a test configuration.

I noticed this morning that one of them went to "nosync". When I logged in, I found the root fs was full because /var/log/syslog had taken up the entire disk.

I rotated it out and hupped syslog. Sync is now fine, but the /var/log/syslog file is still growing at the rate of 10MB/min.
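For anyone hitting the same thing, the stopgap was essentially to move the runaway log aside and HUP the daemon; a minimal sketch (adjust for whichever syslog daemon the host actually runs):

Code:
# move the runaway log aside and start a fresh, empty file
mv /var/log/syslog /var/log/syslog.0
: > /var/log/syslog
# tell the syslog daemon to reopen its log files (use whichever is actually running)
kill -HUP $(pidof syslogd) 2>/dev/null
kill -HUP $(pidof rsyslogd) 2>/dev/null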

The contents look like this, over and over:

Code:
Oct 10 11:12:35 v9 kernel: ------------[ cut here ]------------
Oct 10 11:12:35 v9 kernel: WARNING: at net/core/dev.c:1683 skb_gso_segment+0x1f7/0x2d0() (Tainted: G        W  ----------------  )
Oct 10 11:12:35 v9 kernel: Hardware name: X8DTT
Oct 10 11:12:35 v9 kernel: 802.1Q VLAN Support: caps=(0x110829, 0x0) len=2932 data_len=0 ip_summed=1
Oct 10 11:12:35 v9 kernel: Modules linked in: vhost_net macvtap macvlan tun kvm_intel kvm vzethdev vznetdev simfs vzrst nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 vzcpt nfs lockd fscache nfs_acl auth_rpcgss sunrpc vzdquota vzmon vzdev ip6t_REJECT ip6table_mangle ip6table_filter ip6_tables xt_length xt_hl xt_tcpmss xt_TCPMSS iptable_mangle iptable_filter xt_multiport xt_limit xt_dscp ipt_REJECT ip_tables vzevent ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 8021q garp bridge stp llc bonding ipv6 snd_pcm snd_timer tpm_tis tpm snd tpm_bios soundcore i2c_i801 snd_page_alloc serio_raw i2c_core ghes hed pcspkr ioatdma i7core_edac dca edac_core shpchp ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot ahci e1000e [last unloaded: scsi_wait_scan]
Oct 10 11:12:35 v9 kernel: Pid: 327597, comm: vhost-327588 Tainted: G        W  ----------------   2.6.32-6-pve #1
Oct 10 11:12:35 v9 kernel: Call Trace:
Oct 10 11:12:35 v9 kernel: <IRQ>  [<ffffffff81438d47>] ? skb_gso_segment+0x1f7/0x2d0
Oct 10 11:12:35 v9 kernel: [<ffffffff81438d47>] ? skb_gso_segment+0x1f7/0x2d0
Oct 10 11:12:35 v9 kernel: [<ffffffff810699a8>] ? warn_slowpath_common+0x88/0xe0
Oct 10 11:12:35 v9 kernel: [<ffffffff81069afe>] ? warn_slowpath_fmt+0x6e/0x70
Oct 10 11:12:35 v9 kernel: [<ffffffffa01e10e8>] ? bond_dev_queue_xmit+0x48/0x1c0 [bonding]
Oct 10 11:12:35 v9 kernel: [<ffffffffa03062ba>] ? ipt_do_table+0x2ba/0x654 [ip_tables]
Oct 10 11:12:35 v9 kernel: [<ffffffffa023a621>] ? vlan_ethtool_get_drvinfo+0x31/0x40 [8021q]
Oct 10 11:12:35 v9 kernel: [<ffffffff81438d47>] ? skb_gso_segment+0x1f7/0x2d0
Oct 10 11:12:35 v9 kernel: [<ffffffff81438fb8>] ? dev_hard_start_xmit+0x198/0x480
Oct 10 11:12:35 v9 kernel: [<ffffffff8143dc36>] ? dev_queue_xmit+0x466/0x540
Oct 10 11:12:35 v9 kernel: [<ffffffffa021559d>] ? br_dev_queue_push_xmit+0x5d/0xc0 [bridge]
Oct 10 11:12:35 v9 kernel: [<ffffffffa021b60d>] ? br_nf_dev_queue_xmit+0x2d/0xb0 [bridge]
Oct 10 11:12:35 v9 kernel: [<ffffffffa021c428>] ? br_nf_post_routing+0x1e8/0x2c0 [bridge]
Oct 10 11:12:35 v9 kernel: [<ffffffff814624ff>] ? nf_iterate+0x6f/0xb0
Oct 10 11:12:35 v9 kernel: [<ffffffffa0215540>] ? br_dev_queue_push_xmit+0x0/0xc0 [bridge]
Oct 10 11:12:35 v9 kernel: [<ffffffff814625f3>] ? nf_hook_slow+0xb3/0x110
Oct 10 11:12:35 v9 kernel: [<ffffffffa0215540>] ? br_dev_queue_push_xmit+0x0/0xc0 [bridge]
Oct 10 11:12:35 v9 kernel: [<ffffffffa0215646>] ? br_forward_finish+0x46/0x70 [bridge]
Oct 10 11:12:35 v9 kernel: [<ffffffffa021bc78>] ? br_nf_forward_finish+0x138/0x140 [bridge]
Oct 10 11:12:35 v9 kernel: [<ffffffffa021d170>] ? br_nf_forward_ip+0x290/0x3c0 [bridge]
Oct 10 11:12:35 v9 kernel: [<ffffffff814624ff>] ? nf_iterate+0x6f/0xb0
Oct 10 11:12:35 v9 kernel: [<ffffffffa0215600>] ? br_forward_finish+0x0/0x70 [bridge]
Oct 10 11:12:35 v9 kernel: [<ffffffff814625f3>] ? nf_hook_slow+0xb3/0x110
Oct 10 11:12:35 v9 kernel: [<ffffffffa0215600>] ? br_forward_finish+0x0/0x70 [bridge]
Oct 10 11:12:35 v9 kernel: [<ffffffffa02156e1>] ? __br_forward+0x71/0xc0 [bridge]
Oct 10 11:12:35 v9 kernel: [<ffffffffa0215785>] ? br_forward+0x55/0x80 [bridge]
Oct 10 11:12:35 v9 kernel: [<ffffffffa021670c>] ? br_handle_frame_finish+0x15c/0x2e0 [bridge]
Oct 10 11:12:35 v9 kernel: [<ffffffffa021c20d>] ? br_nf_pre_routing_finish+0x32d/0x360 [bridge]
Oct 10 11:12:35 v9 kernel: [<ffffffffa021bee0>] ? br_nf_pre_routing_finish+0x0/0x360 [bridge]
Oct 10 11:12:35 v9 kernel: [<ffffffff814625f3>] ? nf_hook_slow+0xb3/0x110
Oct 10 11:12:35 v9 kernel: [<ffffffffa021bee0>] ? br_nf_pre_routing_finish+0x0/0x360 [bridge]
Oct 10 11:12:35 v9 kernel: [<ffffffffa021c8d0>] ? br_nf_pre_routing+0x3d0/0x7a0 [bridge]
Oct 10 11:12:35 v9 kernel: [<ffffffff814624ff>] ? nf_iterate+0x6f/0xb0
Oct 10 11:12:35 v9 kernel: [<ffffffffa02165b0>] ? br_handle_frame_finish+0x0/0x2e0 [bridge]
Oct 10 11:12:35 v9 kernel: [<ffffffff814625f3>] ? nf_hook_slow+0xb3/0x110
Oct 10 11:12:35 v9 kernel: [<ffffffffa02165b0>] ? br_handle_frame_finish+0x0/0x2e0 [bridge]
Oct 10 11:12:35 v9 kernel: [<ffffffffa0216a19>] ? br_handle_frame+0x189/0x260 [bridge]
Oct 10 11:12:35 v9 kernel: [<ffffffff8143852c>] ? __netif_receive_skb+0x25c/0x730
Oct 10 11:12:35 v9 kernel: [<ffffffff81436513>] ? __napi_complete+0x23/0x50
Oct 10 11:12:35 v9 kernel: [<ffffffff81438a92>] ? process_backlog+0x92/0x100
Oct 10 11:12:35 v9 kernel: [<ffffffff8143ce56>] ? net_rx_action+0x126/0x320
Oct 10 11:12:35 v9 kernel: [<ffffffff81072cfa>] ? __do_softirq+0x13a/0x230
Oct 10 11:12:35 v9 kernel: [<ffffffff8100c48c>] ? call_softirq+0x1c/0x30
Oct 10 11:12:35 v9 kernel: <EOI>  [<ffffffff8100e0c5>] ? do_softirq+0x65/0xa0
Oct 10 11:12:35 v9 kernel: [<ffffffff8143d4d6>] ? netif_rx_ni+0x26/0x30
Oct 10 11:12:35 v9 kernel: [<ffffffffa05d9da3>] ? tun_sendmsg+0x263/0x4d4 [tun]
Oct 10 11:12:35 v9 kernel: [<ffffffffa05f6f35>] ? handle_tx+0x225/0x500 [vhost_net]
Oct 10 11:12:35 v9 kernel: [<ffffffffa05f7245>] ? handle_tx_kick+0x15/0x20 [vhost_net]
Oct 10 11:12:35 v9 kernel: [<ffffffffa05f4a17>] ? vhost_worker+0xb7/0x130 [vhost_net]
Oct 10 11:12:35 v9 kernel: [<ffffffffa05f4960>] ? vhost_worker+0x0/0x130 [vhost_net]
Oct 10 11:12:35 v9 kernel: [<ffffffffa05f4960>] ? vhost_worker+0x0/0x130 [vhost_net]
Oct 10 11:12:35 v9 kernel: [<ffffffff81092e26>] ? kthread+0x96/0xb0
Oct 10 11:12:35 v9 kernel: [<ffffffff8100c38a>] ? child_rip+0xa/0x20
Oct 10 11:12:35 v9 kernel: [<ffffffff81092d90>] ? kthread+0x0/0xb0
Oct 10 11:12:35 v9 kernel: [<ffffffff8100c380>] ? child_rip+0x0/0x20
Oct 10 11:12:35 v9 kernel: ---[ end trace 638a4eabe20ad25a ]---

I have no idea what this means. Everything appears to work fine, but the above looks ugly.

There is only one test VM on this particular server. I migrated that VM to a different test machine successfully, and the instant the migration finished, the syslog insanity moved to that machine as well. The only thing different about that VM is that we're trying to get the virtio-net network drivers to work. (And they appear to, except for this.)

The LAN config is three Intel gigabit Ethernet ports bonded in LACP mode, with VLANs on top. The virtual machine has two virtio-net network interfaces on two different VLANs.
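For reference, the host side is conceptually along the lines of the following /etc/network/interfaces fragment. This is only an illustration; the interface names, VLAN IDs, and bridge names below are assumptions, not the actual config.

Code:
auto bond0
iface bond0 inet manual
        slaves eth0 eth1 eth2
        bond_mode 802.3ad
        bond_miimon 100

auto vmbr100
iface vmbr100 inet manual
        bridge_ports bond0.100
        bridge_stp off
        bridge_fd 0

auto vmbr200
iface vmbr200 inet manual
        bridge_ports bond0.200
        bridge_stp off
        bridge_fd 0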

If anyone could kick me in the right direction here, I would really appreciate it.

Thanks!
 
Please also post the output of 'pveversion -v'.
 
Here we go:

Code:
pve-manager: 1.9-24 (pve-manager/1.9/6542)
running kernel: 2.6.32-6-pve
proxmox-ve-2.6.32: 1.9-43
pve-kernel-2.6.32-6-pve: 2.6.32-43
qemu-server: 1.1-32
pve-firmware: 1.0-13
libpve-storage-perl: 1.0-19
vncterm: 0.9-2
vzctl: 3.0.28-1pve5
vzdump: 1.2-15
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.15.0-1
ksm-control-daemon: 1.0-6

There is a moderately good chance this is related to the guest OS virtio driver. I'm exploring that possibility now, but I would still love to know what exactly this message indicates. It looks like a kernel panic, yet the host is as stable as can be. There isn't even any material number of dropped packets or packet errors.
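"No material number of dropped packets" here means something like the following check on the host (device names assumed):

Code:
# per-interface packet, error, and drop counters
ip -s link show bond0
# per-NIC driver statistics (e1000e)
ethtool -S eth0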
 
I guess there might be some data corruption that is causing errors in your data transmission. I would give your machine a complete recheck.
 
There is a newer kernel in the stable repo, and also in our test repo. I suggest you upgrade to the latest from pvetest; see http://forum.proxmox.com/threads/7247-New-2.6.32-Kernel-(pvetest)
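Roughly, assuming the pvetest repository is already listed in /etc/apt/sources.list, that amounts to:

Code:
apt-get update
apt-get dist-upgrade
# reboot afterwards so the updated pve-kernel build is actually running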

I have done apt-get update & upgrade on v9. Here's what I have now:

Code:
pve-manager: 1.9-24 (pve-manager/1.9/6542)
running kernel: 2.6.32-6-pve
proxmox-ve-2.6.32: 1.9-47
pve-kernel-2.6.32-6-pve: 2.6.32-47
qemu-server: 1.1-32
pve-firmware: 1.0-14
libpve-storage-perl: 1.0-19
vncterm: 0.9-2
vzctl: 3.0.29-2pve1
vzdump: 1.2-16
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.15.0-1
ksm-control-daemon: 1.0-6

I have also updated the virtio driver in the guest OS to the proper version. (It could still be buggy, of course, but that's somewhat less likely than it was.)

Unfortunately, the problem still exists:

Code:
Oct 11 18:26:51 v9 kernel: ------------[ cut here ]------------
Oct 11 18:26:51 v9 kernel: WARNING: at net/core/dev.c:1683 skb_gso_segment+0x1f7/0x2d0() (Tainted: G        W  ----------------  )
Oct 11 18:26:51 v9 kernel: Hardware name: X8DTT
Oct 11 18:26:51 v9 kernel: 802.1Q VLAN Support: caps=(0x110829, 0x0) len=2558 data_len=0 ip_summed=1
Oct 11 18:26:51 v9 kernel: Modules linked in: vhost_net macvtap macvlan tun kvm_intel kvm vzethdev vznetdev simfs vzrst nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 vzcpt nfs lockd fscache nfs_acl auth_rpcgss sunrpc vzdquota vzmon vzdev ip6t_REJECT ip6table_mangle ip6table_filter ip6_tables xt_length xt_hl xt_tcpmss xt_TCPMSS iptable_mangle iptable_filter xt_multiport xt_limit xt_dscp ipt_REJECT ip_tables vzevent ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 8021q garp bridge stp llc bonding ipv6 snd_pcm snd_timer snd soundcore tpm_tis i2c_i801 tpm snd_page_alloc tpm_bios i2c_core ghes hed serio_raw pcspkr ioatdma dca i7core_edac edac_core shpchp ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot ahci e1000e [last unloaded: scsi_wait_scan]
Oct 11 18:26:51 v9 kernel: Pid: 4022, comm: vhost-4013 Tainted: G        W  ----------------   2.6.32-6-pve #1
Oct 11 18:26:51 v9 kernel: Call Trace:
Oct 11 18:26:51 v9 kernel: <IRQ>  [<ffffffff81438ff7>] ? skb_gso_segment+0x1f7/0x2d0
Oct 11 18:26:51 v9 kernel: [<ffffffff81438ff7>] ? skb_gso_segment+0x1f7/0x2d0
Oct 11 18:26:51 v9 kernel: [<ffffffff81069968>] ? warn_slowpath_common+0x88/0xe0
Oct 11 18:26:51 v9 kernel: [<ffffffff81069abe>] ? warn_slowpath_fmt+0x6e/0x70
Oct 11 18:26:51 v9 kernel: [<ffffffffa01d90e8>] ? bond_dev_queue_xmit+0x48/0x1c0 [bonding]
Oct 11 18:26:51 v9 kernel: [<ffffffffa03012ba>] ? ipt_do_table+0x2ba/0x654 [ip_tables]
Oct 11 18:26:51 v9 kernel: [<ffffffffa0235621>] ? vlan_ethtool_get_drvinfo+0x31/0x40 [8021q]
Oct 11 18:26:51 v9 kernel: [<ffffffff81438ff7>] ? skb_gso_segment+0x1f7/0x2d0
Oct 11 18:26:51 v9 kernel: [<ffffffff81439268>] ? dev_hard_start_xmit+0x198/0x4b0
Oct 11 18:26:51 v9 kernel: [<ffffffff8143df26>] ? dev_queue_xmit+0x466/0x540
Oct 11 18:26:51 v9 kernel: [<ffffffffa02105ad>] ? br_dev_queue_push_xmit+0x5d/0xc0 [bridge]
Oct 11 18:26:51 v9 kernel: [<ffffffffa021663d>] ? br_nf_dev_queue_xmit+0x2d/0xb0 [bridge]
Oct 11 18:26:51 v9 kernel: [<ffffffffa0217458>] ? br_nf_post_routing+0x1e8/0x2c0 [bridge]
Oct 11 18:26:51 v9 kernel: [<ffffffff814627ef>] ? nf_iterate+0x6f/0xb0
Oct 11 18:26:51 v9 kernel: [<ffffffffa0210550>] ? br_dev_queue_push_xmit+0x0/0xc0 [bridge]
Oct 11 18:26:51 v9 kernel: [<ffffffff814628e3>] ? nf_hook_slow+0xb3/0x110
Oct 11 18:26:51 v9 kernel: [<ffffffffa0210550>] ? br_dev_queue_push_xmit+0x0/0xc0 [bridge]
Oct 11 18:26:51 v9 kernel: [<ffffffffa0210656>] ? br_forward_finish+0x46/0x70 [bridge]
Oct 11 18:26:51 v9 kernel: [<ffffffffa0216ca8>] ? br_nf_forward_finish+0x138/0x140 [bridge]
Oct 11 18:26:51 v9 kernel: [<ffffffffa02181a0>] ? br_nf_forward_ip+0x290/0x3c0 [bridge]
Oct 11 18:26:51 v9 kernel: [<ffffffff814627ef>] ? nf_iterate+0x6f/0xb0
Oct 11 18:26:51 v9 kernel: [<ffffffffa0210610>] ? br_forward_finish+0x0/0x70 [bridge]
Oct 11 18:26:51 v9 kernel: [<ffffffff814628e3>] ? nf_hook_slow+0xb3/0x110
Oct 11 18:26:51 v9 kernel: [<ffffffffa0210610>] ? br_forward_finish+0x0/0x70 [bridge]
Oct 11 18:26:51 v9 kernel: [<ffffffffa02106f1>] ? __br_forward+0x71/0xc0 [bridge]
Oct 11 18:26:51 v9 kernel: [<ffffffffa0210795>] ? br_forward+0x55/0x80 [bridge]
Oct 11 18:26:51 v9 kernel: [<ffffffffa021173c>] ? br_handle_frame_finish+0x15c/0x2e0 [bridge]
Oct 11 18:26:51 v9 kernel: [<ffffffffa021723d>] ? br_nf_pre_routing_finish+0x32d/0x360 [bridge]
Oct 11 18:26:51 v9 kernel: [<ffffffffa0216f10>] ? br_nf_pre_routing_finish+0x0/0x360 [bridge]
Oct 11 18:26:51 v9 kernel: [<ffffffff814628e3>] ? nf_hook_slow+0xb3/0x110
Oct 11 18:26:51 v9 kernel: [<ffffffffa0216f10>] ? br_nf_pre_routing_finish+0x0/0x360 [bridge]
Oct 11 18:26:51 v9 kernel: [<ffffffffa0217900>] ? br_nf_pre_routing+0x3d0/0x7a0 [bridge]
Oct 11 18:26:51 v9 kernel: [<ffffffff814627ef>] ? nf_iterate+0x6f/0xb0
Oct 11 18:26:51 v9 kernel: [<ffffffffa02115e0>] ? br_handle_frame_finish+0x0/0x2e0 [bridge]
Oct 11 18:26:51 v9 kernel: [<ffffffff814628e3>] ? nf_hook_slow+0xb3/0x110
Oct 11 18:26:51 v9 kernel: [<ffffffffa02115e0>] ? br_handle_frame_finish+0x0/0x2e0 [bridge]
Oct 11 18:26:51 v9 kernel: [<ffffffffa0211a49>] ? br_handle_frame+0x189/0x260 [bridge]
Oct 11 18:26:51 v9 kernel: [<ffffffff81060db4>] ? rebalance_domains+0x414/0xad0
Oct 11 18:26:51 v9 kernel: [<ffffffff814387dc>] ? __netif_receive_skb+0x25c/0x730
Oct 11 18:26:51 v9 kernel: [<ffffffff814367c3>] ? __napi_complete+0x23/0x50
Oct 11 18:26:51 v9 kernel: [<ffffffff81438d42>] ? process_backlog+0x92/0x100
Oct 11 18:26:51 v9 kernel: [<ffffffff8143d146>] ? net_rx_action+0x126/0x320
Oct 11 18:26:51 v9 kernel: [<ffffffff81072cba>] ? __do_softirq+0x13a/0x230
Oct 11 18:26:51 v9 kernel: [<ffffffff8100c48c>] ? call_softirq+0x1c/0x30
Oct 11 18:26:51 v9 kernel: <EOI>  [<ffffffff8100e0c5>] ? do_softirq+0x65/0xa0
Oct 11 18:26:51 v9 kernel: [<ffffffff8143d7c6>] ? netif_rx_ni+0x26/0x30
Oct 11 18:26:51 v9 kernel: [<ffffffffa05d5da3>] ? tun_sendmsg+0x263/0x4d4 [tun]
Oct 11 18:26:51 v9 kernel: [<ffffffffa05f2f35>] ? handle_tx+0x225/0x500 [vhost_net]
Oct 11 18:26:51 v9 kernel: [<ffffffffa05f3245>] ? handle_tx_kick+0x15/0x20 [vhost_net]
Oct 11 18:26:51 v9 kernel: [<ffffffffa05f0a17>] ? vhost_worker+0xb7/0x130 [vhost_net]
Oct 11 18:26:51 v9 kernel: [<ffffffffa05f0960>] ? vhost_worker+0x0/0x130 [vhost_net]
Oct 11 18:26:51 v9 kernel: [<ffffffffa05f0960>] ? vhost_worker+0x0/0x130 [vhost_net]
Oct 11 18:26:51 v9 kernel: [<ffffffff81092e66>] ? kthread+0x96/0xb0
Oct 11 18:26:51 v9 kernel: [<ffffffff8100c38a>] ? child_rip+0xa/0x20
Oct 11 18:26:51 v9 kernel: [<ffffffff81092dd0>] ? kthread+0x0/0xb0
Oct 11 18:26:51 v9 kernel: [<ffffffff8100c380>] ? child_rip+0x0/0x20
Oct 11 18:26:51 v9 kernel: ---[ end trace cdf96ee36f4b0e1a ]---

Thanks for looking at it. Not sure where to go from here.
 
Do you use iptables/a firewall?

For testing purposes, it's just a straight-from-the-install disk configuration on a private network, so there isn't any iptables/firewall setup unless something was configured by the installer or behind the scenes.

I'm not a Linux expert, but I think this means there's nothing:

Code:
# iptables --list
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         


Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         


Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
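Since the call trace shows bridge-netfilter handing frames to iptables (the br_nf_* and ipt_do_table entries), it may also be worth glancing at the other tables and the bridge-nf sysctls; for example:

Code:
# the filter table above is empty; nat and mangle can carry rules too
iptables -t nat -L -n
iptables -t mangle -L -n
# these sysctls decide whether bridged frames traverse iptables at all
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables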

The author of the guest's virtio net driver says:

Linux is unhappy because checksum offloading of the skb is CHECKSUM_UNNECESSARY instead of the expected CHECKSUM_PARTIAL (because CHECKSUM_COMPLETE is not set). I'm not sure why

He asks that I try setting vhost=off and (not at the same time) vnet_hdr=on in the qemu config and see what effect that has.

How would I go about doing that via Proxmox? I found the 105.conf file for this vm, but it looks like there's a lot of translation happening between that and the ultimate /usr/bin/kvm command line.

Thanks!
 
How would I go about doing that via Proxmox? I found the 105.conf file for this vm, but it looks like there's a lot of translation happening between that and the ultimate /usr/bin/kvm command line.

For testing, you can start kvm directly. To get the command line, type:

# qm showcmd <vmid>

Then modify that line as you need and start manually.
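The network part of the generated line will contain something like the fragment below; the vhost and vnet_hdr settings the driver author mentioned are toggled right there. The id, ifname, and MAC values here are made up for illustration only.

Code:
# as printed by 'qm showcmd <vmid>' (illustrative values):
...  -netdev type=tap,id=net0,ifname=tap105i0,vhost=on -device virtio-net-pci,netdev=net0,mac=DE:AD:BE:EF:00:01 ...
# for the test, change vhost=on to vhost=off (or add vnet_hdr=on) and start that command by hand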
 
# qm showcmd <vmid>

Thank you very much, that was exactly what I needed to run my tests.

We have determined it's definitely related to the checksum offloading and confirmed that every offloaded segment generates one such diagnostic, but have yet to figure out whether it's a problem on the KVM side or the guest side. We're still looking into it. Unfortunately, none of us involved are Linux experts, so it's slow going.
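One experiment that may help split host from guest (a sketch; device names are assumptions): compare the offload feature bits the warning prints (the caps=(0x110829, 0x0) part) with what the devices in the path advertise, and try toggling them on the host:

Code:
# offload settings of the physical NIC (e1000e)
ethtool -k eth0
# raw NETIF_F_* feature bits of the bond, if the kernel exposes them
cat /sys/class/net/bond0/features 2>/dev/null
# experiment: turn generic segmentation offload off on the bond and watch whether the warnings change
ethtool -K bond0 gso off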
 
