Backup leads to inaccessible server (with kernel bug)

hk@

Hi
we had this happen on one machine twice now, therefore I'm reporting it.

Things go like this: vzdump does its job but takes (for whatever reason) forever to finish, so after starting this backup at 22:22 the following shows up (quite expected under the given circumstances):

Aug 15 06:40:09 p03 kernel: device-mapper: snapshots: Invalidating snapshot: Unable to allocate exception.
Aug 15 06:40:09 p03 kernel: Buffer I/O error on device dm-2, logical block 6357390
Aug 15 06:40:09 p03 kernel: lost page write due to I/O error on dm-2
Aug 15 06:40:09 p03 kernel: Buffer I/O error on device dm-2, logical block 6357391
Aug 15 06:40:09 p03 kernel: lost page write due to I/O error on dm-2
Aug 15 06:40:09 p03 kernel: Buffer I/O error on device dm-2, logical block 6357392
Aug 15 06:40:09 p03 kernel: lost page write due to I/O error on dm-2
Aug 15 06:40:09 p03 kernel: Buffer I/O error on device dm-2, logical block 6357395
Aug 15 06:40:09 p03 kernel: lost page write due to I/O error on dm-2
Aug 15 06:40:10 p03 kernel: Buffer I/O error on device dm-2, logical block 6357396
Aug 15 06:40:10 p03 kernel: lost page write due to I/O error on dm-2
Aug 15 06:40:10 p03 kernel: Buffer I/O error on device dm-2, logical block 6357400
Aug 15 06:40:10 p03 kernel: Buffer I/O error on device dm-2, logical block 6357299
Aug 15 06:40:10 p03 kernel: lost page write due to I/O error on dm-2
Aug 15 06:40:10 p03 kernel: Buffer I/O error on device dm-2, logical block 6357300
Aug 15 06:40:10 p03 kernel: lost page write due to I/O error on dm-2
Aug 15 06:40:10 p03 kernel: Buffer I/O error on device dm-2, logical block 6357307
Aug 15 06:40:10 p03 kernel: lost page write due to I/O error on dm-2
Aug 15 06:40:10 p03 kernel: Buffer I/O error on device dm-2, logical block 6357308
Aug 15 06:40:10 p03 kernel: lost page write due to I/O error on dm-2
Aug 15 06:40:10 p03 kernel: lost page write due to I/O error on dm-2
Aug 15 06:40:10 p03 kernel: EXT3-fs error (device dm-2): ext3_get_inode_loc: unable to read inode block - inode=1982465, block=7929858
Aug 15 06:40:13 p03 kernel: , block=6357050
Aug 15 06:40:13 p03 kernel: EXT3-fs error (device dm-2): ext3_get_inode_loc: unable to read inode block - inode=1590145, block=6357050
Aug 15 06:40:13 p03 kernel: EXT3-fs error (device dm-2): ext3_get_inode_loc: unable to read inode block - inode=1590145, block=6357050

and of course the ext3-fs errors now come in the hundreds, followed by the expected directory errors (hundreds of those too):

Aug 15 06:41:32 p03 kernel: EXT3-fs error (device dm-2): ext3_find_entry: reading directory #1572933 offset 0
Aug 15 06:41:32 p03 kernel: EXT3-fs error (device dm-2): ext3_find_entry: reading directory #1572933 offset 0
Aug 15 06:41:32 p03 kernel: EXT3-fs error (device dm-2): ext3_find_entry: reading directory #1572933 offset 0

but then it gets dirty:

Aug 15 06:41:32 p03 kernel: EXT3-fs error (device dm-2): ext3_find_entry: reading directory #1025011 offset 0
Aug 15 06:41:32 p03 kernel: BUG: soft lockup - CPU#0 stuck for 69s! [updatedb.mlocat:27937]
Aug 15 06:41:32 p03 kernel: Aborting journal on device dm-2.
Aug 15 06:41:32 p03 kernel: __ratelimit: 12776 callbacks suppressed
Aug 15 06:41:32 p03 kernel: Buffer I/O error on device dm-2, logical block 27001346
Aug 15 06:41:32 p03 kernel: lost page write due to I/O error on dm-2
Aug 15 06:41:32 p03 kernel: Modules linked in: kvm_intel kvm nfs lockd fscache nfs_acl auth_rpcgss sunrpc vzethdev vznetdev simfs vzrst vzcpt vzdquota vzmon vzdev ip6t_REJECT ip6table_mangle ip6table_filter ip6_tables xt_length xt_hl xt_tcpmss xt_TCPMSS xt_multiport xt_dscp vzevent ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bridge stp iptable_nat nf_nat ipt_LOG xt_limit xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT iptable_filter iptable_mangle ip_tables x_tables ipmi_watchdog ipmi_devintf ipmi_si ipmi_msghandler mptctl container evdev snd_pcm snd_timer i2c_i801 snd pcspkr serio_raw button processor i3200_edac i2c_core soundcore snd_page_alloc edac_core ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot mptsas mptscsih ehci_hcd uhci_hcd floppy ata_piix ata_generic libata mptbase usbcore scsi_transport_sas nls_base tg3 libphy thermal fan thermal_sys [last unloaded: scsi_wait_scan]
Aug 15 06:41:32 p03 kernel: CPU 0:
Aug 15 06:41:32 p03 kernel: Modules linked in: kvm_intel kvm nfs lockd fscache nfs_acl auth_rpcgss sunrpc vzethdev vznetdev simfs vzrst vzcpt vzdquota vzmon vzdev ip6t_REJECT ip6table_mangle ip6table_filter ip6_tables xt_length xt_hl xt_tcpmss xt_TCPMSS xt_multiport xt_dscp vzevent ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bridge stp iptable_nat nf_nat ipt_LOG xt_limit xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT iptable_filter iptable_mangle ip_tables x_tables ipmi_watchdog ipmi_devintf ipmi_si ipmi_msghandler mptctl container evdev snd_pcm snd_timer i2c_i801 snd pcspkr serio_raw button processor i3200_edac i2c_core soundcore snd_page_alloc edac_core ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot mptsas mptscsih ehci_hcd uhci_hcd floppy ata_piix ata_generic libata mptbase usbcore scsi_transport_sas nls_base tg3 libphy thermal fan thermal_sys [last unloaded: scsi_wait_scan]
Aug 15 06:41:32 p03 kernel: Pid: 27937, comm: updatedb.mlocat Tainted: G W 2.6.32-4-pve #1 feoktistov IBM System x3250 M2 -[]-
Aug 15 06:41:32 p03 kernel: RIP: 0010:[<ffffffff81315b35>] [<ffffffff81315b35>] _read_unlock_bh+0x3/0xc
Aug 15 06:41:32 p03 kernel: RSP: 0018:ffff88000ac03bf8 EFLAGS: 00000216
Aug 15 06:41:32 p03 kernel: RAX: 0000000000000000 RBX: ffff880100f88300 RCX: 0000000000000004
Aug 15 06:41:32 p03 kernel: RDX: ffff88000ac19a00 RSI: ffff88000ac03cac RDI: ffffffff814d77d4
Aug 15 06:41:32 p03 kernel: RBP: ffffffff81011733 R08: 0000000000000000 R09: ffff8800465d7140
Aug 15 06:41:32 p03 kernel: R10: 0000000000000206 R11: 0000000000000000 R12: ffff88000ac03b70
Aug 15 06:41:32 p03 kernel: R13: 0000000000000004 R14: 0000000000000000 R15: ffffffff8102513b
Aug 15 06:41:32 p03 kernel: FS: 00007f05da5236e0(0000) GS:ffff88000ac00000(0000) knlGS:0000000000000000
Aug 15 06:41:32 p03 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Aug 15 06:41:32 p03 kernel: CR2: 0000000009fbc000 CR3: 00000001731c6000 CR4: 00000000000426f0
Aug 15 06:41:32 p03 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 15 06:41:32 p03 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Aug 15 06:41:32 p03 kernel: Call Trace:
Aug 15 06:41:32 p03 kernel: <IRQ> [<ffffffff8126e121>] ? neigh_lookup+0xbd/0xcc
Aug 15 06:41:32 p03 kernel: [<ffffffff8126f32c>] ? neigh_event_ns+0x41/0x95
Aug 15 06:41:32 p03 kernel: [<ffffffff812ae496>] ? arp_process+0x353/0x60b
Aug 15 06:41:32 p03 kernel: [<ffffffff812607f0>] ? __netdev_alloc_skb+0x29/0x45
Aug 15 06:41:32 p03 kernel: [<ffffffff810165b1>] ? read_tsc+0xa/0x20
Aug 15 06:41:32 p03 kernel: [<ffffffff810ea8ea>] ? __kmalloc_node_track_caller+0x130/0x168
Aug 15 06:41:32 p03 kernel: [<ffffffff812665f3>] ? netif_receive_skb+0x47c/0x4de
Aug 15 06:41:32 p03 kernel: [<ffffffff81266782>] ? napi_skb_finish+0x1c/0x31
Aug 15 06:41:32 p03 kernel: [<ffffffffa00324dd>] ? tg3_poll+0x6cf/0x93d [tg3]
Aug 15 06:41:32 p03 kernel: [<ffffffff81266ccd>] ? net_rx_action+0xae/0x1c9
Aug 15 06:41:32 p03 kernel: [<ffffffff810549b8>] ? __do_softirq+0x127/0x22f
Aug 15 06:41:32 p03 kernel: [<ffffffff81011d6c>] ? call_softirq+0x1c/0x30
Aug 15 06:41:32 p03 kernel: [<ffffffff810132eb>] ? do_softirq+0x3f/0x7c
Aug 15 06:41:32 p03 kernel: [<ffffffff8105473f>] ? irq_exit+0x78/0xb8
Aug 15 06:41:32 p03 kernel: [<ffffffff81025140>] ? smp_apic_timer_interrupt+0x87/0x95
Aug 15 06:41:32 p03 kernel: [<ffffffff81011733>] ? apic_timer_interrupt+0x13/0x20
Aug 15 06:41:32 p03 kernel: <EOI> [<ffffffff8104f09a>] ? __vprintk+0x472/0x4cc
Aug 15 06:41:32 p03 kernel: [<ffffffff8104f118>] ? vprintk+0x24/0x61
Aug 15 06:41:32 p03 kernel: [<ffffffffa0139c6f>] ? ext3_error+0x6d/0x90 [ext3]
Aug 15 06:41:32 p03 kernel: [<ffffffff811110a1>] ? __find_get_block+0x176/0x186
Aug 15 06:41:32 p03 kernel: [<ffffffff810e7abb>] ? virt_to_head_page+0x9/0x2a
Aug 15 06:41:32 p03 kernel: [<ffffffff81110a0e>] ? submit_bh+0x11c/0x123
Aug 15 06:41:32 p03 kernel: [<ffffffffa0130d34>] ? __ext3_get_inode_loc+0x275/0x29f [ext3]
Aug 15 06:41:32 p03 kernel: [<ffffffff81103fcc>] ? alloc_inode+0x3d/0x74
Aug 15 06:41:32 p03 kernel: [<ffffffffa0130dba>] ? ext3_iget+0x5c/0x3f0 [ext3]
Aug 15 06:41:32 p03 kernel: [<ffffffffa0136b36>] ? ext3_lookup+0x84/0xe4 [ext3]
Aug 15 06:41:32 p03 kernel: [<ffffffff810f9412>] ? do_lookup+0xf1/0x178
Aug 15 06:41:32 p03 kernel: [<ffffffff810f9eab>] ? __link_path_walk+0x689/0x811
Aug 15 06:41:32 p03 kernel: [<ffffffff810fa1bb>] ? path_walk+0x44/0x85
Aug 15 06:41:32 p03 kernel: [<ffffffff810fb4db>] ? do_path_lookup+0x20/0x77
Aug 15 06:41:32 p03 kernel: [<ffffffff810fc81f>] ? user_path_at+0x48/0x79
Aug 15 06:41:32 p03 kernel: [<ffffffff81073d07>] ? do_uncharge_dcache+0x3d/0x51
Aug 15 06:41:32 p03 kernel: [<ffffffff810f4b73>] ? cp_new_stat+0xe9/0xfc
Aug 15 06:41:32 p03 kernel: [<ffffffff810f4d50>] ? vfs_fstatat+0x2c/0x57
Aug 15 06:41:32 p03 kernel: [<ffffffff810f4dd1>] ? sys_newlstat+0x11/0x30
Aug 15 06:41:32 p03 kernel: [<ffffffff810f0753>] ? sys_fchdir+0x69/0x70
Aug 15 06:41:32 p03 kernel: [<ffffffff81010c12>] ? system_call_fastpath+0x16/0x1b

- cut here, continued in the next post due to the per-post character limit -
 
Aug 15 06:41:32 p03 kernel:
Aug 15 06:41:32 p03 kernel: Buffer I/O error on device dm-2, logical block 0
Aug 15 06:41:32 p03 kernel: lost page write due to I/O error on dm-2
Aug 15 06:41:32 p03 kernel: EXT3-fs error (device dm-2): ext3_get_inode_loc: unable to read inode block - inode=1990657, block=7963651
Aug 15 06:41:32 p03 kernel: Buffer I/O error on device dm-2, logical block 0
Aug 15 06:41:32 p03 kernel: lost page write due to I/O error on dm-2
Aug 15 06:41:32 p03 kernel: EXT3-fs error (device dm-2): ext3_get_inode_loc: unable to read inode block - inode=893136, block=3571726
Aug 15 06:41:32 p03 kernel: Buffer I/O error on device dm-2, logical block 0
Aug 15 06:41:32 p03 kernel: lost page write due to I/O error on dm-2
Aug 15 06:41:32 p03 kernel: EXT3-fs error (device dm-2): ext3_get_inode_loc: unable to read inode block - inode=893137, block=3571727
Aug 15 06:41:32 p03 kernel: Buffer I/O error on device dm-2, logical block 0
Aug 15 06:41:32 p03 kernel: lost page write due to I/O error on dm-2
Aug 15 06:41:32 p03 kernel: EXT3-fs error (device dm-2): ext3_get_inode_loc: unable to read inode block - inode=893140, block=3571727
Aug 15 06:41:32 p03 kernel: Buffer I/O error on device dm-2, logical block 0
Aug 15 06:41:32 p03 kernel: lost page write due to I/O error on dm-2
Aug 15 06:41:32 p03 kernel: EXT3-fs error (device dm-2): ext3_get_inode_loc: unable to read inode block - inode=893144, block=3571727
Aug 15 06:41:32 p03 kernel: Buffer I/O error on device dm-2, logical block 0
Aug 15 06:41:32 p03 kernel: lost page write due to I/O error on dm-2
Aug 15 06:41:32 p03 kernel: EXT3-fs error (device dm-2): ext3_get_inode_loc: unable to read inode block - inode=893186, block=3571730
Aug 15 06:41:32 p03 kernel: Buffer I/O error on device dm-2, logical block 0
Aug 15 06:41:32 p03 kernel: lost page write due to I/O error on dm-2
Aug 15 06:41:32 p03 kernel: EXT3-fs error (device dm-2): ext3_get_inode_loc: unable to read inode block - inode=1181912, block=4718735
Aug 15 06:41:32 p03 kernel: Buffer I/O error on device dm-2, logical block 0
Aug 15 06:41:32 p03 kernel: lost page write due to I/O error on dm-2
Aug 15 06:41:32 p03 kernel: EXT3-fs error (device dm-2): ext3_get_inode_loc: unable to read inode block - inode=893248, block=3571733
Aug 15 06:41:32 p03 kernel: Buffer I/O error on device dm-2, logical block 0
Aug 15 06:41:32 p03 kernel: lost page write due to I/O error on dm-2
Aug 15 06:41:32 p03 kernel: EXT3-fs error (device dm-2): ext3_get_inode_loc: unable to read inode block - inode=893251, block=3571734
Aug 15 06:41:32 p03 kernel: EXT3-fs error (device dm-2): ext3_get_inode_loc: unable to read inode block - inode=893625, block=3571757
Aug 15 06:41:32 p03 kernel: EXT3-fs error (device dm-2): ext3_get_inode_loc: unable to read inode block - inode=1155985, block=4620347
Aug 15 06:41:32 p03 kernel: EXT3-fs error (device dm-2): ext3_get_inode_loc: unable to read inode block - inode=893629, block=3571757
Aug 15 06:41:32 p03 kernel: EXT3-fs error (device dm-2): ext3_get_inode_loc: unable to read inode block - inode=893861, block=3571772

and of course lots of inode errors again.

finally:
Aug 15 06:41:32 p03 kernel: EXT3-fs error (device dm-2): ext3_get_inode_loc: unable to read inode block - inode=2899969, block=11599874
Aug 15 06:41:47 p03 kernel: ext3_abort called.
Aug 15 06:41:47 p03 kernel: EXT3-fs error (device dm-2): ext3_put_super: Couldn't clean up the journal
Aug 15 06:41:47 p03 kernel: Remounting filesystem read-only
Aug 15 06:41:48 p03 vzdump[16145]: INFO: Finished Backup of VM 10007 (07:56:11)
Aug 15 06:41:48 p03 vzdump[16145]: INFO: Backup job finished successfuly
Aug 15 06:41:49 p03 postfix/postdrop[28253]: warning: uid=0: File too large
Aug 15 06:41:49 p03 postfix/sendmail[28252]: fatal: root(0): message file too big



The last time this happened the machine lost its HDD I/O (and swap, of course) and had to be power-cycled.

This time the system somehow lost its IP routing capabilities - it was still reachable via SSH on the local subnet, but that was about it - so the box had to be rebooted again.

versions here:
pve-manager: 1.8-18 (pve-manager/1.8/6070)
running kernel: 2.6.32-4-pve
proxmox-ve-2.6.32: 1.8-33
pve-kernel-2.6.32-4-pve: 2.6.32-33
pve-kernel-2.6.18-2-pve: 2.6.18-5
qemu-server: 1.1-30
pve-firmware: 1.0-11
libpve-storage-perl: 1.0-17
vncterm: 0.9-2
vzctl: 3.0.28-1pve1
vzdump: 1.2-14
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.14.1-1
ksm-control-daemon: 1.0-6
Any fix would be greatly appreciated.
 
You have both the 2.6.32 and 2.6.18 kernels installed - which one is running?
2.6.18 is believed to include the stable OpenVZ patchset, while 2.6.32 still uses the testing OpenVZ branch.

 
Actually we already maxed out the 4 GB LV size for this backup run in the COW - and yes, on some backups (especially the one in question) the COW table gets too big and LVM kicks out the snapshot volume. That would all be perfectly fine and the backup could fail because of it, but it's definitely no reason for the kernel to be shot into some state of agony and lose either its I/O subsystem or its ability to route IPv4...
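
For anyone trying to see this coming, a rough way to keep an eye on the snapshot during the backup window is to poll lvs and grow the snapshot LV before the COW area fills up. This is only a sketch; the VG name vg0 and the snapshot name vzsnap-p03-0 are made-up placeholders that need adapting:

# watch the snapshot fill level (shown in the Snap%/Data% column of lvs,
# depending on the lvm2 version); at 100% the snapshot gets invalidated
# exactly as in the log above
watch -n 60 'lvs vg0'
# if it approaches 100% and the VG still has free extents,
# the snapshot can usually be grown on the fly
lvextend -L +2G /dev/vg0/vzsnap-p03-0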
 
According to the current documentation and Proxmox staff, 2.6.32 is stable and supports both OpenVZ and KVM, so I don't see the point here, especially as we had lots of NTP and clock issues with the 2.6.18 Proxmox kernel on our IBM boxes. That's why 2.6.32 is running (as already shown in the original post via pveversion -v).
 

I do not believe crashing on vzdump should be considered stable behaviour.
We recently downgraded to 2.6.18 and have had no problems with vzdump since then.
 
Actually we already maxed out the 4 GB LV size for this backup run in the COW - and yes, on some backups (especially the one in question) the COW table gets too big and LVM kicks out the snapshot volume -

As a workaround, you should try to avoid running out of space.
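
If the volume group still has unallocated space, one way to do that is to let vzdump create a bigger snapshot. A minimal sketch, assuming the vzdump shipped with PVE 1.8 honours the size option (value in MB) and that the VG actually has that much room; 10007 is simply the container ID from the log above:

# how much unallocated space does the volume group have?
vgs
# per job: request a larger snapshot for the LVM snapshot backup mode
vzdump --snapshot --size 8192 10007
# or set it globally in /etc/vzdump.conf:
# size: 8192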

That would all be perfectly fine and the backup could fail because of it, but it's definitely no reason for the kernel to be shot into some state of agony and lose either its I/O subsystem or its ability to route IPv4...

Sure, that is right. I already thought about adding a check to vzdump to test free snapshot space (on the TODO list).
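
Until something like that is built into vzdump itself, a crude pre-flight check can be bolted on via a wrapper script around the backup job; again only a sketch, and the VG name pve as well as the 4096 MB threshold are assumptions to adapt:

#!/bin/sh
# refuse to start a snapshot backup if the VG cannot hold the COW area
VG=pve            # assumed volume group name - adapt
NEED_MB=4096      # assumed minimum free space for the snapshot, in MB
FREE_MB=$(vgs --noheadings --nosuffix --units m -o vg_free "$VG" | tr -d ' ' | cut -d. -f1)
if [ "$FREE_MB" -lt "$NEED_MB" ]; then
    echo "only ${FREE_MB} MB free in VG ${VG}, not starting the snapshot backup" >&2
    exit 1
fi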
 
Thanks for the feedback. With the Proxmox 2.6.18 kernel I'm stuck with non-working NTP. I guess I would have to try the upstream OpenVZ 2.6.18 kernel, but I suppose that wouldn't work with Proxmox due to the missing KVM support...

I guess I'm lost again :(
 
