Nothing works anymore

ccube

Active Member
Apr 5, 2011
Passau, Germany, Germany
Hey,
Until yesterday everything was working fine, but since today I have been getting strange syslog entries like this:

Code:
INFO: task kdmflush:965 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kdmflush      D ffff88032ea7bb80     0   965      2 0x00000000
 ffff880331a35a50 0000000000000046 0000000300000000 ffff880331a35fd8
 0000000000015800 ffff880331a35fd8 ffff8803307344d0 0000000000015800
 0000000000015800 0000000000015800 0000000000015800 ffff8803307344d0
Call Trace:
 [<ffffffffa00551d3>] wait_barrier+0xb0/0xf5 [raid1]
 [<ffffffff8104d827>] ? default_wake_function+0x0/0x14
 [<ffffffffa0057639>] make_request+0x17e/0x869 [raid1]
 [<ffffffff8103dc47>] ? check_preempt_curr_idle+0x15/0x17
 [<ffffffff8104d815>] ? try_to_wake_up+0x2a9/0x2bb
 [<ffffffff81042894>] ? update_curr+0xde/0x192
 [<ffffffff813a6893>] md_make_request+0xdf/0x1e6
 [<ffffffff810d7f47>] ? mempool_alloc_slab+0x16/0x18
 [<ffffffff812296fe>] generic_make_request+0x2a4/0x329
 [<ffffffff81140ea1>] ? bio_alloc_bioset+0x4d/0xc5
 [<ffffffff813b13e4>] __map_bio+0xa0/0xfe
 [<ffffffff813b249e>] __split_and_process_bio+0x2a9/0x591
 [<ffffffff8106c397>] ? remove_wait_queue+0x4d/0x52
 [<ffffffff813b1bc1>] ? dm_wait_for_completion+0xe1/0xf2
 [<ffffffff813b28d3>] dm_wq_work+0xef/0x18a
 [<ffffffff810683ab>] worker_thread+0x1a9/0x24d
 [<ffffffff814b4aad>] ? schedule+0x58f/0x5f4
 [<ffffffff813b27e4>] ? dm_wq_work+0x0/0x18a
 [<ffffffff8106c0e8>] ? autoremove_wake_function+0x0/0x3d
 [<ffffffff81068202>] ? worker_thread+0x0/0x24d
 [<ffffffff8106bc00>] kthread+0x82/0x8a
 [<ffffffff8100ab24>] kernel_thread_helper+0x4/0x10
 [<ffffffff8106bb7e>] ? kthread+0x0/0x8a
 [<ffffffff8100ab20>] ? kernel_thread_helper+0x0/0x10
INFO: task kcopyd:970 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kcopyd        D ffff880330e383b0     0   970      2 0x00000000
 ffff88032fcf3b00 0000000000000046 0000000000000000 ffff88032fcf3fd8
 0000000000015800 ffff88032fcf3fd8 ffff880330e38000 0000000000015800
 0000000000015800 0000000000015800 0000000000015800 ffff880330e38000
Call Trace:
 [<ffffffff813b7216>] ? vm_get_page+0x0/0x47
 [<ffffffff814b4b92>] io_schedule+0x80/0xc6
 [<ffffffff813b773a>] dm_io+0x1b3/0x2e1
 [<ffffffff813b7216>] ? vm_get_page+0x0/0x47
 [<ffffffff813b71b2>] ? vm_next_page+0x0/0x21
 [<ffffffff810dace3>] ? free_one_page+0x6d/0x7b
 [<ffffffff813be169>] chunk_io+0x88/0xf5
 [<ffffffff813be169>] ? chunk_io+0x88/0xf5
 [<ffffffff810dbda7>] ? __free_pages+0x24/0x26
 [<ffffffff8110af80>] ? __free_slab+0x118/0x125
 [<ffffffff813bbcf2>] ? copy_callback+0x0/0x41
 [<ffffffff813be288>] area_io+0x26/0x28
 [<ffffffff813be492>] persistent_commit_exception+0xbd/0x11f
 [<ffffffff810d7f2f>] ? mempool_free_slab+0x17/0x19
 [<ffffffff813bbd31>] copy_callback+0x3f/0x41
 [<ffffffff813b81cd>] run_complete_job+0x92/0xc3
 [<ffffffff813b7efa>] process_jobs+0x2f/0xfa
 [<ffffffff813b813b>] ? run_complete_job+0x0/0xc3
 [<ffffffff813b7feb>] do_work+0x26/0x54
 [<ffffffff810683ab>] worker_thread+0x1a9/0x24d
 [<ffffffff814b4aad>] ? schedule+0x58f/0x5f4
 [<ffffffff813b7fc5>] ? do_work+0x0/0x54
 [<ffffffff8106c0e8>] ? autoremove_wake_function+0x0/0x3d
 [<ffffffff81068202>] ? worker_thread+0x0/0x24d
 [<ffffffff8106bc00>] kthread+0x82/0x8a
 [<ffffffff8100ab24>] kernel_thread_helper+0x4/0x10
 [<ffffffff8106bb7e>] ? kthread+0x0/0x8a
 [<ffffffff8100ab20>] ? kernel_thread_helper+0x0/0x10
INFO: task kvm:1976 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kvm           D ffff880330b6df70     0  1976      1 0x00000000
 ffff880235f778f8 0000000000000086 0000000300000000 ffff880235f77fd8
 0000000000015800 ffff880235f77fd8 ffff880330b6dbc0 0000000000015800
 0000000000015800 0000000000015800 0000000000015800 ffff880330b6dbc0
Call Trace:
 [<ffffffffa00551d3>] wait_barrier+0xb0/0xf5 [raid1]
 [<ffffffff8104d827>] ? default_wake_function+0x0/0x14
 [<ffffffffa0057639>] make_request+0x17e/0x869 [raid1]
 [<ffffffff813b13e4>] ? __map_bio+0xa0/0xfe
 [<ffffffff813a6893>] md_make_request+0xdf/0x1e6
 [<ffffffff812296fe>] generic_make_request+0x2a4/0x329
 [<ffffffff81229862>] submit_bio+0xdf/0xfc
 [<ffffffff8102f0e7>] ? default_spin_lock_flags+0x9/0xe
 [<ffffffff8114330f>] dio_bio_submit+0x84/0xa9
 [<ffffffff81143e89>] __blockdev_direct_IO_newtrunc+0x810/0x9ad
 [<ffffffff8114217c>] blkdev_direct_IO+0x57/0x59
 [<ffffffff811410e8>] ? blkdev_get_blocks+0x0/0x8f
 [<ffffffff810d77cf>] generic_file_aio_read+0xe4/0x5dd
 [<ffffffff8102f0e7>] ? default_spin_lock_flags+0x9/0xe
 [<ffffffff8111a5b3>] do_sync_read+0xcc/0x112
 [<ffffffff810637e1>] ? kill_pid_info+0x3f/0x4c
 [<ffffffff811f9e75>] ? security_file_permission+0x16/0x18
 [<ffffffff8111b070>] vfs_read+0xad/0x107
 [<ffffffff8111b12b>] sys_pread64+0x61/0x82
 [<ffffffff81009d32>] system_call_fastpath+0x16/0x1b
INFO: task kvm:1977 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kvm           D ffff8803294e4880     0  1977      1 0x00000000
 ffff880235e978f8 0000000000000086 0000000300000000 ffff880235e97fd8
 0000000000015800 ffff880235e97fd8 ffff8803294e44d0 0000000000015800
 0000000000015800 0000000000015800 0000000000015800 ffff8803294e44d0
Call Trace:
 [<ffffffffa00551d3>] wait_barrier+0xb0/0xf5 [raid1]
 [<ffffffff8104d827>] ? default_wake_function+0x0/0x14
 [<ffffffffa0057639>] make_request+0x17e/0x869 [raid1]
 [<ffffffff813b13e4>] ? __map_bio+0xa0/0xfe
 [<ffffffff81042894>] ? update_curr+0xde/0x192
 [<ffffffff813a6893>] md_make_request+0xdf/0x1e6
 [<ffffffff812296fe>] generic_make_request+0x2a4/0x329
 [<ffffffff81229862>] submit_bio+0xdf/0xfc
 [<ffffffff8102f0e7>] ? default_spin_lock_flags+0x9/0xe
 [<ffffffff8114330f>] dio_bio_submit+0x84/0xa9
 [<ffffffff81143e89>] __blockdev_direct_IO_newtrunc+0x810/0x9ad
 [<ffffffff8114217c>] blkdev_direct_IO+0x57/0x59
 [<ffffffff811410e8>] ? blkdev_get_blocks+0x0/0x8f
 [<ffffffff810d77cf>] generic_file_aio_read+0xe4/0x5dd
 [<ffffffff814b66ce>] ? common_interrupt+0xe/0x13
 [<ffffffff8102f0e7>] ? default_spin_lock_flags+0x9/0xe
 [<ffffffff8111a5b3>] do_sync_read+0xcc/0x112
 [<ffffffff8106379a>] ? group_send_sig_info+0x39/0x41
 [<ffffffff810637e1>] ? kill_pid_info+0x3f/0x4c
 [<ffffffff811f9e75>] ? security_file_permission+0x16/0x18
 [<ffffffff8111b070>] vfs_read+0xad/0x107
 [<ffffffff8111b12b>] sys_pread64+0x61/0x82
 [<ffffffff81009d32>] system_call_fastpath+0x16/0x1b
INFO: task kvm:1978 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kvm           D ffff8803296ac880     0  1978      1 0x00000000
 ffff880235e858f8 0000000000000086 0000000300000000 ffff880235e85fd8
 0000000000015800 ffff880235e85fd8 ffff8803296ac4d0 0000000000015800
 0000000000015800 0000000000015800 0000000000015800 ffff8803296ac4d0
Call Trace:
 [<ffffffffa00551d3>] wait_barrier+0xb0/0xf5 [raid1]
 [<ffffffff8104d827>] ? default_wake_function+0x0/0x14
 [<ffffffffa0057639>] make_request+0x17e/0x869 [raid1]
 [<ffffffff813b13e4>] ? __map_bio+0xa0/0xfe
 [<ffffffff814b624a>] ? _raw_spin_lock_irq+0x15/0x19
 [<ffffffff813a6893>] md_make_request+0xdf/0x1e6
 [<ffffffff812296fe>] generic_make_request+0x2a4/0x329
 [<ffffffff81229862>] submit_bio+0xdf/0xfc
 [<ffffffff8102f0e7>] ? default_spin_lock_flags+0x9/0xe
 [<ffffffff8114330f>] dio_bio_submit+0x84/0xa9
 [<ffffffff81143e89>] __blockdev_direct_IO_newtrunc+0x810/0x9ad
 [<ffffffff8114217c>] blkdev_direct_IO+0x57/0x59
 [<ffffffff811410e8>] ? blkdev_get_blocks+0x0/0x8f
 [<ffffffff810d77cf>] generic_file_aio_read+0xe4/0x5dd
 [<ffffffff8102f0e7>] ? default_spin_lock_flags+0x9/0xe
 [<ffffffff8111a5b3>] do_sync_read+0xcc/0x112
 [<ffffffff810637e1>] ? kill_pid_info+0x3f/0x4c
 [<ffffffff811f9e75>] ? security_file_permission+0x16/0x18
 [<ffffffff8111b070>] vfs_read+0xad/0x107
 [<ffffffff8111b12b>] sys_pread64+0x61/0x82
 [<ffffffff81009d32>] system_call_fastpath+0x16/0x1b
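
To make a trace like this easier to read, here is a minimal Python sketch that lists each blocked task together with the first reliable frame of its call trace. The helper name `summarize_hung_tasks` is hypothetical and not part of any Proxmox tool; it assumes the exact log format shown above and skips frames marked with `?`, which the kernel prints for stack entries it could not verify.

```python
import re

# Header line of a hung-task report, e.g.
# "INFO: task kvm:1976 blocked for more than 120 seconds."
HEADER = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)

# One call-trace frame, e.g.
# " [<ffffffffa00551d3>] wait_barrier+0xb0/0xf5 [raid1]"
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<maybe>\? )?(?P<func>[\w.]+)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<module>\w+)\])?"
)


def summarize_hung_tasks(log_text):
    """Return one dict per hung-task report: task name, pid, first reliable frame."""
    reports = []
    current = None
    for line in log_text.splitlines():
        header = HEADER.search(line)
        if header:
            current = {"task": header["name"], "pid": int(header["pid"]), "top": None}
            reports.append(current)
            continue
        frame = FRAME.search(line)
        # Only record the first frame that is not marked "?" (unreliable).
        if frame and current is not None and current["top"] is None and not frame["maybe"]:
            module = frame["module"]
            current["top"] = frame["func"] + (f" [{module}]" if module else "")
    return reports
```

Applied to the log above, kdmflush and the three kvm tasks all report `wait_barrier [raid1]` as their first reliable frame, while kcopyd reports `io_schedule`.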

All virtual machines are hanging and nothing is working. I cannot stop them, and I cannot start other machines.
I have to reset my server.
After rebooting, everything seems to be fine for a few minutes, but then it crashes again.
What information do you need to help me out?

Regards
 
Please post logs in this forum; no links to third-party servers, please.
 
Post the output of:

- pveversion -v
- df -h
 
Code:
root@kvm:/home/le# pveversion -v
pve-manager: 1.9-26 (pve-manager/1.9/6567)
running kernel: 2.6.35-2-pve
proxmox-ve-2.6.35: 1.8-13
pve-kernel-2.6.35-2-pve: 2.6.35-13
qemu-server: 1.1-32
pve-firmware: 1.0-14
libpve-storage-perl: 1.0-19
vncterm: 0.9-2
vzctl: 3.0.29-3pve1
vzdump: 1.2.6-1
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 0.15.0-2
ksm-control-daemon: 1.0-6
Code:
root@kvm:/home/le# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg0-root   50G  1.2G   46G   3% /
tmpfs                 7.8G     0  7.8G   0% /lib/init/rw
udev                  7.8G  276K  7.8G   1% /dev
tmpfs                 7.8G     0  7.8G   0% /dev/shm
/dev/sda1            1016M   42M  924M   5% /boot
/dev/mapper/vg0-backup
                      2.7T  1.6T  959G  63% /backup
 
I have switched to the 2.6.32 kernel now.


With VMs running:
Code:
root@kvm:/home/le# pveperf 
CPU BOGOMIPS:      52799.20
REGEX/SECOND:      1483964
HD SIZE:           49.61 GB (/dev/mapper/vg0-root)
BUFFERED READS:    148.45 MB/sec
AVERAGE SEEK TIME: 7.56 ms
FSYNCS/SECOND:     57.31
DNS EXT:           60.21 ms
DNS INT:           12.75 ms

Code:
root@kvm:/home/le# lspci 
00:00.0 Host bridge: Intel Corporation Sandy Bridge DRAM Controller (rev 09)
00:01.0 PCI bridge: Intel Corporation Sandy Bridge PCI Express Root Port (rev 09)
00:02.0 VGA compatible controller: Intel Corporation Sandy Bridge Integrated Graphics Controller (rev 09)
00:06.0 PCI bridge: Intel Corporation Sandy Bridge PCI Express Root Port (rev 09)
00:16.0 Communication controller: Intel Corporation Cougar Point HECI Controller #1 (rev 04)
00:1a.0 USB Controller: Intel Corporation Cougar Point USB Enhanced Host Controller #2 (rev 05)
00:1c.0 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 1 (rev b5)
00:1c.4 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 5 (rev b5)
00:1c.5 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 6 (rev b5)
00:1c.7 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 8 (rev b5)
00:1d.0 USB Controller: Intel Corporation Cougar Point USB Enhanced Host Controller #1 (rev 05)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev a5)
00:1f.0 ISA bridge: Intel Corporation Cougar Point LPC Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation Cougar Point 6 port SATA AHCI Controller (rev 05)
00:1f.3 SMBus: Intel Corporation Cougar Point SMBus Controller (rev 05)
05:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
06:00.0 USB Controller: Device 1b21:1042
 
Code:
root@kvm:/home/le# hdparm -I /dev/sda

/dev/sda:

ATA device, with non-removable media
    Model Number:       ST33000650NS                            
    Serial Number:      Z290GE0S
    Firmware Revision:  0002    
    Transport:          Serial, SATA Rev 3.0
Standards:
    Used: unknown (minor revision code 0x0029) 
    Supported: 8 7 6 5 
    Likely used: 8
Configuration:
    Logical        max    current
    cylinders    16383    16383
    heads        16    16
    sectors/track    63    63
    --
    CHS current addressable sectors:   16514064
    LBA    user addressable sectors:  268435455
    LBA48  user addressable sectors: 5860533168
    Logical/Physical Sector size:           512 bytes
    device size with M = 1024*1024:     2861588 MBytes
    device size with M = 1000*1000:     3000592 MBytes (3000 GB)
    cache/buffer size  = unknown
    Form Factor: 3.5 inch
    Nominal Media Rotation Rate: 7200
Capabilities:
    LBA, IORDY(can be disabled)
    Queue depth: 32
    Standby timer values: spec'd by Standard, no device specific minimum
    R/W multiple sector transfer: Max = 16    Current = 16
    Recommended acoustic management value: 254, current value: 0
    DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
         Cycle time: min=120ns recommended=120ns
    PIO: pio0 pio1 pio2 pio3 pio4 
         Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
    Enabled    Supported:
       *    SMART feature set
            Security Mode feature set
       *    Power Management feature set
       *    Write cache
       *    Look-ahead
       *    Host Protected Area feature set
       *    WRITE_BUFFER command
       *    READ_BUFFER command
       *    DOWNLOAD_MICROCODE
            SET_MAX security extension
       *    48-bit Address feature set
       *    Device Configuration Overlay feature set
       *    Mandatory FLUSH_CACHE
       *    FLUSH_CACHE_EXT
       *    SMART error logging
       *    SMART self-test
       *    General Purpose Logging feature set
       *    WRITE_{DMA|MULTIPLE}_FUA_EXT
       *    64-bit World wide name
            Write-Read-Verify feature set
       *    WRITE_UNCORRECTABLE_EXT command
       *    {READ,WRITE}_DMA_EXT_GPL commands
       *    Segmented DOWNLOAD_MICROCODE
            unknown 119[7]
       *    Gen1 signaling speed (1.5Gb/s)
       *    Gen2 signaling speed (3.0Gb/s)
       *    Gen3 signaling speed (6.0Gb/s)
       *    Native Command Queueing (NCQ)
       *    Phy event counters
       *    unknown 76[15]
            Device-initiated interface power management
       *    Software settings preservation
       *    SMART Command Transport (SCT) feature set
       *    SCT LBA Segment Access (AC2)
       *    SCT Error Recovery Control (AC3)
       *    SCT Features Control (AC4)
       *    SCT Data Tables (AC5)
            unknown 206[7]
            unknown 206[12] (vendor specific)
Security: 
    Master password revision code = 65534
        supported
    not    enabled
    not    locked
        frozen
    not    expired: security count
        supported: enhanced erase
    428min for SECURITY ERASE UNIT. 428min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 5000c5003582966f
    NAA        : 5
    IEEE OUI    : 000c50
    Unique ID    : 03582966f
Checksum: correct
 
FSYNCS/SECOND: 57.31

You should have at least 1000 fsyncs/second. As far as I can see, this is a single disk? Do you use ext3?

Post the output of

Code:
mount
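
For a rough cross-check of the FSYNCS/SECOND figure, a minimal Python sketch along these lines could be used. It is not how pveperf measures, so the numbers will not match exactly; the function name `fsyncs_per_second`, the 4 KiB block size, and the two-second duration are all assumptions chosen for illustration.

```python
import os
import tempfile
import time


def fsyncs_per_second(directory, duration=2.0, block=b"\0" * 4096):
    """Rewrite one small block and fsync it repeatedly; return fsyncs completed per second."""
    # The temporary file is created inside `directory`, so the result reflects
    # the storage backing that directory.
    fd, path = tempfile.mkstemp(dir=directory)
    count = 0
    try:
        start = time.monotonic()
        while time.monotonic() - start < duration:
            os.pwrite(fd, block, 0)  # overwrite in place so the file stays small
            os.fsync(fd)             # force the data to stable storage
            count += 1
        elapsed = time.monotonic() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return count / elapsed
```

Call it with a directory on the storage that holds the VM images, for example `fsyncs_per_second("/path/to/vm/storage")` (a placeholder path), and compare the result with the value pveperf reports.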
 
