Hi Proxmox Folks!!
I have a really strange problem on this server. I received it yesterday and did a fresh installation with ext4. Storage is set up as RAID 5 with 8x 10k SAS drives.
As soon as I start installing a VM, after around 5 minutes, the host hangs hard and the IO goes up to around 10%. The VM is basically doing nothing. After a couple of seconds, I start receiving this error in syslog:
Nov 12 10:44:01 pve1 kernel: INFO: task kvm:27507 blocked for more than 120 seconds.
Nov 12 10:44:01 pve1 kernel: Tainted: P O 4.15.18-12-pve #1
Nov 12 10:44:01 pve1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 12 10:44:01 pve1 kernel: kvm D 0 27507 1 0x00000000
Nov 12 10:44:01 pve1 kernel: Call Trace:
Nov 12 10:44:01 pve1 kernel: __schedule+0x3e0/0x870
Nov 12 10:44:01 pve1 kernel: ? bit_wait+0x60/0x60
Nov 12 10:44:01 pve1 kernel: schedule+0x36/0x80
Nov 12 10:44:01 pve1 kernel: io_schedule+0x16/0x40
Nov 12 10:44:01 pve1 kernel: bit_wait_io+0x11/0x60
Nov 12 10:44:01 pve1 kernel: __wait_on_bit+0x5a/0x90
Nov 12 10:44:01 pve1 kernel: out_of_line_wait_on_bit+0x8e/0xb0
Nov 12 10:44:01 pve1 kernel: ? bit_waitqueue+0x40/0x40
Nov 12 10:44:01 pve1 kernel: __block_write_begin_int+0x262/0x5b0
Nov 12 10:44:01 pve1 kernel: ? I_BDEV+0x20/0x20
Nov 12 10:44:01 pve1 kernel: ? I_BDEV+0x20/0x20
Nov 12 10:44:01 pve1 kernel: block_write_begin+0x4d/0xe0
Nov 12 10:44:01 pve1 kernel: blkdev_write_begin+0x23/0x30
Nov 12 10:44:01 pve1 kernel: generic_perform_write+0xb9/0x1b0
Nov 12 10:44:01 pve1 kernel: __generic_file_write_iter+0x185/0x1c0
Nov 12 10:44:01 pve1 kernel: ? hrtimer_cancel+0x19/0x20
Nov 12 10:44:01 pve1 kernel: blkdev_write_iter+0xa8/0x130
Nov 12 10:44:01 pve1 kernel: do_iter_readv_writev+0x116/0x180
Nov 12 10:44:01 pve1 kernel: ? __blkdev_get+0x4d0/0x4d0
Nov 12 10:44:01 pve1 kernel: ? do_iter_readv_writev+0x116/0x180
Nov 12 10:44:01 pve1 kernel: do_iter_write+0x87/0x1a0
Nov 12 10:44:01 pve1 kernel: vfs_writev+0x98/0x110
Nov 12 10:44:01 pve1 kernel: ? eventfd_write+0x113/0x260
Nov 12 10:44:01 pve1 kernel: do_pwritev+0xb2/0xd0
Nov 12 10:44:01 pve1 kernel: ? do_pwritev+0xb2/0xd0
Nov 12 10:44:01 pve1 kernel: SyS_pwritev+0x11/0x20
Nov 12 10:44:01 pve1 kernel: do_syscall_64+0x73/0x130
Nov 12 10:44:01 pve1 kernel: entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Nov 12 10:44:01 pve1 kernel: RIP: 0033:0x7f008ac7c193
Nov 12 10:44:01 pve1 kernel: RSP: 002b:00007efc587fc5a0 EFLAGS: 00000293 ORIG_RAX: 0000000000000128
Nov 12 10:44:01 pve1 kernel: RAX: ffffffffffffffda RBX: 00007efc6992a800 RCX: 00007f008ac7c193
Nov 12 10:44:01 pve1 kernel: RDX: 0000000000000003 RSI: 00007efc6940d3a0 RDI: 0000000000000017
Nov 12 10:44:01 pve1 kernel: RBP: 00007efc6992a800 R08: 0000000000000000 R09: 00000000ffffffff
Nov 12 10:44:01 pve1 kernel: R10: 00000000b5717000 R11: 0000000000000293 R12: 0000557bb56f7472
Nov 12 10:44:01 pve1 kernel: R13: 00007f007d0c1d38 R14: 00007f006bc3eef0 R15: 0000000000000003
Nov 12 10:44:01 pve1 kernel: INFO: task kvm:27508 blocked for more than 120 seconds.
Nov 12 10:44:01 pve1 kernel: Tainted: P O 4.15.18-12-pve #1
Nov 12 10:44:01 pve1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 12 10:44:01 pve1 kernel: kvm D 0 27508 1 0x00000000
Nov 12 10:44:01 pve1 kernel: Call Trace:
Nov 12 10:44:01 pve1 kernel: __schedule+0x3e0/0x870
Nov 12 10:44:01 pve1 kernel: ? bit_wait+0x60/0x60
Nov 12 10:44:01 pve1 kernel: schedule+0x36/0x80
Nov 12 10:44:01 pve1 kernel: io_schedule+0x16/0x40
Nov 12 10:44:01 pve1 kernel: bit_wait_io+0x11/0x60
Nov 12 10:44:01 pve1 kernel: __wait_on_bit+0x5a/0x90
Nov 12 10:44:01 pve1 kernel: out_of_line_wait_on_bit+0x8e/0xb0
Nov 12 10:44:01 pve1 kernel: ? bit_waitqueue+0x40/0x40
Nov 12 10:44:01 pve1 kernel: __block_write_begin_int+0x262/0x5b0
Nov 12 10:44:01 pve1 kernel: ? I_BDEV+0x20/0x20
Nov 12 10:44:01 pve1 kernel: ? I_BDEV+0x20/0x20
Nov 12 10:44:01 pve1 kernel: block_write_begin+0x4d/0xe0
Nov 12 10:44:01 pve1 kernel: blkdev_write_begin+0x23/0x30
Nov 12 10:44:01 pve1 kernel: generic_perform_write+0xb9/0x1b0
Nov 12 10:44:01 pve1 kernel: __generic_file_write_iter+0x185/0x1c0
Nov 12 10:44:01 pve1 kernel: ? hrtimer_cancel+0x19/0x20
Nov 12 10:44:01 pve1 kernel: blkdev_write_iter+0xa8/0x130
Nov 12 10:44:01 pve1 kernel: do_iter_readv_writev+0x116/0x180
Nov 12 10:44:01 pve1 kernel: ? __blkdev_get+0x4d0/0x4d0
Nov 12 10:44:01 pve1 kernel: ? do_iter_readv_writev+0x116/0x180
Nov 12 10:44:01 pve1 kernel: do_iter_write+0x87/0x1a0
Nov 12 10:44:01 pve1 kernel: vfs_writev+0x98/0x110
Nov 12 10:44:01 pve1 kernel: do_pwritev+0xb2/0xd0
Nov 12 10:44:01 pve1 kernel: ? do_pwritev+0xb2/0xd0
Nov 12 10:44:01 pve1 kernel: SyS_pwritev+0x11/0x20
Nov 12 10:44:01 pve1 kernel: do_syscall_64+0x73/0x130
Nov 12 10:44:01 pve1 kernel: entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Nov 12 10:44:01 pve1 kernel: RIP: 0033:0x7f008ac7c193
Nov 12 10:44:01 pve1 kernel: RSP: 002b:00007efc577fc5a0 EFLAGS: 00000293 ORIG_RAX: 0000000000000128
Nov 12 10:44:01 pve1 kernel: RAX: ffffffffffffffda RBX: 00007efc6992a7c0 RCX: 00007f008ac7c193
Nov 12 10:44:01 pve1 kernel: RDX: 0000000000000004 RSI: 00007efc6992a780 RDI: 0000000000000017
Nov 12 10:44:01 pve1 kernel: RBP: 00007efc6992a7c0 R08: 0000000000000000 R09: 00000000ffffffff
Nov 12 10:44:01 pve1 kernel: R10: 00000000b5713000 R11: 0000000000000293 R12: 0000557bb56f7472
Nov 12 10:44:01 pve1 kernel: R13: 00007f007d0c1d38 R14: 00007f006bc3ef60 R15: 0000000000000003
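To see at a glance which tasks are being reported, a small Python sketch like the one below could summarise the hung-task lines. It assumes the syslog line format shown above; the function name `summarize_hung_tasks` is made up for illustration and is not part of any Proxmox tool.

```python
import re
from collections import Counter

# Matches kernel hung-task reports such as:
# "Nov 12 10:44:01 pve1 kernel: INFO: task kvm:27507 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def summarize_hung_tasks(lines):
    """Count hung-task reports per (task name, pid) in an iterable of log lines.

    Hypothetical helper: it only looks at the "INFO: task ... blocked" lines
    and ignores the call traces that follow them.
    """
    counts = Counter()
    for line in lines:
        match = HUNG_RE.search(line)
        if match:
            counts[(match.group("name"), int(match.group("pid")))] += 1
    return counts
```

It could be called with an open log file, for example `summarize_hung_tasks(open("/var/log/syslog", errors="replace"))`, assuming that is where the kernel messages end up on the host.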
I already tried installing Proxmox 5.4, but ended up with the same problem. Yesterday I thought it was because of the RAID background init, but it's done now and the problem persists. Even after reinstalling PVE 6.2 or PVE 5.4, same thing.
If I let the host idle, nothing happens. It only occurs as soon as the host gets some load.
I'm pretty lost at the moment, as I have no clue where else to look for the problem. I have installed around 20 Proxmox servers, and this is the first time I have experienced such a problem on this kind of hardware.
Thanks for your help.