Proxmox on HP DL380 Gen.9 with P440ar Controller

mhoellwerth85

Member
Nov 12, 2020
Hi Proxmox Folks!!

I have a really strange problem on this server. I received it yesterday and did a fresh installation with ext4. Storage is set up as RAID 5 with 8x 10k SAS drives.
As soon as I start to install a VM, after around 5 minutes, the host hangs hard and the IO goes up to around 10%. The VM is basically doing nothing. After a couple of seconds, I start receiving this error in syslog:


Nov 12 10:44:01 pve1 kernel: INFO: task kvm:27507 blocked for more than 120 seconds.
Nov 12 10:44:01 pve1 kernel: Tainted: P O 4.15.18-12-pve #1
Nov 12 10:44:01 pve1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 12 10:44:01 pve1 kernel: kvm D 0 27507 1 0x00000000
Nov 12 10:44:01 pve1 kernel: Call Trace:
Nov 12 10:44:01 pve1 kernel: __schedule+0x3e0/0x870
Nov 12 10:44:01 pve1 kernel: ? bit_wait+0x60/0x60
Nov 12 10:44:01 pve1 kernel: schedule+0x36/0x80
Nov 12 10:44:01 pve1 kernel: io_schedule+0x16/0x40
Nov 12 10:44:01 pve1 kernel: bit_wait_io+0x11/0x60
Nov 12 10:44:01 pve1 kernel: __wait_on_bit+0x5a/0x90
Nov 12 10:44:01 pve1 kernel: out_of_line_wait_on_bit+0x8e/0xb0
Nov 12 10:44:01 pve1 kernel: ? bit_waitqueue+0x40/0x40
Nov 12 10:44:01 pve1 kernel: __block_write_begin_int+0x262/0x5b0
Nov 12 10:44:01 pve1 kernel: ? I_BDEV+0x20/0x20
Nov 12 10:44:01 pve1 kernel: ? I_BDEV+0x20/0x20
Nov 12 10:44:01 pve1 kernel: block_write_begin+0x4d/0xe0
Nov 12 10:44:01 pve1 kernel: blkdev_write_begin+0x23/0x30
Nov 12 10:44:01 pve1 kernel: generic_perform_write+0xb9/0x1b0
Nov 12 10:44:01 pve1 kernel: __generic_file_write_iter+0x185/0x1c0
Nov 12 10:44:01 pve1 kernel: ? hrtimer_cancel+0x19/0x20
Nov 12 10:44:01 pve1 kernel: blkdev_write_iter+0xa8/0x130
Nov 12 10:44:01 pve1 kernel: do_iter_readv_writev+0x116/0x180
Nov 12 10:44:01 pve1 kernel: ? __blkdev_get+0x4d0/0x4d0
Nov 12 10:44:01 pve1 kernel: ? do_iter_readv_writev+0x116/0x180
Nov 12 10:44:01 pve1 kernel: do_iter_write+0x87/0x1a0
Nov 12 10:44:01 pve1 kernel: vfs_writev+0x98/0x110
Nov 12 10:44:01 pve1 kernel: ? eventfd_write+0x113/0x260
Nov 12 10:44:01 pve1 kernel: do_pwritev+0xb2/0xd0
Nov 12 10:44:01 pve1 kernel: ? do_pwritev+0xb2/0xd0
Nov 12 10:44:01 pve1 kernel: SyS_pwritev+0x11/0x20
Nov 12 10:44:01 pve1 kernel: do_syscall_64+0x73/0x130
Nov 12 10:44:01 pve1 kernel: entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Nov 12 10:44:01 pve1 kernel: RIP: 0033:0x7f008ac7c193
Nov 12 10:44:01 pve1 kernel: RSP: 002b:00007efc587fc5a0 EFLAGS: 00000293 ORIG_RAX: 0000000000000128
Nov 12 10:44:01 pve1 kernel: RAX: ffffffffffffffda RBX: 00007efc6992a800 RCX: 00007f008ac7c193
Nov 12 10:44:01 pve1 kernel: RDX: 0000000000000003 RSI: 00007efc6940d3a0 RDI: 0000000000000017
Nov 12 10:44:01 pve1 kernel: RBP: 00007efc6992a800 R08: 0000000000000000 R09: 00000000ffffffff
Nov 12 10:44:01 pve1 kernel: R10: 00000000b5717000 R11: 0000000000000293 R12: 0000557bb56f7472
Nov 12 10:44:01 pve1 kernel: R13: 00007f007d0c1d38 R14: 00007f006bc3eef0 R15: 0000000000000003
Nov 12 10:44:01 pve1 kernel: INFO: task kvm:27508 blocked for more than 120 seconds.
Nov 12 10:44:01 pve1 kernel: Tainted: P O 4.15.18-12-pve #1
Nov 12 10:44:01 pve1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 12 10:44:01 pve1 kernel: kvm D 0 27508 1 0x00000000
Nov 12 10:44:01 pve1 kernel: Call Trace:
Nov 12 10:44:01 pve1 kernel: __schedule+0x3e0/0x870
Nov 12 10:44:01 pve1 kernel: ? bit_wait+0x60/0x60
Nov 12 10:44:01 pve1 kernel: schedule+0x36/0x80
Nov 12 10:44:01 pve1 kernel: io_schedule+0x16/0x40
Nov 12 10:44:01 pve1 kernel: bit_wait_io+0x11/0x60
Nov 12 10:44:01 pve1 kernel: __wait_on_bit+0x5a/0x90
Nov 12 10:44:01 pve1 kernel: out_of_line_wait_on_bit+0x8e/0xb0
Nov 12 10:44:01 pve1 kernel: ? bit_waitqueue+0x40/0x40
Nov 12 10:44:01 pve1 kernel: __block_write_begin_int+0x262/0x5b0
Nov 12 10:44:01 pve1 kernel: ? I_BDEV+0x20/0x20
Nov 12 10:44:01 pve1 kernel: ? I_BDEV+0x20/0x20
Nov 12 10:44:01 pve1 kernel: block_write_begin+0x4d/0xe0
Nov 12 10:44:01 pve1 kernel: blkdev_write_begin+0x23/0x30
Nov 12 10:44:01 pve1 kernel: generic_perform_write+0xb9/0x1b0
Nov 12 10:44:01 pve1 kernel: __generic_file_write_iter+0x185/0x1c0
Nov 12 10:44:01 pve1 kernel: ? hrtimer_cancel+0x19/0x20
Nov 12 10:44:01 pve1 kernel: blkdev_write_iter+0xa8/0x130
Nov 12 10:44:01 pve1 kernel: do_iter_readv_writev+0x116/0x180
Nov 12 10:44:01 pve1 kernel: ? __blkdev_get+0x4d0/0x4d0
Nov 12 10:44:01 pve1 kernel: ? do_iter_readv_writev+0x116/0x180
Nov 12 10:44:01 pve1 kernel: do_iter_write+0x87/0x1a0
Nov 12 10:44:01 pve1 kernel: vfs_writev+0x98/0x110
Nov 12 10:44:01 pve1 kernel: do_pwritev+0xb2/0xd0
Nov 12 10:44:01 pve1 kernel: ? do_pwritev+0xb2/0xd0
Nov 12 10:44:01 pve1 kernel: SyS_pwritev+0x11/0x20
Nov 12 10:44:01 pve1 kernel: do_syscall_64+0x73/0x130
Nov 12 10:44:01 pve1 kernel: entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Nov 12 10:44:01 pve1 kernel: RIP: 0033:0x7f008ac7c193
Nov 12 10:44:01 pve1 kernel: RSP: 002b:00007efc577fc5a0 EFLAGS: 00000293 ORIG_RAX: 0000000000000128
Nov 12 10:44:01 pve1 kernel: RAX: ffffffffffffffda RBX: 00007efc6992a7c0 RCX: 00007f008ac7c193
Nov 12 10:44:01 pve1 kernel: RDX: 0000000000000004 RSI: 00007efc6992a780 RDI: 0000000000000017
Nov 12 10:44:01 pve1 kernel: RBP: 00007efc6992a7c0 R08: 0000000000000000 R09: 00000000ffffffff
Nov 12 10:44:01 pve1 kernel: R10: 00000000b5713000 R11: 0000000000000293 R12: 0000557bb56f7472
Nov 12 10:44:01 pve1 kernel: R13: 00007f007d0c1d38 R14: 00007f006bc3ef60 R15: 0000000000000003

I already tried installing Proxmox 5.4, but ended up with the same problem. Yesterday I thought it was because of the RAID background init, but it's done now and the problem persists, even after reinstalling PVE 6.2 or PVE 5.4.

If I let the host idle, nothing happens. It only starts as soon as the host gets some load.
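
To put a number on the idle vs. load difference, a rough sketch like the following could be used. It assumes the IO figure in question corresponds to the iowait share of CPU time, which is the fifth counter on the aggregate `cpu` line of `/proc/stat`. The function names are made up for illustration and are not part of any Proxmox tooling.

```python
import time


def parse_cpu_line(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat as ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two /proc/stat samples."""
    deltas = [b - a for a, b in zip(before, after)]
    # Only user, nice, system, idle, iowait, irq, softirq and steal are summed;
    # guest time is already accounted for in user/nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is iowait


def measure_iowait(interval=5.0, path="/proc/stat"):
    """Sample /proc/stat twice, `interval` seconds apart, and return iowait in percent."""
    with open(path) as handle:
        before = parse_cpu_line(handle.read())
    time.sleep(interval)
    with open(path) as handle:
        after = parse_cpu_line(handle.read())
    return iowait_percent(before, after)
```

Running `measure_iowait()` once while the host idles and once during a VM installation would show whether the iowait share really jumps only under load.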

I'm pretty lost at the moment, as I have no clue where else to look for the problem. I have installed around 20 Proxmox servers, and this is the first time I have experienced such a problem on hardware like this.

Thanks for your help.
 
Hm - I have not run into this particular issue - but here are 2 general recommendations (which quite often help):
* try running the latest available version (PVE 6.2) - unless there's an explicit reason why an earlier version is needed (and even then, rather use an older kernel than an EOL version of the whole distribution) - chances are the problems were fixed in newer versions
* install the latest available firmware for your system (especially with RAID controllers, a firmware upgrade can fix such issues)

The stack trace looks like the problem is (as you diagnosed correctly) somewhere in the block/disk layer - does the iLO/BIOS/RAID controller firmware indicate any problems?

I hope this helps!
 
Thanks for your quick answer!

Yes, iLO states that the cache module is faulty. I'm currently talking with the manufacturer about this problem.

I already tried 6.2 and did all updates (also the Proxmox ones) before I started setting up VMs, so it seems pretty clear that the controller has some kind of issue.

Many thanks again for your help!

Are you still based in Vienna?
 
> yes, ilo states that the cache module is faulty. currently talking with the manufacturer about this problem.

That could explain the issue - RAID 5 is not the fastest to begin with, and without the cache I could imagine that access becomes quite slow. The hung_task message is just an indication that something took long (more than 2 minutes in this case). This can mean that some deadlock occurred inside the kernel, but quite often in practice it means faulty or slow hardware.
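
For anyone who wants to see how often and for which processes these messages show up, here is a minimal sketch that scans log lines for the hung-task report. The regular expression is modelled only on the lines posted in this thread, and the helper names are made up for illustration, so treat it as a starting point rather than a complete parser.

```python
import re
from collections import Counter

# Matches the kernel's hung-task report line, e.g.
# "INFO: task kvm:27507 blocked for more than 120 seconds."
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def summarize_hung_tasks(lines):
    """Return (task name, pid, timeout in seconds) for each hung-task report."""
    reports = []
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            reports.append((match["name"], int(match["pid"]), int(match["secs"])))
    return reports


def count_by_task(reports):
    """Count how many reports were logged per task name."""
    return Counter(name for name, _pid, _secs in reports)


# Sample lines copied from the syslog excerpt in the first post.
sample = [
    "Nov 12 10:44:01 pve1 kernel: INFO: task kvm:27507 blocked for more than 120 seconds.",
    "Nov 12 10:44:01 pve1 kernel: Call Trace:",
    "Nov 12 10:44:01 pve1 kernel: INFO: task kvm:27508 blocked for more than 120 seconds.",
]

if __name__ == "__main__":
    reports = summarize_hung_tasks(sample)
    for name, pid, secs in reports:
        print(f"{name} (pid {pid}) blocked for more than {secs}s")
    print(dict(count_by_task(reports)))
```

The same function can be fed the lines of a real log file instead of `sample`. If only kvm tasks doing writes show up, that again points at the storage path rather than at a general kernel problem.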

> are you still based in vienna?

Yes - still in 1050 ;)

Good luck with the hardware replacement!
 
