Greetings,
I've been experiencing a strange issue for a while that randomly crashes the hardware RAID I/O.
The issue can be fixed temporarily by simply rebooting the server.
I've run stress tests against the machine, including the virtual drive on the MegaRAID SAS 9270-4i, but I can't reproduce the issue.
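For reference, the stress test against the virtual drive was roughly of this shape (a sketch only; /mnt/raid/fio-test is a placeholder path, not my exact command):
Code:
# mixed random read/write load against a file on the RAID virtual drive
# (placeholder path and sizes; adjust to the actual mount point)
fio --name=vd-stress --filename=/mnt/raid/fio-test --size=10G \
    --direct=1 --ioengine=libaio --rw=randrw --bs=4k \
    --iodepth=32 --numjobs=4 --time_based --runtime=1800 --group_reporting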
Machine specs:
CPU: AMD Ryzen 7 PRO 4750G
MB: ASUS TUF Gaming B450-Plus II
RAM: Transcend 16GB ECC DIMM * 2
PCIe: EVGA RTX 2060
PCIe: LSI MegaRAID 9270-4i
OS: pve-manager/8.1.4/ec5affc9e41f1d79 (running kernel: 6.5.13-1-pve)
PVE guests: Windows 10, TrueNAS, Ubuntu Server
Additional: GPU passthrough to the Windows 10 guest, following the topic "PCI/GPU Passthrough on Proxmox VE 8: Installation and configuration"
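In case it's relevant to the passthrough setup: a quick way to check whether the 9270-4i ends up in the same IOMMU group as the passed-through GPU (a generic sketch; PCI addresses differ per board):
Code:
# list every IOMMU group and the PCI devices it contains
for g in /sys/kernel/iommu_groups/*; do
    echo "IOMMU group ${g##*/}:"
    for d in "$g"/devices/*; do
        lspci -nns "${d##*/}"
    done
done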
Symptoms: the LSI MegaRAID 9270-4i randomly disconnects / crashes.
Things I've tried to fix the issue, with no luck so far:
1. Upgraded the LSI MegaRAID 9271-4i BIOS/firmware from 5.48.04.0 to 5.50.03.0; the issue is still present.
2. Installed the packages from https://hwraid.le-vert.net/wiki/DebianPackages (see the check commands at the end of this post).
3. Just changed /etc/default/grub as shown below; not sure yet if this will solve the problem.
From:
GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=pt"
To:
GRUB_CMDLINE_LINUX_DEFAULT="quiet pcie_aspm=off iommu=pt"
Below is the error log:
Code:
Mar 15 09:28:54 proxmox kernel: megaraid_sas 0000:06:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0xffffffff
Mar 15 09:28:54 proxmox kernel: megaraid_sas 0000:06:00.0: FW in FAULT state Fault code:0xfff0000 subcode:0xff00 func:megasas_wait_for_outstanding_fusion
Mar 15 09:28:54 proxmox kernel: megaraid_sas 0000:06:00.0: resetting fusion adapter scsi0.
Mar 15 09:28:54 proxmox kernel: megaraid_sas 0000:06:00.0: Outstanding fastpath IOs: 19
Mar 15 09:30:46 proxmox kernel: megaraid_sas 0000:06:00.0: Diag reset adapter never cleared megasas_adp_reset_fusion 4097
Mar 15 09:30:49 proxmox pvestatd[1194]: VM 101 qmp command failed - VM 101 qmp command 'query-proxmox-support' failed - got timeout
Mar 15 09:30:49 proxmox pvestatd[1194]: status update time (8.236 seconds)
Mar 15 09:30:59 proxmox pvestatd[1194]: VM 101 qmp command failed - VM 101 qmp command 'query-proxmox-support' failed - unable to connect to VM 101 qmp socket - timeout after 51 retries
Mar 15 09:30:59 proxmox pvestatd[1194]: status update time (8.221 seconds)
Mar 15 09:31:09 proxmox pvestatd[1194]: VM 101 qmp command failed - VM 101 qmp command 'query-proxmox-support' failed - unable to connect to VM 101 qmp socket - timeout after 51 retries
Mar 15 09:31:09 proxmox pvestatd[1194]: status update time (8.248 seconds)
Mar 15 09:31:19 proxmox pvestatd[1194]: VM 101 qmp command failed - VM 101 qmp command 'query-proxmox-support' failed - unable to connect to VM 101 qmp socket - timeout after 51 retries
Mar 15 09:31:20 proxmox pvestatd[1194]: status update time (8.252 seconds)
Mar 15 09:31:29 proxmox pvestatd[1194]: VM 101 qmp command failed - VM 101 qmp command 'query-proxmox-support' failed - unable to connect to VM 101 qmp socket - timeout after 51 retries
Mar 15 09:31:29 proxmox pvestatd[1194]: status update time (8.239 seconds)
Mar 15 09:31:39 proxmox pvestatd[1194]: VM 101 qmp command failed - VM 101 qmp command 'query-proxmox-support' failed - unable to connect to VM 101 qmp socket - timeout after 51 retries
Mar 15 09:31:39 proxmox pvestatd[1194]: status update time (8.232 seconds)
Mar 15 09:31:47 proxmox pvedaemon[932564]: <root@pam> successful auth for user 'root@pam'
Mar 15 09:31:49 proxmox pvestatd[1194]: VM 101 qmp command failed - VM 101 qmp command 'query-proxmox-support' failed - unable to connect to VM 101 qmp socket - timeout after 51 retries
Mar 15 09:31:49 proxmox pvestatd[1194]: status update time (8.252 seconds)
Mar 15 09:31:55 proxmox pvedaemon[932371]: <root@pam> successful auth for user 'root@pam'
Mar 15 09:31:59 proxmox pvestatd[1194]: VM 101 qmp command failed - VM 101 qmp command 'query-proxmox-support' failed - unable to connect to VM 101 qmp socket - timeout after 51 retries
Mar 15 09:32:00 proxmox pvestatd[1194]: status update time (8.251 seconds)
Mar 15 09:32:09 proxmox pvestatd[1194]: VM 101 qmp command failed - VM 101 qmp command 'query-proxmox-support' failed - unable to connect to VM 101 qmp socket - timeout after 51 retries
Mar 15 09:32:09 proxmox pvestatd[1194]: status update time (8.212 seconds)
Mar 15 09:32:19 proxmox pvestatd[1194]: VM 101 qmp command failed - VM 101 qmp command 'query-proxmox-support' failed - unable to connect to VM 101 qmp socket - timeout after 51 retries
Mar 15 09:32:19 proxmox pvestatd[1194]: status update time (8.256 seconds)
Mar 15 09:32:29 proxmox pvestatd[1194]: VM 101 qmp command failed - VM 101 qmp command 'query-proxmox-support' failed - unable to connect to VM 101 qmp socket - timeout after 51 retries
Mar 15 09:32:29 proxmox pvestatd[1194]: status update time (8.231 seconds)
Mar 15 09:32:36 proxmox kernel: INFO: task jbd2/sda1-8:702 blocked for more than 120 seconds.
Mar 15 09:32:36 proxmox kernel: Tainted: P O 6.5.13-1-pve #1
Mar 15 09:32:36 proxmox kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 15 09:32:36 proxmox kernel: task:jbd2/sda1-8 state:D stack:0 pid:702 ppid:2 flags:0x00004000
Mar 15 09:32:36 proxmox kernel: Call Trace:
Mar 15 09:32:36 proxmox kernel: <TASK>
Mar 15 09:32:36 proxmox kernel: __schedule+0x3fc/0x1440
Mar 15 09:32:36 proxmox kernel: ? update_load_avg+0x82/0x7f0
Mar 15 09:32:36 proxmox kernel: ? srso_return_thunk+0x5/0x10
Mar 15 09:32:36 proxmox kernel: schedule+0x63/0x110
Mar 15 09:32:36 proxmox kernel: io_schedule+0x46/0x80
Mar 15 09:32:36 proxmox kernel: bit_wait_io+0x11/0x90
Mar 15 09:32:36 proxmox kernel: __wait_on_bit+0x4d/0x120
Mar 15 09:32:36 proxmox kernel: ? __pfx_bit_wait_io+0x10/0x10
Mar 15 09:32:36 proxmox kernel: out_of_line_wait_on_bit+0x8c/0xb0
Mar 15 09:32:36 proxmox kernel: ? __pfx_wake_bit_function+0x10/0x10
Mar 15 09:32:36 proxmox kernel: __wait_on_buffer+0x30/0x50
Mar 15 09:32:36 proxmox kernel: jbd2_journal_commit_transaction+0x1119/0x19d0
Mar 15 09:32:36 proxmox kernel: ? srso_return_thunk+0x5/0x10
Mar 15 09:32:36 proxmox kernel: kjournald2+0xab/0x280
Mar 15 09:32:36 proxmox kernel: ? __pfx_autoremove_wake_function+0x10/0x10
Mar 15 09:32:36 proxmox kernel: ? __pfx_kjournald2+0x10/0x10
Mar 15 09:32:36 proxmox kernel: kthread+0xf2/0x120
Mar 15 09:32:36 proxmox kernel: ? __pfx_kthread+0x10/0x10
Mar 15 09:32:36 proxmox kernel: ret_from_fork+0x47/0x70
Mar 15 09:32:36 proxmox kernel: ? __pfx_kthread+0x10/0x10
Mar 15 09:32:36 proxmox kernel: ret_from_fork_asm+0x1b/0x30
Mar 15 09:32:36 proxmox kernel: </TASK>
Mar 15 09:32:36 proxmox kernel: INFO: task iou-wrk-933562:1033361 blocked for more than 120 seconds.
Mar 15 09:32:36 proxmox kernel: Tainted: P O 6.5.13-1-pve #1
Mar 15 09:32:36 proxmox kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 15 09:32:36 proxmox kernel: task:iou-wrk-933562 state:D stack:0 pid:1033361 ppid:1 flags:0x00004000
Mar 15 09:32:36 proxmox kernel: Call Trace:
Mar 15 09:32:36 proxmox kernel: <TASK>
Mar 15 09:32:36 proxmox kernel: __schedule+0x3fc/0x1440
Mar 15 09:32:36 proxmox kernel: ? srso_return_thunk+0x5/0x10
Mar 15 09:32:36 proxmox kernel: ? unlock_page+0x18/0x60
Mar 15 09:32:36 proxmox kernel: schedule+0x63/0x110
Mar 15 09:32:36 proxmox kernel: io_schedule+0x46/0x80
Mar 15 09:32:36 proxmox kernel: folio_wait_bit_common+0x136/0x330
Mar 15 09:32:36 proxmox kernel: ? __pfx_wake_page_function+0x10/0x10
Mar 15 09:32:36 proxmox kernel: folio_wait_bit+0x18/0x30
Mar 15 09:32:36 proxmox kernel: folio_wait_writeback+0x2c/0xa0
Mar 15 09:32:36 proxmox kernel: __filemap_fdatawait_range+0x90/0x100
Mar 15 09:32:36 proxmox kernel: file_write_and_wait_range+0x93/0xc0
Mar 15 09:32:36 proxmox kernel: ext4_sync_file+0x86/0x380
Mar 15 09:32:36 proxmox kernel: ? raw_spin_rq_unlock+0x10/0x40
Mar 15 09:32:36 proxmox kernel: vfs_fsync_range+0x4b/0xa0
Mar 15 09:32:36 proxmox kernel: ? __schedule+0x404/0x1440
Mar 15 09:32:36 proxmox kernel: io_fsync+0x3d/0x60
Mar 15 09:32:36 proxmox kernel: io_issue_sqe+0x68/0x3f0
Mar 15 09:32:36 proxmox kernel: ? srso_return_thunk+0x5/0x10
Mar 15 09:32:36 proxmox kernel: ? lock_timer_base+0x72/0xa0
Mar 15 09:32:36 proxmox kernel: io_wq_submit_work+0x90/0x2f0
Mar 15 09:32:36 proxmox kernel: ? __timer_delete_sync+0x8c/0x100
Mar 15 09:32:36 proxmox kernel: io_worker_handle_work+0x156/0x590
Mar 15 09:32:36 proxmox kernel: io_wq_worker+0x112/0x3c0
Mar 15 09:32:36 proxmox kernel: ? srso_return_thunk+0x5/0x10
Mar 15 09:32:36 proxmox kernel: ? raw_spin_rq_unlock+0x10/0x40
Mar 15 09:32:36 proxmox kernel: ? srso_return_thunk+0x5/0x10
Mar 15 09:32:36 proxmox kernel: ? finish_task_switch.isra.0+0x85/0x2c0
Mar 15 09:32:36 proxmox kernel: ? __pfx_io_wq_worker+0x10/0x10
Mar 15 09:32:36 proxmox kernel: ret_from_fork+0x47/0x70
Mar 15 09:32:36 proxmox kernel: ? __pfx_io_wq_worker+0x10/0x10
Mar 15 09:32:36 proxmox kernel: ret_from_fork_asm+0x1b/0x30
Mar 15 09:32:36 proxmox kernel: RIP: 0033:0x0
Mar 15 09:32:36 proxmox kernel: RSP: 002b:0000000000000000 EFLAGS: 00000293 ORIG_RAX: 000000000000010f
Mar 15 09:32:36 proxmox kernel: RAX: 0000000000000000 RBX: 00006135f1746fd0 RCX: 0000784b1471b256
Mar 15 09:32:36 proxmox kernel: RDX: 00007fff7ab6b1c0 RSI: 000000000000004f RDI: 00006135f1af4c00
Mar 15 09:32:36 proxmox kernel: RBP: 00007fff7ab6b22c R08: 0000000000000008 R09: 0000000000000000
Mar 15 09:32:36 proxmox kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 00007fff7ab6b1c0
Mar 15 09:32:36 proxmox kernel: R13: 00006135f1746fd0 R14: 00006135f08d2c48 R15: 00007fff7ab6b230
Mar 15 09:32:36 proxmox kernel: </TASK>
Mar 15 09:32:36 proxmox kernel: INFO: task kworker/u64:0:1024601 blocked for more than 120 seconds.
Mar 15 09:32:36 proxmox kernel: Tainted: P O 6.5.13-1-pve #1
Mar 15 09:32:36 proxmox kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 15 09:32:36 proxmox kernel: task:kworker/u64:0 state:D stack:0 pid:1024601 ppid:2 flags:0x00004000
Mar 15 09:32:36 proxmox kernel: Workqueue: writeback wb_workfn (flush-8:0)
Mar 15 09:32:36 proxmox kernel: Call Trace:
Mar 15 09:32:36 proxmox kernel: <TASK>
Mar 15 09:32:36 proxmox kernel: __schedule+0x3fc/0x1440
Mar 15 09:32:36 proxmox kernel: ? srso_return_thunk+0x5/0x10
Mar 15 09:32:36 proxmox kernel: schedule+0x63/0x110
Mar 15 09:32:36 proxmox kernel: io_schedule+0x46/0x80
Mar 15 09:32:36 proxmox kernel: bit_wait_io+0x11/0x90
Mar 15 09:32:36 proxmox kernel: __wait_on_bit+0x4d/0x120
Mar 15 09:32:36 proxmox kernel: ? __pfx_bit_wait_io+0x10/0x10
Mar 15 09:32:36 proxmox kernel: out_of_line_wait_on_bit+0x8c/0xb0
Mar 15 09:32:36 proxmox kernel: ? __pfx_wake_bit_function+0x10/0x10
Mar 15 09:32:36 proxmox kernel: do_get_write_access+0x284/0x440
Mar 15 09:32:36 proxmox kernel: jbd2_journal_get_write_access+0x6b/0xa0
Mar 15 09:32:36 proxmox kernel: __ext4_journal_get_write_access+0x8e/0x1c0
Mar 15 09:32:36 proxmox kernel: ext4_reserve_inode_write+0x67/0xe0
Mar 15 09:32:36 proxmox kernel: __ext4_mark_inode_dirty+0x71/0x240
Mar 15 09:32:36 proxmox kernel: ? srso_return_thunk+0x5/0x10
Mar 15 09:32:36 proxmox kernel: ? __ext4_journal_start_sb+0x157/0x1d0
Mar 15 09:32:36 proxmox kernel: ext4_dirty_inode+0x5c/0x90
Mar 15 09:32:36 proxmox kernel: __mark_inode_dirty+0x5e/0x3b0
Mar 15 09:32:36 proxmox kernel: ext4_da_update_reserve_space+0x184/0x1f0
Mar 15 09:32:36 proxmox kernel: ext4_ext_map_blocks+0xf41/0x1b40
Mar 15 09:32:36 proxmox kernel: ? srso_return_thunk+0x5/0x10
Mar 15 09:32:36 proxmox kernel: ? release_pages+0x155/0x4c0
Mar 15 09:32:36 proxmox kernel: ? filemap_get_folios_tag+0x1c8/0x220
Mar 15 09:32:36 proxmox kernel: ? __folio_batch_release+0x30/0x70
Mar 15 09:32:36 proxmox kernel: ? srso_return_thunk+0x5/0x10
Mar 15 09:32:36 proxmox kernel: ? mpage_prepare_extent_to_map+0x50b/0x550
Mar 15 09:32:36 proxmox kernel: ext4_map_blocks+0x1cb/0x620
Mar 15 09:32:36 proxmox kernel: ? srso_return_thunk+0x5/0x10
Mar 15 09:32:36 proxmox kernel: ? kmem_cache_alloc+0x1a4/0x380
Mar 15 09:32:36 proxmox kernel: ext4_do_writepages+0x711/0xdf0
Mar 15 09:32:36 proxmox kernel: ext4_writepages+0xb5/0x190
Mar 15 09:32:36 proxmox kernel: do_writepages+0xd0/0x1e0
Mar 15 09:32:36 proxmox kernel: ? srso_return_thunk+0x5/0x10
Mar 15 09:32:36 proxmox kernel: ? nvme_prep_rq.part.0+0x3b3/0x870 [nvme]
Mar 15 09:32:36 proxmox kernel: ? srso_return_thunk+0x5/0x10
Mar 15 09:32:36 proxmox kernel: ? fprop_reflect_period_percpu.isra.0+0x87/0x100
Mar 15 09:32:36 proxmox kernel: __writeback_single_inode+0x44/0x370
Mar 15 09:32:36 proxmox kernel: ? srso_return_thunk+0x5/0x10
Mar 15 09:32:36 proxmox kernel: ? srso_return_thunk+0x5/0x10
Mar 15 09:32:36 proxmox kernel: writeback_sb_inodes+0x211/0x510
Mar 15 09:32:36 proxmox kernel: __writeback_inodes_wb+0x54/0x100
Mar 15 09:32:36 proxmox kernel: ? queue_io+0x115/0x120
Mar 15 09:32:36 proxmox kernel: wb_writeback+0x2a8/0x320
Mar 15 09:32:36 proxmox kernel: wb_workfn+0x2c7/0x4d0
Mar 15 09:32:36 proxmox kernel: ? __schedule+0x404/0x1440
Mar 15 09:32:36 proxmox kernel: process_one_work+0x23e/0x450
Mar 15 09:32:36 proxmox kernel: worker_thread+0x50/0x3f0
Mar 15 09:32:36 proxmox kernel: ? __pfx_worker_thread+0x10/0x10
Mar 15 09:32:36 proxmox kernel: kthread+0xf2/0x120
Mar 15 09:32:36 proxmox kernel: ? __pfx_kthread+0x10/0x10
Mar 15 09:32:36 proxmox kernel: ret_from_fork+0x47/0x70
Mar 15 09:32:36 proxmox kernel: ? __pfx_kthread+0x10/0x10
Mar 15 09:32:36 proxmox kernel: ret_from_fork_asm+0x1b/0x30
Mar 15 09:32:36 proxmox kernel: </TASK>
Mar 15 09:32:37 proxmox kernel: megaraid_sas 0000:06:00.0: Diag reset adapter never cleared megasas_adp_reset_fusion 4097
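Next time it drops I plan to pull the controller's own logs via the megacli package installed in step 2 above (a sketch, assuming the Debian package from hwraid.le-vert.net, which installs the binary as megacli):
Code:
# adapter / firmware state as the controller reports it
megacli -AdpAllInfo -aALL
# firmware terminal log, which usually records the reason for the FAULT state
megacli -FwTermLog -Dsply -aALL > /root/fw-termlog.txt
# controller event log
megacli -AdpEventLog -GetEvents -f /root/mr-events.log -aALL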