Hi all around,
I'm the latest newbie around here at the moment, so please have a little patience with me. I'll try to describe my issue in sufficient detail.
I started fumbling around with Proxmox VE on a cheap, old, but brand-new and so far unused Intel® Server System SR1625UR, which I've fitted with two Intel® Xeon® L5520 processors (1.75 € each) and 192 GB RAM (the most expensive part), and added an LSI SAS 9201-16e HBA to attach 'a few' drives later.
The SR1625UR has an integrated backplane with an LSI SAS1078 sitting on it (the passive version, without RAID key), but sadly with the IR firmware. I have looked very closely for an IT firmware version - without success. On the other hand, the search brought up every piece of firmware for the rest of my hardware, so I can say everything is up to date now (including the most recent IR firmware for the LSI SAS1078).
Code:
# lspci | grep LSI
02:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2116 PCI-Express Fusion-MPT SAS-2 [Meteor] (rev 02)
07:00.0 SCSI storage controller: Broadcom / LSI SAS1078 PCI-Express Fusion-MPT SAS (rev 04)
Proxmox looked bad from the beginning, since I got quite a lot of error messages with my first installation (proxmox-kernel-6.8.12-4-pve-signed) complaining about mptsas and refusing to boot. After stepping back (proxmox-kernel-6.5.13-6-pve-signed) the whole thing calmed down, still complaining about mptsas, but it carried on. Stepping forward (proxmox-kernel-6.11.11-1-pve-signed) works in the same manner, so I decided to stay with this state:
Code:
# pveversion --verbose
proxmox-ve: 8.3.0 (running kernel: 6.11.11-1-pve)
pve-manager: 8.3.3 (running version: 8.3.3/f157a38b211595d6)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.11.11-1-pve-signed: 6.11.11-1
proxmox-kernel-6.11: 6.11.11-1
proxmox-kernel-6.8: 6.8.12-8
proxmox-kernel-6.8.12-8-pve-signed: 6.8.12-8
ceph-fuse: 17.2.7-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
intel-microcode: 3.20241112.1~deb12u1
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.4
libpve-access-control: 8.2.0
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.10
libpve-cluster-perl: 8.0.10
libpve-common-perl: 8.2.9
libpve-guest-common-perl: 5.1.6
libpve-http-server-perl: 5.1.2
libpve-network-perl: 0.10.0
libpve-rs-perl: 0.9.1
libpve-storage-perl: 8.3.3
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.5.0-1
proxmox-backup-client: 3.3.2-1
proxmox-backup-file-restore: 3.3.2-2
proxmox-firewall: 0.6.0
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.3.1
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.4
pve-cluster: 8.0.10
pve-container: 5.2.3
pve-docs: 8.3.1
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.2
pve-firewall: 5.1.0
pve-firmware: 3.14-3
pve-ha-manager: 4.0.6
pve-i18n: 3.3.3
pve-qemu-kvm: 9.0.2-5
pve-xtermjs: 5.3.0-3
qemu-server: 8.3.6
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.7-pve1
I have two SSD drives attached to the LSI SAS1078 backplane, which I'm using for Proxmox VE in a ZFS mirror configuration:
Code:
# lsscsi
[1:0:0:0] disk ATA SAMSUNG MZ7LM480 404Q /dev/sda
[1:0:1:0] disk ATA SAMSUNG MZ7LM480 404Q /dev/sdb
When firing up the box in this configuration, it boots, but it complains about the LSI SAS1078 backplane and takes a huge amount of time to attach the drives:
Code:
# dmesg | grep ioc0
[ 3.270492] mptbase: ioc0: Initiating bringup
[ 3.445852] ioc0: LSISAS1078 C2: Capabilities={Initiator}
[ 15.459540] scsi host1: ioc0: LSISAS1078 C2, FwRev=011bbe00h, Ports=1, MaxQ=276, IRQ=16
[ 15.464632] mptsas: ioc0: attaching sata device: fw_channel 0, fw_id 0, phy 0, sas_addr 0x1221000000000000
[ 16.484972] mptbase: ioc0: WARNING - IOC is in FAULT state (2650h)!!!
[ 16.485080] mptbase: ioc0: WARNING - Issuing HardReset from mpt_fault_reset_work!!
[ 16.485208] mptbase: ioc0: Initiating recovery
[ 16.485299] mptbase: ioc0: WARNING - IOC is in FAULT state!!!
[ 16.485393] mptbase: ioc0: WARNING - FAULT code = 2650h
[ 17.508876] mptbase: ioc0: Recovered from IOC FAULT
[ 29.652870] mptbase: ioc0: WARNING - mpt_fault_reset_work: HardReset: success
[ 29.663116] mptsas: ioc0: attaching sata device: fw_channel 0, fw_id 0, phy 0, sas_addr 0x1221000000000000
[ 29.663430] mptsas: ioc0: attaching sata device: fw_channel 0, fw_id 1, phy 1, sas_addr 0x1221000001000000
[ 29.674558] mptsas: ioc0: attaching sata device: fw_channel 0, fw_id 1, phy 1, sas_addr 0x1221000001000000
[ 30.693159] mptbase: ioc0: WARNING - IOC is in FAULT state (2650h)!!!
[ 30.693266] mptbase: ioc0: WARNING - Issuing HardReset from mpt_fault_reset_work!!
[ 30.693393] mptbase: ioc0: Initiating recovery
[ 30.693483] mptbase: ioc0: WARNING - IOC is in FAULT state!!!
[ 30.693577] mptbase: ioc0: WARNING - FAULT code = 2650h
[ 31.716892] mptbase: ioc0: Recovered from IOC FAULT
[ 43.875052] mptbase: ioc0: WARNING - mpt_fault_reset_work: HardReset: success
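For anyone who wants to poke at the driver side: the knobs the Fusion-MPT modules expose can be listed like this (just an inspection sketch; I'm not assuming any particular parameter names):
Code:
# list the parameters the modules accept (names vary by kernel version)
modinfo -p mptbase
modinfo -p mptsas
# current values of the loaded modules' parameters, if any are exposed
ls /sys/module/mptbase/parameters/ 2>/dev/null
ls /sys/module/mptsas/parameters/ 2>/dev/null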
Those fault messages don't really hurt, since no further complaints about the LSI SAS1078 backplane appear once I've got this far - but:
I want to add two more drives to use as ZIL and cache for the upcoming ZFS pool (a sketch of the intended pool layout follows the listings below). If I add them while the box is up and running, it looks like this (not filtered by grep, to show all the details):
Code:
# dmesg
[47914.379054] mptsas: ioc0: attaching sata device: fw_channel 0, fw_id 7, phy 7, sas_addr 0x1221000007000000
[47914.381688] scsi 1:0:2:0: Direct-Access ATA SAMSUNG MZ7LH480 904Q PQ: 0 ANSI: 5
[47915.183597] mptbase: ioc0: WARNING - IOC is in FAULT state (2650h)!!!
[47915.183639] mptbase: ioc0: WARNING - Issuing HardReset from mpt_fault_reset_work!!
[47915.183668] mptbase: ioc0: Initiating recovery
[47915.183690] mptbase: ioc0: WARNING - IOC is in FAULT state!!!
[47915.183711] mptbase: ioc0: WARNING - FAULT code = 2650h
[47916.205672] mptbase: ioc0: Recovered from IOC FAULT
[47927.108957] sd 1:0:2:0: Attached scsi generic sg2 type 0
[47927.109487] sd 1:0:2:0: [sdc] 937703088 512-byte logical blocks: (480 GB/447 GiB)
[47927.110868] mptbase: ioc0: WARNING - mpt_fault_reset_work: HardReset: success
[47927.111008] sd 1:0:2:0: [sdc] Write Protect is off
[47927.111034] sd 1:0:2:0: [sdc] Mode Sense: 73 00 00 08
[47927.111611] sd 1:0:2:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[47927.120083] mptsas: ioc0: attaching sata device: fw_channel 0, fw_id 6, phy 6, sas_addr 0x1221000006000000
[47927.122720] scsi 1:0:3:0: Direct-Access ATA SAMSUNG MZ7LH480 904Q PQ: 0 ANSI: 5
[47928.174521] mptbase: ioc0: WARNING - IOC is in FAULT state (2650h)!!!
[47928.174566] mptbase: ioc0: WARNING - Issuing HardReset from mpt_fault_reset_work!!
[47928.174600] mptbase: ioc0: Initiating recovery
[47928.174619] mptbase: ioc0: WARNING - IOC is in FAULT state!!!
[47928.174639] mptbase: ioc0: WARNING - FAULT code = 2650h
[47929.197758] mptbase: ioc0: Recovered from IOC FAULT
[47940.189647] sd 1:0:3:0: Attached scsi generic sg3 type 0
[47940.190645] sd 1:0:3:0: [sdd] 937703088 512-byte logical blocks: (480 GB/447 GiB)
[47940.191138] mptbase: ioc0: WARNING - mpt_fault_reset_work: HardReset: success
[47940.192229] sd 1:0:3:0: [sdd] Write Protect is off
[47940.192257] sd 1:0:3:0: [sdd] Mode Sense: 73 00 00 08
[47940.193281] sd 1:0:3:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[47940.194411] sdc: sdc1 sdc2 sdc3
[47940.195489] sd 1:0:2:0: [sdc] Attached SCSI disk
[47940.210579] sdd: sdd1 sdd2 sdd3
[47940.211699] sd 1:0:3:0: [sdd] Attached SCSI disk
The result is:
Code:
# lsscsi
[1:0:0:0] disk ATA SAMSUNG MZ7LM480 404Q /dev/sda
[1:0:1:0] disk ATA SAMSUNG MZ7LM480 404Q /dev/sdb
[1:0:2:0] disk ATA SAMSUNG MZ7LH480 904Q /dev/sdc
[1:0:3:0] disk ATA SAMSUNG MZ7LH480 904Q /dev/sdd
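For context, the plan for these two additional SSDs is roughly the following. This is only a sketch: the pool name 'tank', the partition layout and the by-id placeholders are assumptions, and the data vdevs themselves will sit on the LSI SAS 9201-16e:
Code:
# mirrored SLOG (ZIL) on one partition of each of the new SSDs
zpool add tank log mirror /dev/disk/by-id/<ssd3-part1> /dev/disk/by-id/<ssd4-part1>
# L2ARC (cache) on a second partition of each of the new SSDs
zpool add tank cache /dev/disk/by-id/<ssd3-part2> /dev/disk/by-id/<ssd4-part2>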
The drives become functional and everything is fine until reboot. On reboot the machine goes through the same procedure for the four drives, but after the third drive I get:
Code:
timed out for waiting the udev queue being empty
... and the box stops booting. I don't get the
dmesg
output for that case, since the machine doesn't get far enough for an SSH connection. Removing at least one drive gives enough headroom to finish the boot sequence (but I need all four of them). I've read about similar issues with this udev timer and have tried some of the workarounds mentioned there. Most of those issues were solved by shortening the time the boot process needs, but getting there would presumably require changes to the firmware and/or driver for the LSI SAS1078 backplane. So, the remaining option appears to be avoiding the timeout by disabling or extending the timer. What I've tried so far (a short sketch of how to check these attempts follows the list):
- Extending the timer: Editing
/etc/udev/udev.conf
Code:
# see udev.conf(5) for details
#
# udevd is also started in the initrd. When this file is modified you might
# also want to rebuild the initrd, so that it will include the modified configuration.
#udev_log=info
#children_max=
#exec_delay=
event_timeout=300 <-- was 180 and commented out.
#timeout_signal=SIGKILL
#resolve_names=early
followed by calling
update-initramfs
as advised. After rebooting, the box behaves unchanged.
Assumption: This is the wrong timer.
- Disabling the timer:
systemctl mask systemd-udev-settle.service
The box also behaves unchanged.
Assumption: This is the wrong service.
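For completeness, this is roughly what the edit-and-rebuild cycle for the first attempt looks like, plus a way to check which unit or script actually prints the timeout message. The update-initramfs options and the grep patterns are my assumptions, nothing Proxmox-specific:
Code:
# rebuild the initrd so the copy of udev.conf inside it picks up the new event_timeout
# (-u -k all regenerates the initrd for every installed kernel)
update-initramfs -u -k all

# after the next slow/failed boot, see who printed the message and how long udev took
# (with persistent journaling, 'journalctl -b -1' shows the previous boot)
journalctl -b | grep -i -e "udev queue" -e "settle"
systemd-analyze blame | head -n 20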
So I have at least three questions:
- Does anybody know how to obtain and flash an IT firmware for the LSI SAS1078 backplane?
- Does anybody know how to obtain and install a fixed driver for the LSI SAS1078 backplane?
- Does anybody have a clue how to disable or extend the udev timer? (See the sketch below.)
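Regarding the last question, my current idea is to find out where the message really comes from by unpacking the initrd and grepping for it; if it turns out to be a udevadm settle call inside one of the initramfs scripts, the timeout passed there would be the value to raise. A rough sketch (the scratch directory is just an example):
Code:
# unpack the currently used initrd into a scratch directory
mkdir -p /tmp/initrd-extracted
unmkinitramfs /boot/initrd.img-$(uname -r) /tmp/initrd-extracted

# look for the script that prints the timeout message and for settle calls
grep -r "udev queue" /tmp/initrd-extracted
grep -r "udevadm settle" /tmp/initrd-extracted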
Any help is greatly appreciated.
Thank you in advance!