[SOLVED] ZFS disk problems

casalicomputers

Hi,

we are building a Proxmox machine for a client using ZFS, and while testing it in our lab we are experiencing a weird problem.

The machine is a Dell PowerEdge R550 and it is configured as follows:
1) a RAIDZ1 of 5 SAS-HDD disks
2) a RAIDZ1 of 4 SAS-SSD disks

Often on boot, one or more disks of the HDD raid (the first one) fail. The failure is reported both by ZFS (we receive the pool status update mail) and by the iDRAC.
Even though the disks are reported as failed, they are still working (zpool status reports them as working), so it looks like the disks experience a sudden disconnection-reconnection that triggers both the iDRAC and the ZFS disk alerts.

We are also preparing a second server for another client with similar hardware and a RAID10 SAS-SSD configuration, and we are not experiencing any problem there, for now, so the issue looks related only to the HDDs.

Could we ask for some help to troubleshoot this problem? If any logs are needed, we will add them as soon as possible.

Thanks for the support and your time,
Have a good day
 
Okay, people will want to know the following about the HDDs:
What make are they?
Model number?

Are they SMR [bad] or CMR [good]?
ZFS is very sensitive to the above, and CMR is where you should be. Otherwise, you'll see drives dropping out randomly, as they cannot keep up with ZFS due to latency timeouts.
 
The disks are the following:
- Toshiba AL15SEB24EQY
- Toshiba AL15SEB24EQY
- Toshiba AL15SEB24EQY
- Seagate DL2400MM0159 (suspecting a hardware failure we asked for a replacement and got this one, that's why it's from a different maker)
- Toshiba AL15SEB24EQY

Unfortunately we don't know which kind of HDD they are; it looks like retrieving this information is not so easy by software. When we order disks we don't know the exact model, only the performance class, so it is possible we got the wrong disks for this use case. I will update as soon as we find out what kind of HDDs they are.
 
The Toshiba drives are nice and high-end: https://storage.toshiba.com/enterprise-hdd/enterprise-performance/al15se-series
Same with the Seagate model: https://www.enterasource.com/dell-rwr8f-2-4tb-sas-2-5-12gbps-hard-drive-seagate-dl2400mm0159
Based on the specs and description, doubtful they would be SMR, so they should be CMR.
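
If you want to double-check from software, the exact model and serial strings can be pulled per disk with smartmontools and then matched against the vendor datasheets. A rough sketch (adjust the device list to your system):

Code:
# Print vendor/model, serial and rotation rate for each disk (needs smartmontools)
for d in /dev/sd{a,b,c,d,e}; do
    echo "== $d =="
    smartctl -i "$d" | grep -Ei 'vendor|product|model|rotation|serial'
done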

Have you tested the power to the drives? [just in case it's due to a brownout?]
Dell systems should be quite reliable, but little things can cause trouble, such as the startup surge wattage exceeding the power supply's rating.
What happens if you test these drives on the other server? [You might be able to rule out the drives if it turns out to be an issue with the backplane for the array.]
Have you tried an alternative data cable for the drives?

If you set the system to delay its startup so that all drives are spinning at full speed, does the problem rectify? [Slow spindle startup can be a bad sign]

Are these refurbished drives or NOS?
 
The Toshiba drives are nice and high-end: https://storage.toshiba.com/enterprise-hdd/enterprise-performance/al15se-series
Same with the Seagate model: https://www.enterasource.com/dell-rwr8f-2-4tb-sas-2-5-12gbps-hard-drive-seagate-dl2400mm0159
Based on the specs and description, doubtful they would be SMR, so they should be CMR.
The drives are new and we also think they are CMR based on the spec sheet, even though we are not 100% sure.

We did some tests to troubleshoot the problem:
- We booted a Dell-supported OS directly from USB and tried to detach and reattach the disks: no disk failure was detected (other than the attach/detach notification). Since the disks are not mounted there, ZFS is not running, so it looks like a ZFS-related problem.
- While the server is running we experienced no problem, although it is currently not under load, so maybe if we push the HDDs toward their speed limit we could trigger the issue (see the sketch below). We got no problem on warm boots, only on cold boots.
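
Something like the following fio run could be used to push sustained writes through the HDD pool. This is only a sketch: the test dataset name and its default mountpoint are examples, and fio may need to be installed first (apt install fio).

Code:
# Create a throwaway dataset on the HDD pool (default mountpoint assumed)
zfs create raid5-sas-hdd/stress

# Sustained large sequential writes for 10 minutes across 4 jobs
fio --name=hdd-stress --directory=/raid5-sas-hdd/stress \
    --rw=write --bs=1M --size=10G --numjobs=4 \
    --time_based --runtime=600 --ioengine=psync --group_reporting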

Together with the Dell technicians we concluded it's not a hardware fault, and we would like to run additional tests to troubleshoot the problem.

If you set the system to delay its startup so that all drives are spinning at full speed, does the problem rectify? [Slow spindle startup can be a bad sign]
How can we do this?

For completeness I'm also adding the `zpool status` output (the failing disks are from the `raid5-sas-hdd` pool):
Code:
  pool: raid5-sas-hdd
 state: ONLINE
  scan: resilvered 516K in 00:00:00 with 0 errors on Fri Apr 28 10:26:30 2023
config:

        NAME           STATE     READ WRITE CKSUM
        raid5-sas-hdd  ONLINE       0     0     0
          raidz1-0     ONLINE       0     0     0
            sda        ONLINE       0     0     0
            sdb        ONLINE       0     0     0
            sdc        ONLINE       0     0     0
            sde        ONLINE       0     0     0
            sdd        ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
config:

        NAME                              STATE     READ WRITE CKSUM
        rpool                             ONLINE       0     0     0
          raidz1-0                        ONLINE       0     0     0
            scsi-358ce38ee22527d6d-part3  ONLINE       0     0     0
            scsi-358ce38ee22527d51-part3  ONLINE       0     0     0
            scsi-358ce38ee22527d75-part3  ONLINE       0     0     0
            scsi-358ce38ee22527d49-part3  ONLINE       0     0     0

errors: No known data errors

Thanks for the support!
 
We tried disconnecting and reconnecting a disk to capture the log pattern in dmesg, and we found what happened before the disk failed during the boot phase.
Just before the disk disconnection we have this in the dmesg output:

Code:
[  178.579018] x86/split lock detection: #AC: CPU 0/KVM/61253 took a split_lock trap at address: 0xfffff8010f6d21c1
[  433.992806] x86/split lock detection: #AC: CPU 0/KVM/48663 took a split_lock trap at address: 0xfffff80661a574af
[  584.381393] x86/split lock detection: #AC: CPU 0/KVM/147008 took a split_lock trap at address: 0xfffff80659a33944
[  937.128377] x86/split lock detection: #AC: CPU 0/KVM/69721 took a split_lock trap at address: 0xfffff80169e574af
[  955.244377] x86/split lock detection: #AC: CPU 0/KVM/107754 took a split_lock trap at address: 0xfffff80613c79c9f
[ 1195.109857] x86/split lock detection: #AC: CPU 4/KVM/61257 took a split_lock trap at address: 0xfffff8010f6574af
[ 2595.594221] x86/split lock detection: #AC: CPU 3/KVM/147045 took a split_lock trap at address: 0xfffff80659ad2106
[ 3301.586997] x86/split lock detection: #AC: CPU 1/KVM/147025 took a split_lock trap at address: 0xfffff80659a33944
[ 3782.642689] x86/split lock detection: #AC: CPU 3/KVM/69725 took a split_lock trap at address: 0xfffff80169e574af
[ 4035.597873] x86/split lock detection: #AC: CPU 3/KVM/147045 took a split_lock trap at address: 0xfffff80659ad2106
[ 5519.866306] x86/split lock detection: #AC: CPU 0/KVM/87870 took a split_lock trap at address: 0xfffff8023a279c9f

Could this be relevant?
 
I'm not sure about this one. A potential idea, though, would be to check the SMART values of each disk and, when the array runs into trouble, compare them. That way you'd be able to determine whether the disk is repowering or there is some other issue, especially ECC errors, which may indicate noisy cabling somewhere.
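
As a rough sketch of what I mean (the attribute names differ between SAS and SATA drives, so treat the grep/diff part as an example):

Code:
# Snapshot the SMART output of every pool disk so it can be diffed after an incident
mkdir -p /root/smart-baseline
for d in /dev/sd{a,b,c,d,e}; do
    smartctl -x "$d" > /root/smart-baseline/$(basename "$d").txt
done

# After the next failure, compare e.g. power-cycle and error counters:
#   diff /root/smart-baseline/sda.txt <(smartctl -x /dev/sda)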
 
We are still testing, but we think we found the problem. While gathering info through "smartctl" we noticed some disks didn't match the disks they were supposed to be. It looks like during a cold boot the system enumerates the disks in a different order. I had created the HDD pool using device names like "/dev/sda /dev/sdb /dev/sdc ...", so when those names get swapped after a reboot, ZFS reported errors. I have destroyed and recreated the pool using devices from "/dev/disk/by-id/" to be sure the pool picks the right disks. We are currently testing this solution, but we are pretty sure that was the problem.
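
For reference, the recreation looked roughly like this (the wwn-... names below are placeholders, not our real disk IDs):

Code:
# Recreate the HDD pool with stable /dev/disk/by-id names (placeholders shown)
zpool destroy raid5-sas-hdd
zpool create raid5-sas-hdd raidz1 \
    /dev/disk/by-id/wwn-0x5000000000000001 \
    /dev/disk/by-id/wwn-0x5000000000000002 \
    /dev/disk/by-id/wwn-0x5000000000000003 \
    /dev/disk/by-id/wwn-0x5000000000000004 \
    /dev/disk/by-id/wwn-0x5000000000000005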

Sorry for the noise and thanks for the support!
 
I had created the HDD pool using device names like "/dev/sda /dev/sdb /dev/sdc ...", so when those names get swapped after a reboot, ZFS reported errors.
That shouldn't matter. Once created, a ZFS pool will identify its disks by the metadata written on the disks, not by the names Linux assigns them. So a disk added to the pool as /dev/sda should also work as /dev/sdb or whatever.
But adding disks using /dev/disk/by-id is nice because it makes it easier to identify a disk that needs to be replaced, as `zpool status` will then show the WWN or serial next to the state.
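
For what it's worth, you don't need to destroy a pool to switch it to by-id names; exporting and re-importing it should be enough (a sketch, make sure nothing is using the pool while you do it):

Code:
# Re-import an existing data pool so zpool status shows the by-id names
zpool export raid5-sas-hdd
zpool import -d /dev/disk/by-id raid5-sas-hdd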
 
We did a lot of cold boots for testing and the problem didn't appear anymore, so I think it's pretty safe to say we solved it. Thanks for the support! Have a nice day.
 
