How To Stop zpool HDD spindown/standby

ajm

Hi Everyone

Since upgrading to PVE 7.2, the 4 x Seagate HDDs in my ZFS raidz1 pool keep spinning down into STANDBY mode.

I want to stop this from happening, for many reasons, and keep the disks permanently in IDLE/ACTIVE while the Proxmox host is on.

I do not know what is causing the spindown to standby, so I cannot turn it off. Is it a new ZFS pool setting in version 2.1.6, or has some other default setting changed from keeping the disks active to shutting them down after a short period?

The host exposes the zpool to the VM via Samba shares, which the VM uses for storage. If the VM leaves the Samba shares idle for around 5 minutes, the host zpool disks spin down into standby mode; when the VM then accesses the shares, the disks perform a staggered spin-up, which delays access and freezes the VM for 30 seconds to a minute or more.

I have tried using /etc/hdparm.conf (for each disk: apm = 255, spindown_time = 0), but Proxmox does not seem to respect the settings, and the drives still go into standby.
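
For reference, the one-shot equivalents of those hdparm.conf settings (just a sketch; the device list is a placeholder, and whether these drives honour the APM/standby commands at all is part of the question) would be something like:
Code:
# -B 255 disables Advanced Power Management, -S 0 disables the standby (spindown) timer
for DISK in /dev/sd{a..d}; do    # placeholder: substitute the pool's member disks
  hdparm -B 255 -S 0 "${DISK}"
done

Persisting that across reboots is exactly what /etc/hdparm.conf is supposed to handle, which is why it is frustrating that the settings appear to be ignored.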

I assume that smartmontools may be the way to go, but after reading the documentation I cannot make heads or tails of how to actually set the drives to be permanently active, or how to make these settings persist across reboots.

Also, I do not understand why the disks never entered standby with previous versions of Proxmox using the same configuration (which is what I want), yet in the latest version they go into standby within about 5 minutes (which I don't want). If anyone can answer that, I suppose I would have a solution.


pveversion -v
Code:
proxmox-ve: 7.2-1 (running kernel: 5.15.64-1-pve)
pve-manager: 7.2-11 (running version: 7.2-11/b76d3178)
pve-kernel-5.15: 7.2-13
pve-kernel-helper: 7.2-13
pve-kernel-5.13: 7.1-9
pve-kernel-5.11: 7.0-10
pve-kernel-5.15.64-1-pve: 5.15.64-1
pve-kernel-5.15.60-1-pve: 5.15.60-1
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-1-pve: 5.13.19-3
pve-kernel-5.11.22-7-pve: 5.11.22-12
ceph-fuse: 15.2.17-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-3
libpve-guest-common-perl: 4.1-4
libpve-http-server-perl: 4.1-4
libpve-storage-perl: 7.2-10
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.7-1
proxmox-backup-file-restore: 2.2.7-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-2
pve-container: 4.2-3
pve-docs: 7.2-2
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-6
pve-firmware: 3.5-6
pve-ha-manager: 3.4.0
pve-i18n: 2.7-2
pve-qemu-kvm: 7.0.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.6-pve1



hdparm.conf settings that did not work # same for all disks
Code:
/dev/disk/by-id/ata-ST12000NM0007-2A1101_ZCH06778 {
  # advanced power management
  # 255 = disabled
  # 1-127 = spindown
  # 128-254 = high performance
  apm = 255
  # spindown in 5 sec units, min 0, max 255
  spindown_time = 0
  # acoustic_management
  acoustic_management = 254
}
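
A quick way to check whether those settings ever reach the drives (a sketch; only the APM level can be read back, the standby timer itself is write-only as far as I know) is:
Code:
# read back the current APM level (should report off/255 if the conf was applied)
hdparm -B /dev/disk/by-id/ata-ST12000NM0007-2A1101_ZCH06778
# query the current power state without spinning the drive up
hdparm -C /dev/disk/by-id/ata-ST12000NM0007-2A1101_ZCH06778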


smartctl -i -n standby /dev/sda # same for all disks
Code:
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.64-1-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

Device is in STANDBY mode, exit(2)


zpool status
Code:
 pool: zp0
 state: ONLINE
config:

        NAME                                   STATE     READ WRITE CKSUM
        zp0                                    ONLINE       0     0     0
          raidz1-0                             ONLINE       0     0     0
            ata-ST12000VN0007-2GS116_ZCH029ST  ONLINE       0     0     0
            ata-ST12000VN0007-2GS116_ZCH087RV  ONLINE       0     0     0
            ata-ST12000VN0007-2GS116_ZCH0886L  ONLINE       0     0     0
            ata-ST12000VN0007-2GS116_ZJV01MMP  ONLINE       0     0     0

errors: No known data errors



Any help would be greatly appreciated.

AJ
 
Seagate provide a “Seachest” collection of tools for manipulating their drives, but rather more usefully to users of non-Windows operating systems like Linux they also offer an open-source openSeaChest. The tool to use there is openSeaChest_PowerControl which allows each of the EPC timers to be configured, in an invocation like:

openSeaChest_PowerControl -d /dev/sdc --idle_a 6000 --idle_b 1800000 --idle_c 2400000

The timer values specified are in milliseconds, so this example will park the disk heads after 30 minutes of inactivity. The current settings for a disk can be queried with the --showEPCSettings flag.

My Seagate Archive SMR disk (which began life as an external hard drive and was retired from that role when it became too small to hold as much as I wanted to back up to it) apparently doesn’t support reporting EPC settings (since asking for them says so), and initially didn’t accept new values for the idle timers either. After using the --EPCfeature enable option however, it seems to have accepted custom idle timer values: I’ll have to watch the park counts on that to ensure it actually worked.
Source: https://www.taricorp.net/2021/hdd-parking-monitoring/

ST12000VN0007 is a Seagate drive so it might be worth checking seachest
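
A first pass might be to dump the current EPC timers and, if those are what is stopping the spindle, push them out. A rough sketch based on the flags from the article above (the millisecond values are arbitrary, and --standby_z is an assumption by analogy with the idle timers, so check --help first):
Code:
# show the current EPC timer settings for one of the pool disks
openSeaChest_PowerControl -d /dev/sdc --showEPCSettings
# make sure EPC is enabled, then push the idle_c / standby_z timers way out (values in ms)
openSeaChest_PowerControl -d /dev/sdc --EPCfeature enable
openSeaChest_PowerControl -d /dev/sdc --idle_c 7200000 --standby_z 14400000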

Hth
 
Thanks APOC for your help with SeaChest, and for your time answering the question.

However, the SeaChest_PowerControl does not seem to solve my problem.

After downloading the latest SeaChest Tools, installing them on my server, and then examining the output for my drives, I found that the idle_c and standby_z timers were disabled.

drive power settings:
Code:
idle_a: power down some electronics
idle_b: park the heads (unloading them)
idle_c: reduce spindle speed, heads unloaded
standby_z: stop spindle completely
* = timer is enabled

bash code:
Code:
# get all drives of that model
MODEL="ST12000NM0007"
HDD_LIST=$(./SeaChest_PowerControl --scan | grep ${MODEL} | grep -Eo "\/dev\/sg[0-9]{1,3}")
for HDD in ${HDD_LIST}; do
  ./SeaChest_PowerControl -d ${HDD} --showEPCSettings
done

output the following (each drive is the same):
Code:
==========================================================================================
 SeaChest_PowerControl - Seagate drive utilities - NVMe Enabled
 Copyright (c) 2014-2022 Seagate Technology LLC and/or its Affiliates, All Rights Reserved
 SeaChest_PowerControl Version: 3.1.10-3_2_1 X86_64
 Build Date: Jul 26 2022
 Today: Thu Nov 10 09:00:44 2022        User: root
==========================================================================================

/dev/sg0 - ST12000NM0007-2A1101 - ZCH060NE - SN04 - ATA
.

===EPC Settings===
        * = timer is enabled
        C column = Changeable
        S column = Savable
        All times are in 100 milliseconds

Name       Current Timer Default Timer Saved Timer   Recovery Time C S
Idle A     *1            *1            *1            1             Y Y
Idle B     *1200         *1200         *1200         4             Y Y
Idle C      0             6000          6000         30            Y Y
Standby Z   0             9000          9000         100           Y Y

As can be seen, both the idle_c ("reduce spindle speed, heads unloaded") and standby_z ("stop spindle completely") timers are disabled.

That being the case, it seems to me that the disks going into standby is NOT the result of the Seagate disk power settings themselves (which I have now adjusted, but without any change in behaviour), but of some other setting in Proxmox that I do not know about. As I said previously, this behaviour changed after a Proxmox update.

So far I have tried both /etc/hdparm.conf and the Seagate SeaChest_PowerControl power settings, and Proxmox continues to behave the same way, putting the disks into standby after about 5 minutes.

So it appears I still need some help tracing what is causing Proxmox to put all my disks into standby after a few minutes.

Thanks
 
This is odd.
Someone from the PVE team should have a look into this. I can't judge, nor do I know all the changes that happened underneath. @fabian for a heads-up.
 
there's nothing in PVE that does this explicitly. from which version did you upgrade when the issue first showed up? it's possible that some default power management setting somewhere changed (e.g., in the kernel).
 
Hi Fabian

Thanks for your help.

Unfortunately :( I cannot tell you the exact version where the problem began. I did not notice the problem for over 1 month, and I have not documented the updates I have done.

However, when I first installed proxmox, I used the version 7.1 iso, and I did not notice the problem then. Drives behaved as expected (stayed on).

When the system was upgraded to the latest version (as above), I noticed the problem, but unfortunately that doesn't mean much, because I have done many updates since the first install, and I have not been checking the drive standby status after OS upgrades.

I have a few test machines evaluating proxmox, and the problem showed up in each of them, and is now consistent in each machine.

Where are the default power management settings in the kernel, and how can I experiment with them?

Regards

AJ
 
I have some more information regarding the server HDDs going into STANDBY, from attempting to debug the problem.

The time delay from IDLE/ACTIVE into standby is completely inconsistent, with no one using the server or the VMs and no one using any Samba shares on the server. Sometimes it is as low as 127 seconds, and other times it has been as high as 580 seconds.


I have checked the following files for any suspect settings, but found none:
Code:
/etc/hdparm.conf
./SeaChest_PowerControl # settings
/etc/default/smartmontools
/etc/init.d/smartmontools
/etc/systemd/system/multi-user.target.wants/smartmontools.service
/etc/systemd/system/smartd.service
/etc/zfs/zpool.d/smart
/etc/zfs/zpool.d/smart_test
/etc/zfs/zpool.d/smartx
/etc/smartmontools
/etc/smartmontools/smartd_warning.d
/etc/smartd.conf
/usr/lib/systemd/system/smartmontools.service
/usr/lib/zfs-linux/zpool.d/smart
/usr/lib/zfs-linux/zpool.d/smart_test
/usr/lib/zfs-linux/zpool.d/smartx
/usr/share/smartmontools/smartd-runner

I have also found the following tuned profile file:

/usr/lib/tuned/spindown-disk/tuned.conf
Code:
#
# tuned configuration
#
# spindown-disk usecase:
# Safe extra energy on your laptop or home server
# which wake-up only when you ssh to it. On server
# could be hdparm and sysctl values problematic for
# some type of discs. Laptops should be probably ok
# with these numbers.
#
# Possible problems:
# The script is remounting your ext3 fs if you have
# it as noatime. Also configuration of rsyslog is
# changed to not sync. hdparm is setting disc to
# minimal spins but without use of tuned daemon.
# Bluetooth will be switch off.
# Wifi will be switch into power safe mode.

[main]
summary=Optimize for power saving by spinning-down rotational disks

[disk]
apm=128
spindown=6

[scsi_host]
alpm=medium_power

[sysctl]
vm.dirty_writeback_centisecs=6000
vm.dirty_expire_centisecs=9000
vm.dirty_ratio=60
vm.laptop_mode=5
vm.swappiness=30

[script]
script=${i:PROFILE_DIR}/script.sh

But changing the [disk] settings to apm=254, spindown=0, and activating the spindown-disk tuned profile does not change the behavior.

I have also completely turned off the following services, but still get the same behavior:
Code:
tuned-adm off
disabled manual tuned # in /etc/tuned/tuned-main.conf
disabled automatic tuned # /etc/tuned/tuned-main.conf
systemctl stop tuned
systemctl disable tuned
systemctl disable smartd

So at this stage, I still have the same behavior after all of my efforts.

I think I may just reinstall Proxmox with nothing extra installed, only the zpool set up (no Samba, no VMs, etc.), and see what happens with the disks then.
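
Before doing that, one more thing I can try is grepping the usual places for anything that might still be issuing APM/standby commands behind my back (just a crude search; the paths are common suspects rather than a definitive list):
Code:
# look for udev rules, systemd units or cron jobs that call hdparm/sdparm
grep -rIl -e hdparm -e sdparm \
    /etc/udev/rules.d /lib/udev/rules.d \
    /etc/systemd /lib/systemd/system \
    /etc/cron.d 2>/dev/null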

Any other suggestions would be appreciated.

AJ
 
tuned looks like a possible culprit and is definitely not installed by default ;) so yeah, starting with a stock PVE and checking whether that is affected might be a good way to test - you can then add your changes piece by piece until it breaks again, and you know which change is responsible.
 
Hello,
I might be experiencing similar behaviour with Proxmox 8.0.4:
8 Seagates in a ZFS pool, and while near the server I can hear that some disk(s) are spinning down after a couple of minutes.

It's a very fresh installation of PVE without any modifications at the disk level.

hdparm shows APM level 128 on all disks.
hdparm -C shows all disks in standby (which I assume is not correct?).


Code:
hdparm -C /dev/sd[cdefghij]

/dev/sdc:
 drive state is:  standby

/dev/sdd:
 drive state is:  standby

/dev/sde:
 drive state is:  standby

/dev/sdf:
 drive state is:  standby

/dev/sdg:
 drive state is:  standby

/dev/sdh:
 drive state is:  standby

/dev/sdi:
 drive state is:  standby

/dev/sdj:
 drive state is:  standby

To verify: is there any live monitoring tool which shows whether a disk is spinning or spun down? My assumptions are based on the sound I've heard ... so not a very reliable source of truth :)
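
The crudest approach I can think of is polling in a loop (a sketch; hdparm -C uses CHECK POWER MODE, so it should not spin the drives up by itself):
Code:
# log the power state of each disk every 30 seconds
while true; do
  for DISK in /dev/sd[c-j]; do
    printf '%s %s: %s\n' "$(date '+%F %T')" "${DISK}" \
      "$(hdparm -C "${DISK}" | awk '/drive state/ {print $4}')"
  done
  sleep 30
done

smartctl -i -n standby (as used earlier in the thread) does a similar check and exits instead of waking a sleeping drive.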

Thanks
 