ZFS reboot or reload disk

Mattarello

Jun 19, 2024
Hi, I have a problem with my HDDs in a ZFS configuration.
I have one M.2 disk for Proxmox, one SSD for VMs, and two 4 TB HDDs in ZFS.

The problem, I think, is that Proxmox power-cycles one of the HDDs every few seconds, and it is now at 5023 power cycles.
Is that normal? Because the disk is loud every time it spins back up, and it is very annoying.

Can anyone help me, or at least give me some advice?

P.S.
I've installed Proxmox twice; the first time, the other HDD reached a 12k power cycle count.
Now it has stopped, and the second disk has started cycling instead.
 

Attachments

  • proxmox count.png (41.2 KB)
What do you see in the logs during such a power cycle?
Which HDDs are involved (product number)?
Cable/backplane of the HDD: is something perhaps loose or broken?
Does this also happen if you use the HDD in another computer?
 
What do you see in the logs during such a power cycle?
Which HDDs are involved (product number)?
Cable/backplane of the HDD: is something perhaps loose or broken?
Does this also happen if you use the HDD in another computer?
Thank you for your answer.
How can I check the logs? The drives are Red Plus 5400 RPM models purchased from a store two or three months ago. The cables are not the issue, since I tested them with other disks. The HDDs don't have any problems in other PCs.
It seems that when two disks are configured in ZFS, Proxmox reboots/reloads one of the two.
 
How can I check the logs?
You can read them with journalctl. For example:

Code:
# Follow - live log
journalctl -f

# Reverse search
journalctl -r

# 2 days ago
journalctl --since '2 days ago'
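
To focus on the disk resets specifically, the kernel messages can be filtered with grep. A small sketch, using sample lines of the kind this thread later shows (on the host you would pipe `journalctl -k` into the same filter):

```shell
# On the Proxmox host you would run:
#   journalctl -k | grep -E 'ata[0-9]+: SATA link (up|down)'
# Demonstration of the same filter on sample log lines:
grep -E 'ata[0-9]+: SATA link (up|down)' <<'EOF'
Aug 02 14:02:50 pve kernel: ata2: SATA link down (SStatus 0 SControl 310)
Aug 02 14:02:55 pve kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 02 14:05:23 pve pvestatd[1543]: status update time (6.064 seconds)
EOF
```

Only the two link-state lines survive the filter; unrelated daemon chatter is dropped.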

The drives are Red Plus 5400 RPM

Please post the exact model name of the drives, and preferably the SMART values of both.

Code:
smartctl -a /dev/disk/by-id/<your disk>

# for example:
smartctl -a /dev/disk/by-id/ata-ST8000NM017B-2TJ103_XXXXXXX
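
In that output, the power-cycle figure mentioned above is SMART attribute 12. A sketch of pulling the raw value out with awk, run here against sample smartctl output (the attribute values are illustrative, with 5023 matching the count reported in this thread):

```shell
# On the host: smartctl -a /dev/disk/by-id/<your disk> | grep -i power_cycle
# Demonstration: extract the raw Power_Cycle_Count (last column) from
# sample smartctl attribute lines:
awk '/Power_Cycle_Count/ {print $NF}' <<'EOF'
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1201
 12 Power_Cycle_Count       0x0032   095   095   020    Old_age   Always       -       5023
EOF
```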

The cables are not the issue since I tested them with other disks. The HDDs don't have any problems in other PCs.
Sounds good.

I think we need a bit more information too:

How much memory does the machine have?
What CPU is installed?
Any other important hardware?
What is the goal, and what should run on the machine?
 
You can read them with journalctl. For example:

Code:
# Follow - live log
journalctl -f

# Reverse search
journalctl -r

# 2 days ago
journalctl --since '2 days ago'



Please post the exact model name of the drives, and preferably the SMART values of both.

Code:
smartctl -a /dev/disk/by-id/<your disk>

# for example:
smartctl -a /dev/disk/by-id/ata-ST8000NM017B-2TJ103_XXXXXXX


Sounds good.

I think we need a bit more information too:

How much memory does the machine have?
What CPU is installed?
Any other important hardware?
What is the goal, and what should run on the machine?


Seven days ago I disconnected the drives because one of them was power-cycling. Now, to run the tests you asked for, I have reconnected them, and they are working fine. Usually the problem seems to occur when I turn the server off and on again, but not always.

How much memory does the machine have? 32GB
What CPU is installed? A Ryzen 5 2600.
Any other important hardware? An NVIDIA GPU (maybe a GT 620), because my CPU isn't an APU.
What is the goal, and what should run on the machine? Right now I'm running as VMs: pfSense, a personal Minecraft server, and Windows 10 for data (where I use my ZFS pool); and as CTs: Pi-hole, Uptime Kuma, and WireGuard.


From this log, it seems like the cables...


Aug 02 14:00:28 pve kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 02 14:00:28 pve kernel: ata2.00: configured for UDMA/33
Aug 02 14:02:50 pve kernel: ata2: SATA link down (SStatus 0 SControl 310)
Aug 02 14:02:55 pve kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 02 14:02:55 pve kernel: ata2.00: configured for UDMA/33
Aug 02 14:05:16 pve kernel: ata2: SATA link down (SStatus 0 SControl 310)
Aug 02 14:05:22 pve kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 02 14:05:22 pve kernel: ata2.00: configured for UDMA/33
Aug 02 14:05:23 pve pvestatd[1543]: status update time (6.064 seconds)
Aug 02 14:07:43 pve kernel: ata2: SATA link down (SStatus 0 SControl 310)
Aug 02 14:07:49 pve kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 02 14:07:49 pve kernel: ata2.00: configured for UDMA/33
Aug 02 14:10:11 pve kernel: ata2: SATA link down (SStatus 0 SControl 310)
Aug 02 14:10:16 pve kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 02 14:10:16 pve kernel: ata2.00: configured for UDMA/33
Aug 02 14:12:36 pve kernel: ata2: SATA link down (SStatus 0 SControl 310)
Aug 02 14:12:43 pve kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 02 14:12:43 pve kernel: ata2.00: configured for UDMA/33
Aug 02 14:12:43 pve pvestatd[1543]: status update time (6.914 seconds)
Aug 02 14:13:23 pve systemd[1]: Starting systemd-tmpfiles-clean.service - Cleanup of Temporary Directories...
Aug 02 14:13:23 pve systemd[1]: systemd-tmpfiles-clean.service: Deactivated successfully.
Aug 02 14:13:23 pve systemd[1]: Finished systemd-tmpfiles-clean.service - Cleanup of Temporary Directories.
Aug 02 14:13:23 pve systemd[1]: run-credentials-systemd\x2dtmpfiles\x2dclean.service.mount: Deactivated successfully.
Aug 02 14:15:04 pve kernel: ata2: SATA link down (SStatus 0 SControl 310)
Aug 02 14:15:10 pve kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 02 14:15:10 pve kernel: ata2.00: configured for UDMA/33
Aug 02 14:17:01 pve CRON[467458]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Aug 02 14:17:01 pve CRON[467459]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Aug 02 14:17:01 pve CRON[467458]: pam_unix(cron:session): session closed for user root
Aug 02 14:17:21 pve smartd[1139]: Device: /dev/sda [SAT], SMART Prefailure Attribute: 3 Spin_Up_Time changed from 210 to 211
Aug 02 14:18:07 pve kernel: ata2: SATA link down (SStatus 0 SControl 310)
Aug 02 14:18:14 pve kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 02 14:18:14 pve kernel: ata2.00: configured for UDMA/33
Aug 02 14:18:14 pve pvestatd[1543]: status update time (7.203 seconds)
Aug 02 14:20:15 pve kernel: ata2: SATA link down (SStatus 0 SControl 310)
Aug 02 14:20:20 pve kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 02 14:20:20 pve kernel: ata2.00: configured for UDMA/33
Aug 02 14:22:41 pve kernel: ata2: SATA link down (SStatus 0 SControl 310)
Aug 02 14:22:43 pve kernel: ata2: found unknown device (class 0)
Aug 02 14:22:47 pve kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 02 14:22:47 pve kernel: ata2.00: configured for UDMA/33
Aug 02 14:25:08 pve kernel: ata2: SATA link down (SStatus 0 SControl 310)
Aug 02 14:25:14 pve kernel: ata2: link is slow to respond, please be patient (ready=0)
Aug 02 14:25:14 pve kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 02 14:25:14 pve kernel: ata2.00: configured for UDMA/33

25 minutes later:

Aug 02 14:52:33 pve kernel: ata2: SATA link down (SStatus 0 SControl 300)
Aug 02 14:52:33 pve kernel: ata2: hard resetting link
Aug 02 14:52:33 pve kernel: ata2: SATA link down (SStatus 0 SControl 300)
Aug 02 14:52:33 pve kernel: ata2: limiting SATA link speed to <unknown>
Aug 02 14:52:33 pve kernel: ata2: hard resetting link
Aug 02 14:52:33 pve kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 3F0)
Aug 02 14:52:33 pve kernel: ata2.00: configured for UDMA/133
Aug 02 14:52:33 pve kernel: ata2: limiting SATA link speed to 3.0 Gbps
Aug 02 14:52:33 pve kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Aug 02 14:52:33 pve kernel: ata2.00: configured for UDMA/133
 

Attachments

  • Screenshot 2024-08-09 141630.png (37.4 KB)
  • Screenshot 2024-08-09 143950.png (67.7 KB)
  • Screenshot 2024-08-09 144023.png (62.4 KB)
  • Screenshot 2024-08-09 144041.png (70.2 KB)
  • Screenshot 2024-08-09 144049.png (63 KB)
Thanks for your analysis and the logs.

How much memory does the machine have? 32GB

ZFS needs much more memory than other filesystems. Depending on the use case, 32GB could be tight. Please also take a look at this article in the wiki: https://pve.proxmox.com/wiki/ZFS_on_Linux#sysadmin_zfs_limit_memory_usage
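
As that wiki article describes, the ARC can be capped via the `zfs_arc_max` module parameter, which takes a value in bytes. A quick sketch of working out an 8 GiB cap, the figure used further down in this thread:

```shell
# zfs_arc_max is given in bytes; an 8 GiB cap is 8 * 1024^3:
echo $((8 * 1024 * 1024 * 1024))
# prints 8589934592
```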

From this log, it seems like the cables...


Aug 02 14:00:28 pve kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 02 14:00:28 pve kernel: ata2.00: configured for UDMA/33
Aug 02 14:02:50 pve kernel: ata2: SATA link down (SStatus 0 SControl 310)
Aug 02 14:02:55 pve kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 02 14:02:55 pve kernel: ata2.00: configured for UDMA/33

Yes, it looks very much like a hardware problem.

The cables are not the issue since I tested them with other disks. The HDDs don't have any problems in other PCs.

One idea: maybe the cable currently used with this HDD sits a bit loose.
The HDDs themselves are of high quality, and the SMART values are also okay.
I can only recommend trying a different SATA cable and seeing whether anything changes afterwards.

For example, one with a clip?

Screenshot_20240809_152824.png
 
Thanks for your analysis and the logs.



ZFS needs much more memory than other filesystems. Depending on the use case, 32GB could be tight. Please also take a look at this article in the wiki: https://pve.proxmox.com/wiki/ZFS_on_Linux#sysadmin_zfs_limit_memory_usage



Yes, it looks very much like a hardware problem.



One idea: maybe the cable currently used with this HDD sits a bit loose.
The HDDs themselves are of high quality, and the SMART values are also okay.
I can only recommend trying a different SATA cable and seeing whether anything changes afterwards.

For example, one with a clip?

Do you think the problem is the SATA data cable or the power cable?
I created the zfs.conf file and inserted the line: 'options zfs zfs_arc_max=8589934592'.
I will do further checks on the cables, which indeed seem to be the cause of the problem. Thank you very much for your opinion on my situation. May I ask if you know how to reduce the wearout of the SSD where I keep my VMs? It's growing so fast hahaha.
 
May I ask if you know how to reduce the wearout of the SSD where I have the VMs? It's growing so fast hahaha.
Sounds like consumer SSDs. Which model do you have, exactly?
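
To see how worn the SSD actually is, the SMART wear attribute can be checked. A sketch, with the caveat that the attribute name varies by vendor ("Wear_Leveling_Count" below is just one common example, and the sample values are illustrative):

```shell
# On the host: smartctl -a /dev/disk/by-id/<your ssd> | grep -iE 'wear|percent'
# Demonstration: read the normalized value (column 4) from a sample line;
# a value of 095 would mean roughly 5% of the rated wear is used up.
awk '/Wear_Leveling_Count/ {print $4}' <<'EOF'
177 Wear_Leveling_Count     0x0013   095   095   000    Pre-fail  Always       -       142
EOF
```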
 
