ZFS reboot or reload disk

Mattarello

Jun 19, 2024
Hi, I have a problem with my HDDs in a ZFS configuration.
I have one M.2 disk for Proxmox, one SSD for VMs, and two 4 TB HDDs in ZFS.

The problem, I think, is that Proxmox power-cycles one of the HDDs every few seconds, and it is now at 5023 power cycles.
Is that normal? Because the disk is loud every time it spins back up, and it is very annoying.

Can anyone help me, or at least give me some advice?

P.S.
I've installed Proxmox twice; the first time, the other HDD reached a 12k power cycle count.
Now it has stopped, and the second disk has started cycling instead.
 

Attachments

  • proxmox count.png (41.2 KB)
What do you see in the logs during such a power cycle?
Which HDDs are involved (product number)?
Cable/backplane of the HDD: is something perhaps loose or broken?
Does this also happen if you use the HDD in another computer?
 
What do you see in the logs during such a power cycle?
Which HDDs are involved (product number)?
Cable/backplane of the HDD: is something perhaps loose or broken?
Does this also happen if you use the HDD in another computer?
Thank you for your answer.
How can I check the logs? The drives are Red Plus 5400 RPM models purchased from a store two or three months ago. The cables are not the issue, since I tested them with other disks. The HDDs don't have any problems in other PCs.
It seems that when two disks are configured in ZFS, Proxmox reboots/reloads one of the two.
 
How can I check the logs?
You can read them with journalctl. For example:

Code:
# Follow - live log
journalctl -f

# Reverse search
journalctl -r

# 2 days ago
journalctl --since '2 days ago'
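
To focus on the disk resets specifically, the kernel messages can be filtered with grep. A small sketch, using sample lines of the kind this thread later shows (on the host you would pipe `journalctl -k` into the same filter):

```shell
# On the Proxmox host you would run:
#   journalctl -k | grep -E 'ata[0-9]+: SATA link (up|down)'
# Demonstration of the same filter on sample log lines:
grep -E 'ata[0-9]+: SATA link (up|down)' <<'EOF'
Aug 02 14:02:50 pve kernel: ata2: SATA link down (SStatus 0 SControl 310)
Aug 02 14:02:55 pve kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 02 14:05:23 pve pvestatd[1543]: status update time (6.064 seconds)
EOF
```

Only the two link-state lines survive the filter; unrelated daemon chatter is dropped.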

The drives are Red Plus 5400 RPM

Please post the exact model name of the drives, and preferably the SMART values of both.

Code:
smartctl -a /dev/disk/by-id/<your disk>

# for example:
smartctl -a /dev/disk/by-id/ata-ST8000NM017B-2TJ103_XXXXXXX
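
In that output, the power-cycle figure mentioned above is SMART attribute 12. A sketch of pulling the raw value out with awk, run here against sample smartctl output (the attribute values are illustrative, with 5023 matching the count reported in this thread):

```shell
# On the host: smartctl -a /dev/disk/by-id/<your disk> | grep -i power_cycle
# Demonstration: extract the raw Power_Cycle_Count (last column) from
# sample smartctl attribute lines:
awk '/Power_Cycle_Count/ {print $NF}' <<'EOF'
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1201
 12 Power_Cycle_Count       0x0032   095   095   020    Old_age   Always       -       5023
EOF
```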

The cables are not the issue since I tested them with other disks. The HDDs don't have any problems in other PCs.
Sounds good.

I think we need a bit more information too:

How much memory does the machine have?
What CPU is installed?
Any other important hardware?
What is the goal, and what should run on the machine?
 
You can read them with journalctl. For example:

Code:
# Follow - live log
journalctl -f

# Reverse search
journalctl -r

# 2 days ago
journalctl --since '2 days ago'



Please post the exact model name of the drives, and preferably the SMART values of both.

Code:
smartctl -a /dev/disk/by-id/<your disk>

# for example:
smartctl -a /dev/disk/by-id/ata-ST8000NM017B-2TJ103_XXXXXXX


Sounds good.

I think we need a bit more information too:

How much memory does the machine have?
What CPU is installed?
Any other important hardware?
What is the goal, and what should run on the machine?


Seven days ago I disconnected the drives because one of them was power-cycling. Now, to run the tests you asked for, I have reconnected them, and they are working fine. Usually the problem seems to occur when I turn the server off and on again, but not always.

How much memory does the machine have? 32GB
What CPU is installed? A Ryzen 5 2600.
Any other important hardware? An NVIDIA GPU (maybe a GT 620), because my CPU isn't an APU.
What is the goal, and what should run on the machine? Right now I'm running as VMs: pfSense, a personal Minecraft server, and Windows 10 for data (where I use my ZFS pool); and as CTs: Pi-hole, Uptime Kuma, and WireGuard.


From this log, it seems like the cables...


Aug 02 14:00:28 pve kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 02 14:00:28 pve kernel: ata2.00: configured for UDMA/33
Aug 02 14:02:50 pve kernel: ata2: SATA link down (SStatus 0 SControl 310)
Aug 02 14:02:55 pve kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 02 14:02:55 pve kernel: ata2.00: configured for UDMA/33
Aug 02 14:05:16 pve kernel: ata2: SATA link down (SStatus 0 SControl 310)
Aug 02 14:05:22 pve kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 02 14:05:22 pve kernel: ata2.00: configured for UDMA/33
Aug 02 14:05:23 pve pvestatd[1543]: status update time (6.064 seconds)
Aug 02 14:07:43 pve kernel: ata2: SATA link down (SStatus 0 SControl 310)
Aug 02 14:07:49 pve kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 02 14:07:49 pve kernel: ata2.00: configured for UDMA/33
Aug 02 14:10:11 pve kernel: ata2: SATA link down (SStatus 0 SControl 310)
Aug 02 14:10:16 pve kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 02 14:10:16 pve kernel: ata2.00: configured for UDMA/33
Aug 02 14:12:36 pve kernel: ata2: SATA link down (SStatus 0 SControl 310)
Aug 02 14:12:43 pve kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 02 14:12:43 pve kernel: ata2.00: configured for UDMA/33
Aug 02 14:12:43 pve pvestatd[1543]: status update time (6.914 seconds)
Aug 02 14:13:23 pve systemd[1]: Starting systemd-tmpfiles-clean.service - Cleanup of Temporary Directories...
Aug 02 14:13:23 pve systemd[1]: systemd-tmpfiles-clean.service: Deactivated successfully.
Aug 02 14:13:23 pve systemd[1]: Finished systemd-tmpfiles-clean.service - Cleanup of Temporary Directories.
Aug 02 14:13:23 pve systemd[1]: run-credentials-systemd\x2dtmpfiles\x2dclean.service.mount: Deactivated successfully.
Aug 02 14:15:04 pve kernel: ata2: SATA link down (SStatus 0 SControl 310)
Aug 02 14:15:10 pve kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 02 14:15:10 pve kernel: ata2.00: configured for UDMA/33
Aug 02 14:17:01 pve CRON[467458]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Aug 02 14:17:01 pve CRON[467459]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Aug 02 14:17:01 pve CRON[467458]: pam_unix(cron:session): session closed for user root
Aug 02 14:17:21 pve smartd[1139]: Device: /dev/sda [SAT], SMART Prefailure Attribute: 3 Spin_Up_Time changed from 210 to 211
Aug 02 14:18:07 pve kernel: ata2: SATA link down (SStatus 0 SControl 310)
Aug 02 14:18:14 pve kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 02 14:18:14 pve kernel: ata2.00: configured for UDMA/33
Aug 02 14:18:14 pve pvestatd[1543]: status update time (7.203 seconds)
Aug 02 14:20:15 pve kernel: ata2: SATA link down (SStatus 0 SControl 310)
Aug 02 14:20:20 pve kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 02 14:20:20 pve kernel: ata2.00: configured for UDMA/33
Aug 02 14:22:41 pve kernel: ata2: SATA link down (SStatus 0 SControl 310)
Aug 02 14:22:43 pve kernel: ata2: found unknown device (class 0)
Aug 02 14:22:47 pve kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 02 14:22:47 pve kernel: ata2.00: configured for UDMA/33
Aug 02 14:25:08 pve kernel: ata2: SATA link down (SStatus 0 SControl 310)
Aug 02 14:25:14 pve kernel: ata2: link is slow to respond, please be patient (ready=0)
Aug 02 14:25:14 pve kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 02 14:25:14 pve kernel: ata2.00: configured for UDMA/33

25 minutes later:

Aug 02 14:52:33 pve kernel: ata2: SATA link down (SStatus 0 SControl 300)
Aug 02 14:52:33 pve kernel: ata2: hard resetting link
Aug 02 14:52:33 pve kernel: ata2: SATA link down (SStatus 0 SControl 300)
Aug 02 14:52:33 pve kernel: ata2: limiting SATA link speed to <unknown>
Aug 02 14:52:33 pve kernel: ata2: hard resetting link
Aug 02 14:52:33 pve kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 3F0)
Aug 02 14:52:33 pve kernel: ata2.00: configured for UDMA/133
Aug 02 14:52:33 pve kernel: ata2: limiting SATA link speed to 3.0 Gbps
Aug 02 14:52:33 pve kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Aug 02 14:52:33 pve kernel: ata2.00: configured for UDMA/133
 

Attachments

  • Screenshot 2024-08-09 141630.png (37.4 KB)
  • Screenshot 2024-08-09 143950.png (67.7 KB)
  • Screenshot 2024-08-09 144023.png (62.4 KB)
  • Screenshot 2024-08-09 144041.png (70.2 KB)
  • Screenshot 2024-08-09 144049.png (63 KB)
Thanks for your analysis and the logs.

How much memory does the machine have? 32GB

ZFS needs much more memory than other filesystems. Depending on the use case, 32GB could be tight. Please also take a look at this article in the wiki: https://pve.proxmox.com/wiki/ZFS_on_Linux#sysadmin_zfs_limit_memory_usage
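
As that wiki article describes, the ARC can be capped via the `zfs_arc_max` module parameter, which takes a value in bytes. A quick sketch of working out an 8 GiB cap, the figure used further down in this thread:

```shell
# zfs_arc_max is given in bytes; an 8 GiB cap is 8 * 1024^3:
echo $((8 * 1024 * 1024 * 1024))
# prints 8589934592
```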

From this log, it seems like the cables...


Aug 02 14:00:28 pve kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 02 14:00:28 pve kernel: ata2.00: configured for UDMA/33
Aug 02 14:02:50 pve kernel: ata2: SATA link down (SStatus 0 SControl 310)
Aug 02 14:02:55 pve kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 02 14:02:55 pve kernel: ata2.00: configured for UDMA/33

Yes, it looks very much like a hardware problem.

The cables are not the issue since I tested them with other disks. The HDDs don't have any problems in other PCs.

One idea: maybe the cable currently used with this HDD sits a bit loose.
The HDDs themselves are of high quality, and the SMART values are also okay.
I can only recommend trying a different SATA cable and seeing whether anything changes afterwards.

For example, one with a clip?

Screenshot_20240809_152824.png
 
Thanks for your analysis and the logs.



ZFS needs much more memory than other filesystems. Depending on the use case, 32GB could be tight. Please also take a look at this article in the wiki: https://pve.proxmox.com/wiki/ZFS_on_Linux#sysadmin_zfs_limit_memory_usage



Yes, it looks very much like a hardware problem.



One idea: maybe the cable currently used with this HDD sits a bit loose.
The HDDs themselves are of high quality, and the SMART values are also okay.
I can only recommend trying a different SATA cable and seeing whether anything changes afterwards.

For example, one with a clip?

Do you think the problem is the SATA data cable or the power cable?
I created the zfs.conf file and inserted the line: 'options zfs zfs_arc_max=8589934592'.
I will do further checks on the cables, which indeed seem to be the cause of the problem. Thank you very much for your opinion on my situation. May I ask if you know how to reduce the wearout of the SSD where I keep my VMs? It's growing so fast hahaha.
 
May I ask if you know how to reduce the wearout of the SSD where I have the VMs? It's growing so fast hahaha.
Sounds like consumer SSDs. Which model do you have, exactly?
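
To see how worn the SSD actually is, the SMART wear attribute can be checked. A sketch, with the caveat that the attribute name varies by vendor ("Wear_Leveling_Count" below is just one common example, and the sample values are illustrative):

```shell
# On the host: smartctl -a /dev/disk/by-id/<your ssd> | grep -iE 'wear|percent'
# Demonstration: read the normalized value (column 4) from a sample line;
# a value of 095 would mean roughly 5% of the rated wear is used up.
awk '/Wear_Leveling_Count/ {print $4}' <<'EOF'
177 Wear_Leveling_Count     0x0013   095   095   000    Pre-fail  Always       -       142
EOF
```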
 
