Weird storage issues: very slow and unresponsive

Graxo

New Member
Jun 23, 2024
Hi,

I installed a new server this weekend.
In the "old" server there was a zpool named 'Storage' containing 5 HDDs. I exported this zpool and moved all the disks to the new server.
I also installed 5 SSDs for a second zpool I wanted to use.
On the new server I ran a zpool import and the pool was usable for a day. Then I got some emails from the ZFS monitoring saying that a drive had failed and the zpool was degraded. That is when more issues happened. I thought a reboot would solve it, so I rebooted the server, and it didn't come back up. I checked the server with a display attached and it hangs after the 'pve-root, clean' messages. I think this is when the zfs-import services start.
After a while I get a message that the SSD zpool has failed to import. Then nothing happens (I think I waited around 12-13 hours, one night).

Every 'zpool' command I run doesn't behave normally: I get no output, just a blinking cursor.
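A command that prints nothing and never returns is often blocked inside the kernel waiting on I/O (process state `D`, uninterruptible sleep). The following is a minimal sketch for spotting such processes, assuming a procps-style `ps`; the helper name `filter_dstate` is made up for illustration and is not part of any ZFS or Proxmox tooling.

```shell
#!/bin/sh
# List processes in uninterruptible sleep (state "D"), the usual state of a
# command that is blocked on disk I/O inside the kernel.
# The filter reads `ps` output on stdin, so it can be tried on saved output too.
filter_dstate() {
    # Column 2 is STAT: keep the header line and every row whose state starts with D.
    awk 'NR == 1 || $2 ~ /^D/'
}

# Typical use on the affected host (WCHAN shows the kernel function being waited on):
#   ps -eo pid,stat,wchan:32,comm | filter_dstate
```

If `zpool` or `zfs` processes show up here, killing them will most likely not work, since a process in state `D` does not react to signals until the I/O it waits on completes or fails.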
I checked that all the disks are seen by the server with `lsblk`, and that looks fine. Then I checked the S.M.A.R.T. data for all of them, and every disk is marked as PASSED.
Proxmox no longer boots normally; I have to boot into recovery and disable the ZFS import scan and cache services. Then it boots "normally", but the GUI is badly broken: opening the Disks tab ends with `communication failure (0)`.
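For anyone wanting to reproduce the workaround, here is a minimal sketch of keeping the import units from running at boot. It assumes the stock OpenZFS unit names `zfs-import-scan.service` and `zfs-import-cache.service`; the `run` helper and the `DRY_RUN` variable are hypothetical and only there so the commands are printed instead of executed by default.

```shell
#!/bin/sh
# Sketch: stop the ZFS import units from running at boot so the host reaches
# a shell without touching the pools.
# Assumption: these are the stock OpenZFS unit names; confirm on the host with
#   systemctl list-unit-files 'zfs*'
ZFS_IMPORT_UNITS="zfs-import-scan.service zfs-import-cache.service"

# run CMD...: print the command when DRY_RUN=1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Mask the units so nothing can pull them in during boot.
disable_zfs_import() {
    for unit in $ZFS_IMPORT_UNITS; do
        run systemctl mask "$unit"
    done
}

# Undo the mask once the pools import cleanly again.
enable_zfs_import() {
    for unit in $ZFS_IMPORT_UNITS; do
        run systemctl unmask "$unit"
    done
}

# Prints the commands only; run with DRY_RUN=0 to apply them.
disable_zfs_import
```

Proxmox may also create per-pool `zfs-import@<pool>.service` instances, so it is worth checking the unit list before assuming these two cover everything.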

I'm kind of lost as to what I can do now. Is anyone willing to help me?
If more logging or data is needed, let me know.
Log files with the SMART data and the journalctl output are attached.
All the disks are connected through a PCIe expansion card (https://www.amazon.nl/dp/B0CL1W8P1K).
Code:
root@pve-storage:~# lsblk -o+FSTYPE,MODEL,TRAN
NAME                         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS FSTYPE      MODEL                     TRAN
sda                            8:0    0  14.6T  0 disk                         ST16000NM001G-2KK103      sata
├─sda1                         8:1    0  14.6T  0 part             zfs_member
└─sda9                         8:9    0     8M  0 part
sdb                            8:16   0  14.6T  0 disk                         ST16000NM001G-2KK103      sata
├─sdb1                         8:17   0  14.6T  0 part             zfs_member
└─sdb9                         8:25   0     8M  0 part
sdc                            8:32   0  14.6T  0 disk                         ST16000NM001G-2KK103      sata
├─sdc1                         8:33   0  14.6T  0 part             zfs_member
└─sdc9                         8:41   0     8M  0 part
sdd                            8:48   0  14.6T  0 disk                         ST16000NM001G-2KK103      sata
├─sdd1                         8:49   0  14.6T  0 part             zfs_member
└─sdd9                         8:57   0     8M  0 part
sde                            8:64   0 465.8G  0 disk                         Samsung SSD 860 EVO 500GB sata
├─sde1                         8:65   0 465.8G  0 part             zfs_member
└─sde9                         8:73   0     8M  0 part
sdf                            8:80   0 465.8G  0 disk                         Samsung SSD 860 EVO 500GB sata
├─sdf1                         8:81   0 465.8G  0 part             zfs_member
└─sdf9                         8:89   0     8M  0 part
sdg                            8:96   0 465.8G  0 disk                         Samsung SSD 860 EVO 500GB sata
├─sdg1                         8:97   0 465.8G  0 part             zfs_member
└─sdg9                         8:105  0     8M  0 part
sdh                            8:112  0 465.8G  0 disk                         Samsung SSD 860 EVO 500GB sata
├─sdh1                         8:113  0 465.8G  0 part             zfs_member
└─sdh9                         8:121  0     8M  0 part
sdi                            8:128  0 465.8G  0 disk                         Samsung SSD 860 EVO 500GB sata
├─sdi1                         8:129  0 465.8G  0 part             zfs_member
└─sdi9                         8:137  0     8M  0 part
sdj                            8:144  0  14.6T  0 disk                         ST16000NM001G-2KK103      sata
├─sdj1                         8:145  0  14.6T  0 part             zfs_member
└─sdj9                         8:153  0     8M  0 part
nvme0n1                      259:0    0 465.8G  0 disk                         INTENSO SSD               nvme
├─nvme0n1p1                  259:1    0  1007K  0 part                                                   nvme
├─nvme0n1p2                  259:2    0     1G  0 part /boot/efi   vfat                                  nvme
└─nvme0n1p3                  259:3    0 464.8G  0 part             LVM2_member                           nvme
  ├─pve-swap                 252:0    0     8G  0 lvm  [SWAP]      swap
  ├─pve-root                 252:1    0    96G  0 lvm  /           ext4
  ├─pve-data_tmeta           252:2    0   3.4G  0 lvm
  │ └─pve-data-tpool         252:4    0 337.9G  0 lvm
  │   ├─pve-data             252:5    0 337.9G  1 lvm
  │   └─pve-vm--102--disk--0 252:6    0    32G  0 lvm
  └─pve-data_tdata           252:3    0 337.9G  0 lvm
    └─pve-data-tpool         252:4    0 337.9G  0 lvm
      ├─pve-data             252:5    0 337.9G  1 lvm
      └─pve-vm--102--disk--0 252:6    0    32G  0 lvm
Code:
root@pve-storage:~# lspci
00:00.0 Host bridge: Intel Corporation Device 461c
00:02.0 VGA compatible controller: Intel Corporation Alder Lake-N [UHD Graphics]
00:0d.0 USB controller: Intel Corporation Alder Lake-N Thunderbolt 4 USB Controller
00:14.0 USB controller: Intel Corporation Alder Lake-N PCH USB 3.2 xHCI Host Controller
00:14.2 RAM memory: Intel Corporation Alder Lake-N PCH Shared SRAM
00:16.0 Communication controller: Intel Corporation Alder Lake-N PCH HECI Controller
00:17.0 SATA controller: Intel Corporation Alder Lake-N SATA AHCI Controller
00:1c.0 PCI bridge: Intel Corporation Device 54b8
00:1c.2 PCI bridge: Intel Corporation Device 54ba
00:1c.6 PCI bridge: Intel Corporation Device 54be
00:1d.0 PCI bridge: Intel Corporation Alder Lake-N PCI Express Root Port
00:1f.0 ISA bridge: Intel Corporation Alder Lake-N PCH eSPI Controller
00:1f.3 Audio device: Intel Corporation Alder Lake-N PCH High Definition Audio Controller
00:1f.4 SMBus: Intel Corporation Alder Lake-N SMBus
00:1f.5 Serial bus controller: Intel Corporation Alder Lake-N SPI (flash) Controller
01:00.0 SATA controller: ASMedia Technology Inc. ASM1166 Serial ATA Controller (rev 02)
02:00.0 PCI bridge: ASMedia Technology Inc. ASM1182e 2-Port PCIe x1 Gen2 Packet Switch
03:03.0 PCI bridge: ASMedia Technology Inc. ASM1182e 2-Port PCIe x1 Gen2 Packet Switch
03:07.0 PCI bridge: ASMedia Technology Inc. ASM1182e 2-Port PCIe x1 Gen2 Packet Switch
04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller (rev 05)
05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller (rev 05)
06:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
07:00.0 Non-Volatile memory controller: Silicon Motion, Inc. SM2263EN/SM2263XT SSD Controller (rev 03)

Update:
I made a TrueNAS VM and passed the PCIe expansion card through to it; there I was able to import 1 of the 2 zpools.
For the one that isn't working I keep getting `kernel: WARNING: Pool 'Storage' has encountered an uncorrectable I/O failure and has been suspended.`, and then the whole system hangs and a reboot is required.
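One thing that could be tried at this point is a read-only import, so that ZFS does not write to the pool while the hardware is suspect. This is only a sketch under assumptions: the alternate root `/mnt/recovery` and the `run`/`DRY_RUN` helper are hypothetical, and a read-only import may still end in the same suspension if the I/O errors come from the controller or cabling rather than from the pool itself.

```shell
#!/bin/sh
# Sketch: attempt a read-only import of the suspended pool.
POOL="Storage"            # pool name taken from the kernel warning
ALTROOT="/mnt/recovery"   # hypothetical alternate root, not from the post

# run CMD...: print the command when DRY_RUN=1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

readonly_import() {
    # -o readonly=on : import without allowing writes to the pool
    # -N             : do not mount any datasets yet
    # -R             : alternate root, so later mounts stay off live paths
    run zpool import -o readonly=on -N -R "$ALTROOT" "$POOL"
    # Show per-device read/write/checksum error counters after the import.
    run zpool status -v "$POOL"
}

# Prints the commands only; run with DRY_RUN=0 to execute them.
readonly_import
```

If the import succeeds, the error counters in `zpool status -v` should show whether the failures cluster on one disk or are spread across everything behind the expansion card.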
 

Attachments

  • journalctl-21-oct.txt (913 KB)
  • smart_data-21-oct.txt (54.3 KB)