I am running Proxmox 5.2 on a single, non-clustered server: an HP Z820 with an LSI SAS 3008 controller flashed to IT mode, running ZFS across the board. The LSI controller is NOT the one built onto the motherboard; it is a separate card I installed in the server.
My boot devices and primary pool (rpool) are two Samsung 860 Pro 512GB SSDs connected to LSI SAS ports 0 & 1. I have another pool called spinners with 2 x 8TB HGST Helium drives on SAS ports 2 & 3, plus two Samsung 850 EVO 256GB SSDs on SAS ports 4 & 5 that serve as L2ARC and ZIL for the spinners pool.
The box is a dual-Xeon with 256GB of ECC RAM, of which 8GB is dedicated to the ZFS primary ARC.
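For reference, the 8GB ARC cap is just the standard ZFS-on-Linux module parameter; my /etc/modprobe.d/zfs.conf looks roughly like this:
Code:
# /etc/modprobe.d/zfs.conf -- cap the primary ARC at 8 GiB (value in bytes)
options zfs zfs_arc_max=8589934592
# after editing, refresh the initramfs so the limit applies at boot:
#   update-initramfs -u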
I have been running this way for over a year with no problems. The box is lightly loaded with a mix of 12 VMs/CTs, both Windows and Linux.
About two weeks ago ZED started alerting me to pool degradation on my rpool with the following error:
Code:
The number of I/O errors associated with a ZFS device exceeded
acceptable levels. ZFS has marked the device as faulted.
I started doing some research and tried reseating the cables, reseating the drive, etc. I would clear the errors from the pool, the problem would seem to go away for a few days, and then it would crop back up. At Samsung's request I ran a long SMART test on the SSD, which showed 13 CRC errors. Samsung said to replace the drive.
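The clear/test cycle was nothing fancy, roughly this (the device node below is a placeholder; the actual letter depends on how the kernel enumerates the HBA):
Code:
# clear the error counters and let the pool resume normal operation
zpool clear rpool
# kick off the extended (long) self-test Samsung asked for
smartctl -t long /dev/sdX
# once it finishes, read the results; CRC errors are reported under
# attribute 199 (UDMA_CRC_Error_Count)
smartctl -a /dev/sdX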
So last night I pulled the drive (everything is hot-swap, so this was done live), installed the new drive, ran sgdisk, and replaced the device in the pool. I thought that was it. This morning at 0130 I received the exact same error on the new drive, so now I know it is not the drive; I suspect a bad cable or port.
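The replacement itself was the usual mirror-member procedure (plus reinstalling the bootloader on the new disk, since rpool is also the boot pool); roughly the following, with the device names as placeholders for the surviving member and the new drive:
Code:
# copy the partition layout from the healthy mirror member to the new drive,
# then randomize the GUIDs so the two disks don't share partition identifiers
sgdisk /dev/sdX -R /dev/sdY
sgdisk -G /dev/sdY
# swap the faulted partition for the new one and let ZFS resilver
zpool replace rpool <old-wwn>-part2 /dev/disk/by-id/<new-wwn>-part2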
I have a lot of experience with FreeNAS, and I know that with FreeNAS it simply does not matter which port a drive ends up on. In fact, one of the things I show people is that I can shut down my FreeNAS server, swap all the drives around between bays, reboot, and everything comes up just fine.
What I don't know is whether Proxmox with ZFS behaves the exact same way. If I shut down the system, pull the drive, put it on another port, and reboot, will the system simply see the drive, know it was part of my pool, and be OK with it?
I am using device IDs (/dev/disk/by-id) for my pools:
Code:
root@proxmox:~# zpool status
  pool: rpool
 state: ONLINE
  scan: resilvered 1.38G in 0h0m with 0 errors on Wed Jul 18 07:27:42 2018
config:

        NAME                              STATE     READ WRITE CKSUM
        rpool                             ONLINE       0     0     0
          mirror-0                        ONLINE       0     0     0
            wwn-0x5002538e401c3fcf-part2  ONLINE       0     0     0
            wwn-0x5002538d41fb6695-part2  ONLINE       0     0     0
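For what it's worth, this is how I check which physical device nodes those wwn IDs currently resolve to (the sdX mapping can change between boots; the wwn itself belongs to the drive, not the port):
Code:
# list the by-id symlinks for the rpool members; the arrow in the output
# shows which /dev/sdX node each wwn currently points at
ls -l /dev/disk/by-id/ | grep wwn-0x5002538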
So I guess my question is this: can I simply shut down the system, swap the drive to a new port, reboot, and expect ZFS to identify the drive correctly regardless of port, or should I approach this in a different manner?