Hello,
I recently set up a PVE node with a Z90M-ITX motherboard. The board has two M.2 slots, which I have populated with two WD_BLACK SN850X drives. These two drives make up my "vmpool", meant to be used as fast VM storage.
Code:
  pool: vmpool
 state: ONLINE
  scan: resilvered 1.08G in 00:00:06 with 0 errors on Sat Jul 6 13:32:02 2024
config:

        NAME                                           STATE     READ WRITE CKSUM
        vmpool                                         ONLINE       0     0     0
          mirror-0                                     ONLINE       0     0     0
            nvme-eui.e8238fa6bf530001001b448b47968231  ONLINE       0     0     0
            nvme-eui.e8238fa6bf530001001b448b479682ed  ONLINE       0     0     0

errors: No known data errors
I also have a mirror pool for the OS on two Samsung SSD 870 drives, which works just fine, plus a 4x18TB RAIDZ pool on an LSI 9300-8i in IT mode, which also works flawlessly.
My problem is that after every damn reboot, PVE sends me an e-mail notifying me that the vmpool is in a degraded state.
Code:
ZFS has detected that a device was removed.

 impact: Fault tolerance of the pool may be compromised.
    eid: 6
  class: statechange
  state: UNAVAIL
   host: pve
   time: 2024-07-04 13:37:21+0200
  vpath: /dev/nvme1n1p1
  vphys: pci-0000:03:00.0-nvme-1
  vguid: 0xE6A25AD4A10495B4
  devid: nvme-WD_BLACK_SN850X_1000GB_24040E803557-part1
   pool: vmpool (0x6C84CDE73BEB0949)
And the reported device path changes between /dev/nvme1n1p1 and /dev/nvme0n1p1 from one reboot to the next.
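For reference, the pool itself refers to the disks by their nvme-eui.* IDs, so the /dev/nvmeXn1 names swapping between boots should not matter by itself. The current mapping can be checked like this:
Code:
# show which /dev/nvmeXn1 device each nvme-eui.* id currently points to
ls -l /dev/disk/by-id/ | grep -i nvme-eui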
In addition, PVE reports an error for the first VM that has "Start at boot" enabled:
Code:
zfs error: cannot open 'vmpool': no such pool
So it looks like the machine thinks one of the NVMe M.2 disks is unavailable at boot, and because the zpool is then unavailable, PVE can't start the VM that should start first. It's worth noting that the second VM in the start order always starts, so the problem resolves itself after a few seconds (though the first VM that should have started never does start).
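As a possible workaround (untested on my side), I was thinking of delaying the guest autostart so the pool gets a few extra seconds to import before the first VM starts. If I remember correctly there is a datacenter-wide onboot delay option; the option name below is from memory, so please correct me if it is wrong:
Code:
# append the delay to the datacenter config
# (option name from memory -- double-check it against the datacenter.cfg docs before using)
echo "startall-onboot-delay: 60" >> /etc/pve/datacenter.cfg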
When I SSH into the machine immediately after a reboot, zpool status vmpool always shows the pool as healthy.
Does anyone have any ideas on how I could debug this further?
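For what it's worth, this is roughly what I was planning to run right after the next reboot to capture the timing (standard journalctl and zpool commands; the systemd unit names are the ones I see on my node and might differ on other setups):
Code:
# when did the NVMe namespaces, the pool import and the guest autostart happen?
journalctl -b -u zfs-import-cache.service -u zfs-import-scan.service -u pve-guests.service
journalctl -b | grep -iE 'nvme|vmpool'

# recent ZFS events and pool history (shows the removal/resilver timeline)
zpool events -v | tail -n 60
zpool history vmpool | tail -n 20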