ZFS pool disk unavailable on every reboot

Blenda

Jul 7, 2024
Hello,

I recently set up a PVE node on a Z90M-ITX motherboard. The board has two M.2 slots, which I have populated with two WD_BLACK SN850X drives; together they form my "vmpool", meant to be used as fast VM storage.

Code:
pool: vmpool
state: ONLINE
scan: resilvered 1.08G in 00:00:06 with 0 errors on Sat Jul  6 13:32:02 2024
config:
        NAME                                           STATE     READ WRITE CKSUM
        vmpool                                         ONLINE       0     0     0
          mirror-0                                     ONLINE       0     0     0
            nvme-eui.e8238fa6bf530001001b448b47968231  ONLINE       0     0     0
            nvme-eui.e8238fa6bf530001001b448b479682ed  ONLINE       0     0     0

errors: No known data errors

In addition, I have a mirror pool for the OS on two Samsung SSD 870 drives, which works just fine, plus a 4x18TB RAIDZ pool behind an LSI 9300-8i in IT mode, also working flawlessly.

My problem is that on every damn reboot, PVE sends me an e-mail notifying me that the vmpool is in a degraded state.

Code:
ZFS has detected that a device was removed.

 impact: Fault tolerance of the pool may be compromised.
    eid: 6
  class: statechange
  state: UNAVAIL
   host: pve
   time: 2024-07-04 13:37:21+0200
  vpath: /dev/nvme1n1p1
  vphys: pci-0000:03:00.0-nvme-1
  vguid: 0xE6A25AD4A10495B4
  devid: nvme-WD_BLACK_SN850X_1000GB_24040E803557-part1
   pool: vmpool (0x6C84CDE73BEB0949)

The reported device alternates between /dev/nvme1n1p1 and /dev/nvme0n1p1.
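
Since the pool was created against the stable nvme-eui IDs, I assume the alternating vpath just reflects the kernel enumerating the two drives in a different order each boot. The mapping can be cross-checked with something like:

Code:
# Map the stable by-id links the pool uses to the current kernel names;
# the nvme0/nvme1 assignment is not guaranteed to be stable across reboots
ls -l /dev/disk/by-id/ | grep nvme-eui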

In addition, PVE reports an error for the first VM that has start-at-boot enabled:

Code:
zfs error: cannot open 'vmpool': no such pool

So it looks like the machine thinks one of the NVMe M.2 disks is unavailable at boot, which leaves the zpool unavailable, so PVE can't start the VMs that should be starting. It's worth noting that the second VM in the start order always comes up, so the pool resolves itself after X seconds (though the first VM that should have started never does).
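
My working theory is a race between the ZFS pool import at boot and pve-guests starting the VMs. To check that, I figured I can compare timestamps between the import units and the guest autostart in the journal (unit names as they exist on my stock PVE install; treat this as a sketch):

Code:
# Compare when the pool import finished vs. when the guests were started
journalctl -b -u zfs-import-cache.service -u zfs-import-scan.service -u pve-guests.service
# And grep the full boot log for the pool itself
journalctl -b | grep -i vmpool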

When I SSH into the machine immediately after reboot, zpool status vmpool always shows a good state.
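
The ZED event history should also contain the removal/statechange events from the e-mails, so that's another place I can look (a sketch, run against my pool):

Code:
# Show the verbose ZFS event log, including the statechange events ZED mails about
zpool events -v vmpool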

Does anyone have any ideas on how I could debug this further?
 