After upgrading my 3 Proxmox hosts from PVE 7 to PVE 8, I have an issue with ZFS during boot that leads to approx. 4-8 email notifications sent out by each Proxmox host:
Code:
ZFS has detected that a device was removed.
impact: Fault tolerance of the pool may be compromised.
eid: 9
class: statechange
state: UNAVAIL
...
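As far as I understand, these mails come from ZED (the ZFS Event Daemon); the knobs for them live in /etc/zfs/zed.d/zed.rc, roughly like this (an excerpt with example values, not my exact config):
Code:
# /etc/zfs/zed.d/zed.rc (excerpt, example values)
ZED_EMAIL_ADDR="root"          # recipient of the notification mails
ZED_NOTIFY_INTERVAL_SECS=3600  # minimum seconds between notifications for similar events
ZED_NOTIFY_VERBOSE=0           # 0 = suppress notifications while the pool is healthy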
It happens on all 3 Proxmox hosts and started exactly with the upgrade to PVE 8, so it cannot be hardware related but is probably some configuration issue.
The relevant system log entries that lead to these emails are:
Code:
29.631480+0200 systemd[1]: Starting zfs-import-cache.service - Import ZFS pools by cache file...
29.631522+0200 systemd[1]: zfs-import-scan.service - Import ZFS pools by device scanning was skipped because of an unmet condition check (ConditionFileNotEmpty=!/etc/zfs/zpool.cache>
33.138671+0200 zpool[1434]: cannot import 'vmpool': one or more devices is currently unavailable
33.142203+0200 zpool[1434]: The devices below are missing or corrupted, use '-m' to import the pool anyway:
33.142203+0200 zpool[1434]: mirror-1 [log]
33.142203+0200 zpool[1434]: nvme0n1p1
33.142203+0200 zpool[1434]: nvme1n1p1
33.142203+0200 zpool[1434]: cachefile import failed, retrying
33.143130+0200 systemd[1]: Finished zfs-import-cache.service - Import ZFS pools by cache file.
33.143214+0200 systemd[1]: Reached target zfs-import.target - ZFS pool import target.
33.160434+0200 systemd[1]: Starting zfs-mount.service - Mount ZFS filesystems...
33.161613+0200 systemd[1]: Starting zfs-volume-wait.service - Wait for ZFS Volume (zvol) links in /dev...
33.297713+0200 zvol_wait[2026]: Testing 19 zvol links
33.365155+0200 zvol_wait[2026]: All zvol links are now present.
33.365530+0200 systemd[1]: Finished zfs-volume-wait.service - Wait for ZFS Volume (zvol) links in /dev.
33.365608+0200 systemd[1]: Reached target zfs-volumes.target - ZFS volumes are ready.
33.415172+0200 systemd[1]: Finished zfs-mount.service - Mount ZFS filesystems.
33.415230+0200 systemd[1]: Reached target local-fs.target - Local File Systems.
...
34.887409+0200 zed[2140]: ZFS Event Daemon 2.2.3-pve2 (PID 2140)
34.888256+0200 zed[2140]: Processing events since eid=0
35.051202+0200 zed[2221]: eid=7 class=statechange pool='vmpool' vdev=nvme1n1p1 vdev_state=UNAVAIL
35.051203+0200 zed[2222]: eid=2 class=config_sync pool='rpool'
35.051203+0200 zed[2220]: eid=5 class=config_sync pool='rpool'
35.051221+0200 zed[2224]: eid=3 class=pool_import pool='rpool'
35.051222+0200 zed[2223]: eid=6 class=statechange pool='vmpool' vdev=nvme0n1p1 vdev_state=UNAVAIL
35.095549+0200 zed[2241]: eid=8 class=vdev.no_replicas pool='vmpool'
35.095915+0200 zed[2243]: eid=9 class=statechange pool='vmpool' vdev=nvme0n1p1 vdev_state=UNAVAIL
35.097410+0200 zed[2254]: eid=10 class=statechange pool='vmpool' vdev=nvme1n1p1 vdev_state=UNAVAIL
35.099247+0200 zed[2268]: eid=12 class=zpool pool='vmpool'
35.099441+0200 zed[2266]: eid=11 class=vdev.no_replicas pool='vmpool'
35.099910+0200 zed[2275]: eid=13 class=statechange pool='vmpool' vdev=nvme1n1p1 vdev_state=UNAVAIL
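In case it helps, this is the kind of thing that can be checked for the import ordering and the cache file (just a diagnostic sketch, not a fix; /etc/zfs/zpool.cache is the path from the log above):
Code:
# show the unit that does the cachefile import and what it is ordered after
systemctl cat zfs-import-cache.service
systemd-analyze critical-chain zfs-import-cache.service

# dump the vdev paths recorded in the cache file for vmpool
zdb -C -U /etc/zfs/zpool.cache vmpool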
The issue seems to affect only the ZIL SLOG partitions of the vmpool. It is always about the partitions "nvme0n1p1" and "nvme1n1p1", which are used in ZFS as "mirror-1 [log]".
After booting, the vmpool works normally and I can see read and write operations even on the ZIL SLOG.
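(Visible with a per-vdev iostat, for example; the 5-second interval below is just an example value:)
Code:
# per-vdev I/O statistics for vmpool, refreshed every 5 seconds;
# the log mirror (nvme0n1p1 / nvme1n1p1) is listed separately
zpool iostat -v vmpool 5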
And "zpool status" shows the pool as Online without errors:
Code:
pool: vmpool
state: ONLINE
scan: scrub repaired 0B in 00:00:18 with 0 errors on Sun Apr 14 00:25:15 2024
config:
NAME           STATE     READ WRITE CKSUM
vmpool         ONLINE       0     0     0
  mirror-0     ONLINE       0     0     0
    sda5       ONLINE       0     0     0
    sdb5       ONLINE       0     0     0
logs
  mirror-1     ONLINE       0     0     0
    nvme0n1p1  ONLINE       0     0     0
    nvme1n1p1  ONLINE       0     0     0
errors: No known data errors
The number of these last log lines (the ones with "vdev_state=UNAVAIL") that result in emails varies between reboots (approx. 4-10).
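(Counted per boot with something like the following; "zed" is the syslog identifier the lines above are logged under:)
Code:
# count the UNAVAIL state changes logged by ZED during the current boot
journalctl -b -t zed | grep -c 'vdev_state=UNAVAIL'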
Any ideas how to get rid of these errors?