Update to 8.2.2: zpool disappears after each reboot

PeterZ

Hi

I just updated my system from 8.1.11 to 8.2.2. Since v5.4 I have had 'no problems at all', but now one of my zpools has an issue.
At first the zpool disappeared, and after recreating it, no data (only 393 KB) is showing.

I don't understand what is going on, and since I have only done this process once, a long time ago, I'm puzzled.
The disk is /dev/sda, so I ran:
Code:
zpool create -f myzpool /dev/sda

 
Sorry, but you just screwed yourself. You basically reformatted the zpool and told ZFS to -f Force it.

Hope you have backups.

If you don't understand what is going on with sysadmin-level commands, ask for advice first before you lose data.
 
Hi @Kingneutron, thanks for answering even if I don't like the answer...
If you don't understand what is going on with sysadmin-level commands, ask for advice first before you lose data
You are completely right about that.
I got home and noticed that there were a lot of pending updates, so I figured I'd run them while the system was less busy.

TBH, I'm not that familiar with ZFS; I just remembered that command from when I first created the zpool.
So recreating a zpool reformats a disk?

1) What should I have done?
2) There is no way to turn this back?
3) Do you have any idea why it disappeared in the first place?
 
> So recreating a zpool reformats a disk?

Basically, yes - you pretty much told it the equivalent of "FORMAT C: /Q /U".

1) What should I have done? 'zpool import -d /dev/disk/by-id' # and see what comes back
If you can successfully import the pool but still don't see your data, 'zfs mount -a' (see the example below)

2) There is no way to turn this back? Maybe if you take the drive to a data-recovery service right away, but be prepared to spend $400 and up (probably over $1000) with no guarantee of positive results. This is why we keep drumming the 3-2-1 backup march.

3) Do you have any idea why it disappeared in the first place? Nope. Too late to tell, sorry.
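
For item 1, that would look roughly like this (a read-only listing first, then the actual import; 'myzpool' is just the name used earlier in this thread):
Code:
# scan the disks by their stable IDs and list any importable pools
zpool import -d /dev/disk/by-id
# if the pool shows up, import it by name, then mount all datasets
zpool import -d /dev/disk/by-id myzpool
zfs mount -a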


If you don't have backups, you'll need to re-download / re-implement whatever environment and dataset/directory structure that was on that zpool.

Then start planning a solid, regular, automated backup regimen to separate media (preferably a NAS that's not on the same system, but a separate hard drive will do, ideally with THAT backed up to something else in case of disaster).
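
Even a simple ZFS snapshot-and-send to a pool on a second disk is a big step up; a minimal sketch, assuming a dataset called myzpool/data and a second pool called backuppool (both names are just examples):
Code:
# take a read-only snapshot of the dataset
zfs snapshot myzpool/data@backup-20240512
# send it to a pool that lives on a separate disk
zfs send myzpool/data@backup-20240512 | zfs receive backuppool/data-copy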

Yes, it sucks, and I'm not trying to be unkind here. I just have seen a LOT of these stories where someone had no backup, made an admin-level mistake, or had a power outage, or had Teh Craziness Happen, and now they have nothing to restore from. Most people don't learn the lesson unless they've either had personal data loss, or seen it happen to someone they know.

The best you can do is plan for the worst case going forward, and vow to never let it happen again. (This includes understanding which commands are destructive - like dd, cat and rm. And it's almost never a good idea to Force something on zfs, unless you know exactly what you're doing and are positive that overriding default behavior is what you want.)

Document a DR plan, invest in backup equipment, test your restores, and rehearse a DR scenario - like restoring your main environment into a VM. Then when something does happen, you should be able to recover fairly quickly by following your established procedures.

Sincerely tho, Best of luck with your DR.
 
3) Do you have any idea why it disappeared in the first place?
Hard to tell, now that the pool no longer exists.

2) There is no way to turn this back?
Only with data recovery, either done by professionals or done yourself with recovery software after taking a block-level backup of the disk. Neither will be cheap.
The cheapest option would be to restore the daily/hourly backups you hopefully have.

1) What should I have done?
Diagnosing the problem first, by looking at the logs and using tools like testdisk, zpool import, smartctl, lsblk and so on.
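For example, something along these lines, none of which should write to the disk (/dev/sdX is a placeholder for the disk in question):
Code:
# is the disk still detected at all?
lsblk -o NAME,SIZE,MODEL,SERIAL
# disk health
smartctl -a /dev/sdX
# scan for importable pools without importing anything
zpool import -d /dev/disk/by-id
# ZFS-related messages from the current boot
journalctl -b0 | grep -i zfs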

TBH: I'm not that familiar with ZFS
I always tell people here to learn about the storage they are using, because otherwise it will sooner or later cause data loss. And to have proper backups and a well-tested disaster recovery plan. And the famous words "RAID is not a backup" and "Snapshots are not backups". As well as not to store the backups on the same disk as the VMs/LXCs.
This is a very good example why...
So I hope you have recent backups and learn from this for the future so it won't happen again.
 
Guys, it happened again!
I restored from backup and everything was OK, but now, after installing updates on Proxmox and rebooting, that zpool is gone...
The disk is listed, and the following command:
Code:
zpool import -d /dev/disk/by-id/ata-WDC_WD30EFRX.....
returns:
Code:
no pools available to import

SMART results:
Passed, Wearout N/A

 
Okay, that works!! :cool:
Any idea why this happens? Because I have no clue...

EDIT: rebooting my server makes the zpool disappear again
 
IDK, check systemctl services and see if something failed...

Typically I disable the systemctl stuff and just do the import from /etc/rc.local; try putting that import line in there and reboot.
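
Something like this, assuming your system still runs /etc/rc.local at boot and the file is executable ('myzpool' being the pool from this thread):
Code:
#!/bin/sh
# /etc/rc.local - executed at the end of boot
# import the pool that doesn't come back on its own, then mount everything
zpool import -d /dev/disk/by-id myzpool
zfs mount -a
exit 0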
 
Sorry, that is Chinese to me...
What I discovered:
Code:
systemctl status zfs-import.service
○ zfs-import.service
     Loaded: masked (Reason: Unit zfs-import.service is masked.)
     Active: inactive (dead)

EDIT: Found this advice
Code:
zpool set cachefile=/etc/zfs/zpool.cache <zfs pool name>
update-initramfs -k all -u
reboot
Didn't help
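
For anyone debugging the same thing: to check whether the pool actually ended up in the cache file after that command, something like this should be safe and read-only, if I'm not mistaken (myzpool being the pool in question):
Code:
# show which cache file the imported pool thinks it is using
zpool get cachefile myzpool
# the cache file is binary, but the pool name should show up in it if it was recorded
strings /etc/zfs/zpool.cache | grep -i myzpool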
 
Thank you for your ongoing help!
I have read up on this masked service to get a better understanding.

Code:
systemctl enable --now zfs-import.service
Failed to enable unit: Unit file /lib/systemd/system/zfs-import.service is masked.
Code:
systemctl unmask zfs-import.service
Code:
systemctl enable --now zfs-import.service
Failed to enable unit: Unit file /lib/systemd/system/zfs-import.service is masked.

Code:
systemctl list-units --all zfs*
  UNIT                      LOAD   ACTIVE   SUB     DESCRIPTION
  zfs-import-cache.service  loaded active   exited  Import ZFS pools by cache file
  zfs-import-scan.service   loaded inactive dead    Import ZFS pools by device scanning
  zfs-import@zpool2.service loaded active   exited  Import ZFS pool zpool2
  zfs-mount.service         loaded active   exited  Mount ZFS filesystems
  zfs-share.service         loaded active   exited  ZFS file system shares
  zfs-volume-wait.service   loaded active   exited  Wait for ZFS Volume (zvol) links in /dev
  zfs-zed.service           loaded active   running ZFS Event Daemon (zed)
  zfs-import.target         loaded active   active  ZFS pool import target
  zfs-volumes.target        loaded active   active  ZFS volumes are ready
  zfs.target                loaded active   active  ZFS startup target

Is it normal that there is no zfs-import.service and only a zfs-import@zpool2.service?
I do have 3 ZFS pools though:
- rpool
- myzpool
- zpool2

Code:
journalctl -b0 | grep -i zfs -C 2
May 11 21:09:23 srv kernel: Linux version 6.8.4-3-pve (build@proxmox) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC PMX 6.8.4-3 (2024-05-02T11:55Z) ()
May 11 21:09:23 srv kernel: Command line: BOOT_IMAGE=/vmlinuz-6.8.4-3-pve root=ZFS=rpool/ROOT/pve-1 ro root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet
May 11 21:09:23 srv kernel: KERNEL supported cpus:
May 11 21:09:23 srv kernel:   Intel GenuineIntel
--
May 11 21:09:23 srv kernel: pcpu-alloc: s229376 r8192 d114688 u524288 alloc=1*2097152
May 11 21:09:23 srv kernel: pcpu-alloc: [0] 0 1 2 3
May 11 21:09:23 srv kernel: Kernel command line: BOOT_IMAGE=/vmlinuz-6.8.4-3-pve root=ZFS=rpool/ROOT/pve-1 ro root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet
May 11 21:09:23 srv kernel: Unknown kernel command line parameters "BOOT_IMAGE=/vmlinuz-6.8.4-3-pve boot=zfs", will be passed to user space.
May 11 21:09:23 srv kernel: random: crng init done
May 11 21:09:23 srv kernel: Dentry cache hash table entries: 4194304 (order: 13, 33554432 bytes, linear)
--
May 11 21:09:23 srv kernel:     TERM=linux
May 11 21:09:23 srv kernel:     BOOT_IMAGE=/vmlinuz-6.8.4-3-pve
May 11 21:09:23 srv kernel:     boot=zfs
May 11 21:09:23 srv kernel: wmi_bus wmi_bus-PNP0C14:00: WQBC data block query control method not found
May 11 21:09:23 srv kernel: i801_smbus 0000:00:1f.4: SPD Write Disable is set
--
May 11 21:09:23 srv kernel: Btrfs loaded, zoned=yes, fsverity=yes
May 11 21:09:23 srv kernel: spl: loading out-of-tree module taints kernel.
May 11 21:09:23 srv kernel: zfs: module license 'CDDL' taints kernel.
May 11 21:09:23 srv kernel: Disabling lock debugging due to kernel taint
May 11 21:09:23 srv kernel: zfs: module license taints kernel.
May 11 21:09:23 srv kernel: ZFS: Loaded module v2.2.3-pve2, ZFS pool version 5000, ZFS filesystem version 5
May 11 21:09:23 srv kernel:  zd0: p1 p2 p3 p4 < p5 >
May 11 21:09:23 srv kernel:  zd32: p1 p2 p3 p4 p5 p6 p7 p8
--
May 11 21:09:23 srv systemd[1]: Created slice system-modprobe.slice - Slice /system/modprobe.
May 11 21:09:23 srv systemd[1]: Created slice system-postfix.slice - Slice /system/postfix.
May 11 21:09:23 srv systemd[1]: Created slice system-zfs\x2dimport.slice - Slice /system/zfs-import.
May 11 21:09:23 srv systemd[1]: Created slice user.slice - User and Session Slice.
May 11 21:09:23 srv systemd[1]: Started systemd-ask-password-console.path - Dispatch Password Requests to Console Directory Watch.
--
May 11 21:09:23 srv systemd[1]: Starting ifupdown2-pre.service - Helper to synchronize boot up for ifupdown...
May 11 21:09:23 srv systemd[1]: Starting systemd-udev-settle.service - Wait for udev To Complete Device Initialization...
May 11 21:09:23 srv udevadm[519]: systemd-udev-settle.service is deprecated. Please fix zfs-import-scan.service, zfs-import-cache.service not to pull it in.
May 11 21:09:23 srv kernel: intel_pmc_core INT33A1:00:  initialized
May 11 21:09:23 srv kernel: EDAC ie31200: No ECC support
--
May 11 21:09:24 srv systemd[1]: Finished ifupdown2-pre.service - Helper to synchronize boot up for ifupdown.
May 11 21:09:24 srv systemd[1]: Finished systemd-udev-settle.service - Wait for udev To Complete Device Initialization.
May 11 21:09:24 srv systemd[1]: Starting zfs-import@zpool2.service - Import ZFS pool zpool2...
May 11 21:09:24 srv systemd[1]: Finished zfs-import@zpool2.service - Import ZFS pool zpool2.
May 11 21:09:24 srv systemd[1]: Starting zfs-import-cache.service - Import ZFS pools by cache file...
May 11 21:09:24 srv systemd[1]: zfs-import-scan.service - Import ZFS pools by device scanning was skipped because of an unmet condition check (ConditionFileNotEmpty=!/etc/zfs/zpool.cache).
May 11 21:09:24 srv zpool[786]: no pools available to import
May 11 21:09:24 srv systemd[1]: Finished zfs-import-cache.service - Import ZFS pools by cache file.
May 11 21:09:24 srv systemd[1]: Reached target zfs-import.target - ZFS pool import target.
May 11 21:09:24 srv systemd[1]: Starting zfs-mount.service - Mount ZFS filesystems...
May 11 21:09:24 srv systemd[1]: Starting zfs-volume-wait.service - Wait for ZFS Volume (zvol) links in /dev...
May 11 21:09:24 srv zvol_wait[788]: Testing 4 zvol links
May 11 21:09:24 srv zvol_wait[788]: All zvol links are now present.
May 11 21:09:24 srv systemd[1]: Finished zfs-volume-wait.service - Wait for ZFS Volume (zvol) links in /dev.
May 11 21:09:24 srv systemd[1]: Reached target zfs-volumes.target - ZFS volumes are ready.
May 11 21:09:24 srv systemd[1]: Finished zfs-mount.service - Mount ZFS filesystems.
May 11 21:09:24 srv systemd[1]: Reached target local-fs.target - Local File Systems.
May 11 21:09:24 srv systemd[1]: Starting apparmor.service - Load AppArmor profiles...
--
May 11 21:09:25 srv dbus-daemon[904]: [system] AppArmor D-Bus mediation is enabled
May 11 21:09:25 srv systemd[1]: Starting systemd-logind.service - User Login Management...
May 11 21:09:25 srv systemd[1]: Starting zfs-share.service - ZFS file system shares...
May 11 21:09:25 srv systemd[1]: Started zfs-zed.service - ZFS Event Daemon (zed).
May 11 21:09:25 srv systemd[1]: Started dbus.service - D-Bus System Message Bus.
May 11 21:09:25 srv systemd[1]: e2scrub_reap.service: Deactivated successfully.
--
May 11 21:09:25 srv kernel: RPC: Registered tcp NFSv4.1 backchannel transport module.
May 11 21:09:25 srv systemd[1]: Started ksmtuned.service - Kernel Samepage Merging (KSM) Tuning Daemon.
May 11 21:09:25 srv zed[920]: ZFS Event Daemon 2.2.3-pve2 (PID 920)
May 11 21:09:25 srv zed[920]: Processing events since eid=0
May 11 21:09:25 srv rsyslogd[911]: imuxsock: Acquired UNIX socket '/run/systemd/journal/syslog' (fd 3) from systemd.  [v8.2302.0]
--
May 11 21:09:25 srv zed[967]: eid=10 class=config_sync pool='zpool2'
May 11 21:09:25 srv zed[968]: eid=8 class=pool_import pool='zpool2'
May 11 21:09:25 srv systemd[1]: Finished zfs-share.service - ZFS file system shares.
May 11 21:09:25 srv systemd[1]: Reached target zfs.target - ZFS startup target.
May 11 21:09:25 srv zed[966]: eid=7 class=config_sync pool='zpool2'
May 11 21:09:25 srv systemd-logind[918]: New seat seat0.
--
May 12 00:23:31 srv systemd[1]: Reloading.
May 12 00:24:01 srv CRON[110793]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
May 12 00:24:01 srv CRON[110794]: (root) CMD (if [ $(date +%w) -eq 0 ] && [ -x /usr/lib/zfs-linux/scrub ]; then /usr/lib/zfs-linux/scrub; fi)
May 12 00:24:01 srv zed[110810]: eid=16 class=scrub_start pool='myzpool'
May 12 00:24:17 srv zed[110921]: eid=18 class=scrub_start pool='rpool'

I also understand now what you meant by doing that import in /etc/rc.local
I'm just curious why this happened, and since I'm more of a 'the root cause has to be dealt with' kind of person, I will wait a little to see if I get more insight into this (or other users jump in), and keep your suggestion in case nothing else works.
 
Just encountered the same issue after updating to 8.2.2. I was able to import the pool back and all the data is fine, but it also disappears again when I reboot the system. My zfs-import.service unit file is also masked and stays masked even after systemctl unmask zfs-import.service. Puzzling how only one of the many pools I have won't start automatically. The only difference is that the one that won't start is two SSDs on a PCIe card in a stripe. The others are single-disk pools in hot-swap bays connected directly to SATA. Gonna do some more digging into the update to see what happened before I update my other hosts.
 
Ended up correcting my issue. I had a couple of directories on that particular pool, set up as shared storage, that had question marks next to them in the GUI even after I imported the pool. I deleted the two directories from /mnt/pve on the disk, removed them from the Datacenter -> Storage area in the GUI, and re-created them in the GUI. My problem pool now starts up normally. I'm guessing something went wonky with the storage configs after the update.
 
Hi @apc103, thanks for chiming in. It seemed almost impossible that I'm the only one with this problem, and so far, apart from some good help, this topic didn't seem to get much attention.
Your solution made me look at my storage, and I see that the pool I'm referring to here is not showing in the web UI, but it is mounted at the moment.

This is what I have in 'Datacenter' -> 'Storage':
(screenshot attached)

On my server 'Disks' -> 'ZFS':
(screenshot attached)

TBH, I don't remember how I created zpool2, but the one that keeps disappearing (myzpool) was created through the CLI.
Could this explain my problem?
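
If that's the case, maybe enabling a per-pool import unit for myzpool, like the zfs-import@zpool2.service that shows up in my listing above, would help? Just a guess on my part, assuming that template accepts other pool names too:
Code:
# create and start an instance of the zfs-import@ template for this pool
systemctl enable --now zfs-import@myzpool.service
systemctl status zfs-import@myzpool.service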
 
