Server not booting: Failed to start Flush Journal to persistent storage.

Riesling.Dry

Renowned Member
Jul 17, 2014
Yesterday I replaced one of 4 HDs in a ZFS RAID.
zpool status said it was resilvering.
This morning the box hung. On reboot, after GRUB and "Loading RAM Disk", it displayed the error: [FAILED] Failed to start Flush Journal to persistent storage.

The cursor is blinking (on the remote console) but apparently not taking any input (will verify as soon as I have physical access to the box); the machine seems to be frozen.

Can the "new" HD be the cause? The RAID should be working fine w. only 3 disks, considering it did, while formatting and adding the "new" one to the RAID...


What does this error mean?
What can I do?
How can I get the server back up?

Many thanks in advance for your ideas and proposals,
~R.
 
Is that ZFS RAID your Proxmox rpool? If so, it sounds like there was another disk/memory error during the resilvering and the pool is broken. Maybe try booting the Proxmox installer (in rescue mode) and check the status of the pool?
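If you get a shell there, checking the pool boils down to importing it without mounting anything and looking at its status; roughly like this (the /mnt altroot is just an example):

Code:
# -N skips mounting the datasets, -R sets a temporary altroot
zpool import -N -R /mnt rpool
# show pool health, resilver progress, and per-device error counters
zpool status -v rpool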
 
yes, the rpool. Will try your proposal, and get back. Thanks.
o.k., that didn't work :(
I get:
Code:
error: no such device: rpool.
ERROR: unable to find boot disk automatically.
Press any key to continue...
...and then it reboots.
No console, no command line...
Also tried booting without the new disk: same error as in initial post above.

What else could I try?
 
Find a bootable ISO that supports ZFS; maybe the latest Ubuntu? The Proxmox installer should be able to give you a console, but I don't know how to do that. Maybe boot it in debug mode?
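From an Ubuntu live session, the rough sequence would be something like this (a sketch; installing zfsutils-linux is only needed if the live image doesn't already ship the zpool command):

Code:
sudo apt update && sudo apt install -y zfsutils-linux  # only if 'zpool' is missing
sudo zpool import                                      # list importable pools
sudo zpool import -f -N rpool                          # -f: last used elsewhere; -N: don't mount
sudo zpool status rpool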
 
Find a bootable ISO that supports ZFS; maybe the latest Ubuntu?
Yes, that went well, thank you! :)

I can see the rpool and mount it, and it seems (!) o.k.:

Code:
root@ubuntu:/home/ubuntu# zpool import -f -N rpool
root@ubuntu:/home/ubuntu# zpool status
  pool: rpool
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Jan  3 13:32:49 2023
    5.08T scanned at 1.01G/s, 5.05T issued at 33.4M/s, 12.4T total
    1.20T resilvered, 40.72% done, 2 days 16:03:58 to go
config:

    NAME                                                    STATE     READ WRITE CKSUM
    rpool                                                   DEGRADED     0     0     0
      raidz1-0                                              DEGRADED     0     0     0
        ata-WDC_WD4000F9YZ-09N20L1_WD-WCC5DFCJUC56-part3    ONLINE       0     0     0
        ata-WDC_WD4000F9YZ-09N20L1_WD-WMC5D0D7XNZY          ONLINE       0     0     0
        ata-WDC_WD4000F9YZ-09N20L1_WD-WMC5D0D8M58S-part3    ONLINE       0     0     0
        replacing-3                                         DEGRADED     0     0     0
          ata-WDC_WD4000F9YZ-09N20L1_WD-WCC132360088-part3  OFFLINE      0     0     0
          ata-WDC_WD4002FYYZ-01B7CB1_K7GBPPVL               ONLINE       0     0     0

errors: No known data errors

What is peculiar: zpool says there is 176G free, while zfs shows only 18.3M available? :oops:

Code:
root@ubuntu:/home/ubuntu# zpool get all rpool
NAME   PROPERTY                       VALUE                SOURCE
rpool  size                           14.5T                -
rpool  capacity                       98%                  -
rpool  altroot                        -                    default
rpool  health                         DEGRADED             -
rpool  guid                           5179849667437971765  -
rpool  version                        -                    default
rpool  bootfs                         rpool/ROOT/pve-1     local
rpool  delegation                     on                   default
rpool  autoreplace                    off                  default
rpool  cachefile                      -                    default
rpool  failmode                       wait                 default
rpool  listsnapshots                  off                  default
rpool  autoexpand                     off                  default
rpool  dedupratio                     1.00x                -
rpool  free                           176G                 -
rpool  allocated                      14.4T                -
rpool  readonly                       off                  -
rpool  ashift                         12                   local
rpool  comment                        -                    default
rpool  expandsize                     -                    -
rpool  freeing                        0                    -
rpool  fragmentation                  58%                  -
rpool  leaked                         0                    -
rpool  multihost                      off                  default
rpool  checkpoint                     -                    -
rpool  load_guid                      5367141173402314463  -
rpool  autotrim                       off                  default
rpool  compatibility                  off                  default
rpool  feature@async_destroy          enabled              local
rpool  feature@empty_bpobj            active               local
rpool  feature@lz4_compress           active               local
rpool  feature@multi_vdev_crash_dump  enabled              local
rpool  feature@spacemap_histogram     active               local
rpool  feature@enabled_txg            active               local
rpool  feature@hole_birth             active               local
rpool  feature@extensible_dataset     active               local
rpool  feature@embedded_data          active               local
rpool  feature@bookmarks              enabled              local
rpool  feature@filesystem_limits      enabled              local
rpool  feature@large_blocks           enabled              local
rpool  feature@large_dnode            enabled              local
rpool  feature@sha512                 enabled              local
rpool  feature@skein                  enabled              local
rpool  feature@edonr                  enabled              local
rpool  feature@userobj_accounting     active               local
rpool  feature@encryption             enabled              local
rpool  feature@project_quota          active               local
rpool  feature@device_removal         enabled              local
rpool  feature@obsolete_counts        enabled              local
rpool  feature@zpool_checkpoint       enabled              local
rpool  feature@spacemap_v2            active               local
rpool  feature@allocation_classes     enabled              local
rpool  feature@resilver_defer         active               local
rpool  feature@bookmark_v2            enabled              local
rpool  feature@redaction_bookmarks    enabled              local
rpool  feature@redacted_datasets      enabled              local
rpool  feature@bookmark_written       enabled              local
rpool  feature@log_spacemap           active               local
rpool  feature@livelist               enabled              local
rpool  feature@device_rebuild         enabled              local
rpool  feature@zstd_compress          enabled              local
rpool  feature@draid                  disabled             local

root@ubuntu:/home/ubuntu# zfs get all rpool
NAME   PROPERTY              VALUE                  SOURCE
rpool  type                  filesystem             -
rpool  creation              Wed Sep 29 12:20 2021  -
rpool  used                  10.4T                  -
rpool  available             18.3M                  -
rpool  referenced            55.4G                  -
rpool  compressratio         1.02x                  -
rpool  mounted               no                     -
rpool  quota                 none                   default
rpool  reservation           none                   default
rpool  recordsize            128K                   default
rpool  mountpoint            /rpool                 default
rpool  sharenfs              off                    default
rpool  checksum              on                     default
rpool  compression           on                     local
rpool  atime                 off                    local
rpool  devices               on                     default
rpool  exec                  on                     default
rpool  setuid                on                     default
rpool  readonly              off                    default
rpool  zoned                 off                    default
rpool  snapdir               hidden                 default
rpool  aclmode               discard                default
rpool  aclinherit            restricted             default
rpool  createtxg             1                      -
rpool  canmount              on                     default
rpool  xattr                 on                     default
rpool  copies                1                      default
rpool  version               5                      -
rpool  utf8only              off                    -
rpool  normalization         none                   -
rpool  casesensitivity       sensitive              -
rpool  vscan                 off                    default
rpool  nbmand                off                    default
rpool  sharesmb              off                    default
rpool  refquota              none                   default
rpool  refreservation        none                   default
rpool  guid                  17456325780076367687   -
rpool  primarycache          all                    default
rpool  secondarycache        all                    default
rpool  usedbysnapshots       0B                     -
rpool  usedbydataset         55.4G                  -
rpool  usedbychildren        10.4T                  -
rpool  usedbyrefreservation  0B                     -
rpool  logbias               latency                default
rpool  objsetid              54                     -
rpool  dedup                 off                    default
rpool  mlslabel              none                   default
rpool  sync                  standard               local
rpool  dnodesize             legacy                 default
rpool  refcompressratio      1.00x                  -
rpool  written               55.4G                  -
rpool  logicalused           10.7T                  -
rpool  logicalreferenced     55.4G                  -
rpool  volmode               default                default
rpool  filesystem_limit      none                   default
rpool  snapshot_limit        none                   default
rpool  filesystem_count      none                   default
rpool  snapshot_count        none                   default
rpool  snapdev               hidden                 default
rpool  acltype               off                    default
rpool  context               none                   default
rpool  fscontext             none                   default
rpool  defcontext            none                   default
rpool  rootcontext           none                   default
rpool  relatime              off                    default
rpool  redundant_metadata    all                    default
rpool  overlay               on                     default
rpool  encryption            off                    default
rpool  keylocation           none                   default
rpool  keyformat             none                   default
rpool  pbkdf2iters           0                      default
rpool  special_small_blocks  0                      default
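If I read the docs right, zpool's free counts raw space across all disks, raidz1 parity included, while zfs's available is usable space after parity and minus the "slop" reservation (roughly 1/32 of pool capacity) that ZFS holds back, which would explain the tiny number on a 98%-full pool. Comparing the two views side by side:

Code:
# raw view: parity included
zpool list -o name,size,allocated,free,capacity,fragmentation rpool
# filesystem view: usable space after parity and the slop reservation
zfs list -o name,used,available,referenced rpool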

Now what is the next step to get back to normal boot?
Erase some stuff, like some ol' CT that is not needed any more, to make space?
Wait for resilvering to finish?
Is there anything unusual in the output above?

And how do you "Flush Journal to persistent storage"?
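(From what I can tell, the failing unit is systemd-journal-flush.service, which at boot moves the volatile journal from /run/log/journal to the persistent /var/log/journal; that write would fail on a root filesystem that is full or unwritable, which fits a pool with 18.3M available. Once the system is up again:)

Code:
systemctl status systemd-journal-flush.service  # the unit behind the boot message
journalctl --flush                              # triggers the same flush manually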

Cheers,
~R.
 
Also keep in mind that a ZFS pool shouldn't be filled more than 80%, or it can't operate optimally. It will become slow and fragment faster, which is bad, as you can't defragment a ZFS pool (yours is already heavily fragmented: "rpool fragmentation 58%").
So yes, I would delete stuff until you are under 80%.
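To find candidates for deletion, sorting datasets and snapshots by the space they hold usually works; a sketch (the destroy target is a made-up example, double-check before destroying anything):

Code:
zfs list -o name,used,avail,refer -S used | head -n 20   # biggest datasets first
zfs list -t snapshot -S used | head -n 20                # snapshots holding space
# zfs destroy rpool/data/subvol-999-disk-0               # example only: an unused CT volume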
 
So yes, I would delete stuff until you are under 80%.
Thanks for the hint. I deleted some TB, will now wait till tomorrow and reboot after resilvering is finished.
Fingers crossed the box will boot then.
Will get back :°)
 
Resilvering finished w. errors:
221 read errors on the "new" disk ---> "insufficient replicas".
The server does not boot...

Code:
ubuntu@ubuntu:/rpool/data$ zpool status
  pool: rpool
 state: DEGRADED
status: Some supported and requested features are not enabled on the pool.
    The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
    the pool may no longer be accessible by software that does not support
    the features. See zpool-features(7) for details.
  scan: resilvered 54.5G in 10:19:11 with 0 errors on Sun Jan  8 06:20:58 2023
config:

    NAME                                                    STATE     READ WRITE CKSUM
    rpool                                                   DEGRADED     0     0     0
      raidz1-0                                              DEGRADED     0     0     0
        ata-WDC_WD4000F9YZ-09N20L1_WD-WCC5DFCJUC56-part3    ONLINE       0     0     0
        ata-WDC_WD4000F9YZ-09N20L1_WD-WMC5D0D7XNZY          ONLINE       0     0     0
        ata-WDC_WD4000F9YZ-09N20L1_WD-WMC5D0D8M58S-part3    ONLINE       0     0     0
        replacing-3                                         UNAVAIL    211     0     0  insufficient replicas
          ata-WDC_WD4000F9YZ-09N20L1_WD-WCC132360088-part3  OFFLINE      0     0     0
          ata-WDC_WD4002FYYZ-01B7CB1_K7GBPPVL               FAULTED    221     0     0  too many errors

errors: No known data errors
ubuntu@ubuntu:/rpool/data$

Any ideas?
Next time I am at the box, I will do a zpool status -v which will hopefully reveal more detail.
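(For reference, the -v flag should list any files affected by permanent errors, and clearing the counters would let a retry start fresh; a sketch:)

Code:
zpool status -v rpool   # lists files with permanent errors, if any
# zpool clear rpool     # resets error counters so a new scrub/resilver can retry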
 
"insufficient replicas" usually indicates a second error on a piece of data which cannot be fixed because you only redundancy against one drive failure.
On the other hand there are "no known data errors", so I'm not use if you lost any data (yet). Maybe the new drive is also having issues? Maybe someone more experienced can tell for sure.
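A quick way to test that theory from the live system would be SMART (the device node below is an example; match it to the disk's serial first):

Code:
ls -l /dev/disk/by-id/ | grep K7GBPPVL                 # find the matching /dev/sdX
sudo smartctl -a /dev/sdd | grep -iE 'overall|reallocated|pending|uncorrect'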
 
o.k., I'm giving up.
I pulled out the "new" HD and inserted the ol' one back in.
The server booted and apparently (!) runs fine.
I stopped the ZFS replace, removed the "new" HD from the rpool and re-added the ol' one.
Things seem (!) to work like before, yet I will refresh the box from scratch w. a clean, new install (and another, really NEW HD#4 ;°)
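For anyone finding this later: backing out of an in-progress replace amounts to detaching the new device from the replacing vdev and bringing the old one back online; a sketch with the device names from the status output above:

Code:
zpool detach rpool ata-WDC_WD4002FYYZ-01B7CB1_K7GBPPVL                # cancels 'replacing-3'
zpool online rpool ata-WDC_WD4000F9YZ-09N20L1_WD-WCC132360088-part3   # bring the ol' disk back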

Thanks to all who helped w. ideas and proposals!

</CLOSE> :)
 
