ZFS fails to mount on boot following upgrade

seevee

Oct 2, 2017
I have Proxmox 5.0-32 installed on a Dell R710 with a single RAIDZ1 array spanning six 2TB disks. This array is shared between VMs, containers, and the Proxmox installation itself. I understand this configuration isn't ideal, but it is stable and performant enough for my needs, assuming I can power cycle the machine like any other.

Any time I reboot after running "apt-get dist-upgrade", some obscure part of the ZFS pool fails to mount. This is unlike the other related issues I have seen here: the pool itself is imported successfully, but these strange subdirectories are not mounted. Below is an example of the output after being dropped into BusyBox:

Code:
Command: mount -o zfsutil -t zfs rpool/ROOT/pve-1/07b3e44b4ec33fc185144a325971911c45547f051a1f95a39314afdab787862f /root//07b3e44b4ec33fc185144a325971911c45547f051a1f95a39314afdab787862f
Message: filesystem 'rpool/ROOT/pve-1/07b3e44b4ec33fc185144a325971911c45547f051a1f95a39314afdab787862f' cannot be mounted at '/root//07b3e44b4ec33fc185144a325971911c45547f051a1f95a39314afdab787862f' due to canonicalization error 2.
mount: mounting rpool/ROOT/pve-1/07b3e44b4ec33fc185144a325971911c45547f051a1f95a39314afdab787862f on /root//07b3e44b4ec33fc185144a325971911c45547f051a1f95a39314afdab787862f failed: No such file or directory
Error: 2

Failed to mount rpool/ROOT/pve-1/07b3e44b4ec33fc185144a325971911c45547f051a1f95a39314afdab787862f on /root//07b3e44b4ec33fc185144a325971911c45547f051a1f95a39314afdab787862f.
Manually mount the filesystem and exit.

BusyBox v1.22.1 (Debian 1:1.22.0-19+b3) built-in shell (ash)
Enter 'help' for a list of built-in commands.

/bin/sh: can't access tty: job control turned off
/ #

If I attempt to manually run the command, I get the exact same output:
Code:
filesystem 'rpool/ROOT/pve-1/07b3e44b4ec33fc185144a325971911c45547f051a1f95a39314afdab787862f' cannot be mounted at '/root//07b3e44b4ec33fc185144a325971911c45547f051a1f95a39314afdab787862f' due to canonicalization error 2.
mount: mounting rpool/ROOT/pve-1/07b3e44b4ec33fc185144a325971911c45547f051a1f95a39314afdab787862f on /root//07b3e44b4ec33fc185144a325971911c45547f051a1f95a39314afdab787862f failed: No such file or directory

I get no output whatsoever if I run "mkdir /root//<string-goes-here>" before running mount, but I do end up with an additional, useless filesystem mounted under /root.

Output of "zpool status":
Code:
/ # zpool status
  pool: rpool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE      READ WRITE CKSUM
        rpool       ONLINE        0     0     0
          raidz1-0  ONLINE        0     0     0
            sda2    ONLINE        0     0     0
            sdb2    ONLINE        0     0     0
            sdc2    ONLINE        0     0     0
            sdd2    ONLINE        0     0     0
            sde2    ONLINE        0     0     0
            sdf2    ONLINE        0     0     0

errors: No known data errors

If I issue the "exit" command from here, it displays the exact same message as above, but with the random string changed to the next entry from "zfs list". After issuing exit 42 times in a row, the machine finally boots fully and properly. This behavior persists through subsequent reboots, so I currently have to type exit[enter] 42 times after every reboot before Proxmox comes up, regardless of whether I ran a dist-upgrade or not.

I've found some related resources, but I'm not able to post links here.

I have attempted to implement all of them, but with no success whatsoever. I can modify system files within /root, and I can run utilities like "update-grub" and "update-initramfs" after going through the exit procedure and getting out of BusyBox, but the problem persists. "zpool import rpool" just errors out, reporting that a pool with that name already exists, consistent with the output of "zpool status".
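
For reference, this is roughly how I have been inspecting the stray child datasets from the recovery shell (just standard zfs commands; the hash-named datasets will obviously differ per system):

Code:
# List every child dataset under the root filesystem together with its
# mountpoint, canmount setting and origin, to see what the initramfs
# keeps trying to mount
zfs list -r -o name,mountpoint,canmount,origin rpool/ROOT/pve-1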

The simplest solution I have found is to reinstall Proxmox from scratch and recreate the ZFS pool. I have reinstalled twice to address this issue, and am now stuck at BusyBox a third time after narrowing the issue down as much as possible. I am new to ZFS and Proxmox and am definitely out of my depth.

Is there something I should try next or any more information I can provide?
 
Who is creating those datasets? They are not part of anything PVE does.
 
Hi,

same problem here. After a reboot my Proxmox server did not start again, so I had to go look at the monitor. It's the same error as above. After entering "exit" around 40 times it booted normally.

This is not very nice, because I cannot go to the server every time I want to reboot it for an update...

Do you have a suggestion how to solve this?

Thanks
 
Hi, same problem here. I did as st0ne described and the server started, but now I get the following errors:
qm start 100
ipcc_send_rec[1] failed: Connection refused
ipcc_send_rec[2] failed: Connection refused
ipcc_send_rec[3] failed: Connection refused
Unable to load access control list: Connection refused


pve-manager is not started. Can anybody help?
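
In case it helps narrow this down, here is what I am planning to check next (assuming the standard PVE service names; the ipcc "Connection refused" errors look like the cluster filesystem service never came up):

Code:
# Check whether the cluster filesystem and the PVE daemons started
systemctl status pve-cluster pvedaemon pveproxy
# If pve-cluster only failed because the pool mounted late,
# restarting the services may be enough
systemctl restart pve-cluster pvedaemon pveproxy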
 
Just a note: seeing the same as the OP. Tried both:
proxmox-ve_5.1-722cc488-1.iso
and
proxmox-ve_5.1-3.iso

Both have this issue.

Tried the @st0ne "spam exit at the prompt" method and it actually worked, I believe in under 20 attempts.

rpool is on a 2x Intel DC S3610 480GB ZFS mirror on the Intel SATA controller of a C612 chipset. The server is a Supermicro 2U Ultra server with 2x E5-2698 V4 CPUs.

On this same system I also have to add rootdelay=10.
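
(For completeness, the rootdelay goes into the kernel command line via GRUB on my box; the value is just what I use, not a recommendation:)

Code:
# /etc/default/grub - give the controller time to settle before the
# initramfs imports rpool
GRUB_CMDLINE_LINUX_DEFAULT="quiet rootdelay=10"

# then regenerate the GRUB config
update-grub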
 
Adding a bit more to this: it almost seems like a bunch of rpool ROOT snapshots are being created, and those are what is messing up the mounting. Perhaps each "exit" is for one of these?
Code:
root@fmt-pve-10:~# zfs list -t snapshot
NAME                                                                                               USED  AVAIL  REFER  MOUNTPOINT
rpool/ROOT/pve-1/014579f53cfb31770b23418d4e9f6c62a44052ffa4d08386752d085ff5f58eec@219481237          8K      -   439M  -
rpool/ROOT/pve-1/0bbb0c12579736d6363a87311f0d092805cd97bb4baedc8ffb5f64e8fb9fe83d@435607413          8K      -   439M  -
rpool/ROOT/pve-1/14e89baf9818049fef21fa96340b86e554b7be728a0b85ccf7455482e84ff8ad@152400782          8K      -  59.3M  -
rpool/ROOT/pve-1/1db950ce97024660669be7d68f89f7325920e51953aa585a77847cd79d7f2dc6@324224017          8K      -   441M  -
rpool/ROOT/pve-1/353197f9a7babaaccea669440d4c131b381f002c49b923d4934090cbafa8b3bf@924237964          8K      -  59.3M  -
rpool/ROOT/pve-1/51fa3a1f6638ad8a12c4718549017ae72391e32b84cb163479b071dbab741946-init@205453598     0B      -   441M  -
rpool/ROOT/pve-1/5b728d4662831898b05610db35b74cc2040d707f259a7c4abe627ce40749d342@740478434          8K      -  90.2M  -
rpool/ROOT/pve-1/5de5062b7f279f9db04437d13e1a2c6eae6714a40e62a3643fa3fb14705082a4@584052378          8K      -  59.3M  -
rpool/ROOT/pve-1/5f4232890affdb03f4098f5d7c14728ec51a647dce2f02621aa8ef6bd3ae0d81@158039809          8K      -   440M  -
rpool/ROOT/pve-1/6698d16215658053848de14679fdeda2ee3053fc0f6ea0b8c6f5c6d43568a993@672362295          8K      -  64.1M  -
rpool/ROOT/pve-1/7f14abd3f29b84f5d090055237fa5bea8088798594ec05232f2bf7c8da758b26@65366072           8K      -   439M  -
rpool/ROOT/pve-1/95977d88d3173716aa400640f4561f0cf596dff00ad4ee1c0aad470fa473815b@109495860          0B      -   441M  -
rpool/ROOT/pve-1/95977d88d3173716aa400640f4561f0cf596dff00ad4ee1c0aad470fa473815b@647135761          0B      -   441M  -
rpool/ROOT/pve-1/9e9165fd3413b7cc21b40377b0479df40b9bb1eb96455c2e74a619a0490eab8c@487929142          8K      -   441M  -
rpool/ROOT/pve-1/a22e53e9dd3051af4e6d03256ce5176f73a0faefca361871fd21a1185bd294c9-init@749503870     0B      -   441M  -
rpool/ROOT/pve-1/b50b679292ed86575b43310aa1530f48b8bea7aafb64fa2f2f061006b767696d@421214420          8K      -  64.1M  -
rpool/ROOT/pve-1/ec4eb1aea4e29a8dd79fe0823649bf04fff1e479543e61815ab987966da81a4e@404108242          8K      -  59.3M  -

I removed as many as I could, but these report "dataset is busy" when I try to destroy them.
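
In case anyone else tries cleaning these up: from what I can tell, a "busy" snapshot either has a hold on it or is the origin of one of the hash-named child filesystems, which would have to be destroyed first. A quick check (the dataset name is just the first entry from the listing above):

Code:
# Show any holds on the snapshot
zfs holds rpool/ROOT/pve-1/014579f53cfb31770b23418d4e9f6c62a44052ffa4d08386752d085ff5f58eec@219481237
# Show which child filesystem, if any, was cloned from which snapshot
zfs list -r -o name,origin rpool/ROOT/pve-1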
 
Any chance you installed Docker prior to this problem happening? I did ... and am running into the same problem.
 
These are all filesystems in rpool/ROOT/pve-1. Running the following:

Code:
zfs set canmount=off rpool/ROOT/pve-1/014579f53cfb31770b23418d4e9f6c62a44052ffa4d08386752d085ff5f58eec
zfs set canmount=off rpool/ROOT/pve-1/0bbb0c12579736d6363a87311f0d092805cd97bb4baedc8ffb5f64e8fb9fe83d
zfs set canmount=off rpool/ROOT/pve-1/14e89baf9818049fef21fa96340b86e554b7be728a0b85ccf7455482e84ff8ad
zfs set canmount=off rpool/ROOT/pve-1/1db950ce97024660669be7d68f89f7325920e51953aa585a77847cd79d7f2dc6
zfs set canmount=off rpool/ROOT/pve-1/353197f9a7babaaccea669440d4c131b381f002c49b923d4934090cbafa8b3bf
zfs set canmount=off rpool/ROOT/pve-1/51fa3a1f6638ad8a12c4718549017ae72391e32b84cb163479b071dbab741946-init
zfs set canmount=off rpool/ROOT/pve-1/5b728d4662831898b05610db35b74cc2040d707f259a7c4abe627ce40749d342
zfs set canmount=off rpool/ROOT/pve-1/5de5062b7f279f9db04437d13e1a2c6eae6714a40e62a3643fa3fb14705082a4
zfs set canmount=off rpool/ROOT/pve-1/5f4232890affdb03f4098f5d7c14728ec51a647dce2f02621aa8ef6bd3ae0d81
zfs set canmount=off rpool/ROOT/pve-1/6698d16215658053848de14679fdeda2ee3053fc0f6ea0b8c6f5c6d43568a993
zfs set canmount=off rpool/ROOT/pve-1/7f14abd3f29b84f5d090055237fa5bea8088798594ec05232f2bf7c8da758b26
zfs set canmount=off rpool/ROOT/pve-1/95977d88d3173716aa400640f4561f0cf596dff00ad4ee1c0aad470fa473815b
zfs set canmount=off rpool/ROOT/pve-1/9e9165fd3413b7cc21b40377b0479df40b9bb1eb96455c2e74a619a0490eab8c
zfs set canmount=off rpool/ROOT/pve-1/a22e53e9dd3051af4e6d03256ce5176f73a0faefca361871fd21a1185bd294c9-init
zfs set canmount=off rpool/ROOT/pve-1/b50b679292ed86575b43310aa1530f48b8bea7aafb64fa2f2f061006b767696d
zfs set canmount=off rpool/ROOT/pve-1/ec4eb1aea4e29a8dd79fe0823649bf04fff1e479543e61815ab987966da81a4e

will let you boot cleanly, but consider putting your Docker data in its own filesystem, say rpool/docker.
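
A couple of notes on that: instead of typing one line per dataset, the canmount=off lines can be generated from zfs list, and a dedicated dataset for Docker with the zfs storage driver keeps Docker's clones out of rpool/ROOT/pve-1 entirely. This is just a sketch; the dataset name rpool/docker and the daemon.json path are the usual defaults, adjust to taste, and it assumes /var/lib/docker is empty or has been moved aside:

Code:
# Turn off canmount for every hash-named child of the root dataset in one go
# (the first line of the listing is the parent itself, so skip it)
zfs list -H -o name -r rpool/ROOT/pve-1 | tail -n +2 | \
    xargs -n1 zfs set canmount=off

# Give Docker its own dataset and tell it to use the zfs storage driver
zfs create -o mountpoint=/var/lib/docker rpool/docker
cat > /etc/docker/daemon.json <<'EOF'
{
  "storage-driver": "zfs"
}
EOF
systemctl restart docker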
 
I never wanted to put my Docker stuff anywhere in particular; I just installed it to test something and then removed it. Still, it was quite the surprise :)
 
