ZFS fails to mount on boot following upgrade

seevee

Renowned Member
Oct 2, 2017
I have Proxmox 5.0-32 installed on a Dell R710 with a single RAIDZ1 array spanning six 2TB disks. The array is shared between VMs, containers, and the Proxmox installation itself. I understand this configuration isn't ideal, but it is stable and performant enough for my needs, provided I can power cycle it like any other machine.

Any time I reboot after running "apt-get dist-upgrade", some obscure part of the ZFS pool fails to mount. This is unlike the other related issues I have seen here: the pool itself is imported successfully, but these strange subdirectories are not mounted. Below is an example of the output after being dropped into BusyBox:

Code:
Command: mount -o zfsutil -t zfs rpool/ROOT/pve-1/07b3e44b4ec33fc185144a325971911c45547f051a1f95a39314afdab787862f /root//07b3e44b4ec33fc185144a325971911c45547f051a1f95a39314afdab787862f
Message: filesystem 'rpool/ROOT/pve-1/07b3e44b4ec33fc185144a325971911c45547f051a1f95a39314afdab787862f' cannot be mounted at '/root//07b3e44b4ec33fc185144a325971911c45547f051a1f95a39314afdab787862f' due to canonicalization error 2.
mount: mounting rpool/ROOT/pve-1/07b3e44b4ec33fc185144a325971911c45547f051a1f95a39314afdab787862f on /root//07b3e44b4ec33fc185144a325971911c45547f051a1f95a39314afdab787862f failed: No such file or directory
Error: 2

Failed to mount rpool/ROOT/pve-1/07b3e44b4ec33fc185144a325971911c45547f051a1f95a39314afdab787862f on /root//07b3e44b4ec33fc185144a325971911c45547f051a1f95a39314afdab787862f.
Manually mount the filesystem and exit.

BusyBox v1.22.1 (Debian 1:1.22.0-19+b3) built-in shell (ash)
Enter 'help' for a list of built-in commands.

/bin/sh: can't access tty: job control turned off
/ #

If I attempt to manually run the command, I get the exact same output:
Code:
filesystem 'rpool/ROOT/pve-1/07b3e44b4ec33fc185144a325971911c45547f051a1f95a39314afdab787862f' cannot be mounted at '/root//07b3e44b4ec33fc185144a325971911c45547f051a1f95a39314afdab787862f' due to canonicalization error 2.
mount: mounting rpool/ROOT/pve-1/07b3e44b4ec33fc185144a325971911c45547f051a1f95a39314afdab787862f on /root//07b3e44b4ec33fc185144a325971911c45547f051a1f95a39314afdab787862f failed: No such file or directory

If I run "mkdir /root//<string-goes-here>" before running mount, I get no output whatsoever, but I do end up with an additional and useless filesystem mounted within root.
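For reference, this is roughly the sequence I run each time, using the same placeholder for the hash shown in the error message:
Code:
mkdir /root//<string-goes-here>
mount -o zfsutil -t zfs rpool/ROOT/pve-1/<string-goes-here> /root//<string-goes-here>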

Output of "zpool status":
Code:
/ # zpool status
  pool: rpool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE      READ WRITE CKSUM
        rpool       ONLINE        0     0     0
          raidz1-0  ONLINE        0     0     0
            sda2    ONLINE        0     0     0
            sdb2    ONLINE        0     0     0
            sdc2    ONLINE        0     0     0
            sdd2    ONLINE        0     0     0
            sde2    ONLINE        0     0     0
            sdf2    ONLINE        0     0     0

errors: No known data errors

If I issue the "exit" command from here, it will display the exact same message as above, but with the random string changed to reflect the next entry from "zfs list". If I issue the exit command 42 times in a row, the machine will (finally) fully and properly boot. This behavior persists through subsequent reboots, so I currently have to type exit[enter] 42 times following each reboot before Proxmox finally boots up, regardless of whether I run a dist-upgrade or not.

I've found some resources but I can't post them here because reasons.

I have attempted to implement all of these, but have had no success whatsoever. I can successfully modify system files within /root, and I can access utilities like "update-grub" and "update-initramfs" after running the exit procedure and getting out of BusyBox, but the problem persists. "zpool import rpool" just errors out and shows that a pool with that name already exists, as indicated by the output of "zpool status".
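For anyone wanting to check the same thing, standard ZFS commands like these show the pool and mount state without trying to re-import anything:
Code:
zpool list rpool                                            # confirms the pool is already imported
zfs get -r -t filesystem mounted,canmount,mountpoint rpool/ROOT/pve-1   # which children are not mounted, and why
zfs mount -a                                                # attempt to mount everything that should be mounted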

The simplest solution I have found is to freshly reinstall Proxmox and recreate the ZFS pool. I have reinstalled to address this issue twice, and am now stuck at BusyBox a third time after narrowing down the issue as much as possible. I am new to ZFS and Proxmox and am definitely out of my depth.

Is there something I should try next or any more information I can provide?
 
Who is creating those datasets? They are not part of anything PVE does...
 
hi,

same problem here. After a reboot my Proxmox server did not start again, so I had to go look at the monitor; it's the same error as above. After entering "exit" around 40 times it booted normally.

This is not very nice, because I cannot go to the server every time I want to reboot it for an update...

Do you have a suggestion how to solve this?

thanks
 
Hi, same problem here. I did as st0ne suggested and the server started, but now I get the following errors:
Code:
qm start 100
ipcc_send_rec[1] failed: Connection refused
ipcc_send_rec[2] failed: Connection refused
ipcc_send_rec[3] failed: Connection refused
Unable to load access control list: Connection refused


pve-manager does not start. Can anybody help?
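(Those ipcc "Connection refused" errors usually just mean the pve-cluster service, pmxcfs, is not running; checking it along these lines may be a reasonable first step:)
Code:
systemctl status pve-cluster                               # pmxcfs -- qm and the API talk to it via ipcc
journalctl -u pve-cluster -b                               # see why it failed to start this boot
systemctl restart pve-cluster pvedaemon pveproxy pvestatd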
 
Just a note: seeing the same as the OP. Tried both:
proxmox-ve_5.1-722cc488-1.iso
and
proxmox-ve_5.1-3.iso

Both have this issue.

Tried the @st0ne "spam exit at the prompt" method and it actually worked, I believe in under 20 attempts.

rpool is on a 2x Intel DC S3610 480GB ZFS mirror on the Intel SATA controller of a C612 chipset. The server is a Supermicro 2U Ultra server with 2x E5-2698 V4's.

On this same system I also have to add rootdelay=10.
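(For reference, one common way to add that, assuming a legacy GRUB boot, is via the kernel command line in /etc/default/grub, followed by update-grub:)
Code:
# in /etc/default/grub, append rootdelay to the existing kernel options:
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet rootdelay=10"
# then regenerate the GRUB config:
update-grub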
 
Adding a bit more to this: it almost seems like there are a bunch of rpool ROOT snapshots being created, and those are what's messing up the mounting. Perhaps each exit is for one of these?
Code:
root@fmt-pve-10:~# zfs list -t snapshot
NAME                                                                                               USED  AVAIL  REFER  MOUNTPOINT
rpool/ROOT/pve-1/014579f53cfb31770b23418d4e9f6c62a44052ffa4d08386752d085ff5f58eec@219481237          8K      -   439M  -
rpool/ROOT/pve-1/0bbb0c12579736d6363a87311f0d092805cd97bb4baedc8ffb5f64e8fb9fe83d@435607413          8K      -   439M  -
rpool/ROOT/pve-1/14e89baf9818049fef21fa96340b86e554b7be728a0b85ccf7455482e84ff8ad@152400782          8K      -  59.3M  -
rpool/ROOT/pve-1/1db950ce97024660669be7d68f89f7325920e51953aa585a77847cd79d7f2dc6@324224017          8K      -   441M  -
rpool/ROOT/pve-1/353197f9a7babaaccea669440d4c131b381f002c49b923d4934090cbafa8b3bf@924237964          8K      -  59.3M  -
rpool/ROOT/pve-1/51fa3a1f6638ad8a12c4718549017ae72391e32b84cb163479b071dbab741946-init@205453598     0B      -   441M  -
rpool/ROOT/pve-1/5b728d4662831898b05610db35b74cc2040d707f259a7c4abe627ce40749d342@740478434          8K      -  90.2M  -
rpool/ROOT/pve-1/5de5062b7f279f9db04437d13e1a2c6eae6714a40e62a3643fa3fb14705082a4@584052378          8K      -  59.3M  -
rpool/ROOT/pve-1/5f4232890affdb03f4098f5d7c14728ec51a647dce2f02621aa8ef6bd3ae0d81@158039809          8K      -   440M  -
rpool/ROOT/pve-1/6698d16215658053848de14679fdeda2ee3053fc0f6ea0b8c6f5c6d43568a993@672362295          8K      -  64.1M  -
rpool/ROOT/pve-1/7f14abd3f29b84f5d090055237fa5bea8088798594ec05232f2bf7c8da758b26@65366072           8K      -   439M  -
rpool/ROOT/pve-1/95977d88d3173716aa400640f4561f0cf596dff00ad4ee1c0aad470fa473815b@109495860          0B      -   441M  -
rpool/ROOT/pve-1/95977d88d3173716aa400640f4561f0cf596dff00ad4ee1c0aad470fa473815b@647135761          0B      -   441M  -
rpool/ROOT/pve-1/9e9165fd3413b7cc21b40377b0479df40b9bb1eb96455c2e74a619a0490eab8c@487929142          8K      -   441M  -
rpool/ROOT/pve-1/a22e53e9dd3051af4e6d03256ce5176f73a0faefca361871fd21a1185bd294c9-init@749503870     0B      -   441M  -
rpool/ROOT/pve-1/b50b679292ed86575b43310aa1530f48b8bea7aafb64fa2f2f061006b767696d@421214420          8K      -  64.1M  -
rpool/ROOT/pve-1/ec4eb1aea4e29a8dd79fe0823649bf04fff1e479543e61815ab987966da81a4e@404108242          8K      -  59.3M  -

I removed as many as I could, but the remaining ones say "dataset is busy" when I try to destroy them.
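(When a snapshot refuses to be destroyed with "dataset is busy", it usually has a hold or a dependent clone; these standard commands can show which, with the dataset and snapshot names as placeholders:)
Code:
zfs holds rpool/ROOT/pve-1/<dataset>@<snapshot>        # lists any user holds on the snapshot
zfs list -t all -o name,origin -r rpool/ROOT/pve-1     # a clone shows the snapshot as its origin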
 
Any chance you installed Docker prior to this problem happening? I did, and I am running into the same problem.
 
These are all filesystems IN rpool/ROOT/pve-1. Setting canmount=off on each of them, as below, will let you boot cleanly:

Code:
zfs set canmount=off rpool/ROOT/pve-1/014579f53cfb31770b23418d4e9f6c62a44052ffa4d08386752d085ff5f58eec
zfs set canmount=off rpool/ROOT/pve-1/0bbb0c12579736d6363a87311f0d092805cd97bb4baedc8ffb5f64e8fb9fe83d
zfs set canmount=off rpool/ROOT/pve-1/14e89baf9818049fef21fa96340b86e554b7be728a0b85ccf7455482e84ff8ad
zfs set canmount=off rpool/ROOT/pve-1/1db950ce97024660669be7d68f89f7325920e51953aa585a77847cd79d7f2dc6
zfs set canmount=off rpool/ROOT/pve-1/353197f9a7babaaccea669440d4c131b381f002c49b923d4934090cbafa8b3bf
zfs set canmount=off rpool/ROOT/pve-1/51fa3a1f6638ad8a12c4718549017ae72391e32b84cb163479b071dbab741946-init
zfs set canmount=off rpool/ROOT/pve-1/5b728d4662831898b05610db35b74cc2040d707f259a7c4abe627ce40749d342
zfs set canmount=off rpool/ROOT/pve-1/5de5062b7f279f9db04437d13e1a2c6eae6714a40e62a3643fa3fb14705082a4
zfs set canmount=off rpool/ROOT/pve-1/5f4232890affdb03f4098f5d7c14728ec51a647dce2f02621aa8ef6bd3ae0d81
zfs set canmount=off rpool/ROOT/pve-1/6698d16215658053848de14679fdeda2ee3053fc0f6ea0b8c6f5c6d43568a993
zfs set canmount=off rpool/ROOT/pve-1/7f14abd3f29b84f5d090055237fa5bea8088798594ec05232f2bf7c8da758b26
zfs set canmount=off rpool/ROOT/pve-1/95977d88d3173716aa400640f4561f0cf596dff00ad4ee1c0aad470fa473815b
zfs set canmount=off rpool/ROOT/pve-1/9e9165fd3413b7cc21b40377b0479df40b9bb1eb96455c2e74a619a0490eab8c
zfs set canmount=off rpool/ROOT/pve-1/a22e53e9dd3051af4e6d03256ce5176f73a0faefca361871fd21a1185bd294c9-init
zfs set canmount=off rpool/ROOT/pve-1/b50b679292ed86575b43310aa1530f48b8bea7aafb64fa2f2f061006b767696d
zfs set canmount=off rpool/ROOT/pve-1/ec4eb1aea4e29a8dd79fe0823649bf04fff1e479543e61815ab987966da81a4e

But consider putting your Docker data in its own filesystem, say rpool/docker.
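A minimal sketch of that, assuming Docker's data lives in the default /var/lib/docker and you want Docker's ZFS storage driver (the dataset name rpool/docker is just an example):
Code:
systemctl stop docker                                  # stop Docker so /var/lib/docker is not in use
# move any existing contents of /var/lib/docker aside first if you want to keep them
zfs create -o mountpoint=/var/lib/docker rpool/docker
# optionally, in /etc/docker/daemon.json, select the ZFS storage driver:
#   { "storage-driver": "zfs" }
systemctl start docker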
 
I never wanted to put my Docker stuff anywhere in particular; I just installed it to test something and then removed it. Still quite the surprise :)