Unable to start any LXC containers after reboot

justjosh

Member
Nov 4, 2019
73
0
6
55
Hi guys,

I recently made a network change, so I had to reboot the Proxmox host. Before rebooting, all CTs were working perfectly, and I had just spun up a new LXC container right before the reboot.

Code:
Sep 25 09:42:57 proxmox systemd[1]: Starting PVE LXC Container: 101...
Sep 25 09:42:58 proxmox lxc-start[4603]: lxc-start: 101: lxccontainer.c: wait_on_daemonized_start: 865 No such file or directory - Failed to receive the container state
Sep 25 09:42:58 proxmox lxc-start[4603]: lxc-start: 101: tools/lxc_start.c: main: 329 The container failed to start
Sep 25 09:42:58 proxmox lxc-start[4603]: lxc-start: 101: tools/lxc_start.c: main: 332 To get more details, run the container in foreground mode
Sep 25 09:42:58 proxmox lxc-start[4603]: lxc-start: 101: tools/lxc_start.c: main: 335 Additional information can be obtained by setting the --logfile and --logpriority options
Sep 25 09:42:58 proxmox systemd[1]: pve-container@101.service: Control process exited, code=exited, status=1/FAILURE
Sep 25 09:42:58 proxmox systemd[1]: pve-container@101.service: Failed with result 'exit-code'.
Sep 25 09:42:58 proxmox pvedaemon[3546]: unable to get PID for CT 101 (not running?)
Sep 25 09:42:58 proxmox systemd[1]: Failed to start PVE LXC Container: 101.
Sep 25 09:42:58 proxmox pvedaemon[4601]: command 'systemctl start pve-container@101' failed: exit code 1
Sep 25 09:42:58 proxmox kernel: lxc-start[4607]: segfault at 50 ip 00007f342c275f8b sp 00007ffef624e010 error 4 in liblxc.so.1.6.0[7f342c21c000+8a000]
Sep 25 09:42:58 proxmox kernel: Code: 9b c0 ff ff 4d 85 ff 0f 85 82 02 00 00 66 90 48 8b 73 50 48 8b bb f8 00 00 00 e8 80 78 fa ff 4c 8b 74 24 10 48 89 de 4c 89 f7 <41> ff 56 50 4c 89 f7 48 89 de 41 ff 56 58 48 8b 83 f8 00 00 00 8b

Code:
# lxc-start -n 101 -F
lxc-start: 101: conf.c: run_buffer: 352 Script exited with status 2
lxc-start: 101: start.c: lxc_init: 897 Failed to run lxc.hook.pre-start for container "101"
lxc-start: 101: start.c: __lxc_start: 2032 Failed to initialize container "101"

The container OS disks are stored on an SSD-backed ZFS pool:

Code:
  pool: SSD
state: ONLINE
  scan: scrub repaired 0B in 0 days 00:00:21 with 0 errors on Sun Sep 13 00:24:26 2020

I tried mounting the volume via the CLI; it appears to be missing:

Code:
# pct mount 101
mounting container failed
cannot open directory //SSD/subvol-101-disk-0: No such file or directory

When I check the /SSD directory, it is empty.

However, in the Proxmox GUI the storage still shows all the CT volumes.

Any ideas?

PVE is 6.1-3

Thanks
 

justjosh

Hi,

Could you follow the steps under "Obtaining Debugging Logs" at the link below, and post any output from the command as well as the contents of /tmp/lxc-ID.log?

https://pve.proxmox.com/wiki/Linux_...ers_with_tt_span_class_monospaced_pct_span_tt


Code:
lxc-start 101 20200925021614.161 INFO     confile - confile.c:set_config_idmaps:2003 - Read uid map: type u nsid 0 hostid 100000 range 65536
lxc-start 101 20200925021614.161 INFO     confile - confile.c:set_config_idmaps:2003 - Read uid map: type g nsid 0 hostid 100000 range 65536
lxc-start 101 20200925021614.165 INFO     lsm - lsm/lsm.c:lsm_init:50 - LSM security driver AppArmor
lxc-start 101 20200925021614.166 INFO     seccomp - seccomp.c:parse_config_v2:789 - Processing "reject_force_umount  # comment this to allow umount -f;  not recommended"
lxc-start 101 20200925021614.166 INFO     seccomp - seccomp.c:do_resolve_add_rule:535 - Set seccomp rule to reject force umounts
lxc-start 101 20200925021614.166 INFO     seccomp - seccomp.c:parse_config_v2:975 - Added native rule for arch 0 for reject_force_umount action 0(kill)
lxc-start 101 20200925021614.166 INFO     seccomp - seccomp.c:do_resolve_add_rule:535 - Set seccomp rule to reject force umounts
lxc-start 101 20200925021614.167 INFO     seccomp - seccomp.c:parse_config_v2:984 - Added compat rule for arch 1073741827 for reject_force_umount action 0(kill)
lxc-start 101 20200925021614.167 INFO     seccomp - seccomp.c:do_resolve_add_rule:535 - Set seccomp rule to reject force umounts
lxc-start 101 20200925021614.167 INFO     seccomp - seccomp.c:parse_config_v2:994 - Added compat rule for arch 1073741886 for reject_force_umount action 0(kill)
lxc-start 101 20200925021614.167 INFO     seccomp - seccomp.c:do_resolve_add_rule:535 - Set seccomp rule to reject force umounts
lxc-start 101 20200925021614.167 INFO     seccomp - seccomp.c:parse_config_v2:1004 - Added native rule for arch -1073741762 for reject_force_umount action 0(kill)
lxc-start 101 20200925021614.167 INFO     seccomp - seccomp.c:parse_config_v2:789 - Processing "[all]"
lxc-start 101 20200925021614.167 INFO     seccomp - seccomp.c:parse_config_v2:789 - Processing "kexec_load errno 1"
lxc-start 101 20200925021614.167 INFO     seccomp - seccomp.c:parse_config_v2:975 - Added native rule for arch 0 for kexec_load action 327681(errno)
lxc-start 101 20200925021614.168 INFO     seccomp - seccomp.c:parse_config_v2:984 - Added compat rule for arch 1073741827 for kexec_load action 327681(errno)
lxc-start 101 20200925021614.168 INFO     seccomp - seccomp.c:parse_config_v2:994 - Added compat rule for arch 1073741886 for kexec_load action 327681(errno)
lxc-start 101 20200925021614.168 INFO     seccomp - seccomp.c:parse_config_v2:1004 - Added native rule for arch -1073741762 for kexec_load action 327681(errno)
lxc-start 101 20200925021614.168 INFO     seccomp - seccomp.c:parse_config_v2:789 - Processing "open_by_handle_at errno 1"
lxc-start 101 20200925021614.168 INFO     seccomp - seccomp.c:parse_config_v2:975 - Added native rule for arch 0 for open_by_handle_at action 327681(errno)
lxc-start 101 20200925021614.168 INFO     seccomp - seccomp.c:parse_config_v2:984 - Added compat rule for arch 1073741827 for open_by_handle_at action 327681(errno)
lxc-start 101 20200925021614.169 INFO     seccomp - seccomp.c:parse_config_v2:994 - Added compat rule for arch 1073741886 for open_by_handle_at action 327681(errno)
lxc-start 101 20200925021614.169 INFO     seccomp - seccomp.c:parse_config_v2:1004 - Added native rule for arch -1073741762 for open_by_handle_at action 327681(errno)
lxc-start 101 20200925021614.169 INFO     seccomp - seccomp.c:parse_config_v2:789 - Processing "init_module errno 1"
lxc-start 101 20200925021614.169 INFO     seccomp - seccomp.c:parse_config_v2:975 - Added native rule for arch 0 for init_module action 327681(errno)
lxc-start 101 20200925021614.169 INFO     seccomp - seccomp.c:parse_config_v2:984 - Added compat rule for arch 1073741827 for init_module action 327681(errno)
lxc-start 101 20200925021614.170 INFO     seccomp - seccomp.c:parse_config_v2:994 - Added compat rule for arch 1073741886 for init_module action 327681(errno)
lxc-start 101 20200925021614.170 INFO     seccomp - seccomp.c:parse_config_v2:1004 - Added native rule for arch -1073741762 for init_module action 327681(errno)
lxc-start 101 20200925021614.170 INFO     seccomp - seccomp.c:parse_config_v2:789 - Processing "finit_module errno 1"
lxc-start 101 20200925021614.170 INFO     seccomp - seccomp.c:parse_config_v2:975 - Added native rule for arch 0 for finit_module action 327681(errno)
lxc-start 101 20200925021614.170 INFO     seccomp - seccomp.c:parse_config_v2:984 - Added compat rule for arch 1073741827 for finit_module action 327681(errno)
lxc-start 101 20200925021614.170 INFO     seccomp - seccomp.c:parse_config_v2:994 - Added compat rule for arch 1073741886 for finit_module action 327681(errno)
lxc-start 101 20200925021614.171 INFO     seccomp - seccomp.c:parse_config_v2:1004 - Added native rule for arch -1073741762 for finit_module action 327681(errno)
lxc-start 101 20200925021614.171 INFO     seccomp - seccomp.c:parse_config_v2:789 - Processing "delete_module errno 1"
lxc-start 101 20200925021614.171 INFO     seccomp - seccomp.c:parse_config_v2:975 - Added native rule for arch 0 for delete_module action 327681(errno)
lxc-start 101 20200925021614.171 INFO     seccomp - seccomp.c:parse_config_v2:984 - Added compat rule for arch 1073741827 for delete_module action 327681(errno)
lxc-start 101 20200925021614.171 INFO     seccomp - seccomp.c:parse_config_v2:994 - Added compat rule for arch 1073741886 for delete_module action 327681(errno)
lxc-start 101 20200925021614.171 INFO     seccomp - seccomp.c:parse_config_v2:1004 - Added native rule for arch -1073741762 for delete_module action 327681(errno)
lxc-start 101 20200925021614.171 INFO     seccomp - seccomp.c:parse_config_v2:789 - Processing "keyctl errno 38"
lxc-start 101 20200925021614.172 INFO     seccomp - seccomp.c:parse_config_v2:975 - Added native rule for arch 0 for keyctl action 327718(errno)
lxc-start 101 20200925021614.172 INFO     seccomp - seccomp.c:parse_config_v2:984 - Added compat rule for arch 1073741827 for keyctl action 327718(errno)
lxc-start 101 20200925021614.172 INFO     seccomp - seccomp.c:parse_config_v2:994 - Added compat rule for arch 1073741886 for keyctl action 327718(errno)
lxc-start 101 20200925021614.172 INFO     seccomp - seccomp.c:parse_config_v2:1004 - Added native rule for arch -1073741762 for keyctl action 327718(errno)
lxc-start 101 20200925021614.172 INFO     seccomp - seccomp.c:parse_config_v2:1008 - Merging compat seccomp contexts into main context
lxc-start 101 20200925021614.173 INFO     conf - conf.c:run_script_argv:372 - Executing script "/usr/share/lxc/hooks/lxc-pve-prestart-hook" for container "101", config section "lxc"
lxc-start 101 20200925021614.519 DEBUG    conf - conf.c:run_buffer:340 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook 101 lxc pre-start produced output: cannot open directory //SSD/subvol-101-disk-0: No such file or directory

lxc-start 101 20200925021614.527 ERROR    conf - conf.c:run_buffer:352 - Script exited with status 2
lxc-start 101 20200925021614.527 ERROR    start - start.c:lxc_init:897 - Failed to run lxc.hook.pre-start for container "101"
lxc-start 101 20200925021614.527 ERROR    start - start.c:__lxc_start:2032 - Failed to initialize container "101"
 

justjosh

Using `zfs mount -O SSD` does cause the volumes to show up, but the log output is still the same when trying to start the CT.

Code:
# zfs mount -O SSD

Unfortunately, it seems that the volumes are of the wrong size with this mount.

Code:
# ls /SSD -l --block-size=M
total 1M
drwxr----- 3 root root 1M Sep 25 19:49 subvol-101-disk-0
drwxr----- 3 root root 1M Sep 25 19:50 subvol-102-disk-0
...

Code:
# pct mount 101
mounted CT 101 in '/var/lib/lxc/101/rootfs'

Code:
# ls /var/lib/lxc/101/rootfs -l
total 1
drwxr-xr-x 2 root root 2 Sep 25 19:49 dev

Code:
lxc-start 101 20200925115026.991 INFO     confile - confile.c:set_config_idmaps:2003 - Read uid map: type u nsid 0 hostid 100000 range 65536
lxc-start 101 20200925115026.991 INFO     confile - confile.c:set_config_idmaps:2003 - Read uid map: type g nsid 0 hostid 100000 range 65536
lxc-start 101 20200925115026.992 INFO     lsm - lsm/lsm.c:lsm_init:50 - LSM security driver AppArmor
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:parse_config_v2:789 - Processing "reject_force_umount  # comment this to allow umount -f;  not recommended"
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:do_resolve_add_rule:535 - Set seccomp rule to reject force umounts
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:parse_config_v2:975 - Added native rule for arch 0 for reject_force_umount action 0(kill)
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:do_resolve_add_rule:535 - Set seccomp rule to reject force umounts
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:parse_config_v2:984 - Added compat rule for arch 1073741827 for reject_force_umount action 0(kill)
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:do_resolve_add_rule:535 - Set seccomp rule to reject force umounts
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:parse_config_v2:994 - Added compat rule for arch 1073741886 for reject_force_umount action 0(kill)
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:do_resolve_add_rule:535 - Set seccomp rule to reject force umounts
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:parse_config_v2:1004 - Added native rule for arch -1073741762 for reject_force_umount action 0(kill)
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:parse_config_v2:789 - Processing "[all]"
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:parse_config_v2:789 - Processing "kexec_load errno 1"
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:parse_config_v2:975 - Added native rule for arch 0 for kexec_load action 327681(errno)
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:parse_config_v2:984 - Added compat rule for arch 1073741827 for kexec_load action 327681(errno)
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:parse_config_v2:994 - Added compat rule for arch 1073741886 for kexec_load action 327681(errno)
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:parse_config_v2:1004 - Added native rule for arch -1073741762 for kexec_load action 327681(errno)
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:parse_config_v2:789 - Processing "open_by_handle_at errno 1"
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:parse_config_v2:975 - Added native rule for arch 0 for open_by_handle_at action 327681(errno)
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:parse_config_v2:984 - Added compat rule for arch 1073741827 for open_by_handle_at action 327681(errno)
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:parse_config_v2:994 - Added compat rule for arch 1073741886 for open_by_handle_at action 327681(errno)
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:parse_config_v2:1004 - Added native rule for arch -1073741762 for open_by_handle_at action 327681(errno)
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:parse_config_v2:789 - Processing "init_module errno 1"
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:parse_config_v2:975 - Added native rule for arch 0 for init_module action 327681(errno)
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:parse_config_v2:984 - Added compat rule for arch 1073741827 for init_module action 327681(errno)
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:parse_config_v2:994 - Added compat rule for arch 1073741886 for init_module action 327681(errno)
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:parse_config_v2:1004 - Added native rule for arch -1073741762 for init_module action 327681(errno)
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:parse_config_v2:789 - Processing "finit_module errno 1"
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:parse_config_v2:975 - Added native rule for arch 0 for finit_module action 327681(errno)
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:parse_config_v2:984 - Added compat rule for arch 1073741827 for finit_module action 327681(errno)
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:parse_config_v2:994 - Added compat rule for arch 1073741886 for finit_module action 327681(errno)
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:parse_config_v2:1004 - Added native rule for arch -1073741762 for finit_module action 327681(errno)
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:parse_config_v2:789 - Processing "delete_module errno 1"
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:parse_config_v2:975 - Added native rule for arch 0 for delete_module action 327681(errno)
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:parse_config_v2:984 - Added compat rule for arch 1073741827 for delete_module action 327681(errno)
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:parse_config_v2:994 - Added compat rule for arch 1073741886 for delete_module action 327681(errno)
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:parse_config_v2:1004 - Added native rule for arch -1073741762 for delete_module action 327681(errno)
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:parse_config_v2:789 - Processing "keyctl errno 38"
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:parse_config_v2:975 - Added native rule for arch 0 for keyctl action 327718(errno)
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:parse_config_v2:984 - Added compat rule for arch 1073741827 for keyctl action 327718(errno)
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:parse_config_v2:994 - Added compat rule for arch 1073741886 for keyctl action 327718(errno)
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:parse_config_v2:1004 - Added native rule for arch -1073741762 for keyctl action 327718(errno)
lxc-start 101 20200925115026.992 INFO     seccomp - seccomp.c:parse_config_v2:1008 - Merging compat seccomp contexts into main context
lxc-start 101 20200925115026.993 INFO     conf - conf.c:run_script_argv:372 - Executing script "/usr/share/lxc/hooks/lxc-pve-prestart-hook" for container "101", config section "lxc"
lxc-start 101 20200925115027.500 DEBUG    conf - conf.c:run_buffer:340 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook 101 lxc pre-start produced output: unable to detect OS distribution

lxc-start 101 20200925115027.509 ERROR    conf - conf.c:run_buffer:352 - Script exited with status 2
lxc-start 101 20200925115027.509 ERROR    start - start.c:lxc_init:897 - Failed to run lxc.hook.pre-start for container "101"
lxc-start 101 20200925115027.509 ERROR    start - start.c:__lxc_start:2032 - Failed to initialize container "101"
 

justjosh

After further investigation, I managed a temporary band-aid fix by remounting every disk by hand. Not very practical, and the root cause is still not fixed either.

Code:
# rm -rf /SSD/subvol-101-disk-0/dev/
# zfs mount SSD/subvol-101-disk-0
...
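For reference, the band-aid above could be automated across the whole pool with something like this sketch (the pool name "SSD" is from this thread; everything else is an assumption, and the `find -empty` guard only deletes empty directories, so real container data is never removed):

```shell
#!/bin/sh
# Sketch: for every dataset under the given pool that ZFS reports as
# unmounted, remove empty leftover directories at its mountpoint
# (the ones created while the dataset was unmounted), then mount it.
remount_pool() {
    pool="$1"
    zfs list -H -o name -r "$pool" | while read -r ds; do
        [ "$ds" = "$pool" ] && continue
        if [ "$(zfs get -H -o value mounted "$ds")" = "no" ]; then
            mp="$(zfs get -H -o value mountpoint "$ds")"
            # -empty ensures only empty directories are deleted
            find "$mp" -mindepth 1 -type d -empty -delete 2>/dev/null
            zfs mount "$ds"
        fi
    done
}
# usage: remount_pool SSD
```

Verify the output of `find "$mp" -mindepth 1` by hand first if you are unsure whether the leftovers really are empty.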
 

Stoiko Ivanov

Proxmox Staff Member
Staff member
May 2, 2018
4,326
539
118
Which version are you running (`pveversion -v`)?

We recently included a patch that should help with ZFS subvolumes not being mounted for containers that are started on boot.
 

justjosh


Not sure which version you're looking for. I've been meaning to upgrade to 6.2, but I'm not going to do it until this issue is fixed - I don't want to risk breaking anything else.

Code:
# pveversion -v

proxmox-ve: 6.1-2 (running kernel: 5.3.10-1-pve)
pve-manager: 6.1-3 (running version: 6.1-3/37248ce6)
pve-kernel-5.3: 6.0-12
pve-kernel-helper: 6.0-12
pve-kernel-5.3.10-1-pve: 5.3.10-1
...
zfsutils-linux: 0.8.2-pve2
 

Stoiko Ivanov

Not sure which version you're looking for. I've been meaning to upgrade to 6.2 but not going to do it until this issue is fixed. Don't want to risk breaking anything else.

The version of pve-container would have been interesting, but I guess it will be one matching proxmox-ve 6.1-2.
The patches I'm referring to (e.g. https://lists.proxmox.com/pipermail/pve-devel/2020-July/044362.html)
were all added after the release of PVE 6.2 - so consider upgrading.

As a mitigation in this situation, it can also help to set the `cachefile` property for all zpools on your system to '/etc/zfs/zpool.cache' (the default value) - then the pools are imported early during boot and the subvolumes get mounted before the guests are started.
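That mitigation could be applied with something like the following sketch (`zpool set cachefile=...` is the standard OpenZFS command for this; looping over all pools is an assumption about the setup):

```shell
# Point every imported pool's cachefile property at the default cache
# file, so pools are imported early during boot before guests start.
set_cachefiles() {
    for pool in $(zpool list -H -o name); do
        zpool set cachefile=/etc/zfs/zpool.cache "$pool"
    done
}
# usage: set_cachefiles   (then verify with: zpool get cachefile)
```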

I hope this helps!
 

justjosh

pve-container: 3.0-14

The subvolumes are mounted but with an empty dev directory. Is it the same issue? Is it safe to upgrade to 6.2 with this error?
 

justjosh

So I just tried to do a vzdump of a container, and it tripped the ZFS mount issue again - all volumes disappeared.

Code:
# ls -l /SSD
total 0

I've never seen it do this while everything was running. vzdump ended with "error: unexpected status".

Code:
INFO: starting new backup job: vzdump 102 --remove 0 --compress lzo --storage local --node proxmox --mode snapshot
INFO: filesystem type on dumpdir is 'zfs' -using /var/tmp/vzdumptmp8068 for temporary files
INFO: Starting Backup of VM 102 (lxc)
INFO: Backup started at 2020-09-25 22:42:35
INFO: status = running
INFO: CT Name: Transmission
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
INFO: creating archive '/var/lib/vz/dump/vzdump-lxc-102-2020_09_25-22_42_35.tar.lzo'
 

Stoiko Ivanov

pve-container: 3.0-14
My guess would be that the issue should be resolved by upgrading to the latest packages.


The subvolumes are mounted but with an empty dev directory
If you are in this situation again, check the ZFS properties of the dataset:
Code:
zfs get all SSD/subvol-101-disk-0

Is it safe to upgrade to 6.2 with this error?
As said, the latest packages contain fixes for problems which (at first look) seem similar to the one you're currently experiencing.

Usually, PVE upgrades are rather smooth and don't introduce regressions.

I hope this helps!
 

justjosh

I've just upgraded to 6.2 and rebooted. The CTs won't start.

Code:
mount_autodev: 1074 Permission denied - Failed to create "/dev" directory
lxc_setup: 3238 Failed to mount "/dev"
do_start: 1224 Failed to setup container "101"
__sync_wait: 41 An error occurred in another process (expected sequence number 5)
__lxc_start: 1950 Failed to spawn container "101"
TASK ERROR: startup for container '101' failed

I checked the SSD pool and I see all the CT volumes listed, but when checking their contents, they are all empty.

Code:
# ls -l /SSD/subvol-101-disk-0/
total 0

Unmounting says that they are not currently mounted, so I just mounted them, which fixed the issue.

Code:
# zfs unmount SSD/subvol-101-disk-0
cannot unmount 'SSD/subvol-101-disk-0': not currently mounted
# zfs mount SSD/subvol-101-disk-0
# ls -l /SSD/subvol-101-disk-0/
total 149


Again, this is still just a band-aid fix.
 

Stoiko Ivanov

Please post the output of:
zfs get all SSD/subvol-101-disk-0

Also check the journal since boot for any messages from zfs / zpool.

If you export the zpool, there should be no directories/files below its mountpoint (you can only do so if all containers are stopped).
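That export check can be scripted as a quick sanity test (a sketch - the pool name is from this thread, and as noted it must run with all containers stopped):

```shell
# After exporting the pool, nothing should remain under its mountpoint;
# any path printed here is a stray directory, not ZFS data.
check_stray() {
    find "${1:-/SSD}" -mindepth 1 -print
}
# usage: zpool export SSD && check_stray /SSD
```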

I hope this helps!
 

justjosh

Code:
# zfs get all SSD/subvol-101-disk-0
NAME                   PROPERTY              VALUE                   SOURCE
SSD/subvol-101-disk-0  type                  filesystem              -
SSD/subvol-101-disk-0  creation              Fri Sep 25  8:02 2020   -
SSD/subvol-101-disk-0  used                  2.18G                   -
SSD/subvol-101-disk-0  available             37.8G                   -
SSD/subvol-101-disk-0  referenced            2.18G                   -
SSD/subvol-101-disk-0  compressratio         1.38x                   -
SSD/subvol-101-disk-0  mounted               yes                     -
SSD/subvol-101-disk-0  quota                 none                    default
SSD/subvol-101-disk-0  reservation           none                    default
SSD/subvol-101-disk-0  recordsize            128K                    default
SSD/subvol-101-disk-0  mountpoint            /SSD/subvol-101-disk-0  default
SSD/subvol-101-disk-0  sharenfs              off                     default
SSD/subvol-101-disk-0  checksum              on                      default
SSD/subvol-101-disk-0  compression           on                      inherited from SSD
SSD/subvol-101-disk-0  atime                 on                      default
SSD/subvol-101-disk-0  devices               on                      default
SSD/subvol-101-disk-0  exec                  on                      default
SSD/subvol-101-disk-0  setuid                on                      default
SSD/subvol-101-disk-0  readonly              off                     default
SSD/subvol-101-disk-0  zoned                 off                     default
SSD/subvol-101-disk-0  snapdir               hidden                  default
SSD/subvol-101-disk-0  aclinherit            restricted              default
SSD/subvol-101-disk-0  createtxg             3526956                 -
SSD/subvol-101-disk-0  canmount              on                      default
SSD/subvol-101-disk-0  xattr                 sa                      local
SSD/subvol-101-disk-0  copies                1                       default
SSD/subvol-101-disk-0  version               5                       -
SSD/subvol-101-disk-0  utf8only              off                     -
SSD/subvol-101-disk-0  normalization         none                    -
SSD/subvol-101-disk-0  casesensitivity       sensitive               -
SSD/subvol-101-disk-0  vscan                 off                     default
SSD/subvol-101-disk-0  nbmand                off                     default
SSD/subvol-101-disk-0  sharesmb              off                     default
SSD/subvol-101-disk-0  refquota              40G                     local
SSD/subvol-101-disk-0  refreservation        none                    default
SSD/subvol-101-disk-0  guid                  11847681519287553407    -
SSD/subvol-101-disk-0  primarycache          all                     default
SSD/subvol-101-disk-0  secondarycache        all                     default
SSD/subvol-101-disk-0  usedbysnapshots       0B                      -
SSD/subvol-101-disk-0  usedbydataset         2.18G                   -
SSD/subvol-101-disk-0  usedbychildren        0B                      -
SSD/subvol-101-disk-0  usedbyrefreservation  0B                      -
SSD/subvol-101-disk-0  logbias               latency                 default
SSD/subvol-101-disk-0  objsetid              1671                    -
SSD/subvol-101-disk-0  dedup                 off                     default
SSD/subvol-101-disk-0  mlslabel              none                    default
SSD/subvol-101-disk-0  sync                  standard                default
SSD/subvol-101-disk-0  dnodesize             legacy                  default
SSD/subvol-101-disk-0  refcompressratio      1.38x                   -
SSD/subvol-101-disk-0  written               2.18G                   -
SSD/subvol-101-disk-0  logicalused           2.74G                   -
SSD/subvol-101-disk-0  logicalreferenced     2.74G                   -
SSD/subvol-101-disk-0  volmode               default                 default
SSD/subvol-101-disk-0  filesystem_limit      none                    default
SSD/subvol-101-disk-0  snapshot_limit        none                    default
SSD/subvol-101-disk-0  filesystem_count      none                    default
SSD/subvol-101-disk-0  snapshot_count        none                    default
SSD/subvol-101-disk-0  snapdev               hidden                  default
SSD/subvol-101-disk-0  acltype               posixacl                local
SSD/subvol-101-disk-0  context               none                    default
SSD/subvol-101-disk-0  fscontext             none                    default
SSD/subvol-101-disk-0  defcontext            none                    default
SSD/subvol-101-disk-0  rootcontext           none                    default
SSD/subvol-101-disk-0  relatime              off                     default
SSD/subvol-101-disk-0  redundant_metadata    all                     default
SSD/subvol-101-disk-0  overlay               off                     default
SSD/subvol-101-disk-0  encryption            off                     default
SSD/subvol-101-disk-0  keylocation           none                    default
SSD/subvol-101-disk-0  keyformat             none                    default
SSD/subvol-101-disk-0  pbkdf2iters           0                       default
SSD/subvol-101-disk-0  special_small_blocks  0                       default
 

justjosh

Where should I look for the logs possibly related to this?

I have some CTs that directly mount directories from the ZFS pool - for example, CT 102 has mp0 = /mnt/shared_a/. I've noticed that when I do an ls -l of the volume, these mount points exist in the subvol disk, but with size 0 and empty contents.

I have a theory that these are being mounted before the ZFS pool comes online, and that is why Proxmox cannot later mount the subvol with the OS disk.

Code:
# ls -l /SSD/subvol-102-disk-0/
>>> Only /mnt directory

# ls -l /SSD/subvol-102-disk-0/mnt/shared_a/
total 0

# zfs mount SSD/subvol-102-disk-0
cannot mount 'SSD/subvol-102-disk-0': directory not empty

# rm -rf /SSD/subvol-102-disk-0/mnt/
# zfs mount SSD/subvol-102-disk-0
 

Stoiko Ivanov

IIRC you need to clean up the directories which were created by pct while the dataset was not mounted (this is the reason for the "cannot mount 'SSD/subvol-102-disk-0': directory not empty" message):
* Once a container cannot start, check all directories related to that container - if 'dev/' or 'mnt/' (or something else) got created, remove them.
* It is probably safest to stop the containers first.
* Once done, reboot - all containers should start up successfully.

If this does not help, you can try to set the cachefile for each pool on your system - that should mitigate the issue.

I hope this helps!
 

justjosh

I have been cleaning up the directories after every reboot, but they keep getting recreated on the next reboot of the host. It is very annoying and time-consuming, since I need to make sure each directory is actually empty before removing it. I need a way to stop pct from creating the mountpoint directories before the dataset is mounted on boot - maybe some sort of callback or delay.
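One hypothetical way to get such a delay (not something suggested in this thread - the unit names `pve-guests.service` and `zfs-mount.service` and the drop-in path are assumptions based on standard PVE/systemd/ZFS setups) would be a systemd drop-in so guest autostart waits for ZFS mounts:

```shell
# Write a systemd drop-in making pve-guests.service (which autostarts
# guests on boot) wait until ZFS datasets are mounted.
write_zfs_dropin() {
    dir="${1:-/etc/systemd/system/pve-guests.service.d}"
    mkdir -p "$dir"
    cat > "$dir/wait-for-zfs.conf" <<'EOF'
[Unit]
After=zfs-mount.service zfs.target
Requires=zfs-mount.service
EOF
}
# usage: write_zfs_dropin && systemctl daemon-reload
```

With encrypted pools that need manual unlocking, ordering alone may not be enough, so treat this strictly as a sketch to test.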
 

Stoiko Ivanov

If you've upgraded to 6.2 (and have pve-container >= 3.2-1; 3.2-2 is available in pvetest), then this seems like a different issue...

check the journal after a reboot (`journalctl -b`) for all messages matching:
* 'zfs|ZFS'
* 'container' (also case-insensitive)
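Those two searches can be combined into one case-insensitive filter (a minimal sketch; `zpool` is added to the pattern since pool-import messages are also of interest):

```shell
# Filter a log stream for ZFS-, zpool-, or container-related lines,
# case-insensitively (-i), using an extended regex (-E).
filter_boot_log() {
    grep -Ei 'zfs|zpool|container'
}
# usage: journalctl -b | filter_boot_log
```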
 

justjosh

proxmox-ve: 6.2-2 (running kernel: 5.4.60-1-pve)
pve-container: 3.2-1

There don't seem to be any errors for ZFS on boot. The container only shows the failure to start with exit code 1. Not sure if it has something to do with the rootdelay I added for boot because the pools are encrypted.
Code:
# journalctl -b | grep zfs
Sep 27 23:49:20 proxmox kernel: Command line: BOOT_IMAGE=/ROOT/pve-1@/boot/vmlinuz-5.4.60-1-pve root=ZFS=rpool/ROOT/pve-1 ro root=ZFS=rpool/ROOT/pve-1 boot=zfs rootdelay=20 quiet
Sep 27 23:49:20 proxmox kernel: Kernel command line: BOOT_IMAGE=/ROOT/pve-1@/boot/vmlinuz-5.4.60-1-pve root=ZFS=rpool/ROOT/pve-1 ro root=ZFS=rpool/ROOT/pve-1 boot=zfs rootdelay=20 quiet
Sep 28 00:01:26 proxmox pmxcfs[3047]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/proxmox/local-zfs: -1
Sep 28 22:56:01 proxmox kernel:  ? zpl_readpage+0x9f/0xe0 [zfs]
Sep 29 00:01:51 proxmox pmxcfs[3047]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/proxmox/local-zfs: -1
Sep 29 00:02:49 proxmox pmxcfs[3047]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/proxmox/local-zfs: -1
 
