VM and CT recovery after storage failure

et1000

New Member
Jan 11, 2024
France
Hello!

My homelab runs PVE 8.4.14. The main storage is two mirrored NVMe drives with ZFS.
After a power outage, the storage got partially corrupted (PANIC: zfs adding existent segment to range tree). I managed to back up the important VMs (VyOS router, Veeam B&R) to a SATA hard drive with dd, and some containers with cp -a.

I mounted the HDD as a Directory storage and I was able to recover the VyOS VM:
Code:
qm importdisk 101 /mnt/hdd/vm-101-disk-0.img hdd
qm set 101 -scsi0 hdd:101/vm-101-disk-0.raw
qm set 101 --boot order=scsi0

But I am struggling with the containers, and with the Veeam VM, which had snapshots.

For the containers, when I create a new one on the HDD storage, its disk is a .raw image, not a subvol.
I managed to create a subvol container from the CLI, but after I copy the old content into it, it won't boot:
Code:
sync_wait: 34 An error occurred in another process (expected sequence number 7)
__lxc_start: 2114 Failed to spawn container "105"
TASK ERROR: startup for container '105' failed
How can I make the container run on the HDD storage?

For the Veeam VM I have some vm-113-state-... files, but as I understand it, I can't have snapshots with raw disks on a Directory storage. So I removed everything related to snapshots from the config (parent=, ...), but the VM still won't boot. The console displays: guest has not initialized the display (yet). I might have to build a clean new VM and import a Veeam config backup, but I would be happy if I could get this VM to run.

Thanks for your help
Emile
 
Hi,

Code:
root@pve:~# pct config 105
arch: amd64
cores: 1
description: <div align='center'><a href='https://Helper-Scripts.com'><img src='https://raw.githubusercontent.com/tteck/Proxmox/main/misc/images/logo-81x112.png'/></a>%0A%0A  # Nginx Proxy Manager LXC%0A%0A  <a href='https://ko-fi.com/D1D7EP4GF'><img src='https://img.shields.io/badge/&#x2615;-Buy me a coffee-blue' /></a>%0A  </div>%0A
features: keyctl=1,nesting=1
hostname: nginxproxymanager
memory: 1024
net0: name=eth0,bridge=vmbr1,gw=192.168.30.1,hwaddr=BC:24:11:12:DA:CA,ip=192.168.30.2/24,tag=30,type=veth
onboot: 1
ostype: debian
rootfs: hdd:105/subvol-105-disk-0.subvol,size=4G
swap: 512
tags: proxmox-helper-scripts
unprivileged: 1

Code:
root@pve:~# pct start 105 --debug
sync_wait: 34 An error occurred in another process (expected sequence number 7)
__lxc_start: 2114 Failed to spawn container "105"
- ../src/lxc/confile.c:set_config_idmaps:2273 - Read uid map: type g nsid 0 hostid 100000 range 65536
INFO     lsm - ../src/lxc/lsm/lsm.c:lsm_init_static:38 - Initialized LSM security driver AppArmor
INFO     utils - ../src/lxc/utils.c:run_script_argv:587 - Executing script "/usr/share/lxc/hooks/lxc-pve-prestart-hook" for container "105", config section "lxc"
DEBUG    utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook 105 lxc pre-start produced output: /etc/os-release file not found and autodetection failed, falling back to 'unmanaged'
WARNING: /etc not present in CT, is the rootfs mounted?
got unexpected ostype (unmanaged != debian)

INFO     cgfsng - ../src/lxc/cgroups/cgfsng.c:unpriv_systemd_create_scope:1498 - Running privileged, not using a systemd unit
DEBUG    seccomp - ../src/lxc/seccomp.c:parse_config_v2:664 - Host native arch is [3221225534]
INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "reject_force_umount  # comment this to allow umount -f;  not recommended"
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:532 - Set seccomp rule to reject force umounts
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:532 - Set seccomp rule to reject force umounts
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:532 - Set seccomp rule to reject force umounts
INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "[all]"
INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "kexec_load errno 1"
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding native rule for syscall[246:kexec_load] action[327681:errno] arch[0]
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[246:kexec_load] action[327681:errno] arch[1073741827]
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[246:kexec_load] action[327681:errno] arch[1073741886]
INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "open_by_handle_at errno 1"
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding native rule for syscall[304:open_by_handle_at] action[327681:errno] arch[0]
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[304:open_by_handle_at] action[327681:errno] arch[1073741827]
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[304:open_by_handle_at] action[327681:errno] arch[1073741886]
INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "init_module errno 1"
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding native rule for syscall[175:init_module] action[327681:errno] arch[0]
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[175:init_module] action[327681:errno] arch[1073741827]
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[175:init_module] action[327681:errno] arch[1073741886]
INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "finit_module errno 1"
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding native rule for syscall[313:finit_module] action[327681:errno] arch[0]
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[313:finit_module] action[327681:errno] arch[1073741827]
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[313:finit_module] action[327681:errno] arch[1073741886]
INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "delete_module errno 1"
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding native rule for syscall[176:delete_module] action[327681:errno] arch[0]
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[176:delete_module] action[327681:errno] arch[1073741827]
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[176:delete_module] action[327681:errno] arch[1073741886]
INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:815 - Processing "ioctl errno 1 [1,0x9400,SCMP_CMP_MASKED_EQ,0xff00]"
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:555 - arg_cmp[0]: SCMP_CMP(1, 7, 65280, 37888)
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding native rule for syscall[16:ioctl] action[327681:errno] arch[0]
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:555 - arg_cmp[0]: SCMP_CMP(1, 7, 65280, 37888)
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[16:ioctl] action[327681:errno] arch[1073741827]
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:555 - arg_cmp[0]: SCMP_CMP(1, 7, 65280, 37888)
INFO     seccomp - ../src/lxc/seccomp.c:do_resolve_add_rule:572 - Adding compat rule for syscall[16:ioctl] action[327681:errno] arch[1073741886]
INFO     seccomp - ../src/lxc/seccomp.c:parse_config_v2:1036 - Merging compat seccomp contexts into main context
INFO     start - ../src/lxc/start.c:lxc_init:882 - Container "105" is initialized
INFO     cgfsng - ../src/lxc/cgroups/cgfsng.c:cgfsng_monitor_create:1669 - The monitor process uses "lxc.monitor/105" as cgroup
DEBUG    storage - ../src/lxc/storage/storage.c:storage_query:231 - Detected rootfs type "dir"
DEBUG    storage - ../src/lxc/storage/storage.c:storage_query:231 - Detected rootfs type "dir"
INFO     cgfsng - ../src/lxc/cgroups/cgfsng.c:cgfsng_payload_create:1777 - The container process uses "lxc/105/ns" as inner and "lxc/105" as limit cgroup
INFO     start - ../src/lxc/start.c:lxc_spawn:1769 - Cloned CLONE_NEWUSER
INFO     start - ../src/lxc/start.c:lxc_spawn:1769 - Cloned CLONE_NEWNS
INFO     start - ../src/lxc/start.c:lxc_spawn:1769 - Cloned CLONE_NEWPID
INFO     start - ../src/lxc/start.c:lxc_spawn:1769 - Cloned CLONE_NEWUTS
INFO     start - ../src/lxc/start.c:lxc_spawn:1769 - Cloned CLONE_NEWIPC
INFO     start - ../src/lxc/start.c:lxc_spawn:1769 - Cloned CLONE_NEWCGROUP
DEBUG    start - ../src/lxc/start.c:lxc_try_preserve_namespace:140 - Preserved user namespace via fd 17 and stashed path as user:/proc/382697/fd/17
DEBUG    start - ../src/lxc/start.c:lxc_try_preserve_namespace:140 - Preserved mnt namespace via fd 18 and stashed path as mnt:/proc/382697/fd/18
DEBUG    start - ../src/lxc/start.c:lxc_try_preserve_namespace:140 - Preserved pid namespace via fd 19 and stashed path as pid:/proc/382697/fd/19
DEBUG    start - ../src/lxc/start.c:lxc_try_preserve_namespace:140 - Preserved uts namespace via fd 20 and stashed path as uts:/proc/382697/fd/20
DEBUG    start - ../src/lxc/start.c:lxc_try_preserve_namespace:140 - Preserved ipc namespace via fd 21 and stashed path as ipc:/proc/382697/fd/21
DEBUG    start - ../src/lxc/start.c:lxc_try_preserve_namespace:140 - Preserved cgroup namespace via fd 22 and stashed path as cgroup:/proc/382697/fd/22
DEBUG    idmap_utils - ../src/lxc/idmap_utils.c:idmaptool_on_path_and_privileged:93 - The binary "/usr/bin/newuidmap" does have the setuid bit set
DEBUG    idmap_utils - ../src/lxc/idmap_utils.c:idmaptool_on_path_and_privileged:93 - The binary "/usr/bin/newgidmap" does have the setuid bit set
DEBUG    idmap_utils - ../src/lxc/idmap_utils.c:lxc_map_ids:178 - Functional newuidmap and newgidmap binary found
INFO     cgfsng - ../src/lxc/cgroups/cgfsng.c:cgfsng_setup_limits:3528 - Limits for the unified cgroup hierarchy have been setup
DEBUG    idmap_utils - ../src/lxc/idmap_utils.c:idmaptool_on_path_and_privileged:93 - The binary "/usr/bin/newuidmap" does have the setuid bit set
DEBUG    idmap_utils - ../src/lxc/idmap_utils.c:idmaptool_on_path_and_privileged:93 - The binary "/usr/bin/newgidmap" does have the setuid bit set
INFO     idmap_utils - ../src/lxc/idmap_utils.c:lxc_map_ids:176 - Caller maps host root. Writing mapping directly
NOTICE   utils - ../src/lxc/utils.c:lxc_drop_groups:1572 - Dropped supplimentary groups
INFO     start - ../src/lxc/start.c:do_start:1105 - Unshared CLONE_NEWNET
NOTICE   utils - ../src/lxc/utils.c:lxc_drop_groups:1572 - Dropped supplimentary groups
NOTICE   utils - ../src/lxc/utils.c:lxc_switch_uid_gid:1548 - Switched to gid 0
NOTICE   utils - ../src/lxc/utils.c:lxc_switch_uid_gid:1557 - Switched to uid 0
DEBUG    start - ../src/lxc/start.c:lxc_try_preserve_namespace:140 - Preserved net namespace via fd 5 and stashed path as net:/proc/382697/fd/5
INFO     utils - ../src/lxc/utils.c:run_script_argv:587 - Executing script "/usr/share/lxc/lxcnetaddbr" for container "105", config section "net"
DEBUG    network - ../src/lxc/network.c:netdev_configure_server_veth:876 - Instantiated veth tunnel "veth105i0 <--> vethdmmMRm"
DEBUG    conf - ../src/lxc/conf.c:lxc_mount_rootfs:1240 - Mounted rootfs "/var/lib/lxc/105/rootfs" onto "/usr/lib/x86_64-linux-gnu/lxc/rootfs" with options "(null)"
INFO     conf - ../src/lxc/conf.c:setup_utsname:679 - Set hostname to "nginxproxymanager"
DEBUG    network - ../src/lxc/network.c:setup_hw_addr:3863 - Mac address "BC:24:11:12:DA:CA" on "eth0" has been setup
DEBUG    network - ../src/lxc/network.c:lxc_network_setup_in_child_namespaces_common:4004 - Network device "eth0" has been setup
INFO     network - ../src/lxc/network.c:lxc_setup_network_in_child_namespaces:4061 - Finished setting up network devices with caller assigned names
INFO     conf - ../src/lxc/conf.c:mount_autodev:1023 - Preparing "/dev"
INFO     conf - ../src/lxc/conf.c:mount_autodev:1084 - Prepared "/dev"
DEBUG    conf - ../src/lxc/conf.c:lxc_mount_auto_mounts:539 - Invalid argument - Tried to ensure procfs is unmounted
DEBUG    conf - ../src/lxc/conf.c:lxc_mount_auto_mounts:562 - Invalid argument - Tried to ensure sysfs is unmounted
DEBUG    conf - ../src/lxc/conf.c:mount_entry:2219 - Remounting "/sys/fs/fuse/connections" on "/usr/lib/x86_64-linux-gnu/lxc/rootfs/sys/fs/fuse/connections" to respect bind or remount options
DEBUG    conf - ../src/lxc/conf.c:mount_entry:2238 - Flags for "/sys/fs/fuse/connections" were 4110, required extra flags are 14
DEBUG    conf - ../src/lxc/conf.c:mount_entry:2282 - Mounted "/sys/fs/fuse/connections" on "/usr/lib/x86_64-linux-gnu/lxc/rootfs/sys/fs/fuse/connections" with filesystem type "none"
DEBUG    conf - ../src/lxc/conf.c:mount_entry:2282 - Mounted "proc" on "/usr/lib/x86_64-linux-gnu/lxc/rootfs/dev/.lxc/proc" with filesystem type "proc"
DEBUG    conf - ../src/lxc/conf.c:mount_entry:2282 - Mounted "sys" on "/usr/lib/x86_64-linux-gnu/lxc/rootfs/dev/.lxc/sys" with filesystem type "sysfs"
DEBUG    cgfsng - ../src/lxc/cgroups/cgfsng.c:__cgroupfs_mount:2187 - Mounted cgroup filesystem cgroup2 onto 19((null))
INFO     utils - ../src/lxc/utils.c:run_script_argv:587 - Executing script "/usr/share/lxcfs/lxc.mount.hook" for container "105", config section "lxc"
INFO     utils - ../src/lxc/utils.c:run_script_argv:587 - Executing script "/usr/share/lxc/hooks/lxc-pve-autodev-hook" for container "105", config section "lxc"
INFO     conf - ../src/lxc/conf.c:lxc_fill_autodev:1121 - Populating "/dev"
DEBUG    conf - ../src/lxc/conf.c:lxc_fill_autodev:1205 - Bind mounted host device 16(dev/full) to 18(full)
DEBUG    conf - ../src/lxc/conf.c:lxc_fill_autodev:1205 - Bind mounted host device 16(dev/null) to 18(null)
DEBUG    conf - ../src/lxc/conf.c:lxc_fill_autodev:1205 - Bind mounted host device 16(dev/random) to 18(random)
DEBUG    conf - ../src/lxc/conf.c:lxc_fill_autodev:1205 - Bind mounted host device 16(dev/tty) to 18(tty)
DEBUG    conf - ../src/lxc/conf.c:lxc_fill_autodev:1205 - Bind mounted host device 16(dev/urandom) to 18(urandom)
DEBUG    conf - ../src/lxc/conf.c:lxc_fill_autodev:1205 - Bind mounted host device 16(dev/zero) to 18(zero)
INFO     conf - ../src/lxc/conf.c:lxc_fill_autodev:1209 - Populated "/dev"
INFO     conf - ../src/lxc/conf.c:lxc_transient_proc:3307 - Caller's PID is 1; /proc/self points to 1
DEBUG    conf - ../src/lxc/conf.c:lxc_setup_devpts_child:1554 - Attached detached devpts mount 20 to 18/pts
DEBUG    conf - ../src/lxc/conf.c:lxc_setup_devpts_child:1640 - Created "/dev/ptmx" file as bind mount target
DEBUG    conf - ../src/lxc/conf.c:lxc_setup_devpts_child:1647 - Bind mounted "/dev/pts/ptmx" to "/dev/ptmx"
DEBUG    conf - ../src/lxc/conf.c:lxc_allocate_ttys:908 - Created tty with ptx fd 22 and pty fd 23 and index 1
DEBUG    conf - ../src/lxc/conf.c:lxc_allocate_ttys:908 - Created tty with ptx fd 24 and pty fd 25 and index 2
INFO     conf - ../src/lxc/conf.c:lxc_allocate_ttys:913 - Finished creating 2 tty devices
DEBUG    conf - ../src/lxc/conf.c:lxc_setup_ttys:869 - Bind mounted "pts/1" onto "tty1"
DEBUG    conf - ../src/lxc/conf.c:lxc_setup_ttys:869 - Bind mounted "pts/2" onto "tty2"
INFO     conf - ../src/lxc/conf.c:lxc_setup_ttys:876 - Finished setting up 2 /dev/tty<N> device(s)
INFO     conf - ../src/lxc/conf.c:setup_personality:1720 - Set personality to "0lx0"
DEBUG    conf - ../src/lxc/conf.c:capabilities_deny:3006 - Capabilities have been setup
NOTICE   conf - ../src/lxc/conf.c:lxc_setup:4014 - The container "105" is set up
INFO     apparmor - ../src/lxc/lsm/apparmor.c:apparmor_process_label_set_at:1189 - Set AppArmor label to "lxc-105_</var/lib/lxc>//&:lxc-105_<-var-lib-lxc>:"
INFO     apparmor - ../src/lxc/lsm/apparmor.c:apparmor_process_label_set:1234 - Changed AppArmor profile to lxc-105_</var/lib/lxc>//&:lxc-105_<-var-lib-lxc>:
DEBUG    terminal - ../src/lxc/terminal.c:lxc_terminal_peer_default:709 - No such device - The process does not have a controlling terminal
NOTICE   start - ../src/lxc/start.c:start:2201 - Exec'ing "/sbin/init"
ERROR    start - ../src/lxc/start.c:start:2204 - No such file or directory - Failed to exec "/sbin/init"
ERROR    sync - ../src/lxc/sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 7)
INFO     network - ../src/lxc/network.c:lxc_delete_network_priv:3720 - Removed interface "veth105i0" from ""
DEBUG    network - ../src/lxc/network.c:lxc_delete_network:4217 - Deleted network devices
ERROR    start - ../src/lxc/start.c:__lxc_start:2114 - Failed to spawn container "105"
WARN     start - ../src/lxc/start.c:lxc_abort:1037 - No such process - Failed to send SIGKILL via pidfd 16 for process 382709
startup for container '105' failed

Code:
root@pve:~# qm config 113
agent: 1,fstrim_cloned_disks=1
bios: ovmf
boot: order=scsi0
cores: 4
cpu: x86-64-v2-AES
efidisk0: hdd:113/vm-113-disk-0.raw,efitype=4m,pre-enrolled-keys=1,size=1M
ide0: none,media=cdrom
machine: pc-q35-9.0
memory: 6144
meta: creation-qemu=9.0.2,ctime=1732993027
name: veeam
net0: virtio=BC:24:11:E9:CB:16,bridge=vmbr1,firewall=1,tag=20
numa: 0
onboot: 1
ostype: win11
scsi0: hdd:113/vm-113-disk-1.raw,size=100G
scsihw: virtio-scsi-single
smbios1: uuid=09f7f2a8-0534-4b9a-a226-c16e056e8ad8
sockets: 1
unused0: pool-data:vm-113-disk-0
unused1: pool-data:vm-113-disk-1
vmgenid: c0327ef9-78c1-4adc-93a5-652567ac1c94
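
Since qm config only prints the current section, leftover snapshot data may still be sitting in the raw config file. A minimal sketch for checking that, assuming the usual location /etc/pve/qemu-server/113.conf and that snapshots are stored as [name] sections with parent, vmstate and snaptime keys (list_snapshot_leftovers is a made-up helper name, not a PVE command):

```shell
#!/bin/sh
# Sketch: list lines in a VM config file that still refer to snapshots.
# list_snapshot_leftovers is a hypothetical helper, not part of PVE.
list_snapshot_leftovers() {
    conf=$1
    # Matches section headers such as [mysnap] (this also matches [PENDING])
    # and the per-snapshot keys parent, vmstate and snaptime.
    # grep exits with status 1 when nothing matches.
    grep -nE '^\[.+\]$|^(parent|vmstate|snaptime):' "$conf"
}

# Example call (path is an assumption based on the VMID in this thread):
# list_snapshot_leftovers /etc/pve/qemu-server/113.conf
```

If this prints nothing, the config no longer references any snapshot and the boot problem is probably elsewhere.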
 
How did you create that .subvol file? I don't see anything in the VM config that would prevent it from displaying something. I'd follow the system logs while it starts.
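
One way to do that, as a rough sketch: note the time, run the start command, then print everything the journal recorded since. with_journal is a hypothetical helper name; journalctl --since and --no-pager are standard systemd options.

```shell
#!/bin/sh
# Sketch: run a command, then show journal entries written while it ran.
# with_journal is a hypothetical helper, not an existing tool.
with_journal() {
    # Timestamp in a format that journalctl --since accepts.
    since=$(date '+%Y-%m-%d %H:%M:%S')
    status=0
    # Run the given command; remember its exit status instead of aborting.
    "$@" || status=$?
    journalctl --since "$since" --no-pager
    return "$status"
}

# Example call for the VM discussed here:
# with_journal qm start 113
```

The helper returns the exit status of the wrapped command, so a failed start is still visible to the caller.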
 
With pct create 105 local:vztmpl/debian-12-standard_12.2-1_amd64.tar.zst --storage hdd --rootfs 0 --unprivileged 0

Then I removed the content of the subvol and replaced it with the old one. The only difference is permissions: the owner is 100000 for the old one, but root when I create a new one.
 
To my knowledge PVE does not create .subvol files. Have you checked what qemu-img info /PATH_TO/subvol-105-disk-0.subvol says?
 
I tried again: pct create 199 local:vztmpl/debian-12-standard_12.2-1_amd64.tar.zst --storage hdd --rootfs 0 --unprivileged 0 and it did create a /mnt/hdd/images/199/subvol-199-disk-0.subvol directory.

Code:
# qemu-img info /mnt/hdd/images/199/subvol-199-disk-0.subvol/
qemu-img: Could not open '/mnt/hdd/images/199/subvol-199-disk-0.subvol/': 'file' driver requires '/mnt/hdd/images/199/subvol-199-disk-0.subvol/' to be a regular file
 
Hmm. I've never seen that behavior before, but with your command it does indeed create a .subvol directory containing the system files of the CT.
How did you restore your files from the old "backup" into that?
 
Was the CT privileged before too? I'm guessing the different permissions might cause issues here.
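
The debug log above complains that /etc is not present and that /sbin/init cannot be executed, so it may be worth checking from the host what the subvol directory actually contains and who owns it. A minimal sketch (check_rootfs is a made-up helper name, and the example path is an assumption derived from the CT 199 path shown earlier):

```shell
#!/bin/sh
# Sketch: inspect a container rootfs directory from the host.
# check_rootfs is a hypothetical helper, not a PVE tool.
check_rootfs() {
    rootfs=$1
    # /sbin/init is often a symlink whose absolute target only resolves
    # inside the container, so also accept the link itself (-L).
    if [ -e "$rootfs/sbin/init" ] || [ -L "$rootfs/sbin/init" ]; then
        echo "init: present"
    else
        echo "init: missing"
    fi
    if [ -e "$rootfs/etc/os-release" ] || [ -L "$rootfs/etc/os-release" ]; then
        echo "os-release: present"
    else
        echo "os-release: missing"
    fi
    # Numeric owner of the top-level directory (GNU stat). With the default
    # unprivileged mapping seen in the log, container root appears as 100000.
    echo "owner: $(stat -c '%u:%g' "$rootfs")"
}

# Example call (path assumed by analogy with the CT 199 directory):
# check_rootfs /mnt/hdd/images/105/subvol-105-disk-0.subvol
```

If init and os-release are reported missing, the old files probably ended up one level too deep (or somewhere else entirely), which would explain the "is the rootfs mounted?" warning independently of the ownership question.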