[SOLVED] LXC container won't start

Code:
pct config 205
arch: amd64
cores: 1
description: net0%3A name=eth0,bridge=vmbr1,firewall=1,gw=147.215.191.1,ip=147.215.150.43,tag=191,type=veth%0A
hostname: be-annuaire
memory: 512
net0: name=eth0,bridge=vmbr1,gw=147.215.150.1,hwaddr=BA:8B:3F:DE:22:0A,ip=147.215.150.43/24,tag=150,type=veth
onboot: 0
ostype: debian
rootfs: KVM:205/vm-205-disk-0.raw,size=0T
swap: 512
unprivileged: 1
unused0: KVM:205/vm-205-disk-1.qcow2
unused1: KVM:205/vm-205-disk-1.raw

A service provider managed our infrastructure this summer and had to migrate our VMs and containers to new hypervisors. I suspect he used "qm migrate" instead of "pct migrate" (he says he didn't), and since then the container no longer starts.

As I said above, there is a disk for this container, but not with the right number.
Looking at the LXC container's config, I see it references vm-205-disk-0.raw, while the disk actually present on the filesystem is vm-205-disk-1.raw:

Code:
ls -al /mnt/pve/KVM/images/205
total 12637456
drwxr-----   2 root root        4096 Jul 28 19:02 .
drwxr-xr-x 122 root root       12288 Jul  8 12:45 ..
-rw-r--r--   1 root root  6619398144 Jul 28 19:11 vm-205-disk-1.qcow2
-rw-r-----   1 root root 34359738368 Jul 25 12:36 vm-205-disk-1.raw
 
you can try changing the configuration file to point it at one of the other disks and see if it boots
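
for example, something like this (just a sketch, using the paths from your output above; keep a copy of the config first):
Code:
# back up the original config
cp /etc/pve/lxc/205.conf /root/205.conf.bak
# point rootfs at the disk that actually exists on the storage
sed -i 's/vm-205-disk-0\.raw/vm-205-disk-1.raw/' /etc/pve/lxc/205.conf
pct start 205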
 
I changed the config to use disk 1 and managed to boot it:
Code:
cat /etc/pve/lxc/205.conf
#net0%3A name=eth0,bridge=vmbr1,firewall=1,gw=147.215.191.1,ip=147.215.150.43,tag=191,type=veth
arch: amd64
cores: 1
hostname: be-annuaire
memory: 512
net0: name=eth0,bridge=vmbr1,gw=147.215.150.1,hwaddr=BA:8B:3F:DE:22:0A,ip=147.215.150.43/24,tag=150,type=veth
onboot: 0
ostype: debian
rootfs: KVM:205/vm-205-disk-1.raw,size=0T
swap: 512
unprivileged: 1
unused0: KVM:205/vm-205-disk-1.qcow2
unused1: KVM:205/vm-205-disk-1.raw

pct start 205

pct status 205
status: running

But I no longer have access to the container over SSH, and if I log in via "pct enter 205" I run into permission problems:
Code:
ssh be-annuaire
Connection reset by 147.215.150.43 port 22
pct enter 205
bash: /root/.bashrc: Permission denied
ls -al
ls: cannot open directory '.': Permission denied
pwd
/root
I don't know if it's related, but isn't the "size=0T" parameter in the config also a problem?
 
hi, you can run the rescan command (pct rescan) again to update the size
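
for example (pct rescan updates the disk sizes in the configs from what the storage reports; the --vmid option, if your pct version has it, limits the scan to this one container):
Code:
pct rescan --vmid 205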
 
Rescan successfully updated the "size=" parameter:
Code:
cat /etc/pve/lxc/205.conf
#net0%3A name=eth0,bridge=vmbr1,firewall=1,gw=147.215.191.1,ip=147.215.150.43,tag=191,type=veth
arch: amd64
cores: 1
hostname: be-annuaire
memory: 512
net0: name=eth0,bridge=vmbr1,gw=147.215.150.1,hwaddr=BA:8B:3F:DE:22:0A,ip=147.215.150.43/24,tag=150,type=veth
onboot: 0
ostype: debian
rootfs: KVM:205/vm-205-disk-1.raw,size=32G
swap: 512
unprivileged: 1
unused0: KVM:205/vm-205-disk-1.qcow2
On the other hand, I still cannot log in by SSH or via the console:
Code:
ssh be-annuaire
Connection closed by 147.215.150.43 port 22

(screenshot: be.png, failed console login)

Could this problem have caused that too?
 
does pct enter still give the permission error?

maybe the previous disk was privileged. the container is now configured unprivileged, which could explain the permission problems.

one thing you can try is to take a backup of the container and restore it. make sure the "unprivileged" box is checked during both backup and restore
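
if you prefer the CLI over the GUI for the backup, it would be roughly this (vzdump is the backup tool behind the GUI; the storage, mode and compression here are assumptions, adjust them to your setup):
Code:
vzdump 205 --storage local --mode stop --compress lzo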

afterwards check again with pct enter
 
pct enter still has permission issues.

I wanted to try to take a backup and restore it but I don't see an "Unprivileged" box:
(screenshot: back.png, backup dialog without an "Unprivileged" checkbox)
 
sorry my bad, it only needs to be checked during restore. you can follow the link you've sent (it's the same process as i've described)
 
Here is the GUI output from the container backup. The backup failed due to a permission problem.
 

okay, it seems like this container is/was privileged but incorrectly set to unprivileged (maybe manually?), causing the permission issues.

you could try setting unprivileged: 0 in the configuration file manually and then running the backup again. while restoring, make sure "unprivileged" is checked (so it's restored as unprivileged)
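
for example (a sketch that edits the config in place, using the path from earlier in the thread):
Code:
# flip the container back to privileged
sed -i 's/^unprivileged: 1/unprivileged: 0/' /etc/pve/lxc/205.conf
# verify the change
pct config 205 | grep unprivileged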
 
Here are the logs from the restore:
Code:
Formatting '/mnt/pve/KVM/images/205/vm-205-disk-0.raw', fmt=raw size=34359738368
mke2fs 1.44.5 (15-Dec-2018)
Creating filesystem with 8388608 4k blocks and 2097152 inodes
Filesystem UUID: 62af43aa-d9ac-497a-998a-5ba7c361aa4f
Superblock backups stored on blocks:
    32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
    4096000, 7962624

Allocating group tables:   0/256       done                         
Writing inode tables:   0/256       done                         
Creating journal (65536 blocks): done
Multiple mount protection is enabled with update interval 5 seconds.
Writing superblocks and filesystem accounting information:   0/256       done

extracting archive '/var/lib/vz/dump/vzdump-lxc-205-2020_09_01-11_56_46.tar.lzo'
tar: ./etc/resolv.conf: Cannot change ownership to uid 100000, gid 100000: Invalid argument
tar: ./etc/hostname: Cannot change ownership to uid 100000, gid 100000: Invalid argument
tar: ./etc/network/interfaces: Cannot change ownership to uid 100000, gid 100000: Invalid argument
tar: ./etc/hosts: Cannot change ownership to uid 100000, gid 100000: Invalid argument
tar: ./fastboot: Cannot change ownership to uid 100000, gid 100000: Invalid argument
tar: ./var/spool/postfix/dev/urandom: Cannot mknod: Operation not permitted
tar: ./var/spool/postfix/dev/random: Cannot mknod: Operation not permitted
Total bytes read: 1984829440 (1.9GiB, 181MiB/s)
tar: Exiting with failure status due to previous errors
TASK ERROR: unable to restore CT 205 - command 'lxc-usernsexec -m u:0:100000:65536 -m g:0:100000:65536 -- tar xpf - --lzop --totals --one-file-system -p --sparse --numeric-owner --acls --xattrs '--xattrs-include=user.*' '--xattrs-include=security.capability' '--warning=no-file-ignored' '--warning=no-xattr-write' -C /var/lib/lxc/205/rootfs --skip-old-files --anchored --exclude './dev/*'' failed: exit code 2


After that, I no longer see the container on the cluster:

(screenshot: cluster.png, cluster view with CT 205 missing)

pct list returns no results.
 

does it work if you restore it privileged?
 
I can no longer restore via the GUI since I no longer see the container on the cluster. Can I do it from the CLI? If so, what is the exact command (I don't want to make a mistake)?
 
you can restore via the CLI: pct restore <VMID> <FILE>. for example, pct restore 1000 /var/lib/vz/dump/vzdump-lxc-CTID-2020-bla.tar will restore it to ID 1000
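
and since the idea is to test a privileged restore, you would probably add --unprivileged 0 as well (pct restore accepts --unprivileged and --storage options; check pct help restore on your version to be sure):
Code:
pct restore 1000 /var/lib/vz/dump/vzdump-lxc-205-2020_09_01-11_56_46.tar.lzo --unprivileged 0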
 
Code:
pct restore 1000 /var/lib/vz/dump/vzdump-lxc-205-2020_09_01-11_56_46.tar.lzo
400 Parameter verification failed.
storage: storage 'local' does not support container directories
pct restore <vmid> <ostemplate> [OPTIONS]
 
add --storage NAME with the name of your storage
 
I had made the backup on the "local" storage, so I passed --storage local, but it still does not work:
Code:
pct restore 1000 /var/lib/vz/dump/vzdump-lxc-205-2020_09_01-11_56_46.tar.lzo --storage local
400 Parameter verification failed.
storage: storage 'local' does not support container directories
pct restore <vmid> <ostemplate> [OPTIONS]
 
try with a different one
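
to see which storages accept container root disks, you can filter on the rootdir content type:
Code:
pvesm status --content rootdir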
 
I restored the container to network storage and it worked. Thank you.

I was able to reconnect via SSH, but I have many services that no longer start (see attached file). Could this be related to UID changes?
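
One quick way to check would be to look at the numeric owners inside the restored container (a sketch, assuming it kept ID 1000 from the restore; files owned by uid/gid 65534, i.e. nobody/nogroup, would point at unmapped host IDs):
Code:
pct exec 1000 -- ls -ln /var/lib /etc/ssh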
 
