LXC on Ceph won't start

Gastondc

Well-Known Member
Aug 3, 2017
I launched a task to move a volume from one Ceph pool to another. The task hung, so I stopped it, and now the CT won't start again. The problem is with the Ceph volume.
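For context, and judging from the output further down, the move was of the 3 TB mount point mp0 to the replicated_1tb pool; on the CLI that kind of move is started with something like the following (pool and mount point names are the ones from my setup):

Code:
# move the mp0 volume of CT 103 to the replicated_1tb storage
pct move_volume 103 mp0 replicated_1tb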



Code:
root@pve2:~# pct start 103 
run_buffer: 314 Script exited with status 32
lxc_init: 798 Failed to run lxc.hook.pre-start for container "103"
__lxc_start: 1945 Failed to initialize container "103"
startup for container '103' failed



root@pve2:~# lxc-start 103 
lxc-start: 103: lxccontainer.c: wait_on_daemonized_start: 851 No such file or directory - Failed to receive the container state
lxc-start: 103: tools/lxc_start.c: main: 308 The container failed to start
lxc-start: 103: tools/lxc_start.c: main: 311 To get more details, run the container in foreground mode
lxc-start: 103: tools/lxc_start.c: main: 314 Additional information can be obtained by setting the --logfile and --logpriority options


root@pve2:~# pct mount 103 
/dev/rbd5
mount: /var/lib/lxc/103/rootfs/mnt/DS01: /dev/rbd1 already mounted or mount point busy.
mounting container failed
command 'mount /dev/rbd1 /var/lib/lxc/103/rootfs//mnt/DS01' failed: exit code 32
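The "already mounted or mount point busy" message suggests a stale RBD mapping or a leftover mount from the aborted move. A few checks that should show what is holding things (the device name is the one from the error above, the rest are standard tools):

Code:
# list the RBD images currently mapped on this node
rbd showmapped

# check whether the device from the error is still mounted somewhere
findmnt /dev/rbd1

# and whether anything is still mounted under the container's path
mount | grep /var/lib/lxc/103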

Code:
root@pve2:~# pveversion -v
proxmox-ve: 6.4-1 (running kernel: 5.4.101-1-pve)
pve-manager: 6.4-11 (running version: 6.4-11/28d576c2)
pve-kernel-5.4: 6.4-4
pve-kernel-helper: 6.4-4
pve-kernel-5.4.124-1-pve: 5.4.124-1
pve-kernel-5.4.101-1-pve: 5.4.101-1
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph: 15.2.13-pve1~bpo10
ceph-fuse: 15.2.13-pve1~bpo10
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-3
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.10-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-1
pve-cluster: 6.4-1
pve-container: 3.3-5
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.2-4
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.4-pve1

Code:
root@pve2:~# ceph -v 
ceph version 15.2.13 (1f5c7871ec0e36ade641773b9b05b6211c308b9d) octopus (stable)

Code:
root@pve2:~# pct config 103
arch: amd64
cores: 4
hostname: PBS01
memory: 8192
mp0: ceph_wi3tb:vm-103-disk-1,mp=/mnt/DS01,size=3001G
net0: name=eth0,bridge=vmbr0,firewall=1,gw=192.168.0.1,hwaddr=0E:63:00:AB:73:A2,ip=192.168.0.201/22,type=veth
ostype: debian
rootfs: ceph_wi3tb:vm-103-disk-0,size=8G
swap: 0
 
I found this process:

Code:
2184875 ? D 91:55 rsync --stats -X -A --numeric-ids -aH --whole-file --sparse --one-file-system --bwlimit=0 /var/lib/lxc/103/.copy-volume-2/ /var/lib/lxc/103/.copy-volume-1

I tried to kill it with -9, but nothing happens, and I can't restart the node.

Any idea?
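The "D" in the ps output is the uninterruptible-sleep state: the process is blocked inside the kernel waiting on I/O, so SIGKILL is only delivered once that I/O returns, which is why kill -9 appears to do nothing. This can be confirmed with (PID taken from the output above):

Code:
# STAT "D" = uninterruptible sleep; WCHAN shows the kernel function it is waiting in
ps -o pid,stat,wchan:32,cmd -p 2184875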
 
I deleted the two hidden folders:

/var/lib/lxc/103/.copy-volume-2/
/var/lib/lxc/103/.copy-volume-1/

and now the CT starts without problems!

But I still have the dead process in the system.
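The stuck rsync will keep whatever it had open pinned until its pending I/O completes (or the node is rebooted). What it still holds open can be listed with (PID from the earlier output):

Code:
# files and devices the stuck rsync still has open
ls -l /proc/2184875/fd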
 
Now I can't delete the failed destination of my copy.

I tried:

Code:
rbd rm replicated_1tb/vm-103-disk-0

but it doesn't go through.


Code:
root@pve2:/var/lib/lxc/103# rbd info replicated_1tb/vm-103-disk-0
rbd image 'vm-103-disk-0':
    size 2.9 TiB in 768000 objects
    order 22 (4 MiB objects)
    snapshot_count: 0
    id: 1d5bdf68c29eab
    block_name_prefix: rbd_data.1d5bdf68c29eab
    format: 2
    features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
    op_features:
    flags:
    create_timestamp: Tue Jul 20 16:04:08 2021
    access_timestamp: Tue Jul 20 16:04:08 2021
    modify_timestamp: Tue Jul 20 16:04:08 2021
 
Now I need to free some space on the pool, but I get an error:


Code:
root@pve2:# rbd unmap replicated_1tb/vm-103-disk-0
rbd: sysfs write failed
rbd: unmap failed: (16) Device or resource busy
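rbd unmap refuses while the device is still open or mounted, so the first step is to find which /dev/rbdX the image is mapped to and what still holds it; as a last resort the kernel client has a force option. This is just a sketch, and "rbdX" is a placeholder for whatever showmapped actually reports:

Code:
# which device is the failed destination mapped to?
rbd showmapped | grep replicated_1tb

# what still has that device open? (replace rbdX with the device from above)
fuser -vm /dev/rbdX
lsof /dev/rbdX

# last resort: force the unmap
rbd unmap -o force replicated_1tb/vm-103-disk-0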


Code:
root@pve2:# rbd rm replicated_1tb/vm-103-disk-0
2021-07-21T13:53:04.425-0300 7fc3e57fa700 -1 librbd::image::PreRemoveRequest: 0x558d80c4c550 check_image_watchers: image has watchers - not removing
Removing image: 0% complete...failed.
rbd: error: image still has watchers
This means the image is still open or the client using it crashed. Try again after closing/unmapping it or waiting 30s for the crashed client to timeout.
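The watcher can be identified with rbd status; if it belongs to a crashed or stuck client that never releases it, blocklisting that client address on the OSD side is the usual way to break the watch, after which the rm should go through. Caution: blocklisting a kernel client address affects every image that client has mapped, so double-check the address first. On Octopus the subcommand is still called "blacklist" (newer releases use "blocklist"), and the address below is a placeholder taken from the rbd status output:

Code:
# show which client currently watches the image header
rbd status replicated_1tb/vm-103-disk-0

# break the watch held by that client
ceph osd blacklist add <client_addr_from_rbd_status>
ceph osd blacklist ls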


any idea?
 
I found out that I can't remove the image because it is still open:

Code:
root@pve2:~# cat /sys/kernel/debug/ceph/80e7521d-57fb-4683-9d67-943eef4a91b5.client17544956/osdc
REQUESTS 0 homeless 0
LINGER REQUESTS
18446462598732841075 osd2 22.3beeb5b 22.1b [2,5,7]/2 [2,5,7]/2 e16330 rbd_header.2067f444687225 0x20 0 WC/0
18446462598732841027 osd3 22.dceff552 22.12 [3,28,7]/3 [3,28,7]/3 e16330 rbd_header.1d5bdf68c29eab 0x20 5 WC/0
18446462598732841029 osd21 7.61a1d11f 7.1f [21,24,29]/21 [21,24,29]/21 e16330 rbd_header.3d4b0727c6040d 0x20 0 WC/0
18446462598732840984 osd22 7.25253c4b 7.4b [22,16,4]/22 [22,16,4]/22 e16330 rbd_header.0dc3ab83ea3912 0x20 7 WC/0
18446462598732840982 osd23 7.7b8d1e46 7.46 [23,26,20]/23 [23,26,20]/23 e16330 rbd_header.0dc35da1fc23af 0x20 7 WC/0
18446462598732840974 osd29 7.1cb9d71 7.71 [29,14,13]/29 [29,14,13]/29 e16330 rbd_header.10c8e5417a3b6f 0x20 7 WC/0
18446462598732841069 osd29 7.65271274 7.74 [29,15,21]/29 [29,15,21]/29 e16330 rbd_header.3d476edbca1a4 0x20 3 WC/0
BACKOFFS
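The second LINGER entry above, rbd_header.1d5bdf68c29eab, matches the image id reported by rbd info earlier, i.e. the watch on replicated_1tb/vm-103-disk-0 is held by this node's own kernel RBD client (client17544956), not by some other machine. A quick cross-check:

Code:
# image id from rbd info ...
rbd info replicated_1tb/vm-103-disk-0 | grep -E 'id:|block_name_prefix'

# ... should match an rbd_header watch held by the local kernel client
grep rbd_header.1d5bdf68c29eab /sys/kernel/debug/ceph/*/osdc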


Any idea how to close this task?

Thanks!
 
