Hi all.
Doing a little weekend maintenance on a 3-node cluster, but I'm unable to migrate one LXC container off a node.
Code:
2018-01-12 16:26:11 starting migration of CT 190 to node 'pve01' (172.16.1.201)
2018-01-12 16:26:11 volume 'ceph_vm:vm-190-disk-1' is on shared storage 'ceph_vm'
2018-01-12 16:26:11 volume 'ceph_vm:vm-190-disk-2' is on shared storage 'ceph_vm'
rbd: sysfs write failed
can't unmap rbd volume vm-190-disk-1: rbd: sysfs write failed
rbd: sysfs write failed
can't unmap rbd volume vm-190-disk-2: rbd: sysfs write failed
2018-01-12 16:26:11 ERROR: volume deactivation failed: ceph_vm:vm-190-disk-1 ceph_vm:vm-190-disk-2 at /usr/share/perl5/PVE/Storage.pm line 999.
2018-01-12 16:26:11 aborting phase 1 - cleanup resources
2018-01-12 16:26:11 start final cleanup
2018-01-12 16:26:11 ERROR: migration aborted (duration 00:00:00): volume deactivation failed: ceph_vm:vm-190-disk-1 ceph_vm:vm-190-disk-2 at /usr/share/perl5/PVE/Storage.pm line 999.
TASK ERROR: migration aborted
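Since the failure is on unmapping the rbd devices, I figure something on the source node must still have them mapped or in use. This is roughly what I was planning to check next (the /dev/rbdX paths below are just placeholders until I see what showmapped actually reports):
Code:
root@pve02:~# rbd showmapped
root@pve02:~# lsblk /dev/rbd0        # placeholder device, substitute the real one
root@pve02:~# fuser -vm /dev/rbd0    # anything still holding it open?
root@pve02:~# rbd unmap /dev/rbd0    # manual unmap attempt if nothing is using it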
It's a CentOS 7 LXC with 2 disks on Ceph storage. vm-190-disk-2 is large, ~1TB, with about 650GB used. I have manual rbd snapshots set up on this guy, so I'm wondering if that's my problem here?
Code:
root@pve02:~# rbd --pool ceph_vm snap ls vm-190-disk-2
SNAPID NAME                                SIZE TIMESTAMP
    12 first                            1024 GB Wed Dec 27 21:23:00 2017
    13 initial-copy                     1024 GB Thu Dec 28 09:08:42 2017
   681 snap-weekly_2017-12-31_0200      1024 GB Sun Dec 31 02:00:01 2017
   947 snap-monthly_2018-01-01_0300     1024 GB Mon Jan 1 03:00:02 2018
  2200 snap-daily_2018-01-06_0100       1024 GB Sat Jan 6 01:00:02 2018
  2454 snap-daily_2018-01-07_0100       1024 GB Sun Jan 7 01:00:02 2018
  2468 snap-weekly_2018-01-07_0200      1024 GB Sun Jan 7 02:00:02 2018
  2713 snap-daily_2018-01-08_0100       1024 GB Mon Jan 8 01:00:01 2018
  2974 snap-daily_2018-01-09_0100       1024 GB Tue Jan 9 01:00:01 2018
  3228 snap-daily_2018-01-10_0100       1024 GB Wed Jan 10 01:00:02 2018
  3485 snap-daily_2018-01-11_0100       1024 GB Thu Jan 11 01:00:02 2018
  3739 snap-daily_2018-01-12_0100       1024 GB Fri Jan 12 01:00:01 2018
  3848 snap-hourly_2018-01-12_1100      1024 GB Fri Jan 12 11:00:02 2018
  3858 snap-hourly_2018-01-12_1200      1024 GB Fri Jan 12 12:00:02 2018
  3867 snap-hourly_2018-01-12_1300      1024 GB Fri Jan 12 13:00:02 2018
  3878 snap-hourly_2018-01-12_1400      1024 GB Fri Jan 12 14:00:02 2018
  3889 snap-hourly_2018-01-12_1500      1024 GB Fri Jan 12 15:00:01 2018
  3896 snap-quarterhour_2018-01-12_1545 1024 GB Fri Jan 12 15:45:02 2018
  3899 snap-hourly_2018-01-12_1600      1024 GB Fri Jan 12 16:00:02 2018
  3900 snap-quarterhour_2018-01-12_1600 1024 GB Fri Jan 12 16:00:04 2018
  3903 snap-quarterhour_2018-01-12_1615 1024 GB Fri Jan 12 16:15:02 2018
  3905 snap-quarterhour_2018-01-12_1630 1024 GB Fri Jan 12 16:30:01 2018
This LXC stores userdrive data, shared out via NFS. I take quarter-hourly, hourly, daily, weekly, and monthly snapshots of it with some scripts I wrote. It's a relatively new container, and it's the only one I have with manual rbd snapshots, so I suspect they're related to the issue. Is there something I can do to fix this? If I can't migrate because of the snapshots, are there any workarounds, or do I need to find another solution for this?
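For reference, the snapshot scripts boil down to cron jobs running rbd snap create with a dated name, something like this for the daily one (simplified sketch; the real scripts also prune old snapshots):
Code:
root@pve02:~# rbd --pool ceph_vm snap create vm-190-disk-2@snap-daily_$(date +%Y-%m-%d_%H%M)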
Thanks!