Code:
proxmox-ve: 5.1-32 (running kernel: 4.13.13-2-pve)
pve-manager: 5.1-41 (running version: 5.1-41/0b958203)
pve-kernel-4.13.13-2-pve: 4.13.13-32
libpve-http-server-perl: 2.0-8
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-19
qemu-server: 5.0-18
pve-firmware: 2.0-3
libpve-common-perl: 5.0-25
libpve-guest-common-perl: 2.0-14
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-17
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-3
pve-docs: 5.1-12
pve-qemu-kvm: 2.9.1-5
pve-container: 2.0-18
pve-firewall: 3.0-5
pve-ha-manager: 2.0-4
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.1-2
lxcfs: 2.0.8-1
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.3-pve1~bpo9
ceph: 12.2.7-pve1
We have 12 OSDs on 6 hosts; all hosts are used for both VMs and Ceph (33 VMs running). The VMs have poor performance, so we decided to run some tests with KRBD.
So...
- I marked storage1_ct (krbd) for use with VM images (just for the test; we have no containers) - see the config sketch below
- Moved one image from storage1_vm to storage1_ct
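A minimal sketch of what this looks like from the CLI - the krbd flag on the storage definition in /etc/pve/storage.cfg and the disk move (the monhost values below are placeholders, not our real ones):
Code:
# /etc/pve/storage.cfg -- krbd-enabled RBD storage (illustrative values)
rbd: storage1_ct
        content images,rootdir
        krbd 1
        monhost 10.10.10.101 10.10.10.102 10.10.10.103
        pool storage1
        username admin

# move the test disk from the librbd-backed storage to the krbd one
qm move_disk 165 scsi2 storage1_ct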
VM config:
Code:
root@vhost4:~# cat /etc/pve/qemu-server/165.conf
#start%3A %D0%A1%D1%82%D0%B0%D1%80%D1%82%D0%BE%D0%B2%D0%B0%D1%82%D1%8C %D0%B2%D1%80%D1%83%D1%87%D0%BD%D1%83%D1%8E
#project%3A %D0%98%D0%A1%D0%9E%D0%93%D0%94 %D0%94%D0%B0%D0%B3%D0%B5%D1%81%D1%82%D0%B0%D0%BD
#role%3A %D1%81%D0%B5%D1%80%D0%B2%D0%B5%D1%80 %D0%91%D0%94
#owner%3A %D0%A1%D0%BF%D0%B8%D1%80%D0%B8%D0%BD
#db%3A Oracle Database 12c Standard Edition Release 12.2.0.1.0 - 64bit
agent: 1
balloon: 5120
boot: cdn
bootdisk: scsi0
cores: 2
cpu: host
ide2: none,media=cdrom
memory: 7168
name: vm-227-oradb-ogd-da
net0: virtio=62:39:39:19:96:01,bridge=vmbr0
numa: 0
ostype: win10
scsi0: storage1_vm:vm-165-disk-2,cache=writeback,discard=on,size=60G
scsi2: storage1_ct:vm-165-disk-3,discard=on,size=151G
scsihw: virtio-scsi-pci
smbios1: uuid=4b66ff2a-cf95-480c-9d70-43319400a968
sockets: 1
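With krbd 1 set, the image should be attached through the kernel RBD driver instead of librbd. One way to double-check the mapping on the host (the id and device name below are assumptions; only the pool/image names come from the config above):
Code:
root@vhost4:~# rbd showmapped
id pool     image         snap device
0  storage1 vm-165-disk-3 -    /dev/rbd0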
- Ran the tests (2x to 7x I/O improvement in database tasks) - wow! We need this now!
- Just to be sure, let's test a few other things
-- Make a snapshot -- passed
-- Remove the snapshot -- passed
-- Make a backup of the running VM -- hung!
- Waited 30 minutes at 32% with no Ceph I/O, then cancelled the backup
Code:
2019-07-06 21:54:19 INFO: Starting Backup of VM 165 (qemu)
2019-07-06 21:54:19 INFO: status = running
2019-07-06 21:54:19 INFO: update VM 165: -lock backup
2019-07-06 21:54:19 INFO: VM Name: vm-227-oradb-ogd-da
2019-07-06 21:54:19 INFO: include disk 'scsi0' 'storage1_vm:vm-165-disk-2' 60G
2019-07-06 21:54:19 INFO: include disk 'scsi2' 'storage1_ct:vm-165-disk-3' 150G
2019-07-06 21:54:20 INFO: backup mode: snapshot
2019-07-06 21:54:20 INFO: ionice priority: 7
2019-07-06 21:54:20 INFO: creating archive '/mnt/pve/nas_backup/dump/vzdump-qemu-165-2019_07_06-21_54_19.vma.gz'
2019-07-06 21:54:26 INFO: started backup task '6a041bc0-cce9-44a3-ade5-0835b0b749e8'
2019-07-06 21:54:29 INFO: status: 0% (180355072/225485783040), sparse 0% (59043840), duration 3, read/write 60/40 MB/s
2019-07-06 21:55:44 INFO: status: 1% (2273312768/225485783040), sparse 0% (379441152), duration 78, read/write 27/23 MB/s
2019-07-06 21:57:19 INFO: status: 2% (4513071104/225485783040), sparse 0% (688300032), duration 173, read/write 23/20 MB/s
2019-07-06 21:58:58 INFO: status: 3% (6790578176/225485783040), sparse 0% (1219534848), duration 272, read/write 23/17 MB/s
2019-07-06 22:00:38 INFO: status: 4% (9030336512/225485783040), sparse 0% (1818075136), duration 372, read/write 22/16 MB/s
2019-07-06 22:02:10 INFO: status: 5% (11282677760/225485783040), sparse 1% (2338820096), duration 464, read/write 24/18 MB/s
2019-07-06 22:03:51 INFO: status: 6% (13530824704/225485783040), sparse 1% (2787958784), duration 565, read/write 22/17 MB/s
2019-07-06 22:05:01 INFO: status: 7% (15791554560/225485783040), sparse 1% (3808411648), duration 635, read/write 32/17 MB/s
2019-07-06 22:06:07 INFO: status: 8% (18102616064/225485783040), sparse 2% (5092950016), duration 701, read/write 35/15 MB/s
2019-07-06 22:07:20 INFO: status: 9% (20296237056/225485783040), sparse 2% (5978890240), duration 774, read/write 30/17 MB/s
2019-07-06 22:08:30 INFO: status: 10% (22570860544/225485783040), sparse 3% (7112863744), duration 844, read/write 32/16 MB/s
2019-07-06 22:09:31 INFO: status: 11% (24855445504/225485783040), sparse 3% (8441536512), duration 905, read/write 37/15 MB/s
2019-07-06 22:10:36 INFO: status: 12% (27262976000/225485783040), sparse 4% (9513435136), duration 970, read/write 37/20 MB/s
2019-07-06 22:11:27 INFO: status: 13% (29322379264/225485783040), sparse 4% (10651230208), duration 1021, read/write 40/18 MB/s
2019-07-06 22:12:37 INFO: status: 14% (31872516096/225485783040), sparse 5% (12083429376), duration 1091, read/write 36/15 MB/s
2019-07-06 22:12:55 INFO: status: 15% (33822867456/225485783040), sparse 6% (13661736960), duration 1109, read/write 108/20 MB/s
2019-07-06 22:13:02 INFO: status: 16% (36138123264/225485783040), sparse 7% (15833014272), duration 1116, read/write 330/20 MB/s
2019-07-06 22:13:05 INFO: status: 18% (40978350080/225485783040), sparse 9% (20673241088), duration 1119, read/write 1613/0 MB/s
2019-07-06 22:13:08 INFO: status: 20% (45772439552/225485783040), sparse 11% (25467330560), duration 1122, read/write 1598/0 MB/s
2019-07-06 22:13:11 INFO: status: 22% (50524585984/225485783040), sparse 13% (30219476992), duration 1125, read/write 1584/0 MB/s
2019-07-06 22:13:14 INFO: status: 24% (55314481152/225485783040), sparse 15% (35009372160), duration 1128, read/write 1596/0 MB/s
2019-07-06 22:13:17 INFO: status: 26% (60033073152/225485783040), sparse 17% (39727964160), duration 1131, read/write 1572/0 MB/s
2019-07-06 22:13:20 INFO: status: 28% (64428638208/225485783040), sparse 19% (44120440832), duration 1134, read/write 1465/1 MB/s
2019-07-06 22:13:56 INFO: status: 29% (65407942656/225485783040), sparse 19% (44120788992), duration 1170, read/write 27/27 MB/s
2019-07-06 22:15:28 INFO: status: 30% (67665068032/225485783040), sparse 19% (44170231808), duration 1262, read/write 24/23 MB/s
2019-07-06 22:16:55 INFO: status: 31% (69915181056/225485783040), sparse 19% (44170235904), duration 1349, read/write 25/25 MB/s
2019-07-06 22:19:03 INFO: status: 32% (72186462208/225485783040), sparse 19% (44170244096), duration 1477, read/write 17/17 MB/s
2019-07-06 22:49:59 ERROR: interrupted by signal
2019-07-06 22:49:59 INFO: aborting backup job
2019-07-06 22:50:00 ERROR: Backup of VM 165 failed - interrupted by signal
- Unusual syslog entry; its timestamp matches the moment the backup hung. First time I have seen this in almost a year of running Ceph
Code:
2019-07-06 22:19:57.015180 osd.7 osd.7 10.10.10.104:6800/1946843 2722 : cluster [ERR] 1.75 missing primary copy of 1:ae4bb006:::rbd_data.81fdd46b8b4567.0000000000000832:head, will try copies on 2,5
OSD log:
Code:
2019-07-06 22:19:57.015138 7f7e4a4c3700 -1 bluestore(/var/lib/ceph/osd/ceph-7) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x25000, got 0x6706be76, expected 0xedf78c30, device location [0x235bdbc5000~1000], logical extent 0x3a5000~1000, object #1:ae4bb006:::rbd_data.81fdd46b8b4567.0000000000000832:head#
2019-07-06 22:19:57.015178 7f7e4a4c3700 -1 log_channel(cluster) log [ERR] : 1.75 missing primary copy of 1:ae4bb006:::rbd_data.81fdd46b8b4567.0000000000000832:head, will try copies on 2,5
- Meanwhile, ceph status reports HEALTH_OK
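Given the bad on-disk checksum in the OSD log above, it may be worth deep-scrubbing the affected PG even with HEALTH_OK, and repairing it if an inconsistency shows up - a sketch:
Code:
# re-read and verify all replicas of PG 1.75
ceph pg deep-scrub 1.75
# list any inconsistent objects found by the scrub
rados list-inconsistent-obj 1.75 --format=json-pretty
# repair the PG from the surviving good copies
ceph pg repair 1.75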
- The backup was aborted, but the web interface still shows the task as running, with a spinning icon
- Tried to run another backup
Code:
INFO: starting new backup job: vzdump 165 --compress gzip --remove 0 --node vhost4 --storage nas_backup --mode snapshot
INFO: Starting Backup of VM 165 (qemu)
INFO: status = running
INFO: update VM 165: -lock backup
INFO: VM Name: vm-227-oradb-ogd-da
INFO: include disk 'scsi0' 'storage1_vm:vm-165-disk-2' 60G
INFO: include disk 'scsi2' 'storage1_ct:vm-165-disk-3' 150G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive '/mnt/pve/nas_backup/dump/vzdump-qemu-165-2019_07_06-22_50_49.vma.gz'
ERROR: Node 'drive-scsi2' is busy: block device is in use by block job: backup
INFO: aborting backup job
ERROR: Backup of VM 165 failed - Node 'drive-scsi2' is busy: block device is in use by block job: backup
INFO: Backup job finished with errors
TASK ERROR: job errors
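The "in use by block job" error suggests the first backup's block job is still attached to the drive inside QEMU. A possible way to inspect and cancel it (a sketch - I have not verified this helps in this exact state) is the VM monitor:
Code:
root@vhost4:~# qm monitor 165
qm> info block-jobs
qm> block_job_cancel drive-scsi2
qm> quit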
- Shut down the VM
- Tried to start the VM
Code:
rbd: sysfs write failed
can't unmap rbd volume vm-165-disk-3: rbd: sysfs write failed
TASK ERROR: start failed: org.freedesktop.systemd1.UnitExists: Unit 165.scope already exists.
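The UnitExists error means the transient systemd scope of the old QEMU process was never cleaned up. Assuming no qemu/kvm process for VM 165 is left, something like this should clear the stale scope (a sketch):
Code:
# make sure no leftover kvm process for the VM exists
ps -ef | grep '[k]vm.*165'
# inspect and clear the stale transient scope
systemctl status 165.scope
systemctl stop 165.scope
systemctl reset-failed 165.scope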
- Tried to move the image to another storage (just to clean things up - locks, whatever...) - hangs
Code:
create full clone of drive scsi2 (storage1_ct:vm-165-disk-3)
transferred: 0 bytes remaining: 162135015424 bytes total: 162135015424 bytes progression: 0.00 %
- Cancelled it - the task is marked as stopped, but:
Code:
root@vhost4:~# ps -ef | grep 165
root 1947665 1 0 Jul06 ? 00:00:00 /usr/bin/qemu-img convert -p -n -f raw -O raw /dev/rbd/storage1/vm-165-disk-3 zeroinit:rbd:storage1/vm-165-disk-1:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/storage1_vm.keyring
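Judging by the kernel stack further below, the convert process is blocked in uninterruptible sleep inside the kernel, so a plain kill will not reach it until the blocked open() returns. Things one could check (paths taken from the process above):
Code:
# confirm the process is in D state and where in the kernel it sleeps
ps -o pid,stat,wchan:32,cmd -p 1947665
# list kernel-mapped rbd devices and who still holds the device open
rbd showmapped
lsof /dev/rbd/storage1/vm-165-disk-3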
- Trying to understand / do something! (syscall 2 on x86_64 is open(2), and the stack below shows the process blocked in __blkdev_get, i.e. stuck opening the rbd block device)
Code:
root@vhost4:~# cat /proc/1947665/syscall
2 0x7f77c44f30e0 0x80800 0x0 0x7f77c48001c8 0x23901f00 0xaaaaaaab 0x7ffc3c531fc8 0x7f77d3769820
Code:
root@vhost4:~# cat /proc/1947665/stack
[<ffffffffa169276d>] __blkdev_get+0x6d/0x430
[<ffffffffa1693715>] blkdev_get+0x125/0x310
[<ffffffffa1693e71>] blkdev_open+0x81/0xf0
[<ffffffffa164e64c>] do_dentry_open+0x1fc/0x300
[<ffffffffa164fa7e>] vfs_open+0x4e/0x80
[<ffffffffa1661eb4>] path_openat+0x614/0x1490
[<ffffffffa16640b9>] do_filp_open+0x99/0x110
[<ffffffffa164fea5>] do_sys_open+0x135/0x280
[<ffffffffa165000e>] SyS_open+0x1e/0x20
[<ffffffffa1d347fb>] entry_SYSCALL_64_fastpath+0x1e/0xa9
[<ffffffffffffffff>] 0xffffffffffffffff
Code:
root@vhost4:~# rbd unmap -o force storage1/vm-165-disk-3
- For now: the VM has been restored from the backup to another VM ID, so no data loss and no downtime
Questions...
0. What caused this situation? KRBD? The CRC failure? The backup? Some combination?
1. What did I do wrong here?
2. What should I do the next time this happens?
3. Can we use KRBD for VM storage?
4. How can librbd be so slow? We achieved a 2x-7x I/O improvement with KRBD, which means librbd adds more overhead than a 1 Gbit(!) network plus SATA disks do. Is librbd broken?
5. How can I clean up the locked VM 165 image without rebooting the host?
Thank you