I am running PVE 6.0, and on this PVE node I have already started 2 VMs with RBD disks on my Ceph storage. However, sometimes when I try to start a third VM using an RBD disk, PVE fails with the following error message. The only way to resolve it is to reboot the PVE node, after which I can start 3 or more VMs using RBD disks.
There is no other RBD client that has that image open.
Code:
/dev/rbd6
/dev/rbd7
/dev/rbd8
rbd: sysfs write failed
can't unmap rbd device /dev/rbd/rbd/vm-104-disk-1: rbd: sysfs write failed
rbd: sysfs write failed
can't unmap rbd device /dev/rbd/rbd/vm-104-disk-2: rbd: sysfs write failed
rbd: sysfs write failed
can't unmap rbd device /dev/rbd/rbd/vm-104-disk-0: rbd: sysfs write failed
TASK ERROR: start failed: command '/usr/bin/kvm -id 104 -name winclient104 -chardev 'socket,id=qmp,path=/var/run/qemu-server/104.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/104.pid -daemonize -smbios 'type=1,uuid=b9595b11-142c-41b5-bba9-25f359770a91' -drive 'if=pflash,unit=0,format=raw,readonly,file=/usr/share/pve-edk2-firmware//OVMF_CODE.fd' -drive 'if=pflash,unit=1,format=raw,id=drive-efidisk0,file=/dev/rbd/rbd/vm-104-disk-0' -smp '8,sockets=1,cores=8,maxcpus=8' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga none -nographic -no-hpet -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=proxmox,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_reset,hv_vpindex,hv_runtime,hv_relaxed,hv_synic,hv_stimer,hv_ipi,kvm=off' -m 16384 -object 'memory-backend-ram,id=ram-node0,size=16384M' -numa 'node,nodeid=0,cpus=0-7,memdev=ram-node0' -readconfig /usr/share/qemu-server/pve-q35.cfg -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' -device 'vfio-pci,host=b2:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,multifunction=on' -device 'vfio-pci,host=b2:00.1,id=hostpci0.1,bus=ich9-pcie-port-1,addr=0x0.1' -device 'vfio-pci,host=b2:00.2,id=hostpci0.2,bus=ich9-pcie-port-1,addr=0x0.2' -device 'vfio-pci,host=b2:00.3,id=hostpci0.3,bus=ich9-pcie-port-1,addr=0x0.3' -chardev 'socket,path=/var/run/qemu-server/104.qga,server,nowait,id=qga0' -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:cb85eb190f0' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 
'file=/dev/rbd/rbd/vm-104-disk-1,if=none,id=drive-scsi0,cache=writeback,discard=on,throttling.bps-read=419430400,throttling.iops-read=8000,throttling.bps-write=419430400,throttling.iops-write=8000,format=raw,aio=threads,detect-zeroes=unmap' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -drive 'file=/dev/rbd/rbd/vm-104-disk-2,if=none,id=drive-scsi1,cache=writeback,discard=on,throttling.bps-read=419430400,throttling.iops-read=8000,throttling.bps-write=419430400,throttling.iops-write=8000,format=raw,aio=threads,detect-zeroes=unmap' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi1,id=scsi1' -netdev 'type=tap,id=net0,ifname=tap104i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=3A:C2:94:31:A4:C4,netdev=net0,bus=pci.0,addr=0x12,id=net0' -rtc 'driftfix=slew,base=localtime' -machine 'type=pc-q35-3.1' -global 'kvm-pit.lost_tick_policy=discard'' failed: got timeout
Code:
rbd image 'vm-104-disk-1':
size 60 GiB in 15360 objects
order 22 (4 MiB objects)
snapshot_count: 0
id: 433efbc6fb9b03
block_name_prefix: rbd_data.433efbc6fb9b03
format: 2
features: layering
op_features:
flags:
create_timestamp: Fri Sep 6 01:24:48 2019
access_timestamp: Fri Sep 6 01:24:48 2019
modify_timestamp: Fri Sep 6 01:24:48 2019
root@gpu01:~# rados -p rbd listwatchers rbd_data.433efbc6fb9b03
error listing watchers rbd/rbd_data.433efbc6fb9b03: (2) No such file or directory
root@gpu01:~# rbd status -p rbd vm-104-disk-1
Watchers: none
root@gpu01:~# rbd showmapped
id pool namespace image snap device
0 rbd vm-101-disk-1 - /dev/rbd0
1 rbd vm-101-disk-2 - /dev/rbd1
2 rbd vm-101-disk-0 - /dev/rbd2
3 rbd vm-102-disk-1 - /dev/rbd3
4 rbd vm-102-disk-2 - /dev/rbd4
5 rbd vm-102-disk-0 - /dev/rbd5
root@gpu01:~# ceph health detail
HEALTH_OK
There is no other RBD client that has that image open.
# pveversion -v
proxmox-ve: 6.0-2 (running kernel: 5.0.21-2-pve)
pve-manager: 6.0-7 (running version: 6.0-7/28984024)
pve-kernel-5.0: 6.0-8
pve-kernel-helper: 6.0-8
pve-kernel-5.0.21-2-pve: 5.0.21-3
pve-kernel-5.0.21-1-pve: 5.0.21-2
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph: 14.2.2-pve1
ceph-fuse: 14.2.2-pve1
corosync: 3.0.2-pve2
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.12-pve1
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-4
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-2
libpve-storage-perl: 6.0-8
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-65
lxcfs: 3.0.3-pve60
novnc-pve: 1.0.0-60
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-7
pve-cluster: 6.0-7
pve-container: 3.0-7
pve-docs: 6.0-4
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-3
pve-qemu-kvm: 4.0.0-5
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-7
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.1-pve2
Last edited: