get nodes/$node/storage showing 0 bytes for ceph pool

alexskysilk

I have an intermittent problem with storage returning 0 values for a specific rbd pool. It's only happening on one cluster, and there doesn't seem to be a correlation to which node context is being called:

Code:
{"CODE":"OK","ERRORS":"","proxmoxRes":{"active":0,"avail":0,"content":"rootdir,images","enabled":1,"shared":1,"total":0,"type":"rbd","used":0,"data":{"active":0,"content":"rootdir,images","avail":0,"shared":1,"used":0,"total":0,"enabled":1,"type":"rbd"},"errors":null,"status":null,"success":1,"message":null},"request":null}

If I run the query in pvesh, I get a timeout before the 0 response:

Code:
pvesh get nodes/sky11/storage/vdisk-3pg/status
got timeout
200 OK
{
   "active" : 0,
   "avail" : 0,
   "content" : "rootdir,images",
   "enabled" : 1,
   "shared" : 1,
   "total" : 0,
   "type" : "rbd",
   "used" : 0
}

Why is it timing out? None of the nodes are overloaded, and pveproxy isn't showing any issues.

Code:
# pveversion -v
proxmox-ve: 5.2-2 (running kernel: 4.15.17-3-pve)
pve-manager: 5.2-3 (running version: 5.2-3/785ba980)
pve-kernel-4.15: 5.2-3
pve-kernel-4.15.17-3-pve: 4.15.17-13
pve-kernel-4.15.17-1-pve: 4.15.17-9
pve-kernel-4.15.3-1-pve: 4.15.3-1
ceph: 12.2.5-pve1
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-4
libpve-common-perl: 5.0-34
libpve-guest-common-perl: 2.0-17
libpve-http-server-perl: 2.0-9
libpve-storage-perl: 5.0-23
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.0-3
lxcfs: 3.0.0-1
novnc-pve: 1.0.0-1
proxmox-widget-toolkit: 1.0-19
pve-cluster: 5.0-27
pve-container: 2.0-23
pve-docs: 5.2-4
pve-firewall: 3.0-12
pve-firmware: 2.0-4
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.1-5
pve-xtermjs: 1.0-5
qemu-server: 5.0-29
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.9-pve1~bpo9
 
Is the storage accessible through the rbd command line? If it is an external ceph cluster, is the keyring file at /etc/pve/priv/ceph/?
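A quick way to check both from the affected node (a sketch only; it assumes the Ceph pool name matches the storage ID vdisk-3pg from the pvesh call above, adjust to whatever pool your storage.cfg actually points at):

Code:
# does the pool answer on the rbd level at all?
rbd -p vdisk-3pg ls

# for an external cluster, PVE expects the keyring for this storage
# to be present in this directory
ls -l /etc/pve/priv/ceph/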
 
Are all MONs accessible from the PVE node? The timeout could come from one MON not being reachable while the rest are.
 
Is the port of every MON accessible (telnet/netcat)? Maybe a firewall/routing issue?
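For example (a sketch; the three MON addresses below are placeholders, and 6789 is the default messenger port for Luminous MONs, adjust if yours listen elsewhere):

Code:
# test the MON port from the PVE node; a MON that does not answer here
# points at a routing/firewall problem rather than at Ceph itself
for mon in 10.10.10.1 10.10.10.2 10.10.10.3; do
    nc -zv -w 5 "$mon" 6789
done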
 
Does a 'ceph -m monhost mon_status' against each of the MONs work?
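Something along these lines (placeholder MON addresses again; the timeout just keeps a hanging MON from blocking the loop):

Code:
# ask every MON for its status directly; the one that hangs or times
# out is the likely source of the intermittent zero/empty results
for mon in 10.10.10.1 10.10.10.2 10.10.10.3; do
    echo "== $mon =="
    timeout 10 ceph -m "$mon" mon_status || echo "no reply from $mon"
done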

For the moment, I believe not all MONs are (equally?) reachable; I have seen such behavior produce these "sometimes empty" results in the past.

Not sure how/why it would be firewall related; there is no firewall (software or hardware) enabled on that subnet, it's dedicated to ceph traffic.
Just going through the usual questions; with remote diagnosis you never know what is and what isn't. ;)
 
For the moment, I believe not all MONs are (equally?) reachable; I have seen such behavior produce these "sometimes empty" results in the past.

That seems logical. I tried calling the monitors at random, and in at least one instance it just hung without replying. I will move the defective monitor, but how do I troubleshoot why it's not responding?
 
The logs on the MON may give some clues; if it is a network issue, you may see dropped packets.
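For example, on the suspect MON host (a sketch; it assumes the MON ID is the short hostname, as on a default PVE-managed MON, and eth0 stands in for whatever interface carries the ceph traffic):

Code:
# recent entries from the MON log (elections, slow ops, resets)
tail -n 200 /var/log/ceph/ceph-mon.$(hostname -s).log

# RX/TX error and drop counters on the ceph-facing interface
ip -s link show dev eth0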