Ceph "has serial but no model" error

-- Logs begin at Thu 2019-09-19 16:56:44 EDT, end at Thu 2019-09-19 17:06:52 EDT. --
Sep 19 16:56:55 hypervisor01 systemd[1]: Started Ceph cluster monitor daemon.
Sep 19 16:56:57 hypervisor01 ceph-mon[1556]: 2019-09-19 16:56:57.376 7f87104a63c0 -1 mon.hypervisor01@0(electing) e1 failed to get devid for : fallback method has serial ''but no model
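That devid message generally means Ceph's device-health code could not read a model/serial pair for the disk from udev. A quick way to see what the system actually reports for one of the OSD disks (sdX is just a placeholder here):

# What udev knows about the disk
udevadm info --query=property --name=/dev/sdX | grep -E 'ID_MODEL|ID_SERIAL'

# What SMART reports (Device Model / Serial Number lines)
smartctl -i /dev/sdX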

This has degraded all the disks and filled my logs with the following:
2019-09-19 17:10:09.610290 mgr.hypervisor01 (mgr.14121) 388 : cluster [DBG] pgmap v387: 128 pgs: 128 undersized+peered; 0 B data, 8.8 MiB used, 8.2 TiB / 8.2 TiB avail
2019-09-19 17:10:11.611104 mgr.hypervisor01 (mgr.14121) 389 : cluster [DBG] pgmap v388: 128 pgs: 128 undersized+peered; 0 B data, 8.8 MiB used, 8.2 TiB / 8.2 TiB avail
2019-09-19 17:10:13.611660 mgr.hypervisor01 (mgr.14121) 390 : cluster [DBG] pgmap v389: 128 pgs: 128 undersized+peered; 0 B data, 8.8 MiB used, 8.2 TiB / 8.2 TiB avail
2019-09-19 17:10:15.612369 mgr.hypervisor01 (mgr.14121) 391 : cluster [DBG] pgmap v390: 128 pgs: 128 undersized+peered; 0 B data, 8.8 MiB used, 8.2 TiB / 8.2 TiB avail
2019-09-19 17:10:17.613062 mgr.hypervisor01 (mgr.14121) 392 : cluster [DBG] pgmap v391: 128 pgs: 128 undersized+peered; 0 B data, 8.8 MiB used, 8.2 TiB / 8.2 TiB avail
2019-09-19 17:10:19.613715 mgr.hypervisor01 (mgr.14121) 393 : cluster [DBG] pgmap v392: 128 pgs: 128 undersized+peered; 0 B data, 8.8 MiB used, 8.2 TiB / 8.2 TiB avail
2019-09-19 17:10:21.614576 mgr.hypervisor01 (mgr.14121) 394 : cluster [DBG] pgmap v393: 128 pgs: 128 undersized+peered; 0 B data, 8.8 MiB used, 8.2 TiB / 8.2 TiB avail
2019-09-19 17:10:23.615249 mgr.hypervisor01 (mgr.14121) 395 : cluster [DBG] pgmap v394: 128 pgs: 128 undersized+peered; 0 B data, 8.8 MiB used, 8.2 TiB / 8.2 TiB avail
2019-09-19 17:10:25.615981 mgr.hypervisor01 (mgr.14121) 396 : cluster [DBG] pgmap v395: 128 pgs: 128 undersized+peered; 0 B data, 8.8 MiB used, 8.2 TiB / 8.2 TiB avail
2019-09-19 17:10:27.616684 mgr.hypervisor01 (mgr.14121) 397 : cluster [DBG] pgmap v396: 128 pgs: 128 undersized+peered; 0 B data, 8.8 MiB used, 8.2 TiB / 8.2 TiB avail
2019-09-19 17:10:29.617287 mgr.hypervisor01 (mgr.14121) 398 : cluster [DBG] pgmap v397: 128 pgs: 128 undersized+peered; 0 B data, 8.8 MiB used, 8.2 TiB / 8.2 TiB avail
2019-09-19 17:10:31.617990 mgr.hypervisor01 (mgr.14121) 399 : cluster [DBG] pgmap v398: 128 pgs: 128 undersized+peered; 0 B data, 8.8 MiB used, 8.2 TiB / 8.2 TiB avail
2019-09-19 17:10:33.618626 mgr.hypervisor01 (mgr.14121) 400 : cluster [DBG] pgmap v399: 128 pgs: 128 undersized+peered; 0 B data, 8.8 MiB used, 8.2 TiB / 8.2 TiB avail
2019-09-19 17:10:35.619344 mgr.hypervisor01 (mgr.14121) 401 : cluster [DBG] pgmap v400: 128 pgs: 128 undersized+peered; 0 B data, 8.8 MiB used, 8.2 TiB / 8.2 TiB avail
2019-09-19 17:10:37.619997 mgr.hypervisor01 (mgr.14121) 402 : cluster [DBG] pgmap v401: 128 pgs: 128 undersized+peered; 0 B data, 8.8 MiB used, 8.2 TiB / 8.2 TiB avail
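A side note on the log flood itself: the undersized PGs are the real issue, but if the per-two-second pgmap DBG lines need quieting, the knob usually pointed at for Nautilus is the cluster log file level (option name from memory, so please verify against your Ceph version's docs):

# Stop DBG-level pgmap entries from being written to the cluster log file
ceph config set mon mon_cluster_log_file_level info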

Can someone shed some light on this issue, please?
Thanks!
 
What do ceph -s and ceph osd df tree show? Can you please explain your cluster setup in more detail?
 
I found that the bond type was not set to active-failover; I fixed that and rebooted,
but I still get the WARN state.
I haven't added any other nodes to the cluster yet, as I need to get these disks clean first so I can move VMs over by hand,
reformat, and then join my other nodes.
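On the bond point, for reference only: Linux bonding calls the failover mode active-backup, and a minimal /etc/network/interfaces stanza for it looks roughly like this (interface names eno1/eno2 are placeholders; the bridge carrying the IP would use bond0 as its port):

auto bond0
iface bond0 inet manual
        bond-slaves eno1 eno2
        bond-mode active-backup
        bond-miimon 100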
root@hypervisor01:~# ceph osd df tree
ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP META  AVAIL   %USE VAR  PGS STATUS TYPE NAME
-1       8.18546        - 8.2 TiB 3.0 GiB 9.8 MiB  0 B 3 GiB 8.2 TiB 0.04 1.00   -        root default
-3       8.18546        - 8.2 TiB 3.0 GiB 9.8 MiB  0 B 3 GiB 8.2 TiB 0.04 1.00   -        host hypervisor01
 0   hdd 2.72849  1.00000 2.7 TiB 1.0 GiB 3.2 MiB  0 B 1 GiB 2.7 TiB 0.04 1.00  46     up         osd.0
 1   hdd 2.72849  1.00000 2.7 TiB 1.0 GiB 3.2 MiB  0 B 1 GiB 2.7 TiB 0.04 1.00  46     up         osd.1
 2   hdd 2.72849  1.00000 2.7 TiB 1.0 GiB 3.2 MiB  0 B 1 GiB 2.7 TiB 0.04 1.00  36     up         osd.2
                    TOTAL 8.2 TiB 3.0 GiB 9.8 MiB  0 B 3 GiB 8.2 TiB 0.04
MIN/MAX VAR: 1.00/1.00  STDDEV: 0
root@hypervisor01:~# ceph -s
  cluster:
    id:     327c0bb6-ef1d-47ed-8a3d-a7ca913b13d9
    health: HEALTH_WARN
            Reduced data availability: 128 pgs inactive
            Degraded data redundancy: 128 pgs undersized
            128 pgs not deep-scrubbed in time
            128 pgs not scrubbed in time

  services:
    mon: 1 daemons, quorum hypervisor01 (age 10m)
    mgr: hypervisor01(active, since 9m)
    osd: 3 osds: 3 up (since 10m), 3 in (since 12d)

  data:
    pools:   1 pools, 128 pgs
    objects: 0 objects, 0 B
    usage:   3.0 GiB used, 8.2 TiB / 8.2 TiB avail
    pgs:     100.000% pgs not active
             128 undersized+peered
If I can get back to an OK state with all the space available, I can proceed quickly with getting things up and running and joining the other nodes.
 
Ceph needs three nodes to start with, not only for quorum but also for data redundancy. Add two more nodes with a MON and OSDs to get Ceph running properly.
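For background on why the PGs sit in undersized+peered: a replicated pool defaults to size 3 with host as the CRUSH failure domain, so a single host can never place all the replicas. That can be confirmed with (pool name is a placeholder):

# Replication settings of the pool
ceph osd pool get <poolname> size
ceph osd pool get <poolname> min_size

# CRUSH rule in use; check the failure-domain type in the chooseleaf step
ceph osd crush rule dump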
 
Is Ceph the only way to share local storage? Ideally this cluster will have two nodes to start with, and I need that to work first before adding a third.
I also can't add nodes yet, as the existing data on them is local, and those drives can't be converted to BlueStore OSDs while the data is still on them.
Any suggestions would be ideal.
 
So I can do away with Ceph entirely in this case?
My existing nodes are on 5.3.8.
Can I join them to a 6.0 cluster to get the data synced, move the VMs, and then upgrade the older nodes?

I cannot/do not want to do an in-place upgrade, for a number of reasons (downtime/risk being a major one).

My 5.3.8 nodes do not store on ZFS, but I guess that is OK; I can move the data to the 6.0 node, which has ZFS, and reconfigure as needed.
Should I RAID the ZFS storage or run it on plain HBA disks? The system has 4 x 4 TB disks currently set up as HBA because of Ceph, but they can be RAIDed if that is better.
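Not an authoritative answer, but as a sketch of the two usual ZFS layouts for 4 x 4 TB disks left on the HBA (ZFS generally prefers raw disks over a hardware RAID volume; pool name and device paths are placeholders):

# Option A: two mirrored pairs (RAID10-like), roughly 8 TB usable, better IOPS for VMs
zpool create -o ashift=12 tank \
    mirror /dev/disk/by-id/ata-DISK1 /dev/disk/by-id/ata-DISK2 \
    mirror /dev/disk/by-id/ata-DISK3 /dev/disk/by-id/ata-DISK4

# Option B: one raidz1 vdev, roughly 12 TB usable, lower IOPS
zpool create -o ashift=12 tank raidz1 \
    /dev/disk/by-id/ata-DISK1 /dev/disk/by-id/ata-DISK2 \
    /dev/disk/by-id/ata-DISK3 /dev/disk/by-id/ata-DISK4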
 
So now I just need to know if a 5.3.8 cluster can join a 6.0 cluster to move my data, and I'm all set!
If this was your last question, you may already know: it isn't possible to join a PVE 5.3 node with a PVE 6.0 node. You will need to upgrade to PVE 5.4 first and install Corosync 3; then you can add the PVE 6.0 node to the PVE 5.4 cluster.
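As a rough sanity check of that path (pve5to6 ships with PVE 5.4; corosync -v simply prints the installed version):

# Built-in checklist that flags anything blocking the 5-to-6 path, including corosync
pve5to6

# Confirm the running corosync major version after installing Corosync 3
corosync -v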
 
To be clear:
I am now using ZFS, not Ceph/BlueStore.
Can the clusters still be joined? I cannot have the downtime of upgrading the first node to 5.4.
I just need to find a way to get the data off the first node with little to zero downtime or reboots; then I can upgrade it to 6.
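If joining the clusters turns out not to be workable, one fallback sketch (the VMID, storage names, and archive path below are placeholders) is a backup/restore between the two stand-alone nodes; snapshot-mode backups keep the guest running, so the only interruption is the final switchover when the restored copy is started on the new node:

# On the 5.3 node: back up the guest while it stays online
vzdump 101 --mode snapshot --storage backupstore

# Copy the archive to the 6.0 node, then restore it onto the ZFS storage there
qmrestore /mnt/backups/vzdump-qemu-101.vma.lzo 101 --storage local-zfs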
 
