Hi there,
We're evaluating Proxmox as a virtual environment solution, and I've spun up 3 nodes in each of 3 different datacenters. By and large we're getting the same Ceph errors (HEALTH_WARN) in every datacenter - mainly PGs stuck in creating+peering.
I've trawled through a lot of documentation, so I'm hoping someone may have some ideas on where I'm going wrong. For the most part it's a default install of Proxmox.
Configuration is as follows:
- 3 x identical servers, each with 6 x SSD (4 x SSD per server used for Ceph)
- 1 Gb public / 10 Gb private network, with MTU 9000 on the 10 Gb side. The public network is a tagged VLAN on vmbr.xxx with bond0 underneath; bond1 is a dedicated (untagged) access port for the Ceph cluster network - see the interfaces sketch after this list.
- Created mon/mgr on all nodes
- Created an OSD on each of the 4 Ceph SSDs per node (12 OSDs total)
- Created pool with defaults
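For reference, the relevant part of /etc/network/interfaces on each node looks roughly like this (NIC names, VLAN ID and addresses are placeholders, not the real values):
Code:
# sketch only - interface names, VLAN ID and addresses are placeholders

# 1G public side: bond0 under a VLAN-aware bridge, node IP on a tagged VLAN interface
auto bond0
iface bond0 inet manual
        bond-slaves eno1 eno2
        bond-miimon 100
        bond-mode 802.3ad

auto vmbr0
iface vmbr0 inet manual
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes

auto vmbr0.100
iface vmbr0.100 inet static
        address 192.0.2.12/24
        gateway 192.0.2.1

# 10G private side: bond1 as an untagged access port for the Ceph cluster network, jumbo frames
auto bond1
iface bond1 inet static
        bond-slaves enp65s0f0 enp65s0f1
        bond-miimon 100
        bond-mode 802.3ad
        mtu 9000
        address 10.10.10.12/24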
So in the first two DCs I have varying issues - but DC1 has the following:
Code:
root@pve002-ewr:~# ceph -s
  cluster:
    health: HEALTH_WARN
            Reduced data availability: 23 pgs inactive, 22 pgs peering
            24 slow ops, oldest one blocked for 4219 sec, daemons [osd.5,osd.7] have slow ops.

  services:
    mon: 3 daemons, quorum pve002-ewr,pve003-ewr,pve004-ewr (age 2d)
    mgr: pve003-ewr(active, since 2d), standbys: pve002-ewr, pve004-ewr
    osd: 12 osds: 12 up (since 67m), 12 in (since 67m); 1 remapped pgs

  data:
    pools:   2 pools, 129 pgs
    objects: 2 objects, 128 KiB
    usage:   325 MiB used, 21 TiB / 21 TiB avail
    pgs:     17.829% pgs not active
             106 active+clean
             21  creating+peering
             1   creating+activating
             1   remapped+peering
And DC3 is in an even worse state:
Code:
root@pve002-sjc:~# ceph -s
  cluster:
    health: HEALTH_WARN
            Reduced data availability: 129 pgs inactive, 129 pgs peering
            258 slow ops, oldest one blocked for 274222 sec, daemons [osd.0,osd.3,osd.9] have slow ops.

  services:
    mon: 3 daemons, quorum pve002-sjc,pve004-sjc,pve003-sjc (age 3d)
    mgr: pve004-sjc(active, since 3d), standbys: pve002-sjc, pve003-sjc
    osd: 12 osds: 12 up (since 3d), 12 in (since 3d)

  data:
    pools:   2 pools, 129 pgs
    objects: 0 objects, 0 B
    usage:   338 MiB used, 21 TiB / 21 TiB avail
    pgs:     100.000% pgs not active
             129 creating+peering
What am I doing wrong? I must be missing a trick somewhere, but I can't find any reason for it. Any help would be appreciated! I've searched through the dumps and followed a lot of guides, but I can't work out what I'm doing wrong here.
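In case it helps, these are the sort of checks I've been running on each cluster so far (the IP below is a placeholder) - happy to post full output if useful:
Code:
# overall health and which PGs are stuck inactive
ceph health detail
ceph pg dump_stuck inactive

# OSD layout and usage
ceph osd tree
ceph osd df tree

# what the slow ops are waiting on (run on the node that hosts the OSD, e.g. osd.5 from DC1)
ceph daemon osd.5 dump_ops_in_flight

# confirm jumbo frames actually pass node-to-node on the 10G cluster network (placeholder IP)
ping -M do -s 8972 10.10.10.13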