Hello,
Complete newbie here regarding Proxmox VE... I recently installed a 4-node cluster with Ceph shared storage. Everything worked great until I recently upgraded one of the nodes. After the upgrade I rebooted that node and noticed I "lost network" across all the VMs in the cluster... it was odd, since the VMs kept working fine for a while, but then they would "lock up". I eventually figured out it wasn't really the network: the storage was put into a kind of read-only state while the node was rebooting.
This happens every time I reboot any of the 4 nodes... while the node is rebooting (2-4 minutes) the Ceph cluster locks up and in turn locks up the VMs. Everything is immediately restored as soon as the rebooting node comes back up and rejoins the cluster.
As a test, I completely shut down one of the nodes and could watch in Proxmox's Ceph GUI as the storage rebuilt itself; that took about 15 minutes.
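Next time I reboot a node I can try to capture the cluster state while it is down, in case that helps with the diagnosis. I'm assuming something like the following is the right thing to run from one of the surviving nodes:

ceph -s
ceph health detail
ceph osd tree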
So my questions are: is this normal behavior? Can it be configured to respond better or to reduce the lock-up time?
As a reference, each of my 4 hosts/nodes has a dedicated storage network of 2x 1Gb links in a bond.
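The bond is defined on each node in /etc/network/interfaces roughly like this (the interface names and the bond mode here are placeholders, and the address shown is node1's Ceph public address):

auto bond1
iface bond1 inet static
        address 172.16.9.11/24
        bond-slaves eno1 eno2
        bond-miimon 100
        bond-mode active-backup
#Ceph storage network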
Here is my ceph.conf from node1 in case it helps:
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 172.16.10.11/24
fsid = 0f3ad8a1-4bd8-43fa-b00f-a12b411d9112
mon_allow_pool_delete = true
mon_host = 172.16.9.11 172.16.9.12 172.16.9.13 172.16.9.14
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 172.16.9.11/24
[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
[mon.proxmox01]
public_addr = 172.16.9.11
[mon.proxmox02]
public_addr = 172.16.9.12
[mon.proxmox03]
public_addr = 172.16.9.13
[mon.proxmox04]
public_addr = 172.16.9.14
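From what I've read in the Ceph docs, for planned maintenance people set a couple of flags before rebooting a node and clear them afterwards, roughly like this (I haven't tried it yet, so this is just my understanding):

ceph osd set noout
ceph osd set norebalance
# reboot the node, wait for it to rejoin
ceph osd unset norebalance
ceph osd unset noout

Would that help with the lock-ups, or does it only avoid the ~15-minute rebuild I saw when a node stays down?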
I'd appreciate it if anyone could give me some suggestions on how to solve or improve this. Thanks!