I have a 5-node PVE cluster (ver 6.4-1, kernel 5.4.124-1-pve) with an existing 4-node Ceph storage cluster (Ceph ver 14.2.6) installed under Proxmox - it has been working great for at least 700 days.
Today I added a secondary 4-node Ceph cluster running Ceph ver 16.2.7 (also managed under Proxmox). This cluster had been working in the lab and had already been added as storage to at least two independent nodes there -- working great.
When I added the new storage I lost most of the GUI on node 1: I could not list storage, and I could not open VM settings or consoles; the only part still working was the summary page. All the VMs on this node remained operational. On the other 4 nodes the only GUI feature that was not working was the VM console. The cluster had quorum. I ended up removing the RBD pool via the CLI with the pvesm remove cephhdd-pool-2 command (see the commands below), and I did not need --force. After removing it the cluster was operational, but the GUI was still broken.
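For reference, this is roughly how I added the external pool and then removed it. I am reconstructing the add command from memory, so the exact options may be slightly off; the keyring was copied to /etc/pve/priv/ceph/ as usual for an external cluster:
# keyring from the new cluster, named after the storage id
cp ceph.client.admin.keyring /etc/pve/priv/ceph/cephhdd-pool-2.keyring
# external RBD storage pointing at the new cluster's monitors (options approximate)
pvesm add rbd cephhdd-pool-2 --pool cephhdd-pool-2 --content images --monhost "10.221.1.170 10.221.1.171 10.221.1.172 10.221.1.173" --username admin
# removal that brought the cluster back (no --force needed)
pvesm remove cephhdd-pool-2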
Here is the CLI output from the node:
root@pve01:/etc/pve/priv/ceph# systemctl status pvedaemon.service
● pvedaemon.service - PVE API Daemon
Loaded: loaded (/lib/systemd/system/pvedaemon.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2021-08-31 05:38:12 EDT; 5 months 15 days ago
Process: 2290 ExecStart=/usr/bin/pvedaemon start (code=exited, status=0/SUCCESS)
Main PID: 2318 (pvedaemon)
Tasks: 46 (limit: 13516)
Memory: 281.9M
CGroup: /system.slice/pvedaemon.service
├─ 2318 pvedaemon
├─11319 pvedaemon worker
├─11506 /usr/bin/rbd -p cephhdd-pool-2 -m 10.221.1.170,10.221.1.171,10.221.1.172,10.221.1.173 --auth_supported cephx -n client.admin --keyring /etc/pve/priv/ceph/cephhdd-pool-2.keyring ls -l --format json
├─11536 /usr/bin/rbd -p cephhdd-pool-2 -m 10.221.1.170,10.221.1.171,10.221.1.172,10.221.1.173 --auth_supported cephx -n client.admin --keyring /etc/pve/priv/ceph/cephhdd-pool-2.keyring ls -l --format json
├─11977 /usr/bin/rbd -p cephhdd-pool-2 -m 10.221.1.170,10.221.1.171,10.221.1.172,10.221.1.173 --auth_supported cephx -n client.admin --keyring /etc/pve/priv/ceph/cephhdd-pool-2.keyring ls -l --format json
├─63788 pvedaemon worker
└─88845 pvedaemon worker
Feb 14 19:49:38 pve01 pvedaemon[88845]: rados_connect failed - Operation not supported
Feb 14 19:49:40 pve01 pvedaemon[88845]: rados_connect failed - Operation not supported
Feb 14 19:49:42 pve01 pvedaemon[88845]: rados_connect failed - Operation not supported
Feb 14 19:49:48 pve01 pvedaemon[11319]: rados_connect failed - Operation not supported
Feb 14 19:49:50 pve01 pvedaemon[11319]: rados_connect failed - Operation not supported
Feb 14 19:49:52 pve01 pvedaemon[11319]: rados_connect failed - Operation not supported
Feb 14 19:49:54 pve01 pvedaemon[11319]: rados_connect failed - Operation not supported
Feb 14 19:51:03 pve01 pvedaemon[63788]: <root@pam> successful auth for user 'root@pam'
Feb 14 19:51:42 pve01 pvedaemon[88845]: got timeout
Feb 14 19:51:48 pve01 pvedaemon[88845]: got timeout
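Those rbd processes in the CGroup appear to be the ones hanging. If it helps with debugging, I can run the same command the workers were stuck on by hand (wrapped in a timeout so it does not hang my shell, and assuming the keyring file is still in place after the removal) and post the real error it returns:
# same invocation pvedaemon spawned, but with a timeout
timeout 30 /usr/bin/rbd -p cephhdd-pool-2 -m 10.221.1.170,10.221.1.171,10.221.1.172,10.221.1.173 --auth_supported cephx -n client.admin --keyring /etc/pve/priv/ceph/cephhdd-pool-2.keyring ls -l --format json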
Here is the pveproxy status:
root@pve01:/etc/pve/priv/ceph# systemctl status pveproxy.service
● pveproxy.service - PVE API Proxy Server
Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2021-08-31 05:38:13 EDT; 5 months 15 days ago
Process: 2322 ExecStartPre=/usr/bin/pvecm updatecerts --silent (code=exited, status=1/FAILURE)
Process: 2326 ExecStart=/usr/bin/pveproxy start (code=exited, status=0/SUCCESS)
Process: 66247 ExecReload=/usr/bin/pveproxy restart (code=exited, status=0/SUCCESS)
Main PID: 2328 (pveproxy)
Tasks: 4 (limit: 13516)
Memory: 322.9M
CGroup: /system.slice/pveproxy.service
├─ 2328 pveproxy
├─31961 pveproxy worker
├─39707 pveproxy worker
└─42223 pveproxy worker
Feb 14 20:38:56 pve01 pveproxy[31961]: proxy detected vanished client connection
Feb 14 20:39:30 pve01 pveproxy[39707]: proxy detected vanished client connection
Feb 14 20:39:34 pve01 pveproxy[42223]: proxy detected vanished client connection
Feb 14 20:43:31 pve01 pveproxy[42223]: proxy detected vanished client connection
Feb 14 20:43:39 pve01 pveproxy[31961]: proxy detected vanished client connection
Feb 14 20:44:12 pve01 pveproxy[31961]: proxy detected vanished client connection
Feb 14 20:44:33 pve01 pveproxy[42223]: proxy detected vanished client connection
Feb 14 20:45:23 pve01 pveproxy[31961]: proxy detected vanished client connection
Feb 14 20:45:31 pve01 pveproxy[39707]: proxy detected vanished client connection
Feb 14 20:45:53 pve01 pveproxy[42223]: proxy detected vanished client connection
Here is pvestatd status:
root@pve01:/etc/pve/priv/ceph# systemctl status pvestatd.service
● pvestatd.service - PVE Status Daemon
Loaded: loaded (/lib/systemd/system/pvestatd.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2021-08-31 05:38:11 EDT; 5 months 15 days ago
Process: 2271 ExecStart=/usr/bin/pvestatd start (code=exited, status=0/SUCCESS)
Main PID: 2298 (pvestatd)
Tasks: 1 (limit: 13516)
Memory: 224.5M
CGroup: /system.slice/pvestatd.service
└─2298 pvestatd
Feb 14 20:29:09 pve01 pvestatd[2298]: rados_connect failed - Operation not supported
Feb 14 20:29:19 pve01 pvestatd[2298]: rados_connect failed - Operation not supported
Feb 14 20:29:29 pve01 pvestatd[2298]: rados_connect failed - Operation not supported
Feb 14 20:29:39 pve01 pvestatd[2298]: rados_connect failed - Operation not supported
Feb 14 20:29:50 pve01 pvestatd[2298]: rados_connect failed - Operation not supported
Feb 14 20:29:59 pve01 pvestatd[2298]: rados_connect failed - Operation not supported
Feb 14 20:30:10 pve01 pvestatd[2298]: rados_connect failed - Operation not supported
Feb 14 20:30:19 pve01 pvestatd[2298]: rados_connect failed - Operation not supported
Feb 14 20:30:30 pve01 pvestatd[2298]: rados_connect failed - Operation not supported
Feb 14 20:30:39 pve01 pvestatd[2298]: rados_connect failed - Operation not supported
At 20:30 I executed the pvesm remove cephhdd-pool-2 command.
The fix was easy and quickly restored the GUI on all nodes:
service pveproxy restart
service pvedaemon restart
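I did not touch pvestatd at the time; since it was logging the same rados_connect errors, I assume restarting it too would not have hurt:
service pvestatd restart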
...but I am struggling to understand why adding Ceph storage would cause that. Some type of version incompatibility? Also, the newly added storage was not working: it was listed with a question mark and was timing out even on the other 4 nodes.
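My own first guess is also a client/cluster mismatch (a Nautilus client on PVE 6.4 talking to a Pacific 16.2.7 cluster), so I plan to compare the client version on the PVE nodes with what the new cluster reports, roughly like this:
# on a PVE 6.4 node - local Ceph client / librbd version
ceph --version
rbd --version
# on a node of the new cluster - daemon versions and minimum client compatibility
ceph versions
ceph osd dump | grep require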
Thank you for any suggestions.