PVE9 Remote RBD Issue

Mar 19, 2025
Two scenarios:

1. I upgraded PVE 8 to PVE 9

When I ran pve8to9 --full, I was asked to run "/usr/share/pve-manager/migrations/pve-rbd-storage-configure-keyring". What this script does is add an RBD pool config file under /etc/pve/priv/ceph/, so afterwards you have both /etc/pve/priv/ceph/poolname.conf and /etc/pve/priv/ceph/poolname.keyring. The new .conf contains:

[global]
keyring = /etc/pve/priv/ceph/poolname.keyring

Doing this results in an RBD issue: the existing mounted RBD pool dies immediately. Deleting the new config /etc/pve/priv/ceph/poolname.conf brings the RBD pool back to active.

Then I re-ran "/usr/share/pve-manager/migrations/pve-rbd-storage-configure-keyring", which put /etc/pve/priv/ceph/poolname.conf back, and I proceeded with the upgrade from PVE 8 to PVE 9. The server upgraded successfully, but the RBD pool was not active. Searching for answers, I found suggestions that I need to put the keyring into /etc/pve/storage.cfg:

rbd: poolname
disable
content images
keyring /etc/pve/priv/ceph/poolname.keyring
krbd 0
monhost xxx.xxx.xxx.xxx
pool poolname
username admin

Then I tried to enable the RBD pool again, but I still have issues mounting it.

Listing the RBD images from inside the server works without issues; I can see the listing:

rbd ls -m xxx.xxx.xxx.xxx -p poolname --id admin --keyring /etc/pve/priv/ceph/poolname.keyring
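
As a diagnostic sketch (not something PVE runs verbatim), the same listing can be repeated while explicitly loading the per-storage conf that the migration script generated, which is roughly what PVE 9 does internally; if this variant hangs while the plain command above works, the generated conf file is the culprit:

```shell
# Diagnostic sketch: load the generated per-storage conf via -c/--conf
# (a standard option of the Ceph CLI tools); paths and placeholders are
# the ones used in this thread.
rbd ls -m xxx.xxx.xxx.xxx -p poolname --id admin \
    -c /etc/pve/priv/ceph/poolname.conf \
    --keyring /etc/pve/priv/ceph/poolname.keyring
```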

Inside /etc/pve/ceph.conf and /etc/ceph/ceph.conf I only have a single global setting, which previously worked on PVE 8 when connecting to a remote RBD cluster (also PVE 8) configured with ms_crc_data = False.

root@xxx:~# cat /etc/pve/ceph.conf
[global]
ms_crc_data = False
root@xxx:~#

The question: does PVE 9 have an issue connecting to a remote RBD cluster running on PVE 8? Why is this happening, and how do I fix it?



2. I installed a brand-new PVE 9 node, connecting to a remote RBD cluster running on PVE 8

Trying to add the RBD storage automatically creates the following files:

/etc/pve/priv/ceph/poolname.conf and /etc/pve/priv/ceph/poolname.keyring.

But I still cannot bring the RBD storage to active. I also added the keyring line to /etc/pve/storage.cfg and tested with and without it; neither works.

/etc/pve/storage.cfg

rbd: poolname
disable
content images
keyring /etc/pve/priv/ceph/poolname.keyring
krbd 0
monhost xxx.xxx.xxx.xxx
pool poolname
username admin

Why is this happening, and how do I fix it? What other settings are needed?



PVE9 Client Server
root@pve9-test:~# pveversion
pve-manager/9.1.9/ee7bad0a3d1546c9 (running kernel: 6.17.13-4-pve)
root@pve9-test:~#

root@pve9-test:~# apt list --installed | grep -i ceph
ceph-common/stable,now 19.2.3-pve4 amd64 [installed]
ceph-fuse/stable,now 19.2.3-pve4 amd64 [installed]
libcephfs2/stable,now 19.2.3-pve4 amd64 [installed]
python3-ceph-argparse/stable,now 19.2.3-pve4 all [installed]
python3-ceph-common/stable,now 19.2.3-pve4 all [installed]
python3-cephfs/stable,now 19.2.3-pve4 amd64 [installed]
root@pve9-test:~#

root@pve9-test:~# apt list --installed | grep -i rbd
librbd1/stable,now 19.2.3-pve4 amd64 [installed]
python3-rbd/stable,now 19.2.3-pve4 amd64 [installed]
root@pve9-test:~#

Remote RBD
root@xxx:~# pveversion
pve-manager/8.4.16/368e3c45c15b895c (running kernel: 6.8.12-18-pve)
root@xxx:~#

root@xxx:~# ceph -v
ceph version 19.2.3 (aaa9a618206bc71cd6f7f12af2a12247d827305a) squid (stable)
root@xxx:~#
 
Hi,
can you share an excerpt of the system logs/journal from around the time the issues occur?

Searching for answers, I found suggestions that I need to put the keyring into /etc/pve/storage.cfg:
rbd: poolname
disable
content images
keyring /etc/pve/priv/ceph/poolname.keyring
krbd 0
monhost xxx.xxx.xxx.xxx
pool poolname
username admin
Google is wrong :) Note that the keyring property is "Client keyring contents (for external clusters)", i.e. the contents, not the path, and it is only meant to be used during setup/update; it should not be part of the storage configuration itself. Did you add the line manually (pvesm set should not write it out)? It gets written to the .keyring file when it is provided while adding or updating the storage:
https://git.proxmox.com/?p=pve-stor...74ffc24d9d1d47a2b6c049258391d29f;hb=HEAD#l476
https://git.proxmox.com/?p=pve-stor...67cf5b06b24cc301241c32f96feffd8f;hb=HEAD#l446

Still, it should otherwise be ignored and not lead to a failure connecting.

EDIT: for completeness (it should not be relevant here): when provided via pvesm add/update on the CLI, the keyring argument is a path, but its contents are read out by pvesm before being passed to the storage API for further handling.
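
For illustration, re-creating the external RBD storage on the CLI could look like the sketch below (values are the placeholders from this thread); per the EDIT above, pvesm reads the file at the given --keyring path and stores its contents under /etc/pve/priv/ceph/:

```shell
# Sketch: add an external RBD storage via pvesm. On the CLI the --keyring
# argument is a file path; pvesm reads the file's contents before passing
# them on to the storage API.
pvesm add rbd poolname \
    --monhost xxx.xxx.xxx.xxx \
    --pool poolname \
    --username admin \
    --content images \
    --keyring /etc/pve/priv/ceph/poolname.keyring
```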
 
I have removed the keyring line from storage.cfg; it does not work with or without it. I also re-created the RBD storage via the GUI, which also does not work. What other settings are needed?

I tried both scenario 1 (PVE 8 upgraded to PVE 9) and scenario 2 (fresh PVE 9 installation); both fail to connect to the remote RBD.


root@pve9-test:~# cat /etc/pve/storage.cfg
dir: local
path /var/lib/vz
content iso,vztmpl,import,backup

lvmthin: local-lvm
thinpool data
vgname pve
content rootdir,images

rbd: ceph-hc-nvme-prod
content images
krbd 0
monhost xxx.xxx.xxx.xxx
pool ceph-hc-nvme-prod
username admin

root@pve9-test:~#

As soon as I activated the RBD storage, pvestatd started timing out:

Apr 27 04:45:55 pve9-test systemd[1]: Created slice user-0.slice - User Slice of UID 0.
Apr 27 04:45:55 pve9-test systemd[1]: Starting user-runtime-dir@0.service - User Runtime Directory /run/user/0...
Apr 27 04:45:55 pve9-test systemd[1]: Finished user-runtime-dir@0.service - User Runtime Directory /run/user/0.
Apr 27 04:45:55 pve9-test systemd[1]: Starting user@0.service - User Manager for UID 0...
Apr 27 04:45:55 pve9-test (systemd)[60104]: pam_unix(systemd-user:session): session opened for user root(uid=0) by root(uid=0)
Apr 27 04:45:55 pve9-test systemd-logind[649]: New session 12 of user root.
Apr 27 04:45:55 pve9-test systemd[60104]: Queued start job for default target default.target.
Apr 27 04:45:55 pve9-test systemd[60104]: Created slice app.slice - User Application Slice.
Apr 27 04:45:55 pve9-test systemd[60104]: Reached target paths.target - Paths.
Apr 27 04:45:55 pve9-test systemd[60104]: Reached target timers.target - Timers.
Apr 27 04:45:55 pve9-test systemd[60104]: Listening on dirmngr.socket - GnuPG network certificate management daemon.
Apr 27 04:45:55 pve9-test systemd[60104]: Listening on gpg-agent-browser.socket - GnuPG cryptographic agent and passphrase cache (access for web browsers).
Apr 27 04:45:55 pve9-test systemd[60104]: Listening on gpg-agent-extra.socket - GnuPG cryptographic agent and passphrase cache (restricted).
Apr 27 04:45:55 pve9-test systemd[60104]: Starting gpg-agent-ssh.socket - GnuPG cryptographic agent (ssh-agent emulation)...
Apr 27 04:45:55 pve9-test systemd[60104]: Starting gpg-agent.socket - GnuPG cryptographic agent and passphrase cache...
Apr 27 04:45:55 pve9-test systemd[60104]: Listening on keyboxd.socket - GnuPG public key management service.
Apr 27 04:45:55 pve9-test systemd[60104]: Starting ssh-agent.socket - OpenSSH Agent socket...
Apr 27 04:45:55 pve9-test systemd[60104]: Listening on gpg-agent-ssh.socket - GnuPG cryptographic agent (ssh-agent emulation).
Apr 27 04:45:55 pve9-test systemd[60104]: Listening on gpg-agent.socket - GnuPG cryptographic agent and passphrase cache.
Apr 27 04:45:55 pve9-test systemd[60104]: Listening on ssh-agent.socket - OpenSSH Agent socket.
Apr 27 04:45:55 pve9-test systemd[60104]: Reached target sockets.target - Sockets.
Apr 27 04:45:55 pve9-test systemd[60104]: Reached target basic.target - Basic System.
Apr 27 04:45:55 pve9-test systemd[60104]: Reached target default.target - Main User Target.
Apr 27 04:45:55 pve9-test systemd[60104]: Startup finished in 557ms.
Apr 27 04:45:55 pve9-test systemd[1]: Started user@0.service - User Manager for UID 0.
Apr 27 04:45:55 pve9-test systemd[1]: Started session-11.scope - Session 11 of User root.
Apr 27 04:46:44 pve9-test pvedaemon[991]: <root@pam> successful auth for user 'root@pam'
Apr 27 04:47:32 pve9-test pvestatd[965]: got timeout
Apr 27 04:47:32 pve9-test pvestatd[965]: status update time (5.241 seconds)
Apr 27 04:47:37 pve9-test pvedaemon[990]: got timeout
Apr 27 04:47:42 pve9-test pvestatd[965]: got timeout
Apr 27 04:47:42 pve9-test pvestatd[965]: status update time (5.216 seconds)
Apr 27 04:47:52 pve9-test pvestatd[965]: got timeout
Apr 27 04:47:52 pve9-test pvestatd[965]: status update time (5.221 seconds)
Apr 27 04:47:54 pve9-test pvedaemon[991]: got timeout
Apr 27 04:48:02 pve9-test pvestatd[965]: got timeout
Apr 27 04:48:02 pve9-test pvestatd[965]: status update time (5.213 seconds)
Apr 27 04:48:11 pve9-test pvedaemon[989]: got timeout
Apr 27 04:48:12 pve9-test pvestatd[965]: got timeout
Apr 27 04:48:12 pve9-test pvestatd[965]: status update time (5.226 seconds)
Apr 27 04:48:22 pve9-test pvestatd[965]: got timeout
Apr 27 04:48:22 pve9-test pvestatd[965]: status update time (5.211 seconds)
Apr 27 04:48:28 pve9-test pvedaemon[991]: got timeout
Apr 27 04:48:32 pve9-test pvestatd[965]: got timeout
Apr 27 04:48:32 pve9-test pvestatd[965]: status update time (5.220 seconds)
Apr 27 04:48:42 pve9-test pvestatd[965]: got timeout
Apr 27 04:48:42 pve9-test pvestatd[965]: status update time (5.208 seconds)
Apr 27 04:48:45 pve9-test pvedaemon[990]: got timeout
Apr 27 04:48:52 pve9-test pvestatd[965]: got timeout
Apr 27 04:48:53 pve9-test pvestatd[965]: status update time (5.206 seconds)
Apr 27 04:49:02 pve9-test pvestatd[965]: got timeout
Apr 27 04:49:02 pve9-test pvestatd[965]: status update time (5.232 seconds)
Apr 27 04:49:02 pve9-test pvedaemon[991]: got timeout
Apr 27 04:49:12 pve9-test pvestatd[965]: got timeout
Apr 27 04:49:12 pve9-test pvestatd[965]: status update time (5.208 seconds)
Apr 27 04:49:20 pve9-test pvedaemon[989]: got timeout
Apr 27 04:49:22 pve9-test pvestatd[965]: got timeout
Apr 27 04:49:22 pve9-test pvestatd[965]: status update time (5.220 seconds)
 
A simple explanation for scenario 1, where I am upgrading from PVE 8 to PVE 9:

PVE 8 has no issues with the remote RBD:

root@pve8-test:~# cat /etc/pve/ceph.conf /etc/ceph/ceph.conf
[global]
#ms_crc_data = True
ms_crc_data = False
[global]
#ms_crc_data = True
ms_crc_data = False
root@pve8-test:~#

root@pve8-test:~# cat /etc/pve/storage.cfg
dir: local
path /var/lib/vz
content backup,vztmpl,iso

lvmthin: local-lvm
thinpool data
vgname pve
content images,rootdir

rbd: ceph-hc-nvme-prod
content images
krbd 0
monhost xxx.xxx.xxx.xxx
pool ceph-hc-nvme-prod
username admin

root@pve8-test:~#

root@pve8-test:~# cat /etc/pve/priv/ceph/ceph-hc-nvme-prod.keyring
[client.admin]
key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx==
caps mds = "allow *"
caps mgr = "allow *"
caps mon = "allow *"
caps osd = "allow *"
root@pve8-test:~#


Then, when running the pre-check pve8to9 --full, I was asked to run "/usr/share/pve-manager/migrations/pve-rbd-storage-configure-keyring".

After running it, a new config file ceph-hc-nvme-prod.conf is created, and the remote RBD immediately stops working.

root@pve8-test:/etc/pve/priv/ceph# /usr/share/pve-manager/migrations/pve-rbd-storage-configure-keyring
INFO: Starting with PVE 9, externally managed RBD storages require that the 'keyring' option is configured in the storage's Ceph configuration. This script creates and updates the storage's Ceph configurations.
INFO: Checking whether all external RBD storages have the 'keyring' option configured
PASS: The Ceph configuration of the following externally managed RBD storages has been updated:
ceph-hc-nvme-prod


root@pve8-test:/etc/pve/priv/ceph# ls
ceph-hc-nvme-prod.conf ceph-hc-nvme-prod.keyring
root@pve8-test:/etc/pve/priv/ceph#


Apr 27 10:17:01 pve8-test CRON[96363]: pam_unix(cron:session): session closed for user root
Apr 27 10:39:01 pve8-test pvestatd[832]: got timeout
Apr 27 10:39:01 pve8-test pvestatd[832]: status update time (5.194 seconds)
Apr 27 10:39:11 pve8-test pvestatd[832]: got timeout
Apr 27 10:39:11 pve8-test pvestatd[832]: status update time (5.219 seconds)
Apr 27 10:39:22 pve8-test pvestatd[832]: got timeout
Apr 27 10:39:22 pve8-test pvestatd[832]: status update time (5.207 seconds)
Apr 27 10:39:31 pve8-test pvestatd[832]: got timeout
Apr 27 10:39:31 pve8-test pvestatd[832]: status update time (5.210 seconds)
Apr 27 10:39:41 pve8-test pvestatd[832]: got timeout
Apr 27 10:39:41 pve8-test pvestatd[832]: status update time (5.204 seconds)
Apr 27 10:39:51 pve8-test pvestatd[832]: got timeout
Apr 27 10:39:51 pve8-test pvestatd[832]: status update time (5.204 seconds)
Apr 27 10:40:02 pve8-test pvestatd[832]: got timeout
Apr 27 10:40:02 pve8-test pvestatd[832]: status update time (5.199 seconds)
Apr 27 10:40:11 pve8-test pvestatd[832]: got timeout
Apr 27 10:40:11 pve8-test pvestatd[832]: status update time (5.218 seconds)
Apr 27 10:40:21 pve8-test pvestatd[832]: got timeout
Apr 27 10:40:21 pve8-test pvestatd[832]: status update time (5.211 seconds)
Apr 27 10:40:31 pve8-test pvestatd[832]: got timeout
Apr 27 10:40:31 pve8-test pvestatd[832]: status update time (5.213 seconds)
Apr 27 10:40:41 pve8-test pvestatd[832]: got timeout
Apr 27 10:40:41 pve8-test pvestatd[832]: status update time (5.214 seconds)
Apr 27 10:40:52 pve8-test pvestatd[832]: got timeout
Apr 27 10:40:52 pve8-test pvestatd[832]: status update time (5.214 seconds)
Apr 27 10:41:01 pve8-test pvestatd[832]: got timeout
Apr 27 10:41:01 pve8-test pvestatd[832]: status update time (5.204 seconds)

Listing the RBD images works without issues:

root@pve8-test:/etc/pve/priv/ceph# rbd ls -m xxx.xxx.xxx.xxx -p ceph-hc-nvme-prod --id admin --keyring /etc/pve/priv/ceph/ceph-hc-nvme-prod.keyring
vm-1001-cloudinit
vm-1001-disk-0
....

Then, if I remove the new config file /etc/pve/priv/ceph/ceph-hc-nvme-prod.conf created by "/usr/share/pve-manager/migrations/pve-rbd-storage-configure-keyring", the remote RBD starts working again.

I proceeded to upgrade PVE 8 to PVE 9; afterwards the remote RBD no longer works, and the new config file /etc/pve/priv/ceph/ceph-hc-nvme-prod.conf is auto-created again.

A fresh PVE 9 installation with the same settings as on PVE 8 also has issues connecting to the remote RBD, while listing the RBD images works fine on PVE 9.

Restarting pvestatd, or even rebooting the PVE server and re-adding the RBD storage, does not fix the problem.
 
What if you add the ms_crc_data = False setting to the storage-specific configuration (the one generated by the script)? I'm not sure the /etc/ceph one is even considered when the storage-specific one is present.
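
For illustration, the storage-specific configuration with that setting added might look like this (a sketch; placing ms_crc_data in the same [global] section as the generated keyring line is an assumption):

```ini
# /etc/pve/priv/ceph/ceph-hc-nvme-prod.conf (sketch)
[global]
keyring = /etc/pve/priv/ceph/ceph-hc-nvme-prod.keyring
# carried over from /etc/pve/ceph.conf, which this per-storage
# file may otherwise shadow
ms_crc_data = False
```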