Problem updating node from PVE 7.4 to 8.3.0

Reartu24

Hi all,
I upgraded my cluster from version 7.4.19 to 8.3.0.
To do this I followed the Ceph upgrade guide (Ceph Pacific to Quincy) and then the node upgrade guide (Proxmox 7 to 8).
I ran through the guides on 4 nodes and everything went well; node 2, however, shows the image below upon reboot.

[attached screenshot: 1732804859759.png]

I tried rebuilding both the Ceph manager and the Ceph monitor, but the problem persists. How can I get the node back up and running without having to reinstall it?

Before starting the upgrade I ran the "pve7to8 --full" check script, and all tests passed with no warnings.
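For context, this is roughly what the two official guides boil down to per node; a minimal sketch only (file names as in the wikis, the exact repository contents may differ per setup):

# run the checker right before starting and verify cluster health
pve7to8 --full
# Ceph Pacific -> Quincy first (while still on PVE 7): switch the Ceph repo, upgrade, then restart mon/mgr/osd daemons as described in the wiki
ceph osd set noout
sed -i 's/pacific/quincy/' /etc/apt/sources.list.d/ceph.list
apt update && apt full-upgrade
# then Proxmox VE 7 -> 8: switch Debian bullseye to bookworm in all APT sources and dist-upgrade
sed -i 's/bullseye/bookworm/g' /etc/apt/sources.list /etc/apt/sources.list.d/*.list
apt update && apt dist-upgrade
# once all daemons on all nodes are upgraded and healthy again
ceph osd unset noout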

Thank you so much
 
Hi,
please check that the nodes can ping each other, and share the output of pvecm status and ceph -s, as well as the system logs/journal of the affected node and of the node you are connected to on the web interface.
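For example (the address is just a placeholder, repeat the ping for each of the other nodes):

ping -c 3 <other-node-ip>
pvecm status
ceph -s
journalctl -b > /tmp/journal-$(hostname).txt    # current boot's journal, as a file for uploading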
 
Hi Fiona,
I have run the commands on node 2 (the node which has the problem) and this is the result:


root@clusterah-node02:~# pvecm status
Cluster information
-------------------
Name: clusterah
Config Version: 6
Transport: knet
Secure auth: on

Quorum information
------------------
Date: Fri Nov 29 13:54:03 2024
Quorum provider: corosync_votequorum
Nodes: 6
Node ID: 0x00000002
Ring ID: 1.1109
Quorate: Yes

Votequorum information
----------------------
Expected votes: 6
Highest expected: 6
Total votes: 6
Quorum: 4
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 10.3.1.151
0x00000002 1 10.3.1.152 (local)
0x00000003 1 10.3.1.153
0x00000004 1 10.3.1.154
0x00000005 1 10.3.1.155
0x00000006 1 10.3.1.156

and

root@clusterah-node02:~# ceph -s
cluster:
id: 96bf289b-fc33-449e-8335-82a02c0f1d6e
health: HEALTH_OK

services:
mon: 6 daemons, quorum clusterah-node01,clusterah-node03,clusterah-node04,clusterah-node05,clusterah-node06,clusterah-node02 (age 3h)
mgr: clusterah-node06(active, since 26h), standbys: clusterah-node01, clusterah-node05, clusterah-node04, clusterah-node03, clusterah-node02
osd: 36 osds: 30 up (since 5m), 30 in (since 61m)

data:
pools: 2 pools, 513 pgs
objects: 1.05M objects, 3.9 TiB
usage: 12 TiB used, 15 TiB / 27 TiB avail
pgs: 512 active+clean
1 active+clean+scrubbing+deep

io:
client: 341 B/s rd, 1.3 MiB/s wr, 1 op/s rd, 139 op/s wr

At the moment, as a precaution, I have marked the OSDs of node 2 as OUT and DOWN.
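For reference, marking a node's OSDs out and down can be done along these lines (the OSD IDs below are only examples, not our actual layout):

# list OSDs per host to find the right IDs
ceph osd tree
# optionally prevent automatic rebalancing while the node is out
ceph osd set noout
# mark the node's OSDs out and down (example IDs)
ceph osd out 6 7 8 9 10 11
ceph osd down 6 7 8 9 10 11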

Thank you so much

Stefano
 
Please also share the system logs/journal from the node as well as from the node to which you are connected on the web interface. What does pvesm status show?
 
Hi Fiona, I am a friend of Stefano and we are trying to fix this situation. I ran the pvesm status command but it just hangs.
About the logs/journal: may I ask which path to use? I looked inside /var/log/journal/ and found several different log files.

Thanks

Daniele
 
If the process is hanging, that might indicate a hanging mount point somewhere; see https://forum.proxmox.com/threads/issues-with-backup-vzdump.20741/post-664154. Please read through that thread and check the output of ls -l /mnt/pve and ps faxl.

You can access the journal with journalctl, e.g. journalctl -b for the current boot and journalctl -b > /tmp/journal.txt to dump it to a file for uploading.
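A quick way to spot a hung mount, assuming the usual /mnt/pve layout for network storages:

# processes stuck in uninterruptible sleep ("D" state) usually point at a dead network mount
ps faxl | awk '$10 ~ /^D/'
# a hung NFS/CIFS mount makes this hang as well, hence the timeout
timeout 5 ls -l /mnt/pve
# show what is currently mounted below /mnt/pve
mount | grep /mnt/pve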
 
Hi Fiona, thank you for your answer. Yes, it turned out that a mount point (which we identified last Friday :D) was causing the issue.
We removed the mount point and everything is back to normal.
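In case it helps someone else: clearing a stale storage mount typically looks something like this (the storage name old-nfs is only an example, not our actual configuration):

# disable the storage in Proxmox so it is not re-activated automatically
pvesm set old-nfs --disable 1
# lazy-unmount the stale mount point
umount -l /mnt/pve/old-nfs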

Thanks

Daniele