Proxmox VE 7.1-8 cluster issue on node1 and cyber-attack

maxcolombo · Jan 25, 2022

Good morning everyone, I would like to submit an event that happened to a customer I follow and for whom I maintain a Proxmox Cluster.
The server cluster (3 nodes) is located on a public provider (OVH) that offers the 3 identical physical machines (dedicated servers). All nodes have an active Enterprise Community licence and are updated to the latest version 7.1-8.
The first node is the "master" of the cluster and the other two nodes are the "slaves", the servers have been joined together in a Cluster because inside they run different VMs and LXCs that communicate with each other using internal VLANs (also offered by OVH) and some of these have also configured replication between different nodes.
At the HW level all servers have a RAID1 ZFS as a 250GB boot disk on a SATA SSD and a second RAID1 ZFS with a 1TB NVME disk for data. All VMs and LCX are on the NVME disk and backups are done at a different location offered by OVH using an internal network over SMB.

Last week I noticed that some replication jobs were failing and when I connected to the management interface there were synchronization errors. Since it had already happened in the past that by carrying out system updates and a reboot of the nodes everything was working again I preceded this.
Unfortunately, nodes 2-3 came online and node1 remained offline (can access only by SSH), after a careful analysis I realize that, in the history of bash I find these commands attached image. Basically someone has changed the hostname of node1 with the consequent deletion of the directory /etc/pve, maybe caused by the server restart that with a different hostname has tried to reconfigure itself.
I analyzed all the system logs and there are no SSH violations on all 3 nodes and as a precaution I changed the passwords of all the servers.

Now my questions are two:
1. How did they run those commands if I can't find any kind of ssh access? Is there any bug/vulnerability in the Proxmox GUI? (Maybe fixed with version 7.1-10 released today?)
2. Now I'm stuck with node1 which I tried reconfiguring by hand taking the pve folder from node2 and editing the various links to the correct node1 configurations. I've also tried upgrading to the no-subscription version but it keeps giving me an error on installing pve-manager.
Do you think I can get it working again somehow? If needed I could share more information from the service error logs.
Or is it better to redo node1 and reinstall it from scratch? In this case, how would node2 and node3 behave if I went to "pvecm delnode node1"? Would it self-promote node2 to master within the cluster quorum? Can I then subsequently re-add the node1, re-prepared from scratch, and promote it as master again? (I have already removed nodes from clusters in the past, but they were secondary and I had no problems).

I remain available for any clarifications or other to understand how to solve the problem of node1 but especially to secure this and all other Proxmox Servers I manager, and be sure that cannot be a vulnerability of Proxmox GUI.

Max Colombo

frankz · Jan 25, 2022

maxcolombo said:
Good morning everyone, I would like to submit an event that happened to a customer I follow and for whom I maintain a Proxmox Cluster.
The server cluster (3 nodes) is located on a public provider (OVH) that offers the 3 identical physical machines (dedicated servers). All nodes have an active Enterprise Community licence and are updated to the latest version 7.1-8.
The first node is the "master" of the cluster and the other two nodes are the "slaves", the servers have been joined together in a Cluster because inside they run different VMs and LXCs that communicate with each other using internal VLANs (also offered by OVH) and some of these have also configured replication between different nodes.
At the HW level all servers have a RAID1 ZFS as a 250GB boot disk on a SATA SSD and a second RAID1 ZFS with a 1TB NVME disk for data. All VMs and LCX are on the NVME disk and backups are done at a different location offered by OVH using an internal network over SMB.

Last week I noticed that some replication jobs were failing and when I connected to the management interface there were synchronization errors. Since it had already happened in the past that by carrying out system updates and a reboot of the nodes everything was working again I preceded this.
Unfortunately, nodes 2-3 came online and node1 remained offline (can access only by SSH), after a careful analysis I realize that, in the history of bash I find these commands attached image. Basically someone has changed the hostname of node1 with the consequent deletion of the directory /etc/pve, maybe caused by the server restart that with a different hostname has tried to reconfigure itself.
I analyzed all the system logs and there are no SSH violations on all 3 nodes and as a precaution I changed the passwords of all the servers.

Now my questions are two:
1. How did they run those commands if I can't find any kind of ssh access? Is there any bug/vulnerability in the Proxmox GUI? (Maybe fixed with version 7.1-10 released today?)
2. Now I'm stuck with node1 which I tried reconfiguring by hand taking the pve folder from node2 and editing the various links to the correct node1 configurations. I've also tried upgrading to the no-subscription version but it keeps giving me an error on installing pve-manager.
Do you think I can get it working again somehow? If needed I could share more information from the service error logs.
Or is it better to redo node1 and reinstall it from scratch? In this case, how would node2 and node3 behave if I went to "pvecm delnode node1"? Would it self-promote node2 to master within the cluster quorum? Can I then subsequently re-add the node1, re-prepared from scratch, and promote it as master again? (I have already removed nodes from clusters in the past, but they were secondary and I had no problems).

I remain available for any clarifications or other to understand how to solve the problem of node1 but especially to secure this and all other Proxmox Servers I manager, and be sure that cannot be a vulnerability of Proxmox GUI.

Max Colombo

Hello, as far as I'm concerned, you should remove the node with the procedure explained in proxmox fans. In addition, the proxmox Cluster system does not have a master see: https://pve.proxmox.com/wiki/Cluster_Manager . I advise you to do this in order to avoid unnecessary complications. As for the violation, you should analyze it well.

fabian · Jan 25, 2022

maxcolombo said:
Good morning everyone, I would like to submit an event that happened to a customer I follow and for whom I maintain a Proxmox Cluster.
The server cluster (3 nodes) is located on a public provider (OVH) that offers the 3 identical physical machines (dedicated servers). All nodes have an active Enterprise Community licence and are updated to the latest version 7.1-8.
The first node is the "master" of the cluster and the other two nodes are the "slaves", the servers have been joined together in a Cluster because inside they run different VMs and LXCs that communicate with each other using internal VLANs (also offered by OVH) and some of these have also configured replication between different nodes.
At the HW level all servers have a RAID1 ZFS as a 250GB boot disk on a SATA SSD and a second RAID1 ZFS with a 1TB NVME disk for data. All VMs and LCX are on the NVME disk and backups are done at a different location offered by OVH using an internal network over SMB.

Last week I noticed that some replication jobs were failing and when I connected to the management interface there were synchronization errors. Since it had already happened in the past that by carrying out system updates and a reboot of the nodes everything was working again I preceded this.
Unfortunately, nodes 2-3 came online and node1 remained offline (can access only by SSH), after a careful analysis I realize that, in the history of bash I find these commands attached image. Basically someone has changed the hostname of node1 with the consequent deletion of the directory /etc/pve, maybe caused by the server restart that with a different hostname has tried to reconfigure itself.
I analyzed all the system logs and there are no SSH violations on all 3 nodes and as a precaution I changed the passwords of all the servers.

Now my questions are two:
1. How did they run those commands if I can't find any kind of ssh access? Is there any bug/vulnerability in the Proxmox GUI? (Maybe fixed with version 7.1-10 released today?)

no known ones, no. in any case it is a good idea to not allow public access to your PVE host (it should be behind restrictive firewall rules and/or only accessibly by VPN). you might want to check /var/log/pveproxy/access.log and other log files for further pointers - but keep in mind that once a system is compromised contents of such files might not be valid/trustworthy/..

maxcolombo said:
2. Now I'm stuck with node1 which I tried reconfiguring by hand taking the pve folder from node2 and editing the various links to the correct node1 configurations. I've also tried upgrading to the no-subscription version but it keeps giving me an error on installing pve-manager.
Do you think I can get it working again somehow? If needed I could share more information from the service error logs.
Or is it better to redo node1 and reinstall it from scratch? In this case, how would node2 and node3 behave if I went to "pvecm delnode node1"? Would it self-promote node2 to master within the cluster quorum? Can I then subsequently re-add the node1, re-prepared from scratch, and promote it as master again? (I have already removed nodes from clusters in the past, but they were secondary and I had no problems).

if a system is compromised it's always best to fully reinstall it, and restore as needed from known uncompromised backups. also keep in mind that taking over a single proxmox node possible means that the attacker also has/had control over the other nodes (since the nodes trust eachother). there is no master in a PVE cluster - all nodes are equal.

maxcolombo said:
I remain available for any clarifications or other to understand how to solve the problem of node1 but especially to secure this and all other Proxmox Servers I manager, and be sure that cannot be a vulnerability of Proxmox GUI.

see above - reinstall everything that was compromised or is suspected of having been compromised, rotate all keys/passwords stored on the systems, restrict access to the management interface.

frankz · Jan 25, 2022

However, today I scanned my 3-node cluster system with Openvas, I didn't see any vulnerabilities with alert score.

Search

Search

Proxmox VE 7.1-8 cluster issue on node1 and cyber-attack

maxcolombo

New Member

Attachments

frankz

Well-Known Member

fabian

Proxmox Staff Member

frankz

Well-Known Member

We value your privacy