Already used VMID assigned during node downtime

ahhoj

New Member
Mar 31, 2009
Hi

The following happened:
There is a cluster with several nodes. One of them became unreachable (actually its HDD went bad, but I don't think the reason it could not be reached matters). During the downtime of that node, a user created a new VM on another node, and it was assigned the lowest VMID belonging to the unreachable node. After some time the problematic node was working again (in our case we replaced the HDD and copied back the backups of those VMs). But now there are two VMs with the same VMID. (I'd like to mention that it is also possible to create a VM with the same VMID on another node even when nothing is wrong.)
Is there a way to avoid this kind of situation? For example, the master could store all existing VMIDs and not hand them out again just because one of the nodes becomes unreachable, and only reuse those VMIDs once the node is deleted from the cluster. Or maybe there is a better way; it's just an idea.
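Just to illustrate the idea, here is a rough sketch of how the used VMIDs could be collected from all nodes (I'm assuming the KVM configs sit under /etc/qemu-server/ and the OpenVZ configs under /etc/vz/conf/ on each node; node1/node2/node3 are only placeholders):

for node in node1 node2 node3; do
    # list VM config files on each node; ignore nodes/dirs that are missing
    ssh root@$node 'ls /etc/qemu-server /etc/vz/conf 2>/dev/null'
done | sed -n 's/^\([0-9]\+\)\.conf$/\1/p' | sort -n | uniq

The master could keep such a list (including the IDs from nodes that are currently unreachable) and only assign a VMID that is not on it.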
And if it matters, it happened with v1.3:
# pveversion -v
pve-manager: 1.3-1 (pve-manager/1.3/4023)
qemu-server: 1.0-14
pve-kernel: 2.6.24-8
pve-kvm: 86-3
pve-firmware: 1
vncterm: 0.9-2
vzctl: 3.0.23-1pve3
vzdump: 1.1-2
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
 
After some time the problematic node was working again (in our case we replaced the HDD and copied back the backups of those VMs). But now there are two VMs with the same VMID. (I'd like to mention that it is also possible to create a VM with the same VMID on another node even when nothing is wrong.)

Well, I can't prevent you from restoring a VM while the cluster is offline - how should that work?

Is there a way to avoid this kind of situation? For example, the master could store all existing VMIDs and not hand them out again just because one of the nodes becomes unreachable, and only reuse those VMIDs once the node is deleted from the cluster.

We already do that. If that does not work, it's a bug. Can you provide detailed instructions on how to reproduce it?
 
We already do that. If that does not work, it's a bug. Can you provide detailed instructions on how to reproduce it?

Hmm, /var/log/daemon.log contains entries like this:
Aug 24 13:44:47 hn-vz1 pvetunnel[5391]: trying to finish tunnel 15182 10.14.15.254
...
Aug 24 13:45:48 hn-vz1 proxwww[8317]: no data at /usr/share/perl5/PVE/HTMLServices.pm line 69.
...

Also, the last modification date/time of the /etc/pve/cluster.cfg file is from around that same time... (Oh yes, all of this happened back then; I just had a lot to do and forgot to ask here...)

Well, based on this data, what do you think - is it possible that someone completely removed the node from the cluster and then re-added it? In that case, forget what I wrote earlier, and sorry for that... :(
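For reference, these are roughly the commands I used for the checks above (just plain grep and stat, nothing Proxmox-specific):

# show the pvetunnel lines in the daemon log
grep pvetunnel /var/log/daemon.log
# show when the cluster config was last modified
stat /etc/pve/cluster.cfg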
 
Hmm, /var/log/daemon.log contains entries like this:
Aug 24 13:44:47 hn-vz1 pvetunnel[5391]: trying to finish tunnel 15182 10.14.15.254

This indicates that the node was removed from the cluster.

- Dietmar
 
