Already used VMID assigned during node downtime

ahhoj

New Member
Mar 31, 2009
Hi

The following happened:
There is a cluster with several nodes. One of them became unreachable (actually its HDD went bad, but I don't think the reason it could not be reached matters). During the downtime of that node, a user created a new VM on another node, and it was assigned the lowest VMID belonging to the unreachable node. After some time the problematic node was working again (in our case we replaced the HDD and copied back the backups of those VMs). But now there are two VMs with the same VMID. (I'd like to mention that it is also possible to create a VM with the same VMID on another node even when nothing is wrong.)
Is there a way to avoid this kind of situation? For example, the master could store all existing VMIDs and not hand them out again just because one of the nodes becomes unreachable, and only reuse those VMIDs once the node is deleted from the cluster. Or maybe there is a better way; it's just an idea.
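Just to illustrate the idea, here is a rough sketch of how the used VMIDs could be collected from all nodes (I'm assuming the KVM configs sit under /etc/qemu-server/ and the OpenVZ configs under /etc/vz/conf/ on each node; node1/node2/node3 are only placeholders):

for node in node1 node2 node3; do
    # list VM config files on each node; ignore nodes/dirs that are missing
    ssh root@$node 'ls /etc/qemu-server /etc/vz/conf 2>/dev/null'
done | sed -n 's/^\([0-9]\+\)\.conf$/\1/p' | sort -n | uniq

The master could keep such a list (including the IDs from nodes that are currently unreachable) and only assign a VMID that is not on it.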
And if it matters, it happened with v1.3:
# pveversion -v
pve-manager: 1.3-1 (pve-manager/1.3/4023)
qemu-server: 1.0-14
pve-kernel: 2.6.24-8
pve-kvm: 86-3
pve-firmware: 1
vncterm: 0.9-2
vzctl: 3.0.23-1pve3
vzdump: 1.1-2
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
 
After some time the problematic node was working again (in our case we replaced the HDD and copied back the backups of those VMs). But now there are two VMs with the same VMID. (I'd like to mention that it is also possible to create a VM with the same VMID on another node even when nothing is wrong.)

Well, I can't prevent you from restoring a VM while the cluster is offline - how should that work?

Is there a way to avoid this kind of situation? For example, the master could store all existing VMIDs and not hand them out again just because one of the nodes becomes unreachable, and only reuse those VMIDs once the node is deleted from the cluster.

We already do that. If that does not work, it's a bug. Can you provide detailed instructions on how to reproduce it?
 
We already do that. If that does not work, it's a bug. Can you provide detailed instructions on how to reproduce it?

Hmm, /var/log/daemon.log contains entries like this:
Aug 24 13:44:47 hn-vz1 pvetunnel[5391]: trying to finish tunnel 15182 10.14.15.254
...
Aug 24 13:45:48 hn-vz1 proxwww[8317]: no data at /usr/share/perl5/PVE/HTMLServices.pm line 69.
...

Also, the last modification date/time of the /etc/pve/cluster.cfg file is from around that same time... (Oh yes, all of this happened back then; I just had a lot to do and forgot to ask here...)

Well, based on this data, what do you think - is it possible that someone completely removed the node from the cluster and then re-added it? In that case, forget what I wrote earlier, and sorry for that... :(
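For reference, these are roughly the commands I used for the checks above (just plain grep and stat, nothing Proxmox-specific):

# show the pvetunnel lines in the daemon log
grep pvetunnel /var/log/daemon.log
# show when the cluster config was last modified
stat /etc/pve/cluster.cfg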
 
Hmm, /var/log/daemon.log contains entries like this:
Aug 24 13:44:47 hn-vz1 pvetunnel[5391]: trying to finish tunnel 15182 10.14.15.254

This indicates that the node was removed from the cluster.

- Dietmar
 
