I shut down a node, now /etc/pve/qemu-server is out of sync

Feb 14, 2021
I have a small cluster with replication of one VM from one node (A) to another (B).

I shut down node A, as I needed to change RAM, but when the node rebooted, the VM (102) on it had disappeared. When I looked into /etc/pve/qemu-server, the file 102.conf was missing. I couldn't create a new 102.conf file either; I got the message "cannot create regular file '102.conf': File exists".

I tried shutting down node B, but still couldn't make a new 102.conf file. Now I have made a new 103.conf file, and it seems to work. VM 103 is running nicely, exactly where VM 102 left off. But how do I get replication to work again? Anything I should do before starting node B?
 
Hi,
can you check if the config file exists in the directory of another node with ls /etc/pve/nodes/*/*/102.conf? (The cluster filesystem is shared, so it's enough to run the command on one node.) If the VM was HA-managed, it might've been recovered to another node.

So the 103.conf now references the disks of 102?

How many nodes are there in the cluster?
 
Hello, thanks for your interest.
  • Only two nodes in the cluster
  • The VM was HA-managed
  • Yes, the config file was present on node B
  • Yes, 103.conf now references the disks of 102. VM 103 is humming away and serving its users nicely. No one except me as sysadmin has noticed any changes
The two nodes are quite different in performance. So I've been using node A as the primary node for VM 102, and only kept the replicated VM on node B as a kind of backup. But I guess that when I shut down node A, after some time node B took over as host for VM 102. And then when I brought back node A, something went wrong.

I also guess that the best thing for me to do now is remove VM 102 from node B, and start replicating VM 103 from node A to node B.
 
Hello, thanks for your interest.
  • Only two nodes in the cluster
  • The VM was HA-managed
If you haven't got one already, with two nodes, a QDevice for vote support is highly recommended. Without one, a two-node cluster loses quorum, and is thus not operational, as soon as one of the nodes goes down. You can use HA in a setup with two nodes and a QDevice, but having at least three nodes is strongly recommended; please see the requirements.
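
For reference, a minimal sketch of adding a QDevice; the external host's address 192.0.2.10 is a placeholder, and that host only needs to run the small corosync-qnetd service:
Code:
# on the external host (any small Debian machine works)
apt install corosync-qnetd

# on both cluster nodes
apt install corosync-qdevice

# on one cluster node, register the external vote
pvecm qdevice setup 192.0.2.10

# verify that the cluster now has the expected three votes
pvecm status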

  • Yes, the config file was present on node B
  • Yes, 103.conf now references the disks of 102. VM 103 is humming away and serving its users nicely. No one except me as sysadmin has noticed any changes
The two nodes are quite different in performance. So I've been using node A as the primary node for VM 102, and only kept the replicated VM on node B as a kind of backup.
If you don't intend to run the machine on node B, you'd only need replication and not HA.

But I guess that when I shut down node A, after some time node B took over as host for VM 102. And then when I brought back node A, something went wrong.
Yes, that's exactly what HA is for. I guess you would've just needed to migrate the VM back to the other node afterwards. You can also define an HA group where node A has a higher priority; then this would happen automatically.
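
A sketch of what that could look like; the node names nodeA and nodeB are placeholders for your actual node names, and a higher number means a higher priority:
Code:
# create an HA group that prefers node A
ha-manager groupadd prefer-nodeA --nodes "nodeA:2,nodeB:1"

# put the VM into that group (use 'ha-manager set' if it's already HA-managed)
ha-manager add vm:102 --group prefer-nodeA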

I also guess that the best thing for me to do now is remove VM 102 from node B, and start replicating VM 103 from node A to node B.
I'd recommend renaming the disks of 103 first, so that the ID in the config file and the referenced disks match again. To do it during production, if you have another storage with enough space, you could use the Move Disk operation: move to the other storage and then back to the ZFS storage. That would also take care of renaming. But if you can afford a bit of downtime, doing it with zfs rename is much quicker and easier.
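
The no-downtime variant could look like this; the intermediate storage name other-storage is a placeholder for whatever second storage you have available:
Code:
# copy the disk to the other storage; the new volume is named after the VM's current ID
qm move_disk 103 scsi0 other-storage --delete

# move it back to the original ZFS storage
qm move_disk 103 scsi0 local-zfs --delete

The --delete flag removes the source volume after a successful copy; without it, the old volume would stay behind as an unused disk.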
 
Thanks again for your support. It is ok for me to shut down node A for a short time. Here's the output of zfs list:
Code:
root@sonja:~# zfs list
NAME                       USED  AVAIL     REFER  MOUNTPOINT
rpool                      399G   500G      104K  /rpool
rpool/ROOT                6.94G   500G       96K  /rpool/ROOT
rpool/ROOT/pve-1          6.94G   500G     6.94G  /
rpool/data                 392G   500G       96K  /rpool/data
rpool/data/vm-101-disk-0  20.5G   500G     19.0G  -
rpool/data/vm-102-disk-0   372G   500G      370G  -

So I should
  • shutdown node A
  • zfs rename
    Code:
    rpool/data/vm-102-disk-0
    to
    Code:
    rpool/data/vm-103-disk-0
  • in /etc/pve/qemu-server/103.conf change
    Code:
    scsi0: local-zfs:vm-102-disk-0,size=512
    to
    Code:
    scsi0: local-zfs:vm-103-disk-0,size=512
  • restart node A and VM 103?
 
So I should
  • shutdown node A
If you only have two nodes, you need both for the cluster to be quorate (unless you have a QDevice). And I thought VM 103 is on node A? So you need it up to rename the volumes in the first place; shutting down just the VM is enough.

  • zfs rename
    Code:
    rpool/data/vm-102-disk-0
    to
    Code:
    rpool/data/vm-103-disk-0
  • in /etc/pve/qemu-server/103.conf change
    Code:
    scsi0: local-zfs:vm-102-disk-0,size=512
    to
    Code:
    scsi0: local-zfs:vm-103-disk-0,size=512
Yes, that should work. Still, it's always a good idea to have a working backup before attempting such things.
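
Putting it all together, a minimal sketch of the procedure, run on node A; the sed substitution and the replication job ID 103-0 follow the paths above, and nodeB is a placeholder for node B's actual name:
Code:
# stop the VM so the zvol is no longer in use (the node itself stays up)
qm shutdown 103

# rename the volume to match the new VM ID
zfs rename rpool/data/vm-102-disk-0 rpool/data/vm-103-disk-0

# update the disk reference in the config accordingly
sed -i 's/vm-102-disk-0/vm-103-disk-0/' /etc/pve/qemu-server/103.conf

# start the VM again
qm start 103

# recreate the replication job to node B (job IDs have the form <vmid>-<number>)
pvesr create-local-job 103-0 nodeB --schedule "*/15"

Before recreating replication, you'd want to clean up the leftover VM 102 on node B (removing its HA resource first, if it still has one). Also expect the first run of the new job to transfer the full disk again, since the renamed volume no longer matches the old replica on node B.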
 
