I shut down a node, now /etc/pve/qemu-server is out of sync

Feb 14, 2021
I have a small cluster with replication of one VM from one node (A) to another (B).

I shut down node A, as I needed to change RAM, but when the node rebooted, the VM (102) on it had disappeared. When I looked into /etc/pve/qemu-server, the file 102.conf was missing. I couldn't create a new 102.conf file either; I got the message "cannot create regular file '102.conf': File exists".

I tried shutting down node B, but still couldn't make a new 102.conf file. Now I have made a new 103.conf file, and it seems to work. VM 103 is running nicely, exactly where VM 102 left off. But how do I get replication to work again? Anything I should do before starting node B?
 
Hi,
can you check if the config file exists in the directory of another node with ls /etc/pve/nodes/*/*/102.conf? (The cluster filesystem is shared, so it's enough to run the command on one node.) If the VM was HA-managed, it might've been recovered to another node.

So the 103.conf now references the disks of 102?

How many nodes are there in the cluster?
 
Hello, thanks for your interest.
  • Only two nodes in the cluster
  • The VM was HA-managed
  • Yes, the config file was present on node B
  • Yes, 103.conf now references the disks of 102. VM 103 is humming away and serving its users nicely. No one except me as sysadmin has noticed any changes
The two nodes are quite different in performance. So I've been using node A as the primary node for VM 102, and only kept the replicated VM on node B as a kind of backup. But I guess that when I shut down node A, after some time node B took over as host for VM 102. And then when I brought back node A, something went wrong.

I also guess that the best thing for me to do now is remove VM 102 from node B, and start replicating VM 103 from node A to node B.
 
Hello, thanks for your interest.
  • Only two nodes in the cluster
  • The VM was HA-managed
If you haven't got one already, with two nodes, a QDevice for vote support is highly recommended. Without one, a two-node cluster loses quorum, and is thus not operational, as soon as one of the nodes goes down. You can use HA in a setup with two nodes and a QDevice, but having at least three nodes is strongly recommended; please see the requirements.
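
For reference, a minimal sketch of adding a QDevice; the external host's address 192.0.2.10 is a placeholder, and that host only needs to run the small corosync-qnetd service:
Code:
# on the external host (any small Debian machine works)
apt install corosync-qnetd

# on both cluster nodes
apt install corosync-qdevice

# on one cluster node, register the external vote
pvecm qdevice setup 192.0.2.10

# verify that the cluster now has the expected three votes
pvecm status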

  • Yes, the config file was present on node B
  • Yes, 103.conf now references the disks of 102. VM 103 is humming away and serving its users nicely. No one except me as sysadmin has noticed any changes
The two nodes are quite different in performance. So I've been using node A as the primary node for VM 102, and only kept the replicated VM on node B as a kind of backup.
If you don't intend to run the machine on node B, you'd only need replication and not HA.

But I guess that when I shut down node A, after some time node B took over as host for VM 102. And then when I brought back node A, something went wrong.
Yes, that's exactly what HA is for. I guess you would've just needed to migrate the VM back to the other node afterwards. You can also define an HA group where node A has a higher priority; then this would happen automatically.
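
A sketch of what that could look like; the node names nodeA and nodeB are placeholders for your actual node names, and a higher number means a higher priority:
Code:
# create an HA group that prefers node A
ha-manager groupadd prefer-nodeA --nodes "nodeA:2,nodeB:1"

# put the VM into that group (use 'ha-manager set' if it's already HA-managed)
ha-manager add vm:102 --group prefer-nodeA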

I also guess that the best thing for me to do now is remove VM 102 from node B, and start replicating VM 103 from node A to node B.
I'd recommend renaming the disks of 103 first, so that the ID in the config file and the referenced disks match again. To do it during production, if you have another storage with enough space, you could use the Move Disk operation: move to the other storage and then back to the ZFS storage. That would also take care of renaming. But if you can afford a bit of downtime, doing it with zfs rename is much quicker and easier.
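
The no-downtime variant could look like this; the intermediate storage name other-storage is a placeholder for whatever second storage you have available:
Code:
# copy the disk to the other storage; the new volume is named after the VM's current ID
qm move_disk 103 scsi0 other-storage --delete

# move it back to the original ZFS storage
qm move_disk 103 scsi0 local-zfs --delete

The --delete flag removes the source volume after a successful copy; without it, the old volume would stay behind as an unused disk.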
 
Thanks again for your support. It is ok for me to shut down node A for a short time. Here's the output of zfs list:
Code:
root@sonja:~# zfs list
NAME                       USED  AVAIL     REFER  MOUNTPOINT
rpool                      399G   500G      104K  /rpool
rpool/ROOT                6.94G   500G       96K  /rpool/ROOT
rpool/ROOT/pve-1          6.94G   500G     6.94G  /
rpool/data                 392G   500G       96K  /rpool/data
rpool/data/vm-101-disk-0  20.5G   500G     19.0G  -
rpool/data/vm-102-disk-0   372G   500G      370G  -

So I should
  • shutdown node A
  • zfs rename
    Code:
    rpool/data/vm-102-disk-0
    to
    Code:
    rpool/data/vm-103-disk-0
  • in /etc/pve/qemu-server/103.conf change
    Code:
    scsi0: local-zfs:vm-102-disk-0,size=512
    to
    Code:
    scsi0: local-zfs:vm-103-disk-0,size=512
  • restart node A and VM 103?
 
So I should
  • shutdown node A
If you only have two nodes, you need both for the cluster to be quorate (unless you have a QDevice). And I thought VM 103 is on node A? So you need it up to rename the volumes in the first place; shutting down just the VM is enough.

  • zfs rename
    Code:
    rpool/data/vm-102-disk-0
    to
    Code:
    rpool/data/vm-103-disk-0
  • in /etc/pve/qemu-server/103.conf change
    Code:
    scsi0: local-zfs:vm-102-disk-0,size=512
    to
    Code:
    scsi0: local-zfs:vm-103-disk-0,size=512
Yes, that should work. Still, it's always a good idea to have a working backup before attempting such things.
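
Putting it all together, a minimal sketch of the procedure, run on node A; the sed substitution and the replication job ID 103-0 follow the paths above, and nodeB is a placeholder for node B's actual name:
Code:
# stop the VM so the zvol is no longer in use (the node itself stays up)
qm shutdown 103

# rename the volume to match the new VM ID
zfs rename rpool/data/vm-102-disk-0 rpool/data/vm-103-disk-0

# update the disk reference in the config accordingly
sed -i 's/vm-102-disk-0/vm-103-disk-0/' /etc/pve/qemu-server/103.conf

# start the VM again
qm start 103

# recreate the replication job to node B (job IDs have the form <vmid>-<number>)
pvesr create-local-job 103-0 nodeB --schedule "*/15"

Before recreating replication, you'd want to clean up the leftover VM 102 on node B (removing its HA resource first, if it still has one). Also expect the first run of the new job to transfer the full disk again, since the renamed volume no longer matches the old replica on node B.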
 
