[SOLVED] HA Migration creates a lot of duplicate disks

Treeka

I have a cluster of 3 nodes and would like to migrate VMs between them. However, we found out that migration just creates a lot of duplicate disks. So I have been migrating a TestMigrationVM around all 3 nodes. I would like to post the output of a migration, which takes really long for just one disk of 8 GiB.

I have no idea what I am doing wrong. I mostly work using just the WebUI.
The Proxmox version of each node is node1 (6.4-13), node2 (7.4-15), node3 (7.4-16). The LVMs of each node are shared between all.
The goal is to have a VM migrate if one node goes down.

Here is the log file of the most recent migration, tested on the TestMigrationVM.
It is a migration from node3 to node2. I had to stop it myself because it was using over 100 GB on all LVMs, and the process of moving an 8 GiB disk would have taken longer than 2 hours.



Code:
task started by HA resource agent
2023-10-03 11:37:37 starting migration of VM 800 to node 'node2' ({{IP}})
2023-10-03 11:37:38 found local disk 'node2-lvm:vm-800-disk-0' (via storage)
2023-10-03 11:37:38 found local disk 'node2-lvm:vm-800-disk-1' (via storage)
2023-10-03 11:37:38 found local disk 'node2-lvm:vm-800-disk-10' (via storage)
2023-10-03 11:37:38 found local disk 'node2-lvm:vm-800-disk-11' (via storage)
2023-10-03 11:37:38 found local disk 'node2-lvm:vm-800-disk-12' (via storage)
2023-10-03 11:37:38 found local disk 'node2-lvm:vm-800-disk-13' (via storage)
2023-10-03 11:37:38 found local disk 'node2-lvm:vm-800-disk-14' (via storage)
2023-10-03 11:37:38 found local disk 'node2-lvm:vm-800-disk-15' (via storage)
2023-10-03 11:37:38 found local disk 'node2-lvm:vm-800-disk-16' (via storage)
2023-10-03 11:37:38 found local disk 'node2-lvm:vm-800-disk-17' (via storage)
2023-10-03 11:37:38 found local disk 'node2-lvm:vm-800-disk-18' (via storage)
2023-10-03 11:37:38 found local disk 'node2-lvm:vm-800-disk-19' (via storage)
2023-10-03 11:37:38 found local disk 'node2-lvm:vm-800-disk-2' (via storage)
2023-10-03 11:37:38 found local disk 'node2-lvm:vm-800-disk-20' (via storage)
2023-10-03 11:37:38 found local disk 'node2-lvm:vm-800-disk-21' (via storage)
2023-10-03 11:37:38 found local disk 'node2-lvm:vm-800-disk-22' (via storage)
2023-10-03 11:37:38 found local disk 'node2-lvm:vm-800-disk-23' (via storage)
2023-10-03 11:37:38 found local disk 'node2-lvm:vm-800-disk-24' (via storage)
2023-10-03 11:37:38 found local disk 'node2-lvm:vm-800-disk-25' (via storage)
2023-10-03 11:37:38 found local disk 'node2-lvm:vm-800-disk-26' (in current VM config)
2023-10-03 11:37:38 found local disk 'node2-lvm:vm-800-disk-3' (via storage)
2023-10-03 11:37:38 found local disk 'node2-lvm:vm-800-disk-4' (via storage)
2023-10-03 11:37:38 found local disk 'node2-lvm:vm-800-disk-5' (via storage)
2023-10-03 11:37:38 found local disk 'node2-lvm:vm-800-disk-6' (via storage)
2023-10-03 11:37:38 found local disk 'node2-lvm:vm-800-disk-7' (via storage)
2023-10-03 11:37:38 found local disk 'node2-lvm:vm-800-disk-8' (via storage)
2023-10-03 11:37:38 found local disk 'node2-lvm:vm-800-disk-9' (via storage)
2023-10-03 11:37:38 found local disk 'node3-lvm:vm-800-disk-0' (via storage)
2023-10-03 11:37:38 found local disk 'node3-lvm:vm-800-disk-1' (via storage)
2023-10-03 11:37:38 found local disk 'node3-lvm:vm-800-disk-10' (via storage)
2023-10-03 11:37:38 found local disk 'node3-lvm:vm-800-disk-11' (via storage)
2023-10-03 11:37:38 found local disk 'node3-lvm:vm-800-disk-12' (via storage)
2023-10-03 11:37:38 found local disk 'node3-lvm:vm-800-disk-13' (via storage)
2023-10-03 11:37:38 found local disk 'node3-lvm:vm-800-disk-14' (via storage)
2023-10-03 11:37:38 found local disk 'node3-lvm:vm-800-disk-15' (via storage)
2023-10-03 11:37:38 found local disk 'node3-lvm:vm-800-disk-16' (via storage)
2023-10-03 11:37:38 found local disk 'node3-lvm:vm-800-disk-17' (via storage)
2023-10-03 11:37:38 found local disk 'node3-lvm:vm-800-disk-18' (via storage)
2023-10-03 11:37:38 found local disk 'node3-lvm:vm-800-disk-19' (via storage)
2023-10-03 11:37:38 found local disk 'node3-lvm:vm-800-disk-2' (via storage)
2023-10-03 11:37:38 found local disk 'node3-lvm:vm-800-disk-20' (via storage)
2023-10-03 11:37:38 found local disk 'node3-lvm:vm-800-disk-21' (via storage)
2023-10-03 11:37:38 found local disk 'node3-lvm:vm-800-disk-22' (via storage)
2023-10-03 11:37:38 found local disk 'node3-lvm:vm-800-disk-23' (via storage)
2023-10-03 11:37:38 found local disk 'node3-lvm:vm-800-disk-24' (via storage)
2023-10-03 11:37:38 found local disk 'node3-lvm:vm-800-disk-25' (via storage)
2023-10-03 11:37:38 found local disk 'node3-lvm:vm-800-disk-26' (via storage)
2023-10-03 11:37:38 found local disk 'node3-lvm:vm-800-disk-3' (via storage)
2023-10-03 11:37:38 found local disk 'node3-lvm:vm-800-disk-4' (via storage)
2023-10-03 11:37:38 found local disk 'node3-lvm:vm-800-disk-5' (via storage)
2023-10-03 11:37:38 found local disk 'node3-lvm:vm-800-disk-6' (via storage)
2023-10-03 11:37:38 found local disk 'node3-lvm:vm-800-disk-7' (via storage)
2023-10-03 11:37:38 found local disk 'node3-lvm:vm-800-disk-8' (via storage)
2023-10-03 11:37:38 found local disk 'node3-lvm:vm-800-disk-9' (via storage)
2023-10-03 11:37:38 found local disk 'node1-lvm:vm-800-disk-0' (via storage)
2023-10-03 11:37:38 found local disk 'node1-lvm:vm-800-disk-1' (via storage)
2023-10-03 11:37:38 found local disk 'node1-lvm:vm-800-disk-10' (via storage)
2023-10-03 11:37:38 found local disk 'node1-lvm:vm-800-disk-11' (via storage)
2023-10-03 11:37:38 found local disk 'node1-lvm:vm-800-disk-12' (via storage)
2023-10-03 11:37:38 found local disk 'node1-lvm:vm-800-disk-13' (via storage)
2023-10-03 11:37:38 found local disk 'node1-lvm:vm-800-disk-14' (via storage)
2023-10-03 11:37:38 found local disk 'node1-lvm:vm-800-disk-15' (via storage)
2023-10-03 11:37:38 found local disk 'node1-lvm:vm-800-disk-16' (via storage)
2023-10-03 11:37:38 found local disk 'node1-lvm:vm-800-disk-17' (via storage)
2023-10-03 11:37:38 found local disk 'node1-lvm:vm-800-disk-18' (via storage)
2023-10-03 11:37:38 found local disk 'node1-lvm:vm-800-disk-19' (via storage)
2023-10-03 11:37:38 found local disk 'node1-lvm:vm-800-disk-2' (via storage)
2023-10-03 11:37:38 found local disk 'node1-lvm:vm-800-disk-20' (via storage)
2023-10-03 11:37:38 found local disk 'node1-lvm:vm-800-disk-21' (via storage)
2023-10-03 11:37:38 found local disk 'node1-lvm:vm-800-disk-22' (via storage)
2023-10-03 11:37:38 found local disk 'node1-lvm:vm-800-disk-23' (via storage)
2023-10-03 11:37:38 found local disk 'node1-lvm:vm-800-disk-24' (via storage)
2023-10-03 11:37:38 found local disk 'node1-lvm:vm-800-disk-25' (via storage)
2023-10-03 11:37:38 found local disk 'node1-lvm:vm-800-disk-26' (via storage)
2023-10-03 11:37:38 found local disk 'node1-lvm:vm-800-disk-3' (via storage)
2023-10-03 11:37:38 found local disk 'node1-lvm:vm-800-disk-4' (via storage)
2023-10-03 11:37:38 found local disk 'node1-lvm:vm-800-disk-5' (via storage)
2023-10-03 11:37:38 found local disk 'node1-lvm:vm-800-disk-6' (via storage)
2023-10-03 11:37:38 found local disk 'node1-lvm:vm-800-disk-7' (via storage)
2023-10-03 11:37:38 found local disk 'node1-lvm:vm-800-disk-8' (via storage)
2023-10-03 11:37:38 found local disk 'node1-lvm:vm-800-disk-9' (via storage)
2023-10-03 11:37:38 copying local disk images
2023-10-03 11:37:40   Logical volume "vm-800-disk-0" created.
2023-10-03 11:39:01 131072+0 records in
2023-10-03 11:39:01 131072+0 records out
2023-10-03 11:39:01 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 81.8667 s, 105 MB/s
2023-10-03 11:40:17 107+441954 records in
2023-10-03 11:40:17 107+441954 records out
2023-10-03 11:40:17 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 156.724 s, 54.8 MB/s
2023-10-03 11:40:17 successfully imported 'node2-lvm:vm-800-disk-0'
2023-10-03 11:40:17 volume 'node2-lvm:vm-800-disk-0' is 'node2-lvm:vm-800-disk-0' on the target
2023-10-03 11:40:19   Logical volume "vm-800-disk-1" created.
2023-10-03 11:41:41 131072+0 records in
2023-10-03 11:41:41 131072+0 records out
2023-10-03 11:41:41 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 82.7546 s, 104 MB/s
2023-10-03 11:43:01 177+513984 records in
2023-10-03 11:43:01 177+513984 records out
2023-10-03 11:43:01 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 161.625 s, 53.1 MB/s
2023-10-03 11:43:01 successfully imported 'node2-lvm:vm-800-disk-1'
2023-10-03 11:43:01 volume 'node2-lvm:vm-800-disk-1' is 'node2-lvm:vm-800-disk-1' on the target
2023-10-03 11:43:04   Logical volume "vm-800-disk-10" created.
2023-10-03 11:44:25 131072+0 records in
2023-10-03 11:44:25 131072+0 records out
2023-10-03 11:44:25 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 82.3946 s, 104 MB/s
2023-10-03 11:45:41 68+517294 records in
2023-10-03 11:45:41 68+517294 records out
2023-10-03 11:45:41 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 157.746 s, 54.5 MB/s
2023-10-03 11:45:41 successfully imported 'node2-lvm:vm-800-disk-10'
2023-10-03 11:45:41 volume 'node2-lvm:vm-800-disk-10' is 'node2-lvm:vm-800-disk-10' on the target
2023-10-03 11:45:44   Logical volume "vm-800-disk-11" created.
2023-10-03 11:47:04 131072+0 records in
2023-10-03 11:47:04 131072+0 records out
2023-10-03 11:47:04 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 81.6989 s, 105 MB/s
2023-10-03 11:48:18 111+410314 records in
2023-10-03 11:48:18 111+410314 records out
2023-10-03 11:48:18 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 154.086 s, 55.7 MB/s
2023-10-03 11:48:18 successfully imported 'node2-lvm:vm-800-disk-11'
2023-10-03 11:48:18 volume 'node2-lvm:vm-800-disk-11' is 'node2-lvm:vm-800-disk-11' on the target
2023-10-03 11:48:20   Logical volume "vm-800-disk-12" created.
2023-10-03 11:49:41 131072+0 records in
2023-10-03 11:49:41 131072+0 records out
2023-10-03 11:49:41 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 82.1604 s, 105 MB/s
2023-10-03 11:50:56 133+432990 records in
2023-10-03 11:50:56 133+432990 records out
2023-10-03 11:50:56 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 155.889 s, 55.1 MB/s
2023-10-03 11:50:56 successfully imported 'node2-lvm:vm-800-disk-12'
2023-10-03 11:50:56 volume 'node2-lvm:vm-800-disk-12' is 'node2-lvm:vm-800-disk-12' on the target
2023-10-03 11:50:59   Logical volume "vm-800-disk-13" created.
2023-10-03 11:52:19 131072+0 records in
2023-10-03 11:52:19 131072+0 records out
2023-10-03 11:52:19 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 81.6876 s, 105 MB/s
[Trimmed output because I was unable to post anything longer than 16384 characters.]
 

Attachments: NotTrimedOutput.txt (26.2 KB)
The evidence in the provided log and your description do not fully match. Or rather, the terminology used is not in line with what most people using PVE would expect.
The Proxmox version of each node is node1 (6.4-13), node2 (7.4-15), node3 (7.4-16)
Mixing major releases is not an officially supported cluster configuration.
The LVMs of each node are shared between all.
"shared" is a specific attribute of a storage in PVE that is used when a disk is seen by more than one node at the same time. When "shared" storage is used, there is no need to copy data on a VM migration; the data is already visible to the target node.

In your case you appear to have _LOCAL_ LVM on each node; the configuration naturally lives in /etc/pve/storage.cfg. This configuration file is located in a special clustered filesystem (/etc/pve), which is seen and used by all nodes at the same time.
This is a valid configuration; however, in most cases you would want to name the storage object consistently across nodes, precisely to simplify migration.
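For illustration only, a minimal storage.cfg sketch of the two variants (the san-lvm name and the sharedvg volume group below are made up for this example, not taken from your setup): a per-node LVM-thin entry pinned to its owner with "nodes", and a thick LVM entry on a volume every node can actually reach, marked "shared" so PVE skips copying disks on migration:
Code:
lvmthin: node2-lvm
        thinpool data
        vgname vg0
        content images,rootdir
        nodes node2

lvm: san-lvm
        vgname sharedvg
        content images,rootdir
        shared 1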

You have only shown a small snippet of the log after what appears to be many migrations. The log clearly shows that you have many disks across your nodes that are not referenced in the VM config; how and when it all started is not in the log. I recommend that you remove all the extra disks across all nodes manually, clean up the system fully, then test it again. When/if you reproduce this on a clean system, in addition to the log you should also provide (see the command sketch after the list):
a) /etc/pve/storage.cfg
b) pvesm status
c) qm config [test_vmid]
d) pvesm list [source_storage_prior_to_migration]
e) pvesm list [target_storage_prior_to_migration]
f) (d) and (e) after migration
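For reference, the above could be collected roughly like this (VM ID 800 and the node2-lvm/node3-lvm storage names are taken from this thread; adjust to your setup):
Code:
cat /etc/pve/storage.cfg      # (a) storage configuration
pvesm status                  # (b) storage status
qm config 800                 # (c) config of the test VM
pvesm list node3-lvm          # (d) source storage prior to migration
pvesm list node2-lvm          # (e) target storage prior to migration
# (f) repeat the two "pvesm list" commands after the migration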

And, of course, prior to any testing, upgrade your systems to at least the latest 7.x.

Good luck


 
Okay, thank you for the info. Here is some additional information. I deleted the old VM and made a new one. Here are the logs of its first migration.
Code:
dir: local
        path /var/lib/vz
        content iso,images,snippets,rootdir,vztmpl
        prune-backups keep-all=1
        shared 0

dir: backup
        path /backup
        content backup
        prune-backups keep-all=1
        shared 0

lvmthin: node1-lvm
        thinpool data
        vgname vg0
        content rootdir,images

lvmthin: node2-lvm
        thinpool data
        vgname vg0
        content images,rootdir

pbs: Backup
        datastore backupDir
        server backup.###.eu
        content backup
        nodes node2,node1
        prune-backups keep-all=1
        username root@pam

lvmthin: node3-lvm
        thinpool data
        vgname vg0
        content rootdir,images

pbs: HetznerStorageBox
        datastore Proxmox-SB
        server backup.###.eu
        content backup
        prune-backups keep-all=1
        username root@pam

pvesm status shows problems with the StorageBox, but I believe that should not affect the migration, and therefore I can probably look at it at a different time.
Code:
proxmox-backup-client failed: Error: unable to open chunk store 'Proxmox-SB' at "/mnt/hetzner-storage/.chunks" - Host is down (os error 112)
Name                     Type     Status           Total            Used       Available        %
Backup                    pbs   disabled               0               0               0      N/A
HetznerStorageBox         pbs   inactive               0               0               0    0.00%
backup                    dir     active       527322552         8296888       492165736    1.57%
node2-lvm             lvmthin     active      3113852928      1698295386      1415557541   54.54%
node3-lvm             lvmthin     active      3113852928      1698295386      1415557541   54.54%
local                     dir     active       527322552         8296888       492165736    1.57%
node1-lvm             lvmthin     active      3113852928      1698295386      1415557541   54.54%

Code:
boot: order=scsi0;net0
cores: 2
memory: 2048
meta: creation-qemu=7.2.0,ctime=1696410308
name: test-migration
net0: virtio=0E:AD:6C:DB:94:E9,bridge=vmbr4000,firewall=1
numa: 0
ostype: l26
scsi0: node2-lvm:vm-800-disk-0,size=8G
scsihw: virtio-scsi-pci
smbios1: uuid=df3a5d2d-cc15-4cb3-9792-f302aea9a59c
sockets: 1
vmgenid: 869c39d5-acb3-481d-84e2-6234ee2dc7ba


Code:
Volid                 Format  Type             Size VMID
node2-lvm:vm-122-disk-0 raw     images    90194313216 122
node2-lvm:vm-123-disk-0 raw     images    68719476736 123
node2-lvm:vm-210-disk-0 raw     images    34359738368 210
node2-lvm:vm-220-disk-0 raw     images    68719476736 220
node2-lvm:vm-260-disk-0 raw     images    34359738368 260
node2-lvm:vm-280-disk-0 raw     images    34359738368 280
node2-lvm:vm-280-disk-1 raw     images        4194304 280
node2-lvm:vm-800-disk-0 raw     images     8589934592 800

Code:
Volid                  Format  Type             Size VMID
node3-lvm:vm-122-disk-0 raw     images    90194313216 122
node3-lvm:vm-123-disk-0 raw     images    68719476736 123
node3-lvm:vm-210-disk-0 raw     images    34359738368 210
node3-lvm:vm-220-disk-0 raw     images    68719476736 220
node3-lvm:vm-260-disk-0 raw     images    34359738368 260
node3-lvm:vm-280-disk-0 raw     images    34359738368 280
node3-lvm:vm-280-disk-1 raw     images        4194304 280
node3-lvm:vm-800-disk-0 raw     images     8589934592 800

Starting the migration:
Warning shown when starting the migration: Migration with local disk might take long: node2-lvm:vm-800-disk-0 (8.00 GiB)
Target storage: Current layout
From node2 to node3

After the migration:
Code:
Volid                 Format  Type             Size VMID
node2-lvm:vm-122-disk-0 raw     images    90194313216 122
node2-lvm:vm-123-disk-0 raw     images    68719476736 123
node2-lvm:vm-210-disk-0 raw     images    34359738368 210
node2-lvm:vm-220-disk-0 raw     images    68719476736 220
node2-lvm:vm-260-disk-0 raw     images    34359738368 260
node2-lvm:vm-280-disk-0 raw     images    34359738368 280
node2-lvm:vm-280-disk-1 raw     images        4194304 280

Code:
Volid                  Format  Type             Size VMID
node3-lvm:vm-122-disk-0 raw     images    90194313216 122
node3-lvm:vm-123-disk-0 raw     images    68719476736 123
node3-lvm:vm-210-disk-0 raw     images    34359738368 210
node3-lvm:vm-220-disk-0 raw     images    68719476736 220
node3-lvm:vm-260-disk-0 raw     images    34359738368 260
node3-lvm:vm-280-disk-0 raw     images    34359738368 280
node3-lvm:vm-280-disk-1 raw     images        4194304 280

The migration log has been attached to the post.
 

Attachments: Migration Log.txt (9.7 KB)
As a good practice you should add the "nodes" attribute to each LVM storage to point to its appropriate owner node. You can avoid doing this by naming your storage object the same across all nodes, i.e. have a single storage entry:
Code:
lvmthin: lvm-data
        thinpool data
        vgname vg0
        content rootdir,images

Since the volume group is the same on every node, and there is no "shared" attribute, PVE will just treat it as local to the node it is working on. Although unique names are actually helpful at this moment, so it is really up to you.

The "before" output from node2 and node3 shows that you still have duplicate disks prior to your testing. Use "pvesm free" to remove the unnecessary disks.
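For example, a hedged sketch using a volume ID from your own "before" listing (double-check that a volume is really not referenced by any VM config before freeing it):
Code:
pvesm free node3-lvm:vm-800-disk-0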

The "after" output shows that vm-800-disk-0 is no longer present on the source, matching the entry in the log "Logical volume "vm-800-disk-0" successfully removed".
However, the disk is also not in your "after" output on node3. Is it possible that the output you collected is missing some steps in between and is not the full picture?

My advice is the same: clean up your environment fully, then redo your testing carefully and consistently. I ran a few tests in my environment and no stranded disks were left over. Granted, I am not running a mixed-version cluster.
As a reminder, such configurations are not tested/QA'ed by the developers for every operation. You should finish the upgrade before continuing your testing.

Good luck


Thank you for the information. There are a few things I would like to clarify.


I removed all disks before creating the new one. I used the command lvremove -f vg0/vm-800-disk-0, so I thought I got them all.
I created a new VM as 801, and sure enough I got vm-801-disk-0 on node1-lvm, node2-lvm and node3-lvm.
The "before" log was created before the migration but after the creation of the VM, so that is why it was already showing one disk.
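As a hedged sketch, assuming the volume group is vg0 as in the storage.cfg above, the remaining volumes of the test VM could be listed and removed per node like this (only remove a volume if no VM config references it):
Code:
lvs vg0 | grep vm-800
lvremove -f vg0/vm-800-disk-1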

I plan to migrate node1 soon; currently I am just a bit scared that things might break.

I have removed node1-lvm, node2-lvm and node3-lvm.
They have been replaced by the newly created lvm-data. I guess this works now since the thin pool has been called data on each node.
The VMs still show that they use disks on node2-lvm. I can still use them, but I believe they would not be able to use the disks anymore, so I had to add the old storage entries back so that nothing breaks. I see that just changing the storage name in the VM config in a text editor solves this, since it is the same storage after all.
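A hedged sketch of what that text-editor change amounts to for VM 800 (sed -i is assumed to behave normally on the /etc/pve mount; otherwise just edit the file by hand as you did, and make sure the VM is not being migrated at the time):
Code:
# the relevant line in /etc/pve/qemu-server/800.conf changes from
#   scsi0: node2-lvm:vm-800-disk-0,size=8G
# to
#   scsi0: lvm-data:vm-800-disk-0,size=8G
sed -i 's/node2-lvm:/lvm-data:/' /etc/pve/qemu-server/800.conf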

I also have 2 other nodes in their own cluster. They are set up with ZFS, so I am also able to use the replication feature. Everything works there.

Thank you kindly for your help. I will respond again next week, after the migration of node1 has been done.
 
OK, it was a little suspicious that your LVM storage on all 3 nodes was absolutely identical in sizing. I gave it the benefit of the doubt that this was due to identical compute nodes from Hetzner.
I created a new VM as 801, and sure enough I got vm-801-disk-0 on node1-lvm, node2-lvm and node3-lvm.
This suggests that this LVM storage is located on actual _shared_ storage. If this is the case, you should stop and reassess. The entire approach is wrong and is leading you towards data corruption.
In short, you cannot use LVM-thin on shared storage.
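For reference, the setup PVE does support on a volume that all nodes genuinely see at the same time is thick LVM with the "shared" flag; a rough sketch with illustrative names (an existing thin pool cannot simply be reused this way):
Code:
lvm: shared-lvm
        vgname vg_shared
        content images,rootdir
        shared 1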


Thank you. The issue was the LVM storage configuration. Everything is working as expected.
 
