Stefan,
Thanks for your response. An example of the qemu-server.conf is:
agent: 1
bios: ovmf
bootdisk: scsi0
cores: 4
cpu: IvyBridge
efidisk0: gluster-vm:301/vm-301-disk-1.raw,size=128K
machine: q35
memory: 4096
name: virtual-win10-187
net0: virtio=CE:5C:B3:A0:B1:68,bridge=vmbr26,firewall=1
numa: 0
ostype: win10
scsi0: gluster-vm:301/vm-301-disk-0.raw,cache=writeback,iothread=1,size=64G
scsihw: virtio-scsi-pci
smbios1: uuid=5dffdc93-aa6b-4621-9cb0-6880f56249b8
sockets: 1
vga: std
vmgenid: 4ccb4f32-bc03-4813-b41f-e46ea7e53728
#qmdump#map:efidisk0:drive-efidisk0:gluster-vm:raw:
#qmdump#map:scsi0:drive-scsi0:gluster-vm:raw:
However, this has worked twice today since the issue this morning. I have had the issue several times in the past; I just can't get it to repeat right now. I even extracted a multi-disk backup with vma and couldn't reproduce the failure when re-creating the archive straight away.
If I have another incident where the archive doesn't get created first time, I'll repost the qemu-server.conf as requested.
The reason I'm creating the archives manually is that over the last few weeks we've had machines becoming corrupted after vzdump ran as part of the backup process. Machines would fail to start, start in a hung state, Windows VMs would blue-screen, etc., sometimes minutes to hours after the backup. Given the startup problems, I wrote a script to shut down and start up the machines, and that part seemed fine through several repeats with no machines hanging. That led me to suspect the vzdump process itself, so instead I've been doing a tar/untar copy of the machine's hard disk while it was shut down and then running the vma create command to build the archives, with mostly good results.
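For reference, the commands I'm using look roughly like this. The drive names come from the #qmdump#map lines in the config above; the disk paths assume the default /mnt/pve/<storage> mount point for the gluster share, so adjust them for your layout:

```shell
# Shut the VM down first so the raw images are consistent
qm shutdown 301

# Copy the raw disks out with tar while the VM is down
tar -C /mnt/pve/gluster-vm/images/301 -cf /backup/vm-301-disks.tar \
    vm-301-disk-0.raw vm-301-disk-1.raw

# Re-create the archive with vma, mapping each drive name
# (drive-scsi0, drive-efidisk0) as in the #qmdump#map lines
vma create /backup/vzdump-qemu-301.vma \
    -c /etc/pve/qemu-server/301.conf \
    drive-scsi0=/mnt/pve/gluster-vm/images/301/vm-301-disk-0.raw \
    drive-efidisk0=/mnt/pve/gluster-vm/images/301/vm-301-disk-1.raw

# Bring the VM back up
qm start 301
```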
I have seen another thread about this sort of behaviour with the backup process causing problems:
https://forum.proxmox.com/threads/k...ks-during-backup-restore-migrate.34362/page-4
which seems identical to my issues. Mine only started happening regularly about a month ago; it had been pretty robust before that. It seems to be loosely linked to Proxmox 6, though I can't recall now exactly when the problem started. Some of the corruption stemmed from bad hardware (a dead SSD triggered a full system reconfiguration, which was followed by other SSDs wearing out), so I can't be sure exactly when things went awry. The bad hardware has since been replaced. Thing is, I have built brand new Linux VMs and seen them crash as well, so I can't say for certain it was all down to the bad hardware. I saw so many parallels to the issues in the thread above that I thought I would try an alternative to the vzdump process. I have had a couple of machines hang since the weekend, but that is an order of magnitude fewer than a few weeks ago.
I hope that's not too long an explanation, and if you've got any ideas, I'm all ears...
The cluster is 4 storage nodes and 4 VM nodes. The storage nodes serve replica 3 arbiter 1 Gluster shares, while the VM nodes have no local storage, plenty of RAM for the VMs we run, and dual multicore Xeon processors. I've been happily running Proxmox for many years, since version 3 or so.
Trevor