error with vma_writer

YellowShed · Oct 31, 2019

Hello,

When trying to create an archive with a command like

Code:

vma create vzdump-qemu-100-2019_10_31-08_04_33.vma -c qemu-server.conf drive-scsi0=disk-drive-scsi0.raw drive-scsi1=disk-drive-scsi1.raw drive-scsi2=disk-drive-scsi2.raw

I often get errors:
(process:10285): ERROR **: 05:07:08.691: vma_writer_register_stream 'drive-scsi1' failed

Yet re-issuing the exact same command will fail with 'drive-scsi2' again in this case. Finally, it will create a perfectly good archive on the fourth try. I quite often get this with one hard disk and en efi disk on the machines as well.

I'm running "pve-manager/6.0-7/28984024 (running kernel: 5.0.21-3-pve)"

Any way to tell vma create how many disks it should be expecting?

Thanks,

Tervor

Stefan_R · Oct 31, 2019

I can't reproduce your issue on my end... Could you post your qemu-server.conf?

Why are you using the 'vma' command directly anyway? Usually you'd use vzdump to create backup archives.

YellowShed · Oct 31, 2019

Stefan,

Thanks for your response. An example of the qemu-server.conf is:

agent: 1
bios: ovmf
bootdisk: scsi0
cores: 4
cpu: IvyBridge
efidisk0: gluster-vm:301/vm-301-disk-1.raw,size=128K
machine: q35
memory: 4096
name: virtual-win10-187
net0: virtio=CE:5C:B3:A0:B1:68,bridge=vmbr26,firewall=1
numa: 0
ostype: win10
scsi0: gluster-vm:301/vm-301-disk-0.raw,cache=writeback,iothread=1,size=64G
scsihw: virtio-scsi-pci
smbios1: uuid=5dffdc93-aa6b-4621-9cb0-6880f56249b8
sockets: 1
vga: std
vmgenid: 4ccb4f32-bc03-4813-b41f-e46ea7e53728
#qmdump#map:efidisk0:drive-efidisk0:gluster-vm:raw:
#qmdump#map:scsi0:drive-scsi0:gluster-vm:raw:

However, this has worked twice today since the issue this morning. I have had the issue several times in the past, just can't get it to repeat right now. I even vma extracted a multi-disk backup and couldn't get it to re-create the archive straight away.

If I have another incident where it doesn't create the archive first time, I'll repost the qemu-server.conf as requested.

The reason I'm creating the archives is that we've had issues of machines corrupting after running the vzdump via the backup process over the last few weeks. Machines would fail to start, start in a hung mode, windows vms would blue-screen, etc. Sometimes this would be minutes to hours after the backup. Given the startup problems, I created a script to shutdown and startup the machines and that part seemed to be fine, going through several repeats with no machines hanging. Therefore, I thought I had isolated it to the actual vzdump process and have had mostly success with running a tar/untar copy of the machine's hard disk while it was shut down then running the vma create command to create the archives.

I have seen another thread about this sort of behaviour with the backup process causing problems:
https://forum.proxmox.com/threads/k...ks-during-backup-restore-migrate.34362/page-4
which seems identical to my issues. My issues only started happening regularly about a month ago - it had been pretty robust before that. It seems to sort of be linked to Proxmox 6 though I can't recall now exactly when the problem occurred. The corruption stemmed from bad hardware (dead SSD triggering a whole system reconfigure to another config which was followed by other SSDs being worn out), so I can't be sure exactly when things went awry. The bad hardware has been replaced now. Thing is, I have built brand new linux vms and have seen them crash as well, so I can't definitely say it has all been linked to the bad hardware. I saw so many parallels to the issues linked to the thread above, I thought I would try an alternative to the vzdump process. I have had a couple of machines hang since the weekend, but that is an order of magnitude less than where things were a few weeks ago.

I hope that's not too long an explanation, and if you've got any ideas, I'm all ears...

The cluster is 4 nodes with storage and 4 nodes with VMs. The storage nodes have replica 3 arbiter 1 gluster shares while the VM nodes have no storage, plenty of ram for the vms we run and dual multicore Xeon processors. I've been happily running Proxmox for many years since version 3 or so.

Trevor

YellowShed · Nov 1, 2019

So several more failures with the vma_writer issue from the following qemu-server.conf files:
agent: 1
bios: ovmf
boot: cdn
bootdisk: scsi0
cores: 8
cpu: IvyBridge
keyboard: en-gb
machine: q35
memory: 24576
name: 2016server-28
net0: virtio=06:F1:3B:F0:3A:79,bridge=vmbr26
numa: 1
ostype: win10
scsi0: gluster-vm:228/vm-228-disk-0.raw,cache=writeback,iothread=1,size=64G
scsi1: gluster-vm:228/vm-228-disk-1.raw,cache=writeback,iothread=1,size=32G
scsi2: gluster-vm:228/vm-228-disk-2.raw,cache=writeback,iothread=1,size=192G
scsi3: gluster-vm:228/vm-228-disk-3.raw,cache=writeback,iothread=1,size=192G
scsihw: virtio-scsi-single
smbios1: uuid=4cad5b52-5f37-47b5-a438-5832d6379acd
sockets: 1
vga: std
#qmdump#map:scsi0:drive-scsi0:gluster-vm:raw:
#qmdump#map:scsi1:drive-scsi1:gluster-vm:raw:
#qmdump#map:scsi2:drive-scsi2:gluster-vm:raw:
#qmdump#map:scsi3:drive-scsi3:gluster-vm:raw:

issued as this command:
<code>
vma create vzdump-qemu-228-2019_11_01-00_54_55.vma -c qemu-server.c onf drive-scsi0=disk-drive-scsi0.raw drive-scsi1=disk-drive-scsi1.raw drive-scsi2=disk-drive- scsi2.raw drive-scsi3=disk-drive-scsi3.raw
</code>
gave the error:
<code>
** (process:9798): ERROR **: 00:54:56.089: vma_writer_register_stream 'drive-scsi2' failed
./backup_cmd_line.sh: line 105: 9798 Trace/breakpoint trap vma create vzdump-qemu-$vm-$dat estamp.vma -c qemu-server.conf $drivescsilist $driveefidisk
</code>

and this conf file:
agent: 1
bios: ovmf
boot: cdn
bootdisk: scsi0
cores: 4
cpu: IvyBridge
efidisk0: gluster-vm:403/vm-403-disk-0.qcow2,size=128K
keyboard: en-gb
machine: q35
memory: 4096
name: win10-user-186
net0: virtio=AA:EB:3B:75:F7:37,bridge=vmbr26
numa: 0
ostype: win10
scsi0: gluster-vm:403/vm-403-disk-0.raw,cache=writeback,iothread=1,size=64G
scsihw: virtio-scsi-single
smbios1: uuid=19ab72e0-2e27-471c-ac22-fabc5a440296
sockets: 1
vga: std,memory=32
#qmdump#map:efidisk0:drive-efidisk0:gluster-vm:raw:
#qmdump#map:scsi0:drive-scsi0:gluster-vm:raw:

failed with the same message. My script now repeats the exact vma create command and, in the second case above it completed, whereas the first case above, it failed on the first repeat with drive-scsi2 failure, but another loop round to issue the vma create command finally resulted in creating the vma archive.
Strangely, the following qemu-conf:
agent: 1
bios: ovmf
bootdisk: scsi0
cores: 8
cpu: IvyBridge
machine: q35
memory: 16384
name: 2019server-30
net0: virtio=46:F4:C7:86:0F:AF,bridge=vmbr26,firewall=1
numa: 0
ostype: win10
scsi0: gluster-vm:233/vm-233-disk-0.raw,cache=writeback,iothread=1,size=64G
scsi1: gluster-vm:233/vm-233-disk-2.raw,cache=writeback,iothread=1,size=20G
scsi2: gluster-vm:233/vm-233-disk-1.raw,cache=writeback,iothread=1,size=128G
scsi3: gluster-vm:233/vm-233-disk-3.raw,cache=writeback,iothread=1,size=192G
scsihw: virtio-scsi-pci
smbios1: uuid=abc062fa-e0cb-40ee-ae10-c37e9793125c
sockets: 1
vga: std
vmgenid: 0050a8d8-8203-407d-ab94-7072b6e7ffaa
#qmdump#map:scsi0:drive-scsi0:gluster-vm:raw:
#qmdump#map:scsi1:drive-scsi1:gluster-vm:raw:
#qmdump#map:scsi2:drive-scsi2:gluster-vm:raw:
#qmdump#map:scsi3:drive-scsi3:gluster-vm:raw:

failed at the drive-scsi2 and only required one repeat of the vma create command to complete.

Very strange.

Stefan_R · Nov 4, 2019

I tried to replicate the situation on my end as closely as possible but cannot get the behaviour you're describing...

Judging by your naming scheme, you are using glusterfs? Does the bug only occur there or on local storage as well?

Also, we had some issues with backing up disks using "iothread=1" in the past, and although that should be fixed in the most recent versions of PVE, could you try and see if the issue persists without using iothreads?

YellowShed · Nov 4, 2019

Stefan,

Thanks for the further feedback. I appreciate your continued help with this, as it's still occurring, though as I said, my script now repeats until the vma is created, which always happens once the error goes past the last disk, if you follow me.

While the source files are on glusterfs, they are copied to a zfs local storage in the script and the vma create is then local to that storage, so I don't think that it's a gluster problem. I could try changing the script to use the glusterfs rather than zfs, perhaps?

I will try removing the iothread=1 in the config file - I could see that maybe the vma create might detect that from the config file.

Trevor

YellowShed · Nov 5, 2019

Stefan,

So I changed the config of the win 2k16 server that exhibits the issue to remove the iothread:
agent: 1
bios: ovmf
boot: cdn
bootdisk: scsi0
cores: 8
cpu: IvyBridge
keyboard: en-gb
lock: backup
machine: q35
memory: 24576
name: 2016server-28
net0: virtio=06:F1:3B:F0:3A:79,bridge=vmbr26
numa: 1
ostype: win10
scsi0: gluster-vm:228/vm-228-disk-0.raw,cache=writeback,size=64G
scsi1: gluster-vm:228/vm-228-disk-1.raw,cache=writeback,size=32G
scsi2: gluster-vm:228/vm-228-disk-2.raw,cache=writeback,size=192G
scsi3: gluster-vm:228/vm-228-disk-3.raw,cache=writeback,size=192G
scsihw: virtio-scsi-single
smbios1: uuid=4cad5b52-5f37-47b5-a438-5832d6379acd
sockets: 1
vga: std
#qmdump#map:scsi0:drive-scsi0:gluster-vm:raw:
#qmdump#map:scsi1:drive-scsi1:gluster-vm:raw:
#qmdump#map:scsi2:drive-scsi2:gluster-vm:raw:
#qmdump#map:scsi3:drive-scsi3:gluster-vm:raw:

It still had issues with the vma create on this backup server. I didn't get a chance to try backing up to one of the storage nodes with the gluster storage.

If it helps, my pveversion -v:

proxmox-ve: 6.0-2 (running kernel: 5.0.21-3-pve)
pve-manager: 6.0-7 (running version: 6.0-7/28984024)
pve-kernel-5.0: 6.0-9
pve-kernel-helper: 6.0-9
pve-kernel-4.15: 5.4-6
pve-kernel-5.0.21-3-pve: 5.0.21-7
pve-kernel-5.0.21-2-pve: 5.0.21-7
pve-kernel-5.0.21-1-pve: 5.0.21-2
pve-kernel-5.0.18-1-pve: 5.0.18-3
pve-kernel-4.15.18-18-pve: 4.15.18-44
pve-kernel-4.15.18-10-pve: 4.15.18-32
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.2-pve2
criu: 3.11-3
glusterfs-client: 6.5-1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.12-pve1
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-5
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-2
libpve-storage-perl: 6.0-9
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-65
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-7
pve-cluster: 6.0-7
pve-container: 3.0-7
pve-docs: 6.0-4
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-3
pve-qemu-kvm: 4.0.0-6
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-9
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve1

Still strange that it give the error but is not reproducible by you. Is it something in my particular config - I'm running pve-test on this machine.

Search

Search

error with vma_writer

YellowShed

Member

Stefan_R

Proxmox Retired Staff

YellowShed

Member

YellowShed

Member

Stefan_R

Proxmox Retired Staff

YellowShed

Member

YellowShed

Member