Disk resize problem

Thomas Plant · Aug 27, 2019

Hello all,

I had a very strange and scary problem with one of our VMs when resizing a disk. The disks of the VM were imported through 'qm importdisk....' from a XenServer xva file. I cloned the VM to do a test and tried to resize on the clone. The VM had 3 disks (scsi0 -> 2), and I tried to resize disk 1, which originally had 150GB. I tried to add 20 GByte, but instead the disk was truncated to 20GB! All data was gone.
I did the same thing on the original VM; there I got a lock error and the management window still showed 150GB, but it also killed the disk: the disk was damaged and unusable. Other virtual machines I tried resized correctly.

So, stupid as I am (destroying all the evidence), I deleted the VM and reimported it from XenServer the same way I did previously. But now resizing works correctly.

Proxmox is version 6 with subscription and up to date, but the VM was originally imported through Proxmox 5. Storage is NFS on an Open-E Jovian Cluster.

Has anybody experienced such a problem? It makes me a little nervous about putting this Proxmox cluster into production.

Kind Regards,
Thomas

Thomas Plant · Aug 27, 2019

I was able to reproduce the problem. I imported the VM, cloned it and tried to do an online resize of the disk....data is gone.
I made a video of the steps I did, except the clone of the disk. The VM I cloned was stopped/shut down.

Here is the video: Proxmox eats my data

And I added a screenshot of the NFS mount where we can see that the disk I resized was effectively reduced to 12G....
The .conf file of the VM still shows a size of 172G:

root@pve5:~# cat /etc/pve/qemu-server/112.conf
boot: cdn
bootdisk: scsi0
cores: 4
cpu: Broadwell
ide2: none,media=cdrom
memory: 8192
name: test
net0: virtio=FE:BD:B3:60:79:40,bridge=vmbr0,link_down=1
net1: virtio=8A:5E:35:6C:AD:3A,bridge=vmbr1,link_down=1
numa: 1
ostype: l26
scsi0: NFS01:112/vm-112-disk-0.qcow2,discard=on,size=14G
scsi1: NFS01:112/vm-112-disk-1.qcow2,discard=on,size=172G
scsihw: virtio-scsi-pci
smbios1: uuid=a606156a-fc4e-4681-99a2-0ec478e108e0
sockets: 2
vmgenid: 1771e21d-635e-4476-ad36-cdd74243897c

tim · Aug 27, 2019

Hi,

Can you share what's in the syslog & task log of the host around the time when you cloned & resized the disk, and additionally your storage configuration (/etc/pve/storage.cfg)?

Thomas Plant · Aug 27, 2019

Ok, added the syslog and a tar of /var/log/pve/tasks.

Operations took place from 09:55 to approx. 10:15 CEST

Edit: added storage.cfg file to the zip

Thomas Plant · Aug 27, 2019

Ah, I forgot: the 'original' VM is 104 and the cloned one is 112.

tim · Aug 27, 2019

Ok, it seems there is some sort of timeout with the connection.
Try the following:

# qm monitor 112

In the new qm shell type:

qm> info block -v drive-scsi1

Please post the output or any errors you might get.

Thomas Plant · Aug 27, 2019

Here is the output:

Code:

qm> info block -v drive-scsi1

drive-scsi1 (#block301): /mnt/pve/NFS01/images/112/vm-112-disk-0.qcow2 (qcow2)
    Attached to:      scsi1
    Cache mode:       writeback, direct
    Detect zeroes:    unmap

Images:
image: /mnt/pve/NFS01/images/112/vm-112-disk-0.qcow2
file format: qcow2
virtual size: 172G (184683593728 bytes)
disk size: 132G
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
qm> info block -v drive-scsi1

drive-scsi1 (#block301): /mnt/pve/NFS01/images/112/vm-112-disk-0.qcow2 (qcow2)
    Attached to:      scsi1
    Cache mode:       writeback, direct
    Detect zeroes:    unmap

Images:
image: /mnt/pve/NFS01/images/112/vm-112-disk-0.qcow2
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 7.5G
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

Code:

qm> info block -v drive-scsi0

drive-scsi0 (#block113): /mnt/pve/NFS01/images/112/vm-112-disk-1.qcow2 (qcow2)
    Attached to:      scsi0
    Cache mode:       writeback, direct
    Detect zeroes:    unmap

Images:
image: /mnt/pve/NFS01/images/112/vm-112-disk-1.qcow2
file format: qcow2
virtual size: 14G (15032385536 bytes)
disk size: 7.5G
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

Could it be that the clone process rotated the disk assignment? In the 112.conf the disks are now configured this way:
scsi0: NFS01:112/vm-112-disk-1.qcow2,discard=on,size=14G
scsi1: NFS01:112/vm-112-disk-0.qcow2,discard=on,size=172G

And on the original VM they are ordered the right way around:
scsi0: NFS01:104/vm-104-disk-0.qcow2,discard=on,size=14G
scsi1: NFS01:104/vm-104-disk-1.qcow2,discard=on,size=172G
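
One way to check this suspicion is to compare which disk file each SCSI slot points to in both configs. The following is only a minimal sketch that uses the config lines quoted in this thread; `slot_to_disk` is a hypothetical helper written for illustration and is not part of Proxmox.

```python
import re

# Disk lines as quoted above for the original VM (104) and the clone (112).
ORIGINAL = """\
scsi0: NFS01:104/vm-104-disk-0.qcow2,discard=on,size=14G
scsi1: NFS01:104/vm-104-disk-1.qcow2,discard=on,size=172G
"""
CLONE = """\
scsi0: NFS01:112/vm-112-disk-1.qcow2,discard=on,size=14G
scsi1: NFS01:112/vm-112-disk-0.qcow2,discard=on,size=172G
"""

# Captures: slot name, disk index from the file name, configured size.
DISK_LINE = re.compile(
    r"^(scsi\d+):\s*\S*?vm-\d+-disk-(\d+)\.\w+.*?size=(\w+)", re.M
)

def slot_to_disk(conf_text):
    """Map each slot to (disk index in the file name, configured size)."""
    return {
        m.group(1): (int(m.group(2)), m.group(3))
        for m in DISK_LINE.finditer(conf_text)
    }

original = slot_to_disk(ORIGINAL)
clone = slot_to_disk(CLONE)

# Slots whose file index differs between the original and the clone.
swapped = [slot for slot in original if original[slot][0] != clone[slot][0]]
print(swapped)
```

Note that the configured sizes still match per slot (14G on scsi0, 172G on scsi1), so a different file index on its own only shows that the file names were numbered differently in the clone, not that the data was swapped.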

tim · Aug 27, 2019

In the first code block, you did the "info block -v drive-scsi1" twice, with different results. Did you change anything in between or did it report 2 different images for the same drive?

Thomas Plant · Aug 27, 2019

Edit: I am a little confused....in the first code block I ran the first command before the resize and the second after the resize.
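
The byte counts in the two `info block` outputs make the change explicit. A small sketch that converts them to GiB, using only the numbers posted above (the helper name `to_gib` is just illustrative):

```python
GIB = 1024 ** 3

def to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / GIB

before = 184683593728  # drive-scsi1, first output (before the resize)
after = 10737418240    # drive-scsi1, second output (after the resize)

print(to_gib(before), to_gib(after))  # 172.0 10.0

# True if the virtual size went down instead of up.
shrunk = after < before
print(shrunk)
```

So the virtual size of the same drive went from 172 GiB to 10 GiB across the resize.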

Thomas Plant · Aug 27, 2019

I tried to clone the 104 VM a second time and now even this errors out:

Code:

transferred: 14646053227 bytes remaining: 386332309 bytes total: 15032385536 bytes progression: 97.43 %
transferred: 14797880321 bytes remaining: 234505215 bytes total: 15032385536 bytes progression: 98.44 %
transferred: 14948204176 bytes remaining: 84181360 bytes total: 15032385536 bytes progression: 99.44 %
transferred: 15032385536 bytes remaining: 0 bytes total: 15032385536 bytes progression: 100.00 %
transferred: 15032385536 bytes remaining: 0 bytes total: 15032385536 bytes progression: 100.00 %
create full clone of drive scsi1 (NFS01:104/vm-104-disk-1.qcow2)
Formatting '/mnt/pve/NFS01/images/114/vm-114-disk-1.qcow2', fmt=qcow2 size=0 cluster_size=65536 preallocation=metadata lazy_refcounts=off refcount_bits=16
transferred: 0 bytes remaining: 0 bytes total: 0 bytes progression: 0.00 %
qemu-img: output file is smaller than input file
TASK ERROR: clone failed: copy failed: command '/usr/bin/qemu-img convert -p -n -f qcow2 -O qcow2 /mnt/pve/NFS01/images/104/vm-104-disk-1.qcow2 zeroinit:/mnt/pve/NFS01/images/114/vm-114-disk-1.qcow2' failed: exit code 1
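
In this log the target image for scsi1 is formatted with size=0, which appears to be what qemu-img then complains about. As a rough sketch, a "Formatting" line like the one above can be checked for a zero-size target; `formatted_size` is a hypothetical helper, and the log line is copied from this post:

```python
import re

LOG_LINE = (
    "Formatting '/mnt/pve/NFS01/images/114/vm-114-disk-1.qcow2', fmt=qcow2 "
    "size=0 cluster_size=65536 preallocation=metadata lazy_refcounts=off "
    "refcount_bits=16"
)

def formatted_size(line):
    """Return (path, size in bytes) from a 'Formatting ...' line, or None."""
    # \bsize= skips 'cluster_size=' because '_' is a word character.
    m = re.search(r"Formatting '([^']+)'.*?\bsize=(\d+)", line)
    return (m.group(1), int(m.group(2))) if m else None

path, size = formatted_size(LOG_LINE)
if size == 0:
    print(f"target {path} was created with size 0")
```

This only flags the symptom in the task log; it does not explain why the size of the source disk was read as 0.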

Thomas Plant · Aug 29, 2019

Hello,

Any news on this topic? Should I open a ticket directly with Proxmox Support?

Regards,
Thomas

tim · Aug 29, 2019

Sure you can, but I would prefer to continue here, just so we don't have to look in two different places for answers.
Do you have a local storage where you can test this as well, just to rule out the NFS share and make things a little bit easier?

I haven't been able to reproduce it yet, but I'm on it.

Thomas Plant · Aug 29, 2019

Hi,

No problem, if you're working on it, I'll wait. Sorry, not enough local storage to test, the VM is too big...I will see if the same problem exists with a smaller one and test it on the NFS and then on local storage.

Thomas Plant · Aug 29, 2019

With a 10 GB VM, imported the same way from XenServer, clone and resize work correctly.

tim · Aug 29, 2019

On local or NFS storage?

Thomas Plant · Aug 29, 2019

Interestingly, after the disk was destroyed by the resize, enlarging the now useless disk works.

Thomas Plant · Aug 29, 2019

I did another test.....Resizing with the VM turned off after cloning works too. Only when I do an online resize does it kill my disk....
I did the following: resized the disk with the VM shut down, which went ok. Then I started the VM and did another resize, which reduced the size of the disk to the amount it should have been increased by, destroying it.
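
The symptom described here looks like an increment being applied as an absolute size. `qm resize` distinguishes the two: a size with a leading `+` is added to the current size, a size without it is taken as absolute. The sketch below only models that distinction to show the arithmetic; `target_size` is a hypothetical function, it assumes a unit suffix is always present, and it says nothing about where the problem actually occurs.

```python
GIB = 1024 ** 3
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}

def target_size(current_bytes, spec):
    """Interpret a size spec: '+10G' is relative, '10G' is absolute."""
    relative = spec.startswith("+")
    number, unit = spec.lstrip("+")[:-1], spec[-1]
    value = int(float(number) * UNITS[unit])
    return current_bytes + value if relative else value

current = 172 * GIB  # virtual size reported before the online resize

print(target_size(current, "+10G") // GIB)  # 182, the intended result
print(target_size(current, "10G") // GIB)   # 10, matches the size seen afterwards
```

With a 172 GiB disk and a 10 GiB increment, the absolute interpretation gives exactly the 10G virtual size that `info block` reported after the resize.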

Thomas Plant · Aug 29, 2019

Next test: I cloned the VM again, started it up and resized the disk, but entered the size I wanted the disk to become in the resize window, and voila, it was expanded to the right size and the VM still works. And now the 'normal' way of increasing disk size works too; I tried twice and all worked as it should.

tim · Aug 29, 2019

Thomas Plant said:
Next test: I cloned the VM again, started it up and resized the disk, but entered the size I wanted the disk to become in the resize window, and voila, it was expanded to the right size and the VM still works. And now the 'normal' way of increasing disk size works too; I tried twice and all worked as it should.

My first guess was that qemu doesn't know the correct size; that's why I asked for "info block -v drive-scsi1". But from the output you gave above, the size seems to be correct. Can you verify that "info block -v drive-scsi1" reports the true size before the first resize?
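
To cross-check what the monitor reports against the image file itself, the virtual size can also be read with `qemu-img info --output=json`. A minimal sketch, assuming qemu-img is available on the host; the function names are illustrative, and for an image in use by a running VM the `--force-share` option is probably needed because of image locking:

```python
import json
import subprocess

def parse_virtual_size(info_json):
    """Extract the virtual size in bytes from qemu-img's JSON output."""
    return json.loads(info_json)["virtual-size"]

def virtual_size(path, force_share=False):
    """Run qemu-img info on an image and return its virtual size in bytes."""
    cmd = ["qemu-img", "info", "--output=json"]
    if force_share:
        # Assumption: needed when a running VM holds a lock on the image.
        cmd.append("--force-share")
    cmd.append(path)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return parse_virtual_size(result.stdout)

# Parsing demonstrated on a hand-written sample, not on real command output.
sample = '{"virtual-size": 184683593728, "format": "qcow2"}'
print(parse_virtual_size(sample) // 1024 ** 3)  # 172
```

On the host this could be called as `virtual_size("/mnt/pve/NFS01/images/112/vm-112-disk-0.qcow2", force_share=True)` before the first resize and compared with the monitor output.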
