LVM disk not accessible after update to 1.8

udo

Hi,
I upgraded one server from 1.7 to 1.8, and since then most of the VMs can't use their disks!
All disks are on local LVM storage.

It looks like the same issue as here: http://forum.proxmox.com/threads/6111-Boot-Failure?p=34622#post34622

Guest Error:
No boot disk...
or disk not available.

If I copy the disk to local storage (dd if=/dev/localsata/vm-102-disk-1 of=vm-102-disk-1.raw bs=1024k) and use that copy as the disk image, the guest boots fine!
1. So the data isn't corrupt!
2. The data is accessible!
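For reference, the full test as a sketch (the dd call is from above; the /var/lib/vz path and the config line assume the default "local" directory storage, so adjust to your setup):
Code:
# sketch: copy the LV into the default "local" directory storage
mkdir -p /var/lib/vz/images/102
dd if=/dev/localsata/vm-102-disk-1 of=/var/lib/vz/images/102/vm-102-disk-1.raw bs=1024k
# then point the VM at the copy in /etc/qemu-server/102.conf, e.g.:
#   virtio0: local:102/vm-102-disk-1.raw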

But where is the problem?
Code:
lvs
  LV            VG        Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  vm-101-disk-1 localsata -wi-ao 512,00M                                      
  vm-102-disk-1 localsata -wi-ao  35,00G                                      
  vm-103-disk-1 localsata -wi-ao   7,00G                                      
  vm-103-disk-2 localsata -wi-ao   1,56T                                      
  vm-103-disk-3 localsata -wi-ao  45,00G                                      
  vm-104-disk-1 localsata -wi-ao  32,00G                                      
  vm-105-disk-1 localsata -wi-a-   7,00G                                      
  vm-105-disk-2 localsata -wi-ao 110,00G                                      
  vm-105-disk-3 localsata -wi-a-  64,00G                                      
  vm-107-disk-1 localsata -wi-a-  32,00G                                      
  vm-108-disk-1 localsata -wi-a-  71,00G                                      
  vm-109-disk-1 localsata -wi-a-  22,00G                                      
  vm-110-disk-1 localsata -wi-a- 162,00G                                      
  vm-111-disk-1 localsata -wi-a-  41,00G                                      
  vm-113-disk-1 localsata -wi-ao 201,00G                                      
  vm-113-disk-2 localsata -wi-ao  81,00G                                      
  data          pve       -wi-ao  85,00G                                      
  root          pve       -wi-ao  40,00G                                      
  swap          pve       -wc-ao   8,00G
Deactivating and re-activating the LVs doesn't help.
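Roughly what I tried there (just a sketch, with one LV as an example):
Code:
# deactivate and re-activate one of the affected LVs
lvchange -an /dev/localsata/vm-102-disk-1
lvchange -ay /dev/localsata/vm-102-disk-1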

Unfortunately I don't have enough local space for all the disks. If I don't find a solution quickly, I'll have to go back to PVE 1.7.

Code:
pveversion -v
pve-manager: 1.8-15 (pve-manager/1.8/5754)
running kernel: 2.6.35-1-pve
proxmox-ve-2.6.35: 1.8-10
pve-kernel-2.6.32-4-pve: 2.6.32-32
pve-kernel-2.6.35-1-pve: 2.6.35-10
qemu-server: 1.1-30
pve-firmware: 1.0-11
libpve-storage-perl: 1.0-17
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-11
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.14.0-3
ksm-control-daemon: 1.0-5

Udo
 
Hi Udo,

I don't know if I can help you, but perhaps I can make things clearer. You said first 'All disks are on local LVM storage', then 'If I copy the disk to local storage...'. Are your images on external SATA disks attached to an eSATA port, for example?

There could be a problem with the way the storage is recognized by the newer kernel (even if it is still a 2.6.35...). Just a guess...

Alain
 
Hi Alain,
with "local lvm-storage" i mean local raid-disk (like pve) but as lvm-storage ( all disks are internal disks on one raidcontroller):
Code:
pvdisplay
  --- Physical volume ---
  PV Name               /dev/sda3
  VG Name               localsata
  PV Size               1,68 TB / not usable 1,07 MB
  Allocatable           yes 
  PE Size (KByte)       4096
  Total PE              440946
  Free PE               29426
  Allocated PE          411520
  PV UUID               76sxq1-qJjM-0OVt-f6Hc-A0Wh-HUui-GLV7em
   
  --- Physical volume ---
  PV Name               /dev/sdb1
  VG Name               localsata
  PV Size               3,27 TB / not usable 3,00 MB
  Allocatable           yes 
  PE Size (KByte)       4096
  Total PE              858304
  Free PE               582592
  Allocated PE          275712
  PV UUID               g9Kss5-kEHL-7Ysd-NH2f-ePtg-EtmT-MU5X1w
   
  --- Physical volume ---
  PV Name               /dev/sda2
  VG Name               pve
  PV Size               139,70 GB / not usable 3,15 MB
  Allocatable           yes 
  PE Size (KByte)       4096
  Total PE              35762
  Free PE               1714
  Allocated PE          34048
  PV UUID               pWHycZ-IMqA-L0Ze-1J2b-ownM-S9n6-Yre0CW
I have just created a new LVM volume for one VM and am copying the content back - perhaps it will work (I'll know more in a few minutes).
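The test as a sketch (the size comes from the lvs output above; the "-new" name is just an example):
Code:
# sketch: create a fresh LV of the same size and copy the raw image back into it
lvcreate -L 35G -n vm-102-disk-1-new localsata
dd if=vm-102-disk-1.raw of=/dev/localsata/vm-102-disk-1-new bs=1024k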

Udo
 
...I have just created a new LVM volume for one VM and am copying the content back - perhaps it will work (I'll know more in a few minutes).
Bad luck - same issue - the guest says "Buffer I/O error on device vdc".

Hmm, buffer I/O??? I just switched away from "cache=none" and IT WORKS!!!!

Don't know why, but with this config the machine works fine:
Code:
name: web-srv
ide2: cdrom,media=cdrom
vlan90: e1000=1E:E8:3F:CF:29:D3
bootdisk: virtio0
ostype: l26
memory: 1024
sockets: 1
onboot: 1
cores: 1
boot: cad
freeze: 0
cpuunits: 1000
acpi: 1
kvm: 1
virtio0: localsata:vm-105-disk-1,cache=writethrough
virtio1: localsata:vm-105-disk-2,cache=writethrough
virtio2: localsata:vm-105-disk-3,cache=writethrough

I will change the other VMs too.
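One way to do that in bulk (just a sketch, not tested here - back up /etc/qemu-server first and restart each VM so the change takes effect):
Code:
# sketch: add cache=writethrough to every virtio disk line that has no cache option yet
cp -a /etc/qemu-server /root/qemu-server.bak
sed -i '/^virtio[0-9]*:/{/cache=/!s/$/,cache=writethrough/;}' /etc/qemu-server/*.conf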

Udo
 
Hi Udo,

Good that you found the problem. Concerning I/O slowness, I recently read this link about RHEL 6, which may be of interest in this context too:
http://www.ilsistemista.net/index.php/virtualization/11-kvm-io-slowness-on-rhel-6.html

It seems that in the latest kvm/qemu the default for image file caching is writethrough, and it should be none...

Alain
Hi Alain,
in PVE 1.8, cache=none is the default. In some cases this is faster, but on this particular server it doesn't look so good :-(

The speed looks OK with cache=writethrough. These are the tests from inside the VM:
Code:
# dd if=/dev/zero of=bigfile bs=1024k count=8192 conv=fdatasync
8192+0 records in
8192+0 records out
8589934592 bytes (8.6 GB) copied, 105.711 s, 81.3 MB/s

# echo 3 > /proc/sys/vm/drop_caches
# dd if=bigfile of=/dev/null bs=1024k
8192+0 records in
8192+0 records out
8589934592 bytes (8.6 GB) copied, 27.2377 s, 315 MB/s
These values are OK.

Udo
 
...
Is there a fast/easy way to create a VM which shows that behavior?
Hi Dietmar,
unfortunately not. I have moved the image of one VM to another host - there the VM boots without problems.

Both hosts run 2.6.35 and have (different) Areca RAID controllers, but on the host where cache=none works, the LVM storage sits on top of DRBD.
The CPUs are also different (Opteron 4170 HE and Opteron 6136)...
And the PVE version on the working host is two weeks older - the differences:
libpve-storage-perl: 1.0-16
vzprocps: 2.0.11-1dso2
pve-qemu-kvm: 0.14.0-2

I will try to move the VM to a test machine with the same PVE version.
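Probably something like this to get the image over (just a sketch - the host name is a placeholder, and the target LV must already exist with the same size):
Code:
# sketch: stream one LV to an identically sized LV on the test machine
# (with the VM shut down on the source host)
dd if=/dev/localsata/vm-105-disk-1 bs=1024k | ssh root@testhost 'dd of=/dev/localsata/vm-105-disk-1 bs=1024k'
# and copy the VM config as well
scp /etc/qemu-server/105.conf root@testhost:/etc/qemu-server/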

Udo