Storage model, iSCSI and lock problem

alain

May 17, 2009
Hi all,

I am currently testing iSCSI with Proxmox VE 1.5 and kernel 2.6.32.

I followed the wiki article 'Use iSCSI LUN for LVM base' to add an LVM group on an iSCSI target. It succeeded, and I indeed see a storage, iscsi-test, of type LVM, with a capacity of 929 GB.
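For reference, the wiki procedure boils down to roughly the following steps. This is a sketch only: the portal address and IQN are the ones that appear later in this thread, the device node /dev/sdb is a placeholder, and the `run` wrapper only echoes the commands instead of executing them.

```shell
#!/bin/sh
# Dry-run sketch of the 'Use iSCSI LUN for LVM base' wiki steps.
# PORTAL/IQN come from this thread; LUN_DEV is a placeholder - check
# dmesg or 'fdisk -l' for the device the LUN actually appears as.
PORTAL="192.168.2.1"
IQN="iqn.2010-1.fr.upmc.lpp:RAID.iscsi1.vg0.vmtest"
LUN_DEV="/dev/sdb"
VG="pve-iscsi"

run() { echo "+ $*"; }   # swap 'echo' out to actually execute the commands

run iscsiadm -m discovery -t sendtargets -p "$PORTAL"   # find targets
run iscsiadm -m node -T "$IQN" -p "$PORTAL" --login     # attach the LUN
run pvcreate "$LUN_DEV"                                 # make it a PV
run vgcreate "$VG" "$LUN_DEV"                           # VG used by Proxmox
```

The Proxmox web interface normally drives these steps itself; the sketch only makes visible what the wiki page has you do.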

Then I tried to install a VM (KVM) on this storage, a CentOS 5.4 64-bit, but it crashed after package selection. That will perhaps be the subject of another thread.

Then I tried an Ubuntu Server 8.04 64-bit, and this one succeeded. I had network access and was able to connect via ssh.

The next test was to migrate the VM to another server connected to the iSCSI storage, but it complained about the missing open-iscsi package. So I installed it and discovered the targets:
# apt-get install open-iscsi

hertz:~# iscsiadm -m discovery -t sendtargets -p 192.168.2.1
192.168.2.1:3260,1 iqn.2010-1.fr.upmc.lpp:RAID.iscsi0.vg0.isgi
192.168.2.1:3260,1 iqn.2010-1.fr.upmc.lpp:RAID.iscsi1.vg0.vmtest

First question: why is the open-iscsi package not installed by default on the slave node, as it is on the master node? An iSCSI target is meant to be shared...

Live migration then succeeded within a few seconds (7). But I was unable to connect via ssh. I migrated it back to the original server, but was still unable to connect via ssh, and more problematic, via the VNC interface.

I tried to restart the VM (109), but it stayed in the state 'restarting', and after a few minutes, although it was still visible in the web interface, I could no longer interact with the machine or edit its settings. So I cannot even remove it.

From the command line, I get:
srv-kvm1:~# qm shutdown 109
trying to aquire lock...got timeout

On the other server, I get:
hertz:~# qm shutdown 109
unable to read config for VM 109

So I think it is a lock problem.

How can I fix it?
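One thing worth checking - an assumption on my part: qemu-server keeps per-VM locks under /var/lock/qemu-server, so an interrupted migration could leave one behind - is whether a stale lock file exists for the VM:

```shell
#!/bin/sh
# Hedged sketch: look for a leftover qemu-server lock for VM 109.
# The lock path is an assumption based on qemu-server's /var/lock layout;
# verify it on your own node before removing anything.
VMID=109
LOCK="/var/lock/qemu-server/lock-${VMID}.conf"

if [ -e "$LOCK" ]; then
    ls -l "$LOCK"
    # Remove it only if no qm/kvm process still holds it, e.g.:
    # fuser -v "$LOCK" || rm -f "$LOCK"
else
    echo "no lock file at $LOCK"
fi
```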

Here is my configuration:
srv-kvm1:~# pveversion -v
pve-manager: 1.5-5 (pve-manager/1.5/4627)
running kernel: 2.6.32-1-pve
proxmox-ve-2.6.32: 1.5-4
pve-kernel-2.6.32-1-pve: 2.6.32-4
pve-kernel-2.6.24-9-pve: 2.6.24-18
pve-kernel-2.6.24-8-pve: 2.6.24-16
qemu-server: 1.1-11
pve-firmware: 1.0-3
libpve-storage-perl: 1.0-8
vncterm: 0.9-2
vzctl: 3.0.23-1pve8
vzdump: 1.2-5
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.11.1-2
ksm-control-daemon: 1.0-2

Thanks for your help
 
Hi,
did you use the same (current) install ISO for the second node?

Live migration does not work with all guests - search this forum.
Do you use more than one CPU? With Linux and one CPU it should work.

Windows stops on my configuration even with one CPU. I must switch off the VM - a restart is not enough.

Udo
 
Udo,

Thanks for your answer.

In fact, I upgraded all nodes from 1.4 to 1.5. If I remember correctly, the second node was installed with 1.3 (1.3 -> 1.4 -> 1.5).
hertz:~# pveversion -v
pve-manager: 1.5-5 (pve-manager/1.5/4627)
running kernel: 2.6.32-1-pve
proxmox-ve-2.6.32: 1.5-4
pve-kernel-2.6.32-1-pve: 2.6.32-4
pve-kernel-2.6.24-7-pve: 2.6.24-11
pve-kernel-2.6.24-9-pve: 2.6.24-18
pve-kernel-2.6.24-8-pve: 2.6.24-16
qemu-server: 1.1-11
pve-firmware: 1.0-3
libpve-storage-perl: 1.0-8
vncterm: 0.9-2
vzctl: 3.0.23-1pve8
vzdump: 1.2-5
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.11.1-2
ksm-control-daemon: 1.0-2


For this test, the VM is using only one CPU, so that should not be the problem. And it is Linux...

How do you switch off the VM?

Alain
 
Finally, I recovered my lost VM by rebooting the master.

I then tried another live migration from the master to the slave node. It works, and the network for the VM is OK; I can ssh to it.
But when I migrate the VM back to the master, the 'lost VM' problem reappears. The VM seems to run, but I am unable to connect to it in any way.

I discovered that I seem to have an LVM problem on the slave:
hertz:/etc/network# lvdisplay
  /dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
  /dev/dm-4: read failed after 0 of 4096 at 0: Input/output error
  --- Logical volume ---
  LV Name                /dev/pve/swap
  VG Name                pve
  LV UUID                SllZzf-8CpU-e7e7-ryxs-ZbNZ-4Btp-gvbitE
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                7.00 GB
  Current LE             1792
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:0
....

Perhaps this is the root of the problem. What could be the cause of these read errors?
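One way to investigate - a sketch, assuming dmsetup is available; /dev/dm-3 and /dev/dm-4 are the nodes from the error messages above - is to ask device-mapper which logical volumes those dm nodes belong to:

```shell
#!/bin/sh
# Hedged sketch: map the failing /dev/dm-N nodes back to their LVM names.
# Errors from dmsetup are ignored so the loop keeps going.
checked=0
for dev in /dev/dm-3 /dev/dm-4; do
    [ -e "$dev" ] || continue                 # skip nodes that do not exist
    dmsetup info "$dev" 2>/dev/null || true   # prints name, UUID, open count
    checked=$((checked + 1))
done
echo "inspected $checked node(s)"
# A mapping left over from a dropped iSCSI session can be removed with
# 'dmsetup remove <name>' - destructive, so check the name first.
```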

Alain
 
First question: why is the open-iscsi package not installed by default on the slave node

A new install installs the open-iscsi package automatically. You updated from an older version; we decided to keep package dependencies to a minimum, which is why open-iscsi is not installed on upgrade.
 
A new install installs the open-iscsi package automatically. You updated from an older version; we decided to keep package dependencies to a minimum, which is why open-iscsi is not installed on upgrade.

Thanks for the explanation, it is good to know, as the option is shown in the web interface, and you would think the package is installed...

I tried once again to live migrate my test VM from the master to the slave node - it's OK - and then back to the master. This time, the network was working and I was able to connect to the VM, but the file system was read-only.

In the messages in the web interface, during the live migration, I saw alerts similar to those I mentioned previously:
/dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
/dev/dm-4: read failed after 0 of 4096 at 0: Input/output error
(I could not copy the text, due to refresh interval I guess).

In other threads, I read that it could be related to stale snapshots. Is this right, and if so, how can I find and remove these stale snapshots?

Alain
 
In other threads, I read that it could be related to stale snapshots. Is this right,

Are there any stale snapshots? Use

# lvs

and if so, how can I find and remove these stale snapshots?

# lvremove

Use the manual pages to get more info about those commands.
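To make that concrete, the two commands can be combined as below. A sketch only: the canned sample line mimics what `lvs --noheadings -o vg_name,lv_name,lv_attr` prints, the vzsnap name is a made-up example, and snapshot LVs are recognized by an attribute string starting with 's'.

```shell
#!/bin/sh
# Hedged sketch: print an lvremove command for every snapshot LV,
# without executing anything. Review the output before running it.
list_lvs() {
    # Real use: lvs --noheadings -o vg_name,lv_name,lv_attr
    # Canned sample line so the sketch is self-contained:
    printf '%s\n' "pve vzsnap-pve-0 swi-ao"
}

list_lvs | while read vg lv attr; do
    case "$attr" in
        s*) echo "lvremove -f /dev/$vg/$lv" ;;   # 's' = snapshot LV
    esac
done
```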
 
Are there any stale snapshots? Use

# lvs

No, it does not seem that I have stale snapshots:
hertz:~# lvs
/dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
/dev/dm-4: read failed after 0 of 4096 at 0: Input/output error
  LV            VG        Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  data          pve       -wi-ao 357.00G
  root          pve       -wi-ao  96.00G
  swap          pve       -wi-ao   7.00G
  vm-108-disk-1 pve-iscsi -wi-a-  20.00G
  vm-109-disk-1 pve-iscsi -wi-a-  20.00G

I still get the same errors, but are they related to the live migration problem I see (there are no such errors on the master)? I read that the /dev/dm-* nodes are device-mapper devices, so perhaps something was incorrectly unmounted?
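If the dm-* nodes really are stale leftovers from a dropped or changed iSCSI session, rescanning the session sometimes clears them. A dry-run sketch: the IQN and portal are the ones from the discovery output earlier in the thread, and the `run` wrapper only echoes the commands.

```shell
#!/bin/sh
# Dry-run sketch: rescan or restart the iSCSI session behind the storage.
run() { echo "+ $*"; }   # swap 'echo' out to actually execute

run iscsiadm -m session      # list active sessions
run iscsiadm -m session -R   # rescan all sessions for LUN changes
# Last resort - interrupts all I/O on the LUN:
run iscsiadm -m node -T iqn.2010-1.fr.upmc.lpp:RAID.iscsi1.vg0.vmtest \
    -p 192.168.2.1 --logout
```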

Alain
 
