LVM on iSCSI does not connect after reboot

holgihero

After a reboot of the iSCSI target, the Proxmox servers can no longer connect to the LVM volumes.
The iSCSI target itself is connected and active, but when I try to activate the LVM volume group it fails with:

Error: command '/sbin/vgchange -aly pve01' failed with exit code 5

There are no errors on the iSCSI target, and all logical volumes are active there:

root@ubu-iscsi01:~# pvscan
PV /dev/md0 VG pve01 lvm2 [232,88 GiB / 92,88 GiB free]
Total: 1 [232,88 GiB] / in use: 1 [232,88 GiB] / in no VG: 0 [0 ]
root@ubu-iscsi01:~# vgscan
Reading all physical volumes. This may take a while...
Found volume group "pve01" using metadata type lvm2
root@ubu-iscsi01:~# lvscan
ACTIVE '/dev/pve01/vm-130-disk-1' [32,00 GiB] inherit
ACTIVE '/dev/pve01/vm-300-disk-1' [40,00 GiB] inherit
ACTIVE '/dev/pve01/vm-102-disk-1' [8,00 GiB] inherit
ACTIVE Original '/dev/pve01/vm-109-disk-1' [6,00 GiB] inherit
ACTIVE Original '/dev/pve01/vm-109-disk-2' [32,00 GiB] inherit
ACTIVE '/dev/pve01/vm-300-disk-2' [20,00 GiB] inherit
ACTIVE Snapshot '/dev/pve01/vzsnap-proxmox-server-02-0' [1,00 GiB] inherit
ACTIVE Snapshot '/dev/pve01/vzsnap-proxmox-server-02-1' [1,00 GiB] inherit


Any suggestions?

Best regards, Holger
 
post the output of:

Code:
pveversion -v
 
Hi Tom,
this is the pveversion output of my master (the node is slightly newer, 2.6.32-4-pve):

pve-manager: 1.5-10 (pve-manager/1.5/4822)
running kernel: 2.6.32-2-pve
proxmox-ve-2.6.32: 1.5-7
pve-kernel-2.6.32-1-pve: 2.6.32-4
pve-kernel-2.6.32-2-pve: 2.6.32-7
pve-kernel-2.6.24-9-pve: 2.6.24-18
pve-kernel-2.6.24-8-pve: 2.6.24-16
qemu-server: 1.1-16
pve-firmware: 1.0-5
libpve-storage-perl: 1.0-13
vncterm: 0.9-2
vzctl: 3.0.23-1pve11
vzdump: 1.2-5
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.12.4-1
ksm-control-daemon: 1.0-3


While investigating further I found an error in the backup log of one of the VMs located on the iSCSI target
(I accidentally pointed the backup target at the local disk of a Proxmox node):

Nov 22 02:05:41 INFO: adding '/dev/pve01/vzsnap-proxmox-server-02-1' to archive ('vm-disk-ide1.raw')
Nov 22 02:18:17 INFO: gzip: stdout: No space left on device
Nov 22 02:18:17 INFO: received signal - terminate process
Nov 22 02:18:18 INFO: /etc/lvm/cache/.cache.tmp: write error failed: No space left on device
Nov 22 02:18:18 INFO: /etc/lvm/cache/.cache.tmp: write error failed: No space left on device
Nov 22 02:18:18 INFO: /etc/lvm/cache/.cache.tmp: write error failed: No space left on device
Nov 22 02:18:18 INFO: setting parameters failed - close failed - No space left on device
Nov 22 02:18:18 INFO: Volume group "pve01" metadata archive failed.
Nov 22 02:18:18 INFO: /etc/lvm/cache/.cache.tmp: write error failed: No space left on device
Nov 22 02:18:18 ERROR: command 'lvremove -f '/dev/pve01/vzsnap-proxmox-server-02-0'' failed with exit code 5
Nov 22 02:18:18 INFO: Volume group "pve01" metadata archive failed.
Nov 22 02:18:18 INFO: /etc/lvm/cache/.cache.tmp: write error failed: No space left on device
Nov 22 02:18:18 ERROR: command 'lvremove -f '/dev/pve01/vzsnap-proxmox-server-02-1'' failed with exit code 5

So maybe the metadata of VG pve01 is corrupt?

[EDIT]
No, it is not. pvck and vgck found no errors.
And I managed to remove the snapshots successfully (at the command line of the iSCSI target).
Still error code 5 when trying to activate the LVM volume group...
[/EDIT]

Thank you, Holger
 
Investigating further:
The volume group seems to be OK. I found an error while starting the iSCSI target:

[ 3874.325758] iSCSI Enterprise Target Software - version 1.4.19
[ 3874.326966] iscsi_trgt: Registered io type fileio
[ 3874.326973] iscsi_trgt: Registered io type blockio
[ 3874.326978] iscsi_trgt: Registered io type nullio
[ 3874.337712] iscsi_trgt: blockio_open_path(167) Can't open device /dev/md0, error -16
[ 3874.338002] iscsi_trgt: blockio_attach(355) Error attaching Lun 0 to Target iqn.2010-10.de.xxxxxxxxxx:store.01

The reason is that the volume group is activated locally first, so the iSCSI target daemon cannot open /dev/md0 (error -16 is EBUSY, i.e. the device is busy).
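
One way to confirm this on the target is to look at the LV state flags. This is only a minimal sketch, assuming the `lvs` tool from LVM2 is installed; the helper name `vg_has_active_lv` is made up for illustration. It relies on the fifth character of `lv_attr` being `a` for an active logical volume:

```shell
# Hypothetical helper: reads lv_attr strings on stdin (one per line) and
# succeeds if at least one logical volume is active (5th attribute char is "a").
vg_has_active_lv() {
    awk 'substr($1, 5, 1) == "a" { found = 1 } END { exit !found }'
}

# Intended use on the iSCSI target (VG name taken from the posts above):
#   lvs --noheadings -o lv_attr pve01 | vg_has_active_lv \
#       && echo "pve01 is active locally, the target cannot open /dev/md0"
```

If the check succeeds, the volume group is held open locally, which matches the "Can't open device" message in the log.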

So the workaround is:
Stop the iSCSI target daemon -> /etc/init.d/iscsitarget stop
Deactivate the volume group locally -> vgchange -an vgname
Start the iSCSI target daemon -> /etc/init.d/iscsitarget start
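
The three steps can be wrapped in a small function. This is just a sketch under assumptions: the function name `release_vg_for_target` and the `RUN`, `VG` and `INIT` variables are invented here, and `pve01` is the volume group from this thread. Setting `RUN=echo` prints the commands instead of executing them:

```shell
#!/bin/sh
# Sketch only: release the volume group locally so the iSCSI target can open
# its backing device. Variable and function names are illustrative.
VG="${VG:-pve01}"                          # volume group exported via iSCSI
INIT="${INIT:-/etc/init.d/iscsitarget}"    # init script named in the post
RUN="${RUN:-}"                             # set RUN=echo for a dry run

release_vg_for_target() {
    $RUN "$INIT" stop || return 1          # stop the target daemon
    $RUN vgchange -an "$VG" || return 1    # deactivate the VG locally
    $RUN "$INIT" start                     # start the daemon again
}

# Dry run:  RUN=echo; release_vg_for_target
# Real run: release_vg_for_target   (as root on the iSCSI target)
```

Stopping at the first failing step avoids restarting the daemon while the volume group is still active.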

Putting that into rc.local seems crude. Any suggestions for solving this in a more elegant way?

Best regards, Holger
 
After all, this (faulty) behaviour was caused by a test script I once (no idea for what reason) put in rc.local.
It stopped the iSCSI daemon, activated the LV groups and started the iSCSI daemon again ... which leads straight into this error.

Strangely enough, I had 'deactivated' that piece of code by putting an 'exit 0' in front of it, which did not exit rc.local as desired (anybody: why?)...
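
The rc.local in question is not shown, so the following is only a guess at one possible cause, not a diagnosis. An `exit` that runs inside a subshell (parentheses, or one side of a pipeline) ends only that subshell, and the surrounding script keeps going. The function name `demo_subshell_exit` is made up for this sketch:

```shell
# Illustration only: "exit 0" inside ( ... ) leaves the subshell, not the script.
demo_subshell_exit() {
    ( exit 0 )                  # terminates just the subshell
    echo "still running"        # so this line is still reached
}
```

Calling `demo_subshell_exit` prints `still running`. A plain `exit 0` at the top level of the script, outside any parentheses or pipeline, would stop it.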

So this was a misconfiguration on my side. Proxmox is running perfectly!

Thank you for your patience, Holger
 
fine! thanks for feedback.
 
Hi Holger,

This is a known problem with Openfiler 2.3 and Xen (which does the same thing as KVM). Is that what you are using as your iSCSI server?

If so, the permanent fix is the following:

i. Comment out these lines in /etc/rc.sysinit (lines 333 to 337):

Code:
# if [ -x /sbin/lvm.static ]; then
#     if /sbin/lvm.static vgscan --mknodes --ignorelockingfailure > /dev/null 2>&1 ; then
#         action $"Setting up Logical Volume Management:" /sbin/lvm.static vgchange -a y --ignorelockingfailure
#     fi
# fi

ii. Remove the aoe (ATA over Ethernet) service from autostart by running:
chkconfig aoe off

This stops the activation of volume groups that aren't being used locally. The issue is that Openfiler 2.3 has a bug where it cannot map a LUN if local volume groups are activated inside it.
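
Step i can also be done with sed instead of an editor. This is a hedged sketch: the helper name `comment_out_lines` is invented here, and the line numbers 333 to 337 come from the post above, so print them first and check that they really are the LVM block on your install:

```shell
# Hypothetical helper: comment_out_lines FILE FIRST LAST
# Prefixes lines FIRST..LAST of FILE with "# " and keeps the original as FILE.bak.
comment_out_lines() {
    sed -i.bak "$2,$3 s/^/# /" "$1"
}

# Check what would be changed before touching the file:
#   sed -n '333,337p' /etc/rc.sysinit
# Then, if those are the lvm.static lines:
#   comment_out_lines /etc/rc.sysinit 333 337
```

The `.bak` copy makes it easy to revert if the line numbers turn out to differ.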
 
Thank you so much for this information. I have been pulling my hair out over this, and now everything works beautifully!
 
Sorry if this is obvious to everyone else, but do you apply the above fix on Openfiler or on Proxmox?
 
