iSCSI / Hardware.htm : Error in Perl code: 500 read timeout

Clement

New Member
Mar 12, 2010
Hello,

I have a little problem accessing the hardware tab when my iSCSI SAN is overloaded:

Internal Server Error
The server encountered an internal error or misconfiguration and was unable to complete your request. Please contact the server administrator, root and inform them of the time the error occurred, and anything you might have done that may have caused the error.
[6044]ERR: 24: Error in Perl code: 500 read timeout

This "Bug" doesn't appears when my iSCSI SAN is free (but the tab time load is long - 5/6 sec).

I have "normals" errors (md-multipath doesn't works perfectly with my SAN model) :
sci-virt1:~# pvesm list
/dev/sdc: read failed after 0 of 4096 at 0: Input/output error
/dev/sdf: read failed after 0 of 4096 at 0: Input/output error
/dev/sdc: read failed after 0 of 4096 at 0: Input/output error
/dev/sdf: read failed after 0 of 4096 at 0: Input/output error
/dev/sdc: read failed after 0 of 4096 at 0: Input/output error
/dev/sdf: read failed after 0 of 4096 at 0: Input/output error
Backup nfs 0 1 8052162560 2586277888 32%
ISO nfs 0 1 8052162560 2586277888 32%
VMSTORE lvm 0 1 1317068800 293134336 22%
local dir 0 1 601340840 12497672 2%
san1 iscsi 0 1 0 0 100%
(this command takes 5 seconds to print the results)
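For the record, I measure it simply with the shell's time builtin:

time pvesm list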

Is there a solution, or a timeout variable I can increase?

Thanks in advance.

Clément.
 
I have no solution for now but this is a known issue and will be fixed in one of the upcoming versions (2.x).
 
Hello,

This is really a pain for me.
Perhaps I have too many storage volumes?

I chose to have one LVM volume group for each VM; I have 6 nodes and ~80 VMs.

It is really hard to reach the Hardware Tab of a VM.

As it seems to be a timeout from the LWP Perl module, there must be a way to specify a longer timeout...

I haven't really found where the commands are launched (probably pvesm list, via ssh?).
 
I chose to have one LVM volume group for each VM; I have 6 nodes and ~80 VMs.

Why don't you use one VG for all VMs?

As it seems to be a timeout from the LWP Perl module, there must be a way to specify a longer timeout...
I haven't really found where the commands are launched (probably pvesm list, via ssh?).

see lib/PVE/ConfigServer.pm ("sub connect")
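On an installed node the module should be somewhere under /usr/share/perl5/PVE/ (the exact path may differ on your system); something like this will point you at the right place:

grep -n timeout /usr/share/perl5/PVE/ConfigServer.pm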
 
Why don't you use one VG for all VMs?

Because I want to migrate one VM at a time.
I don't think you can activate 2 LVs of the same VG on 2 nodes, can you?
I haven't tried, but generally, cluster software (like Heartbeat, Red Hat Cluster Suite, HP Serviceguard...) needs a vgchange -a y (or -a e for exclusive on HP-UX) for the resource on the node.
If you plan to use CLVM, Corosync... I think the lock granularity for LVM is the VG.
 
Because I want to migrate one VM at a time.
I don't think you can activate 2 LVs of the same VG on 2 nodes, can you?

Why not? At least it works perfectly here.

I haven't tried, but generally, cluster software (like Heartbeat, Red Hat Cluster Suite, HP Serviceguard...) needs a vgchange -a y (or -a e for exclusive on HP-UX) for the resource on the node.
If you plan to use CLVM, Corosync... I think the lock granularity for LVM is the VG.

I don't think so - you can lock individual LVs - or do you have some evidence on that?
 
I confess I have never tried.
I was certainly biased by HP Serviceguard, which needs a VG per "package" (cluster resource).
You're right, the man page for lvchange shows that you can activate individual LVs.
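For example (a quick sketch from the man pages; vg01 and vm101 are made-up names):

vgchange -a y vg01         # activates every LV in the VG
lvchange -a y vg01/vm101   # activates only that single LV

So per-LV activation on a shared VG should indeed be possible.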

Can you confirm that in the future 2.x release, HA for KVM will work for VMs sharing a VG?

PS:
Another reason for me to use a VG per VM is that I use a LUN per VM, so I can choose per VM whether the SLA requires replication between our 2 arrays or not.
If I had the licenses, I could also use snapshots for individual VMs...
 
Can you confirm that in the future 2.x release, HA for KVM will work for VMs sharing a VG?

That is the plan, yes.

Another reason for me to use a VG per VM is that I use a LUN per VM, so I can choose per VM whether the SLA requires replication between our 2 arrays or not.
If I had the licenses, I could also use snapshots for individual VMs...

Well, those are valid arguments. But how can I scan 80 VGs in a reasonable time?
 
In fact, I don't see why it takes so long.

As far as I can see, you launch "/sbin/vgs --separator : --noheadings --units k --unbuffered --nosuffix --options vg_name,vg_size,vg_free" on the host running the VM, via a pvedaemon worker.

If I run that command myself, it is quite fast (2-3 seconds).
Even if I do a loop across all my 6 hosts via ssh.
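Something like this, with made-up host names:

for h in node1 node2 node3 node4 node5 node6; do
  echo "== $h =="
  time ssh $h /sbin/vgs --separator : --noheadings --units k --unbuffered --nosuffix --options vg_name,vg_size,vg_free
done

and every node answers within a few seconds.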

If I monitor my host with ps -ef | grep vgs when I hit the hardware tab, I see the vgs command running for much longer...

Is there any overhead with your tunnels?
Is it possible that the pvedaemon workers don't get enough priority (I boot my nodes with elevator=deadline)?

Anyway, it seems that the timeout is around 10 s. I still haven't found where I can set it, and I think 30 s would be sufficient for me.
I'll keep digging into your scripts.
 
I think I have found the timeout; I changed it from 10 to 30 here:

--- ConfigServer.pm.ori 2010-08-18 19:11:39.000000000 +0200
+++ ConfigServer.pm 2010-08-18 19:08:23.000000000 +0200
@@ -1946,7 +1946,7 @@
die "no ticket specified" if !$ticket;

# set longet timeout for local connection
- my $timeout = $cid ? 10 : 120;
+ my $timeout = $cid ? 30 : 120;

my $port = $soapport;


I noticed another thing: the vgs command is run at least 3 times, so perhaps some caching could greatly improve things.
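Just to illustrate the caching idea, even something as simple as this (the cache path and the 10-second TTL are made up) would avoid the repeated scans:

CACHE=/tmp/vgs.cache
if [ ! -s "$CACHE" ] || [ $(( $(date +%s) - $(stat -c %Y "$CACHE") )) -gt 10 ]; then
  /sbin/vgs --separator : --noheadings --units k --unbuffered --nosuffix --options vg_name,vg_size,vg_free > "$CACHE"
fi
cat "$CACHE"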
But I also understand that you are working on the 2.x version, which perhaps handles all these things differently.
 
If I run that command myself, it is quite fast (2-3 seconds).
Even if I do a loop across all my 6 hosts via ssh.

ok, that seems fast enough.

If I monitor my host with ps -ef | grep vgs when I hit the hardware tab, I see the vgs command running for much longer...

Is there any overhead with your tunnels?
Is it possible that the pvedaemon workers don't get enough priority (I boot my nodes with elevator=deadline)?

no and no
 
Hello,

This patch doesn't work for me; I still have the problem.

Thanks.

I think I have found the timeout; I changed it from 10 to 30 here:

--- ConfigServer.pm.ori 2010-08-18 19:11:39.000000000 +0200
+++ ConfigServer.pm 2010-08-18 19:08:23.000000000 +0200
@@ -1946,7 +1946,7 @@
die "no ticket specified" if !$ticket;

# set longet timeout for local connection
- my $timeout = $cid ? 10 : 120;
+ my $timeout = $cid ? 30 : 120;

my $port = $soapport;


I noticed another thing: the vgs command is run at least 3 times, so perhaps some caching could greatly improve things.
But I also understand that you are working on the 2.x version, which perhaps handles all these things differently.
 
Does it wait longer before the error?
If not, you hit another timeout; perhaps iSCSI and LVM have two separate sections for that.
 
If you have shared storage, monitor the vgs process on the node where you look at the hardware tab.
I use a simple loop like:
while true; do ps -ef | grep vgs; sleep 1; done

But if you haven't got many devices, it will probably be too fast to be seen.

Because not only do I have 80 VGs, but there is also multipathing under the hood, with 8 paths per LUN.
So pvs, vgs and vgscan are scanning 136 devices!
# ll /dev/sd* | wc -l
136
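One thing I may try is to make LVM skip the raw /dev/sd* paths and only scan the multipath devices, for example (assuming they show up under /dev/mapper/ and that the local PV sits on /dev/sda, both to be adjusted to the real layout):

/sbin/vgs --config 'devices { filter = [ "a|^/dev/sda|", "a|^/dev/mapper/|", "r|^/dev/sd|" ] }'

If that turns out to be faster, the same filter could go into /etc/lvm/lvm.conf.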
 
If you have shared storage, monitor the vgs process on the node where you look at the hardware tab.
I use a simple loop like:
while true; do ps -ef | grep vgs; sleep 1; done

But if you haven't got many devices, it will probably be too fast to be seen.

Because not only do I have 80 VGs, but there is also multipathing under the hood, with 8 paths per LUN.
So pvs, vgs and vgscan are scanning 136 devices!
# ll /dev/sd* | wc -l
136

Only 11 devices for me actually.
 
