Problem with 4.0, Kernel and iSCSI backend

christian_ws

Nov 24, 2015
Hello, we have been using Proxmox for years. At the moment we are upgrading our servers to 4.0.
For this we are reinstalling every node and recreating the cluster.


In our office we use an iSCSI storage system, an IBM DS3524. With Proxmox 3.4 we had no problems and everything worked fine.


With 4.0 the problems with iSCSI began. We can connect and log in to the iSCSI target, but as soon as we write data to it the whole server hangs. Reading works without problems.
I searched Google and found a Debian sid bug report: 805252


So I started compiling one kernel after another to see where the problem comes from (all of them tested on the Proxmox 4.0 host!):
With 3.18.21 there is no problem; I can read and write as much as I want.
With 3.18.22 the problem is there: as soon as I write data everything hangs and I have to hard reset the server.


Many kernel compiles later I found that the following patch is the problem:
Code:
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index ce382e8..6d931d5 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2812,9 +2812,9 @@ static int sd_revalidate_disk(struct gendisk *disk)
     max_xfer = sdkp->max_xfer_blocks;
     max_xfer <<= ilog2(sdp->sector_size) - 9;
 
-    max_xfer = min_not_zero(queue_max_hw_sectors(sdkp->disk->queue),
-                max_xfer);
-    blk_queue_max_hw_sectors(sdkp->disk->queue, max_xfer);
+    sdkp->disk->queue->limits.max_sectors =
+        min_not_zero(queue_max_hw_sectors(sdkp->disk->queue), max_xfer);
+
     set_capacity(disk, sdkp->capacity);
     sd_config_write_same(sdkp);
     kfree(buffer);
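
The arithmetic in this hunk can be sketched outside the kernel. The Python below is only an illustration of the two steps visible in the diff (scaling `max_xfer_blocks` to 512-byte sectors, then taking `min_not_zero` against the hardware limit); the function name `effective_max_sectors` and the input values are hypothetical and not taken from any real device.

```python
def ilog2(n: int) -> int:
    """Integer base-2 logarithm of a positive power of two (e.g. a sector size)."""
    return n.bit_length() - 1


def min_not_zero(a: int, b: int) -> int:
    """Smaller of a and b, where 0 means 'no limit reported'."""
    if a == 0:
        return b
    if b == 0:
        return a
    return min(a, b)


def effective_max_sectors(max_xfer_blocks: int, sector_size: int,
                          max_hw_sectors: int) -> int:
    """Mirror the diff: convert device blocks to 512-byte sectors, then clamp."""
    max_xfer = max_xfer_blocks << (ilog2(sector_size) - 9)
    return min_not_zero(max_hw_sectors, max_xfer)


# Hypothetical inputs, chosen only to exercise both branches.
print(effective_max_sectors(0, 512, 32767))      # device reports no limit -> 32767
print(effective_max_sectors(2048, 4096, 65535))  # 2048 blocks of 4 KiB -> 16384
```

With the patch applied, this value is written directly into `limits.max_sectors` rather than being passed through `blk_queue_max_hw_sectors()`, so any additional capping done inside that helper no longer applies.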


If I compile kernel 3.18.21 with this patch added, I cannot write to the iSCSI device.
So I tried recompiling the PVE kernel with this patch reverted, but that doesn't solve the problem. The new PVE kernel builds fine, but the problem still exists.


I also found a thread here with a similar problem: thread #24748


So if anyone can help me with this, or has any other ideas about what I can try, please give me a hint.
If you need further data, I can provide it here.


Thank you all.
Best regards


Christian
 
Hello,

I just tested some other vanilla kernels...

Version 4.2(.0) does not work; the same problem occurs there. I also tried to backport the commit I identified in 3.18, but that doesn't work.
Version 4.1.6 does not work either; the commit above is not yet included in that version, so it must be another commit...

Now I am compiling version 4.1(.0) to see whether it happens there too...

Any suggestions (also from the Proxmox team; we have valid community subscriptions for 5 servers...) on what else we can try?

Regards

Christian
 
Hi,

sorry for the late response, I was out sick and the whole project stalled...

I did some more research on this. Finally, in kernel 4.4-rc5 there was the following patch:

https://git.kernel.org/cgit/linux/k...c?id=ca369d51b3e1649be4a72addd6d6a168cfb3f537

I backported this patch to the current 4.2.6 Proxmox kernel (from git) and compiled it.
With this patch applied I have no more problems connecting and writing to my iSCSI device.

To answer your question: the problem occurred when writing to the device as a whole, even without any VM started. If a VM was started, the whole VM got corrupted because no data could be written.

I attached my patch; could you please include it in the current Proxmox kernel? That would be very nice.

I think the problem in thread #17999 may be the same issue.

Thank you very much and best regards.
 

Attachments: scsi_write.patch.txt (5.6 KB)
Any news on this? It would be very nice if you could include this patch!
Or, if you want, I can push it to your repo via git?

Thanks very much.
 
The patch referenced above (as well as two follow-up commits) is already included in our 4.4 kernel.
 
Oh, well any ideas why I might be suffering exactly the same symptoms? I followed the iSCSI MPIO guide to a tee and this is not the first time I have configured iSCSI with MPIO anyway... I had it working on PVE 3.3 absolutely fine.

Logs are full of this for any LUN I use:

Jul 28 09:00:17 apollo01 kernel: [ 1075.317537] sd 8:0:0:4: [sdp] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 28 09:00:17 apollo01 kernel: [ 1075.317551] sd 8:0:0:4: [sdp] tag#0 Sense Key : Illegal Request [current]
Jul 28 09:00:17 apollo01 kernel: [ 1075.317555] sd 8:0:0:4: [sdp] tag#0 Add. Sense: Invalid field in cdb
Jul 28 09:00:17 apollo01 kernel: [ 1075.317559] sd 8:0:0:4: [sdp] tag#0 CDB: Write(10) 2a 00 00 00 10 00 00 20 10 00
Jul 28 09:00:17 apollo01 kernel: [ 1075.319230] sd 8:0:0:4: [sdp] tag#24 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 28 09:00:17 apollo01 kernel: [ 1075.319237] sd 8:0:0:4: [sdp] tag#24 Sense Key : Illegal Request [current]
Jul 28 09:00:17 apollo01 kernel: [ 1075.319240] sd 8:0:0:4: [sdp] tag#24 Add. Sense: Invalid field in cdb
Jul 28 09:00:17 apollo01 kernel: [ 1075.319243] sd 8:0:0:4: [sdp] tag#24 CDB: Write(10) 2a 00 02 c4 10 00 00 3b 10 00
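
For anyone reading these lines, the rejected commands can be decoded by hand. The sketch below assumes the standard WRITE(10) layout (opcode `0x2a`, a 4-byte big-endian LBA in bytes 2-5, a 2-byte big-endian transfer length in bytes 7-8); the helper name `parse_write10` is made up for illustration, and the 512-byte logical block size is an assumption, since the log does not state it.

```python
from typing import NamedTuple


class Write10(NamedTuple):
    lba: int     # starting logical block address
    blocks: int  # transfer length in logical blocks


def parse_write10(cdb_hex: str) -> Write10:
    """Decode a WRITE(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")
    blocks = int.from_bytes(cdb[7:9], "big")
    return Write10(lba, blocks)


BLOCK_SIZE = 512  # assumption: logical block size is not shown in the log

for cdb in ("2a 00 00 00 10 00 00 20 10 00",
            "2a 00 02 c4 10 00 00 3b 10 00"):
    w = parse_write10(cdb)
    print(f"LBA {w.lba}, {w.blocks} blocks, {w.blocks * BLOCK_SIZE // 1024} KiB")
```

Under that block-size assumption, the two failing writes are 8208 and 15120 blocks long, i.e. 4104 KiB and 7560 KiB in a single command.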
 
As far as I understand the referenced discussions/bug reports, the error is actually on the iSCSI target side (i.e., the target reports limits to the initiator that it does not actually support). In newer Linux kernel versions, the initiator started to actually honor those reported limits.

What kind of iSCSI target / vendor are you using? Maybe you can configure those limits on the target side? Do the errors only occur when writing more than X bytes (you could try with progressively larger write sizes)?
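
One possible way to run such a test is sketched below. It is only an illustration: `probe_write_sizes` and `doubling_sizes` are hypothetical helpers, `/dev/sdX` is a placeholder, and the writes go through the page cache, so the kernel may still split or merge them before they reach the target. It overwrites data at the given offset, so it should only ever be pointed at a scratch LUN, and on an affected host the call may simply hang instead of returning an error.

```python
import os


def doubling_sizes(start: int, stop: int):
    """Yield start, 2*start, 4*start, ... up to and including stop."""
    size = start
    while size <= stop:
        yield size
        size *= 2


def probe_write_sizes(path: str, sizes, offset: int = 0):
    """Write zero-filled buffers of each size at `offset` and flush them.

    Returns a list of (size, error) pairs; error is None on success.
    DESTRUCTIVE: existing data at `offset` is overwritten.
    """
    results = []
    fd = os.open(path, os.O_WRONLY)
    try:
        for size in sizes:
            try:
                written = os.pwrite(fd, bytes(size), offset)
                os.fsync(fd)  # force the data out so I/O errors surface here
                results.append((size, None if written == size else "short write"))
            except OSError as exc:
                results.append((size, exc.strerror))
    finally:
        os.close(fd)
    return results


# Example call (placeholder device name, scratch LUN only):
# for size, err in probe_write_sizes("/dev/sdX", doubling_sizes(64 * 1024, 8 * 1024 * 1024)):
#     print(size, err or "ok")
```

The first size that reports an error gives a rough upper bound for what the target accepts.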
 
I am using an HP P2000 G3 10Gbit iSCSI SAN. This same unit worked fine on PVE 3.3.

I don't believe that I can configure those limits.

The errors occur as soon as a VM tries to partition the disk upon OS installation.
 
It seems like the workaround from that thread should work then? According to the discussion you linked, this is still something that the iSCSI target claims to support (but apparently does not). So the bug is actually on the target side, not on the initiator side.
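
To compare what the initiator is prepared to send with what the target accepts, it can help to look at the request-size limits the kernel currently applies to the disk. A minimal sketch, assuming the usual block-layer sysfs attributes `max_sectors_kb` and `max_hw_sectors_kb` under `/sys/block/<dev>/queue`; the helper name `read_queue_limits` is hypothetical, and `sdp` is simply the device name that appears in the log above.

```python
from pathlib import Path


def read_queue_limits(device: str, sysfs_root: str = "/sys/block") -> dict:
    """Return the current and hardware request-size limits (in KiB) for a disk."""
    queue = Path(sysfs_root) / device / "queue"
    limits = {}
    for name in ("max_sectors_kb", "max_hw_sectors_kb"):
        limits[name] = int((queue / name).read_text().strip())
    return limits


# Example call (device name taken from the log above):
# print(read_queue_limits("sdp"))
```

If `max_sectors_kb` is larger than the biggest write the target handles, that mismatch would be consistent with the errors shown in the log.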
 
