Deleting Large LVM disks with 4.15.18-20-pve fails

adamb

I'm hitting some odd issues when deleting larger LVM disks on kernel 4.15.18-20-pve (Proxmox VE 5.4-13).

Our setup is HP DL380 Gen10 front ends with Nimble iSCSI storage and LVM on top.

Deleting VM disks over 2 TB fails on 4.15.18-20-pve and causes the host to basically lose access to the storage. The only way to get the host back is to power it off and power it back on.

Testing with 4.15.18-18, I can't reproduce the issue at all. Is anyone aware of this issue, or should I get a bug report going?
 
"causes the host to basically lose access to the storage"
What do you mean by this? Is there anything in dmesg or the journal? What do the LVM tools (lvs/vgs/pvs) display?

Does it work when using lvremove directly? (If yes, how long does it take?)

If you can reliably trigger this, a bug report would probably be better: https://bugzilla.proxmox.com/
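
One way to gather those answers in a single pass is a small wrapper that runs each command with a timeout and reports how long it took. This is only a sketch under a few assumptions: `timed_run` and `diagnose` are helpers invented here, the volume group and LV names passed in are placeholders you must supply, and a process stuck in uninterruptible I/O may not react to the kill, so the timeout is best-effort.

```python
import subprocess
import time


def timed_run(cmd, timeout_s):
    """Run cmd and return (status, elapsed_seconds, output).

    status is the exit code, or None if the command was still running
    after timeout_s seconds and had to be killed (e.g. a hung lvremove).
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            timeout=timeout_s,
        )
        status, output = proc.returncode, proc.stdout
    except subprocess.TimeoutExpired:
        # Partial output is discarded; check dmesg/journal instead.
        status, output = None, ""
    return status, time.monotonic() - start, output


def diagnose(vg, lv, remove_timeout_s=600):
    """Show the LVM state, then time a direct lvremove of vg/lv.

    WARNING: this removes the logical volume. Run as root, and only
    against a disposable test volume.
    """
    for cmd in (["pvs"], ["vgs"], ["lvs", vg]):
        status, elapsed, output = timed_run(cmd, 30)
        print(f"{' '.join(cmd)}: status={status} time={elapsed:.1f}s")
        print(output)
    # -y answers the confirmation prompt so the command cannot block on input.
    status, elapsed, output = timed_run(
        ["lvremove", "-y", f"{vg}/{lv}"], remove_timeout_s
    )
    print(f"lvremove: status={status} time={elapsed:.1f}s")
    print(output)
```

Calling `diagnose("<vg>", "<lv>")` on a test host with the real names filled in prints the state of the PVs, VGs and LVs first, so there is a record of what LVM saw before the removal was attempted.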
 
- Yep, lots of iSCSI-related errors.

Sep 3 11:54:11 testprox1 kernel: [ 267.285924] sd 3:0:0:0: Power-on or device reset occurred
Sep 3 11:54:11 testprox1 kernel: [ 267.285927] connection3:0: detected conn error (1008)
Sep 3 11:54:12 testprox1 kernel: [ 267.294631] connection4:0: detected conn error (1008)
Sep 3 11:54:12 testprox1 kernel: [ 267.294667] sd 4:0:0:0: Power-on or device reset occurred
Sep 3 11:54:14 testprox1 kernel: [ 269.301731] connection4:0: detected conn error (1020)
Sep 3 11:54:14 testprox1 kernel: [ 269.302539] sd 3:0:0:0: Power-on or device reset occurred
Sep 3 11:54:14 testprox1 kernel: [ 269.302562] connection3:0: detected conn error (1008)
Sep 3 11:54:14 testprox1 kernel: [ 269.306044] connection4:0: detected conn error (1008)
Sep 3 11:54:14 testprox1 kernel: [ 269.306094] sd 4:0:0:0: Power-on or device reset occurred
Sep 3 11:54:16 testprox1 kernel: [ 271.310392] connection4:0: detected conn error (1020)
Sep 3 11:54:16 testprox1 kernel: [ 271.313063] connection3:0: detected conn error (1020)
Sep 3 11:54:16 testprox1 kernel: [ 271.314618] sd 4:0:0:0: Power-on or device reset occurred
Sep 3 11:54:16 testprox1 kernel: [ 271.314628] connection4:0: detected conn error (1008)
Sep 3 11:54:16 testprox1 kernel: [ 271.314717] scsi_io_completion: 174 callbacks suppressed
Sep 3 11:54:16 testprox1 kernel: [ 271.314723] sd 4:0:0:0: [sde] tag#1 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
Sep 3 11:54:16 testprox1 kernel: [ 271.314729] sd 4:0:0:0: [sde] tag#1 CDB: Unmap/Read sub-channel 42 00 00 00 00 00 00 00 18 00
Sep 3 11:54:16 testprox1 kernel: [ 271.314731] print_req_error: 174 callbacks suppressed
Sep 3 11:54:16 testprox1 kernel: [ 271.314977] sd 4:0:0:0: [sde] tag#2 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
Sep 3 11:54:16 testprox1 kernel: [ 271.314993] sd 4:0:0:0: [sde] tag#2 CDB: Unmap/Read sub-channel 42 00 00 00 00 00 00 00 18 00
Sep 3 11:54:16 testprox1 kernel: [ 271.315074] sd 4:0:0:0: [sde] tag#3 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
Sep 3 11:54:16 testprox1 kernel: [ 271.315076] sd 4:0:0:0: [sde] tag#3 CDB: Unmap/Read sub-channel 42 00 00 00 00 00 00 00 18 00
Sep 3 11:54:16 testprox1 kernel: [ 271.315148] sd 4:0:0:0: [sde] tag#4 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
Sep 3 11:54:16 testprox1 kernel: [ 271.315149] sd 4:0:0:0: [sde] tag#4 CDB: Unmap/Read sub-channel 42 00 00 00 00 00 00 00 18 00
Sep 3 11:54:16 testprox1 kernel: [ 271.315219] sd 4:0:0:0: [sde] tag#5 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
Sep 3 11:54:16 testprox1 kernel: [ 271.315220] sd 4:0:0:0: [sde] tag#5 CDB: Unmap/Read sub-channel 42 00 00 00 00 00 00 00 18 00
Sep 3 11:54:16 testprox1 kernel: [ 271.315288] sd 4:0:0:0: [sde] tag#6 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
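
For a bug report it can help to condense a log like the one above into counts. The sketch below is an assumption-laden helper, not an official tool: `summarize` and the regular expressions are invented here and only cover the three message shapes that appear in this excerpt. Note that CDB opcode 0x42 is UNMAP on a disk; the kernel prints "Unmap/Read sub-channel" because the same opcode means READ SUB-CHANNEL on optical drives.

```python
import re
from collections import Counter

# Patterns for the three message shapes seen in the excerpt above.
CONN_ERR = re.compile(r"connection(\d+):\d+: detected conn error \((\d+)\)")
FAILED = re.compile(r"\[(\w+)\] tag#\d+ FAILED Result: hostbyte=(\w+)")
CDB = re.compile(
    r"\[(\w+)\] tag#\d+ CDB: .*? ([0-9a-f]{2})(?: [0-9a-f]{2})*\s*$"
)

# A few lines copied from the log in this thread.
SAMPLE = """\
Sep 3 11:54:11 testprox1 kernel: [ 267.285927] connection3:0: detected conn error (1008)
Sep 3 11:54:12 testprox1 kernel: [ 267.294631] connection4:0: detected conn error (1008)
Sep 3 11:54:14 testprox1 kernel: [ 269.301731] connection4:0: detected conn error (1020)
Sep 3 11:54:16 testprox1 kernel: [ 271.314723] sd 4:0:0:0: [sde] tag#1 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
Sep 3 11:54:16 testprox1 kernel: [ 271.314729] sd 4:0:0:0: [sde] tag#1 CDB: Unmap/Read sub-channel 42 00 00 00 00 00 00 00 18 00
"""


def summarize(lines):
    """Count iSCSI connection errors, failed commands and CDB opcodes."""
    conn_errors = Counter()  # (connection id, error code) -> count
    failed = Counter()       # (device, hostbyte) -> count
    opcodes = Counter()      # (device, first CDB byte) -> count
    for line in lines:
        m = CONN_ERR.search(line)
        if m:
            conn_errors[(int(m.group(1)), int(m.group(2)))] += 1
            continue
        m = FAILED.search(line)
        if m:
            failed[(m.group(1), m.group(2))] += 1
            continue
        m = CDB.search(line)
        if m:
            opcodes[(m.group(1), int(m.group(2), 16))] += 1
    return conn_errors, failed, opcodes


conn_errors, failed, opcodes = summarize(SAMPLE.splitlines())
for (dev, opcode), count in opcodes.items():
    print(f"{dev}: opcode 0x{opcode:02x} seen {count} time(s)")
```

To run it against a full log, pass an open file instead of `SAMPLE.splitlines()`, for example the saved output of `journalctl -k`.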

- Yep, it happens when removing manually as well. The lvremove command hangs and the same errors as above are output.

https://bugzilla.proxmox.com/show_bug.cgi?id=2354

It's 100% reproducible for me on this kernel, but the older kernels are 100% solid.
 
