Drives just disconnect, then come back with new names

vooze

Member
May 11, 2017
Hi

I have a bit of a problem today with two drives (sda + sdd). I got an email about SMART failing with "No such device". I got worried and logged in from work, but zpool status showed everything was fine, so I put my mind at ease. When I got home, I checked the logs.

Code:
May 11 05:40:13 pve kernel: [455073.224190] sd 0:0:0:0: device_block, handle(0x000c)
May 11 05:40:15 pve kernel: [455074.975487] sd 0:0:3:0: [sdd] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
May 11 05:40:15 pve kernel: [455074.989351] mpt2sas_cm0: removing handle(0x000b), sas_addr(0x4433221102000000)
May 11 05:40:15 pve kernel: [455074.990529] sd 0:0:0:0: [sda] Synchronizing SCSI cache
May 11 05:40:15 pve kernel: [455075.013372] mpt2sas_cm0: removing handle(0x000c), sas_addr(0x4433221103000000)
May 11 05:40:19 pve kernel: [455078.977171] sd 0:0:8:0: Attached scsi generic sg0 type 0
May 11 05:40:19 pve kernel: [455078.977587] sd 0:0:8:0: [sdi] 5860533168 512-byte logical blocks: (3.00 TB/2.73 TiB)
May 11 05:40:19 pve kernel: [455078.977592] sd 0:0:8:0: [sdi] 4096-byte physical blocks
May 11 05:40:19 pve kernel: [455078.982747] sd 0:0:8:0: [sdi] Write Protect is off
May 11 05:40:19 pve kernel: [455078.982751] sd 0:0:8:0: [sdi] Mode Sense: 7f 00 10 08
May 11 05:40:19 pve kernel: [455078.983547] sd 0:0:8:0: [sdi] Write cache: enabled, read cache: enabled, supports DPO and FUA
May 11 05:40:19 pve kernel: [455079.227025] sd 0:0:9:0: [sdj] 5860533168 512-byte logical blocks: (3.00 TB/2.73 TiB)
May 11 05:40:54 pve kernel: [455114.038187] mpt2sas_cm0: removing handle(0x000c), sas_addr(0x4433221103000000)
May 11 05:40:59 pve kernel: [455118.726847] scsi 0:0:10:0: Direct-Access     ATA      WDC WD30EFRX-68E 0A82 PQ: 0 ANSI: 6
May 11 05:40:59 pve kernel: [455118.726859] scsi 0:0:10:0: SATA: handle(0x000b), sas_addr(0x4433221102000000), phy(2), device_name(0x0000000000000000)
May 11 05:40:59 pve kernel: [455118.726862] scsi 0:0:10:0: SATA: enclosure_logical_id(0x5c81f660ec81e400), slot(1)
May 11 05:40:59 pve kernel: [455118.727029] scsi 0:0:10:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
May 11 05:40:59 pve kernel: [455118.728607] sd 0:0:10:0: Attached scsi generic sg0 type 0
May 11 05:40:59 pve kernel: [455118.729018] sd 0:0:10:0: [sdi] 5860533168 512-byte logical blocks: (3.00 TB/2.73 TiB)
May 11 05:40:59 pve kernel: [455118.729024] sd 0:0:10:0: [sdi] 4096-byte physical blocks
May 11 05:46:08 pve smartd[4421]: Device: /dev/sda [SAT], open() failed: No such device
May 11 05:46:08 pve smartd[4421]: Sending warning via /usr/share/smartmontools/smartd-runner to root ...
May 11 05:46:08 pve smartd[4421]: Warning via /usr/share/smartmontools/smartd-runner to root: successful
May 11 05:46:08 pve postfix/pickup[19904]: E90E517756: uid=0 from=<root>
May 11 05:46:08 pve smartd[4421]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 119 to 120
May 11 05:46:08 pve postfix/cleanup[32391]: E90E517756: message-id=<20170511034608.E90E517756@pve.localdomain>
May 11 05:46:08 pve postfix/qmgr[4608]: E90E517756: from=<root@pve.localdomain>, size=838, nrcpt=1 (queue active)
May 11 05:46:08 pve smartd[4421]: Device: /dev/sdd [SAT], open() failed: No such device
May 11 05:46:08 pve smartd[4421]: Sending warning via /usr/share/smartmontools/smartd-runner to root ...

Now they are sdi + sdj, and zpool status showed corrupted data on the two drives. I had to run zpool export and zpool import for it to go away after the resilver.
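To see which disk came back under which name, the log lines above can be summarized with a small script. This is only a rough sketch: the function name `summarize` and the regular expressions are my own, written against the exact message formats shown in this log, and other kernel or driver versions may word these lines differently.

```python
import re

# Patterns modeled on the log lines in this post (an assumption; formats vary).
REMOVED = re.compile(r"removing handle\((0x[0-9a-f]+)\), sas_addr\((0x[0-9a-f]+)\)")
ATTACHED = re.compile(
    r"scsi (\d+:\d+:\d+:\d+): SATA: handle\((0x[0-9a-f]+)\), sas_addr\((0x[0-9a-f]+)\)"
)
SD_NAME = re.compile(r"sd (\d+:\d+:\d+:\d+): \[(sd[a-z]+)\]")
MISSING = re.compile(r"Device: (/dev/sd[a-z]+) .*open\(\) failed: No such device")


def summarize(log_text):
    """Collect removed SAS addresses, re-attached devices, and missing names."""
    removed = set()    # SAS addresses the controller dropped
    reattached = {}    # SAS address -> SCSI address it came back on
    names = {}         # SCSI address -> most recent sdX name seen
    missing = set()    # /dev/sdX paths smartd could no longer open
    for line in log_text.splitlines():
        m = REMOVED.search(line)
        if m:
            removed.add(m.group(2))
            continue
        m = ATTACHED.search(line)
        if m:
            reattached[m.group(3)] = m.group(1)
            continue
        m = SD_NAME.search(line)
        if m:
            names[m.group(1)] = m.group(2)
            continue
        m = MISSING.search(line)
        if m:
            missing.add(m.group(1))
    return {
        "removed": removed,
        "reattached": reattached,
        "names": names,
        "missing": missing,
    }
```

Feeding it the saved log text (for example, the contents of a syslog file) gives the SAS addresses that were removed, the SCSI address each one re-attached on, and the sdX name the kernel assigned afterwards.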

All six of my drives are connected to an LSI controller flashed to IT mode.

Any ideas? :)

All packages are up to date from the free repository.
 

fabian

Proxmox Staff Member
Jan 7, 2016
broken cables?
 

vooze

Member
May 11, 2017
fabian said: broken cables?

Thank you for your reply. I don't think so (or at least I hope not). They are brand-new, original LSI cables. These are the ones: "CBL-SFF8087OCF-10M 1 unit of 1m Multi-lane Internal (SFF-8087) Serial ATA breakout cable, forward". Anyway, they have worked just fine for several weeks and this has only happened once; after the export + import it all "seems" fine again.

I guess I will have to wait and see whether it was just a one-time thing. I have never had two disks just go "offline" for a few seconds and then come back with a new "name" before.
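Since the sdX names are handed out in the order the kernel detects disks, they can change after a re-attach like this one, while the udev symlinks under /dev/disk/by-id stay tied to the drive's model and serial. A minimal sketch for listing which persistent IDs currently point at which kernel name (the function name `stable_names` is just for illustration, and it assumes a Linux system where udev populates that directory):

```python
import os


def stable_names(by_id_dir="/dev/disk/by-id"):
    """Map each kernel device name (e.g. sda) to its persistent by-id links."""
    mapping = {}
    if not os.path.isdir(by_id_dir):
        # Directory is absent on non-Linux systems or without udev.
        return mapping
    for entry in sorted(os.listdir(by_id_dir)):
        if "-part" in entry:
            continue  # skip partition links, keep whole-disk entries only
        target = os.path.realpath(os.path.join(by_id_dir, entry))
        mapping.setdefault(os.path.basename(target), []).append(entry)
    return mapping


if __name__ == "__main__":
    for kernel_name, ids in sorted(stable_names().items()):
        print(kernel_name, "->", ", ".join(ids))
```

If the pool was created with sdX names, one common approach is to re-import it using these IDs with zpool import -d /dev/disk/by-id followed by the pool name, so a rename like sda to sdi no longer matters to ZFS. Whether that would have avoided the errors here is a guess on my part.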
 