Drives just disconnect, then come back with new names

vooze · May 11, 2017

Hi

I have a bit of a problem today with two drives (sda + sdd). I got an email about SMART failing because of "no such device". I got worried, so I logged in from work; zpool status showed everything was fine, which put my mind at ease. Then, when I got home, I checked the logs.

Code:

May 11 05:40:13 pve kernel: [455073.224190] sd 0:0:0:0: device_block, handle(0x000c)
May 11 05:40:15 pve kernel: [455074.975487] sd 0:0:3:0: [sdd] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
May 11 05:40:15 pve kernel: [455074.989351] mpt2sas_cm0: removing handle(0x000b), sas_addr(0x4433221102000000)
May 11 05:40:15 pve kernel: [455074.990529] sd 0:0:0:0: [sda] Synchronizing SCSI cache
May 11 05:40:15 pve kernel: [455075.013372] mpt2sas_cm0: removing handle(0x000c), sas_addr(0x4433221103000000)
May 11 05:40:19 pve kernel: [455078.977171] sd 0:0:8:0: Attached scsi generic sg0 type 0
May 11 05:40:19 pve kernel: [455078.977587] sd 0:0:8:0: [sdi] 5860533168 512-byte logical blocks: (3.00 TB/2.73 TiB)
May 11 05:40:19 pve kernel: [455078.977592] sd 0:0:8:0: [sdi] 4096-byte physical blocks
May 11 05:40:19 pve kernel: [455078.982747] sd 0:0:8:0: [sdi] Write Protect is off
May 11 05:40:19 pve kernel: [455078.982751] sd 0:0:8:0: [sdi] Mode Sense: 7f 00 10 08
May 11 05:40:19 pve kernel: [455078.983547] sd 0:0:8:0: [sdi] Write cache: enabled, read cache: enabled, supports DPO and FUA
May 11 05:40:19 pve kernel: [455079.227025] sd 0:0:9:0: [sdj] 5860533168 512-byte logical blocks: (3.00 TB/2.73 TiB)
May 11 05:40:54 pve kernel: [455114.038187] mpt2sas_cm0: removing handle(0x000c), sas_addr(0x4433221103000000)
May 11 05:40:59 pve kernel: [455118.726847] scsi 0:0:10:0: Direct-Access     ATA      WDC WD30EFRX-68E 0A82 PQ: 0 ANSI: 6
May 11 05:40:59 pve kernel: [455118.726859] scsi 0:0:10:0: SATA: handle(0x000b), sas_addr(0x4433221102000000), phy(2), device_name(0x0000000000000000)
May 11 05:40:59 pve kernel: [455118.726862] scsi 0:0:10:0: SATA: enclosure_logical_id(0x5c81f660ec81e400), slot(1)
May 11 05:40:59 pve kernel: [455118.727029] scsi 0:0:10:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
May 11 05:40:59 pve kernel: [455118.728607] sd 0:0:10:0: Attached scsi generic sg0 type 0
May 11 05:40:59 pve kernel: [455118.729018] sd 0:0:10:0: [sdi] 5860533168 512-byte logical blocks: (3.00 TB/2.73 TiB)
May 11 05:40:59 pve kernel: [455118.729024] sd 0:0:10:0: [sdi] 4096-byte physical blocks
May 11 05:46:08 pve smartd[4421]: Device: /dev/sda [SAT], open() failed: No such device
May 11 05:46:08 pve smartd[4421]: Sending warning via /usr/share/smartmontools/smartd-runner to root ...
May 11 05:46:08 pve smartd[4421]: Warning via /usr/share/smartmontools/smartd-runner to root: successful
May 11 05:46:08 pve postfix/pickup[19904]: E90E517756: uid=0 from=<root>
May 11 05:46:08 pve smartd[4421]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 119 to 120
May 11 05:46:08 pve postfix/cleanup[32391]: E90E517756: message-id=<20170511034608.E90E517756@pve.localdomain>
May 11 05:46:08 pve postfix/qmgr[4608]: E90E517756: from=<root@pve.localdomain>, size=838, nrcpt=1 (queue active)
May 11 05:46:08 pve smartd[4421]: Device: /dev/sdd [SAT], open() failed: No such device
May 11 05:46:08 pve smartd[4421]: Sending warning via /usr/share/smartmontools/smartd-runner to root ...
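
To make the renaming in the log above easier to follow, here is a minimal Python sketch that pulls the SCSI address and kernel disk name out of `sd` lines. The function name `names_by_scsi_address` and the regular expression are illustrative assumptions, not part of any tool; the sample lines are copied from the log above.

```python
import re

# Matches kernel lines such as "sd 0:0:3:0: [sdd] ..." and captures the
# SCSI address (host:channel:target:lun) and the kernel disk name.
SD_LINE = re.compile(r"\bsd (\d+:\d+:\d+:\d+): \[(sd[a-z]+)\]")


def names_by_scsi_address(lines):
    """Return {scsi_address: kernel_name}, keeping the first name seen per address."""
    seen = {}
    for line in lines:
        match = SD_LINE.search(line)
        if match:
            address, name = match.groups()
            seen.setdefault(address, name)
    return seen


# Sample lines taken from the log in this post.
sample = [
    "May 11 05:40:15 pve kernel: [455074.975487] sd 0:0:3:0: [sdd] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK",
    "May 11 05:40:15 pve kernel: [455074.990529] sd 0:0:0:0: [sda] Synchronizing SCSI cache",
    "May 11 05:40:19 pve kernel: [455078.977587] sd 0:0:8:0: [sdi] 5860533168 512-byte logical blocks: (3.00 TB/2.73 TiB)",
    "May 11 05:40:19 pve kernel: [455079.227025] sd 0:0:9:0: [sdj] 5860533168 512-byte logical blocks: (3.00 TB/2.73 TiB)",
]

mapping = names_by_scsi_address(sample)
for address, name in mapping.items():
    print(address, name)
```

Read this way, the two disks that dropped off at targets 0 and 3 appear to have been re-enumerated at new targets (8 and 9) and handed the next free letters, which would explain the new names.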

Now they are sdi + sdj, and zpool status showed corrupted data on the two drives. I had to zpool export and zpool import for it to go away after the resilver.
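
Since kernel names like sda or sdi can change whenever a disk is re-detected, it can help to look at the persistent symlinks under `/dev/disk/by-id`, which udev derives from model and serial number. The following is a rough Python sketch, assuming a Linux system with that directory; `stable_names` is a hypothetical helper name.

```python
import os


def stable_names(by_id_dir="/dev/disk/by-id"):
    """Map each persistent symlink name to the kernel device it currently points at."""
    result = {}
    for entry in sorted(os.listdir(by_id_dir)):
        path = os.path.join(by_id_dir, entry)
        if os.path.islink(path):
            # Resolve e.g. "../../sdi" and keep only the final component.
            result[entry] = os.path.basename(os.path.realpath(path))
    return result


if __name__ == "__main__":
    # Only print when the directory exists (it will not on non-Linux systems).
    if os.path.isdir("/dev/disk/by-id"):
        for link, kernel_name in stable_names().items():
            print(f"{link} -> {kernel_name}")
```

A pool can also be imported by those persistent names with `zpool import -d /dev/disk/by-id <pool>`, so that its members no longer depend on sdX letters. That would not explain why the disks dropped off in the first place; it only makes the renaming harmless.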

All 6 of my drives are connected to an LSI controller flashed to IT mode.

Any ideas?

All packages are up to date from the free repository.

fabian · May 12, 2017

broken cables?

vooze · May 12, 2017

fabian said:
broken cables?

Thank you for your reply. I don't think so (I hope not). They are brand-new, original LSI cables. These are the ones: "CBL-SFF8087OCF-10M 1 unit of 1m Multi-lane Internal (SFF-8087) Serial ATA breakout cable, forward". Anyway, they have worked just fine for several weeks and this has only happened once; after the export + import it all "seems" fine again.

I guess I will have to see over time whether it was just a one-time thing. I have never before had 2 disks just go "offline" for a few seconds and then come back with a new "name".
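
To keep an eye on whether it happens again, one option is to count the controller's "removing handle" messages per SAS address. This is only a sketch in Python: `count_removals` is a made-up helper, the pattern is based on the mpt2sas lines in the log I posted, and where the lines come from (for example a syslog file) is left as an assumption.

```python
import re
from collections import Counter

# Matches lines such as:
#   mpt2sas_cm0: removing handle(0x000b), sas_addr(0x4433221102000000)
REMOVAL = re.compile(
    r"mpt2sas_cm\d+: removing handle\((0x[0-9a-f]+)\), sas_addr\((0x[0-9a-f]+)\)"
)


def count_removals(lines):
    """Count device removal events per SAS address."""
    counts = Counter()
    for line in lines:
        match = REMOVAL.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


# Sample lines taken from the log in the first post.
sample = [
    "May 11 05:40:15 pve kernel: [455074.989351] mpt2sas_cm0: removing handle(0x000b), sas_addr(0x4433221102000000)",
    "May 11 05:40:15 pve kernel: [455075.013372] mpt2sas_cm0: removing handle(0x000c), sas_addr(0x4433221103000000)",
    "May 11 05:40:54 pve kernel: [455114.038187] mpt2sas_cm0: removing handle(0x000c), sas_addr(0x4433221103000000)",
]

removals = count_removals(sample)
for sas_addr, count in removals.items():
    print(sas_addr, count)
```

If the same address keeps showing up over the coming weeks, that would point at one specific port, cable lane, or disk rather than a one-off glitch.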

