Proxmox 4.1 boot issues (dev-pve-data.device has failed)

grandon

Renowned Member
Jun 18, 2014
Hello!

We currently run Proxmox 3.4 and are very satisfied with it (thanks!), and we now want to upgrade to 4.1.
One of our two DL385p servers is currently being used to test the new release, but we are experiencing boot issues.

Journal Log:
http://pastebin.com/vTkL6gpG

Code:
Dec 21 10:26:55 rnd-01 systemd[1]: Job dev-pve-data.device/start timed out.
Dec 21 10:26:55 rnd-01 systemd[1]: Timed out waiting for device dev-pve-data.device.
-- Subject: Unit dev-pve-data.device has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit dev-pve-data.device has failed.
--
-- The result is timeout.
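
To list every device unit that hit this timeout during a boot, the journal can be filtered as in the sketch below. The function name `timed_out_devices` is hypothetical, and the pattern assumes the exact systemd wording shown in the excerpt above. Under systemd's path escaping, `dev-pve-data.device` should correspond to `/dev/pve/data`, which is presumably the LVM volume `data` in volume group `pve`.

```shell
#!/bin/sh
# Hypothetical helper: print the unique device units that systemd gave up
# waiting for, one per line.
# Usage: journalctl -b --no-pager | timed_out_devices
timed_out_devices() {
    # Keep only the unit name from "Timed out waiting for device <unit>."
    sed -n 's/.*Timed out waiting for device \(.*\)\.$/\1/p' | sort -u
}
```

If only `dev-pve-data.device` shows up, the timeout is limited to that one volume rather than the whole volume group.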

The OS boots from a RAID 1 (HP P420i controller), which seems to be recognized fine during startup:
Code:
[  6.097376] hpsa 0000:03:00.0: scsi 4:3:0:0: added RAID  HP  P420i  RAID-UNKNOWN SSDSmartPathCap- En- Exp=3
[  6.097382] hpsa 0000:03:00.0: scsi 4:2:0:0: masked Direct-Access  ATA  SAMSUNG MZ7WD120 RAID-UNKNOWN SSDSmartPathCap- En- Exp=0
[  6.097385] hpsa 0000:03:00.0: scsi 4:2:1:0: masked Direct-Access  ATA  SAMSUNG MZ7WD120 RAID-UNKNOWN SSDSmartPathCap- En- Exp=0
[  6.097388] hpsa 0000:03:00.0: scsi 4:0:0:0: added Direct-Access  HP  LOGICAL VOLUME  RAID-1(+0) SSDSmartPathCap- En- Exp=3
[  6.097570] scsi 4:3:0:0: RAID  HP  P420i  6.68 PQ: 0 ANSI: 5
[  6.097814] scsi 4:0:0:0: Direct-Access  HP  LOGICAL VOLUME  6.68 PQ: 0 ANSI: 5

It is possible to continue from maintenance mode, and once the system is up, the units that failed (such as the one above) are all active and operational.

What we have noticed so far is that dmesg shows a long initialization time for the HP H221 Host Bus Adapter (which normally connects to our HP MSA 2040 storage, but has been disconnected for this test).

Code:
[  105.632733] mpt2sas0: sending message unit reset !!
[  105.640747] mpt2sas0: message unit reset: SUCCESS
[  187.056761] mpt2sas0: Allocated physical memory: size(16361 kB)
[  187.056766] mpt2sas0: Current Controller Queue Depth(7931), Max Controller Queue Depth(8192)
[  187.056767] mpt2sas0: Scatter Gather Elements per IO(128)
[  441.708043] mpt2sas0: LSISAS2308: FWVersion(15.10.09.00), ChipRevision(0x01), BiosVersion(07.39.00.00)
[  441.708047] mpt2sas0: HP H221 Host Bus Adapter
[  441.708049] mpt2sas0: Protocol=(Initiator,Target), Capabilities=(TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
[  441.708100] scsi host0: Fusion MPT SAS Host
[  441.708574] mpt2sas0: sending port enable !!
[  441.711056] mpt2sas0: host_add: handle(0x0001), sas_addr(0x500605b0075ba2f0), phys(8)
[  451.893344] mpt2sas0: port enable: SUCCESS
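
To make the stalls in this excerpt easier to spot, the gaps between consecutive kernel timestamps can be computed with a small awk filter. This is only a sketch: the function name `dmesg_gaps` and the default threshold of 10 seconds are assumptions, and it expects the usual `[ seconds.micros]` prefix at the start of each line.

```shell
#!/bin/sh
# Hypothetical helper: report every pause of at least MIN seconds between
# two consecutive kernel log lines (default MIN is 10).
# Usage: dmesg | dmesg_gaps [MIN]
dmesg_gaps() {
    awk -v min="${1:-10}" '
        # Only consider lines that start with a "[ seconds.micros]" stamp.
        match($0, /^\[ *[0-9]+\.[0-9]+\]/) {
            # Strip the brackets and convert the stamp to a number.
            ts = substr($0, RSTART + 1, RLENGTH - 2) + 0
            if (seen && ts - prev >= min)
                printf "%.1f s gap before: %s\n", ts - prev, $0
            prev = ts
            seen = 1
        }
    '
}
```

Applied to the lines above, this reports pauses of roughly 81, 255 and 10 seconds, all within the mpt2sas0 initialization.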

We tried setting a kernel delay of 180 seconds, without any effect.
In an earlier test we also added a "vgchange -ay" script to the initramfs, without success.

The same issues were noticed during the Proxmox 4.x beta testing.
 
Update:

We installed the latest Debian on the server to see whether the issue can be reproduced.
With Debian, the server boots within a few seconds without any problem.

Logs attached.

Code:
root@rnd-01:~# uname -a
Linux rnd-01 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u1 (2015-12-14) x86_64 GNU/Linux
 
