PVE 5.0 parted mklabel hangs

Glen_A

Renowned Member
Dec 20, 2016
USA
I've been running PVE 5.0 with Ceph and BlueStore for a few weeks now on a quad-node Dell 6100, with 1x 160GB drive for the boot OS and 2x 2TB SATA drives per node for Ceph.

The cluster (Proxmox & Ceph) had been working fine, but yesterday (on the first node) I did an apt-get update && apt-get dist-upgrade, and when I rebooted, the two OSDs on that server failed to come up.

After a lot of debugging, I tried to just recreate the OSDs. I wiped out the partitions and then tried creating the OSDs via the GUI. That failed without giving any reason, so I went to the command-line tools and got the following:

=============================================================================
/dev/disk/by-uuid# pveceph createosd /dev/sdb -bluestore
create OSD on /dev/sdb (bluestore)
Caution: invalid backup GPT header, but valid main header; regenerating
backup header from main header.

****************************************************************************
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
****************************************************************************
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
Creating new GPT entries.
The operation has completed successfully.

------------------------------------------------------------------
At this point it hung...

While trying to figure out the issue, I also tried to recreate the disk label via parted, and it hung as well:

=============================================================================
root@pmx1:/dev/disk/by-uuid# parted /dev/sdb mklabel
New disk label type? gpt
Warning: The existing disk label on /dev/sdb will be destroyed and all data on this disk will be lost. Do you want
to continue?
Yes/No? yes

=============================================================================
After typing 'yes' and pressing Enter, it then hung! So I'm getting closer to the real issue.
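In case it's useful to anyone debugging a similar hang, here is a sketch of how to check whether the stuck process is blocked inside the kernel rather than in parted itself (ps and awk are standard tools; this is a diagnostic guess, not a fix):

```shell
# A process wedged inside an ioctl normally sits in uninterruptible
# sleep (state "D"). wchan shows the kernel function it is waiting in,
# which hints at whether the block layer or a controller driver is
# involved. Print the header line plus any D-state tasks.
ps -eo pid,stat,wchan:32,comm | awk 'NR == 1 || $2 ~ /^D/'
```

If parted shows up here in state D, killing it won't work, and the wchan name is the interesting part.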

I also tried this on the second OSD disk, /dev/sdc, with the same result.

Another possible clue: in /dev/disk/by-uuid/, I don't see links for either of the two OSD disks. I do see the initial boot disk, though...

=============================================================================
root@pmx1:/dev/disk/by-uuid# ls -l
total 0
0 lrwxrwxrwx 1 root root 10 Aug 20 08:40 d29c5db0-eab1-4bbe-803e-382d73bc2a14 -> ../../dm-0
0 lrwxrwxrwx 1 root root 10 Aug 20 08:40 8345-80A0 -> ../../sda2
0 lrwxrwxrwx 1 root root 10 Aug 20 08:42 015ef2d3-0fc3-4caa-bf2b-6d9ca5e9e963 -> ../../dm-1

=============================================================================
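As I understand it, /dev/disk/by-uuid only holds filesystem UUIDs, so a wiped disk wouldn't be expected there anyway; the by-id and by-path links are created by udev for every detected disk, so checking those instead should tell you whether udev processed /dev/sdb at all:

```shell
# by-uuid lists filesystem UUIDs only, so a blank disk never appears
# there. by-id/by-path entries exist for every disk udev has finished
# processing; if /dev/sdb is missing from these too, udev never
# completed handling the device.
ls -l /dev/disk/by-id /dev/disk/by-path 2>/dev/null || true
```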


Here is some more info:
=============================================================================
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 1 149.1G 0 disk
├─sda1 8:1 1 1M 0 part
├─sda2 8:2 1 256M 0 part
└─sda3 8:3 1 148.8G 0 part
├─pve-root 253:0 0 37G 0 lvm /
├─pve-swap 253:1 0 8G 0 lvm [SWAP]
├─pve-data_tmeta 253:2 0 88M 0 lvm
│ └─pve-data 253:4 0 87.8G 0 lvm
└─pve-data_tdata 253:3 0 87.8G 0 lvm
└─pve-data 253:4 0 87.8G 0 lvm
sdb 8:16 1 1.8T 0 disk
sdc 8:32 1 1.8T 0 disk

root@pmx1:/dev/disk/by-uuid# blkid
/dev/sda2: UUID="8345-80A0" TYPE="vfat" PARTUUID="e08c8686-4648-4b08-a1b6-b2d48df92260"
/dev/sda3: UUID="WaN31l-qUAc-KIa6-nN2u-PWe2-Gouv-U2hExs" TYPE="LVM2_member" PARTUUID="fe524c16-08bf-4059-a5cf-25e065e87561"
/dev/mapper/pve-root: UUID="d29c5db0-eab1-4bbe-803e-382d73bc2a14" TYPE="ext4"
/dev/mapper/pve-swap: UUID="015ef2d3-0fc3-4caa-bf2b-6d9ca5e9e963" TYPE="swap"
/dev/sda1: PARTUUID="6ed5b088-0f4c-4b5c-9b1f-b6e223fbf2d6"
/dev/sdb: PTUUID="a4d2cf40-aa4e-4e2c-a3bb-a92750b2eb5f" PTTYPE="gpt"

=============================================================================
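Both parted and pveceph finish by asking the kernel to re-read the partition table, and that ioctl is my best guess for where things hang. It can be triggered on its own with blockdev (a standard util-linux tool), which narrows the problem down:

```shell
# BLKRRPART is the ioctl parted issues after writing a new label. If
# this command hangs too, the problem is in the kernel/driver path,
# not in parted or pveceph. Guarded so it degrades if the device is
# absent or we lack permission.
blockdev --rereadpt /dev/sdb 2>/dev/null || echo "re-read failed or /dev/sdb absent"
```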

Any help would be appreciated!

-Glen
 
No. The only line in /var/log/syslog while trying 'parted /dev/sdb' is:

Aug 20 10:55:28 pmx1 kernel: [ 8124.109558] sdb:

Nothing else. :(
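If the ioctl really is stuck in the kernel, the hung-task detector should log a stack trace after a couple of minutes (the default timeout is 120 s), and that trace would name the driver involved. A hedged way to check:

```shell
# Hung-task reports include the kernel function the stuck process is
# blocked in. dmesg may be restricted for non-root users, so fall
# back gracefully instead of erroring out.
dmesg 2>/dev/null | grep -i -B 1 -A 10 'blocked for more than' \
    || echo "no hung-task reports found (or dmesg restricted)"
```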
 
What is the pveversion -v output?
 
Here is the version you requested:

root@pmx1:~# pveversion -v
proxmox-ve: 5.0-20 (running kernel: 4.10.17-2-pve)
pve-manager: 5.0-30 (running version: 5.0-30/5ab26bc)
pve-kernel-4.10.17-2-pve: 4.10.17-20
pve-kernel-4.10.15-1-pve: 4.10.15-15
pve-kernel-4.10.17-1-pve: 4.10.17-18
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve3
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-12
qemu-server: 5.0-15
pve-firmware: 2.0-2
libpve-common-perl: 5.0-16
libpve-guest-common-perl: 2.0-11
libpve-access-control: 5.0-6
libpve-storage-perl: 5.0-14
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.0-9
pve-qemu-kvm: 2.9.0-4
pve-container: 2.0-15
pve-firewall: 3.0-2
pve-ha-manager: 2.0-2
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.0.8-3
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.6.5.11-pve17~bpo90
ceph: 12.1.2-pve1
 
Please post the full boot log (i.e., the first few minutes of "journalctl -b"). What kind of controller are those disks connected to?
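For reference, the controller can usually be identified without rebooting; lspci ships in the pciutils package, and the sysfs symlink is standard:

```shell
# List storage controllers on the PCI bus; guarded in case lspci is
# not installed or nothing matches.
lspci -nn 2>/dev/null | grep -i -E 'sata|sas|raid|scsi|storage' \
    || echo "lspci unavailable or no storage controller matched"
# The sysfs symlink shows which controller /dev/sdb hangs off.
ls -l /sys/block/sdb 2>/dev/null || echo "/sys/block/sdb not present"
```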