System not booting after new NIC installed

mkyb14

I've been googling for the last hour trying to figure out why, after installing a 4-port NIC, I've lost all my zpools.

Images of the BIOS screens and the motherboard manual are attached below.

I have three Dell PERC 300 HBAs flashed to IT mode (P20 firmware) in slots 2, 4, and 6, serving a Supermicro CSE-846 case. They worked just fine in those slots (for some reason the HBAs wouldn't show up in slots 1, 3, and 5), and Supermicro is no help. I'm sure it's a motherboard setting. Proxmox is installed on the NVMe M.2 drive.

Anyway, I installed a new HP 4-port NIC (it shows up in Proxmox just fine), but then noticed all my datasets have the grey question mark, and "zpool list" reports no pools.
After rebooting, the Avago BIOS sees the drives, but they don't show up in Proxmox anymore... I'm guessing that if I removed the NIC from slot 5 and rebooted, they would.

Any ideas as to why this is happening?
 

Attachments

  • Screen Shot 2020-05-28 at 2.35.43 PM.png
  • Screen Shot 2020-05-28 at 2.37.18 PM.png
  • Screen Shot 2020-05-31 at 9.40.33 PM.png
Does a "lsblk" list the drives in proxmox shell ?

If yes try to import them with "zpool import -f"

Please post the output of "lsblk" "df -h" "zfs list"

Also check "systemctl status" for any erros
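
For reference, a minimal version of that check-and-import sequence might look like this (the pool name "tank" is just a placeholder):

Code:
# list the block devices the kernel currently sees
lsblk

# scan for pools that sit on visible disks but are not imported
zpool import

# force-import a specific pool if the scan finds it ("tank" is an example name)
zpool import -f tank

# list any failed units
systemctl --failed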
 
So, after removing the card and rebooting, everything shows up...

Code:
root@pve:~# lsblk
NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                            8:0    0  3.7T  0 disk
├─sda1                         8:1    0  3.7T  0 part
└─sda9                         8:9    0    8M  0 part
sdb                            8:16   0  9.1T  0 disk
├─sdb1                         8:17   0  9.1T  0 part
└─sdb9                         8:25   0    8M  0 part
sdc                            8:32   0  9.1T  0 disk
├─sdc1                         8:33   0  9.1T  0 part
└─sdc9                         8:41   0    8M  0 part
sdd                            8:48   0  9.1T  0 disk
├─sdd1                         8:49   0  9.1T  0 part
└─sdd9                         8:57   0    8M  0 part
sde                            8:64   0  9.1T  0 disk
├─sde1                         8:65   0  9.1T  0 part
└─sde9                         8:73   0    8M  0 part
sdf                            8:80   0  3.7T  0 disk
├─sdf1                         8:81   0  3.7T  0 part
└─sdf9                         8:89   0    8M  0 part
sdg                            8:96   0  3.7T  0 disk
├─sdg1                         8:97   0  3.7T  0 part
└─sdg9                         8:105  0    8M  0 part
sdh                            8:112  0  3.7T  0 disk
├─sdh1                         8:113  0  3.7T  0 part
└─sdh9                         8:121  0    8M  0 part
sdi                            8:128  0  9.1T  0 disk
├─sdi1                         8:129  0  9.1T  0 part
└─sdi9                         8:137  0    8M  0 part
sdj                            8:144  0  9.1T  0 disk
├─sdj1                         8:145  0  9.1T  0 part
└─sdj9                         8:153  0    8M  0 part
sdk                            8:160  0  9.1T  0 disk
├─sdk1                         8:161  0  9.1T  0 part
└─sdk9                         8:169  0    8M  0 part
nvme0n1                      259:0    0  1.9T  0 disk
├─nvme0n1p1                  259:1    0 1007K  0 part
├─nvme0n1p2                  259:2    0  512M  0 part /boot/efi
└─nvme0n1p3                  259:3    0  1.9T  0 part
  ├─pve-swap                 253:0    0    8G  0 lvm  [SWAP]
  ├─pve-root                 253:1    0   96G  0 lvm  /
  ├─pve-data_tmeta           253:2    0 15.8G  0 lvm 
  │ └─pve-data-tpool         253:4    0  1.7T  0 lvm 
  │   ├─pve-data             253:5    0  1.7T  0 lvm 
  │   ├─pve-vm--101--disk--0 253:6    0  132G  0 lvm 
  │   └─pve-vm--100--disk--0 253:7    0   32G  0 lvm 
  └─pve-data_tdata           253:3    0  1.7T  0 lvm 
    └─pve-data-tpool         253:4    0  1.7T  0 lvm 
      ├─pve-data             253:5    0  1.7T  0 lvm 
      ├─pve-vm--101--disk--0 253:6    0  132G  0 lvm 
      └─pve-vm--100--disk--0 253:7    0   32G  0 lvm

df -h

Code:
root@pve:~# df -h
Filesystem            Size  Used Avail Use% Mounted on
udev                   63G     0   63G   0% /dev
tmpfs                  13G   20M   13G   1% /run
/dev/mapper/pve-root   94G  3.9G   86G   5% /
tmpfs                  63G   43M   63G   1% /dev/shm
tmpfs                 5.0M     0  5.0M   0% /run/lock
tmpfs                  63G     0   63G   0% /sys/fs/cgroup
/dev/nvme0n1p2        511M  312K  511M   1% /boot/efi
BigData/Files          36T  9.6T   26T  28% /BigData/Files
/dev/fuse              30M   16K   30M   1% /etc/pve
tmpfs                  13G     0   13G   0% /run/user/0

zfs list

Code:
root@pve:~# zfs list
NAME                               USED  AVAIL     REFER  MOUNTPOINT
Apollo                            1.84G  6.82T      186K  /Apollo
Apollo/ISO                         506M  6.82T      506M  /Apollo/ISO
Apollo/storage                     516M  6.82T      140K  /Apollo/storage
Apollo/storage/subvol-102-disk-0   516M  4.88T      516M  /Apollo/storage/subvol-102-disk-0
Apollo/subvol-102-disk-0           859M  7.16G      859M  /Apollo/subvol-102-disk-0
Apollo/vmdata                      140K  6.82T      140K  /Apollo/vmdata
BigData                           9.58T  25.6T      272K  /BigData
BigData/Files                     9.58T  25.6T     9.58T  /BigData/Files
BigData/subvol-102-disk-0         1.94G  24.4T     1.94G  /BigData/subvol-102-disk-0


I will reply back once I have a chance to plug the card back in and run the same commands again.
 
OK, so after plugging the card back into slot 5 (no BIOS changes from what you see above), the following is what happens. I'm not sure from googling what this actually is. This is a brand-new Supermicro server board; I can't believe they would disable slots just because a card sits in another one, as many people have suggested. Why have six PCIe slots then?

Is there a setting I'm just not understanding, like OPROM or bifurcation?

Code:
root@pve:~# lsblk
NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                            8:0    0  9.1T  0 disk
├─sda1                         8:1    0  9.1T  0 part
└─sda9                         8:9    0    8M  0 part
sdb                            8:16   0  9.1T  0 disk
├─sdb1                         8:17   0  9.1T  0 part
└─sdb9                         8:25   0    8M  0 part
sdc                            8:32   0  9.1T  0 disk
├─sdc1                         8:33   0  9.1T  0 part
└─sdc9                         8:41   0    8M  0 part
nvme0n1                      259:0    0  1.9T  0 disk
├─nvme0n1p1                  259:1    0 1007K  0 part
├─nvme0n1p2                  259:2    0  512M  0 part /boot/efi
└─nvme0n1p3                  259:3    0  1.9T  0 part
  ├─pve-swap                 253:0    0    8G  0 lvm  [SWAP]
  ├─pve-root                 253:1    0   96G  0 lvm  /
  ├─pve-data_tmeta           253:2    0 15.8G  0 lvm 
  │ └─pve-data-tpool         253:4    0  1.7T  0 lvm 
  │   ├─pve-data             253:5    0  1.7T  0 lvm 
  │   ├─pve-vm--101--disk--0 253:6    0  132G  0 lvm 
  │   └─pve-vm--100--disk--0 253:7    0   32G  0 lvm 
  └─pve-data_tdata           253:3    0  1.7T  0 lvm 
    └─pve-data-tpool         253:4    0  1.7T  0 lvm 
      ├─pve-data             253:5    0  1.7T  0 lvm 
      ├─pve-vm--101--disk--0 253:6    0  132G  0 lvm 
      └─pve-vm--100--disk--0 253:7    0   32G  0 lvm

df -h

Code:
root@pve:~# df -h
Filesystem            Size  Used Avail Use% Mounted on
udev                   63G     0   63G   0% /dev
tmpfs                  13G   11M   13G   1% /run
/dev/mapper/pve-root   94G  3.9G   86G   5% /
tmpfs                  63G   34M   63G   1% /dev/shm
tmpfs                 5.0M     0  5.0M   0% /run/lock
tmpfs                  63G     0   63G   0% /sys/fs/cgroup
/dev/nvme0n1p2        511M  312K  511M   1% /boot/efi
/dev/fuse              30M   16K   30M   1% /etc/pve
tmpfs                  13G     0   13G   0% /run/user/0

zfs list

Code:
root@pve:~# zfs list
no datasets available
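
As an aside, one quick way to tell whether this is ZFS losing track of the pools or the HBAs themselves dropping off the PCIe bus would be a sketch like the following (the grep patterns assume the HBAs identify themselves as LSI/Avago SAS devices):

Code:
# SAS controllers the kernel currently enumerates
lspci -nn | grep -iE 'lsi|sas|avago'

# Ethernet controllers, to confirm the new NIC enumerated
lspci -nn | grep -i ethernet

# pools visible on the remaining disks but not yet imported
zpool import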
 
Also, with the NIC plugged in, "systemctl status" gives the following:


Code:
● pve
    State: degraded
     Jobs: 0 queued
   Failed: 2 units
    Since: Fri 2020-05-29 07:34:54 PDT; 12min ago
   CGroup: /
           ├─1440 bpfilter_umh
           ├─user.slice
           │ └─user-0.slice
           │   ├─session-1.scope
           │   │ ├─ 2268 sshd: root@pts/0
           │   │ ├─ 2367 -bash
           │   │ ├─22593 systemctl status
           │   │ └─22594 pager
           │   └─user@0.service
           │     └─init.scope
           │       ├─2352 /lib/systemd/systemd --user
           │       └─2353 (sd-pam)
           ├─init.scope
           │ └─1 /sbin/init
           └─system.slice
             ├─influxdb.service
             │ └─1430 /usr/bin/influxd -config /etc/influxdb/influxdb.conf
             ├─containerd.service
             │ └─1457 /usr/bin/containerd
             ├─systemd-udevd.service
             │ ├─  753 /lib/systemd/systemd-udevd
             │ ├─22580 /lib/systemd/systemd-udevd
             │ ├─22581 /lib/systemd/systemd-udevd
             │ ├─22582 /lib/systemd/systemd-udevd
             │ ├─22583 /lib/systemd/systemd-udevd
             │ ├─22584 /lib/systemd/systemd-udevd
             │ └─22585 /lib/systemd/systemd-udevd
             ├─cron.service
             │ └─1921 /usr/sbin/cron -f
             ├─docker.service
             │ └─1459 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
             ├─pve-firewall.service
             │ └─1926 pve-firewall
             ├─pve-lxc-syscalld.service
             │ └─1300 /usr/lib/x86_64-linux-gnu/pve-lxc-syscalld/pve-lxc-syscalld --system /run/pve/lxc-syscalld.sock
             ├─spiceproxy.service
             │ ├─1969 spiceproxy
             │ └─1970 spiceproxy worker
             ├─pve-ha-crm.service
             │ └─1962 pve-ha-crm
             ├─nmbd.service
             │ └─1444 /usr/sbin/nmbd --foreground --no-process-group
             ├─pvedaemon.service
             │ ├─1952 pvedaemon
             │ ├─1953 pvedaemon worker
             │ ├─1954 pvedaemon worker
             │ └─1955 pvedaemon worker
             ├─systemd-journald.service
             │ └─725 /lib/systemd/systemd-journald
             ├─unattended-upgrades.service
             │ └─1445 /usr/bin/python3 /usr/share/unattended-upgrades/unattended-upgrade-shutdown --wait-for-signal
             ├─ssh.service
             │ └─1453 /usr/sbin/sshd -D
             ├─qmeventd.service
             │ └─1336 /usr/sbin/qmeventd /var/run/qmeventd.sock
             ├─rrdcached.service
             │ └─1580 /usr/bin/rrdcached -B -b /var/lib/rrdcached/db/ -j /var/lib/rrdcached/journal/ -p /var/run/rrdcached.pid -l unix:/var/run/r
             ├─watchdog-mux.service
             │ └─1308 /usr/sbin/watchdog-mux
             ├─pvefw-logger.service
             │ └─1284 /usr/sbin/pvefw-logger
             ├─rsyslog.service
             │ └─1344 /usr/sbin/rsyslogd -n -iNONE
             ├─pveproxy.service
             │ ├─1963 pveproxy
             │ ├─1964 pveproxy worker
             │ ├─1965 pveproxy worker
             │ └─1966 pveproxy worker
             ├─ksmtuned.service
             │ ├─ 1331 /bin/bash /usr/sbin/ksmtuned
             │ └─22591 sleep 60
             ├─lxc-monitord.service
             │ └─1451 /usr/lib/x86_64-linux-gnu/lxc/lxc-monitord --daemon
             ├─rpcbind.service
             │ └─1290 /sbin/rpcbind -f -w
             ├─lxcfs.service
             │ └─1302 /usr/bin/lxcfs /var/lib/lxcfs
             ├─system-postfix.slice
             │ └─postfix@-.service
             │   ├─1867 /usr/lib/postfix/sbin/master -w
             │   ├─1869 pickup -l -t unix -u -c
             │   └─1870 qmgr -l -t unix -u
             ├─smartmontools.service
             │ └─1309 /usr/sbin/smartd -n
             ├─iscsid.service
             │ ├─1461 /sbin/iscsid
             │ └─1462 /sbin/iscsid
             ├─zfs-zed.service
             │ └─1341 /usr/sbin/zed -F
             ├─pve-cluster.service
             │ └─1624 /usr/bin/pmxcfs
             ├─smbd.service
             │ ├─1589 /usr/sbin/smbd --foreground --no-process-group
             │ ├─1646 /usr/sbin/smbd --foreground --no-process-group
             │ ├─1647 /usr/sbin/smbd --foreground --no-process-group
             │ ├─1664 /usr/sbin/smbd --foreground --no-process-group
             │ └─2034 /usr/sbin/smbd --foreground --no-process-group
             ├─dbus.service
             │ └─1326 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
             ├─systemd-timesyncd.service
             │ └─1294 /lib/systemd/systemd-timesyncd
             ├─pve-ha-lrm.service
             │ └─1971 pve-ha-lrm
             ├─system-getty.slice
             │ └─getty@tty1.service
             │   └─1537 /sbin/agetty -o -p -- \u --noclear tty1 linux
             ├─pvestatd.service
             │ └─1927 pvestatd
             ├─dm-event.service
             │ └─744 /sbin/dmeventd -f
             └─systemd-logind.service
               └─1299 /lib/systemd/systemd-logind
 
Two services are failing; one of them will be ZFS.

You can check with "systemctl --failed".

But since the drives are completely gone when you put the NIC in, it looks like a fault with the motherboard.

You could compare the kernel output with "dmesg -T".

Maybe it shows some driver issue with the NIC.
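
For comparison, something along these lines could narrow the dmesg output down (the mpt2sas/mpt3sas driver names are an assumption, based on these being LSI-based IT-mode HBAs):

Code:
# save the full kernel log with readable timestamps for later comparison
dmesg -T > /root/dmesg-with-nic.txt

# the lines most likely to matter: HBA driver, PCI resource assignment, errors
dmesg -T | grep -iE 'mpt2sas|mpt3sas|BAR|no space|fail'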
 
Code:
root@pve:~# systemctl --failed
  UNIT                      LOAD   ACTIVE SUB    DESCRIPTION                 
● pve-container@102.service loaded failed failed PVE LXC Container: 102     
● zfs-import-cache.service  loaded failed failed Import ZFS pools by cache file

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

2 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.

I've also attached the output of "systemctl --all".
 

Attachments

  • systemctl--all.txt
  • dmesg-T.txt
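
Since zfs-import-cache.service is the ZFS-related failure here, its journal (along with the one for container 102, which appears to live on the missing pools) is probably the most useful detail to pull, for example:

Code:
# why the cache-file based import failed on this boot
systemctl status zfs-import-cache.service
journalctl -b -u zfs-import-cache.service

# the container that failed because its storage is gone
journalctl -b -u pve-container@102.service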
Also, I updated the Avago device list in the first post.
 

There are several PCI errors that look relevant:

Code:
[Sat May 30 14:44:28 2020] pci 0000:c1:00.0: BAR 9: no space for [mem size 0x00400000 64bit]
[Sat May 30 14:44:28 2020] pci 0000:c1:00.0: BAR 9: failed to assign [mem size 0x00400000 64bit]
[Sat May 30 14:44:28 2020] pci 0000:c1:00.0: BAR 7: no space for [mem size 0x00040000 64bit]
[Sat May 30 14:44:28 2020] pci 0000:c1:00.0: BAR 7: failed to assign [mem size 0x00040000 64bit]

It might be a motherboard setting, but I doubt it. I have the feeling the NIC interferes with your HBAs.

If the only thing that changes is adding/removing the NIC, it seems like a hardware problem.
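
One way to see how the kernel ended up assigning (or failing to assign) that device's memory windows, using the address from the dmesg lines above:

Code:
# decoded view of this device's memory regions (BARs) and, if reported, its physical slot
lspci -vv -s c1:00.0 | grep -iE 'region|physical slot'

# raw BAR list from sysfs; all-zero rows correspond to unassigned regions
cat /sys/bus/pci/devices/0000:c1:00.0/resource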
 
So that's what happens... all three HBAs work only in slots 2, 4, and 6. In slots 1, 3, and 5 they don't show up in Proxmox.
With the HBAs in 2, 4, and 6 and the NIC in slot 5, I lose two of them.

I'm just struggling to understand how Supermicro could ship a brand-new board in 2020 with six PCIe slots that somehow get disabled, or broken at the hardware level, like this. These are all very popular hardware types and manufacturers. Supermicro essentially states that you have to use their approved AOC devices, since those are known to "work".

If this were a motherboard issue or setting, what would it be? Legacy/UEFI settings for PCIe? I tried swapping those around. I also went into the Avago settings and disabled boot support, since the OS is installed on the NVMe drive, to further rule out any settings that could be interfering.
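
As a rough way to see how the slots hang off the CPU root ports (which can sometimes explain why cards behave differently per slot), the PCI topology can be dumped and compared against the board manual's slot/lane table:

Code:
# tree view of the PCI topology: which root port each HBA and the NIC sit behind
lspci -tv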
 
How can I tell which physical slot 0000:c1:00.0 is on the motherboard? It's not slot 1... I'm trying to figure out how to correlate c1:00 to a slot.
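
For what it's worth, one way to map a PCI address like c1:00.0 to a physical slot (a sketch; dmidecode may need to be installed first):

Code:
# BIOS slot table: each slot's designation plus the bus address of the card installed in it
dmidecode -t slot

# lspci sometimes reports the physical slot number for the device directly
lspci -vv -s c1:00.0 | grep -i 'physical slot'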
 
