PVE Doing Something Strange to SATA Drives

Apr 13, 2022
Background:

System is a Supermicro 5018D-FN4T.
I have a TrueNAS VM that relies on SATA drives being passed through via /dev/disk/by-id. The drives are formatted with ZFS.
Everything was working until one day I shut down the VM.
Upon reboot of the host, the drives are confirmed attached in the BIOS. No hardware changes have been made.
However, once PVE loads, the disks show up neither in the GUI nor in fdisk -l. Obviously, the VM fails to start.

In the console prior to the login screen, the following is displayed:

Found volume group "pve" using metadata type lvm2
6 logical volume(s) in volume group "pve" now active
/dev/mapper/pve-root: clean, 86164/6291456 files, 3217668/25165824 blocks
[FAILED] Failed to start Import ZFS pools by device scanning.
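For reference, that [FAILED] line comes from a systemd unit that can be inspected after boot; a minimal sketch:

Code:
systemctl status zfs-import-scan.service
journalctl -b -u zfs-import-scan.service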

Questions:
1. What could possibly be going on?
2. On one reboot, after many attempts, the drives did show up, even though I changed nothing. Why is this happening intermittently?
 
1. What could possibly be going on?
From what you wrote, the error message indicates that PVE detects the ZFS pool on the disks but is unable to import it. If you only use ZFS inside your TrueNAS VM, try disabling all ZFS-related services on the PVE side.
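A quick way to check what ZFS-related pieces are active on the host (a sketch, nothing specific to your setup assumed):

Code:
systemctl list-units --all 'zfs*'   # ZFS units known to systemd on the PVE host
zpool list                          # prints "no pools available" if the host itself uses no ZFS
zfs list                            # should likewise show no datasets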
 
Thank you for the reply @LnxBil

I confirmed:
1. There are no ZFS pools in /etc/pve/storage.cfg.
2. zfs list returns no output.

I'd rather not disable a PVE module if I don't have to.

Is there a way to exclude the drives from its scan?
 
How many drives were being passed through to TrueNAS?

On the Proxmox host, what does the output of lsblk look like?

On the Proxmox host, what does zpool import return?
 
@bobmc Thank you for your insight.

There are four drives passed to the TrueNAS VM, but I'm only using the two 8TB drives ATM.

zpool import returns:

Code:
   pool: ds-truenas
     id: 12345678901234567890
  state: ONLINE
 status: The pool was last accessed by another system.
 action: The pool can be imported using its name or numeric identifier and the '-f' flag.
    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-EY
 config:

        ds-truenas  ONLINE
          mirror-0  ONLINE
            sdc2    ONLINE
            sdd2    ONLINE

lsblk returns:

Code:
NAME       MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda          8:0    0   2.7T  0 disk 
├─sda1       8:1    0   200M  0 part 
└─sda2       8:2    0   2.7T  0 part 
sdb          8:16   0   2.7T  0 disk 
├─sdb1       8:17   0   2.4G  0 part 
├─sdb2       8:18   0     2G  0 part 
└─sdb3       8:19   0   2.7T  0 part 
sdc          8:32   0   7.3T  0 disk 
├─sdc1       8:33   0     2G  0 part 
└─sdc2       8:34   0   7.3T  0 part 
sdd          8:48   0   7.3T  0 disk 
├─sdd1       8:49   0     2G  0 part 
└─sdd2       8:50   0   7.3T  0 part 
sde          8:64   0 465.8G  0 disk 
├─sde1       8:65   0  1007K  0 part 
├─sde2       8:66   0   512M  0 part /boot/efi
└─sde3       8:67   0 465.3G  0 part 
  ├─pve-swap                 253:0    0     8G  0 lvm  [SWAP]
  ├─pve-root                 253:1    0    96G  0 lvm  /
  ├─pve-data_tmeta           253:2    0   3.5G  0 lvm  
  │ └─pve-data-tpool         253:4    0 338.4G  0 lvm  
  │   ├─pve-data             253:5    0 338.4G  1 lvm  
  │   ├─pve-vm--101--disk--0 253:6    0    32G  0 lvm  
  │   ├─pve-vm--102--disk--0 253:7    0    16G  0 lvm  
  │   └─pve-vm--100--disk--0 253:8    0    32G  0 lvm  
  └─pve-data_tdata           253:3    0 338.4G  0 lvm  
    └─pve-data-tpool         253:4    0 338.4G  0 lvm  
      ├─pve-data             253:5    0 338.4G  1 lvm  
      ├─pve-vm--101--disk--0 253:6    0    32G  0 lvm  
      ├─pve-vm--102--disk--0 253:7    0    16G  0 lvm  
      └─pve-vm--100--disk--0 253:8    0    32G  0 lvm
 
If your console is just displaying that it "failed to import", then just ignore it, or disable the systemd unit zfs-import-scan.service.
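A minimal sketch of that, assuming the host really has no ZFS pools of its own:

Code:
systemctl disable --now zfs-import-scan.service
systemctl status zfs-import-cache.service   # check whether the cache-based import is in play as well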
 
FMSA, I did some searching, and I came to the following conclusion, supported by the two premises below.

source: https://github.com/openzfs/zfs/issues/4325

behlendorf commented on Feb 12, 2016

The cache file is one way to configure the system, it's not definitively the right way or the only way. The current systemd behavior is to prefer the cache file if it exists otherwise fall back to scanning.

ilovezfs commented on Feb 12, 2016

I think it's hard to justify a naked, automated zpool import -a under any scenario even with -d /dev/disk/by-id. Maybe if it forced the mountpoint property to be entirely ignored, it would be something other than scary.



If the information above is valid, the zpool import at boot could be preempted by writing something into /etc/zfs/zpool.cache; I just don't know what. I don't have a cache file currently.

Do you know what should be in the file?
 
So that's looking positive, drives are available and the zfs pool appears to be intact.

What happens when you start the TrueNAS VM? Anything in the logs?
 
If the information above is valid, the zpool import at boot could be preempted by writing something into /etc/zfs/zpool.cache; I just don't know what. I don't have a cache file currently.

Do you know what should be in the file?
IMHO, this is not what you're shooting for. You don't want to import the pool on your host, therefore there should not be any configuration present, and any software trying to access it should be disabled.
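For what it's worth, the cache file is not something you write by hand anyway: it is a binary file maintained by ZFS itself. A sketch of how it would normally come into existence (not a recommendation here, and it only works while the pool is imported on the host, which is exactly what you want to avoid):

Code:
# Setting the cachefile property makes ZFS (re)write /etc/zfs/zpool.cache for that pool:
zpool set cachefile=/etc/zfs/zpool.cache ds-truenas
# zfs-import-cache.service then roughly runs:
zpool import -c /etc/zfs/zpool.cache -aN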
 
IMHO, this is not what you're shooting for. You don't want to import the pool on your host, therefore there should not be any configuration present, and any software trying to access it should be disabled.
Copy that. I agree with your logic.
So that's looking positive, drives are available and the zfs pool appears to be intact.

What happens when you start the TrueNAS VM? Anything in the logs?
I looked at syslog with less /var/log/syslog. I could see smartd finding the drives as the host boots, but nothing was logged as the VMs were spinning up. Is there another place to look?
 
I looked at syslog with less /var/log/syslog. I could see smartd finding the drives as the host boots, but nothing was logged as the VMs were spinning up. Is there another place to look?
Does the TrueNAS VM start or not? If not, there is a red log entry in the GUI (bottom) where the actual error may be visible.
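A few other places worth a look besides syslog (a sketch; VM id 102 is assumed for the TrueNAS guest):

Code:
journalctl -b | grep -iE 'sd[cd]|by-id|zfs'   # full boot journal, filtered for the disks
qm start 102                                  # starting from the CLI prints the QEMU error directly
ls /var/log/pve/tasks/                        # PVE task logs (same text as the GUI task viewer)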
 
When the drives show up, the VM starts and pulls them in. The error when it doesn't have them is:


Task viewer: VM 102 - Start

kvm: -drive file=/dev/disk/by-id/ata-ST8000VN0022-2EL112_xxxxxxx,if=none,id=drive-sata0,format=raw,cache=none,aio=io_uring,detect-zeroes=on: Could not open '/dev/disk/by-id/ata-ST8000VN0022-2EL112_xxxxxxxx': No such file or directory
TASK ERROR: start failed: QEMU exited with code 1
Does the TrueNAS VM start or not? If not, there is a red log entry in the GUI (bottom) where the actual error may be visible.
 
On the Proxmox host, does ls /dev/disk/by-id show a drive id that matches the one in the error report?
 
On the Proxmox host, does ls /dev/disk/by-id show a drive id that matches the one in the error report?

I think we're getting signals crossed. Yes, sometimes they do show up.
2. On one reboot, after many attempts, the drives did show up, even though I changed nothing. Why is this happening intermittently?
When the above happens, the drives attach to the VM, and they do show up in ls /dev/disk/by-id.
Whether or not they attach seems to depend on PVE seeing them as ZFS drives and trying to import them.
 
I think what needs to be established is whether the 'by-id' disk identifier is consistent between reboots (my understanding is that it is expected to be unchanged). If it is consistent, then what needs to be understood is why pass-through works sometimes but not others.

Alternatively, you could try using 'by-uuid' instead?
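One way to check (a sketch; serials elided as elsewhere in the thread):

Code:
# Do the by-id links for the 8TB disks exist right now, and which kernel names do they point to?
ls -l /dev/disk/by-id/ | grep ST8000VN0022
# udev can also list every symlink it created for a given disk:
udevadm info --query=symlink --name=/dev/sdc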
 
I shut down the host, read the logs, and saw this:

udevadm[590]: systemd-udev-settle.service is deprecated. Please fix zfs-import-cache.service, zfs-import-scan.service not to pull it in.

Any relation to the problem?

This is the TrueNAS VM config:
Code:
balloon: 0
boot: order=scsi0
cores: 8
cpu: host
memory: 65536
name: abc.local.domain
net0: virtio=xxxxxxxxxxxx,bridge=vmbr10
net1: virtio=xxxxxxxxxxxx,bridge=vmbr1000,firewall=1,tag=100
net2: virtio=xxxxxxxxxxxx,bridge=vmbr1000,firewall=1,tag=30
numa: 0
onboot: 1
ostype: l26
sata0: /dev/disk/by-id/ata-ST8000VN0022-2EL112_xxxxxxxx,backup=0
sata1: /dev/disk/by-id/ata-ST8000VN0022-2EL112_xxxxxxxx,backup=0
sata2: /dev/disk/by-id/ata-ST3000DM001-1ER166_xxxxxxxx,backup=0
sata3: /dev/disk/by-id/ata-ST3000DM001-1ER166_xxxxxxxx,backup=0
scsi0: local-lvm:vm-102-disk-0,size=16G
scsihw: virtio-scsi-pci
smbios1: uuid=xxxxxxxxxxxxxxxxxxxxxxxxx
sockets: 1
startup: order=3
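For reference, an entry like sata0 above would typically be added with qm set against the stable by-id path (a sketch; serial elided as in the config):

Code:
qm set 102 --sata0 /dev/disk/by-id/ata-ST8000VN0022-2EL112_xxxxxxxx,backup=0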
 
udevadm[590]: systemd-udev-settle.service is deprecated. Please fix zfs-import-cache.service, zfs-import-scan.service not to pull it in.

Any relation to the problem?
No.

This is the TrueNAS VM config:
Okay, thank you. It is very strange indeed that those paths are changing.

Can you reboot your host multiple times and check for name changes? If they occur, what pattern do you observe?
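A simple way to compare runs (a sketch; the file names are just placeholders):

Code:
# After each boot, capture the current by-id -> kernel-name mapping:
ls -l /dev/disk/by-id/ | grep ata- > /root/by-id-$(date +%Y%m%d-%H%M%S).txt
# Later, diff any two captures to see whether links changed or went missing:
# diff /root/by-id-<first>.txt /root/by-id-<second>.txt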
 
Is your TrueNAS VM set to 'autostart' and is it the first VM to start on boot? If so, maybe set a startup delay to allow the host to fully initialise first?
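If it helps, note that the up= value in the startup property is the pause PVE inserts after starting a guest before it starts the next one, so a delay in front of the TrueNAS VM would go on whichever guest starts before it (VM 101 and the 60-second value below are just assumptions):

Code:
qm set 101 --startup order=2,up=60   # hypothetical earlier guest: wait 60 s after it starts
qm set 102 --startup order=3         # TrueNAS keeps its existing order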
 
