PVE Doing Something Strange to SATA Drives

Apr 13, 2022
Background:

System is a Supermicro 5018D-FN4T.
I have a TrueNAS VM that relies on SATA drives being passed through via /dev/disk/by-id. The drives are formatted with ZFS.
Everything was working until one day I shut down the VM.
Upon reboot of the host, the drives are confirmed attached in the BIOS. No hardware changes have been made.
However, once PVE loads, the disks show up neither in the GUI nor in fdisk -l. Obviously, the VM fails to start.

In the console prior to the login screen, the following is displayed:

Found volume group "pve" using metadata type lvm2
6 logical volume(s) in volume group "pve" now active
/dev/mapper/pve-root: clean, 86164/6291456 files, 3217668/25165824 blocks
[FAILED] Failed to start Import ZFS pools by device scanning.
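For reference, that [FAILED] line comes from a systemd unit that can be inspected after boot; a minimal sketch:

Code:
systemctl status zfs-import-scan.service
journalctl -b -u zfs-import-scan.service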

Questions:
1. What could possibly be going on?
2. On one reboot, after many attempts, the drives did show up, even though I changed nothing. Why is this happening intermittently?
 
1. What could possibly be going on?
From what you wrote, the error message indicates that PVE detects the ZFS pool on the disks but is unable to import it. If you only use ZFS inside your TrueNAS VM, try disabling all ZFS-related services on the PVE side.
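A quick way to check what ZFS-related pieces are active on the host (a sketch, nothing specific to your setup assumed):

Code:
systemctl list-units --all 'zfs*'   # ZFS units known to systemd on the PVE host
zpool list                          # prints "no pools available" if the host itself uses no ZFS
zfs list                            # should likewise show no datasets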
 
Thank you for the reply @LnxBil

I confirmed:
1. There are no ZFS pools in /etc/pve/storage.cfg.
2. zfs list returns no output.

I'd rather not disable a PVE module if I don't have to.

Is there a way to exclude the drives from its scan?
 
How many drives were being passed through to TrueNAS?

On the Proxmox host, what does the output of lsblk look like?

On the Proxmox host, what does zpool import return?
 
@bobmc Thank you for your insight.

There are four drives passed to the TrueNAS VM, but I'm only using the two 8TB drives ATM.

zpool import returns:

Code:
   pool: ds-truenas
     id: 12345678901234567890
  state: ONLINE
 status: The pool was last accessed by another system.
 action: The pool can be imported using its name or numeric identifier and the '-f' flag.
    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-EY
 config:

        ds-truenas  ONLINE
          mirror-0  ONLINE
            sdc2    ONLINE
            sdd2    ONLINE

lsblk returns:

Code:
NAME       MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda          8:0    0   2.7T  0 disk 
├─sda1       8:1    0   200M  0 part 
└─sda2       8:2    0   2.7T  0 part 
sdb          8:16   0   2.7T  0 disk 
├─sdb1       8:17   0   2.4G  0 part 
├─sdb2       8:18   0     2G  0 part 
└─sdb3       8:19   0   2.7T  0 part 
sdc          8:32   0   7.3T  0 disk 
├─sdc1       8:33   0     2G  0 part 
└─sdc2       8:34   0   7.3T  0 part 
sdd          8:48   0   7.3T  0 disk 
├─sdd1       8:49   0     2G  0 part 
└─sdd2       8:50   0   7.3T  0 part 
sde          8:64   0 465.8G  0 disk 
├─sde1       8:65   0  1007K  0 part 
├─sde2       8:66   0   512M  0 part /boot/efi
└─sde3       8:67   0 465.3G  0 part 
  ├─pve-swap                 253:0    0     8G  0 lvm  [SWAP]
  ├─pve-root                 253:1    0    96G  0 lvm  /
  ├─pve-data_tmeta           253:2    0   3.5G  0 lvm  
  │ └─pve-data-tpool         253:4    0 338.4G  0 lvm  
  │   ├─pve-data             253:5    0 338.4G  1 lvm  
  │   ├─pve-vm--101--disk--0 253:6    0    32G  0 lvm  
  │   ├─pve-vm--102--disk--0 253:7    0    16G  0 lvm  
  │   └─pve-vm--100--disk--0 253:8    0    32G  0 lvm  
  └─pve-data_tdata           253:3    0 338.4G  0 lvm  
    └─pve-data-tpool         253:4    0 338.4G  0 lvm  
      ├─pve-data             253:5    0 338.4G  1 lvm  
      ├─pve-vm--101--disk--0 253:6    0    32G  0 lvm  
      ├─pve-vm--102--disk--0 253:7    0    16G  0 lvm  
      └─pve-vm--100--disk--0 253:8    0    32G  0 lvm
 
If your console is just displaying that it "failed to import", then just ignore it, or disable the systemd unit zfs-import-scan.service.
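A minimal sketch of that, assuming the host really has no ZFS pools of its own:

Code:
systemctl disable --now zfs-import-scan.service
systemctl status zfs-import-cache.service   # check whether the cache-based import is in play as well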
 
FMSA, I did some searching, and I came to the following conclusion, supported by the two premises below.

source: https://github.com/openzfs/zfs/issues/4325

behlendorf commented on Feb 12, 2016

The cache file is one way to configure the system, it's not definitively the right way or the only way. The current systemd behavior is to prefer the cache file if it exists otherwise fall back to scanning.

ilovezfs commented on Feb 12, 2016

I think it's hard to justify a naked, automated zpool import -a under any scenario even with -d /dev/disk/by-id. Maybe if it forced the mountpoint property to be entirely ignored, it would be something other than scary.



If the information above is valid, the zpool import at boot could be preempted by writing something into /etc/zfs/zpool.cache; I just don't know what. I don't have a cache file currently.

Do you know what should be in the file?
 
So that's looking positive, drives are available and the zfs pool appears to be intact.

What happens when you start the TrueNAS VM? Anything in the logs?
 
If the information above is valid, the zpool import at boot could be preempted by writing something into /etc/zfs/zpool.cache; I just don't know what. I don't have a cache file currently.

Do you know what should be in the file?
IMHO, this is not what you're shooting for. You don't want to import the pool on your host, therefore there should not be any configuration present, and any software trying to access it should be disabled.
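For what it's worth, the cache file is not something you write by hand anyway: it is a binary file maintained by ZFS itself. A sketch of how it would normally come into existence (not a recommendation here, and it only works while the pool is imported on the host, which is exactly what you want to avoid):

Code:
# Setting the cachefile property makes ZFS (re)write /etc/zfs/zpool.cache for that pool:
zpool set cachefile=/etc/zfs/zpool.cache ds-truenas
# zfs-import-cache.service then roughly runs:
zpool import -c /etc/zfs/zpool.cache -aN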
 
IMHO, this is not what you're shooting for. You don't want to import the pool on your host, therefore there should not be any configuration present, and any software trying to access it should be disabled.
Copy that. I agree with your logic.
So that's looking positive, drives are available and the zfs pool appears to be intact.

What happens when you start the TrueNAS VM? Anything in the logs?
I looked at syslog with less /var/log/syslog. I could see smartd finding the drives as the host boots, but nothing was logged as the VMs were spinning up. Is there another place to look?
 
I looked at syslog with less /var/log/syslog. I could see smartd finding the drives as the host boots, but nothing was logged as the VMs were spinning up. Is there another place to look?
Does the TrueNAS VM start or not? If not, there is a red log entry in the GUI (bottom) where the actual error may be visible.
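A few other places worth a look besides syslog (a sketch; VM id 102 is assumed for the TrueNAS guest):

Code:
journalctl -b | grep -iE 'sd[cd]|by-id|zfs'   # full boot journal, filtered for the disks
qm start 102                                  # starting from the CLI prints the QEMU error directly
ls /var/log/pve/tasks/                        # PVE task logs (same text as the GUI task viewer)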
 
When the drives show up, the VM starts and pulls them in. The error when it doesn't have them is:


Task viewer: VM 102 - Start

kvm: -drive file=/dev/disk/by-id/ata-ST8000VN0022-2EL112_xxxxxxx,if=none,id=drive-sata0,format=raw,cache=none,aio=io_uring,detect-zeroes=on: Could not open '/dev/disk/by-id/ata-ST8000VN0022-2EL112_xxxxxxxx': No such file or directory
TASK ERROR: start failed: QEMU exited with code 1
Does the TrueNAS VM start or not? If not, there is a red log entry in the GUI (bottom) where the actual error may be visible.
 
On the Proxmox host, does ls /dev/disk/by-id show a drive id that matches the one in the error report?
 
On the Proxmox host, does ls /dev/disk/by-id show a drive id that matches the one in the error report?

I think we're getting signals crossed. Yes, sometimes they do show up.
2. On one reboot, after many attempts, the drives did show up, even though I changed nothing. Why is this happening intermittently?
When the above happens, the drives attach to the VM, and they do show up in ls /dev/disk/by-id.
Whether or not they attach seems to depend on PVE seeing them as ZFS drives and trying to import them.
 
I think what needs to be established is whether the 'by-id' disk identifier is consistent between reboots (my understanding is that it is expected to be unchanged). If it is consistent, then what needs to be understood is why pass-through works sometimes but not others.

Alternatively, you could try using 'by-uuid' instead?
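One way to check (a sketch; serials elided as elsewhere in the thread):

Code:
# Do the by-id links for the 8TB disks exist right now, and which kernel names do they point to?
ls -l /dev/disk/by-id/ | grep ST8000VN0022
# udev can also list every symlink it created for a given disk:
udevadm info --query=symlink --name=/dev/sdc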
 
I shut down the host, read the logs, and saw this:

udevadm[590]: systemd-udev-settle.service is deprecated. Please fix zfs-import-cache.service, zfs-import-scan.service not to pull it in.

Any relation to the problem?

This is the TrueNAS VM config:
Code:
balloon: 0
boot: order=scsi0
cores: 8
cpu: host
memory: 65536
name: abc.local.domain
net0: virtio=xxxxxxxxxxxx,bridge=vmbr10
net1: virtio=xxxxxxxxxxxx,bridge=vmbr1000,firewall=1,tag=100
net2: virtio=xxxxxxxxxxxx,bridge=vmbr1000,firewall=1,tag=30
numa: 0
onboot: 1
ostype: l26
sata0: /dev/disk/by-id/ata-ST8000VN0022-2EL112_xxxxxxxx,backup=0
sata1: /dev/disk/by-id/ata-ST8000VN0022-2EL112_xxxxxxxx,backup=0
sata2: /dev/disk/by-id/ata-ST3000DM001-1ER166_xxxxxxxx,backup=0
sata3: /dev/disk/by-id/ata-ST3000DM001-1ER166_xxxxxxxx,backup=0
scsi0: local-lvm:vm-102-disk-0,size=16G
scsihw: virtio-scsi-pci
smbios1: uuid=xxxxxxxxxxxxxxxxxxxxxxxxx
sockets: 1
startup: order=3
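For reference, an entry like sata0 above would typically be added with qm set against the stable by-id path (a sketch; serial elided as in the config):

Code:
qm set 102 --sata0 /dev/disk/by-id/ata-ST8000VN0022-2EL112_xxxxxxxx,backup=0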
 
udevadm[590]: systemd-udev-settle.service is deprecated. Please fix zfs-import-cache.service, zfs-import-scan.service not to pull it in.

Any relation to the problem?
No.

This is the TrueNAS VM config:
Okay, thank you. It is very strange indeed that those paths are changing.

Can you reboot your host multiple times and check for name changes? If they occur, what pattern do you observe?
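A simple way to compare runs (a sketch; the file names are just placeholders):

Code:
# After each boot, capture the current by-id -> kernel-name mapping:
ls -l /dev/disk/by-id/ | grep ata- > /root/by-id-$(date +%Y%m%d-%H%M%S).txt
# Later, diff any two captures to see whether links changed or went missing:
# diff /root/by-id-<first>.txt /root/by-id-<second>.txt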
 
Is your TrueNAS VM set to 'autostart' and is it the first VM to start on boot? If so, maybe set a startup delay to allow the host to fully initialise first?
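If it helps, note that the up= value in the startup property is the pause PVE inserts after starting a guest before it starts the next one, so a delay in front of the TrueNAS VM would go on whichever guest starts before it (VM 101 and the 60-second value below are just assumptions):

Code:
qm set 101 --startup order=2,up=60   # hypothetical earlier guest: wait 60 s after it starts
qm set 102 --startup order=3         # TrueNAS keeps its existing order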
 
