pvestatd: zfs error: cannot open 'pool_sata': no such pool

I have a striped mirror pool consisting of 4 SATA disks that is not mounted during boot. Syslog gives the error message in the subject above. The error appears to be that for some reason ZFS is taking long to read/import/... the pools. Because after the system has booted, after I manually open a terminal, I can mount the dataset manually with

Code:
zfs mount pool_sata/netstore

The problem started after I had to apply changes to this pool: Because of issues with one kind of disks (Samsung 860 Pro SSDs producing strange data integrity errors), I had to replace one mirror with another one, consisting of disks from another vendor. For this purpose, I removed the vdev with the Samsung drives, and added a new mirror with other vendor drives. Since then, the pool is not producing any data integrity errors anymore. But I have to manually mount the pool (or run a script after boot to mount it).

Pool status:

Code:
  pool: pool_sata
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:10:51 with 0 errors on Sun Dec 13 00:34:53 2020
remove: Removal of vdev 1 copied 129G in 0h4m, completed on Thu Dec 10 17:43:00 2020
    1.12M memory used for removed device mappings

config:

        NAME                                             STATE     READ WRITE CKSUM
        pool_sata                                        ONLINE       0     0     0
          mirror-0                                       ONLINE       0     0     0
            ata-KINGSTON_SEDC500M1920G_50026B76830C8417  ONLINE       0     0     0
            scsi-SATA_KINGSTON_SEDC500_50026B768355CADF  ONLINE       0     0     0
          mirror-6                                       ONLINE       0     0     0
            scsi-SATA_KINGSTON_SEDC500_50026B7683976F0C  ONLINE       0     0     0
            ata-KINGSTON_SEDC500M1920G_50026B7683976E32  ONLINE       0     0     0

errors: No known data errors

no errors in dmesg:

Code:
# dmesg | grep ZFS
[    0.000000] Command line: initrd=\EFI\proxmox\5.4.78-2-pve\initrd.img-5.4.78-2-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs amd_iommu=on mitigations=off
[    0.712114] Kernel command line: initrd=\EFI\proxmox\5.4.78-2-pve\initrd.img-5.4.78-2-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs amd_iommu=on mitigations=off
[   13.598597] ZFS: Loaded module v0.8.5-pve1, ZFS pool version 5000, ZFS filesystem version 5

no helpful error messages in syslog:

Code:
Dec 27 10:23:54 zeus kernel: [    0.000000] Command line: initrd=\EFI\proxmox\5.4.78-2-pve\initrd.img-5.4.78-2-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs amd_iommu=on mitigations=off
Dec 27 10:23:54 zeus kernel: [    0.712114] Kernel command line: initrd=\EFI\proxmox\5.4.78-2-pve\initrd.img-5.4.78-2-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs amd_iommu=on mitigations=off
Dec 27 10:24:09 zeus pvestatd[4848]: zfs error: cannot open 'pool_sata': no such pool#012

The reason for this appears to be that ZFS is slow in mounting datasets during boot, because immediately fter the boot, a script running "zfs list" shows:

Code:
NAME               USED  AVAIL     REFER  MOUNTPOINT
pool_opt           209G   465G      208K  /mnt/zfs_opt
pool_opt/VMs       168G   465G      163G  /mnt/zfs_opt/VMs
pool_opt/mail     41.2G   465G     41.2G  /mnt/zfs_opt/mail
rpool             7.94G   422G      104K  /rpool
rpool/ROOT        7.94G   422G       96K  /rpool/ROOT
rpool/ROOT/pve-1  7.94G   422G     7.94G  /
rpool/data          96K   422G       96K  /rpool/data

About 10 seconds later, zfs list shows all datasets:

Code:
NAME                         USED  AVAIL     REFER  MOUNTPOINT
pool_opt                     209G   465G      208K  /mnt/zfs_opt
pool_opt/VMs                 168G   465G      163G  /mnt/zfs_opt/VMs
pool_opt/mail               41.2G   465G     41.2G  /mnt/zfs_opt/mail
pool_sata                    244G  3.12T      192K  /mnt/zfs_sata
pool_sata/netstore           244G  3.12T      244G  /mnt/zfs_sata/netstore
pool_storage                14.2T  19.7T     18.9G  /mnt/zfs_storage
pool_storage/backup         1.48T  19.7T     1.48T  /mnt/zfs_storage/backup
pool_storage/jail           14.6G  19.7T     14.6G  /mnt/zfs_storage/jail
pool_storage/media          12.5T  19.7T     12.5T  /mnt/zfs_storage/media
pool_storage/server_data    46.5G  19.7T     46.5G  /mnt/zfs_storage/server64_data
rpool                       7.94G   422G      104K  /rpool
rpool/ROOT                  7.94G   422G       96K  /rpool/ROOT
rpool/ROOT/pve-1            7.94G   422G     7.94G  /
rpool/data                    96K   422G       96K  /rpool/data

Other than running a script after boot to mount this pool manually, how can I fix this? How can I find out what ZFS is doing so long after boot?

Thanks!
 
I don't think this has anything to do with mount of ZFS volumes.
ZFS list can only show results when the underlying pool is there. And for whatever reasons that seems to take some time.

The question to me is what "zpool list" would show.
Could you show output of this command during boot?
Once we have a timestamp we can compare messages ins "syslog" file

Also what devices do you use?
It seems some are reported as at a, others as SCSI. What's the reason behind of it?
 
What I have is the output of "zpool status" from the same time during boot:

Code:
  pool: pool_opt
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:02:43 with 0 errors on Sun Dec 13 00:26:44 2020
config:

        NAME                                             STATE     READ WRITE CKSUM
        pool_opt                                         ONLINE       0     0     0
          mirror-0                                       ONLINE       0     0     0
            nvme-INTEL_SSDPE21K750GA_PHKE913000R3750BGN  ONLINE       0     0     0
            nvme-INTEL_SSDPE21K750GA_PHKE913000TH750BGN  ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:00:09 with 0 errors on Sun Dec 13 00:24:13 2020
config:

        NAME                                                 STATE     READ WRITE CKSUM
        rpool                                                ONLINE       0     0     0
          mirror-0                                           ONLINE       0     0     0
            nvme-eui.0026b728251d09c50000000000000000-part3  ONLINE       0     0     0
            nvme-eui.0026b728251d0c150000000000000000-part3  ONLINE       0     0     0

errors: No known data errors

To me, this shows that ZFS is taking it's time to bring the other two pools (pool_storage and pool_sata) online. And pool_sata is the one in that I applied changes. Question is why it is taking so long.

The devices in this pool are Kingston DC500 SSDs. As to the naming of the disks for ZFS:

Code:
# ls /dev/disk/by-id/ | grep KINGSTON | grep -v nvme
ata-KINGSTON_SEDC500M1920G_50026B76830C8417
ata-KINGSTON_SEDC500M1920G_50026B76830C8417-part1
ata-KINGSTON_SEDC500M1920G_50026B76830C8417-part9
ata-KINGSTON_SEDC500M1920G_50026B7683976E32
ata-KINGSTON_SEDC500M1920G_50026B7683976E32-part1
ata-KINGSTON_SEDC500M1920G_50026B7683976E32-part9
scsi-0ATA_KINGSTON_SEDC500_50026B7683976E32
scsi-0ATA_KINGSTON_SEDC500_50026B7683976E32-part1
scsi-0ATA_KINGSTON_SEDC500_50026B7683976E32-part9
scsi-1ATA_KINGSTON_SEDC500M1920G_50026B76830C8417
scsi-1ATA_KINGSTON_SEDC500M1920G_50026B76830C8417-part1
scsi-1ATA_KINGSTON_SEDC500M1920G_50026B76830C8417-part9
scsi-1ATA_KINGSTON_SEDC500M1920G_50026B7683976E32
scsi-1ATA_KINGSTON_SEDC500M1920G_50026B7683976E32-part1
scsi-1ATA_KINGSTON_SEDC500M1920G_50026B7683976E32-part9
scsi-SATA_KINGSTON_SEDC500_50026B76830C8417
scsi-SATA_KINGSTON_SEDC500_50026B76830C8417-part1
scsi-SATA_KINGSTON_SEDC500_50026B76830C8417-part9
scsi-SATA_KINGSTON_SEDC500_50026B768355CADF
scsi-SATA_KINGSTON_SEDC500_50026B768355CADF-part1
scsi-SATA_KINGSTON_SEDC500_50026B768355CADF-part9
scsi-SATA_KINGSTON_SEDC500_50026B7683976E32
scsi-SATA_KINGSTON_SEDC500_50026B7683976E32-part1
scsi-SATA_KINGSTON_SEDC500_50026B7683976E32-part9
scsi-SATA_KINGSTON_SEDC500_50026B7683976F0C
scsi-SATA_KINGSTON_SEDC500_50026B7683976F0C-part1
scsi-SATA_KINGSTON_SEDC500_50026B7683976F0C-part9

I.e. they are all available as names with "scsi". Two of them also with "ata" in front. I believe this is because two are connected to the onboard controller, two other to the HBA. This setup was identical with the two previous SSDs that were replaced.
 
To me, this shows that ZFS is taking it's time to bring the other two pools (pool_storage and pool_sata) online. And pool_sata is the one in that I applied changes. Question is why it is taking so long.
Did you replace the the underlying devices (vdevs) or have you removed the mirror itself? The process of removing mirrors is quite "interesting" from what I read and involves remapping of block addresses. That could explain why it takes longer to import.


I believe this is because two are connected to the onboard controller, two other to the HBA.
That makes sense.
 
Did you replace the the underlying devices (vdevs) or have you removed the mirror itself? The process of removing mirrors is quite "interesting" from what I read and involves remapping of block addresses. That could explain why it takes longer to import.
I think I did something like
Code:
zpool remove pool_sata mirror-1
and then
Code:
zpool add mirror <device1> <device2>

As you can see from the pool status above. Removing the mirror worked, ZFS reallocated the data. Then the new mirror was added. However, that reallocation ended immediately, so I don't think this should be taking any extra time now, several weeks after the replacement?

Also as you can see, the pool (4x1.92TB) is pretty empty, only 244 GB occupied.
 
I don't think this should be taking any extra time now, several weeks after the replacement?
I can't judge... Its the only thing that came to my mind. I have read how the "remove" feature was implemented when it was introduced and I do remember that I found the approach rather unappealing. This is the reason why I personally have removed that from an option list. I ended up creating a new pool and do a zfs-send/receive for the data affected. It didn't take much time (roughly 300GB) and the downtime didn't really bother me.
If you really don't use that pool yet I'd see this as a valid option. I have no other idea than that. Sorry
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!