Can't login and loss of zpool after abnormal system shutdown and server being off for 2 weeks

Davidoff

Hi there. About two weeks ago, a host bus adapter on my single-node PVE server died. That seemed to throw things into chaos and my system became unresponsive: I couldn't log in or do anything, even at a local console connected directly to the server. I ended up having to power it down and up again, at which point I was thrown into emergency mode. I then saw that a lot of my drives had not been mounted, and figured out that the HBA (an LSI 9207-8i) had failed.

In any event, I finally received the replacement today and replaced the failed HBA. All the drives in /etc/fstab mounted just fine. However, I did encounter two issues:

First, while I can log in just fine at the console or through SSH as root, I can no longer log in to the web GUI, either as root@pam or as another user I had set up using PVE authentication. I have 2FA turned on. I checked and rechecked my user ID, password, and 2FA code and tried to log in multiple times as either root@pam or user@pve, to no avail. Because I'm using 2FA, I thought it might have something to do with the clock on the server. However, I checked the system time on the server and on my phone, and they look to be within one second of each other.

Code:
systemctl status pvedaemon
shows the following:

Code:
● pvedaemon.service - PVE API Daemon
     Loaded: loaded (/lib/systemd/system/pvedaemon.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2022-10-05 11:21:38 EDT; 5h 44min ago
    Process: 6945 ExecStart=/usr/bin/pvedaemon start (code=exited, status=0/SUCCESS)
   Main PID: 7049 (pvedaemon)
      Tasks: 4 (limit: 154579)
     Memory: 207.7M
        CPU: 4.640s
     CGroup: /system.slice/pvedaemon.service
             ├─7049 pvedaemon
             ├─7050 pvedaemon worker
             ├─7051 pvedaemon worker
             └─7052 pvedaemon worker

Oct 05 16:23:17 fava2 pvedaemon[7051]: authentication failure; rhost=::ffff:10.0.10.1 user=root@pam msg=Authentication failure
Oct 05 16:23:45 fava2 pvedaemon[7052]: authentication failure; rhost=::ffff:10.0.10.1 user=user@pve msg=invalid credentials

I tried the suggestions in this post, but none of the suggestions there worked. I suspect it has something to do with 2FA, but don't know how exactly to diagnose or resolve the problem.

Second, I had manually created a RAID1 zpool through SSH on two of the drives that were attached through the replaced HBA using this command:

Code:
zpool create -f zdata mirror /dev/disk/by-id/[id of first drive] /dev/disk/by-id/[id of second drive]

I can see the two drives in lsblk. They show as unmounted. But I don't see the zpool when I run zpool status. The mountpoint /zdata is still there, but not surprisingly it shows as being empty. I'm a bit nervous about this as the pool contained some rather important data. Most of it is backed up, but I would strongly prefer some method of reinstating the pool with the existing drives if at all possible, as there are several terabytes of data and recovery from cloud backups will take a long, long time. If anyone has any suggestions as to how to properly restore the zpool, I'd be most grateful.

I'd perhaps also be interested if anyone has any comments on ZFS and why it did not automatically restore that zpool. The other drives I had which are mounted through /etc/fstab mounted just fine. It's just the ZFS pool that seems to have disappeared.

Any advice would be most appreciated. I'm running PVE 7.2-7.
 
What do you see when you run zpool import?
Oh boy. Do I feel silly. Welp, there it is:

Code:
   pool: zdata
     id: 7502400372626375659
  state: ONLINE
status: Some supported features are not enabled on the pool.
        (Note that they may be intentionally disabled if the
        'compatibility' property is set.)
 action: The pool can be imported using its name or numeric identifier, though
        some features will not be available without an explicit 'zpool upgrade'.
 config:

        zdata                       ONLINE
          mirror-0                  ONLINE
            wwn-0x5000c500a9cb5cda  ONLINE
            wwn-0x5000c500aa1e0539  ONLINE

I suppose all that needs to be done is to run zpool import zdata?
 
Probably, as the pool looks healthy.
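
For anyone landing here with the same symptom, the check-then-import sequence discussed above can be sketched as follows. The helper name importable_pools is made up for illustration; it only summarizes the text that zpool import prints, and the actual zpool commands are left as comments because they need root and a real pool.

```shell
# List pools that `zpool import` reports as importable, with their state.
# Reads `zpool import` output on stdin and prints "<pool> <state>" per pool.
# (importable_pools is a hypothetical helper name, not part of ZFS.)
importable_pools() {
    awk '
        $1 == "pool:"  { pool = $2 }
        $1 == "state:" { print pool, $2 }
    '
}

# On the server (as root) this would be used roughly as:
#   zpool import | importable_pools   # e.g. prints "zdata ONLINE"
#   zpool import zdata                # import by name, as suggested in the thread
#   zpool status zdata                # confirm both mirror members are ONLINE
```

If the state shown is anything other than ONLINE, it is probably worth reading the full zpool import output before importing.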
 
The usual problem with time-based one-time passwords is the clock: after a reboot, the time is often not exactly right, and that can break 2FA. Have you checked that?
 
Yes, I did check the time on the server and the device generating the 2FA, and they looked to be within a second of each other. Also checked that the server was syncing with NTP servers. After digging around a bit more and trying to figure out what was wrong, out of frustration I just rebooted the bloody thing. And now it works. I dunno. Appreciate the suggestion.
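
As a rough illustration of why the clock matters for 2FA: a TOTP code is derived from the current time step, so two clocks agree on the code only when they fall in the same step. The sketch below assumes the common 30-second step (the RFC 6238 default); the function names are made up for this example, and how much drift the verifier tolerates is not something this thread establishes.

```shell
# Index of the 30-second TOTP window containing a Unix timestamp.
# Assumption: 30-second step (RFC 6238 default).
totp_step() {
    # $1: Unix time in seconds
    echo $(( $1 / 30 ))
}

# Succeeds when two timestamps fall in the same TOTP window.
same_totp_window() {
    [ "$(totp_step "$1")" -eq "$(totp_step "$2")" ]
}

# On the server, the inputs would come from something like:
#   date +%s       # server time, to compare against the phone's clock
#   timedatectl    # shows whether the system clock is NTP-synchronized
```

Note that even a one-second difference can straddle a window boundary (59 vs. 60 below), so "within a second" does not by itself guarantee that both sides compute the same code.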
 
You should check why PVE won't automatically import it on boot. Check under Datacenter -> Storage whether your ZFS pool storage is listed and enabled.
Thanks for the suggestion. I manually imported the pool using zpool import zdata. Originally I had created it manually from the command line, not in Proxmox. It doesn't show up in the GUI at Datacenter -> Storage. Would that be something I should be concerned about? Currently I don't use that particular pool for anything within Proxmox itself (e.g. to store images or containers). Would it be advisable to add it in the GUI, or would that only be warranted if I decided to use it as container/image storage?
 
Should be fine as long as you don't want to store any virtual disks on it. But PVE will then not manage it, so you have to do things like importing the pool yourself. And I'm not sure whether PVE will scrub the pool when it's not managed by PVE. You might want to check that (for example, run zpool status and verify that the last scrub wasn't more than 3 months ago).
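
For a pool that PVE does not manage, one possible starting point for checking boot-time import is to look at the systemd units that OpenZFS on Linux normally ships for this. This is only a sketch: the function name is made up, the unit names are assumed to match this install, and nothing in this thread confirms which of them (if any) was responsible here.

```shell
# Print the enablement state of the systemd units that OpenZFS normally
# uses to import and mount pools at boot.
# Assumption: these unit names exist on this install.
zfs_import_unit_states() {
    for unit in zfs-import-cache.service zfs-import-scan.service zfs-mount.service; do
        state=$(systemctl is-enabled "$unit" 2>/dev/null)
        [ -n "$state" ] || state=unknown
        echo "$unit: $state"
    done
}

# Related commands to run by hand (as root):
#   zpool get cachefile zdata                        # where the pool is recorded
#   zpool set cachefile=/etc/zfs/zpool.cache zdata   # record it in the default cache file
```

Calling zfs_import_unit_states prints one line per unit, which makes it easy to spot one that is disabled or missing.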
 
Thanks - appreciate the suggestions. Will keep an eye on it to check for scrubbing.
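
A small sketch of the scrub check suggested above: it pulls the completion date out of the scan line of zpool status and reports the age in days. The function name is made up, the sample scan line in the comment is illustrative of the format OpenZFS 2.x is assumed to print (it is not output from this server), and GNU date is required for date -d.

```shell
# Days since the last completed scrub, parsed from a `zpool status` scan line
# such as (illustrative):
#   scan: scrub repaired 0B in 00:10:23 with 0 errors on Sun Oct  9 00:34:24 2022
# $1: reference time as a Unix timestamp, e.g. "$(date +%s)". Needs GNU date.
scrub_age_days() {
    now=$1
    # Keep only the date that follows the last " on " in the scan line.
    when=$(sed -n 's/.*scan: scrub repaired.* on \(.*\)$/\1/p' | head -n 1)
    [ -n "$when" ] || { echo "no completed scrub found" >&2; return 1; }
    last=$(date -d "$when" +%s) || return 1
    echo $(( (now - last) / 86400 ))
}

# On the server this would be used roughly as:
#   zpool status zdata | scrub_age_days "$(date +%s)"
#   zpool scrub zdata     # start a scrub by hand if the last one is too old
```

A result above roughly 90 days would correspond to the "more than 3 months" threshold mentioned in the reply.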
 
