I had the unfortunate scenario of completely filling the root ZFS filesystem after a rogue process (my fault) filled /tmp with 22GB of junk. I noticed this within about 30 minutes and removed the offending file in /tmp. As expected, various processes had hit "no space left on device" errors, and I thought it best to reboot the server. However, upon reboot I was dropped to the recovery shell because the zpool wasn't found. The error message contained:
I ran "zfs import rpool" which seemed to work fine and showed both disks in my mirror as ONLINE. I exited the recovery shell and boot continued, but things weren't right and containers failed to start.
Code:
cannot import 'rpool': no such pool or dataset.
I ran "zpool import rpool", which seemed to work fine and showed both disks in my mirror as ONLINE. I exited the recovery shell and the boot continued, but things weren't right and the containers failed to start.
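For completeness, the recovery-shell steps were essentially just:
Code:
zpool import rpool   # this showed both mirror members as ONLINE
exit                 # leave the recovery shell and let the boot continue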
I rebooted again, and this time rpool was not reported missing on boot. However, once booted, I can see that one disk is now unavailable:
Code:
# zpool status
  pool: rpool
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
  scan: scrub repaired 0B in 0h8m with 0 errors on Sun Dec 24 04:52:41 2017
config:

        NAME                                                     STATE     READ WRITE CKSUM
        rpool                                                    DEGRADED     0     0     0
          mirror-0                                               DEGRADED     0     0     0
            15895554979573075729                                 UNAVAIL      0     0     0  was /dev/sde2
            ata-Samsung_SSD_850_EVO_250GB_S3NYNF0J886394T-part2  ONLINE       0     0     0

errors: No known data errors
That's almost expected: it was an old, early-generation SSD, and completely filling it would have stressed the hell out of it. Thankfully I'd paired it with a brand new disk.
Obviously I'll replace the failed disk; my rough plan for that is sketched below.
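For the replacement itself, my understanding (happy to be corrected) is that on a ZFS root mirror the new disk first needs the surviving disk's partition layout and GRUB, and then a zpool replace against the missing member's GUID. Something like the following, with a made-up by-id name for the new disk:
Code:
# NOTE: "ata-NEW_DISK" is a placeholder for the real /dev/disk/by-id name of the replacement.
# Copy the partition table from the surviving (Samsung) disk to the new disk, then randomise its GUIDs.
sgdisk /dev/disk/by-id/ata-Samsung_SSD_850_EVO_250GB_S3NYNF0J886394T -R /dev/disk/by-id/ata-NEW_DISK
sgdisk -G /dev/disk/by-id/ata-NEW_DISK
# Replace the UNAVAIL member (referenced by the GUID from zpool status) with partition 2 of the new disk.
zpool replace rpool 15895554979573075729 /dev/disk/by-id/ata-NEW_DISK-part2
# Make the new disk bootable too, then watch the resilver.
grub-install /dev/disk/by-id/ata-NEW_DISK
zpool status rpool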
But my immediate issue seems to be that the ZFS filesystems other than root aren't mounted, i.e. /rpool, /rpool/data and the container subvolumes under it. This is why the containers aren't starting: their images are just empty directories. I'm not sure what the proper procedure is for making the system mount these on startup, as they were before, and would appreciate any help; my best guess so far is below.
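My best guess, assuming the datasets themselves are intact and it's only the automatic import/mount at boot that has broken, is something along these lines, but I'd rather hear the proper procedure before poking at it:
Code:
# Check the mount-related properties on the datasets that should be mounted.
zfs get -r canmount,mountpoint,mounted rpool
# Mount anything that isn't mounted yet.
zfs mount -a
# Check the systemd units that import and mount ZFS at boot,
# and refresh the cache file that zfs-import-cache.service reads.
systemctl status zfs-import-cache.service zfs-mount.service zfs.target
zpool set cachefile=/etc/zfs/zpool.cache rpool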
Some more info:
Code:
# pveversion -v
proxmox-ve: 5.1-28 (running kernel: 4.13.8-2-pve)
pve-manager: 5.1-36 (running version: 5.1-36/131401db)
pve-kernel-4.13.4-1-pve: 4.13.4-26
pve-kernel-4.13.8-2-pve: 4.13.8-28
pve-kernel-4.10.17-4-pve: 4.10.17-24
pve-kernel-4.10.17-2-pve: 4.10.17-20
pve-kernel-4.10.15-1-pve: 4.10.15-15
pve-kernel-4.10.17-3-pve: 4.10.17-23
pve-kernel-4.10.17-1-pve: 4.10.17-18
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-15
qemu-server: 5.0-17
pve-firmware: 2.0-3
libpve-common-perl: 5.0-20
libpve-guest-common-perl: 2.0-13
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-16
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.1-12
pve-qemu-kvm: 2.9.1-2
pve-container: 2.0-17
pve-firewall: 3.0-3
pve-ha-manager: 2.0-3
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.0-2
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.3-pve1~bpo9
Code:
# mount | grep zfs
rpool/ROOT/pve-1 on / type zfs (rw,relatime,xattr,noacl)
rpool/ROOT on /rpool/ROOT type zfs (rw,noatime,xattr,noacl)
Code:
# zfs list
NAME                           USED  AVAIL  REFER  MOUNTPOINT
rpool                         36.5G  21.1G    96K  /rpool
rpool/ROOT                    15.9G  21.1G    96K  /rpool/ROOT
rpool/ROOT/pve-1              15.9G  21.1G  10.6G  /
rpool/data                    16.3G  21.1G   316K  /rpool/data
rpool/data/appconfig          1.77G  21.1G   946M  /rpool/data/appconfig
rpool/data/subvol-100-disk-1  2.50G  6.95G  1.05G  /rpool/data/subvol-100-disk-1
rpool/data/subvol-102-disk-1  3.66G  21.1G  1.42G  /rpool/data/subvol-102-disk-1
rpool/data/subvol-103-disk-1   500M  7.55G   463M  /rpool/data/subvol-103-disk-1
rpool/data/vm-101-disk-1      4.42G  21.1G  3.94G  -
rpool/data/vm-101-state-Base   320M  21.1G   320M  -
rpool/data/vm-104-disk-1      3.17G  21.1G  3.05G  -
rpool/swap                    4.25G  21.8G  3.59G  -
Note that I've previously created an extra ZFS subvolume, appconfig. I've also got multiple snapshots of each subvol should I need to roll back.
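(If it comes to that, I assume a rollback is just the standard zfs rollback against the relevant snapshot, e.g. with a made-up snapshot name:)
Code:
zfs list -t snapshot -r rpool/data/appconfig    # list the snapshots of one subvol
zfs rollback rpool/data/appconfig@known-good    # hypothetical snapshot name; needs -r if newer snapshots exist (they get destroyed)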