Help Moving Storage to ZFS // Docker not working!

phrankme

Member
Sep 1, 2020
Hi all!

I extended my Proxmox setup to a three-node cluster. And (!) I added SSDs for ZFS. For all my LXCs and VMs I moved the root disks from local-lvm to my ZFS pool (data). Everything works well, including replication!

Except DOCKER! :( I can move my Docker container's volume in Proxmox to NFS or any other local disk and it works. If I move it to ZFS I get this error:
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

DOCKER is simply not running! What exactly am I missing here??


Detailed information below

Bash:
root@reactorlab-3:~# pct config 104
arch: amd64
cores: 4
description: Docker for IoT #20 only%3A%0A- homebridge%0A
features: keyctl=1,nesting=1
hostname: vaultboy
memory: 4048
net0: name=eth0,bridge=vmbr0,firewall=1,gw=10.20.0.1,hwaddr=0E:1B:F7:F4:83:42,ip=10.20.0.111/24,ip6=dhcp,tag=20,type=veth
onboot: 1
ostype: ubuntu
rootfs: local-lvm:vm-104-disk-0,size=64G
swap: 4048
unprivileged: 1
root@reactorlab-3:~#

Bash:
root@reactorlab-3:~# cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content images,iso,vztmpl,snippets
        shared 0

lvmthin: local-lvm
        thinpool data
        vgname pve
        content rootdir,images

nfs: Proxmox
        export /volume1/proxmox
        path /mnt/pve/Proxmox
        server 10.10.0.6
        content vztmpl,snippets,iso,backup,rootdir,images
        prune-backups keep-last=1

nfs: Backup
        export /volume1/_backup
        path /mnt/pve/Backup
        server 10.10.0.6
        content backup
        prune-backups keep-last=2

zfspool: data
        pool data
        content rootdir,images
        mountpoint /data
        nodes reactorlab-2,reactorlab-3
        sparse 0

root@reactorlab-3:~#

Bash:
root@reactorlab-3:~# zpool status
  pool: data
 state: ONLINE
config:

        NAME                                         STATE     READ WRITE CKSUM
        data                                         ONLINE       0     0     0
          mirror-0                                   ONLINE       0     0     0
            ata-WDC_WDS100T1R0A-...  ONLINE       0     0     0
            ata-WDC_WDS100T1R0A-...  ONLINE       0     0     0

errors: No known data errors

Bash:
root@reactorlab-3:~# zfs list
NAME                     USED  AVAIL     REFER  MOUNTPOINT
data                    1.91G   897G      104K  /data
data/subvol-100-disk-0  1.91G  6.10G     1.90G  /data/subvol-100-disk-0

Bash:
root@reactorlab-3:~# mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,relatime)
udev on /dev type devtmpfs (rw,nosuid,relatime,size=16103268k,nr_inodes=4025817,mode=755,inode64)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,noexec,relatime,size=3227392k,mode=755,inode64)
/dev/mapper/pve-root on / type ext4 (rw,relatime,errors=remount-ro)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,inode64)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k,inode64)
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)
bpf on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=30,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=27827)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
tracefs on /sys/kernel/tracing type tracefs (rw,nosuid,nodev,noexec,relatime)
fusectl on /sys/fs/fuse/connections type fusectl (rw,nosuid,nodev,noexec,relatime)
configfs on /sys/kernel/config type configfs (rw,nosuid,nodev,noexec,relatime)
sunrpc on /run/rpc_pipefs type rpc_pipefs (rw,relatime)
/dev/nvme0n1p2 on /boot/efi type vfat (rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro)
lxcfs on /var/lib/lxcfs type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
/dev/fuse on /etc/pve type fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other)
10.10.0.6:/volume1/_backup on /mnt/pve/Backup type nfs4 (rw,relatime,vers=4.0,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.0.0.62,local_lock=none,addr=10.10.0.6)
10.10.0.6:/volume1/proxmox on /mnt/pve/Proxmox type nfs4 (rw,relatime,vers=4.0,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.0.0.62,local_lock=none,addr=10.10.0.6)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,nosuid,nodev,noexec,relatime)
data on /data type zfs (rw,xattr,noacl)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=3227388k,nr_inodes=806847,mode=700,inode64)
data/subvol-100-disk-0 on /data/subvol-100-disk-0 type zfs (rw,xattr,posixacl)

Bash:
root@reactorlab-3:~# zfs get all zpool
cannot open 'zpool': dataset does not exist
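Note: zfs get expects a pool or dataset name; the pool above is named data, so the working equivalent would be:

Bash:
# query all ZFS properties of the pool/dataset named "data"
zfs get all data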
 
I have the same issue - I backed up a container from a node using LVM and restored it onto a node with ZFS storage.

Bash:
root@docker-portainer:~# docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Docker Buildx (Docker Inc., v0.9.1-docker)
  compose: Docker Compose (Docker Inc., v2.12.2)
  scan: Docker Scan (Docker Inc., v0.21.0)

Server:
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
errors pretty printing info

Bash:
root@docker-portainer:~# docker ps
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

Where it was restored to:
Bash:
root@thin:~# pct config 1311
arch: amd64
cores: 2
features: nesting=1
hostname: docker-portainer
lock: backup
memory: 1500
net0: name=eth0,bridge=vmbr1,firewall=1,hwaddr=0A:74:0D:67:86:36,ip=dhcp,tag=30,type=veth
onboot: 0
ostype: debian
rootfs: zfs-on-thin:subvol-1311-disk-0,size=30G
swap: 512
unprivileged: 1


Where it came from:
Bash:
root@hpnote:~# pct config 1311
arch: amd64
cores: 2
features: nesting=1
hostname: docker-portainer
memory: 1500
net0: name=eth0,bridge=vmbr1,firewall=1,hwaddr=0A:74:0D:67:86:36,ip=dhcp,tag=30,type=veth
onboot: 1
ostype: debian
rootfs: local:1311/vm-1311-disk-0.raw,size=30G
swap: 512
unprivileged: 1

Bash:
root@thin:~# cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content vztmpl,backup,iso

lvmthin: local-lvm
        thinpool data
        vgname pve
        content rootdir,images

zfspool: zfs-on-thin
        pool zfs-on-thin
        content rootdir,images
        mountpoint /zfs-on-thin
        nodes thin
        sparse 1

nfs: backup
        export /mnt/HD/HD_b2/xfer/proxmox
        path /mnt/pve/backup
        server 192.168.1.16
        content vztmpl,backup,snippets,iso
        prune-backups keep-all=1

Bash:
root@thin:~# zpool status
  pool: zfs-on-thin
 state: ONLINE
config:

        NAME                                    STATE     READ WRITE CKSUM
        zfs-on-thin                             ONLINE       0     0     0
          ata-LITEON_CV3-8D256-HP_0027321000SS  ONLINE       0     0     0

errors: No known data errors

The Docker daemon failed to start and is not running:

Code:
root@docker-portainer:~# systemctl status docker
* docker.service - Docker Application Container Engine
     Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Sun 2022-11-27 23:20:14 UTC; 1min 18s ago
TriggeredBy: * docker.socket
       Docs: https://docs.docker.com
    Process: 325 ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock (code=exited, status=1/FAILURE)
   Main PID: 325 (code=exited, status=1/FAILURE)
        CPU: 126ms

Nov 27 23:20:12 docker-portainer dockerd[325]: time="2022-11-27T23:20:12.842303343Z" level=error msg="[graphdriver] prior storage driver overlay2 failed: driver not supported"
Nov 27 23:20:12 docker-portainer dockerd[325]: failed to start daemon: error initializing graphdriver: driver not supported
Nov 27 23:20:12 docker-portainer systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Nov 27 23:20:12 docker-portainer systemd[1]: docker.service: Failed with result 'exit-code'.
Nov 27 23:20:12 docker-portainer systemd[1]: Failed to start Docker Application Container Engine.
Nov 27 23:20:14 docker-portainer systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
Nov 27 23:20:14 docker-portainer systemd[1]: Stopped Docker Application Container Engine.
Nov 27 23:20:14 docker-portainer systemd[1]: docker.service: Start request repeated too quickly.
Nov 27 23:20:14 docker-portainer systemd[1]: docker.service: Failed with result 'exit-code'.
Nov 27 23:20:14 docker-portainer systemd[1]: Failed to start Docker Application Container Engine.
 
You have to tell Docker to use its ZFS storage driver.

In the file /etc/docker/daemon.json add this:

Code:
{ "storage-driver": "zfs" }

Then restart the Docker daemon.

See the Docker documentation for more info.
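A minimal sketch of those steps inside the container (assuming /etc/docker/daemon.json does not exist yet; the last line is just a sanity check):

Bash:
# create the config file with the ZFS storage driver setting
mkdir -p /etc/docker
cat > /etc/docker/daemon.json <<'EOF'
{ "storage-driver": "zfs" }
EOF

# restart the daemon and check which storage driver is active
systemctl restart docker
docker info --format '{{.Driver}}'   # should print: zfs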
 
That is only required if you do not have a ZFS dataset mounted at /var/lib/docker. Having a dedicated dataset is the best approach to use the benefits of ZFS and encapsulate everything. In addition to the "normal" Docker ZFS driver, I recommend also installing the ZFS volume driver for Docker, which will create each volume directly as a ZFS dataset instead of as a plain volume ON your Docker ZFS dataset under /var/lib/docker/volumes.
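For illustration, such a dataset could be created like this (the pool name data and dataset name docker are just examples; adjust to your setup):

Bash:
# create a dataset whose mountpoint is Docker's data directory
zfs create -o mountpoint=/var/lib/docker data/docker

# on the next start, Docker should detect the ZFS backing filesystem
systemctl restart docker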
 

There is no daemon.json in that folder, so I created one and I'm still unable to start docker.

As Docker with Proxmox migration to ZFS is obviously flaky/broken, I will have to painstakingly rebuild the whole VM.

Thanks anyway for your input.

JayS
 
As Docker with Proxmox migration to ZFS is obviously flaky/broken, I will have to painstakingly rebuild the whole VM.
That has nothing to do with PVE, just Docker.

The easiest way is to start over (see the sketch below the list):
  • remove everything below /var/lib/docker (this will delete EVERYTHING, so make backups if necessary)
  • create a new ZFS dataset and set the mountpoint to /var/lib/docker
  • start Docker and enjoy Docker on ZFS
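A minimal sketch of those three steps (pool and dataset names are illustrative, and the rm really does delete all Docker state):

Bash:
# stop Docker and its socket, so it is not reactivated meanwhile
systemctl stop docker docker.socket

# 1) remove everything below /var/lib/docker (deletes ALL Docker data!)
rm -rf /var/lib/docker/*

# 2) create a new ZFS dataset mounted at /var/lib/docker
zfs create -o mountpoint=/var/lib/docker data/docker

# 3) start Docker and verify the storage driver
systemctl start docker
docker info --format '{{.Driver}}'   # should print: zfs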