ZFS drive causing 'io error' status for OpenMediaVault VM

sugarleaves

I recently started seeing an "io error" status on my OpenMediaVault VM within Proxmox. The OpenMediaVault VM has a ZFS-backed drive attached to it (via Proxmox), which it exposes as a network share. When I detach the ZFS drive from the OpenMediaVault VM, OpenMediaVault runs fine, which leads me to believe the issue is coming from the ZFS drive.

I set up ZFS in Proxmox as a raidz3. As I said above, I then attached the ZFS storage as a drive in my OpenMediaVault VM, making almost all of the space available to OpenMediaVault. The last time I was able to view the drive in OpenMediaVault, it showed more than 5 TiB of free space, so it's nowhere near full from OpenMediaVault's perspective. I ran a zpool scrub, but it didn't find any errors. Proxmox, however, is showing usage at 100%: 100.00% (10.56 TiB of 10.56 TiB)
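For reference, here's roughly what the storage definition in Proxmox looks like. This is a sketch from memory rather than a copy-paste, and the storage ID (which I made match the pool name) is from memory too:
Code:
# /etc/pve/storage.cfg (sketch, not the actual file)
zfspool: D2700
        pool D2700
        content images
# note: no 'sparse' option set, so zvols get a full refreservation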

I ran zpool list, zfs list, and zpool status, and they all seem normal to me; the output is below. The "AVAIL" column of the zfs list output does seem low (~3.46M). Could that be the cause of the issue? Is there some kind of overhead with ZFS that I need to account for? If so, how do I fix it, given that I can no longer access the drive contents via OpenMediaVault?

Code:
root@lab:~# zfs list
NAME                  USED  AVAIL     REFER  MOUNTPOINT
D2700                10.6T  3.46M      307K  /D2700
D2700/vm-100-disk-0  10.6T  3.46M     10.6T  -

root@lab:~# zpool list
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
D2700  13.6T  13.2T   437G        -         -    10%    96%  1.00x    ONLINE  -

root@lab:~# zpool status
  pool: D2700
state: ONLINE
  scan: scrub repaired 0B in 0 days 08:55:16 with 0 errors on Fri May  8 11:19:46 2020
config:

        NAME                        STATE     READ WRITE CKSUM
        D2700                       ONLINE       0     0     0
          raidz3-0                  ONLINE       0     0     0
            wwn-0x5000c500287054e3  ONLINE       0     0     0
            wwn-0x5000c5003355b4af  ONLINE       0     0     0
            wwn-0x5000c5001d5fc1db  ONLINE       0     0     0
            wwn-0x5000c500337b562f  ONLINE       0     0     0
            wwn-0x5000c500337b601b  ONLINE       0     0     0
            wwn-0x5000c500289fd313  ONLINE       0     0     0
            wwn-0x5000c5002870621f  ONLINE       0     0     0
            wwn-0x5000c500289fd83f  ONLINE       0     0     0
            wwn-0x5000c50028705b53  ONLINE       0     0     0
            wwn-0x5000c500287049cb  ONLINE       0     0     0
            wwn-0x5000c500289fcef7  ONLINE       0     0     0
            wwn-0x5000c500289d419b  ONLINE       0     0     0
            sdn                     ONLINE       0     0     0
            wwn-0x5000c500337b603f  ONLINE       0     0     0
            wwn-0x5000c50028705c93  ONLINE       0     0     0
            wwn-0x5000c50028a2bd43  ONLINE       0     0     0
            wwn-0x5000c5003356512b  ONLINE       0     0     0
            wwn-0x5000c500337468c7  ONLINE       0     0     0
            wwn-0x5000c50028a0c2d7  ONLINE       0     0     0
            wwn-0x5000c500289fd7ab  ONLINE       0     0     0
            sdv                     ONLINE       0     0     0
            sdw                     ONLINE       0     0     0
            wwn-0x5000c500289fd8f7  ONLINE       0     0     0
            wwn-0x5000c5001d530623  ONLINE       0     0     0
            wwn-0x5000c50033565a17  ONLINE       0     0     0

errors: No known data errors


Any help is greatly appreciated.
 
Easy: your pool is full. Fuller than I have ever seen one. I've often seen the first errors appear at around 93%.

It's strange to me that it's only now showing errors, after a month of usage. I haven't allocated any more storage since I provisioned the initial drive for my OpenMediaVault VM. What's allocating more space, such that it's now having issues?
 
It's strange to me that it's only now showing errors, after a month of usage. I haven't allocated any more storage since I provisioned the initial drive for my OpenMediaVault VM. What's allocating more space, such that it's now having issues?

Normally you have thin provisioning and snapshots, and every dataset on your pool can consume more space over time. That's normal, especially if you use snapshots.
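You can see where the space is actually going with the per-category accounting properties:
Code:
# break down usage by snapshots, dataset data, children and reservations
zfs get -r usedbysnapshots,usedbydataset,usedbychildren,usedbyrefreservation D2700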
 
It doesn't look like I'm using any space for snapshots: usedbysnapshots 0B

On my single volume (the disk I provisioned for OpenMediaVault), I see some interesting values:
Code:
D2700/vm-100-disk-0  used                  10.6T
D2700/vm-100-disk-0  available             3.46M
D2700/vm-100-disk-0  written               10.6T
D2700/vm-100-disk-0  logicalused           3.31T
D2700/vm-100-disk-0  logicalreferenced     3.31T

logicalused/logicalreferenced (3.31T) seems to be the value that OpenMediaVault is showing me. However, it says that 10.6T is used/written. I don't understand how ~3T of content can take up ~10T of space.

Is there perhaps a configuration issue with how I set it up within OpenMediaVault? I have the disk formatted as ext4 within OpenMediaVault. Could that cause problems, given that the disk within Proxmox is really a zvol on a ZFS raidz3?
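One thing I wondered about: could it be that space freed inside the guest never gets released back to the zvol? As far as I understand, unless the virtual disk passes discard/TRIM through, blocks that ext4 frees stay allocated in the zvol, so its usage only ever grows toward the 10.2T volsize. A quick check from inside the guest (assuming the disk shows up there as /dev/sdb):
Code:
# DISC-GRAN/DISC-MAX of 0B means TRIM is not being passed through
lsblk -D /dev/sdb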

Here's the full output I was looking at:
Code:
root@lab:~# zfs get all
NAME                 PROPERTY              VALUE                  SOURCE
D2700                type                  filesystem             -
D2700                creation              Sat Feb  8  0:43 2020  -
D2700                used                  10.6T                  -
D2700                available             3.46M                  -
D2700                referenced            307K                   -
D2700                compressratio         1.00x                  -
D2700                mounted               yes                    -
D2700                quota                 none                   default
D2700                reservation           none                   default
D2700                recordsize            128K                   default
D2700                mountpoint            /D2700                 default
D2700                sharenfs              off                    default
D2700                checksum              on                     default
D2700                compression           on                     local
D2700                atime                 on                     default
D2700                devices               on                     default
D2700                exec                  on                     default
D2700                setuid                on                     default
D2700                readonly              off                    default
D2700                zoned                 off                    default
D2700                snapdir               hidden                 default
D2700                aclinherit            restricted             default
D2700                createtxg             1                      -
D2700                canmount              on                     default
D2700                xattr                 on                     default
D2700                copies                1                      default
D2700                version               5                      -
D2700                utf8only              off                    -
D2700                normalization         none                   -
D2700                casesensitivity       sensitive              -
D2700                vscan                 off                    default
D2700                nbmand                off                    default
D2700                sharesmb              off                    default
D2700                refquota              none                   default
D2700                refreservation        none                   default
D2700                guid                  373170307492478138     -
D2700                primarycache          all                    default
D2700                secondarycache        all                    default
D2700                usedbysnapshots       0B                     -
D2700                usedbydataset         307K                   -
D2700                usedbychildren        10.6T                  -
D2700                usedbyrefreservation  0B                     -
D2700                logbias               latency                default
D2700                objsetid              54                     -
D2700                dedup                 off                    default
D2700                mlslabel              none                   default
D2700                sync                  standard               default
D2700                dnodesize             legacy                 default
D2700                refcompressratio      1.00x                  -
D2700                written               307K                   -
D2700                logicalused           3.31T                  -
D2700                logicalreferenced     42K                    -
D2700                volmode               default                default
D2700                filesystem_limit      none                   default
D2700                snapshot_limit        none                   default
D2700                filesystem_count      none                   default
D2700                snapshot_count        none                   default
D2700                snapdev               hidden                 default
D2700                acltype               off                    default
D2700                context               none                   default
D2700                fscontext             none                   default
D2700                defcontext            none                   default
D2700                rootcontext           none                   default
D2700                relatime              off                    default
D2700                redundant_metadata    all                    default
D2700                overlay               off                    default
D2700                encryption            off                    default
D2700                keylocation           none                   default
D2700                keyformat             none                   default
D2700                pbkdf2iters           0                      default
D2700                special_small_blocks  0                      default
D2700/vm-100-disk-0  type                  volume                 -
D2700/vm-100-disk-0  creation              Mon Apr 13 22:15 2020  -
D2700/vm-100-disk-0  used                  10.6T                  -
D2700/vm-100-disk-0  available             3.46M                  -
D2700/vm-100-disk-0  referenced            10.6T                  -
D2700/vm-100-disk-0  compressratio         1.00x                  -
D2700/vm-100-disk-0  reservation           none                   default
D2700/vm-100-disk-0  volsize               10.2T                  local
D2700/vm-100-disk-0  volblocksize          8K                     default
D2700/vm-100-disk-0  checksum              on                     default
D2700/vm-100-disk-0  compression           on                     inherited from D2700
D2700/vm-100-disk-0  readonly              off                    default
D2700/vm-100-disk-0  createtxg             16785                  -
D2700/vm-100-disk-0  copies                1                      default
D2700/vm-100-disk-0  refreservation        10.5T                  local
D2700/vm-100-disk-0  guid                  11003192779661825820   -
D2700/vm-100-disk-0  primarycache          all                    default
D2700/vm-100-disk-0  secondarycache        all                    default
D2700/vm-100-disk-0  usedbysnapshots       0B                     -
D2700/vm-100-disk-0  usedbydataset         10.6T                  -
D2700/vm-100-disk-0  usedbychildren        0B                     -
D2700/vm-100-disk-0  usedbyrefreservation  0B                     -
D2700/vm-100-disk-0  logbias               latency                default
D2700/vm-100-disk-0  objsetid              1667                   -
D2700/vm-100-disk-0  dedup                 off                    default
D2700/vm-100-disk-0  mlslabel              none                   default
D2700/vm-100-disk-0  sync                  standard               default
D2700/vm-100-disk-0  refcompressratio      1.00x                  -
D2700/vm-100-disk-0  written               10.6T                  -
D2700/vm-100-disk-0  logicalused           3.31T                  -
D2700/vm-100-disk-0  logicalreferenced     3.31T                  -
D2700/vm-100-disk-0  volmode               default                default
D2700/vm-100-disk-0  snapshot_limit        none                   default
D2700/vm-100-disk-0  snapshot_count        none                   default
D2700/vm-100-disk-0  snapdev               hidden                 default
D2700/vm-100-disk-0  context               none                   default
D2700/vm-100-disk-0  fscontext             none                   default
D2700/vm-100-disk-0  defcontext            none                   default
D2700/vm-100-disk-0  rootcontext           none                   default
D2700/vm-100-disk-0  redundant_metadata    all                    default
D2700/vm-100-disk-0  encryption            off                    default
D2700/vm-100-disk-0  keylocation           none                   default
D2700/vm-100-disk-0  keyformat             none                   default
D2700/vm-100-disk-0  pbkdf2iters           0                      default
 
I guess the other question is how I go about fixing this. I can't mount the disk image in OpenMediaVault without it freezing with the "io error" problem, so I don't know how else to remove files to decrease the usage.

Could I possibly switch to more aggressive compression or something to give myself a bit of space temporarily? Then I should be able to mount the disk in OpenMediaVault and remove some files. I'd rather not do that, but it seems like it might work?

Any ideas? Thanks for the input so far!
 
So: compression is enabled on your pool, but it is achieving a 1.00x ratio, and changing compression settings only affects newly written blocks, which cannot be written due to the lack of space. Such a wide raidz3 vdev is also not a recommended layout for performance reasons, but that is another matter.

In general, every raidz level is very unpredictable with respect to space usage. Here on the forums there are a number of threads about exactly this problem. Best to always use mirrors (for the best performance and the best predictability). I'd say your 10 TB volume is just too darn big for your 13.6 TB raidz3 pool; keep in mind the pool size is the sum of all disks, before any redundancy. The data itself is only one part of the equation; the others are parity, metadata, redundant metadata, allocation padding in the vdev layer, and so on.
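One concrete source of that unpredictability is raidz parity and padding overhead, which hits zvols with a small volblocksize especially hard (yours is the 8K default). A back-of-the-envelope calculation, assuming ashift=12, i.e. 4 KiB sectors; do verify that, since ashift=9 changes the numbers a lot:
Code:
# one 8 KiB volblocksize block = 2 data sectors at ashift=12
# raidz3 adds 3 parity sectors                            -> 5 sectors
# raidz rounds allocations up to a multiple of (parity+1)=4 -> 8 sectors
# 8 sectors * 4 KiB = 32 KiB allocated per 8 KiB written, i.e. up to 4x inflation
zdb -C D2700 | grep ashift                 # confirm the pool's actual ashift
zfs get volblocksize D2700/vm-100-disk-0   # confirm the zvol's block size

That 4x worst case is in the same ballpark as the ~3.2x ratio between your logicalused (3.31T) and used (10.6T) above.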

The fact that you have only one zvol in your pool, and that it is full, shrinks the possible countermeasures to a minimum. You could reclaim some space by trimming the filesystem, but to do that you need to mount it, and without any free space you will just hit the same I/O errors again. Quite a pickle you're in. To restore functionality, I'd consider recreating everything. The first step to get the pool working again is to add new space; in my opinion that is the only safe thing you can do. After that, you can mount the filesystem, trim it, and see if that reduces the space usage. Depending on the version of ZFS you may be able to remove the added space again afterwards (although top-level device removal is not supported on pools that contain raidz vdevs), and you should then have a somewhat working pool. The main problem is that you may well end up running into the same problem again.
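A rough sketch of that recovery path; the spare device name and the VM disk slot are assumptions, and zpool add needs -f here because a single disk doesn't match the pool's raidz3 replication level:
Code:
# 1. temporarily add a spare disk to give the pool free space
zpool add -f D2700 /dev/sdX

# 2. let the guest pass TRIM down to the zvol (assumes the disk is attached as scsi0)
qm set 100 -scsi0 D2700:vm-100-disk-0,discard=on

# 3. boot the VM, delete data, then run inside the guest:
fstrim -av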
 