Troubleshoot io-error status

flucimuc

Feb 27, 2020
Hi all,

For a few days now I have been getting an "io-error" status for one of my VMs when I try to boot it.
I can't explain why this happened; this setup has been running without problems for the last 10 months.

The host runs Proxmox 6.1-7, and the VM that has the problem runs OpenMediaVault 4.1.x.
The host has 8 hard disks and one SSD installed; the hard disks are split into two ZFS RAID pools (volume1 and volume2).

The VM itself was set up with two virtual disks: one holds the operating system, the other only data.
The OS disk is located on the SSD and the data disk (vm-100-disk-0) is on volume1.

If I detach the data disk, I can boot the VM normally.
So my guess is that something is wrong with the RAID, but I can't find anything out of the ordinary.
My second guess would be that the data disk on volume1 is somehow corrupt.
I've run a "zpool scrub" on volume1, but no errors were found.
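For reference, this is roughly how I started the scrub and checked the result afterwards:

Code:
root@pve:/# zpool scrub volume1        # start a scrub of the pool
root@pve:/# zpool status -v volume1    # watch scrub progress and look for READ/WRITE/CKSUM errors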

Here is some info regarding the storage situation:

Code:
root@pve:/# df -h
Filesystem            Size  Used Avail Use% Mounted on
udev                   16G     0   16G   0% /dev
tmpfs                 3.1G  9.2M  3.1G   1% /run
/dev/mapper/pve-root   57G  7.2G   47G  14% /
tmpfs                  16G   37M   16G   1% /dev/shm
tmpfs                 5.0M     0  5.0M   0% /run/lock
tmpfs                  16G     0   16G   0% /sys/fs/cgroup
/dev/nvme0n1p2        511M  304K  511M   1% /boot/efi
volume2                11T  256K   11T   1% /volume2
volume1               2.0M  256K  1.8M  13% /volume1
/dev/fuse              30M   16K   30M   1% /etc/pve
tmpfs                 3.1G     0  3.1G   0% /run/user/0
root@pve:/# lvs
  LV            VG  Attr       LSize    Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  data          pve twi-aotz-- <147.38g             39.62  2.86
  root          pve -wi-ao----   58.00g
  swap          pve -wi-ao----    8.00g
  vm-100-disk-0 pve Vwi-aotz--   60.00g data        63.63
  vm-101-disk-0 pve Vwi-aotz--   25.00g data        80.86
root@pve:/# vgs
  VG  #PV #LV #SN Attr   VSize   VFree
  pve   1   5   0 wz--n- 232.38g 16.00g
root@pve:/# pvs
  PV             VG  Fmt  Attr PSize   PFree
  /dev/nvme0n1p3 pve lvm2 a--  232.38g 16.00g
root@pve:/# zpool list -v
NAME        SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
volume1    36.2T  35.1T  1.13T        -         -    11%    96%  1.00x    ONLINE  -
  raidz1   36.2T  35.1T  1.13T        -         -    11%  96.9%      -  ONLINE
    sda        -      -      -        -         -      -      -      -  ONLINE
    sdb        -      -      -        -         -      -      -      -  ONLINE
    sdc        -      -      -        -         -      -      -      -  ONLINE
    sdd        -      -      -        -         -      -      -      -  ONLINE
volume2    14.5T  53.3G  14.5T        -         -     0%     0%  1.00x    ONLINE  -
  raidz1   14.5T  53.3G  14.5T        -         -     0%  0.35%      -  ONLINE
    sde        -      -      -        -         -      -      -      -  ONLINE
    sdf        -      -      -        -         -      -      -      -  ONLINE
    sdg        -      -      -        -         -      -      -      -  ONLINE
    sdh        -      -      -        -         -      -      -      -  ONLINE
root@pve:/# zpool status -v
  pool: volume1
 state: ONLINE
  scan: scrub repaired 0B in 0 days 19:38:40 with 0 errors on Wed Feb 26 13:07:48 2020
config:

        NAME        STATE     READ WRITE CKSUM
        volume1     ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sda     ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0

errors: No known data errors

  pool: volume2
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:00:02 with 0 errors on Sun Feb  9 00:24:05 2020
config:

        NAME        STATE     READ WRITE CKSUM
        volume2     ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0
            sdg     ONLINE       0     0     0
            sdh     ONLINE       0     0     0

errors: No known data errors

One thing that seems strange to me: why does "df -h" show a much smaller size for volume1? Is the volume not mounted correctly?
Now to my question: what can I do to fix this problem? If it cannot be fixed, is there any way I can save my data?

Thanks in advance for your help!
Sorry if I forgot to add some important information.
 
volume1 is at 96% capacity and has most probably stopped working. Your ZFS pool is too full, and you will get I/O errors when writing to it.
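A quick way to confirm where the space went (and whether there are snapshots that could be destroyed to free some of it) would be roughly:

Code:
root@pve:/# zpool list volume1               # overall size, allocated space and capacity of the pool
root@pve:/# zfs list -o space -r volume1     # usage broken down into data, snapshots and children
root@pve:/# zfs list -t snapshot -r volume1  # snapshots that could be removed to regain space

Freeing even a few percent should let the pool accept writes again.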
 
I was able to fix the problem by deleting volume2 and adding its 4 disks to the zpool volume1.
I have not fully understood yet why volume1 was already full, but at least the problem is solved for now and I can boot the VM and access my data.
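
For anyone running into the same thing, the steps were roughly the following, assuming the four freed disks are added as another raidz1 vdev (matching the existing layout) and that nothing on volume2 needs to be kept - double-check the device names against your own setup first:

Code:
root@pve:/# zpool destroy volume2                      # removes the second pool - all data on it is gone!
root@pve:/# zpool add volume1 raidz sde sdf sdg sdh    # add the freed disks as an additional raidz vdev to volume1
root@pve:/# zpool list -v volume1                      # verify the new vdev and the extra free space

If volume2 was also defined as a storage in Proxmox, that entry needs to be removed there as well (via the GUI or pvesm remove).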

Thanks to all who helped me.
 
Great catch, @LnxBil - I did not spot that one either :/
Doh. It seems we (humans) need to make the same mistakes over and over again ... :rolleyes:
 
