io-error how to identify, fix and check?

eiger3970 · Mar 19, 2016

Hello, I've received an io-error. I would like to know how to pinpoint the cause, so I can fix the right thing and avoid a repeat.

I ran some checks:

Code:

Proxmox > /var/log/messages:
Mar 13 06:25:01 proxmox rsyslogd: [origin software="rsyslogd" swVersion="5.8.11" x-pid="2481" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
Mar 14 06:25:01 proxmox rsyslogd: [origin software="rsyslogd" swVersion="5.8.11" x-pid="2481" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
Mar 15 06:25:01 proxmox rsyslogd: [origin software="rsyslogd" swVersion="5.8.11" x-pid="2481" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
Mar 16 06:25:02 proxmox rsyslogd: [origin software="rsyslogd" swVersion="5.8.11" x-pid="2481" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
Mar 17 06:25:02 proxmox rsyslogd: [origin software="rsyslogd" swVersion="5.8.11" x-pid="2481" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
Mar 18 06:25:02 proxmox rsyslogd: [origin software="rsyslogd" swVersion="5.8.11" x-pid="2481" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
Mar 18 21:31:46 proxmox kernel: vmbr0: port 5(tap145i0) entering disabled state
Mar 18 21:31:58 proxmox kernel: vmbr0: port 4(tap144i0) entering disabled state
Mar 18 21:33:10 proxmox kernel: device tap144i0 entered promiscuous mode
Mar 18 21:33:10 proxmox kernel: vmbr0: port 4(tap144i0) entering forwarding state
Mar 19 06:25:02 proxmox rsyslogd: [origin software="rsyslogd" swVersion="5.8.11" x-pid="2481" x-info="http://www.rsyslog.com"] rsyslogd was HUPed

Code:

root@proxmox:/var/log# pvs
  PV         VG   Fmt  Attr PSize   PFree
  /dev/sda3  pve  lvm2 a--  111.29g 13.87g
root@proxmox:/var/log# vgs
  VG   #PV #LV #SN Attr   VSize   VFree
  pve    1   3   0 wz--n- 111.29g 13.87g
root@proxmox:/var/log# lvs
  LV   VG   Attr      LSize  Pool Origin Data%  Move Log Copy%  Convert
  data pve  -wi-ao--- 55.79g                                           
  root pve  -wi-ao--- 27.75g                                           
  swap pve  -wi-ao--- 13.88g                                           
root@proxmox:/var/log# dmsetup ls --tree
pve-swap (253:1)
└─ (8:3)
pve-root (253:0)
└─ (8:3)
pve-data (253:2)
└─ (8:3)
root@proxmox:/var/log# mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
udev on /dev type devtmpfs (rw,relatime,size=10240k,nr_inodes=2045157,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,noexec,relatime,size=1638148k,mode=755)
/dev/mapper/pve-root on / type ext3 (rw,relatime,errors=remount-ro,user_xattr,acl,barrier=0,data=ordered)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
tmpfs on /run/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=3276280k)
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
/dev/mapper/pve-data on /var/lib/vz type ext3 (rw,relatime,errors=continue,user_xattr,acl,barrier=0,data=ordered)
/dev/sda2 on /boot type ext3 (rw,relatime,errors=continue,user_xattr,acl,barrier=0,data=ordered)
/dev/sde1 on /mnt/backup type ext4 (rw,relatime,barrier=1,data=ordered)
rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
/dev/fuse on /etc/pve type fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other)
beancounter on /proc/vz/beancounter type cgroup (rw,relatime,blkio,name=beancounter)
container on /proc/vz/container type cgroup (rw,relatime,freezer,devices,name=container)
fairsched on /proc/vz/fairsched type cgroup (rw,relatime,cpuacct,cpu,cpuset,name=fairsched)

I'm guessing a hard drive is faulty, but I would like to confirm which one before physically replacing it.

Also, how do I check all the hard drives in the server without physically opening up the machine?
I would like to confirm which hard drives Proxmox itself runs on and which hard drives hold the VMs, such as the CentOS VM where the io-error appeared in the GUI. The Proxmox GUI also showed CPU usage at 101.5%, but everything was running fine until I logged in and saw the io-error.
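
A minimal sketch for tying a mount point back to a block device, using only the `mount` output format shown above. The helper name `device_for_mountpoint` is made up for illustration, and `/var/lib/vz` is assumed to be where the `local` storage keeps VM images (the Proxmox default, but worth checking on your host).

```shell
#!/bin/sh
# Hypothetical helper: read `mount`-style lines on stdin and print the
# device mounted at the mount point given as the first argument.
device_for_mountpoint() {
    awk -v mp="$1" '$2 == "on" && $3 == mp { print $1 }'
}

# Example (run on the Proxmox host):
#   mount | device_for_mountpoint /var/lib/vz
```

With the output posted above this should print /dev/mapper/pve-data, and `pvs` lists /dev/sda3 as the only physical volume in the pve volume group, so the VM images would sit on sda.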

udo · Mar 19, 2016

Hi,
where do you see the io-error?

For looking at your hdds use smartmontools.
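
For example, a rough sketch of a health pass over the disks with `smartctl -H` from smartmontools. The function names are made up here, the `/dev/sd?` glob assumes plain SATA/SAS device names, and the two matched strings are the usual ATA and SCSI verdict lines; drives behind a RAID controller may need extra `-d` options.

```shell
#!/bin/sh
# Hypothetical helper: classify `smartctl -H` output read from stdin.
# Prints "healthy" if a passing verdict line is present, else "suspect".
smart_verdict() {
    if grep -Eq 'test result: PASSED|SMART Health Status: OK'; then
        echo healthy
    else
        echo suspect
    fi
}

# Hypothetical wrapper: run the health check on every /dev/sdX disk.
# Needs root and the smartmontools package installed on the host.
check_all_disks() {
    for dev in /dev/sd?; do
        [ -b "$dev" ] || continue   # skip if the glob matched nothing
        printf '%s: %s\n' "$dev" "$(smartctl -H "$dev" | smart_verdict)"
    done
}
```

Run `check_all_disks` as root on the Proxmox host; anything reported as `suspect` deserves a closer look with `smartctl -a`.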

Which storage do you use for your VM? Which cache settings?

Udo

eiger3970 · Mar 19, 2016

Thank you for the reply.
The io-error shows for VMs 4 and 5 in the Proxmox GUI: Server View > Datacenter > proxmox > VM > Summary > Status > Status > io-error.

I've had a look at smartmontools, and it looks like I install it on the Proxmox host.

The storage is a virtio hard disk (virtio0): local:VMID/vm-VMID-disk-1.raw,format=raw,size=50G.
Cache (do you mean memory?): 2.00GB.

VMs 1 - 3 run fine.
VMs 4 - 5 had the io-errors.
I deleted VMs 4 - 5, then restored VMs 4 - 5.

VM 5 now works.

VM 4 still has the same error, which is odd: the restore is from 20160313, and VM 4 worked until 20160319.

VM 4 Console:

Code:

/dev/VolGroup00/LogVol00: Superblock last write time is in the future. FIXED.
/dev/VolGroup00/LogVol00 contains a file system with errors, check forced.
/dev/VolGroup00/LogVol00:
Deleted inode 6737031 has zero dtime. FIXED.
/dev/VolGroup00/LogVol00: Inodes that were part of a corrupted orphan linked list found.
/dev/VolGroup00/LogVol00: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
(i.e., without -a or -p options)
[FAILED]

*** An error occurred during the file system check.
*** Dropping you to a shell; the system will reboot
*** when you leave the shell.
Give root password for maintenance
(or type Control-D to continue): _

Yes, I have run fsck manually; however, after it fixes the issues, a reboot returns to the same error.

Actually, I then ran fsck manually on the restored VM, and it's fixed.

So, how can I avoid this?
I guess setting up RAID, or at least knowing what causes the io-error, would be good.
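
One way to look for a cause on the host, sketched under assumptions: a failing disk usually leaves traces in the kernel log. The function name `disk_error_lines` is made up, and the patterns are common kernel message fragments, not an exhaustive list.

```shell
#!/bin/sh
# Hypothetical filter: keep only log lines that look like disk or
# file system trouble. Reads stdin, writes matching lines to stdout.
disk_error_lines() {
    grep -Ei 'I/O error|EXT[234]-fs error|medium error|ata[0-9]+.*(failed|error)'
}

# Example (run on the Proxmox host):
#   dmesg | disk_error_lines
#   disk_error_lines < /var/log/messages
```

The /var/log/messages excerpt posted earlier contains no lines of this kind. An empty result does not prove the disk is healthy, but it is a reason to also look at other causes.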

eiger3970 · Apr 25, 2016

The io-error seems to come back monthly.
I am now upgrading VM4's OS to rule out that possibility.
However, VM5 needs an older CentOS version, as the program it runs is very sensitive to any changes.
So, what is this io-error?
I need to fix this once and for all.

eiger3970 · May 3, 2016

So, is anyone able to help?
The error went away when I resized the hard disk from 100GB to 150GB.
However, the error has now come back, even at 150GB.
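
Another thing that may be worth ruling out, as an assumption and not something confirmed in this thread: QEMU can pause a guest when a write to its disk image fails, and a full host file system is one way that happens. Growing a raw image on the `local` storage can make that more likely, not less. A small sketch for checking how full the storage is; `usage_percent` is a made-up helper, and `/var/lib/vz` is taken from the `mount` output in the first post.

```shell
#!/bin/sh
# Hypothetical helper: read `df -P` output on stdin and print the
# Capacity (use %) of the first listed file system as a bare number.
usage_percent() {
    awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Example (run on the Proxmox host):
#   df -P /var/lib/vz | usage_percent
```

A value at or near 100 would mean the VM images have no room left to grow, whatever size the guest disk is set to.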

eiger3970 · May 28, 2016

Ok, I gave up on this one and bought a new disk.
