Buffer I/O errors

GarlicToe

Active Member
Jul 19, 2018
LS,

I've been having trouble with my Proxmox 5.2-2 machine: VMs are acting up, displaying an exclamation mark and

"running (io-error)"

next to Status in the Summary screen. Shutting down the Proxmox machine takes a good 4 minutes. Shortly after boot, a long list of Buffer I/O errors is displayed:

Code:
Buffer I/O error on device-dm-6, logical block 22938112
[..] 
Buffer I/O error on device-dm-6, logical block 26701
Buffer I/O error on device-dm-6, logical block 26702
Buffer I/O error on device-dm-6, logical block 26703
Buffer I/O error on device-dm-6, logical block 26704
[...]
Buffer I/O error on device-dm-12, logical block 97796

SMART status for the Samsung SSD 960 EVO disk (the only one connected at the moment), which is a couple of months old:

Code:
SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning:                   0x00
Temperature:                        43 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    1%
Data Units Read:                    756,305 [387 GB]
Data Units Written:                 2,153,166 [1.10 TB]
Host Read Commands:                 31,498,573
Host Write Commands:                130,373,410
Controller Busy Time:               3,332
Power Cycles:                       17
Power On Hours:                     493
Unsafe Shutdowns:                   8
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               43 Celsius
Temperature Sensor 2:               70 Celsius
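
As a sanity check on the bracketed figures in the SMART output above: NVMe counts one "data unit" as 1000 blocks of 512 bytes, so the raw counters can be converted by hand. A minimal sketch (the function name is only illustrative, not part of smartctl):

```python
# NVMe SMART log: one data unit = 1000 * 512 bytes = 512,000 bytes.
DATA_UNIT_BYTES = 1000 * 512


def data_units_to_bytes(units: int) -> int:
    """Convert an NVMe 'Data Units Read/Written' counter to bytes."""
    return units * DATA_UNIT_BYTES


# Counters copied from the SMART output above.
written = data_units_to_bytes(2_153_166)
read = data_units_to_bytes(756_305)

# Decimal units, as in the bracketed values.
print(f"written: {written / 1e12:.2f} TB")  # written: 1.10 TB
print(f"read:    {read / 1e9:.0f} GB")      # read:    387 GB
```

Both values match what the SMART output reports, so the counters themselves look consistent.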

Configuration is pretty basic as I'm quite new to this. Running LVM.
 
LVM or LVM-Thin? Is your storage maybe full?

No, it's not full:

Code:
root@buddha:~# df -h
Filesystem            Size  Used Avail Use% Mounted on
udev                   16G     0   16G   0% /dev
tmpfs                 3.2G  9.1M  3.2G   1% /run
/dev/mapper/pve-root   57G  8.5G   46G  16% /
tmpfs                  16G   43M   16G   1% /dev/shm
tmpfs                 5.0M     0  5.0M   0% /run/lock
tmpfs                  16G     0   16G   0% /sys/fs/cgroup
/dev/fuse              30M   20K   30M   1% /etc/pve
tmpfs                 3.2G     0  3.2G   0% /run/user/0

Sorry, it is LVM-Thin, I think:

Code:
id        type       content                                             path/target  shared
local     directory  vzdump backup file, ISO image, container template   /var/lib/vz  no
local-vm  lvm-thin   disk image, container                               -            no
 
Code:
  LV              VG  Attr       LSize   Pool Origin          Data%  Meta%  Move Log Cpy%Sync Convert
  base-109-disk-1 pve Vri---tz-k  15.00g data
  data            pve twi-aotzD- 147.62g                      100.00 4.93
  root            pve -wi-ao----  58.00g
  swap            pve -wi-ao----   8.00g
  vm-100-disk-1   pve Vwi-a-tz--  20.00g data                 8.15
  vm-101-disk-1   pve Vwi-aotz-- 100.00g data                 90.66
  vm-102-disk-1   pve Vwi-a-tz--  32.00g data                 5.69
  vm-103-disk-1   pve Vwi-aotz--  32.00g data                 59.38
  vm-104-disk-1   pve Vwi-a-tz--  32.00g data                 5.86
  vm-105-disk-1   pve Vwi-a-tz--   8.00g data                 12.81
  vm-106-disk-1   pve Vwi-a-tz--   8.00g data                 38.25
  vm-107-disk-1   pve Vwi-aotz--   8.00g data                 15.08
  vm-108-disk-1   pve Vwi-a-tz--  20.00g data base-109-disk-1 97.82
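
Note that df -h only reports mounted filesystems, so it says nothing about the thin pool. In the lvs listing above, the pool "data" shows Data% 100.00, and the "D" in its Attr string is, per lvs(8), the flag for a thin pool that is out of data space. That would be consistent with the io-error status, though it is only a reading of the output, not a confirmed diagnosis. A minimal sketch for spotting this mechanically, assuming the default lvs table layout as pasted (the helper name and the 95% threshold are made up for illustration):

```python
# Excerpt of the lvs listing above (default column layout assumed).
LVS_OUTPUT = """\
  LV              VG  Attr       LSize   Pool Origin          Data%  Meta%  Move Log Cpy%Sync Convert
  base-109-disk-1 pve Vri---tz-k  15.00g data
  data            pve twi-aotzD- 147.62g                      100.00 4.93
  root            pve -wi-ao----  58.00g
  vm-101-disk-1   pve Vwi-aotz-- 100.00g data                 90.66
"""


def full_thin_pools(lvs_text, threshold=95.0):
    """Return (vg/lv, data_percent) for thin pools at or above threshold."""
    hits = []
    for line in lvs_text.splitlines():
        fields = line.split()
        # Skip the header and rows without a Data% column (e.g. root, swap).
        if len(fields) < 5 or fields[0] == "LV":
            continue
        name, vg, attr = fields[0], fields[1], fields[2]
        # First Attr character 't' marks a thin pool; 'V' is a thin volume.
        if not attr.startswith("t"):
            continue
        # A pool row has no Pool/Origin entries, so Data% is the fifth field.
        data_pct = float(fields[4])
        if data_pct >= threshold:
            hits.append((f"{vg}/{name}", data_pct))
    return hits


print(full_thin_pools(LVS_OUTPUT))  # [('pve/data', 100.0)]
```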
 
I'm back again...

I replaced the hard drive (!), used cfdisk to give it a GPT label and mkfs.ext4 to format it. I added it to fstab by UUID with ext4 defaults 0 0, mounted at /mnt/disk. Then I set up a VM and added a mountpoint pointing to /mnt/disk. All was well until now.
I'm running into similar problems: VMs are crashing and the Proxmox WebUI is not accessible. The system's output shows a whole screen full of errors, including:

loop: write error at byte offset 17xxxxxxx, sector 3xxxxxxx
Buffer I/O error on device loop0, logical block 109xxxxxx
Buffer I/O error on device loop0, logical block 109xxxxxx, lost async page write
EXT4-fs error (device loop0)
EXT4-fs (sda1): VFS: Can't find ext4 filesystem

The drive is definitely not full this time.
 
What does lvs say? :) Also vgs?

Code:
root@buddha:~# vgs
  VG  #PV #LV #SN Attr   VSize   VFree
  pve   1  11   0 wz--n- 232.63g 16.00g

Code:
root@buddha:~# lvs
  LV              VG  Attr       LSize   Pool Origin          Data%  Meta%  Move Log Cpy%Sync Convert
  base-109-disk-1 pve Vri---tz-k  15.00g data
  data            pve twi-aotz-- 147.62g                      75.46  3.66
  root            pve -wi-ao----  58.00g
  swap            pve -wi-ao----   8.00g
  vm-100-disk-1   pve Vwi-a-tz--  20.00g data                 47.19
  vm-101-disk-1   pve Vwi-aotz-- 100.00g data                 46.55
  vm-102-disk-1   pve Vwi-aotz--   5.00g data                 38.90
  vm-103-disk-1   pve Vwi-a-tz-- 100.00g data                 25.21
  vm-106-disk-1   pve Vwi-a-tz--   8.00g data                 38.25
  vm-107-disk-1   pve Vwi-aotz--   8.00g data                 18.32
  vm-110-disk-1   pve Vwi-aotz--  20.00g data base-109-disk-1 72.81
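
One thing the listing above shows is how far the thin pool is overcommitted: thin volumes can add up to more than the pool holds, and the pool only runs out once the guests actually write that much. A rough sketch of the arithmetic, assuming every LSize is printed with the "g" suffix as here (the function names are illustrative only):

```python
# The lvs listing above, copied verbatim.
LVS_OUTPUT = """\
  LV              VG  Attr       LSize   Pool Origin          Data%  Meta%  Move Log Cpy%Sync Convert
  base-109-disk-1 pve Vri---tz-k  15.00g data
  data            pve twi-aotz-- 147.62g                      75.46  3.66
  root            pve -wi-ao----  58.00g
  swap            pve -wi-ao----   8.00g
  vm-100-disk-1   pve Vwi-a-tz--  20.00g data                 47.19
  vm-101-disk-1   pve Vwi-aotz-- 100.00g data                 46.55
  vm-102-disk-1   pve Vwi-aotz--   5.00g data                 38.90
  vm-103-disk-1   pve Vwi-a-tz-- 100.00g data                 25.21
  vm-106-disk-1   pve Vwi-a-tz--   8.00g data                 38.25
  vm-107-disk-1   pve Vwi-aotz--   8.00g data                 18.32
  vm-110-disk-1   pve Vwi-aotz--  20.00g data base-109-disk-1 72.81
"""


def size_g(text):
    """Parse an LSize such as '147.62g'; other unit suffixes are not handled."""
    if not text.endswith("g"):
        raise ValueError(f"unexpected size unit: {text}")
    return float(text[:-1])


def overcommit(lvs_text, pool_name="data"):
    """Return (sum of thin volume sizes, pool size, ratio) for one pool."""
    pool_size = None
    virtual = 0.0
    for line in lvs_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] == "LV":
            continue
        name, attr, size = fields[0], fields[2], fields[3]
        if attr.startswith("t") and name == pool_name:
            pool_size = size_g(size)  # the pool itself
        elif attr.startswith("V") and len(fields) > 4 and fields[4] == pool_name:
            virtual += size_g(size)   # a thin volume living in that pool
    if pool_size is None:
        raise ValueError(f"pool {pool_name!r} not found")
    return virtual, pool_size, virtual / pool_size


virtual, pool, ratio = overcommit(LVS_OUTPUT)
print(f"{virtual:.2f}g provisioned on a {pool:.2f}g pool ({ratio:.2f}x)")
# 276.00g provisioned on a 147.62g pool (1.87x)
```

So the volumes could in principle grow to nearly twice what the pool can hold, even though it currently sits at 75.46%.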