I/O error on virtual disk

iggy

Hi,
Proxmox 8.1.3. A Linux (Rocky 9) virtual machine began to throw error messages in its log:
sd 0:0:0:0: [sda] tag#72 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=14s
sd 0:0:0:0: [sda] tag#72 Sense Key : Aborted Command [current]
sd 0:0:0:0: [sda] tag#72 Add. Sense: I/O process terminated
sd 0:0:0:0: [sda] tag#72 CDB: Write(10) 2a 00 00 25 41 a0 00 00 08 00
I/O error, dev sda, sector 2441632 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 2

Does this mean the underlying hardware disk is dying, or can Proxmox settings be adjusted to fix it?
Current settings are Cache: Default (No cache) and IO thread enabled.
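
For reference, the messages above come from the guest's kernel log; something like the following (plain Rocky 9 tooling, nothing specific to my setup) pulls out any further occurrences:
Code:
journalctl -k --grep 'I/O error'   # kernel messages from the current boot containing "I/O error"
dmesg -T | grep -i 'sda'           # same idea via the kernel ring buffer, with readable timestamps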
 
Hi,
do you also see anything in the host's log around the time the issue happened? Did you do any special operations like snapshot/backup around the time the issue happened? Please share the configuration of the VM qm config <ID> and the output of pveversion -v. What kind of storage is the virtual image on?
 
Hi,
a). I didn't find any host logs (except pveam.log). Should they be in /var/log/...?
b). Special operations: I did a backup from another server to the problem server.
c). The virtual storage is a RAID array of SSD disks.
d). qm config <ID>:
agent: 1
balloon: 2048
boot: order=scsi0;ide2;net0
cores: 8
cpu: x86-64-v2-AES
ide2: none,media=cdrom
memory: 12288
meta: creation-qemu=8.1.2,ctime=1704198525
name: MYNAME
net0: virtio=BC:25:11:7D:86:3F,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: My-OS-001:vm-120-disk-0,cache=writeback,iothread=1,size=250G
scsi1: My-Data-001:vm-120-disk-0,iothread=1,size=5000G
scsihw: virtio-scsi-single
smbios1: uuid=3c922320-2381-473d-bcf1-8d4bcca26cbb
sockets: 2
vmgenid: e490b963-1231-4ee7-99a6-f759b69dd67f

e). pveversion -v:
proxmox-ve: 8.1.0 (running kernel: 6.5.11-4-pve)
pve-manager: 8.1.3 (running version: 8.1.3/b46aac3b42da5d15)
proxmox-kernel-helper: 8.0.9
proxmox-kernel-6.5.11-4-pve-signed: 6.5.11-4
proxmox-kernel-6.5: 6.5.11-4
ceph-fuse: 17.2.7-pve1
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx7
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.7
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.1.0
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.5
libpve-network-perl: 0.9.4
libpve-rs-perl: 0.8.7
libpve-storage-perl: 8.0.5
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.0.4-1
proxmox-backup-file-restore: 3.0.4-1
proxmox-kernel-helper: 8.0.9
proxmox-mail-forward: 0.2.2
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.2
proxmox-widget-toolkit: 4.1.3
pve-cluster: 8.0.5
pve-container: 5.0.8
pve-docs: 8.1.3
pve-edk2-firmware: 4.2023.08-1
pve-firewall: 5.0.3
pve-firmware: 3.9-1
pve-ha-manager: 4.0.3
pve-i18n: 3.1.2
pve-qemu-kvm: 8.1.2-4
pve-xtermjs: 5.3.0-2
qemu-server: 8.0.10
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.0-pve3
 
a). I didn't find any host logs (except pveam.log). Should they be in /var/log/...?
Nowadays, the default is having the logs in the system journal. You can use journalctl -b to see the logs for the current boot.
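If you know roughly when the error occurred, you can narrow the output down, for example (the timestamps below are just placeholders):
Code:
journalctl -b                                                     # everything from the current boot
journalctl --since "2024-01-22 10:00" --until "2024-01-22 12:00"  # only a specific time window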
b). Special operations: I did a backup from another server to the problem server.
Sorry, I might be misunderstanding: The issue happened in the new VM that was created when restoring a backup?

c). The virtual storage is a RAID array of SSD disks.
Code:
scsi0: My-OS-001:vm-120-disk-0,cache=writeback,iothread=1,size=250G
scsi1: My-Data-001:vm-120-disk-0,iothread=1,size=5000G
Which of these is the sda disk in the VM? Please also share the storage configuration /etc/pve/storage.cfg.
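The disk sizes are usually enough to tell them apart from inside the guest, for example (standard tools, nothing Proxmox-specific):
Code:
lsblk -o NAME,SIZE,SERIAL,TYPE    # the 250G disk should correspond to scsi0, the 5000G one to scsi1
ls -l /dev/disk/by-id/            # stable identifiers for each disk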
 
Nowadays, the default is having the logs in the system journal. You can use journalctl -b to see the logs for the current boot.

Sorry, I might be misunderstanding: The issue happened in the new VM that was created when restoring a backup?


Which of these is the sda disk in the VM? Please also share the storage configuration /etc/pve/storage.cfg.

a). journalctl -b contains, once or twice a day, the following message that might be related to the disks:

Device: /dev/sde [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 82 to 84

b). The issue happened in the new VM that is being used as a storage server for backups. Disk /dev/sda is:
scsi0: My-OS-001:vm-120-disk-0,cache=writeback,iothread=1,size=250G. This is not the actual backup storage, but logs are written to it.

c). cat /etc/pve/storage.cfg

dir: local
path /var/lib/vz
content backup,iso,vztmpl

lvmthin: local-lvm
thinpool data
vgname pve
content rootdir,images

zfspool: Comp-Data-001
pool Comp-Data-001
content rootdir,images
mountpoint /Comp-Data-001
nodes Proxmox-01
sparse 1

zfspool: Comp-OS-001
pool Comp-OS-001
content images,rootdir
mountpoint /Comp-OS-001
nodes Proxmox-01
sparse 1
 
a). journalctl -b contains, once or twice a day, the following message that might be related to the disks:

Device: /dev/sde [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 82 to 84

b). The issue happened in the new VM that is being used as a storage server for backups. Disk /dev/sda is:
scsi0: My-OS-001:vm-120-disk-0,cache=writeback,iothread=1,size=250G. This is not the actual backup storage, but logs are written to it.
The storage configuration you posted does not contain a storage with ID My-OS-001. Assuming this is also a ZFS storage, what does zpool status -v show? Is /dev/sde part of the relevant ZFS pool? In any case, you might want to run a health check for that disk, e.g. using smartctl.
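For example (the pool names are taken from your storage.cfg; adjust the device if /dev/sde turns out not to be the suspect disk):
Code:
zpool status -v Comp-OS-001 Comp-Data-001   # check both pools for errors and degraded members
smartctl -a /dev/sde                        # full SMART report for the disk flagged by smartd
smartctl -t short /dev/sde                  # optionally start a short self-test; re-check with smartctl -a afterwards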