Proxmox - VM io-error

m.gaggiano

New Member
Oct 29, 2024
Hi all, I'm facing this problem.

Our dedicated server is a SYS-3 (from OVH) with these specs:
- Intel Xeon-E 2288G 8c/16t 3.7/5 GHz
- 128 GB DDR4 with ECC
- 3x 4 TB SATA HDD, software RAID

Proxmox 9.1 is installed on the server.

Code:
root@Proxmox01:~# pveversion -v
proxmox-ve: 9.1.0 (running kernel: 6.17.9-1-pve)
pve-manager: 9.1.6 (running version: 9.1.6/71482d1833ded40a)
proxmox-kernel-helper: 9.0.4
proxmox-kernel-6.17: 6.17.13-1
proxmox-kernel-6.17.13-1-pve-signed: 6.17.13-1
proxmox-kernel-6.17.9-1-pve-signed: 6.17.9-1
amd64-microcode: 3.20251202.1~bpo13+1
ceph-fuse: 19.2.3-pve1
corosync: 3.1.10-pve1
criu: 4.1.1-1
frr-pythontools: 10.4.1-1+pve1
ifupdown2: 3.3.0-1+pmx12
intel-microcode: 3.20251111.1~deb13u1
libjs-extjs: 7.0.0-5
libproxmox-acme-perl: 1.7.0
libproxmox-backup-qemu0: 2.0.2
libproxmox-rs-perl: 0.4.1
libpve-access-control: 9.0.5
libpve-apiclient-perl: 3.4.2
libpve-cluster-api-perl: 9.0.7
libpve-cluster-perl: 9.0.7
libpve-common-perl: 9.1.7
libpve-guest-common-perl: 6.0.2
libpve-http-server-perl: 6.0.5
libpve-network-perl: 1.2.5
libpve-rs-perl: 0.11.4
libpve-storage-perl: 9.1.0
libspice-server1: 0.15.2-1+b1
lvm2: 2.03.31-2+pmx1
lxc-pve: 6.0.5-4
lxcfs: 6.0.4-pve1
novnc-pve: 1.6.0-3
proxmox-backup-client: 4.1.4-1
proxmox-backup-file-restore: 4.1.4-1
proxmox-backup-restore-image: 1.0.0
proxmox-firewall: 1.2.1
proxmox-kernel-helper: 9.0.4
proxmox-mail-forward: 1.0.2
proxmox-mini-journalreader: 1.6
proxmox-offline-mirror-helper: 0.7.3
proxmox-widget-toolkit: 5.1.8
pve-cluster: 9.0.7
pve-container: 6.1.2
pve-docs: 9.1.2
pve-edk2-firmware: not correctly installed
pve-esxi-import-tools: 1.0.1
pve-firewall: 6.0.4
pve-firmware: 3.18-1
pve-ha-manager: 5.1.1
pve-i18n: 3.6.6
pve-qemu-kvm: 10.1.2-7
pve-xtermjs: 5.5.0-3
qemu-server: 9.1.4
smartmontools: 7.4-pve1
spiceterm: 3.4.1
swtpm: 0.8.0+pve3
vncterm: 1.9.1
zfsutils-linux: 2.4.0-pve1

The VM has this configuration

Code:
root@Proxmox01:~# qm config 102
agent: 1,fstrim_cloned_disks=1
allow-ksm: 0
balloon: 0
boot: order=scsi0;ide2;net0
cores: 2
cpu: x86-64-v2-AES,flags=+pcid;+spec-ctrl;+ssbd;+ibpb;+pdpe1gb;+aes
hotplug: disk,network,usb,memory
ide2: none,media=cdrom
memory: 32768
meta: creation-qemu=10.1.2,ctime=1771511179
name: Stack3
net0: virtio=02:00:00:18:f0:c8,bridge=vmbr0,firewall=1
numa: 1
onboot: 1
ostype: l26
scsi0: local:102/vm-102-disk-0.qcow2,aio=native,iothread=1,size=150G
scsihw: virtio-scsi-single
smbios1: uuid=61c5bbdb-0fb7-4bb6-93de-c0d7901d8fbb
sockets: 2
tags: production
vmgenid: 19a42a58-63c8-45e5-80fd-bb98b5599314

In the attachments there are:
- the smartctl results
- the df -hi and df -h results

What else is needed to debug this problem? I've already checked these results and don't see a problem, if there is one.
 

Attachments

Hi @m.gaggiano, welcome to the forum.

It seems you may have omitted the actual, complete error from your post. An "I/O error" is a generic symptom; it can be caused by many different things.
Please provide the actual error message and the system logs from around the time of the message (journalctl, dmesg).
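For reference, a minimal sketch of commands that usually capture that context on the Proxmox host (VM ID 102 is assumed from this thread; adjust the time window to the actual incident):

```shell
# Kernel messages of error priority or worse from the current boot
journalctl -b0 -k -p err

# Journal entries around the time the VM reported the io-error
# (replace the --since/--until window with the real incident time)
journalctl --since "2024-10-29 10:00" --until "2024-10-29 11:00" -p warning

# Current QEMU/Proxmox state of the affected VM
qm status 102 --verbose
```

Posting the output of these alongside the exact error text usually narrows things down quickly.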

Cheers


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Thanks @news and @bbgeek17 for the responses.
This is a dedicated server that we got from another colleague. I'll try to answer all the questions as best as possible.
Hello, in your VM you have set sockets: 2
- why?
No reason in particular. We can change it to 1 socket and increase the cores to 4 without a problem.
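For the record, that topology change can be applied from the host shell (VM 102 assumed; since CPU topology is not in this VM's hotplug list, the new values take effect after a restart):

```shell
# 1 socket x 4 cores instead of 2 x 2; applied on the next VM start
qm set 102 --sockets 1 --cores 4
```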
Do you use a ZFS raidz1 or an mdadm setup for your 3x 4 TB SATA HDDs?
- why?
The RAID configuration was already done by the service provider.
your disk is vm-102-disk-0.qcow2 - QCOW2
- why?
Because QCOW2 is the default format used when a new VM is created. We can convert it to RAW without a problem.
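A minimal sketch of that conversion, assuming the VM is shut down first and the `local` directory storage keeps its images under the standard `/var/lib/vz/images/` path (the raw filename here is an assumption for illustration):

```shell
# Stop the VM before touching its disk image
qm shutdown 102

# Convert qcow2 -> raw (-p shows progress)
qemu-img convert -p -f qcow2 -O raw \
    /var/lib/vz/images/102/vm-102-disk-0.qcow2 \
    /var/lib/vz/images/102/vm-102-disk-0.raw

# Point the VM at the new image, keeping the existing disk options
qm set 102 --scsi0 local:102/vm-102-disk-0.raw,aio=native,iothread=1
qm start 102
```

The GUI's "Move disk" action (with a target format of raw) does the same thing more safely and cleans up the old volume for you; the old qcow2 file otherwise remains as an unused disk.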
And use ZFS with a ZFS special device on SSDs when you set up a ZFS raidz1 vdev with HDDs.
We just realized that the ZFS disks aren't in RAID. We'll do some tweaking to fix this.
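Before tweaking, it is worth confirming how the three disks are actually laid out; something like this shows both the mdadm and the ZFS view (standard tooling only, no assumptions beyond a ZFS pool existing):

```shell
# Block device tree: partitions and any md/zfs members
lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT

# Software RAID (mdadm) status, if any arrays exist
cat /proc/mdstat

# ZFS pool layout and health
zpool status
zpool list
```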
you need top IOPS for random 4K read/write access.
At the moment we cannot add an SSD or other disks. Which filesystem is best for this situation?

It seems you may have omitted the actual, complete error from your post. An "I/O error" is a generic symptom; it can be caused by many different things.
Please provide the actual error message and the system logs from around the time of the message (journalctl, dmesg).
I didn't know what to post, so I just wrote the basic information. What commands should I run? In the meantime I've made some changes to the number of sockets and the disk type.
 
I didn't know what to post, so I just wrote the basic information. What commands should I run? In the meantime I've made some changes to the number of sockets and the disk type.
Presumably you saw the I/O error somewhere. You should post exactly what you saw and describe where you saw it (inside the VM, on the hypervisor, elsewhere?).
It could be a log snippet, a screenshot, anything that helps describe your situation. For all we know, you could have a network drive mounted and had a network hiccup...

The number of cores and the disk format are unlikely to play a role in the I/O error.

 
Because QCOW2 is the default format used when a new VM is created.
You generally want to use the ZFS storage type when you have a ZFS pool. The I/O error can have many causes; overprovisioning the storage and then attempting to use it all might be one. You might be able to find relevant logs with something like journalctl -b0 -krp 0..5. Otherwise check node > System > System Logs.
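Spelled out, that journalctl filter looks like this (each flag commented):

```shell
# -b0     : current boot only
# -k      : kernel messages only (like dmesg)
# -r      : newest entries first
# -p 0..5 : priorities emerg (0) through notice (5)
journalctl -b0 -krp 0..5
```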
 