Proxmox NOOB: I/O errors

Wokkes

New Member
May 22, 2023
Hi there,

I've installed Proxmox for the first time ever and it's been quite an experience and a journey. I've already learned a lot! I am currently running into a potential problem that I can't seem to fix.

I am seeing some I/O errors on the console of my Ubuntu 22.04.2 LTS VM. I've attached them as a screenshot. Most of them are blk_update_request errors.

The hardware I am using is an Intel NUC NUC8i3BEH with 32 GB of RAM (G.Skill Ripjaws F4-2400C16D-32GRS - Memory DDR4 (SO-DIMM) - 32 GB: 2 x 16GB - 260-PIN - 2400) and 1 TB of storage (WD Blue SN570 1TB - SSD M.2 2280 - PCIe 3.0 x4 (NVMe)). The NUC is second hand, the RAM and SSD are brand new.

The SCSI controller is set to VirtIO SCSI single. The cache seems to be disabled. I do not know why these settings are like this; I think they are the defaults (I don't think I changed them). I've attached a screenshot just to be sure.

I tried booting a rescue disk and running fsck -y on /dev/sda (which is the main drive). It reported that the partition was a DOS partition (which was weird). I checked it with GParted, which showed it as ext4, like I expected. Checking the partition with GParted showed no errors. I searched this forum for an answer, but most threads point to fsck. How can I troubleshoot this issue a bit better? Is there a way for me to fix this? What additional information is required?

Side-note: I also get the message that I do not have a valid subscription for Proxmox. It's a different problem that I am working on and it might not be related. Wanted to share it, just in case.

The output of pveversion -v is as follows:

Code:
proxmox-ve: 7.4-1 (running kernel: 5.15.107-2-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-5.15: 7.4-3
pve-kernel-5.15.107-2-pve: 5.15.107-2
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-1
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.6
libpve-storage-perl: 7.4-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.1-1
proxmox-backup-file-restore: 2.4.1-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.6.5
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-2
pve-firewall: 4.3-1
pve-firmware: 3.6-5
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.11-pve1
 

Attachments

  • SCR-20230522-tlqe.png
  • SCR-20230522-toga.png
Always useful on IO errors:
1.) check if your RAM is healthy by running memtest86+ overnight
2.) check the connection of your cabling (but probably not the problem with an M.2)
3.) start a long SMART self-test (for example smartctl -t long /dev/yourdisk)
4.) check the test result later with smartctl -a /dev/yourdisk and also use it to check whether your SMART attributes show anything that indicates errors
5.) check if you can flash newer firmware (you sometimes need to update your SSD to fix bugs)
6.) try another kernel so different drivers will be used (for example the 6.2 opt-in kernel)
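Steps 3, 4 and 6 can be sketched as a few shell functions. This is only a rough sketch: /dev/yourdisk is a placeholder, the function names and the RUN switch are made up for illustration, and pve-kernel-6.2 is assumed to be the package name of the 6.2 opt-in kernel on Proxmox VE 7.x. By default the commands are only printed so they can be reviewed first.

```shell
#!/bin/sh
# DISK is a placeholder; replace it with your actual device (see `lsblk`).
DISK="${DISK:-/dev/yourdisk}"

# Hypothetical helper: print the command unless RUN=1 is set,
# so nothing touches the disk or the package system by accident.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Step 3: start a long SMART self-test (runs in the background on the drive).
smart_long_test() {
    run smartctl -t long "$DISK"
}

# Step 4: print the full SMART report, including self-test log and attributes.
smart_report() {
    run smartctl -a "$DISK"
}

# Step 6: install the opt-in kernel (package name is an assumption for PVE 7.x);
# a reboot is needed afterwards to actually boot into it.
install_optin_kernel() {
    run apt update
    run apt install pve-kernel-6.2
}
```

After sourcing the file as root, something like `RUN=1 DISK=/dev/sdX smart_long_test` would start the test for real; without RUN=1 it only shows what would be executed.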
 
Thanks @Dunuin! I've followed up on your suggestions.

1. Memtest86+ didn't show anything unusual (ran for 7 hours)
2. Checked if the M.2 is connected properly: seems to be the case.
3. Running the self-test through smartctl wasn't possible with an NVMe SSD. I ended up using the nvme self-test instead, which didn't show any issues or errors (how to do this can be found here: https://unix.stackexchange.com/ques...-results-of-a-self-test-of-an-ssd-in-smartctl)
4. I've updated the BIOS of the machine to version BECFL357.86A.0092.2023.2014.114 (previous version was BECFL357.86A.0085.2020.1007.1917)
5. I haven't looked at the firmware of the SSD yet, since that would require me to set up a full Windows installation to test this out.
6. I haven't used another kernel yet, will test that out later tonight.
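
For anyone following along, an NVMe self-test with nvme-cli can look roughly like the sketch below. It is not necessarily the exact set of commands used here: /dev/nvme0 is an assumed controller path, and the function names and the RUN switch are made up for illustration. By default the commands are only printed.

```shell
#!/bin/sh
# NVME_DEV is an assumption; check `nvme list` for the actual device.
NVME_DEV="${NVME_DEV:-/dev/nvme0}"

# Hypothetical helper: print the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Start an extended device self-test (self-test code 2; code 1 is the short test).
nvme_start_selftest() {
    run nvme device-self-test "$NVME_DEV" -s 2
}

# Show the self-test log once the test has had time to finish.
nvme_selftest_log() {
    run nvme self-test-log "$NVME_DEV"
}

# Show the SMART / health information log (media errors, spare capacity, etc.).
nvme_health() {
    run nvme smart-log "$NVME_DEV"
}
```

Run as root with `RUN=1 nvme_start_selftest`, wait for the test to complete, then check `RUN=1 nvme_selftest_log` and `RUN=1 nvme_health`.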
 
A couple of hours in, I am not getting the errors anymore. I have a sneaking suspicion that it was the BIOS update that made the whole thing more stable... Will report back tomorrow or in a few days.
 
