io error on Truenas VM

Bronz

New Member
Mar 6, 2023
Hello, I'm a newbie here. I just got into Proxmox and I'm trying to build a home server on an old laptop (HP Pavilion dv6-7080ee) for simple SMB file sharing and a movie database (normal file sharing and Jellyfin, each in a separate VM). I am using four external USB HDDs (three 500 GB and one 1 TB). I found a tutorial on how to pass HDDs through without an HBA. I occasionally encounter an io-error, and TrueNAS is stuck until I stop the VM and start it again. This happens after 1-2 days of VM runtime.
 
Did you run smartctl -a /dev/yourDisk to see how healthy your disks are? In case of 3.5" HDDs they can get really hot when running 24/7. Also make sure not to stack them, as vibrations can cause write/read errors too.
 
Hi,
apart from what @Dunuin said, I'd also check /var/log/syslog from around the time the issue occurs. There might be related messages telling you more. Please also share the output of qm config <ID> for the affected VM and pveversion -v.
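
Since the syslog can get large, it may help to pull out only the disk-related lines first. A minimal sketch, assuming the log lives at /var/log/syslog as in this thread; the function name `disk_events` and the search patterns (smartd messages, kernel "I/O error" lines, USB disconnects) are illustrative assumptions, not a complete list of what can show up.

```shell
# Print only lines that look disk-related from the given log file.
# The patterns are assumptions: smartd device messages, kernel
# "I/O error" reports, and USB disconnect notices.
disk_events() {
    grep -Ei 'smartd\[[0-9]+\]|I/O error|USB disconnect' "$1"
}
```

It could be called as `disk_events /var/log/syslog`, and the result compared against the time the VM got stuck.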
 
Hello Dunuin, I attached the output, and no, they are not stacked.

 

Attachments

  • smartctl.txt
    22.9 KB
Hello Fiona, thanks for the reply too. The syslog was too large to upload, so I had to omit the recurring log message " Mar 04 00:00:47 proxmox kernel: CIFS: VFS: \\192.168.178.29\HomeServer BAD_NETWORK_NAME: \\192.168.178.29\HomeServer " to be able to upload the text file. I also included the qm config output and pveversion -v in the same text file.
 

Attachments

  • Syslog.txt
    46.8 KB
# tail -n15 /var/log/syslog
Code:
Mar  6 20:03:10 ipa kernel: [2191387.109617] audit: type=1400 audit(1678113190.636:72732): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-102_</var/lib/lxc>" name="/run/systemd/unit-root/proc/" pid=4433 comm="(netdata)" fstype="proc" srcname="proc" flags="rw, nosuid, nodev, noexec"
Any worries?
 
The Seagate Momentus reports a lot of raw read and seek errors, as well as loads of data that had to be fixed by ECC.
 
So should I use ECC memory? Or what is meant by that?
It's always better to use ECC memory, but that is not what is meant there.
The disk itself has ECC to fix minor errors, and that counter counts how much data was fixed by it.
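
To look at just the counters mentioned here, the attribute table printed by `smartctl -A` can be filtered. A minimal sketch, assuming the usual ten-column attribute table (name in column 2, raw value in column 10); the function name `smart_error_counters` is made up for illustration. Raw values of these attributes are vendor-specific on some drives, so a large number alone does not necessarily mean the disk is failing.

```shell
# Read `smartctl -A` output on stdin and print "<attribute> <raw value>"
# for the read, seek and ECC counters.
# Assumes the standard table layout with the raw value in column 10.
smart_error_counters() {
    awk '$2 ~ /^(Raw_Read_Error_Rate|Seek_Error_Rate|Hardware_ECC_Recovered)$/ { print $2, $10 }'
}
```

Usage would be something like `smartctl -A /dev/yourDisk | smart_error_counters`.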
 
When did the error occur? In the syslog I can see
Code:
Mar 04 19:47:46 proxmox smartd[773]: Device: /dev/sdc [SAT], removed ATA device: No such device
Mar 04 19:47:46 proxmox smartd[773]: Device: /dev/sdd [SAT], reconnected ATA device
but no direct messages related to IO errors.

Next time it happens, the following script should tell you which drive got the error:
Code:
root@pve701 ~ # cat query-block.pm
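#!/bin/perl

use strict;
use warnings;

# Proxmox VE helper for sending QMP commands to a running VM.
use PVE::QemuServer::Monitor qw(mon_cmd);

# The VM ID is the first command-line argument.
my $vmid = shift or die "need to specify vmid\n";

# Ask QEMU for the state of all block devices of the VM.
my $res = eval { mon_cmd($vmid, "query-block" ) };
die $@ if $@;
# Print the I/O status of each drive; anything other than "ok" marks the affected drive.
for my $blockdev ($res->@*) {
    print $blockdev->{device} . " got status " . $blockdev->{'io-status'} . "\n";
}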

root@pve701 ~ # perl query-block.pm 169
drive-ide2 got status ok
drive-scsi0 got status nospace
drive-scsi1 got status ok
drive-scsi2 got status ok
 
Hi again, the io-error appeared again. Can you help me with the code? How do I use it? The VM ID is 101; it's my TrueNAS server.
 
Copy the contents
Code:
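#!/bin/perl

use strict;
use warnings;

# Proxmox VE helper for sending QMP commands to a running VM.
use PVE::QemuServer::Monitor qw(mon_cmd);

# The VM ID is the first command-line argument.
my $vmid = shift or die "need to specify vmid\n";

# Ask QEMU for the state of all block devices of the VM.
my $res = eval { mon_cmd($vmid, "query-block" ) };
die $@ if $@;
# Print the I/O status of each drive; anything other than "ok" marks the affected drive.
for my $blockdev ($res->@*) {
    print $blockdev->{device} . " got status " . $blockdev->{'io-status'} . "\n";
}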
to a file called query-block.pm and then run it with perl query-block.pm 101
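
With several drives attached, the output can be narrowed down to the drives that are not fine. A small sketch, assuming the script prints lines in the "<device> got status <status>" form shown earlier in this thread; the helper name `failed_drives` is hypothetical.

```shell
# Read the script's output on stdin and print "<device> <status>"
# for every drive whose status is anything other than "ok".
failed_drives() {
    awk 'NF && $NF != "ok" { print $1, $NF }'
}
```

For example: `perl query-block.pm 101 | failed_drives`. No output would mean every drive reported ok at that moment.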
 
Thanks for the swift reply,

I got this from the script:

root@proxmox:~# perl query-block.pm 101
drive-ide2 got status ok
drive-scsi0 got status ok
drive-scsi1 got status ok
drive-scsi2 got status nospace
drive-scsi3 got status ok
drive-scsi4 got status ok
 
drive-scsi2 got status nospace
Now we know it was this drive (in this case). Is the drive actually full? Is it still using aio=threads? AFAICT, QEMU will interpret short writes (which, in practice, almost never happen) as out-of-space for aio=io_uring and aio=native, but not for aio=threads.
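
One way to check which aio mode each drive uses is to look at the drive lines in the VM configuration. A rough sketch, assuming `qm config` prints drive lines such as `scsi2: <volume>,aio=threads,...`; the function name `show_aio` is invented here, and a drive without an explicit aio= option simply uses whatever default the installed Proxmox VE version applies.

```shell
# Read `qm config <ID>` output on stdin and print "<drive> <aio mode>"
# for every ide/sata/scsi/virtio drive line.
show_aio() {
    grep -E '^(ide|sata|scsi|virtio)[0-9]+:' | while IFS= read -r line; do
        key=${line%%:*}
        case "$line" in
            *aio=*)
                mode=${line#*aio=}   # drop everything up to "aio="
                mode=${mode%%,*}     # drop any options that follow
                ;;
            *)
                mode="(not set, Proxmox default applies)"
                ;;
        esac
        printf '%s %s\n' "$key" "$mode"
    done
}
```

For the VM in this thread that would be `qm config 101 | show_aio`.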
 
Hello again, yes, I am using aio=threads, but the drive is not full; in fact, it has more than 400 GB free.
 
