Starting a VM causes unrelated disks to disappear

tsadigov · Aug 4, 2023

I have a strange situation.
Proxmox is installed on a btrfs RAID of 3 USB flash drives.
I have two additional btrfs RAIDs, on 2x18 TB and 2x15 TB drives, where I store data.
I have an NVMe drive where my VM images are.

When I start a VM from the NVMe drive, my unrelated 15 TB and 18 TB btrfs RAIDs disappear for some reason; lsblk no longer shows them. Again, the VM does not reference them.
When I start another VM, nothing happens.

When those btrfs RAIDs disappear, dmesg shows BTRFS I/O errors. My RAID on the USB flash drives is not affected.

[ 5854.043778] sd 9:0:0:0: [sdd] tag#16 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[ 5854.043780] sd 9:0:0:0: [sdd] tag#17 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[ 5854.043783] sd 9:0:0:0: [sdd] tag#16 CDB: Read(16) 88 00 00 00 00 04 71 61 67 38 00 00 00 48 00 00
[ 5854.043783] sd 9:0:0:0: [sdd] tag#18 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[ 5854.043785] sd 9:0:0:0: [sdd] tag#17 CDB: Read(16) 88 00 00 00 00 04 71 61 67 b8 00 00 00 38 00 00
[ 5854.043786] sd 9:0:0:0: [sdd] tag#18 CDB: Read(16) 88 00 00 00 00 04 71 61 66 78 00 00 00 08 00 00
[ 5854.043786] blk_update_request: I/O error, dev sdd, sector 19082078008 op 0x0:(READ) flags 0x0 phys_seg 9 prio class 0
[ 5854.043788] blk_update_request: I/O error, dev sdd, sector 19082078136 op 0x0:(READ) flags 0x0 phys_seg 7 prio class 0
[ 5854.043792] blk_update_request: I/O error, dev sdd, sector 19082077816 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
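
The log names the failing disk (sdd) but not the controller it hangs off. A minimal sketch for resolving a disk to the PCI device behind it, assuming a Linux host with sysfs mounted. The helper name pci_addrs_in_path and the DISK variable are made up for illustration; sdd is just the device from the log above.

```shell
#!/bin/sh
# Print every PCI address (domain:bus:device.function) found in a sysfs path,
# one per line, in the order they appear in the path.
pci_addrs_in_path() {
    printf '%s\n' "$1" | grep -oE '[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]'
}

# DISK is an illustrative variable; it defaults to the disk named in the log.
disk=${DISK:-sdd}

if [ -e "/sys/block/$disk" ]; then
    # /sys/block/<disk> is a symlink into the device tree. The last PCI
    # address in the resolved path is the controller closest to the disk.
    pci_addrs_in_path "$(readlink -f "/sys/block/$disk")" | tail -n 1
fi
```

Running lspci -nn -s with the printed address then shows what that controller is. The same address can be compared against any hostpci entries in the VM configuration.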

leesteken · Aug 4, 2023

Does the VM use PCI(e) passthrough? Then you might want to lookup IOMMU groups. Otherwise, I don't know what could be causing this.
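
For reference, IOMMU groups can be read straight from sysfs. This is a rough sketch, assuming the IOMMU is enabled so that /sys/kernel/iommu_groups is populated. The function name list_iommu_groups is made up, and the optional directory argument exists only so the function can be tried against a test tree.

```shell
#!/bin/sh
# List every IOMMU group together with the PCI devices it contains.
# $1 (optional): root of the IOMMU group tree, defaults to the sysfs location.
list_iommu_groups() {
    root=${1:-/sys/kernel/iommu_groups}
    for dev in "$root"/*/devices/*; do
        # Skip the unexpanded glob when no groups exist.
        [ -e "$dev" ] || continue
        group=${dev#"$root"/}   # strip the root prefix
        group=${group%%/*}      # keep only the group number
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done
}

list_iommu_groups
```

Devices that share a group cannot be split between host and guest, so a passed-through device should ideally sit in a group of its own.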

tsadigov · Aug 4, 2023

leesteken said:
Does the VM use PCI(e) passthrough? Then you might want to lookup IOMMU groups. Otherwise, I don't know what could be causing this.

Yes, it does use passthrough, but it had been working for months. I was reconfiguring the PC and messed something up. ~~But I found where it touches those disks.~~
(I was wrong at this point.) Thanks for your response.

tsadigov · Aug 4, 2023

@leesteken I thought it was a small thing, but it is getting stranger. I created a new VM and attached the old VM's image to it. On the host I unmounted the 10 and 18 TB hard drives that are affected. Still, when I start the VM those hard disks disappear from lsblk.
The latest weirdness is that they appear in the guest OS when I run lsblk there. Somehow the guest steals them from the host. I don't know why Proxmox/QEMU/KVM does that, because those hard drives are not referenced in the guest configuration:

balloon: 0
bios: ovmf
boot: order=scsi0
cores: 10
cpu: host,flags=+pcid
hostpci0: 0000:0b:00,pcie=1
ide2: none,media=cdrom
machine: q35
memory: 70960
name: AI.hub
net0: virtio=3A:B7:F7…5…7:44,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: black:204/vm-204-disk-0.qcow2,cache=writeback,size=80G
scsihw: virtio-scsi-pci
smbios1: uuid=da4c9d93-f41a-4701-9f25-63fba6254798
sockets: 1
tags: ok
usb1: host=1-6.4
vga: std,memory=512
vmgenid: 7b6e8266-5f6c-4b82-81ac-91e476ddcd35

I ended up with this mess after making some modifications to my computer. I upgraded the PSU from 650 W to 1200 W. I added a third NVMe drive (not used or referenced yet). I removed 2 SATA SSDs and copied some of their contents onto one of my NVMe drives.

Is it possible that detaching SATA cables and plugging them back in a different order causes such weird behavior?

tsadigov · Aug 4, 2023

leesteken said:
Does the VM use PCI(e) passthrough? Then you might want to lookup IOMMU groups. Otherwise, I don't know what could be causing this.

You are a genius. You pointed me in the right direction. PCIe passthrough is the reason the guest is stealing these hard drives from the host. The GPU's old address has changed: hostpci0: 0000:0b:00,pcie=1 now points to the SATA controller, and the GPU has moved to 0000:0c:00.
Thank you very much.
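
Since PCI addresses can shift after hardware changes, it may be worth checking what each hostpci entry resolves to before starting a VM. A minimal sketch, assuming the qm and lspci tools are available on the Proxmox host. The helper hostpci_addresses and the VMID variable are made up for illustration, and entries using a resource mapping instead of a raw address are simply skipped.

```shell
#!/bin/sh
# Read a VM config on stdin and print the PCI address of each hostpciN line.
# An optional "host=" prefix before the address is tolerated.
hostpci_addresses() {
    sed -n 's/^hostpci[0-9]*:[[:space:]]*\(host=\)\{0,1\}\([0-9A-Fa-f:.]\{1,\}\).*/\2/p'
}

# VMID is an illustrative variable; set it to the ID of the VM to inspect.
if [ -n "${VMID:-}" ] && command -v qm >/dev/null 2>&1 \
        && command -v lspci >/dev/null 2>&1; then
    qm config "$VMID" | hostpci_addresses | while read -r addr; do
        echo "hostpci address $addr currently resolves to:"
        # Shows vendor/device names and IDs for that address on the host.
        lspci -nn -s "$addr"
    done
fi
```

If the output names a SATA controller rather than the intended GPU, the hostpci line needs to be updated to the device's new address.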

Michael2006 · Sep 29, 2023

I had SATA device errors (I/O errors) and disks switching to a read-only state. The errors appear under heavy load, during a backup for example.

grep "sdb" /var/log/syslog | tail -n100

Code:

Sep 28 10:40:55 kernel: [404556.968434] sd 1:0:0:0: [sdb] tag#26 CDB: Read(16) 88 00 00 00 00 01 40 00 08 00 00 00 01 00 00 00
Sep 28 10:40:55 kernel: [404556.968435] blk_update_request: I/O error, dev sdb, sector 5368711168 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Sep 28 10:40:55 kernel: [404556.973909] sd 1:0:0:0: [sdb] tag#27 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
Sep 28 10:40:55 kernel: [404556.973910] sd 1:0:0:0: [sdb] tag#27 CDB: Read(16) 88 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00
Sep 28 10:40:55 kernel: [404556.973911] blk_update_request: I/O error, dev sdb, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Sep 28 10:40:59 smartd[975]: Device: /dev/sdb [SAT], state written to /var/lib/smartmontools/smartd.HGST_HUS726060ALE610-K8GENJGD.ata.state

In the Proxmox VM backup process log:

Code:

INFO:  52% (981.8 GiB of 1.8 TiB) in 2h 47m 56s, read: 93.6 MiB/s, write: 92.9 MiB/s
INFO:  52% (985.1 GiB of 1.8 TiB) in 2h 53m 51s, read: 9.8 MiB/s, write: 9.7 MiB/s
ERROR: job failed with err -5 - Input/output error
INFO: aborting backup job
INFO: resuming VM again

For an AMD EPYC series processor, I solved the issue by adding the boot parameters

"iommu=pt" and "amd_iommu=on"

Check whether IOMMU is supported:

sudo dmesg | grep -i iommu

sudo dmesg | grep -e DMAR -e IOMMU

Set IOMMU

vim /etc/default/grub

Add the parameters to the GRUB_CMDLINE_LINUX_DEFAULT string:

Code:

GRUB_CMDLINE_LINUX_DEFAULT="amd_iommu=on iommu=pt consoleblank=0"

!! Important: if you have an Intel processor, the parameter should be "intel_iommu=on" instead of "amd_iommu=on".

Run update-grub to apply the change, then reboot the server.

After the reboot, check for boot errors:

sudo dmesg | grep "error"

I/O to the SATA disks is a bit slower now, but the SATA devices are more stable: they no longer disappear or switch to a read-only state.
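
After the reboot it can help to confirm that the parameters actually reached the running kernel. A small sketch, assuming a Linux host where /proc/cmdline is readable. The function name has_kernel_param is made up, and the parameter list simply mirrors the AMD example above.

```shell
#!/bin/sh
# Succeed if the exact parameter $1 appears on the kernel command line.
# $2 (optional): file to read instead of /proc/cmdline.
has_kernel_param() {
    # Put each parameter on its own line, then match whole lines literally.
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

for p in iommu=pt amd_iommu=on; do
    if has_kernel_param "$p"; then
        echo "$p: present on the running kernel"
    else
        echo "$p: missing"
    fi
done
```

If a parameter is reported missing, the bootloader configuration was probably not regenerated, or the host boots through a different bootloader than GRUB.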

