Hi,
I'll explain my problem, and what investigation I've done first, and then end with my concrete questions.
# Inside the VM
I have a VM with 3 drives attached: the boot drive, and 2 drives which together make up an LVM logical volume.
If I have them all attached when starting this VM, it boots, mounts the logical volume, and then within ~10 seconds becomes unresponsive, with the GUI stating "io-error".
I've managed to get into the VM without it crashing by detaching one of the two data drives and waiting for the LVM start job to end; I can then reattach the drive, and the LV never gets mounted automatically (mounting it appears to be what triggers the issue!).
From looking around after that, as far as I can tell neither the LV nor the boot drive is over-full.
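Those checks were roughly the following (from memory, so the exact invocations may differ):
Code:
# inside the VM: watch the pending LVM start job while a data drive is detached
systemctl list-jobs
# check filesystem fullness and the VG/LV layout
df -h
vgs
lvs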
I've not much experience with LVM, but I've dumped some fuller info below in case there are any glaring red flags for you pros.
Here, "storage" is the volume group that causes the issue:
Code:
root@shares:/home/ubuntu# lvdisplay
  --- Logical volume ---
  LV Path                /dev/storage/store-lv
  LV Name                store-lv
  VG Name                storage
  LV UUID                8SA1Lf-Lzoi-U61H-qGm3-6R1C-okjd-nWyXWT
  LV Write Access        read/write
  LV Creation host, time shares, 2021-07-15 12:07:01 +0000
  LV Status              available
  # open                 0
  LV Size                <1.96 TiB
  Current LE             512902
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:1

  --- Logical volume ---
  LV Path                /dev/ubuntu-vg/ubuntu-lv
  LV Name                ubuntu-lv
  VG Name                ubuntu-vg
  LV UUID                787Rhl-fM1s-aG3d-NNbc-1ktJ-A8uD-dyhl33
  LV Write Access        read/write
  LV Creation host, time ubuntu-server, 2021-07-15 11:23:20 +0000
  LV Status              available
  # open                 1
  LV Size                <31.00 GiB
  Current LE             7935
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:0
Code:
root@shares:/home/ubuntu# vgdisplay
  --- Volume group ---
  VG Name               storage
  System ID
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  2
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               0
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               <1.96 TiB
  PE Size               4.00 MiB
  Total PE              512902
  Alloc PE / Size       512902 / <1.96 TiB
  Free  PE / Size       0 / 0
  VG UUID               FXzY0z-ZAjD-WwQM-Ae4v-sWtS-s0M3-YbMRyf

  --- Volume group ---
  VG Name               ubuntu-vg
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  2
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               1
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               <31.00 GiB
  PE Size               4.00 MiB
  Total PE              7935
  Alloc PE / Size       7935 / <31.00 GiB
  Free  PE / Size       0 / 0
  VG UUID               hVtuu3-XvcR-fYsL-GWpe-6VR1-DATY-uH0nND
Code:
root@shares:/home/ubuntu# pvdisplay
  --- Physical volume ---
  PV Name               /dev/sdb
  VG Name               storage
  PV Size               979.53 GiB / not usable 4.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              250759
  Free PE               0
  Allocated PE          250759
  PV UUID               nsv0AZ-ey0m-vs2T-AaQE-l3aJ-m1bq-ySI5te

  --- Physical volume ---
  PV Name               /dev/sdc
  VG Name               storage
  PV Size               1.00 TiB / not usable 4.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              262143
  Free PE               0
  Allocated PE          262143
  PV UUID               a8SOy9-E7mf-KLvK-e1a0-VPrQ-1p4L-eOHXIy

  --- Physical volume ---
  PV Name               /dev/sda3
  VG Name               ubuntu-vg
  PV Size               <31.00 GiB / not usable 0
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              7935
  Free PE               0
  Allocated PE          7935
  PV UUID               EO38KG-PfTM-NA6J-cBJc-jPsL-i22g-8pyINC
Code:
root@shares:/home/ubuntu# df -h
Filesystem                         Size  Used Avail Use% Mounted on
udev                               950M     0  950M   0% /dev
tmpfs                              199M  1.1M  198M   1% /run
/dev/mapper/ubuntu--vg-ubuntu--lv   31G  5.5G   24G  19% /
tmpfs                              994M     0  994M   0% /dev/shm
tmpfs                              5.0M     0  5.0M   0% /run/lock
tmpfs                              994M     0  994M   0% /sys/fs/cgroup
/dev/sda2                          976M  298M  611M  33% /boot
/dev/loop0                          56M   56M     0 100% /snap/core18/2074
/dev/loop1                          56M   56M     0 100% /snap/core18/2128
/dev/loop2                          62M   62M     0 100% /snap/core20/1081
/dev/loop3                          68M   68M     0 100% /snap/lxd/21545
/dev/loop4                          71M   71M     0 100% /snap/lxd/21029
/dev/loop5                          33M   33M     0 100% /snap/snapd/13170
/dev/loop6                          33M   33M     0 100% /snap/snapd/12883
tmpfs                              199M     0  199M   0% /run/user/1000
And here are the kernel logs upon reattaching the drive:
Code:
[ 368.530490] scsi 2:0:0:2: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
[ 368.538129] sd 2:0:0:2: Power-on or device reset occurred
[ 368.538605] sd 2:0:0:2: Attached scsi generic sg3 type 0
[ 368.538859] sd 2:0:0:2: [sdc] 2147483648 512-byte logical blocks: (1.10 TB/1.00 TiB)
[ 368.538971] sd 2:0:0:2: [sdc] Write Protect is off
[ 368.538975] sd 2:0:0:2: [sdc] Mode Sense: 63 00 00 08
[ 368.539139] sd 2:0:0:2: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 368.550920] sd 2:0:0:2: [sdc] Attached SCSI disk
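For reference, once the disk shows up as sdc again I bring the LV back manually, roughly like this (/mnt/store is just a placeholder for my real mount point):
Code:
# re-scan for PVs so LVM sees the reattached disk, reactivate the VG, then mount the LV
pvscan
vgchange -ay storage
mount /dev/storage/store-lv /mnt/store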
Upon that mount, I get the following kernel logs; within 10 seconds of this I get the io-error and the unresponsiveness:
Code:
[ 377.667450] EXT4-fs (dm-1): recovery complete
[ 377.668365] EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null)
Interestingly, the LV does indeed mount: I can enter the directory and look around the files.
It even stayed up long enough for me to run `du` and confirm that the files in the directory total only ~300 GB.
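If it matters, the `du` invocation was something like (same placeholder mount point as above):
Code:
# stay on one filesystem so only the LV's contents are counted
du -sh --one-file-system /mnt/store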
# Outside the VM
Now, I'm not sure if this is relevant, but the 3 drives all live on LVM-thin storages on the Proxmox host.
For one of them, the VM disk takes up 100% of the underlying LVM-thin pool, but as far as I understand that should be fine?
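For context, that host-side usage figure comes from something like this on the Proxmox node (`data_percent`/`metadata_percent` are standard `lvs` reporting fields; pool names will differ per setup):
Code:
# on the Proxmox host: show how full each thin pool / thin volume is
lvs -a -o +data_percent,metadata_percent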
# Concrete questions
What actually is an "io-error"? How does Proxmox determine this? I'd love to know where I can find more logs.
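So far I've only looked at the guest's `dmesg`; on the host side I assume the journal and the VM status are the places to start, e.g.:
Code:
# on the Proxmox host -- journal entries around the time of the freeze
journalctl -b --since "-10min"
# and the VM's state as Proxmox sees it (<VMID> is a placeholder)
qm status <VMID> --verbose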
Other than that, are there any extra LVM checks/logs you can think of that could help? I understand that's more of an LVM question than a Proxmox question!
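For concreteness, these are the kinds of checks I mean (guessing at which might be useful):
Code:
# consistency-check the volume group metadata
vgck storage
# per-LV health attribute reported by LVM
lvs -o +lv_health_status
# low-level device-mapper status for each mapped volume
dmsetup status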
Thanks