Hi,
I'll explain my problem and the investigation I've done first, and then end with my concrete questions.
# Inside the VM
I have a VM with 3 drives attached: the boot drive, and 2 drives that together make up an LVM logical volume.
If I have them all attached when starting this VM, it boots, mounts the logical volume, and then within 10 seconds it becomes unresponsive, and the GUI states "io-error".
I've managed to get into the VM without it crashing by detaching one of the drives and waiting for the LVM start job to end; I can then reattach the drive, and the LV never gets mounted (mounting it appears to be what triggers the issue!).
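For reference, this is roughly what I do inside the VM once it's up (I'm typing from memory, and the mount point is just an example):
Code:
# while the second drive is detached, watch the pending LVM activation job
systemctl list-jobs
# after reattaching the drive, activate the VG by hand
vgchange -ay storage
# mounting by hand is what seems to trigger the crash
mount /dev/storage/store-lv /mnt/store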
From looking around after that, as far as I can tell the LV is not overfull, nor is the boot drive.
I don't have much experience with LVM, but I've dumped some info below in case there are any glaring red flags for you pros.
Here, "storage" is the volume group that causes the issue.
Code:
root@shares:/home/ubuntu# lvdisplay
  --- Logical volume ---
  LV Path                /dev/storage/store-lv
  LV Name                store-lv
  VG Name                storage
  LV UUID                8SA1Lf-Lzoi-U61H-qGm3-6R1C-okjd-nWyXWT
  LV Write Access        read/write
  LV Creation host, time shares, 2021-07-15 12:07:01 +0000
  LV Status              available
  # open                 0
  LV Size                <1.96 TiB
  Current LE             512902
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:1

  --- Logical volume ---
  LV Path                /dev/ubuntu-vg/ubuntu-lv
  LV Name                ubuntu-lv
  VG Name                ubuntu-vg
  LV UUID                787Rhl-fM1s-aG3d-NNbc-1ktJ-A8uD-dyhl33
  LV Write Access        read/write
  LV Creation host, time ubuntu-server, 2021-07-15 11:23:20 +0000
  LV Status              available
  # open                 1
  LV Size                <31.00 GiB
  Current LE             7935
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:0
Code:
root@shares:/home/ubuntu# vgdisplay
  --- Volume group ---
  VG Name               storage
  System ID
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  2
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               0
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               <1.96 TiB
  PE Size               4.00 MiB
  Total PE              512902
  Alloc PE / Size       512902 / <1.96 TiB
  Free  PE / Size       0 / 0
  VG UUID               FXzY0z-ZAjD-WwQM-Ae4v-sWtS-s0M3-YbMRyf

  --- Volume group ---
  VG Name               ubuntu-vg
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  2
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               1
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               <31.00 GiB
  PE Size               4.00 MiB
  Total PE              7935
  Alloc PE / Size       7935 / <31.00 GiB
  Free  PE / Size       0 / 0
  VG UUID               hVtuu3-XvcR-fYsL-GWpe-6VR1-DATY-uH0nND
Code:
root@shares:/home/ubuntu# pvdisplay
  --- Physical volume ---
  PV Name               /dev/sdb
  VG Name               storage
  PV Size               979.53 GiB / not usable 4.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              250759
  Free PE               0
  Allocated PE          250759
  PV UUID               nsv0AZ-ey0m-vs2T-AaQE-l3aJ-m1bq-ySI5te

  --- Physical volume ---
  PV Name               /dev/sdc
  VG Name               storage
  PV Size               1.00 TiB / not usable 4.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              262143
  Free PE               0
  Allocated PE          262143
  PV UUID               a8SOy9-E7mf-KLvK-e1a0-VPrQ-1p4L-eOHXIy

  --- Physical volume ---
  PV Name               /dev/sda3
  VG Name               ubuntu-vg
  PV Size               <31.00 GiB / not usable 0
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              7935
  Free PE               0
  Allocated PE          7935
  PV UUID               EO38KG-PfTM-NA6J-cBJc-jPsL-i22g-8pyINC
Code:
root@shares:/home/ubuntu# df -h
Filesystem                         Size  Used Avail Use% Mounted on
udev                               950M     0  950M   0% /dev
tmpfs                              199M  1.1M  198M   1% /run
/dev/mapper/ubuntu--vg-ubuntu--lv   31G  5.5G   24G  19% /
tmpfs                              994M     0  994M   0% /dev/shm
tmpfs                              5.0M     0  5.0M   0% /run/lock
tmpfs                              994M     0  994M   0% /sys/fs/cgroup
/dev/sda2                          976M  298M  611M  33% /boot
/dev/loop0                          56M   56M     0 100% /snap/core18/2074
/dev/loop1                          56M   56M     0 100% /snap/core18/2128
/dev/loop2                          62M   62M     0 100% /snap/core20/1081
/dev/loop3                          68M   68M     0 100% /snap/lxd/21545
/dev/loop4                          71M   71M     0 100% /snap/lxd/21029
/dev/loop5                          33M   33M     0 100% /snap/snapd/13170
/dev/loop6                          33M   33M     0 100% /snap/snapd/12883
tmpfs                              199M     0  199M   0% /run/user/1000
And here are the kernel logs upon reattaching the drive:
Code:
[ 368.530490] scsi 2:0:0:2: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
[ 368.538129] sd 2:0:0:2: Power-on or device reset occurred
[ 368.538605] sd 2:0:0:2: Attached scsi generic sg3 type 0
[ 368.538859] sd 2:0:0:2: [sdc] 2147483648 512-byte logical blocks: (1.10 TB/1.00 TiB)
[ 368.538971] sd 2:0:0:2: [sdc] Write Protect is off
[ 368.538975] sd 2:0:0:2: [sdc] Mode Sense: 63 00 00 08
[ 368.539139] sd 2:0:0:2: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 368.550920] sd 2:0:0:2: [sdc] Attached SCSI disk
Now I remount the drive and get the following kernel logs; within 10 seconds after this I get the io-error and the unresponsiveness:
Code:
[ 377.667450] EXT4-fs (dm-1): recovery complete
[ 377.668365] EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null)
Interestingly, the drive does indeed mount; I can enter the directory and look around the files.
It even lived long enough for me to run `du` and find that the files in the directory total only ~300 GB.
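For completeness, the check was just something like this (with /mnt/store again standing in for the real mount point):
Code:
du -sh /mnt/store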
# Outside the VM
Now, I'm not sure if this is relevant, but the 3 drives all come from LVM-thin storages.
For one of them, the VM disk takes up 100% of the underlying LVM-thin pool, but that should be fine as far as I understand?
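If it is relevant, my plan was to check the fill level of the thin pools on the host with something like this (I don't have the host shell in front of me, so the VG/pool names below are placeholders):
Code:
# Data% and Meta% of the thin pools backing the VM disks
lvs -a -o lv_name,vg_name,lv_size,data_percent,metadata_percent
# if a pool really were full, it could presumably be grown,
# assuming there is free space left in the VG:
# lvextend -L +100G <vg>/<thinpool>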
# Concrete questions
What actually is an "io-error"? How does Proxmox determine this? I'd love to know where I can find more logs.
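In case it matters, these are the host-side places I was planning to look, though I'm not sure they're the right ones (<vmid> is a placeholder):
Code:
# run state of the VM as Proxmox sees it
qm status <vmid> --verbose
# host journal around the time of the hang
journalctl -b | grep -iE 'i/o error|enospc|thin'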
Other than that, are there any extra LVM checks/logs you can think of that could help? I understand that's more of an LVM question than a Proxmox question!
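For what it's worth, here are the in-VM checks I could think of myself, in case any of them would produce something useful (the dmsetup name is my guess at how device-mapper mangles storage/store-lv):
Code:
# overall LVM health/consistency
pvs -v
vgck storage
# low-level device-mapper status of the LV
dmsetup status storage-store--lv
# kernel messages filtered for block/LVM errors
dmesg -T | grep -iE 'i/o error|device-mapper|blk_update'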
Thanks