Hi!
I'm running Proxmox as a base for my home lab setup, and today I noticed one of my VMs didn't start up again after a power outage.
Found this error in the PVE web UI:
Code:
kvm: -drive file=/dev/vmdata/vm-100-disk-0,if=none,id=drive-scsi1,format=raw,cache=none,aio=native,detect-zeroes=on: Could not open '/dev/vmdata/vm-100-disk-0': No such file or directory
TASK ERROR: start failed: command '/usr/bin/kvm -id 100 -name docker1-vm -chardev 'socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/100.pid -daemonize -smbios 'type=1,uuid=48e150e4-fa95-4536-b369-3e6463d9e07a' -smp '8,sockets=2,cores=4,maxcpus=8' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vnc unix:/var/run/qemu-server/100.vnc,x509,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 16304 -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'vmgenid,guid=43b4be43-bf71-455e-aa15-0016753190dc' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'VGA,id=vga,bus=pci.0,addr=0x2' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:e845266492fe' -drive 'file=/var/lib/vz/template/iso/turnkey-core-15.0-stretch-amd64.iso,if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/pve/vm-100-disk-0,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -drive 'file=/dev/vmdata/vm-100-disk-0,if=none,id=drive-scsi1,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi1,id=scsi1' -netdev 'type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown' -device 'e1000,mac=52:4E:B2:55:22:43,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' -machine 'type=pc'' failed: exit code 1
Then these lines in the boot logs indicate issues with one of the LVM VGs:
Code:
(..)
Oct 28 19:25:20 hp1 systemd[1]: Starting Activation of LVM2 logical volumes...
Oct 28 19:25:20 hp1 systemd[1]: Started LXC Container Initialization and Autoboot Code.
Oct 28 19:25:20 hp1 postfix[1939]: Postfix is running with backwards-compatible default settings
Oct 28 19:25:20 hp1 postfix[1939]: See http://www.postfix.org/COMPATIBILITY_README.html for details
Oct 28 19:25:20 hp1 postfix[1939]: To disable backwards compatibility use "postconf compatibility_level=2" and "postfix reload"
Oct 28 19:25:20 hp1 systemd[1]: Started OpenBSD Secure Shell server.
Oct 28 19:25:20 hp1 kernel: [ 28.480105] vmbr0: port 1(enp3s0) entered disabled state
Oct 28 19:25:21 hp1 systemd[1]: Started Proxmox VE Login Banner.
Oct 28 19:25:21 hp1 lvm[1893]: Check of pool vmdata/vmstore failed (status:1). Manual repair required!
(..)
Oct 28 19:25:25 hp1 lvm[1893]: 0 logical volume(s) in volume group "vmdata" now active
Oct 28 19:25:25 hp1 lvm[1893]: 9 logical volume(s) in volume group "pve" now active
Oct 28 19:25:25 hp1 systemd[1]: lvm2-activation-net.service: Main process exited, code=exited, status=5/NOTINSTALLED
Oct 28 19:25:25 hp1 systemd[1]: Failed to start Activation of LVM2 logical volumes.
Oct 28 19:25:25 hp1 systemd[1]: lvm2-activation-net.service: Unit entered failed state.
Oct 28 19:25:25 hp1 systemd[1]: lvm2-activation-net.service: Failed with result 'exit-code'.
- "pve" VG comes up fine, while "vmdata" doesnt. VM's that dont use resources from the "vmdata" VG obviously start OK.
I'm struggling to figure out what kind of manual repair is needed for the vmdata/vmstore VG/thin pool.
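The closest candidate I've found so far is lvconvert --repair on the thin pool, roughly as sketched below, but I haven't run it because I'm not sure it's the right fix for this situation (or whether something else should be done first):
Code:
# make sure nothing from the VG is active ("Open LV 0" in vgdisplay suggests nothing is)
vgchange -an vmdata
# have LVM run thin_repair on the pool metadata; as far as I understand it writes the
# repaired metadata to a spare LV in the VG and swaps it in (needs free space in the VG)
lvconvert --repair vmdata/vmstore
# then try activating the VG again
vgchange -ay vmdata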
Here is the output of a few LVM commands:
Code:
# pvs
PV VG Fmt Attr PSize PFree
/dev/sda3 pve lvm2 a-- 136.41g 16.00g
/dev/sdb1 vmdata lvm2 a-- 1.82t 19.58g
# vgs
VG #PV #LV #SN Attr VSize VFree
pve 1 10 0 wz--n- 136.41g 16.00g
vmdata 1 13 0 wz--n- 1.82t 19.58g
# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
data pve twi-aotz-- 76.41g 75.18 4.38
root pve -wi-ao---- 34.00g
snap_vm-204-disk-0_ssh_key_configured pve Vri---tz-k 100.00g data
swap pve -wi-ao---- 8.00g
vm-100-disk-0 pve Vwi-a-tz-- 60.00g data 82.08
vm-101-state-mangler_902 pve Vwi-a-tz-- 12.21g data 6.77
vm-103-disk-0 pve Vwi-a-tz-- 8.00g data 0.84
vm-104-disk-1 pve Vwi-aotz-- 8.00g data 13.53
vm-204-disk-0 pve Vwi-a-tz-- 100.00g data snap_vm-204-disk-0_ssh_key_configured 3.38
vm-204-state-ssh_key_configured pve Vwi-a-tz-- 8.49g data 27.25
snap_vm-201-disk-0_post_install vmdata Vri---tz-k 100.00g vmstore
snap_vm-201-disk-0_ssh_key_configured vmdata Vri---tz-k 100.00g vmstore
snap_vm-202-disk-0_ssh_key_configured vmdata Vri---tz-k 100.00g vmstore
snap_vm-203-disk-0_ssh_key_configured vmdata Vri---tz-k 100.00g vmstore
vm-100-disk-0 vmdata Vwi---tz-- 1000.00g vmstore
vm-201-disk-0 vmdata Vwi---tz-- 100.00g vmstore snap_vm-201-disk-0_ssh_key_configured
vm-201-state-post_install vmdata Vwi---tz-- 8.49g vmstore
vm-201-state-ssh_key_configured vmdata Vwi---tz-- 8.49g vmstore
vm-202-disk-0 vmdata Vwi---tz-- 100.00g vmstore snap_vm-202-disk-0_ssh_key_configured
vm-202-state-ssh_key_configured vmdata Vwi---tz-- 8.49g vmstore
vm-203-disk-0 vmdata Vwi---tz-- 100.00g vmstore snap_vm-203-disk-0_ssh_key_configured
vm-203-state-ssh_key_configured vmdata Vwi---tz-- 8.49g vmstore
vmstore vmdata twi---tz-- 1.80t
# pvdisplay
--- Physical volume ---
PV Name /dev/sdb1
VG Name vmdata
PV Size 1.82 TiB / not usable 4.07 MiB
Allocatable yes
PE Size 4.00 MiB
Total PE 476931
Free PE 5013
Allocated PE 471918
PV UUID lbTz8p-u7XV-ednj-edDI-7t6h-iiB6-klNDyB
--- Physical volume ---
PV Name /dev/sda3
VG Name pve
PV Size 136.42 GiB / not usable 1.17 MiB
Allocatable yes
PE Size 4.00 MiB
Total PE 34922
Free PE 4096
Allocated PE 30826
PV UUID 2rIsB2-4BT3-L9QJ-fHIu-x0QC-xQ8G-41p9yP
# vgdisplay
--- Volume group ---
VG Name vmdata
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 52
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 13
Open LV 0
Max PV 0
Cur PV 1
Act PV 1
VG Size 1.82 TiB
PE Size 4.00 MiB
Total PE 476931
Alloc PE / Size 471918 / 1.80 TiB
Free PE / Size 5013 / 19.58 GiB
VG UUID 2mCev2-514B-Y81u-SlSg-kNRT-99AF-zZrAyI
--- Volume group ---
VG Name pve
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 155
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 10
Open LV 3
Max PV 0
Cur PV 1
Act PV 1
VG Size 136.41 GiB
PE Size 4.00 MiB
Total PE 34922
Alloc PE / Size 30826 / 120.41 GiB
Free PE / Size 4096 / 16.00 GiB
VG UUID 1RJnVp-KdPT-3LIH-dnyc-fJgZ-uY56-9ROedp
# lvdisplay
(...)
--- Logical volume ---
LV Path /dev/pve/vm-100-disk-0
LV Name vm-100-disk-0
VG Name pve
LV UUID 0b8QxD-UzCU-TiOl-2jvV-dbmt-XYGo-Od1eXj
LV Write Access read/write
LV Creation host, time hp1, 2018-11-17 19:06:10 +0100
LV Pool name data
LV Status available
# open 0
LV Size 60.00 GiB
Mapped size 82.08%
Current LE 15360
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:9
(..)
pveversion output (yes, I need to upgrade to a newer version, but that's another issue):
Code:
proxmox-ve: 5.4-2 (running kernel: 4.15.18-30-pve)
pve-manager: 5.4-15 (running version: 5.4-15/d0ec33c6)
pve-kernel-4.15: 5.4-19
pve-kernel-4.15.18-30-pve: 4.15.18-58
pve-kernel-4.15.18-21-pve: 4.15.18-48
pve-kernel-4.15.18-15-pve: 4.15.18-40
pve-kernel-4.15.18-11-pve: 4.15.18-34
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.18-8-pve: 4.15.18-28
pve-kernel-4.15.18-7-pve: 4.15.18-27
pve-kernel-4.15.17-1-pve: 4.15.17-9
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-12
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-56
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-14
libpve-storage-perl: 5.0-44
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-7
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
proxmox-widget-toolkit: 1.0-28
pve-cluster: 5.0-38
pve-container: 2.0-42
pve-docs: 5.4-2
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-22
pve-firmware: 2.0-7
pve-ha-manager: 2.0-9
pve-i18n: 1.1-4
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 3.0.1-4
pve-xtermjs: 3.12.0-1
qemu-server: 5.0-56
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.13-pve1~bpo2
Afraid of making the situation worse, I haven't taken any action yet.
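The only thing I'm considering doing up front is saving a copy of the current LVM configuration, e.g.:
Code:
# writes the VG configuration (not the thin pool metadata itself) to /etc/lvm/backup/vmdata
vgcfgbackup vmdata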
Any suggestions on what to do?
With many thanks,
Geir