Greetings. I have two Proxmox VE installations running in virtual machines. The first one is a few months old and has PBS and some virtual machines installed on it; the second one is quite fresh. On both, the root volume sits on LVM, as the installer suggests by default.
I set up an LVM RAID 5 logical volume and, in /etc/lvm/lvm.conf, enabled automatic recovery by setting the fault policy to "allocate". On one virtual machine, automatic recovery starts as soon as a disk belonging to the LV is disconnected; on the other it does not. I can't find the reason: the lvm.conf files are identical on both.
Here is the sequence of actions:
Code:
apt update
apt upgrade
nano /etc/lvm/lvm.conf    # set the fault policies to "allocate":
                          #   raid_fault_policy = "allocate"
                          #   mirror_image_fault_policy = "allocate"
                          #   mirror_log_fault_policy = "allocate"
lvmdiskscan
vgcreate test-lvm /dev/vdb /dev/vdc /dev/vdd /dev/vde /dev/vdf /dev/vdg /dev/vdh /dev/vdi
lvcreate --type raid5 -L 3G -n test-lv test-lvm
mkfs.xfs /dev/test-lvm/test-lv
mkdir /LVM
mount /dev/test-lvm/test-lv /LVM
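For reference, this is roughly what the relevant activation section of /etc/lvm/lvm.conf looks like on both nodes after that edit (everything else is left at the packaged defaults, so take this as a sketch rather than the full file):
Code:
activation {
    # dmeventd must be monitoring the LV, otherwise no fault policy is ever applied
    monitoring = 1

    # replace a failed device from free extents in the VG instead of only warning
    raid_fault_policy = "allocate"
    mirror_image_fault_policy = "allocate"
    mirror_log_fault_policy = "allocate"
}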
After that:
Code:
root@pve3:~# lvs -a -o +devices
  LV                 VG       Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices
  data               pve      twi-a-tz--  <8.31g               0.00   1.58                           data_tdata(0)
  [data_tdata]       pve      Twi-ao----  <8.31g                                                     /dev/vda3(3375)
  [data_tmeta]       pve      ewi-ao----   1.00g                                                     /dev/vda3(5502)
  [lvol0_pmspare]    pve      ewi-------   1.00g                                                     /dev/vda3(5758)
  root               pve      -wi-ao---- <10.31g                                                     /dev/vda3(736)
  swap               pve      -wi-ao----  <2.88g                                                     /dev/vda3(0)
  test-lv            test-lvm rwi-aor---   3.00g                                      100.00         test-lv_rimage_0(0),test-lv_rimage_1(0),test-lv_rimage_2(0)
  [test-lv_rimage_0] test-lvm iwi-aor---   1.50g                                                     /dev/vdc(1)
  [test-lv_rimage_1] test-lvm iwi-aor---   1.50g                                                     /dev/vdd(1)
  [test-lv_rimage_2] test-lvm iwi-aor---   1.50g                                                     /dev/vde(1)
  [test-lv_rmeta_0]  test-lvm ewi-aor---   4.00m                                                     /dev/vdc(0)
  [test-lv_rmeta_1]  test-lvm ewi-aor---   4.00m                                                     /dev/vdd(0)
  [test-lv_rmeta_2]  test-lvm ewi-aor---   4.00m                                                     /dev/vde(0)
Then I detach /dev/vdd (the 3 GB virtual HDD) from the hypervisor. The first, correctly working Proxmox restores the LV automatically:
Code:
root@pve3:~# lvs -a -o +devices
WARNING: Couldn't find device with uuid gqJAeh-juFX-SZg3-9nYl-Kd4g-bNCs-vzWNU4.
WARNING: VG test-lvm is missing PV gqJAeh-juFX-SZg3-9nYl-Kd4g-bNCs-vzWNU4 (last written to [unknown]).
  LV                 VG       Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices
  data               pve      twi-a-tz--  <8.31g               0.00   1.58                           data_tdata(0)
  [data_tdata]       pve      Twi-ao----  <8.31g                                                     /dev/vda3(3375)
  [data_tmeta]       pve      ewi-ao----   1.00g                                                     /dev/vda3(5502)
  [lvol0_pmspare]    pve      ewi-------   1.00g                                                     /dev/vda3(5758)
  root               pve      -wi-ao---- <10.31g                                                     /dev/vda3(736)
  swap               pve      -wi-ao----  <2.88g                                                     /dev/vda3(0)
  test-lv            test-lvm rwi-aor---   3.00g                                      100.00         test-lv_rimage_0(0),test-lv_rimage_1(0),test-lv_rimage_2(0)
  [test-lv_rimage_0] test-lvm iwi-aor---   1.50g                                                     /dev/vdc(1)
  [test-lv_rimage_1] test-lvm iwi-aor---   1.50g                                                     /dev/vdf(1)
  [test-lv_rimage_2] test-lvm iwi-aor---   1.50g                                                     /dev/vde(1)
  [test-lv_rmeta_0]  test-lvm ewi-aor---   4.00m                                                     /dev/vdc(0)
  [test-lv_rmeta_1]  test-lvm ewi-aor---   4.00m                                                     /dev/vdf(0)
  [test-lv_rmeta_2]  test-lvm ewi-aor---   4.00m                                                     /dev/vde(0)
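(For completeness: the rebuild state can also be watched through the RAID reporting fields, roughly like below. This is a generic check, not output from my session.)
Code:
# watch sync progress and the current RAID sync action (idle / resync / recover)
lvs -a -o name,segtype,sync_percent,raid_sync_action,devices test-lvm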
syslog shows that the rebuild process was invoked:
Code:
Mar 3 12:21:38 pve3 lvm[356]: WARNING: Device #1 of raid5_ls array, test--lvm-test--lv, has failed.
Mar 3 12:21:38 pve3 kernel: [ 5078.087899] device-mapper: raid: Device 1 specified for rebuild; clearing superblock
Mar 3 12:21:38 pve3 kernel: [ 5078.092585] md/raid:mdX: device dm-6 operational as raid disk 0
Mar 3 12:21:38 pve3 kernel: [ 5078.092591] md/raid:mdX: device dm-10 operational as raid disk 2
Mar 3 12:21:38 pve3 kernel: [ 5078.094144] md/raid:mdX: raid level 5 active with 2 out of 3 devices, algorithm 2
Mar 3 12:21:38 pve3 lvm[356]: WARNING: waiting for resynchronization to finish before initiating repair on RAID device test--lvm-test--lv.
Mar 3 12:21:38 pve3 lvm[356]: WARNING: Couldn't find device with uuid gqJAeh-juFX-SZg3-9nYl-Kd4g-bNCs-vzWNU4.
Mar 3 12:21:38 pve3 lvm[356]: WARNING: VG test-lvm is missing PV gqJAeh-juFX-SZg3-9nYl-Kd4g-bNCs-vzWNU4 (last written to /dev/vdd).
Mar 3 12:21:38 pve3 lvm[356]: WARNING: Couldn't find device with uuid gqJAeh-juFX-SZg3-9nYl-Kd4g-bNCs-vzWNU4.
Mar 3 12:21:38 pve3 lvm[356]: WARNING: Couldn't find device with uuid gqJAeh-juFX-SZg3-9nYl-Kd4g-bNCs-vzWNU4.
Mar 3 12:21:38 pve3 kernel: [ 5078.248447] md: recovery of RAID array mdX
Mar 3 12:21:38 pve3 lvm[356]: WARNING: Couldn't find device with uuid gqJAeh-juFX-SZg3-9nYl-Kd4g-bNCs-vzWNU4.
Mar 3 12:21:38 pve3 kernel: [ 5078.367755] md/raid:mdX: device dm-6 operational as raid disk 0
Mar 3 12:21:38 pve3 kernel: [ 5078.367762] md/raid:mdX: device dm-10 operational as raid disk 2
Mar 3 12:21:38 pve3 kernel: [ 5078.371227] md/raid:mdX: raid level 5 active with 2 out of 3 devices, algorithm 2
Mar 3 12:21:38 pve3 kernel: [ 5078.371484] mdX: bitmap file is out of date (23 < 24) -- forcing full recovery
Mar 3 12:21:39 pve3 kernel: [ 5078.425678] md: mdX: recovery interrupted.
Mar 3 12:21:39 pve3 kernel: [ 5078.553910] mdX: bitmap file is out of date, doing full recovery
Mar 3 12:21:39 pve3 kernel: [ 5078.560690] md: recovery of RAID array mdX
Mar 3 12:21:39 pve3 lvm[356]: Faulty devices in test-lvm/test-lv successfully replaced.
Mar 3 12:21:39 pve3 lvm[356]: raid5_ls array, test--lvm-test--lv, is not in-sync.
Mar 3 12:21:48 pve3 kernel: [ 5088.130518] md: mdX: recovery done.
Mar 3 12:21:48 pve3 lvm[356]: raid5_ls array, test--lvm-test--lv, is now in-sync.
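As far as I understand it, those lvm[356] lines come from dmeventd, which notices the failed leg and applies raid_fault_policy. The manual equivalent would be roughly the following (I did not have to run it on pve3, shown only for comparison):
Code:
# rebuild the failed image/metadata sub-LVs onto free extents in the VG
lvconvert --repair test-lvm/test-lv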
On the second VM I had already experimented with LV rebuilding, which is why it shows some missing PVs:
Code:
root@pve2:~# lvs -a -o +devices
WARNING: Couldn't find device with uuid cpDTTr-0yxK-7IW3-hcaU-0kBC-aevV-jlQtwX.
WARNING: Couldn't find device with uuid S0cqz8-67Nz-YFqu-0MZF-fwVg-1twg-6WozyG.
WARNING: VG test-lvm is missing PV cpDTTr-0yxK-7IW3-hcaU-0kBC-aevV-jlQtwX (last written to [unknown]).
WARNING: VG test-lvm is missing PV S0cqz8-67Nz-YFqu-0MZF-fwVg-1twg-6WozyG (last written to /dev/vdf).
  LV                 VG       Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices
  data               pve      twi-aotz--  30.87g              16.48   1.76                           data_tdata(0)
  [data_tdata]       pve      Twi-ao----  30.87g                                                     /dev/sda3(5824)
  [data_tmeta]       pve      ewi-ao----   1.00g                                                     /dev/sda3(13727)
  [lvol0_pmspare]    pve      ewi-------   1.00g                                                     /dev/sda3(13983)
  root               pve      -wi-ao----  15.75g                                                     /dev/sda3(1792)
  swap               pve      -wi-ao----   7.00g                                                     /dev/sda3(0)
  vm-100-disk-0      pve      Vwi-a-tz--  22.00g data         23.12
  test-vg            test-lvm rwc-a-r-p-   3.00g                                      100.00         test-vg_rimage_0(0),test-vg_rimage_1(0),test-vg_rimage_2(0)
  [test-vg_rimage_0] test-lvm iwi-aor---   1.50g                                                     /dev/vdc(1)
  [test-vg_rimage_1] test-lvm iwi-aor---   1.50g                                                     /dev/vdb(1)
  [test-vg_rimage_2] test-lvm iwi-aor-p-   1.50g                                                     [unknown](1)
  [test-vg_rmeta_0]  test-lvm ewi-aor---   4.00m                                                     /dev/vdc(0)
  [test-vg_rmeta_1]  test-lvm ewi-aor---   4.00m                                                     /dev/vdb(0)
  [test-vg_rmeta_2]  test-lvm ewi-aor-p-   4.00m                                                     [unknown](0)
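(Side note: I assume the leftover missing PVs from my earlier experiments could be cleaned up afterwards with something like the commands below; I have not done that yet, so the broken state stays reproducible.)
Code:
# rebuild the degraded leg onto a free PV, then drop the vanished PVs from the VG
lvconvert --repair test-lvm/test-vg
vgreduce --removemissing test-lvm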
syslog shows nothing related to the failure:
Code:
Mar 3 12:28:25 pve2 kernel: [ 33.420366] audit: type=1400 audit(1677835705.832:16): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="/usr/bin/lxc-start" pid=970 comm="apparmor_parser"
Mar 3 12:28:25 pve2 kernel: [ 33.457379] audit: type=1400 audit(1677835705.868:17): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="lxc-container-default" pid=974 comm="apparmor_parser"
Mar 3 12:28:25 pve2 kernel: [ 33.457399] audit: type=1400 audit(1677835705.868:18): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="lxc-container-default-cgns" pid=974 comm="apparmor_parser"
Mar 3 12:28:25 pve2 kernel: [ 33.457411] audit: type=1400 audit(1677835705.868:19): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="lxc-container-default-with-mounting" pid=974 comm="apparmor_parser"
Mar 3 12:28:25 pve2 kernel: [ 33.457421] audit: type=1400 audit(1677835705.868:20): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="lxc-container-default-with-nesting" pid=974 comm="apparmor_parser"
Mar 3 12:29:11 pve2 qemu-ga: info: guest-ping called
Mar 3 12:29:25 pve2 qemu-ga: info: guest-ping called
Mar 3 12:29:43 pve2 qemu-ga: info: guest-ping called
Mar 3 12:29:59 pve2 qemu-ga: info: guest-ping called
Mar 3 12:30:14 pve2 qemu-ga: info: guest-ping called
Mar 3 12:30:28 pve2 qemu-ga: info: guest-ping called
Mar 3 12:30:42 pve2 qemu-ga: info: guest-ping called
Mar 3 12:30:57 pve2 qemu-ga: info: guest-ping called
Mar 3 12:31:10 pve2 qemu-ga: info: guest-ping called
Mar 3 12:31:24 pve2 qemu-ga: info: guest-ping called
Mar 3 12:31:38 pve2 qemu-ga: info: guest-ping called
lvm.conf is attached. It seems to me that monitoring on the second node is not working as it should, but I do not know where to start.
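The only idea I have so far is to check whether dmeventd is actually monitoring the RAID LV on pve2, roughly along these lines (assuming the stock lvm2-monitor/dmeventd setup from the lvm2 package):
Code:
# is the monitoring daemon running at all?
systemctl status lvm2-monitor.service
pgrep -a dmeventd

# is the RAID LV registered with dmeventd? (seg_monitor should say "monitored")
lvs -a -o name,seg_monitor test-lvm

# if not, try to (re-)register it and watch syslog while detaching a disk again
lvchange --monitor y test-lvm/test-vg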