Greetings. I have two Proxmox VE installations running in virtual machines. The first one is a few months old and has PBS and some virtual machines installed on it; the second one is quite fresh. On both, the root volume sits on LVM, as the installer suggests by default.
I set up an LVM RAID 5 logical volume and, in /etc/lvm/lvm.conf, enabled automatic recovery by setting the fault policy to "allocate". On one virtual machine, automatic recovery starts as soon as a disk belonging to the LV is disconnected; on the other it does not. I can't find the reason: the lvm.conf files are identical on both.
Here is the sequence of actions:
Code:
apt update
apt upgrade
nano /etc/lvm/lvm.conf    # set the fault policies to "allocate":
                          #   raid_fault_policy = "allocate"
                          #   mirror_image_fault_policy = "allocate"
                          #   mirror_log_fault_policy = "allocate"
lvmdiskscan
vgcreate test-lvm /dev/vdb /dev/vdc /dev/vdd /dev/vde /dev/vdf /dev/vdg /dev/vdh /dev/vdi
lvcreate --type raid5 -L 3G -n test-lv test-lvm
mkfs.xfs /dev/test-lvm/test-lv
mkdir /LVM
mount /dev/test-lvm/test-lv /LVM
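For reference, this is roughly what the relevant activation section of /etc/lvm/lvm.conf looks like on both nodes after that edit (everything else is left at the packaged defaults, so take this as a sketch rather than the full file):
Code:
activation {
    # dmeventd must be monitoring the LV, otherwise no fault policy is ever applied
    monitoring = 1

    # replace a failed device from free extents in the VG instead of only warning
    raid_fault_policy = "allocate"
    mirror_image_fault_policy = "allocate"
    mirror_log_fault_policy = "allocate"
}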
After that:
Code:
root@pve3:~# lvs -a -o +devices
  LV                 VG       Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices
  data               pve      twi-a-tz--  <8.31g               0.00   1.58                           data_tdata(0)
  [data_tdata]       pve      Twi-ao----  <8.31g                                                     /dev/vda3(3375)
  [data_tmeta]       pve      ewi-ao----   1.00g                                                     /dev/vda3(5502)
  [lvol0_pmspare]    pve      ewi-------   1.00g                                                     /dev/vda3(5758)
  root               pve      -wi-ao---- <10.31g                                                     /dev/vda3(736)
  swap               pve      -wi-ao----  <2.88g                                                     /dev/vda3(0)
  test-lv            test-lvm rwi-aor---   3.00g                                      100.00         test-lv_rimage_0(0),test-lv_rimage_1(0),test-lv_rimage_2(0)
  [test-lv_rimage_0] test-lvm iwi-aor---   1.50g                                                     /dev/vdc(1)
  [test-lv_rimage_1] test-lvm iwi-aor---   1.50g                                                     /dev/vdd(1)
  [test-lv_rimage_2] test-lvm iwi-aor---   1.50g                                                     /dev/vde(1)
  [test-lv_rmeta_0]  test-lvm ewi-aor---   4.00m                                                     /dev/vdc(0)
  [test-lv_rmeta_1]  test-lvm ewi-aor---   4.00m                                                     /dev/vdd(0)
  [test-lv_rmeta_2]  test-lvm ewi-aor---   4.00m                                                     /dev/vde(0)
Then I detach /dev/vdd (the 3 GB virtual HDD) from the hypervisor. The first, correctly working Proxmox restores the LV automatically:
Code:
root@pve3:~# lvs -a -o +devices
WARNING: Couldn't find device with uuid gqJAeh-juFX-SZg3-9nYl-Kd4g-bNCs-vzWNU4.
WARNING: VG test-lvm is missing PV gqJAeh-juFX-SZg3-9nYl-Kd4g-bNCs-vzWNU4 (last written to [unknown]).
  LV                 VG       Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices
  data               pve      twi-a-tz--  <8.31g               0.00   1.58                           data_tdata(0)
  [data_tdata]       pve      Twi-ao----  <8.31g                                                     /dev/vda3(3375)
  [data_tmeta]       pve      ewi-ao----   1.00g                                                     /dev/vda3(5502)
  [lvol0_pmspare]    pve      ewi-------   1.00g                                                     /dev/vda3(5758)
  root               pve      -wi-ao---- <10.31g                                                     /dev/vda3(736)
  swap               pve      -wi-ao----  <2.88g                                                     /dev/vda3(0)
  test-lv            test-lvm rwi-aor---   3.00g                                      100.00         test-lv_rimage_0(0),test-lv_rimage_1(0),test-lv_rimage_2(0)
  [test-lv_rimage_0] test-lvm iwi-aor---   1.50g                                                     /dev/vdc(1)
  [test-lv_rimage_1] test-lvm iwi-aor---   1.50g                                                     /dev/vdf(1)
  [test-lv_rimage_2] test-lvm iwi-aor---   1.50g                                                     /dev/vde(1)
  [test-lv_rmeta_0]  test-lvm ewi-aor---   4.00m                                                     /dev/vdc(0)
  [test-lv_rmeta_1]  test-lvm ewi-aor---   4.00m                                                     /dev/vdf(0)
  [test-lv_rmeta_2]  test-lvm ewi-aor---   4.00m                                                     /dev/vde(0)
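(For completeness: the rebuild state can also be watched through the RAID reporting fields, roughly like below. This is a generic check, not output from my session.)
Code:
# watch sync progress and the current RAID sync action (idle / resync / recover)
lvs -a -o name,segtype,sync_percent,raid_sync_action,devices test-lvm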
syslog shows that the rebuild process was invoked:
Code:
Mar 3 12:21:38 pve3 lvm[356]: WARNING: Device #1 of raid5_ls array, test--lvm-test--lv, has failed.
Mar 3 12:21:38 pve3 kernel: [ 5078.087899] device-mapper: raid: Device 1 specified for rebuild; clearing superblock
Mar 3 12:21:38 pve3 kernel: [ 5078.092585] md/raid:mdX: device dm-6 operational as raid disk 0
Mar 3 12:21:38 pve3 kernel: [ 5078.092591] md/raid:mdX: device dm-10 operational as raid disk 2
Mar 3 12:21:38 pve3 kernel: [ 5078.094144] md/raid:mdX: raid level 5 active with 2 out of 3 devices, algorithm 2
Mar 3 12:21:38 pve3 lvm[356]: WARNING: waiting for resynchronization to finish before initiating repair on RAID device test--lvm-test--lv.
Mar 3 12:21:38 pve3 lvm[356]: WARNING: Couldn't find device with uuid gqJAeh-juFX-SZg3-9nYl-Kd4g-bNCs-vzWNU4.
Mar 3 12:21:38 pve3 lvm[356]: WARNING: VG test-lvm is missing PV gqJAeh-juFX-SZg3-9nYl-Kd4g-bNCs-vzWNU4 (last written to /dev/vdd).
Mar 3 12:21:38 pve3 lvm[356]: WARNING: Couldn't find device with uuid gqJAeh-juFX-SZg3-9nYl-Kd4g-bNCs-vzWNU4.
Mar 3 12:21:38 pve3 lvm[356]: WARNING: Couldn't find device with uuid gqJAeh-juFX-SZg3-9nYl-Kd4g-bNCs-vzWNU4.
Mar 3 12:21:38 pve3 kernel: [ 5078.248447] md: recovery of RAID array mdX
Mar 3 12:21:38 pve3 lvm[356]: WARNING: Couldn't find device with uuid gqJAeh-juFX-SZg3-9nYl-Kd4g-bNCs-vzWNU4.
Mar 3 12:21:38 pve3 kernel: [ 5078.367755] md/raid:mdX: device dm-6 operational as raid disk 0
Mar 3 12:21:38 pve3 kernel: [ 5078.367762] md/raid:mdX: device dm-10 operational as raid disk 2
Mar 3 12:21:38 pve3 kernel: [ 5078.371227] md/raid:mdX: raid level 5 active with 2 out of 3 devices, algorithm 2
Mar 3 12:21:38 pve3 kernel: [ 5078.371484] mdX: bitmap file is out of date (23 < 24) -- forcing full recovery
Mar 3 12:21:39 pve3 kernel: [ 5078.425678] md: mdX: recovery interrupted.
Mar 3 12:21:39 pve3 kernel: [ 5078.553910] mdX: bitmap file is out of date, doing full recovery
Mar 3 12:21:39 pve3 kernel: [ 5078.560690] md: recovery of RAID array mdX
Mar 3 12:21:39 pve3 lvm[356]: Faulty devices in test-lvm/test-lv successfully replaced.
Mar 3 12:21:39 pve3 lvm[356]: raid5_ls array, test--lvm-test--lv, is not in-sync.
Mar 3 12:21:48 pve3 kernel: [ 5088.130518] md: mdX: recovery done.
Mar 3 12:21:48 pve3 lvm[356]: raid5_ls array, test--lvm-test--lv, is now in-sync.
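As far as I understand it, those lvm[356] lines come from dmeventd, which notices the failed leg and applies raid_fault_policy. The manual equivalent would be roughly the following (I did not have to run it on pve3, shown only for comparison):
Code:
# rebuild the failed image/metadata sub-LVs onto free extents in the VG
lvconvert --repair test-lvm/test-lv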
On the second VM I had already experimented with LV rebuilding, which is why it shows some missing PVs:
Code:
root@pve2:~# lvs -a -o +devices
WARNING: Couldn't find device with uuid cpDTTr-0yxK-7IW3-hcaU-0kBC-aevV-jlQtwX.
WARNING: Couldn't find device with uuid S0cqz8-67Nz-YFqu-0MZF-fwVg-1twg-6WozyG.
WARNING: VG test-lvm is missing PV cpDTTr-0yxK-7IW3-hcaU-0kBC-aevV-jlQtwX (last written to [unknown]).
WARNING: VG test-lvm is missing PV S0cqz8-67Nz-YFqu-0MZF-fwVg-1twg-6WozyG (last written to /dev/vdf).
  LV                 VG       Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices
  data               pve      twi-aotz--  30.87g              16.48   1.76                           data_tdata(0)
  [data_tdata]       pve      Twi-ao----  30.87g                                                     /dev/sda3(5824)
  [data_tmeta]       pve      ewi-ao----   1.00g                                                     /dev/sda3(13727)
  [lvol0_pmspare]    pve      ewi-------   1.00g                                                     /dev/sda3(13983)
  root               pve      -wi-ao----  15.75g                                                     /dev/sda3(1792)
  swap               pve      -wi-ao----   7.00g                                                     /dev/sda3(0)
  vm-100-disk-0      pve      Vwi-a-tz--  22.00g data         23.12
  test-vg            test-lvm rwc-a-r-p-   3.00g                                      100.00         test-vg_rimage_0(0),test-vg_rimage_1(0),test-vg_rimage_2(0)
  [test-vg_rimage_0] test-lvm iwi-aor---   1.50g                                                     /dev/vdc(1)
  [test-vg_rimage_1] test-lvm iwi-aor---   1.50g                                                     /dev/vdb(1)
  [test-vg_rimage_2] test-lvm iwi-aor-p-   1.50g                                                     [unknown](1)
  [test-vg_rmeta_0]  test-lvm ewi-aor---   4.00m                                                     /dev/vdc(0)
  [test-vg_rmeta_1]  test-lvm ewi-aor---   4.00m                                                     /dev/vdb(0)
  [test-vg_rmeta_2]  test-lvm ewi-aor-p-   4.00m                                                     [unknown](0)
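(Side note: I assume the leftover missing PVs from my earlier experiments could be cleaned up afterwards with something like the commands below; I have not done that yet, so the broken state stays reproducible.)
Code:
# rebuild the degraded leg onto a free PV, then drop the vanished PVs from the VG
lvconvert --repair test-lvm/test-vg
vgreduce --removemissing test-lvm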
syslog shows nothing related to the failure:
Code:
Mar 3 12:28:25 pve2 kernel: [ 33.420366] audit: type=1400 audit(1677835705.832:16): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="/usr/bin/lxc-start" pid=970 comm="apparmor_parser"
Mar 3 12:28:25 pve2 kernel: [ 33.457379] audit: type=1400 audit(1677835705.868:17): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="lxc-container-default" pid=974 comm="apparmor_parser"
Mar 3 12:28:25 pve2 kernel: [ 33.457399] audit: type=1400 audit(1677835705.868:18): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="lxc-container-default-cgns" pid=974 comm="apparmor_parser"
Mar 3 12:28:25 pve2 kernel: [ 33.457411] audit: type=1400 audit(1677835705.868:19): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="lxc-container-default-with-mounting" pid=974 comm="apparmor_parser"
Mar 3 12:28:25 pve2 kernel: [ 33.457421] audit: type=1400 audit(1677835705.868:20): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="lxc-container-default-with-nesting" pid=974 comm="apparmor_parser"
Mar 3 12:29:11 pve2 qemu-ga: info: guest-ping called
Mar 3 12:29:25 pve2 qemu-ga: info: guest-ping called
Mar 3 12:29:43 pve2 qemu-ga: info: guest-ping called
Mar 3 12:29:59 pve2 qemu-ga: info: guest-ping called
Mar 3 12:30:14 pve2 qemu-ga: info: guest-ping called
Mar 3 12:30:28 pve2 qemu-ga: info: guest-ping called
Mar 3 12:30:42 pve2 qemu-ga: info: guest-ping called
Mar 3 12:30:57 pve2 qemu-ga: info: guest-ping called
Mar 3 12:31:10 pve2 qemu-ga: info: guest-ping called
Mar 3 12:31:24 pve2 qemu-ga: info: guest-ping called
Mar 3 12:31:38 pve2 qemu-ga: info: guest-ping called
lvm.conf is attached. It seems to me that monitoring on the second node is not working as it should, but I do not know where to start.
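The only idea I have so far is to check whether dmeventd is actually monitoring the RAID LV on pve2, roughly along these lines (assuming the stock lvm2-monitor/dmeventd setup from the lvm2 package):
Code:
# is the monitoring daemon running at all?
systemctl status lvm2-monitor.service
pgrep -a dmeventd

# is the RAID LV registered with dmeventd? (seg_monitor should say "monitored")
lvs -a -o name,seg_monitor test-lvm

# if not, try to (re-)register it and watch syslog while detaching a disk again
lvchange --monitor y test-lvm/test-vg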