Disable LVM/VGs on iSCSI PVs during dist-upgrades

stefws

Whenever we need to upgrade the pve-kernel in our PVE 4.4 HA cluster, we find grub updating to be very slow (it seems to be looking for other boot images on all known devices). In fact so slow that the HA SW watchdog sometimes fires an NMI, and depending on at what stage this happens, it sometimes renders a host unbootable. We see no issue if we disable all the many VM LVs by exporting the VGs created across 5 iSCSI PVs, but this causes all other hypervisor hosts to see the shared iSCSI VGs as exported, hindering live migration until the node under upgrade has been rebooted and has reimported the VGs.

This doesn't feel like the right way to temporarily 'release' LVs + PVs to speed up grub updating. I'd appreciate suggestions on how to do this better.

This is what we do currently:

Code:
# Let's get all non-essential disk devices out of the way...
# LVMs
vgexport -a
# NFS mounts
umount /mnt/pve/backupA
umount /mnt/pve/backupB
sleep 2
# clean up
dmsetup remove_all
iscsiadm -m session -u

# now run update/upgrade(s)
apt-get update
apt-get -y upgrade
apt-get -y dist-upgrade
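
Would simply deactivating the shared VGs instead of exporting them be enough? As far as I understand, vgexport rewrites the VG metadata on the shared storage (so every node sees the VG as exported), while vgchange -an only changes the local activation state. Untested sketch, using our VG names:

Code:
# untested idea: deactivate the shared iSCSI VGs locally instead of exporting
# them; activation state is per host, so the other nodes should not notice
for vg in vgA vgAbck vgB vgBbck; do
    vgchange -an "$vg"
done
# after the reboot: vgchange -ay <vg>, or let the boot scripts reactivate them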
 
Do you have this: GRUB_DISABLE_OS_PROBER=true
in your /etc/default/grub ?
"This entry is used to prevent GRUB from adding the results of os-prober to the menu. A value of "true" disables the os-prober check of other partitions for operating systems, including Windows, Linux, OSX and Hurd, during execution of the update-grub command."
 
Do you have this: GRUB_DISABLE_OS_PROBER=true
in your /etc/default/grub ?
"This entry is used to prevent GRUB from adding the results of os-prober to the menu. A value of "true" disables the os-prober check of other partitions for operating systems, including Windows, Linux, OSX and Hurd, during execution of the update-grub command."

this is the default in PVE for a reason, as it is not only slow for a lot of VMs / disks, it can also corrupt the images!

@stefws: if OS_PROBER is disabled in your case, you could try finding out which of the commands used to generate the config file takes so long...
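
For reference, checking it is a one-liner (just a quick sanity check):

Code:
# on a stock PVE install this should print GRUB_DISABLE_OS_PROBER=true
grep GRUB_DISABLE_OS_PROBER /etc/default/grub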
 
@stefws: if OS_PROBER is disabled in your case, you could try finding out which of the commands used to generate the config file takes so long...
It's whenever it's running 'updating grub'... slowly finding new boot entries, like the newly installed kernel and the previous kernels etc. It just takes too long (+60 sec, up to minutes), but if we export the VGs and log out of iSCSI, it's all swift (less than 60 sec) and it manages to finish before an NMI gets fired.
 
It's whenever it's running 'updating grub'... slowly finding new boot entries, like the newly installed kernel and the previous kernels etc. It just takes too long (+60 sec, up to minutes), but if we export the VGs and log out of iSCSI, it's all swift (less than 60 sec) and it manages to finish before an NMI gets fired.

yes, but update-grub is just a collection of shell scripts calling various grub binaries and other shell scripts ;) so you can look at them and test which of the steps takes so long. AFAIK without os-prober it should not look at anything besides the disks where /boot, /boot/efi and / are, but possibly finding out which of the disks those are takes too long with LVM?

e.g., collecting the kernel images is done in /etc/grub.d/10_linux lines 200ff - there is no probing of other disks here..
Code:
203 machine=`uname -m`
204 case "x$machine" in
205     xi?86 | xx86_64)
206         list=
207         for i in /boot/vmlinuz-* /vmlinuz-* /boot/kernel-* ; do
208             if grub_file_is_not_garbage "$i" ; then list="$list $i" ; fi
209         done ;;
210     *)
211         list=
212         for i in /boot/vmlinuz-* /boot/vmlinux-* /vmlinuz-* /vmlinux-* /boot/kernel-* ; do
213             if grub_file_is_not_garbage "$i" ; then list="$list $i" ; fi
214         done ;;
215 esac

I'd look at the grub-probe and related calls in /usr/sbin/grub-mkconfig :
Code:
141 # Device containing our userland.  Typically used for root= parameter.
142 GRUB_DEVICE="`${grub_probe} --target=device /`"
143 GRUB_DEVICE_UUID="`${grub_probe} --device ${GRUB_DEVICE} --target=fs_uuid 2> /dev/null`" || true
144
145 # Device containing our /boot partition.  Usually the same as GRUB_DEVICE.
146 GRUB_DEVICE_BOOT="`${grub_probe} --target=device /boot`"
147 GRUB_DEVICE_BOOT_UUID="`${grub_probe} --device ${GRUB_DEVICE_BOOT} --target=fs_uuid 2> /dev/null`" || true
148
149 # Filesystem for the device containing our userland.  Used for stuff like
150 # choosing Hurd filesystem module.
151 GRUB_FS="`${grub_probe} --device ${GRUB_DEVICE} --target=fs 2> /dev/null || echo unknown`"

and then the individual scripts in /etc/grub.d/ , which are called by grub-mkconfig in lines 269-284 (note that you will need to set various GRUB_* environment variables correctly for those to work!)
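
e.g., a quick and rough way to see where the time goes without a full upgrade run (note: the /etc/grub.d scripts normally get their GRUB_* environment from grub-mkconfig, so when run stand-alone the numbers are only indicative):

Code:
# time the probe calls grub-mkconfig makes for / and /boot
time grub-probe --target=device /
time grub-probe --target=device /boot

# time each generator script in isolation (indicative only, see above)
for f in /etc/grub.d/[0-9]*; do
    echo "== $f =="
    time "$f" > /dev/null
done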
 
yes, but update-grub is just a collection of shell scripts calling various grub binaries and other shell scripts ;) so you can look at them and test which of the steps takes so long. AFAIK without os-prober it should not look at anything besides the disks where /boot, /boot/efi and / are, but possibly finding out which of the disks those are takes too long with LVM?

e.g., collecting the kernel images is done in /etc/grub.d/10_linux lines 200ff - there is no probing of other disks here..
I'd look at the grub-probe and related calls in /usr/sbin/grub-mkconfig :

and then the individual scripts in /etc/grub.d/ , which are called by grub-mkconfig in lines 269-284 (note that you will need to set various GRUB_* environment variables correctly for those to work!)
I admit I'm not familiar with these inner scripts triggered behind grub updating. Also, I'd rather not risk trashing a node in order to debug what's taking too long ;)

What do others using HA & shared LVM to hold the VM LVs do to avoid the SW watchdog firing, and/or are others seeing long probing/grub updating times?
 
I admit I'm not familiar with these inner scripts triggered behind grub updating. Also, I'd rather not risk trashing a node in order to debug what's taking too long ;)

What do others using HA & shared LVM to hold the VM LVs do to avoid the SW watchdog firing, and/or are others seeing long probing/grub updating times?
You could also tune your lvm.conf. I have configured mine so that scanning for volumes is only done on disks that hold volumes configured inside Proxmox. In the example below I have the PVE volumes on:
- /dev/disk/by-id/ata-OCZ-AGILITY3_OCZ-QMZN8K4967DA9NGO: accepted with (a|ata-OCZ-AGILITY3_OCZ-QMZN8K4967DA9NGO.*|); the trailing .* also matches its partitions. This disk holds the local Proxmox VG pve.
- /dev/disk/by-id/scsi-36001405e38e9f02ddef9d4573db7a0d0: accepted with (a|scsi-36001405e38e9f02ddef9d4573db7a0d0|); this is a shared SCSI disk providing shared storage for VMs through CLVM.
- All other disks: scanning for volumes is blocked with (r|.*|).

Code:
# Do not scan ZFS zvols (to avoid problems with ZFS zvol snapshots)
global_filter = [ "r|/dev/zd.*|", "r|/dev/mapper/pve-.*|" ]

filter = [ "a|ata-OCZ-AGILITY3_OCZ-QMZN8K4967DA9NGO.*|", "a|scsi-36001405e38e9f02ddef9d4573db7a0d0|", "r|.*|" ]

The above speeds up boot, shutdown, and kernel upgrades dramatically.
 
We've got dual-pathed iSCSI PV devices:

Code:
root@n7:~# pvs
PV VG Fmt Attr PSize PFree
/dev/mapper/3600c0ff000258a36c4cb245601000000 vgAbck lvm2 a-- 1.82t 0
/dev/mapper/3600c0ff000258a36decb245601000000 vgA lvm2 a-- 744.49g 54.49g
/dev/mapper/3600c0ff000258a36decb245602000000 vgA lvm2 a-- 744.49g 4.49g
/dev/mapper/3600c0ff000258a36decb245603000000 vgA lvm2 a-- 744.49g 194.49g
/dev/mapper/3600c0ff000258a36dfcb245601000000 vgA lvm2 a-- 744.49g 444.49g
/dev/mapper/3600c0ff000258a36dfcb245602000000 vgA lvm2 a-- 744.49g 744.49g
/dev/mapper/3600c0ff000258cfd1403225601000000 vgBbck lvm2 a-- 1.82t 0
/dev/mapper/3600c0ff000258cfd2b03225601000000 vgB lvm2 a-- 744.49g 94.49g
/dev/mapper/3600c0ff000258cfd2b03225602000000 vgB lvm2 a-- 744.49g 744.49g
/dev/mapper/3600c0ff000258cfd2c03225601000000 vgB lvm2 a-- 744.49g 494.49g
/dev/mapper/3600c0ff000258cfd2c03225602000000 vgB lvm2 a-- 744.49g 184.49g
/dev/mapper/3600c0ff000258cfd2c03225603000000 vgB lvm2 a-- 744.49g 14.49g
/dev/sda3 pve lvm2 a-- 136.57g 16.00g
root@n7:~# dmsetup ls
3600c0ff000258a36decb245603000000 (251:5)
3600c0ff000258cfd2c03225601000000 (251:12)
3600c0ff000258a36dfcb245601000000 (251:6)
3600c0ff000258cfd2b03225601000000 (251:14)
pve-swap (251:1)
pve-root (251:0)
3600c0ff000258a36c4cb245601000000 (251:8)
pve-data (251:2)
3600c0ff000258cfd2c03225602000000 (251:11)
3600c0ff000258a36decb245601000000 (251:3)
3600c0ff000258a36dfcb245602000000 (251:7)
3600c0ff000258cfd2b03225602000000 (251:13)
3600c0ff000258cfd1403225601000000 (251:9)
3600c0ff000258cfd2c03225603000000 (251:10)
3600c0ff000258a36decb245602000000 (251:4)
root@n7:~# vgs
VG #PV #LV #SN Attr VSize VFree
pve 1 3 0 wz--n- 136.57g 16.00g
vgA 5 48 0 wz--n- 3.64t 1.41t
vgAbck 1 1 0 wz--n- 1.82t 0
vgB 5 45 0 wz--n- 3.64t 1.50t
vgBbck 1 1 0 wz--n- 1.82t 0
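
So presumably something like this in our lvm.conf would limit scanning to the actual PVs (untested guess on my part, the patterns would need verifying with pvscan first):

Code:
# untested: accept the local boot partition and the multipath iSCSI devices,
# reject everything else from LVM scanning
filter = [ "a|/dev/sda3|", "a|/dev/mapper/3600c0ff.*|", "r|.*|" ]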

I've always wondered how PVE handles shared LVM; initially I thought it would use CLVM or HA-LVM with shared VGs, but it seems not to, rather normal VGs with the LVs being [de]activated on the HN node that runs the VM, right?

Our storage.cfg looks like this:

Code:
dir: local
        path /var/lib/vz
        content iso,backup,vztmpl,images,rootdir
        maxfiles 3

nfs: backupA
        export /exports/nfsShareA
        server nfsbackupA
        path /mnt/pve/backupA
        maxfiles 3
        options vers=4,soft,intr,timeo=300,rsize=262144,wsize=262144
        content iso,backup

nfs: backupB
        export /exports/nfsShareB
        server nfsbackupB
        path /mnt/pve/backupB
        maxfiles 3
        options vers=4,soft,intr,timeo=300,rsize=262144,wsize=262144
        content backup,iso

lvm: vgB
        vgname vgB
        shared
        content images

lvm: vgBbck
        vgname vgBbck
        shared
        content images

lvm: vgA
        vgname vgA
        shared
        content images

lvm: vgAbck
        vgname vgAbck
        shared
        content images

To me those look like 'normal'/single-host VGs, only marked as shared, so if we do a vgexport on one host all HN nodes see the shared VGs as exported, which probably isn't right?

Also, sometimes when booting a node after patching we see what we so far thought to be noise in our OVS networking temporarily blocking the network, but could this maybe be due to the vgchange -aay/vgimport on the booting node causing other nodes to briefly block on LVM IO, the way we've got things configured?
 
you can look at them and test which of the steps takes so long.
Not sure how to turn on more verbose/debug output in those scripts; sh -x?

The phase that takes a long time is when it's finding and listing each entry to go into the grub menu: first a pause, then it lists the image of kernel A, another pause, then the next image, kernel B... So whatever it's doing to find a kernel image to put into the grub menu is what takes the time. When we do vgexport -a and log out of the iSCSI targets it runs fast.
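
Maybe something like this would show where it stalls? Just a guess; it only writes the generated config to a scratch file, not to /boot/grub/grub.cfg:

Code:
# trace grub-mkconfig with a timestamped PS4 and review the trace afterwards
PS4='+ $(date "+%T.%N") ' bash -x /usr/sbin/grub-mkconfig -o /tmp/grub-test.cfg 2> /tmp/grub-mkconfig.trace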
 
I've always wondered how PVE handles shared LVM; initially I thought it would use CLVM or HA-LVM with shared VGs, but it seems not to, rather normal VGs with the LVs being [de]activated on the HN node that runs the VM, right?

pretty much. the only really problematic scenario is migration, where one needs to deactivate the volumes (which PVE does ;))
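
Roughly like this, just as an illustration (PVE's LVM storage plugin does this itself; the LV name is only the usual vm-<vmid>-disk-<n> naming):

Code:
# illustration only - handled by PVE, not something to run by hand
lvchange -an /dev/vgA/vm-100-disk-1   # source node drops its local activation
lvchange -ay /dev/vgA/vm-100-disk-1   # target node activates the LV before starting the VM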
 
Yes, os-prober only helps find kernels on disks other than the one holding /boot.
We don't seem to have a package by that name installed:

Code:
root@n6:~# dpkg -l | grep -i grub
ii grub-common 2.02-pve5 amd64 GRand Unified Bootloader (common files)
ii grub-efi-amd64-bin 2.02-pve5 amd64 GRand Unified Bootloader, version 2 (EFI-AMD64 binaries)
ii grub-efi-ia32-bin 2.02-pve5 amd64 GRand Unified Bootloader, version 2 (EFI-IA32 binaries)
ii grub-pc 2.02-pve5 amd64 GRand Unified Bootloader, version 2 (PC/BIOS version)
ii grub-pc-bin 2.02-pve5 amd64 GRand Unified Bootloader, version 2 (PC/BIOS binaries)
ii grub2-common 2.02-pve5 amd64 GRand Unified Bootloader (common files for version 2)
root@n6:~# dpkg -l | grep -i prop
 
We don't seem to have a package by that name installed:
it's "os-prober":
Code:
$ apt show os-prober
Package: os-prober
Version: 1.65
Installed-Size: 143 kB
Maintainer: Debian Install System Team <debian-boot@lists.debian.org>
Depends: libc6 (>= 2.4)
Tag: admin::install, implemented-in::shell, interface::commandline,
 role::program, scope::utility, suite::debian
Section: utils
Priority: extra
Download-Size: 27.9 kB
APT-Manual-Installed: yes
APT-Sources: http://deb.debian.org/debian/ jessie/main amd64 Packages
Description: utility to detect other OSes on a set of drives
 This package detects other OSes available on a system and outputs the
 results in a generic machine-readable format.

but it will only run if /etc/default/grub does not contain the disable line, so in your case it should be skipped..
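
If in doubt, it is easy to double-check (sketch, only meaningful if the os-prober package is installed at all):

Code:
# the grub hook bails out early when GRUB_DISABLE_OS_PROBER=true
dpkg -l os-prober 2>/dev/null
grep -n GRUB_DISABLE_OS_PROBER /etc/grub.d/30_os-prober 2>/dev/null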
 
