Hello,
the title describes the problem pretty well. It's a single node, no cluster and no "exotic" external storage. The problem has been happening for a while now, maybe some months (we've learned not to back up entire VMs, not to restart them unnecessarily, and never to do either during heavy-usage hours). We hoped that the next relatively big update would fix it, but that hasn't been the case with the kernel and interface update.
I will paste a handful of information about the node's software versions and configuration, especially the storage layout; hopefully someone else can spot what I'm not seeing about the problem. I also hope the problem is not mdraid itself: it wasn't a problem with Proxmox 3.x and, come on, it's just mirrors. I have other nodes with local ZFS and that has its own problems (especially memory usage, the kind of issue I cannot afford on this node).
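(For reference, on those ZFS nodes the usual mitigation is to cap the ARC with a module option; the value below is just an example, not what those nodes actually run, and this node has no ZFS pool at all.)
Code:
# /etc/modprobe.d/zfs.conf -- cap the ZFS ARC at 2 GiB (example value, adjust to taste)
options zfs zfs_arc_max=2147483648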
################################################################
################################################################
Code:
Linux eu0 4.4.6-1-pve #1 SMP Thu Apr 21 11:25:40 CEST 2016 x86_64 GNU/Linux
Code:
proxmox-ve: 4.1-48 (running kernel: 4.4.6-1-pve)
pve-manager: 4.1-34 (running version: 4.1-34/8887b0fd)
pve-kernel-4.4.6-1-pve: 4.4.6-48
pve-kernel-4.2.8-1-pve: 4.2.8-41
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-39
qemu-server: 4.0-72
pve-firmware: 1.1-8
libpve-common-perl: 4.0-59
libpve-access-control: 4.0-16
libpve-storage-perl: 4.0-50
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-13
pve-container: 1.0-61
pve-firewall: 2.0-25
pve-ha-manager: 1.0-28
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-7
lxcfs: 2.0.0-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve9~jessie
Code:
root@eu0:~# cat /etc/apt/sources.list
deb http://debian.mirrors.ovh.net/debian/ jessie main contrib non-free
# security updates
deb http://security.debian.org jessie/updates main contrib
root@eu0:~# cat /etc/apt/sources.list.d/pve-public-repo.list
deb http://download.proxmox.com/debian jessie pve-no-subscription
Code:
root@eu0:~# fdisk -l /dev/sd*
Disk /dev/sda: 111.8 GiB, 120034123776 bytes, 234441648 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x4f8ece82
Device Boot Start End Sectors Size Id Type
/dev/sda1 * 4096 40962047 40957952 19.5G fd Linux raid autodetect
/dev/sda2 40962048 43057151 2095104 1023M fd Linux raid autodetect
/dev/sda3 43057152 234432511 191375360 91.3G f W95 Ext'd (LBA)
/dev/sda5 43059200 234432511 191373312 91.3G fd Linux raid autodetect
(IMO worthless clutter removed)
Disk /dev/sdb: 111.8 GiB, 120034123776 bytes, 234441648 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xfe82207f
Device Boot Start End Sectors Size Id Type
/dev/sdb1 * 4096 40962047 40957952 19.5G fd Linux raid autodetect
/dev/sdb2 40962048 43057151 2095104 1023M 82 Linux swap / Solaris
/dev/sdb3 43057152 234432511 191375360 91.3G f W95 Ext'd (LBA)
/dev/sdb5 43059200 234432511 191373312 91.3G fd Linux raid autodetect
(IMO worthless clutter removed)
Disk /dev/sdc: 111.8 GiB, 120034123776 bytes, 234441648 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x04e0f75e
Device Boot Start End Sectors Size Id Type
/dev/sdc1 * 4096 40962047 40957952 19.5G fd Linux raid autodetect
/dev/sdc2 40962048 43057151 2095104 1023M 82 Linux swap / Solaris
/dev/sdc3 43057152 234432511 191375360 91.3G f W95 Ext'd (LBA)
/dev/sdc5 43059200 234432511 191373312 91.3G fd Linux raid autodetect
(IMO worthless clutter removed)
root@eu0:~# pvdisplay
--- Physical volume ---
PV Name /dev/md5
VG Name pve
PV Size 91.25 GiB / not usable 3.94 MiB
Allocatable yes
PE Size 4.00 MiB
Total PE 23360
Free PE 64
Allocated PE 23296
PV UUID g##eKu-AZQx-####-6nDg-VMT2-ExeB-#####
Code:
root@eu0:~# mdadm --detail --scan
ARRAY /dev/md1 metadata=0.90 UUID=f#dcf5##:388###56:a4d###c2:26###302
ARRAY /dev/md5 metadata=0.90 UUID=0#8443##:378###f8:a4d###c2:26###302
root@eu0:~# vgdisplay
--- Volume group ---
VG Name pve
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 22
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 5
Open LV 4
Max PV 0
Cur PV 1
Act PV 1
VG Size 91.25 GiB
PE Size 4.00 MiB
Total PE 23360
Alloc PE / Size 23296 / 91.00 GiB
Free PE / Size 64 / 256.00 MiB
VG UUID izX###-AwnD-i5ti-####-ZZu1-Xj6X-#####
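In case someone wants to rule out a degraded or resyncing array as the cause, these are the obvious checks (output omitted here to keep the post short):
Code:
# quick overview of all md arrays: state and any resync/check in progress
cat /proc/mdstat
# per-array details: member disks, failed/spare devices, sync status
mdadm --detail /dev/md1
mdadm --detail /dev/md5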
I know the VG is almost full (only virtually, though: we are not using thin provisioning, so the LVs are fully allocated, but real usage inside the guests sits around 50-60%), but the problem also happened back when the VG was only around 50-60% allocated.
There are two LXC containers and three KVM VMs, each using a single non-thin-provisioned LV as its virtual disk.
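To illustrate what I mean by "virtually" full, this is roughly how the allocation compares to real usage (the df is run inside a guest, so the exact mount point depends on the VM or container):
Code:
# allocation as LVM sees it: thick LVs, so the VG reports almost no free space
vgs -o vg_name,vg_size,vg_free pve
lvs -o lv_name,lv_size pve
# real usage is whatever the filesystems inside the guests report, e.g. inside a VM:
df -h /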
Thanks in advance, and I hope this also helps anyone else with a similar or the same problem.