Hello,
the title describes the problem pretty well. It's a single node, no cluster and no "exotic" external storage. The problem has been happening for a while now, maybe some months (we've learned not to back up entire VMs, not to restart them unnecessarily, and never to do either during heavy-usage hours). We hoped that the next relatively big update would fix it, but that hasn't been the case with the kernel and interface update.
I will paste a handful of information about the node's software versions and configuration, especially the storage layout; hopefully someone else can spot what I'm not seeing about the problem. I also hope the problem is not mdraid itself: it wasn't a problem with Proxmox 3.x and, come on, it's just mirrors. I have other nodes with local ZFS and that has its own problems (especially memory usage, the kind of issue I cannot afford on this node).
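(For reference, on those ZFS nodes the usual mitigation is to cap the ARC with a module option; the value below is just an example, not what those nodes actually run, and this node has no ZFS pool at all.)
Code:
# /etc/modprobe.d/zfs.conf -- cap the ZFS ARC at 2 GiB (example value, adjust to taste)
options zfs zfs_arc_max=2147483648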
################################################################
################################################################
Code:
Linux eu0 4.4.6-1-pve #1 SMP Thu Apr 21 11:25:40 CEST 2016 x86_64 GNU/Linux
Code:
proxmox-ve: 4.1-48 (running kernel: 4.4.6-1-pve)
pve-manager: 4.1-34 (running version: 4.1-34/8887b0fd)
pve-kernel-4.4.6-1-pve: 4.4.6-48
pve-kernel-4.2.8-1-pve: 4.2.8-41
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-39
qemu-server: 4.0-72
pve-firmware: 1.1-8
libpve-common-perl: 4.0-59
libpve-access-control: 4.0-16
libpve-storage-perl: 4.0-50
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-13
pve-container: 1.0-61
pve-firewall: 2.0-25
pve-ha-manager: 1.0-28
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-7
lxcfs: 2.0.0-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve9~jessie
Code:
root@eu0:~# cat /etc/apt/sources.list
deb http://debian.mirrors.ovh.net/debian/ jessie main contrib non-free
# security updates
deb http://security.debian.org jessie/updates main contrib
root@eu0:~# cat /etc/apt/sources.list.d/pve-public-repo.list
deb http://download.proxmox.com/debian jessie pve-no-subscription
Code:
root@eu0:~# fdisk -l /dev/sd*
Disk /dev/sda: 111.8 GiB, 120034123776 bytes, 234441648 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x4f8ece82
Device Boot Start End Sectors Size Id Type
/dev/sda1 * 4096 40962047 40957952 19.5G fd Linux raid autodetect
/dev/sda2 40962048 43057151 2095104 1023M fd Linux raid autodetect
/dev/sda3 43057152 234432511 191375360 91.3G f W95 Ext'd (LBA)
/dev/sda5 43059200 234432511 191373312 91.3G fd Linux raid autodetect
(IMO worthless clutter removed)
Disk /dev/sdb: 111.8 GiB, 120034123776 bytes, 234441648 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xfe82207f
Device Boot Start End Sectors Size Id Type
/dev/sdb1 * 4096 40962047 40957952 19.5G fd Linux raid autodetect
/dev/sdb2 40962048 43057151 2095104 1023M 82 Linux swap / Solaris
/dev/sdb3 43057152 234432511 191375360 91.3G f W95 Ext'd (LBA)
/dev/sdb5 43059200 234432511 191373312 91.3G fd Linux raid autodetect
(IMO worthless clutter removed)
Disk /dev/sdc: 111.8 GiB, 120034123776 bytes, 234441648 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x04e0f75e
Device Boot Start End Sectors Size Id Type
/dev/sdc1 * 4096 40962047 40957952 19.5G fd Linux raid autodetect
/dev/sdc2 40962048 43057151 2095104 1023M 82 Linux swap / Solaris
/dev/sdc3 43057152 234432511 191375360 91.3G f W95 Ext'd (LBA)
/dev/sdc5 43059200 234432511 191373312 91.3G fd Linux raid autodetect
(IMO worthless clutter removed)
root@eu0:~# pvdisplay
--- Physical volume ---
PV Name /dev/md5
VG Name pve
PV Size 91.25 GiB / not usable 3.94 MiB
Allocatable yes
PE Size 4.00 MiB
Total PE 23360
Free PE 64
Allocated PE 23296
PV UUID g##eKu-AZQx-####-6nDg-VMT2-ExeB-#####
Code:
root@eu0:~# mdadm --detail --scan
ARRAY /dev/md1 metadata=0.90 UUID=f#dcf5##:388###56:a4d###c2:26###302
ARRAY /dev/md5 metadata=0.90 UUID=0#8443##:378###f8:a4d###c2:26###302
root@eu0:~# vgdisplay
--- Volume group ---
VG Name pve
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 22
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 5
Open LV 4
Max PV 0
Cur PV 1
Act PV 1
VG Size 91.25 GiB
PE Size 4.00 MiB
Total PE 23360
Alloc PE / Size 23296 / 91.00 GiB
Free PE / Size 64 / 256.00 MiB
VG UUID izX###-AwnD-i5ti-####-ZZu1-Xj6X-#####
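In case someone wants to rule out a degraded or resyncing array as the cause, these are the obvious checks (output omitted here to keep the post short):
Code:
# quick overview of all md arrays: state and any resync/check in progress
cat /proc/mdstat
# per-array details: member disks, failed/spare devices, sync status
mdadm --detail /dev/md1
mdadm --detail /dev/md5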
I know the VG is almost full (only virtually, though: we are not using thin provisioning, so the LVs are fully allocated, but real usage inside the guests sits around 50-60%), but the problem also happened back when the VG was only around 50-60% allocated.
There are two LXC containers and three KVM VMs, each using a single non-thin-provisioned LV as its virtual disk.
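To illustrate what I mean by "virtually" full, this is roughly how the allocation compares to real usage (the df is run inside a guest, so the exact mount point depends on the VM or container):
Code:
# allocation as LVM sees it: thick LVs, so the VG reports almost no free space
vgs -o vg_name,vg_size,vg_free pve
lvs -o lv_name,lv_size pve
# real usage is whatever the filesystems inside the guests report, e.g. inside a VM:
df -h /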
Thanks in advance, and I hope this also helps anyone else with a similar or the same problem.