Sirs,
we have a four-node Proxmox cluster running v5.2.
The cluster is in production and all nodes are covered by a Community subscription.
All servers have identical hardware: HP ProLiant DL380 G6, dual processor, 32 GB RAM, and dual-path Fibre Channel connections to an MSA storage array for the VM disk images.
Yesterday we performed some routine maintenance (cable clean-up and re-routing, BIOS upgrades, and various firmware upgrades from HP),
then ran a routine upgrade of the Proxmox OS with:
apt-get update
apt-get dist-upgrade
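For reference, this is how we check the running kernel and the installed kernel packages on each node (standard Debian/Proxmox commands, nothing custom):
uname -r
pveversion -v
dpkg -l | grep pve-kernel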
The reboot itself went fine, but during VM start-up two of the four nodes crashed, leaving the cluster without quorum.
Within five minutes all VMs were stopped and we were unable to operate at all.
After more than an hour of cross-testing, we noticed that the nodes crashed only when memory usage went above roughly 65-70% of total RAM.
We also searched the forum and read about some issues with kernel 4.15.17 (high disk latency?).
To work around the issue, we rebooted the nodes and selected a 4.13.x kernel at the GRUB prompt;
they are currently running on 4.13.13-5 or 4.13.16-3.
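To avoid having to pick the old kernel by hand at every reboot, we are thinking of pinning it as the default GRUB entry, roughly like this (the menu entry name below is only an example and has to match what grub.cfg actually lists on our nodes):
grep -E "submenu|menuentry" /boot/grub/grub.cfg
# then in /etc/default/grub set, for example:
# GRUB_DEFAULT="Advanced options for Proxmox Virtual Environment GNU/Linux>Proxmox Virtual Environment GNU/Linux, with Linux 4.13.16-3-pve"
update-grub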
We are waiting for instructions on how to fully roll back without disrupting the cluster, or for suggestions on an updated kernel in the 4.15.x family.
We can provide more information on request.
Thanks in advance for your time,
Regards,