Help! Dangerous behaviour on 3-node PVE HA cluster

casalicomputers

Renowned Member
Mar 14, 2015
Hello everybody,
At the beginning of this week I deployed a new PVE 4.4-13 cluster on a DELL VRTX with 3 blades, with HA active for almost every VM, and everything was working flawlessly. Storage is LVM, of course.

Just one hour ago, for some unknown reason, one cluster node (or more? I couldn't tell) rebooted and this happened:

1)
On the third node I found all VMs powered off, so I started them again. One booted fine, but the other refused to boot, complaining that it couldn't mount a filesystem larger than the disk, and e2fsck was throwing errors. After a while I managed to get everything back to work and figured out the reason:
the Proxmox UI was reporting a disk size of 710GB while the corresponding LV size was 700GB.
This sounded strange to me, so I tried to add 1GB to that disk and... the disk size went from 710GB to 701GB!!
I added the missing 9GB and was then able to successfully complete the filesystem check and bring the VM back to life... phew!
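
For anyone wanting to check for the same mismatch, something like this shows both sizes side by side (VM ID 101 is just a placeholder):
Code:
# size as recorded in the VM configuration
qm config 101 | grep virtio
# actual LV sizes on that storage
lvs raid10-sas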

2)
Another VM on another node refused to start because of a missing LV:
Code:
TASK ERROR: can't activate LV '/dev/raid10-sas/vm-114-disk-4':  
Failed to find logical volume "raid10-sas/vm-114-disk-4"

Looking at the VM disk configuration I found this:
Code:
virtio0: raid10-sas:vm-114-disk-1,size=60G
virtio1: raid10-sas:vm-114-disk-2,size=25G
virtio2: raid10-sas:vm-114-disk-3,size=50G
virtio3: raid10-sas:vm-114-disk-4,size=100G

while a "lvscan" was reporting this:
Code:
  ACTIVE            '/dev/raid10-sas/vm-114-disk-1' [60,00 GiB] inherit
  ACTIVE            '/dev/raid10-sas/vm-114-disk-2' [25,00 GiB] inherit
  ACTIVE            '/dev/raid10-sas/vm-114-disk-3' [100,00 GiB] inherit
  ACTIVE            '/dev/raid10-ssd/vm-114-disk-1' [50,00 GiB] inherit

Please mind that there are two different datastores: raid10-sas and raid10-ssd.
In short, PVE was trying to activate the wrong LVs!
I manually corrected the 114.conf config file with the right disks, and I was able to recover this VM as well.
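
For reference, based on the lvscan output above, the corrected disk lines should look something like this (the 50G disk actually lives on the SSD storage, and the 100G disk is disk-3 on SAS):
Code:
virtio0: raid10-sas:vm-114-disk-1,size=60G
virtio1: raid10-sas:vm-114-disk-2,size=25G
virtio2: raid10-ssd:vm-114-disk-1,size=50G
virtio3: raid10-sas:vm-114-disk-3,size=100G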

---

At this point I'm thinking that HA is doing something nasty, so I disabled it.
Honestly, now I'm scared to reboot the nodes for any reason!

But I would like to understand why this happened and how to make sure this won't happen again!

Any help is appreciated.
Thanks.
 
Hi,
this sounds like somebody edited vm114.conf by hand and made a mistake with disk-4?!
It's normal that disks on different storages are numbered starting from 1.

I have a lot of VMs with disks on different storages, but PVE has never mixed up the disks... (I don't use HA, though; AFAIK HA does change something in the conf file).

Udo
 
Hi,
thanks for your reply.

I can say for sure that nobody directly modified that file... and neither did I, of course.

Anyway, when I was setting up the server, I restored that VM from backup onto the SAS storage and then moved the 50GB disk to the SSD storage through the UI -> Move disk... no errors were thrown and everything worked perfectly, even though I didn't have a chance to try a full reboot of the PVE host. It looks like when HA was triggered by the node failure (reason still unknown), the VM was moved to a new host using an outdated configuration... And this could also explain the issue described in point 1 of my previous post.
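
For completeness, the CLI equivalent of that Move disk action should be something like this (VM 114, the 50GB virtio2 disk; syntax from memory, so double-check it):
Code:
qm move_disk 114 virtio2 raid10-ssd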

This is my first attempt at setting up HA, but it looks like it creates more problems than it solves.

UPDATE
I migrated that VM to another node and the LV was resized to 700GB again, resulting in a corrupt filesystem. I was able to restore the correct size and run e2fsck, but why does this happen??
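
The recovery was roughly along these lines (the LV name is a placeholder; if the disk carries a partition table, the filesystem would have to be reached through kpartx first):
Code:
# grow the LV back to the size the VM config expects
lvextend -L 710G /dev/raid10-sas/vm-XXX-disk-1
# then force a filesystem check/repair before booting the VM
e2fsck -f /dev/raid10-sas/vm-XXX-disk-1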
 
Please post the complete storage and VM configuration, as well as the output of "pveversion -v".
 
Hi,
I have the following configuration on my DELL VRTX:
3x M630 blades attached through a DAS with a Shared PERC8 RAID controller in a multipath failover configuration (as described in the DELL documentation).

Code:
/etc/multipath.conf

defaults {
   verbosity 2
   polling_interval 10
   uid_attribute "ID_SERIAL"
   checker_timeout 90
}

devices {
   device {
      vendor "DELL"
      product "Shared PERC8"
      hardware_handler "1 alua"
      path_grouping_policy failover
      prio alua
      path_checker tur
      rr_weight priorities
      failback immediate
      no_path_retry fail
      path_selector "round-robin 0"
      flush_on_last_del no
      user_friendly_names "yes"
      alias_prefix "mpath-sperc"
      features "0"
      fast_io_fail_tmo 5
   }
}
blacklist {
   wwid .*
}

blacklist_exceptions {
   # Shared Perc8
   wwid "361866da09e97130020962ecd82a1a2c3"
   wwid "361866da09e97130020962db571f2e273"
}


multipaths {

  multipath {
     wwid "361866da09e97130020962ecd82a1a2c3"
     alias r10sas
  }

  multipath {
     wwid "361866da09e97130020962db571f2e273"
     alias r10ssd
  }

}

On this controller I have two volumes: a RAID10 of 8x 2TB NL-SAS drives (raid10-sas) and a RAID10 of 8x 480GB SSDs (raid10-ssd).
These volumes have LVM on top: the multipath devices (/dev/mapper/r10sas, /dev/mapper/r10ssd) were used for pvcreate and vgcreate, roughly as shown below.
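
A sketch of that initial setup, with VG names matching the lvscan output above:
Code:
pvcreate /dev/mapper/r10sas
pvcreate /dev/mapper/r10ssd
vgcreate raid10-sas /dev/mapper/r10sas
vgcreate raid10-ssd /dev/mapper/r10ssd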


Every node has the same version of PVE, as they were all installed together.
Code:
proxmox-ve: 4.4-87 (running kernel: 4.4.59-1-pve)
pve-manager: 4.4-13 (running version: 4.4-13/7ea56165)
pve-kernel-4.4.35-1-pve: 4.4.35-77
pve-kernel-4.4.59-1-pve: 4.4.59-87
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0.1-1
pve-cluster: 4.0-49
qemu-server: 4.0-110
pve-firmware: 1.1-11
libpve-common-perl: 4.0-94
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-2
pve-docs: 4.4-4
pve-qemu-kvm: 2.7.1-4
pve-container: 1.0-99
pve-firewall: 2.0-33
pve-ha-manager: 1.0-40
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.9-pve15~bpo80
openvswitch-switch: 2.6.0-2

The cluster is healthy, and IGMP snooping is enabled on the internal 10Gb switch.
Code:
root@pve01:~# pvecm status
Quorum information
------------------
Date:             Mon May 15 11:44:43 2017
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1/1696
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.1.32 (local)
0x00000002          1 192.168.1.33
0x00000003          1 192.168.1.34
 
We don't actually touch shared disks when migrating, so this problem must have a different source. Having an outdated VM configuration is also not really possible, unless you manually forced quorum.
 
