[SOLVED] Proxmox not reliable?! Don't upgrade to 4.3! - cluster ruined after upgrading from 4.2 to 4.3

PepeOnaChair
Hi,

this evening I upgraded Proxmox from 4.2 to 4.3-12/6894c9d9. I should never have done it...

After "upgrade" - better say destroy, to 4.3, Proxmox behave like, well...

First of all: IS THERE ANY SAFE WAY TO DOWNGRADE TO 4.2 AND GET A WORKING CLUSTER AGAIN?

I run a 3-node cluster with one small node acting as an arbiter for GlusterFS. On Proxmox 4.3, most VMs now DON'T EVEN FINISH BOOTING THE OS. A VM boots normally for maybe 10-15 seconds (I can see Linux daemons starting, for example), BUT THEN IT TURNS OFF WITHIN A SECOND. This repeats over and over for VMs with HA; VMs without HA do it just once, of course. Some VMs start normally, some don't. I have backups of all the VMs made daily or weekly, but none of the backups I restored made a VM run normally again. And the whole cluster had been running, let's say, nicely until today's damned "upgrade".

GlusterFS reports NO SPLIT-BRAIN... see below.

Some of the VMs run MikroTik RouterOS, and none of them can start anymore. Some were stopped during the upgrade, some were not, but it makes no difference. Restoring a MikroTik VM didn't help. EVEN INSTALLING A NEW MIKROTIK VM IS NOT POSSIBLE NOW: the VM boots from the ISO, but a few seconds into the RouterOS installation the VM is turned off!
I have 2 Windows XP VMs; neither of them starts.

No suspicious messages can be found in syslog or messages.



Some command outputs:

[root@Proxmox-1 log]$ pvecm status
Quorum information
------------------
Date: Wed Nov 30 00:24:09 2016
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000001
Ring ID: 1/280
Quorate: Yes

Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 192.168.170.100 (local)
0x00000002 1 192.168.170.102
0x00000003 1 192.168.170.120


[root@Proxmox-1 log]$ pvecm nodes

Membership information
----------------------
Nodeid Votes Name
1 1 Proxmox-1 (local)
2 1 Proxmox-2
3 1 Proxmox-quorum




[root@Proxmox-1 log]$ gluster volume heal gluster_volume_0 info
Brick 192.168.170.100:/export/pve-LV--pro--GlusterFS/brick
Status: Connected
Number of entries: 0

Brick 192.168.170.102:/export/pve-LV--pro--GlusterFS/brick
Status: Connected
Number of entries: 0

Brick 192.168.170.120:/export/pve-LV--pro--GlusterFS/brick
Status: Connected
Number of entries: 0


[root@Proxmox-1 log]$ gluster volume info

Volume Name: gluster_volume_0
Type: Replicate
Volume ID: 014b8ec6-9934-421a-ac33-0a75e884eaec
Status: Started
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 192.168.170.100:/export/pve-LV--pro--GlusterFS/brick
Brick2: 192.168.170.102:/export/pve-LV--pro--GlusterFS/brick
Brick3: 192.168.170.120:/export/pve-LV--pro--GlusterFS/brick (arbiter)
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
cluster.quorum-type: auto



[root@Proxmox-1 log]$ gluster volume heal gluster_volume_0 info split-brain
Brick 192.168.170.100:/export/pve-LV--pro--GlusterFS/brick
Status: Connected
Number of entries in split-brain: 0

Brick 192.168.170.102:/export/pve-LV--pro--GlusterFS/brick
Status: Connected
Number of entries in split-brain: 0

Brick 192.168.170.120:/export/pve-LV--pro--GlusterFS/brick
Status: Connected
Number of entries in split-brain: 0



[root@Proxmox-1 log]$ gluster peer status
Number of Peers: 2

Hostname: 192.168.170.120
Uuid: 00807e8e-c600-4025-bd3e-8b2a5c2ebbfd
State: Peer in Cluster (Connected)

Hostname: 192.168.170.102
Uuid: 8837af84-a446-44e3-bcd2-dc8d037a268e
State: Peer in Cluster (Connected)



[root@Proxmox-1 log]$ gluster volume status all detail
Status of volume: gluster_volume_0
------------------------------------------------------------------------------
Brick : Brick 192.168.170.100:/export/pve-LV--pro--GlusterFS/brick
TCP Port : 49152
RDMA Port : 0
Online : Y
Pid : 1818
File System : xfs
Device : /dev/mapper/pve-LV--pro--GlusterFS
Mount Options : rw,relatime,attr2,inode64,noquota
Inode Size : 512
Disk Space Free : 575.2GB
Total Disk Space : 817.0GB
Inode Count : 428529664
Free Inodes : 428529287
------------------------------------------------------------------------------
Brick : Brick 192.168.170.102:/export/pve-LV--pro--GlusterFS/brick
TCP Port : 49152
RDMA Port : 0
Online : Y
Pid : 1689
File System : xfs
Device : /dev/mapper/pve-LV--pro--GlusterFS
Mount Options : rw,relatime,attr2,inode64,noquota
Inode Size : 512
Disk Space Free : 576.4GB
Total Disk Space : 817.0GB
Inode Count : 428529664
Free Inodes : 428529287
------------------------------------------------------------------------------
Brick : Brick 192.168.170.120:/export/pve-LV--pro--GlusterFS/brick
TCP Port : 49152
RDMA Port : 0
Online : Y
Pid : 1613
File System : xfs
Device : /dev/mapper/pve-LV--pro--GlusterFS
Mount Options : rw,relatime,attr2,inode64,noquota
Inode Size : 512
Disk Space Free : 39.9GB
Total Disk Space : 40.0GB
Inode Count : 20971520
Free Inodes : 20971143
 
hi,

first of all, there is no need to spread panic and fear...
second: which version of gluster do you use?

i ask because in all gluster versions since 3.8.0 there is a bug (https://bugzilla.redhat.com/show_bug.cgi?id=1398076) which only emerges with qemu 2.7
the bug is already fixed in their git, and i guess 3.8.7 and 3.9.1 will contain the fix.

also, the last version i tested where everything worked is 3.7.15 (3.7.17 has a breaking bug which prevents image creation via qemu; this bug was also in 3.8.5 but got fixed in 3.8.6)

also there is a bug in qemu since 2.7 (https://bugs.launchpad.net/qemu/+bug/1644754) which we will work around, but as far as i tested this only affects the creation of linked clones.

if you cannot use 3.7.15 (or do not intend to), you can (for now) pin your pve-qemu-kvm to 2.6.2-2 until gluster fixes their issues (but beware, you will not get security fixes for qemu anymore)

also a side remark:

if you have important infrastructure running on proxmox (or any software for that matter), i would recommend a test environment (it can be virtualized) which resembles your production environment, where you use the pve-test repository and regularly check for new bugs - and if you find any, please report them
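
for illustration, on pve 4.x (debian jessie) the test repository is just an extra apt source, roughly like this (a sketch - check the repository wiki page for the exact current line):

Code:
# /etc/apt/sources.list.d/pvetest.list  (test machines only, never production)
deb http://download.proxmox.com/debian jessie pvetest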

thanks :)
edit: typo
 
Hello,

thank you for the reply, and I'm sorry for being so upset. Later on I'll try to edit all the swearing out of yesterday evening's post...

Proxmox 4.2 before yesterday's upgrade was also running the same version of GlusterFS - 3.8.3 - and it had no problems at all; the crashes came only after upgrading the packages to Proxmox 4.3. I didn't upgrade GlusterFS. This is the URL in sources.list for Gluster, which I didn't change:
deb http://download.gluster.org/pub/gluster/glusterfs/3.8/3.8.3/Debian/jessie/apt jessie main

Should I try to upgrade GlusterFS to 3.8.7 or newer, if you think that would help?

Or how do I "pin your pve-qemu-kvm to 2.6.2-2"?

Regarding the testing and production environments - you are 100% right. Normally I do it that way; I have one test server identical to the ones in the production cluster. But this time I just skipped it, and the punishment came quickly...

Anyway, the cluster runs with:
[root@Proxmox-1 ~]$ gluster --version
glusterfs 3.8.3 built on Aug 22 2016 15:12:43
Repository revision: git://git.gluster.com/glusterfs.git

Package versions copy-pasted from web GUI:
proxmox-ve: 4.3-72 (running kernel: 4.4.24-1-pve)
pve-manager: 4.3-12 (running version: 4.3-12/6894c9d9)
pve-kernel-4.4.6-1-pve: 4.4.6-48
pve-kernel-4.4.16-1-pve: 4.4.16-64
pve-kernel-4.4.24-1-pve: 4.4.24-72
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-47
qemu-server: 4.0-96
pve-firmware: 1.1-10
libpve-common-perl: 4.0-83
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-68
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-docs: 4.3-17
pve-qemu-kvm: 2.7.0-8
pve-container: 1.0-85
pve-firewall: 2.0-31
pve-ha-manager: 1.0-38
ksm-control-daemon: 1.2-1
glusterfs-client: 3.8.3-1
lxc-pve: 2.0.6-1
lxcfs: 2.0.5-pve1
criu: 1.6.0-1
novnc-pve: 0.5-8
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.8-pve13~bpo80
 
as i said:

the bugs in gluster only manifest when using qemu >= 2.7, which has been in the enterprise repo for about a month
(technical details: gluster has new features since 3.8 which qemu only uses since 2.7)

and the fix is currently only in the gluster git - no new version has been released since, but i guess it will be in an upcoming 3.8.x (the next will probably be 3.8.7) and 3.9.x (probably 3.9.1)

Or how do I "pin your pve-qemu-kvm to 2.6.2-2"?

you can make a file under /etc/apt/preferences.d (e.g. pin-pve-qemu-kvm)
with the following content:

Code:
Package: pve-qemu-kvm
Pin: version 2.6.2-2
Pin-Priority: 1001


and then an apt-get update && apt-get dist-upgrade should downgrade that package
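
for example (a sketch - apt-cache policy just lets you verify the pin is active before you upgrade):

Code:
apt-get update
apt-cache policy pve-qemu-kvm    # candidate version should now show 2.6.2-2
apt-get dist-upgrade             # downgrades pve-qemu-kvm to the pinned version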

but don't forget to delete this file again when the issues are resolved
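
i.e. something like this (assuming the file name from above):

Code:
rm /etc/apt/preferences.d/pin-pve-qemu-kvm
apt-get update && apt-get dist-upgrade   # pve-qemu-kvm moves back to the repo version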
 
Thank you for the advice; the downgrade of pve-qemu-kvm to 2.6.2-2, thank God, helped.
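
In case it helps anyone else, this is roughly how I checked it afterwards (just a sketch):

Code:
pveversion -v | grep pve-qemu-kvm   # should report 2.6.2-2 after the downgrade
# note: a running VM only picks up the downgraded binary after a full stop and start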

Just a tip for the future: why do you release versions of Proxmox with a known, serious incompatibility between package versions like this one? GlusterFS is officially supported, so at least some red warning would be nice. I believe I'm not the only one who ran into this problem.

But anyway, thank you for the help.

Pepe
 
