Small Issue Leading into Something More

emilhozan

Member
Aug 27, 2019
Hey all,

My predicament started with emails that had:
- subject: Cron <root@HOSTNAME> test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
- body: /usr/bin/mandb: can't set the locale; make sure $LC_* and $LANG are correct

I have 9 nodes in a cluster and only 3 of them were reporting this issue.

When I started troubleshooting, I picked one node and worked through many suggestions from online searches, none of which helped. I tried many variations of the keywords, and the issue is obviously with the locale, but anyway.
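For anyone hitting the same mandb message: it usually means LANG or one of the LC_* variables names a locale that is not generated on that node. Below is a minimal sketch of that check, not a definitive diagnostic. The function name and the sample values are hypothetical; on a real node the inputs would be the environment (for example `os.environ`) and the lines printed by `locale -a`.

```python
def _norm(name):
    # `locale -a` prints "en_US.utf8" while LANG is usually "en_US.UTF-8",
    # so compare case-insensitively and without hyphens.
    return name.lower().replace("-", "")


def unavailable_locale_settings(env, available):
    """Return the LANG / LC_* settings whose value is not an available locale.

    env:       mapping of environment variable name -> value
    available: iterable of locale names, e.g. the lines printed by `locale -a`
    """
    known = {_norm(a) for a in available} | {"c", "posix"}
    bad = {}
    for var, value in env.items():
        if (var == "LANG" or var.startswith("LC_")) and value:
            if _norm(value) not in known:
                bad[var] = value
    return bad


# Hypothetical example values, not taken from the affected nodes.
example_env = {"LANG": "en_US.UTF-8", "LC_ALL": "de_DE.UTF-8", "PATH": "/usr/bin"}
example_available = ["C", "C.UTF-8", "POSIX", "en_US.utf8"]
print(unavailable_locale_settings(example_env, example_available))
```

Any variable this returns points at a locale that would need to be generated on the node, or at a setting that should be changed to a locale that exists.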

I then decided to try and update the system:
- apt-get update; this worked fine
- apt-get dist-upgrade; this is where my issues started

After running this, I was prompted to run "apt --fix-broken install", which I did. It completed, and then I ran apt-get dist-upgrade again. I ran into another issue, so I double-checked PVE 5.x's official repo per this link: https://pve.proxmox.com/wiki/Package_Repositories#_proxmox_ve_5_x_repositories

This:
1581407355814.png
And this:
1581407382234.png

These use two different Debian versions, but I decided to switch from one to the other. Note that ALL 9 nodes had the same /etc/apt/sources.list, but I only changed it on 3. I ran apt-get update and apt-get dist-upgrade again and seemed to be making some progress. The update was taking forever on the first node, which is why I went ahead with the other two (mistake, I see that now... hindsight is 20/20).
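Since the trouble started with switching Debian versions in the repository configuration, here is a small sketch that lists which release names the active `deb` lines of a sources.list refer to. More than one name (for example both stretch, which PVE 5.x is based on, and buster, which PVE 6.x is based on) suggests a mixed setup. The function name and the sample text are hypothetical and are not the contents of the files in the screenshots.

```python
def suites_in_sources(text):
    """Return the set of release names used by active deb/deb-src lines."""
    suites = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        if fields[0] not in ("deb", "deb-src"):
            continue
        rest = fields[1:]
        # Skip an optional "[option=value ...]" block after the type field.
        if rest and rest[0].startswith("["):
            while rest and not rest[0].endswith("]"):
                rest = rest[1:]
            rest = rest[1:]
        if len(rest) >= 2:
            # "stretch/updates" and "stretch-updates" both count as "stretch".
            suites.add(rest[1].split("/")[0].split("-")[0])
    return suites


# Hypothetical sample illustrating a mixed configuration.
SAMPLE_SOURCES = """\
deb http://ftp.debian.org/debian stretch main contrib
# deb http://ftp.debian.org/debian jessie main
deb http://download.proxmox.com/debian/pve buster pve-no-subscription
deb http://security.debian.org stretch/updates main contrib
"""
print(sorted(suites_in_sources(SAMPLE_SOURCES)))
```

On a node, the same function could be fed the contents of /etc/apt/sources.list and of each file under /etc/apt/sources.list.d/.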

And to sum up what my current issue is now: these three nodes are reporting offline but are accessible and respond to network commands. I've tried to search many resources pertaining to this but cannot find a solution that worked.

One thing that may be important, and the reason I explained all of the above: during the dist-upgrade I was prompted with a message. I don't recall the exact wording, but it was about a file with differences, and it offered a few options: keep the current file (which is what I chose, since the file probably had to do with configuration), view the file differences (there were only a few changed lines, none of which stuck out as harmful, but I still opted to keep the original file), and a few others.

After the dist-upgrade, I rebooted the servers for good measure and noticed they took a while to come back up. I then tested a ping, saw that it worked, and slowly got to where I am now.

Can anyone help shed some light?
 
Oh, also, I am fairly certain this issue has something to do with corosync. An error from syslog:
Feb 10 23:55:49 t1n4 pveproxy[1715]: Cluster not quorate - extending auth key lifetime!
 
Not sure I understand the situation completely - but it sounds like you had a 9-node cluster running PVE 5/stretch,
and then tried to update/updated 3 nodes to PVE 6/buster?

Major version upgrades (PVE 5 -> PVE 6) need a bit of preparation, and in that case corosync also made a major version jump.
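To make the "Cluster not quorate" message concrete, here is a small arithmetic sketch. It assumes each of the 9 nodes carries one vote and that the 3 upgraded nodes can no longer communicate with the other 6 over corosync, which is a guess based on the description, not something confirmed in this thread. The function names are made up for illustration.

```python
def quorum_threshold(expected_votes):
    """Votes needed for a strict majority of the expected votes."""
    return expected_votes // 2 + 1


def is_quorate(votes_present, expected_votes):
    """True if the partition holds a strict majority of the votes."""
    return votes_present >= quorum_threshold(expected_votes)


# 9-node cluster, one vote per node (assumption).
print(quorum_threshold(9))  # votes needed
print(is_quorate(3, 9))     # the 3 changed nodes on their own
print(is_quorate(6, 9))     # the 6 untouched nodes
```

Under those assumptions the 6 untouched nodes keep quorum while the 3 changed nodes do not, which would match only those 3 logging the message.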

Check out the wiki-entry for the version-upgrade:
https://pve.proxmox.com/wiki/Upgrade_from_5.x_to_6.0

For the current situation - compare the output of `pveversion -v` on all hosts - that should help identify what is happening and what to do from there.
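If comparing nine outputs by eye is tedious, a sketch like the following can do it. It assumes the output of `pveversion -v` has been saved as text per host and that each line has the form `package: version (optional details)`. The helper names, host names, and sample version strings are hypothetical placeholders, not values from this cluster.

```python
def parse_pveversion(text):
    """Map package name -> version from saved `pveversion -v` output."""
    versions = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or not rest.strip():
            continue
        # Keep only the version, dropping any "(running ...)" details.
        versions[name.strip()] = rest.split()[0]
    return versions


def differing_packages(outputs):
    """Return {package: {host: version}} for packages that differ across hosts.

    outputs: mapping of host name -> saved `pveversion -v` text.
    A package missing on a host is reported with version None.
    """
    parsed = {host: parse_pveversion(text) for host, text in outputs.items()}
    packages = set().union(*parsed.values())
    diffs = {}
    for pkg in sorted(packages):
        per_host = {host: vers.get(pkg) for host, vers in parsed.items()}
        if len(set(per_host.values())) > 1:
            diffs[pkg] = per_host
    return diffs


# Hypothetical sample outputs for two hosts.
NODE_A = "proxmox-ve: 5.4-2 (running kernel: 4.15.18-20-pve)\ncorosync: 2.4.4-pve1\n"
NODE_B = "proxmox-ve: 6.1-2 (running kernel: 5.3.13-1-pve)\ncorosync: 3.0.2-pve4\n"

for pkg, per_host in differing_packages({"node-a": NODE_A, "node-b": NODE_B}).items():
    print(pkg, per_host)
```

Packages that show up in the result, corosync in particular, are the ones worth looking at first.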

I hope this helps!
 
