Incorrect view of nodes in the web GUI

snakeru54

New Member
Jun 8, 2017
Hello.
I have a Proxmox VE 4.4-1 cluster with 6 nodes and about 10 VMs on each node.
In the web GUI I see 2 nodes with a red cross (not working).
I can't start any of the VMs placed on them from the web GUI.
But I can log in via SSH to the VMs on those "non-working" nodes, restart them, etc.
By the way, some of those VMs are still running and even show CPU and memory usage.

(I can't attach a screenshot.)

So, can anyone tell me what happened?
What should I do to fix it?

Help me, please.
From Russia with love.

P.S. Sorry for my not-so-good English.
 
Here are my steps and the results.

I ran:
omping -c 10000 -i 0.001 -F -q 10.1.1.223 10.1.1.229 10.1.1.238 10.1.1.232 10.1.1.228 10.1.1.234

I got:
10.1.1.223 : waiting for response msg
10.1.1.229 : waiting for response msg
10.1.1.238 : waiting for response msg
10.1.1.228 : waiting for response msg
10.1.1.234 : waiting for response msg
^C
10.1.1.223 : response message never received
10.1.1.229 : response message never received
10.1.1.238 : response message never received
10.1.1.228 : response message never received
10.1.1.234 : response message never received

Then I ran:
systemctl status corosync

I got:
corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled)
Active: active (running) since Tue 2017-06-20 10:51:09 +07; 6h ago
Process: 26912 ExecStop=/usr/share/corosync/corosync stop (code=exited, status=0/SUCCESS)
Process: 26935 ExecStart=/usr/share/corosync/corosync start (code=exited, status=0/SUCCESS)
Main PID: 26945 (corosync)
CGroup: /system.slice/corosync.service
└─26945 corosync
Jun 20 17:11:13 pve-3 corosync[26945]: [TOTEM ] Invalid packet data
Jun 20 17:11:13 pve-3 corosync[26945]: [TOTEM ] Incoming packet has differe...ng
Jun 20 17:11:13 pve-3 corosync[26945]: [TOTEM ] Received message has invali...g.
Jun 20 17:11:13 pve-3 corosync[26945]: [TOTEM ] Invalid packet data
Jun 20 17:11:13 pve-3 corosync[26945]: [TOTEM ] Incoming packet has differe...ng
Jun 20 17:11:13 pve-3 corosync[26945]: [TOTEM ] Received message has invali...g.
Jun 20 17:11:13 pve-3 corosync[26945]: [TOTEM ] Invalid packet data
Jun 20 17:11:13 pve-3 corosync[26945]: [TOTEM ] Incoming packet has differe...ng
Jun 20 17:11:13 pve-3 corosync[26945]: [TOTEM ] Received message has invali...g.
Jun 20 17:11:13 pve-3 corosync[26945]: [TOTEM ] Invalid packet data
Hint: Some lines were ellipsized, use -l to show in full.

What should I do now?

I'm a novice in the Proxmox jungle...
Thanks for your help.
 
Bro, is everything OK with multicast?
 
Bro, is everything OK with multicast?
Hell if I know. I'll go find out where those servers are plugged in and see what can be done with that node...
 
I ran:
omping -c 10000 -i 0.001 -F -q 10.1.1.223 10.1.1.229 10.1.1.238 10.1.1.232 10.1.1.228 10.1.1.234

I got:
10.1.1.223 : waiting for response msg

Multicast is not working, so the cluster cannot really work in this state. As omping fails, this is likely a network problem, not a configuration one.

Assuming it worked previously, there may have been some network setup change which broke it.
I.e. check switches and firewall. Or were other changes made in the cluster?
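
One more hedged hint, based on the "Incoming packet has different crypto type" / "Received message has invalid digest" lines in your corosync log: those messages typically appear when a node receives cluster traffic that was not encrypted with its own /etc/corosync/authkey (for example a stale key on one node, or another cluster talking on the same multicast group). A quick way to compare key and config across the nodes, assuming root SSH access and reusing the IPs from this thread:

# all checksums should be identical on every node
for h in 10.1.1.223 10.1.1.229 10.1.1.238 10.1.1.232 10.1.1.228 10.1.1.234; do
    ssh root@$h "hostname; md5sum /etc/corosync/authkey /etc/corosync/corosync.conf"
done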
 
Multicast is not working...
There is no firewall or any rules. All nodes are plugged into one 16-port unmanaged L2 switch.
Or were other changes made in the cluster?
I don't know. I think nothing changed.
So I went looking and searching. Here are the results.

/etc/hosts on all nodes
127.0.0.1 localhost.localdomain localhost

10.1.1.234 pve-1.-----.local pve-1
10.1.1.229 pve-3.-----.local pve-3
10.1.1.238 pve-5.-----.local pve-5 pvelocalhost
10.1.1.232 pve-6.-----.local pve-6
10.1.1.227 pve-7.-----.local pve-7
10.1.1.228 pve-8.-----.local pve-8



# The following lines are desirable for IPv6 capable hosts

::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
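
As a side check, this file also lets you verify that each node resolves its own hostname to its cluster IP. On pve-5, for example (assuming its hostname is simply "pve-5"):

hostname -f
getent hosts pve-5

should print pve-5.-----.local and the 10.1.1.238 line respectively.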

Node pve-5: not working in the GUI, no VMs on it.
root@pve-5:~# journalctl -xn
Jun 21 13:47:19 pve-5 pmxcfs[1282]: [quorum] crit: quorum_initialize failed: 2
Jun 21 13:47:19 pve-5 pmxcfs[1282]: [confdb] crit: cmap_initialize failed: 2
Jun 21 13:47:19 pve-5 pmxcfs[1282]: [dcdb] crit: cpg_initialize failed: 2
Jun 21 13:47:19 pve-5 pmxcfs[1282]: [status] crit: cpg_initialize failed: 2
Jun 21 13:47:19 pve-5 kernel: sd 6:0:0:1: [sdd] Very big device. Trying to use READ CAPACITY(16).

Node pve-3: not working in the GUI, a few VMs on it, all of them running.
root@pve-3:~# journalctl -xn
-- Logs begin at Wed 2017-06-21 01:21:10 +07, end at Wed 2017-06-21 14:03:03 +07. --
Jun 21 14:03:03 pve-3 corosync[13408]: [TOTEM ] Invalid packet data
Jun 21 14:03:03 pve-3 corosync[13408]: [TOTEM ] Incoming packet has different crypto type. Rejecting
Jun 21 14:03:03 pve-3 corosync[13408]: [TOTEM ] Received message has invalid digest... ignoring.
Jun 21 14:03:03 pve-3 corosync[13408]: [TOTEM ] Invalid packet data

Node pve-8: working, VMs exist and run perfectly.
root@pve-8:~# journalctl -xn
-- Logs begin at Tue 2017-06-20 09:12:59 +07, end at Wed 2017-06-21 14:05:36 +07
Jun 21 14:05:36 pve-8 corosync[6922]: [TOTEM ] Invalid packet data
Jun 21 14:05:36 pve-8 corosync[6922]: [TOTEM ] Incoming packet has different cry
Jun 21 14:05:36 pve-8 corosync[6922]: [TOTEM ] Received message has invalid dige
Jun 21 14:05:36 pve-8 corosync[6922]: [TOTEM ] Invalid packet data
Jun 21 14:05:36 pve-8 corosync[6922]: [TOTEM ] Incoming packet has different cry
Jun 21 14:05:36 pve-8 corosync[6922]: [TOTEM ] Received message has invalid dige
Jun 21 14:05:36 pve-8 corosync[6922]: [TOTEM ] Invalid packet data
Jun 21 14:05:36 pve-8 corosync[6922]: [TOTEM ] Incoming packet has different cry
Jun 21 14:05:36 pve-8 corosync[6922]: [TOTEM ] Received message has invalid dige
Jun 21 14:05:36 pve-8 corosync[6922]: [TOTEM ] Invalid packet data
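
For what it's worth, the pmxcfs lines on pve-5 ("quorum_initialize failed: 2", "cpg_initialize failed: 2") usually mean that pmxcfs cannot talk to the local corosync daemon at all, so a first look on that node would presumably be something like:

systemctl status corosync pve-cluster
pvecm status
journalctl -u corosync -b --no-pager | tail -n 50

to see whether corosync is running there and whether it reports quorum.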
 
And what kind of switch is it? Gigabit or 100 Mbit?
 
I ran:
omping -c 10000 -i 0.001 -F -q 10.1.1.223 10.1.1.229 10.1.1.238 10.1.1.232 10.1.1.228 10.1.1.234

Did you execute this on all nodes at (roughly) the same time? Otherwise the test won't work and produces exactly the output you got.
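
For reference, one way to start it everywhere at roughly the same moment, assuming root SSH access from one machine to all nodes (the IP list is just the one from this thread):

for h in 10.1.1.223 10.1.1.229 10.1.1.238 10.1.1.232 10.1.1.228 10.1.1.234; do
    ssh root@$h "omping -c 10000 -i 0.001 -F -q 10.1.1.223 10.1.1.229 10.1.1.238 10.1.1.232 10.1.1.228 10.1.1.234" &
done
wait

The Proxmox documentation also suggests a longer run (omping -c 600 -i 1 -q <all nodes>) to catch IGMP snooping problems that only show up after several minutes.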
 
Did you execute this on all nodes at (roughly) the same time? Otherwise the test won't work and produces exactly the output you got.
Yes, on all nodes at the same time.
The problem turned out to be in another "place", and it wasn't really a problem at all.
On the nodes where all the VMs were running but which were shown as "not working" in the Proxmox web GUI, we simply ran:
/etc/init.d/pvestatd start
and the node came back to "alive" status.
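
(On installs where pvestatd is managed by systemd, the equivalent would presumably be "systemctl restart pvestatd" / "systemctl status pvestatd". pvestatd is only the status daemon that feeds the web GUI, which would explain why the VMs kept running while the node looked dead.)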

Thank you for your attention))))
 
That's really good!
Good that it was resolved so easily :)
But do swap those switches for gigabit ones. Backups are probably slow, aren't they?
 
