[SOLVED] section 'device' already exists and not marked as array! (500)

BellaicheF

New Member
Jun 19, 2024
Hi,

Two weeks ago I built a cluster with HA using 2 Proxmox servers.

Both servers run version 8.1.4.
I made a few changes so that the cluster keeps quorum with only 1 server.
I have 3 different networks on each:
- one for storage
- one for the cluster
- one for production

They can reach each other on all 3 LANs.
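
(For completeness, the reachability check was roughly the following; only 10.0.1.8/10.0.1.9 are real addresses, taken from the corosync.conf shown further down, the storage and prod addresses are placeholders:)

Bash:
# from NO-PROX-01: the cluster LAN (ring0 address of NO-PROX-02)
ping -c 2 10.0.1.8
# storage and prod LANs (placeholder addresses, adjust to the real subnets)
ping -c 2 <storage-ip-of-NO-PROX-02>
ping -c 2 <prod-ip-of-NO-PROX-02>
# link status as corosync itself sees it
corosync-cfgtool -s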

Until now I had no issues, but today Datacenter -> Cluster shows this error: section 'device' already exists and not marked as array! (500), and nothing else.


ha-manager status gives this:

Bash:
ha-manager status
quorum OK
master NO-PROX-01 (active, Wed Jun 19 15:51:40 2024)
lrm NO-PROX-02, started)
service vm:101 (NO-PROX-02, started)

But pvecm status gives this:

Code:
Can't use an undefined value as a HASH reference at /usr/share/perl5/PVE/CLI/pvecm.pm line 486, <DATA> line 960

A look at similar issues on this forum led me to diff /etc/pve/corosync.conf against /etc/corosync/corosync.conf; there are no differences.
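
For reference, the check was essentially this single command (it prints nothing when the files are identical):

Bash:
# compare the cluster-wide copy with the local file corosync actually reads
diff /etc/pve/corosync.conf /etc/corosync/corosync.conf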
Any idea what the issue could be?

Thanks in advance.
 
Further information that may help in this case:

Bash:
systemctl status corosync
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; preset: enabled)
     Active: active (running) since Wed 2024-06-19 16:19:59 CEST; 5min ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 3215472 (corosync)
      Tasks: 9 (limit: 57702)
     Memory: 131.1M
        CPU: 5.229s
     CGroup: /system.slice/corosync.service
             └─3215472 /usr/sbin/corosync -f


Jun 19 16:20:00 NO-PROX-02 corosync[3215472]:   [KNET  ] rx: host: 1 link: 0 is up
Jun 19 16:20:00 NO-PROX-02 corosync[3215472]:   [KNET  ] link: Resetting MTU for link 0 because host 1 joined
Jun 19 16:20:00 NO-PROX-02 corosync[3215472]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Jun 19 16:20:01 NO-PROX-02 corosync[3215472]:   [KNET  ] pmtud: PMTUD link change for host: 1 link: 0 from 469 to 1397
Jun 19 16:20:01 NO-PROX-02 corosync[3215472]:   [KNET  ] pmtud: Global data MTU changed to: 1397
Jun 19 16:20:01 NO-PROX-02 corosync[3215472]:   [QUORUM] Sync members[2]: 1 2
Jun 19 16:20:01 NO-PROX-02 corosync[3215472]:   [QUORUM] Sync joined[1]: 1
Jun 19 16:20:01 NO-PROX-02 corosync[3215472]:   [TOTEM ] A new membership (1.e6) was formed. Members joined: 1
Jun 19 16:20:01 NO-PROX-02 corosync[3215472]:   [QUORUM] Members[2]: 1 2
Jun 19 16:20:01 NO-PROX-02 corosync[3215472]:   [MAIN  ] Completed service synchronization, ready to provide service.
root@NO-PROX-02:~# corosync-quorumtool
Quorum information
------------------
Date:             Wed Jun 19 16:25:40 2024
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          2
Ring ID:          1.e6
Quorate:          Yes


Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           1 
Flags:            2Node Quorate WaitForAll


Membership information
----------------------
    Nodeid      Votes Name
         1          1 NO-PROX-01
         2          1 NO-PROX-02 (local)
 
I went a bit further in my testing and restarted both servers at the same time.
They behaved perfectly fine: the server that was handling the machines waited for the second server to come back up before migrating them, then restarted itself, and finally the VMs went back to it.

Unfortunately, the error remains.
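
In case it matters, I was watching the relevant services during the restart test with queries along these lines (unit names as on a stock PVE install):

Bash:
# follow corosync, pmxcfs and the HA services live while the nodes restart
journalctl -f -u corosync -u pve-cluster -u pve-ha-lrm -u pve-ha-crm
# or look back afterwards
journalctl --since "30 minutes ago" -u corosync -u pve-cluster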

This is my /etc/pve/corosync.conf file; is there any mistake in it?

YAML:
logging {
  debug: off
  to_syslog: yes
}


nodelist {
  node {
    name: NO-PROX-01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.0.1.9
  }
  node {
    name: NO-PROX-02
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.0.1.8
  }
}


quorum {
  provider: corosync_votequorum
  expected_votes: 1
  two_node: 1
}


totem {
  cluster_name: Notown-Cluster
  config_version: 2
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}


fencing {
    mode: resource
    devices {
        device {
            agent: fence_ssh
            ipaddr: 10.0.1.9
            login: root
            passwd: somepass
        }
        device {
            agent: fence_ssh
            ipaddr: 10.0.1.8
            login: root
            passwd: somepass
        }
    }
}
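
A side note in case someone wants to reproduce the 500 outside the GUI: as far as I can tell the message comes from the PVE config parser rather than from corosync itself, so something like this should trigger it on the CLI too (the pvesh endpoint is my guess at what the cluster page calls):

Bash:
# the PVE API has to parse corosync.conf to answer this, so a section it cannot
# handle (like the duplicated 'device' above) should raise the same error
pvesh get /cluster/config/join
# corosync's own syntax check (if your build supports -t) accepts the file
corosync -t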
 
Regarding the error given by the command "pvecm status":
Bash:
Can't use an undefined value as a HASH reference at /usr/share/perl5/PVE/CLI/pvecm.pm line 486, <DATA> line 960

I edited /usr/share/perl5/PVE/CLI/pvecm.pm line 486 from this:
Perl:
if (scalar(%$totem)) {
to this:
Perl:
if (defined($totem) && ref($totem) eq 'HASH' && scalar(%$totem)) {
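
(Side note: that file is shipped by a Proxmox package, pve-cluster if I am not mistaken, so the next update will overwrite this local edit anyway; it can be reverted cleanly like this:)

Bash:
# confirm which package owns the file, then reinstall it to drop the local patch
dpkg -S /usr/share/perl5/PVE/CLI/pvecm.pm
apt install --reinstall pve-cluster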

Now pvecm status gives the right information, but in the web GUI, Datacenter -> Cluster still shows the same error.


Many thanks in advance for any help.
 
Sorry for the flooding; this is the journalctl output regarding the cluster when restarting a server. I see no error:

Code:
Prox01 server restart
Jun 20 10:22:51 NO-PROX-01 corosync[1007]:   [SERV  ] Service engine unloaded: corosync cluster closed process group service v1.01
Jun 20 10:22:51 NO-PROX-01 corosync[1007]:   [SERV  ] Service engine unloaded: corosync cluster quorum service v0.1
Jun 20 10:22:51 NO-PROX-01 pmxcfs[911]: [dcdb] notice: start cluster connection
Jun 20 10:22:52 NO-PROX-01 systemd[1]: Stopping pve-cluster.service - The Proxmox VE cluster filesystem...
Jun 20 10:22:52 NO-PROX-01 systemd[1]: pve-cluster.service: Deactivated successfully.
Jun 20 10:22:52 NO-PROX-01 systemd[1]: Stopped pve-cluster.service - The Proxmox VE cluster filesystem.
Jun 20 10:22:52 NO-PROX-01 systemd[1]: pve-cluster.service: Consumed 1min 32.964s CPU time.
Jun 20 10:24:51 NO-PROX-01 systemd[1]: Starting pve-cluster.service - The Proxmox VE cluster filesystem...
Jun 20 10:24:52 NO-PROX-01 systemd[1]: Started pve-cluster.service - The Proxmox VE cluster filesystem.
Jun 20 10:24:53 NO-PROX-01 corosync[1010]:   [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Jun 20 10:24:53 NO-PROX-01 corosync[1010]:   [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Jun 20 10:24:57 NO-PROX-01 pmxcfs[913]: [status] notice: update cluster info (cluster name  Notown-Cluster, version = 2)


Prox02
Jun 20 10:17:16 NO-PROX-02 corosync[1192]:   [SERV  ] Service engine unloaded: corosync cluster closed process group service v1.01
Jun 20 10:17:16 NO-PROX-02 corosync[1192]:   [SERV  ] Service engine unloaded: corosync cluster quorum service v0.1
Jun 20 10:17:17 NO-PROX-02 systemd[1]: Stopping pve-cluster.service - The Proxmox VE cluster filesystem...
Jun 20 10:17:17 NO-PROX-02 pmxcfs[1186]: [dcdb] notice: start cluster connection
Jun 20 10:17:17 NO-PROX-02 pmxcfs[1186]: [status] notice: start cluster connection
Jun 20 10:17:18 NO-PROX-02 systemd[1]: pve-cluster.service: Deactivated successfully.
Jun 20 10:17:18 NO-PROX-02 systemd[1]: Stopped pve-cluster.service - The Proxmox VE cluster filesystem.
Jun 20 10:17:18 NO-PROX-02 systemd[1]: pve-cluster.service: Consumed 1min 51.768s CPU time.
Jun 20 10:20:28 NO-PROX-02 systemd[1]: Starting pve-cluster.service - The Proxmox VE cluster filesystem...
Jun 20 10:20:30 NO-PROX-02 systemd[1]: Started pve-cluster.service - The Proxmox VE cluster filesystem.
Jun 20 10:20:30 NO-PROX-02 corosync[1201]:   [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Jun 20 10:20:30 NO-PROX-02 corosync[1201]:   [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Jun 20 10:20:35 NO-PROX-02 pmxcfs[1194]: [status] notice: update cluster info (cluster name  Notown-Cluster, version = 2)
Jun 20 10:21:54 NO-PROX-02 pvedaemon[1252]: <root@pam> update VM 101: -tags clusterstorage,client
 
Two weeks ago I built a cluster with HA using 2 Proxmox servers.
that's not a supported setup, you need 3 nodes for HA (quorum!); the corosync two_node option etc. is not supported by us
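
the usual supported way to get a third vote without adding a third full node is an external qdevice, roughly like this (the qnetd host address is just an example):

Bash:
# on a separate machine that will only provide the third vote
apt install corosync-qnetd
# on both cluster nodes
apt install corosync-qdevice
# then, from one of the cluster nodes (example address for the qnetd host)
pvecm qdevice setup 10.0.1.100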

YAML:
fencing {
    mode: resource
    devices {
        device {
            agent: fence_ssh
            ipaddr: 10.0.1.9
            login: root
            passwd: somepass
        }
        device {
            agent: fence_ssh
            ipaddr: 10.0.1.8
            login: root
            passwd: somepass
        }
    }
}

where did you get that part of the config? i couldn't find it documented anywhere in the manpages or documentation, nor in the source code of corosync (maybe it was there in an old version but isn't anymore?)

anyway, here is the actual source of the error message: you define 'device' twice with different values, and the parser chokes on that
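
a quick way to spot repeated section names (only a few of them, like 'node' or 'interface', are expected to occur more than once):

Bash:
# count how often each "name {" opens a block in the config;
# 'device' appearing twice here is exactly what the parser trips over
grep -oP '^\s*\K\w+(?=\s*\{)' /etc/corosync/corosync.conf | sort | uniq -c | sort -rn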
 
Hi, that was my issue! I removed the fencing section and everything worked again.

What's disturbing is that corosync never complained about this conf! Obviously I misunderstood how fencing works; I'll have to read a bit more, I guess.
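
For anyone landing here later: the fix itself was just deleting the fencing { ... } block. As far as I understand the recommended procedure, the careful way to apply such an edit looks roughly like this:

Bash:
# work on copies, never edit the live file in place
cp /etc/pve/corosync.conf /root/corosync.conf.new
cp /etc/pve/corosync.conf /root/corosync.conf.bak   # keep a backup
nano /root/corosync.conf.new    # remove the fencing { ... } section and bump config_version
mv /root/corosync.conf.new /etc/pve/corosync.conf   # pmxcfs propagates it to /etc/corosync/
# if the running corosync does not reload the new config on its own:
corosync-cfgtool -R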

THANK YOU
 
