Corosync won't start

okay. could you do the following for me?

Code:
cat /etc/default/corosync

and

Code:
systemctl cat corosync
 
Code:
root@pve847:~# cat /etc/default/corosync
# Command line options
#OPTIONS=""

and
Code:
root@pve847:~# systemctl cat corosync
# /lib/systemd/system/corosync.service
[Unit]
Description=Corosync Cluster Engine
Documentation=man:corosync man:corosync.conf man:corosync_overview
ConditionKernelCommandLine=!nocluster
ConditionPathExists=/etc/corosync/corosync.conf
Requires=network-online.target
After=network-online.target

[Service]
EnvironmentFile=-/etc/default/corosync
ExecStart=/usr/sbin/corosync -f $COROSYNC_OPTIONS
ExecStop=/usr/sbin/corosync-cfgtool -H --force
Type=notify

# In typical systemd deployments, both standard outputs are forwarded to
# journal (stderr is what's relevant in the pristine corosync configuration),
# which hazards a message redundancy since the syslog stream usually ends there
# as well; before editing this line, you may want to check DefaultStandardError
# in systemd-system.conf(5) and whether /dev/log is a systemd related symlink.
StandardError=null

# The following config is for corosync with enabled watchdog service.
#
#  When corosync watchdog service is being enabled and using with
#  pacemaker.service, and if you want to exert the watchdog when a
#  corosync process is terminated abnormally,
#  uncomment the line of the following Restart= and RestartSec=.
#Restart=on-failure
#  Specify a period longer than soft_margin as RestartSec.
#RestartSec=70
#  rewrite according to environment.
#ExecStartPre=/sbin/modprobe softdog
PrivateTmp=yes

[Install]
WantedBy=multi-user.target
 
at this point I am rather stumped.. if you start "corosync -f" again on the problematic node and keep it running, is the cluster functional? if not, could you post the full output of the corosync command?
 
that's really weird, as the command is identical to what systemd is supposed to run.. I guess you could retry with debug logging enabled (but a heads-up - it is really verbose).
 
I see you guys were having fun here. Could this be related to the pmxcfs issue which I had found on pve-devel in 2020?
 
not sure if this is related, but when I connect to the node via a terminal on my local machine, it takes a long time to actually connect to 847, sometimes as much as 20 seconds, while the 380 node takes a second or so. No idea how I'd debug this; they're both connected to the same switch and uplink.
 
If you mean SSH, this might be completely unrelated. It may be doing reverse lookups, so if you have some DNS at play, that might be why; or, if there's both IPv4 and IPv6, it can take a while to fall back to IPv4. Test connecting directly by IP to see if it's a pure SSH issue that has nothing to do with what you are troubleshooting.
 
I was already connecting direct by IP. If it wasn’t such a pain, I think I’m at the point where wiping and reinstalling OS would probably be better to fix the random niggles that have appeared on this node.
 
is there anything else that seems off? do you have monitoring in place that might show you anything out of the ordinary?
 
Other than the laggy opening of an SSH session and the fact that a reboot results in me having to start corosync manually, it appears to run OK. So without a reboot, which is rare, a slow initial connection to SSH is all that is noticeable as being out of place on the node.

As for monitoring, it’s connected to Grafana and also Observium. I don’t really think either of those has any data that would help diagnose the issue. Memory, disk and CPU usage all seem normal.

Any suggestions for what is worth checking for either of these 2 issues?
 
like I said, enabling debug logging might give a clue. another thing that you could try would be overriding the unit ("systemctl edit --full corosync") and dropping the StandardError=null redirection there, adding strace to the command line to see what is going on, or similar things..
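
For the stderr part, a drop-in might be less intrusive than a full copy of the unit, since it overrides only the one setting. This is a rough sketch and not something posted in the thread: the function name write_corosync_stderr_dropin is made up, and it assumes the stock unit shown earlier, where StandardError=null hides whatever corosync prints on stderr. The target directory is a parameter so it can be tried on a scratch path first.

```shell
# Sketch: write a drop-in that overrides only StandardError for corosync.
# write_corosync_stderr_dropin is a hypothetical helper name.
# $1 is the drop-in directory; on a real node this would be
# /etc/systemd/system/corosync.service.d (the path "systemctl edit corosync" uses).
write_corosync_stderr_dropin() {
    dropin_dir="$1"
    mkdir -p "$dropin_dir" || return 1
    # Send stderr to the journal instead of discarding it.
    printf '%s\n' '[Service]' 'StandardError=journal' > "$dropin_dir/override.conf"
}

# Assumed usage on the node, as root:
#   write_corosync_stderr_dropin /etc/systemd/system/corosync.service.d
#   systemctl daemon-reload
#   systemctl restart corosync
#   journalctl -b -u corosync --no-pager
```

Deleting override.conf again and running systemctl daemon-reload should bring back the packaged behaviour.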
 
Do you happen to use hosts.deny? Can you run it as ssh -vv to see at which point it's making you wait? I don't think it's related at all, but since it should not be happening either, maybe you have a network issue that you otherwise don't see?
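
To make the waiting point easier to spot, it can help to timestamp each line of the verbose output. A minimal sketch, assuming a POSIX shell on the client side; stamp_lines is a hypothetical helper name and <node-ip> is a placeholder:

```shell
# Prefix every line read from stdin with the current wall-clock time (HH:MM:SS).
# stamp_lines is a made-up helper name, not part of OpenSSH.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date +%T)" "$line"
    done
}

# Assumed usage: ssh writes its debug output to stderr, so merge it into
# stdout before piping. Running "true" avoids starting an interactive shell.
#   ssh -vv root@<node-ip> true 2>&1 | stamp_lines
```

Two consecutive lines whose timestamps are several seconds apart mark where the client was waiting.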
 

Hi,

Thanks for the response, the point at which it pauses appears to be debug1: pledge: filesystem

Not sure why this would be the case?
 
What's on the other side around the same time? journalctl -t sshd

like I said, enabling debug logging might give a clue

Did you also try to set logging { debug: on } in /etc/corosync/corosync.conf?
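
For reference, the logging section with debug switched on would look roughly like the fragment below. The to_syslog line is an assumption about what the existing file already contains; keep whatever else is in the current section. As far as I know, on a Proxmox cluster the change is normally made in /etc/pve/corosync.conf with config_version incremented, so that it propagates to all nodes.

```
logging {
  debug: on
  to_syslog: yes
}
```

Setting debug back to off afterwards keeps the log volume down.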
 
Here's the entire login log from ssh
Code:
Last login: Sat Jan 20 09:55:49 on ttys000
❯ ssh -vv root@xxx.xx.xxx.xxx
OpenSSH_9.4p1, LibreSSL 3.3.6
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 21: include /etc/ssh/ssh_config.d/* matched no files
debug1: /etc/ssh/ssh_config line 54: Applying options for *
debug2: resolve_canonicalize: hostname xxx.xx.xxx.xxx is address
debug1: Authenticator provider $SSH_SK_PROVIDER did not resolve; disabling
debug1: Connecting to xxx.xx.xxx.xxx [xxx.xx.xxx.xxx] port 22.
debug1: Connection established.
debug1: identity file /Users/user/.ssh/id_rsa type 0
debug1: identity file /Users/user/.ssh/id_rsa-cert type -1
debug1: identity file /Users/user/.ssh/id_ecdsa type -1
debug1: identity file /Users/user/.ssh/id_ecdsa-cert type -1
debug1: identity file /Users/user/.ssh/id_ecdsa_sk type -1
debug1: identity file /Users/user/.ssh/id_ecdsa_sk-cert type -1
debug1: identity file /Users/user/.ssh/id_ed25519 type -1
debug1: identity file /Users/user/.ssh/id_ed25519-cert type -1
debug1: identity file /Users/user/.ssh/id_ed25519_sk type -1
debug1: identity file /Users/user/.ssh/id_ed25519_sk-cert type -1
debug1: identity file /Users/user/.ssh/id_xmss type -1
debug1: identity file /Users/user/.ssh/id_xmss-cert type -1
debug1: identity file /Users/user/.ssh/id_dsa type -1
debug1: identity file /Users/user/.ssh/id_dsa-cert type -1
debug1: Local version string SSH-2.0-OpenSSH_9.4
debug1: Remote protocol version 2.0, remote software version OpenSSH_9.2p1 Debian-2+deb12u2
debug1: compat_banner: match: OpenSSH_9.2p1 Debian-2+deb12u2 pat OpenSSH* compat 0x04000000
debug2: fd 3 setting O_NONBLOCK
debug1: Authenticating to xxx.xx.xxx.xxx:22 as 'root'
debug1: load_hostkeys: fopen /Users/user/.ssh/known_hosts2: No such file or directory
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts: No such file or directory
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts2: No such file or directory
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug2: local client KEXINIT proposal
debug2: KEX algorithms: sntrup761x25519-sha512@openssh.com,curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group14-sha256,ext-info-c
debug2: host key algorithms: ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp256,ssh-ed25519-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp521-cert-v01@openssh.com,sk-ssh-ed25519-cert-v01@openssh.com,sk-ecdsa-sha2-nistp256-cert-v01@openssh.com,rsa-sha2-512-cert-v01@openssh.com,rsa-sha2-256-cert-v01@openssh.com,ssh-ed25519,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,sk-ssh-ed25519@openssh.com,sk-ecdsa-sha2-nistp256@openssh.com,rsa-sha2-512,rsa-sha2-256
debug2: ciphers ctos: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com
debug2: ciphers stoc: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com
debug2: MACs ctos: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: MACs stoc: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: compression ctos: none,zlib@openssh.com,zlib
debug2: compression stoc: none,zlib@openssh.com,zlib
debug2: languages ctos:
debug2: languages stoc:
debug2: first_kex_follows 0
debug2: reserved 0
debug2: peer server KEXINIT proposal
debug2: KEX algorithms: sntrup761x25519-sha512@openssh.com,curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group14-sha256,kex-strict-s-v00@openssh.com
debug2: host key algorithms: rsa-sha2-512,rsa-sha2-256,ecdsa-sha2-nistp256,ssh-ed25519
debug2: ciphers ctos: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com
debug2: ciphers stoc: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com
debug2: MACs ctos: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: MACs stoc: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: compression ctos: none,zlib@openssh.com
debug2: compression stoc: none,zlib@openssh.com
debug2: languages ctos:
debug2: languages stoc:
debug2: first_kex_follows 0
debug2: reserved 0
debug1: kex: algorithm: sntrup761x25519-sha512@openssh.com
debug1: kex: host key algorithm: ecdsa-sha2-nistp256
debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: SSH2_MSG_KEX_ECDH_REPLY received
debug1: Server host key: ecdsa-sha2-nistp256 SHA256:uVEKdAfoXAlk3cbkZas0O9UgpVvR2Vf4xFh99lU7fGs
debug1: load_hostkeys: fopen /Users/user/.ssh/known_hosts2: No such file or directory
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts: No such file or directory
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts2: No such file or directory
debug1: Host 'xxx.xx.xxx.xxx' is known and matches the ECDSA host key.
debug1: Found key in /Users/user/.ssh/known_hosts:127
debug2: ssh_set_newkeys: mode 1
debug1: rekey out after 134217728 blocks
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug2: ssh_set_newkeys: mode 0
debug1: rekey in after 134217728 blocks
debug1: get_agent_identities: bound agent to hostkey
debug1: get_agent_identities: ssh_fetch_identitylist: agent contains no identities
debug1: Will attempt key: /Users/user/.ssh/id_rsa RSA SHA256:LY8tMPMR14TGZIOWWcj1Lo+ohTBcKce39+vJRwK9sEc
debug1: Will attempt key: /Users/user/.ssh/id_ecdsa
debug1: Will attempt key: /Users/user/.ssh/id_ecdsa_sk
debug1: Will attempt key: /Users/user/.ssh/id_ed25519
debug1: Will attempt key: /Users/user/.ssh/id_ed25519_sk
debug1: Will attempt key: /Users/user/.ssh/id_xmss
debug1: Will attempt key: /Users/user/.ssh/id_dsa
debug2: pubkey_prepare: done
debug1: SSH2_MSG_EXT_INFO received
debug1: kex_input_ext_info: server-sig-algs=<ssh-ed25519,sk-ssh-ed25519@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,sk-ecdsa-sha2-nistp256@openssh.com,webauthn-sk-ecdsa-sha2-nistp256@openssh.com,ssh-dss,ssh-rsa,rsa-sha2-256,rsa-sha2-512>
debug1: kex_input_ext_info: publickey-hostbound@openssh.com=<0>
debug2: service_accept: ssh-userauth
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey,password
debug1: Next authentication method: publickey
debug1: Offering public key: /Users/user/.ssh/id_rsa RSA SHA256:LY8tMPMR14TGZIOWWcj1Lo+ohTBcKce39+vJRwK9sEc
debug2: we sent a publickey packet, wait for reply
debug1: Server accepts key: /Users/user/.ssh/id_rsa RSA SHA256:LY8tMPMR14TGZIOWWcj1Lo+ohTBcKce39+vJRwK9sEc
Enter passphrase for key '/Users/user/.ssh/id_rsa':
Authenticated to xxx.xx.xxx.xxx ([xxx.xx.xxx.xxx]:22) using "publickey".
debug1: channel 0: new session [client-session] (inactive timeout: 0)
debug2: channel 0: send open
debug1: Requesting no-more-sessions@openssh.com
debug1: Entering interactive session.
debug1: pledge: filesystem
debug1: client_input_global_request: rtype hostkeys-00@openssh.com want_reply 0
debug1: client_input_hostkeys: searching /Users/user/.ssh/known_hosts for xxx.xx.xxx.xxx / (none)
debug1: client_input_hostkeys: searching /Users/user/.ssh/known_hosts2 for xxx.xx.xxx.xxx / (none)
debug1: client_input_hostkeys: hostkeys file /Users/user/.ssh/known_hosts2 does not exist
debug1: client_input_hostkeys: host key found matching a different name/address, skipping UserKnownHostsFile update
debug1: Remote: /root/.ssh/authorized_keys:6: key options: agent-forwarding port-forwarding pty user-rc x11-forwarding
debug1: Remote: /root/.ssh/authorized_keys:6: key options: agent-forwarding port-forwarding pty user-rc x11-forwarding
debug2: channel_input_open_confirmation: channel 0: callback start
debug2: fd 3 setting TCP_NODELAY
debug2: client_session2_setup: id 0
debug2: channel 0: request pty-req confirm 1
debug1: Sending environment.
debug1: channel 0: setting env LANG = "en_GB.UTF-8"
debug2: channel 0: request env confirm 0
debug2: channel 0: request shell confirm 1
debug1: pledge: fork
debug2: channel_input_open_confirmation: channel 0: callback done
debug2: channel 0: open confirm rwindow 0 rmax 32768
debug2: channel_input_status_confirm: type 99 id 0
debug2: PTY allocation request accepted on channel 0
debug2: channel 0: rcvd adjust 2097152
debug2: channel_input_status_confirm: type 99 id 0
debug2: shell request accepted on channel 0
Linux pve847 6.5.11-7-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.11-7 (2023-12-05T09:44Z) x86_64
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Sat Jan 20 13:40:12 2024 from 217.42.217.91
root@pve847:~#
 
I've not had a chance to edit the corosync config yet. It's a running system, so I haven't really wanted to cause it to go down again by messing about with corosync. I will need to, though; it's not really ideal having to keep a command running to keep it alive.
 
