Please run "corosync -t" on the problematic node; the systemctl output indicates corosync still chokes on the config file.
Code:
corosync -t
Jan 16 12:44:51.575 notice [MAIN ] Corosync Cluster Engine exiting normally
okay. could you do the following for me?
Code:
cat /etc/default/corosync
and
Code:
systemctl cat corosync
root@pve847:~# cat /etc/default/corosync
# Command line options
#OPTIONS=""
root@pve847:~# systemctl cat corosync
# /lib/systemd/system/corosync.service
[Unit]
Description=Corosync Cluster Engine
Documentation=man:corosync man:corosync.conf man:corosync_overview
ConditionKernelCommandLine=!nocluster
ConditionPathExists=/etc/corosync/corosync.conf
Requires=network-online.target
After=network-online.target
[Service]
EnvironmentFile=-/etc/default/corosync
ExecStart=/usr/sbin/corosync -f $COROSYNC_OPTIONS
ExecStop=/usr/sbin/corosync-cfgtool -H --force
Type=notify
# In typical systemd deployments, both standard outputs are forwarded to
# journal (stderr is what's relevant in the pristine corosync configuration),
# which hazards a message redundancy since the syslog stream usually ends there
# as well; before editing this line, you may want to check DefaultStandardError
# in systemd-system.conf(5) and whether /dev/log is a systemd related symlink.
StandardError=null
# The following config is for corosync with enabled watchdog service.
#
# When corosync watchdog service is being enabled and using with
# pacemaker.service, and if you want to exert the watchdog when a
# corosync process is terminated abnormally,
# uncomment the line of the following Restart= and RestartSec=.
#Restart=on-failure
# Specify a period longer than soft_margin as RestartSec.
#RestartSec=70
# rewrite according to environment.
#ExecStartPre=/sbin/modprobe softdog
PrivateTmp=yes
[Install]
WantedBy=multi-user.target
that's really weird, as the command is identical to what systemd is supposed to run.. I guess you could retry with debug logging enabled (but a heads-up - it is really verbose).
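Since the manual command matches the unit's ExecStart, one thing that may be worth ruling out is the two Condition* lines in the unit file above, which systemd evaluates before it starts the service at all. A minimal sketch for checking them by hand follows. The function name corosync_conditions and its two optional path arguments are made up for illustration, and the grep is only an approximation of how systemd matches the kernel command line.

```shell
# Approximate, by hand, the two Condition* checks from the unit file above.
# Both paths are parameters (hypothetical helper) so it can be tried on copies.
corosync_conditions() {
    cmdline_file=${1:-/proc/cmdline}
    conf_file=${2:-/etc/corosync/corosync.conf}
    status=0

    # ConditionKernelCommandLine=!nocluster: the word must NOT be present
    if grep -qw -- 'nocluster' "$cmdline_file"; then
        echo "nocluster is set on the kernel command line"
        status=1
    fi

    # ConditionPathExists=/etc/corosync/corosync.conf
    if [ ! -e "$conf_file" ]; then
        echo "missing $conf_file"
        status=1
    fi

    [ "$status" -eq 0 ] && echo "both conditions met"
    return "$status"
}
```

Run without arguments on the node it checks the live paths. If both conditions are met, the boot-time failure probably lies elsewhere.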
Not sure if this is related, but when I connect to the node via terminal on my local machine, it takes a long time to actually connect to 847, sometimes as much as 20 seconds, while the 380 node takes a second or so. No idea how I'd debug this; they're both connected to the same switch and uplink.
I see you guys were having fun here. Could this be related to the pmxcfs issue which I had found on pve-devel in 2020?
Do you have some details about this? Perhaps I could check
If you mean SSH, this might be completely unrelated. It's either doing reverse lookups, so if you have some DNS at play, that might be why, or, if there's both IPv4 and IPv6, it takes a while to fall back to IPv4. Test connecting directly by IP to see if it's a pure SSH issue that has nothing to do with what you are troubleshooting.
I was already connecting direct by IP. If it wasn't such a pain, I think I'm at the point where wiping and reinstalling the OS would probably be better to fix the random niggles that have appeared on this node.
Is there anything else that seems off? Do you have monitoring in place that might show you anything out of the ordinary?
Other than the laggy opening of an SSH session and the fact that a reboot results in me having to start corosync manually, it appears to run OK. So without a reboot, which is rare, a slow initial ...
Do you happen to use hosts.deny? Can you run it as ssh -vv to see at which point it's making you wait? I don't think it's related at all, but since it should not be happening either, maybe you have a network issue that you otherwise don't see?
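Because ssh -vv does not timestamp its own output, one way to pin down where the wait happens is to prefix every debug line with the seconds elapsed since the command started. This is only a sketch: stamp_lines is a made-up helper name, HOST is a placeholder, and it assumes your date supports +%s (GNU and macOS both do).

```shell
# Prefix each line read from stdin with the seconds elapsed since start.
# stamp_lines is a hypothetical helper, not part of OpenSSH.
stamp_lines() {
    start=$(date +%s)
    while IFS= read -r line; do
        now=$(date +%s)
        printf '%3ds %s\n' "$((now - start))" "$line"
    done
}

# Usage (ssh writes its debug output to stderr, hence 2>&1):
#   ssh -vv root@HOST true 2>&1 | stamp_lines
```

A jump in the first column between two consecutive lines marks the step that stalls, which can then be compared with the server-side log for the same moment.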
Hi,
Thanks for the response. The point at which it pauses appears to be debug1: pledge: filesystem
Not sure why this would be the case?
What's on the other side around the same time?
journalctl -t sshd
Like I said, enabling debug logging might give a clue. Did you also try to set
logging { debug: on }
in /etc/corosync/corosync.conf?
Here's the entire login log from ssh:
Last login: Sat Jan 20 09:55:49 on ttys000
❯ ssh -vv root@xxx.xx.xxx.xxx
OpenSSH_9.4p1, LibreSSL 3.3.6
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 21: include /etc/ssh/ssh_config.d/* matched no files
debug1: /etc/ssh/ssh_config line 54: Applying options for *
debug2: resolve_canonicalize: hostname xxx.xx.xxx.xxx is address
debug1: Authenticator provider $SSH_SK_PROVIDER did not resolve; disabling
debug1: Connecting to xxx.xx.xxx.xxx [xxx.xx.xxx.xxx] port 22.
debug1: Connection established.
debug1: identity file /Users/user/.ssh/id_rsa type 0
debug1: identity file /Users/user/.ssh/id_rsa-cert type -1
debug1: identity file /Users/user/.ssh/id_ecdsa type -1
debug1: identity file /Users/user/.ssh/id_ecdsa-cert type -1
debug1: identity file /Users/user/.ssh/id_ecdsa_sk type -1
debug1: identity file /Users/user/.ssh/id_ecdsa_sk-cert type -1
debug1: identity file /Users/user/.ssh/id_ed25519 type -1
debug1: identity file /Users/user/.ssh/id_ed25519-cert type -1
debug1: identity file /Users/user/.ssh/id_ed25519_sk type -1
debug1: identity file /Users/user/.ssh/id_ed25519_sk-cert type -1
debug1: identity file /Users/user/.ssh/id_xmss type -1
debug1: identity file /Users/user/.ssh/id_xmss-cert type -1
debug1: identity file /Users/user/.ssh/id_dsa type -1
debug1: identity file /Users/user/.ssh/id_dsa-cert type -1
debug1: Local version string SSH-2.0-OpenSSH_9.4
debug1: Remote protocol version 2.0, remote software version OpenSSH_9.2p1 Debian-2+deb12u2
debug1: compat_banner: match: OpenSSH_9.2p1 Debian-2+deb12u2 pat OpenSSH* compat 0x04000000
debug2: fd 3 setting O_NONBLOCK
debug1: Authenticating to xxx.xx.xxx.xxx:22 as 'root'
debug1: load_hostkeys: fopen /Users/user/.ssh/known_hosts2: No such file or directory
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts: No such file or directory
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts2: No such file or directory
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug2: local client KEXINIT proposal
debug2: KEX algorithms: sntrup761x25519-sha512@openssh.com,curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group14-sha256,ext-info-c
debug2: host key algorithms: ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp256,ssh-ed25519-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp521-cert-v01@openssh.com,sk-ssh-ed25519-cert-v01@openssh.com,sk-ecdsa-sha2-nistp256-cert-v01@openssh.com,rsa-sha2-512-cert-v01@openssh.com,rsa-sha2-256-cert-v01@openssh.com,ssh-ed25519,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,sk-ssh-ed25519@openssh.com,sk-ecdsa-sha2-nistp256@openssh.com,rsa-sha2-512,rsa-sha2-256
debug2: ciphers ctos: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com
debug2: ciphers stoc: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com
debug2: MACs ctos: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: MACs stoc: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: compression ctos: none,zlib@openssh.com,zlib
debug2: compression stoc: none,zlib@openssh.com,zlib
debug2: languages ctos:
debug2: languages stoc:
debug2: first_kex_follows 0
debug2: reserved 0
debug2: peer server KEXINIT proposal
debug2: KEX algorithms: sntrup761x25519-sha512@openssh.com,curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group14-sha256,kex-strict-s-v00@openssh.com
debug2: host key algorithms: rsa-sha2-512,rsa-sha2-256,ecdsa-sha2-nistp256,ssh-ed25519
debug2: ciphers ctos: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com
debug2: ciphers stoc: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com
debug2: MACs ctos: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: MACs stoc: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: compression ctos: none,zlib@openssh.com
debug2: compression stoc: none,zlib@openssh.com
debug2: languages ctos:
debug2: languages stoc:
debug2: first_kex_follows 0
debug2: reserved 0
debug1: kex: algorithm: sntrup761x25519-sha512@openssh.com
debug1: kex: host key algorithm: ecdsa-sha2-nistp256
debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: SSH2_MSG_KEX_ECDH_REPLY received
debug1: Server host key: ecdsa-sha2-nistp256 SHA256:uVEKdAfoXAlk3cbkZas0O9UgpVvR2Vf4xFh99lU7fGs
debug1: load_hostkeys: fopen /Users/user/.ssh/known_hosts2: No such file or directory
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts: No such file or directory
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts2: No such file or directory
debug1: Host 'xxx.xx.xxx.xxx' is known and matches the ECDSA host key.
debug1: Found key in /Users/user/.ssh/known_hosts:127
debug2: ssh_set_newkeys: mode 1
debug1: rekey out after 134217728 blocks
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug2: ssh_set_newkeys: mode 0
debug1: rekey in after 134217728 blocks
debug1: get_agent_identities: bound agent to hostkey
debug1: get_agent_identities: ssh_fetch_identitylist: agent contains no identities
debug1: Will attempt key: /Users/user/.ssh/id_rsa RSA SHA256:LY8tMPMR14TGZIOWWcj1Lo+ohTBcKce39+vJRwK9sEc
debug1: Will attempt key: /Users/user/.ssh/id_ecdsa
debug1: Will attempt key: /Users/user/.ssh/id_ecdsa_sk
debug1: Will attempt key: /Users/user/.ssh/id_ed25519
debug1: Will attempt key: /Users/user/.ssh/id_ed25519_sk
debug1: Will attempt key: /Users/user/.ssh/id_xmss
debug1: Will attempt key: /Users/user/.ssh/id_dsa
debug2: pubkey_prepare: done
debug1: SSH2_MSG_EXT_INFO received
debug1: kex_input_ext_info: server-sig-algs=<ssh-ed25519,sk-ssh-ed25519@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,sk-ecdsa-sha2-nistp256@openssh.com,webauthn-sk-ecdsa-sha2-nistp256@openssh.com,ssh-dss,ssh-rsa,rsa-sha2-256,rsa-sha2-512>
debug1: kex_input_ext_info: publickey-hostbound@openssh.com=<0>
debug2: service_accept: ssh-userauth
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey,password
debug1: Next authentication method: publickey
debug1: Offering public key: /Users/user/.ssh/id_rsa RSA SHA256:LY8tMPMR14TGZIOWWcj1Lo+ohTBcKce39+vJRwK9sEc
debug2: we sent a publickey packet, wait for reply
debug1: Server accepts key: /Users/user/.ssh/id_rsa RSA SHA256:LY8tMPMR14TGZIOWWcj1Lo+ohTBcKce39+vJRwK9sEc
Enter passphrase for key '/Users/user/.ssh/id_rsa':
Authenticated to xxx.xx.xxx.xxx ([xxx.xx.xxx.xxx]:22) using "publickey".
debug1: channel 0: new session [client-session] (inactive timeout: 0)
debug2: channel 0: send open
debug1: Requesting no-more-sessions@openssh.com
debug1: Entering interactive session.
debug1: pledge: filesystem
debug1: client_input_global_request: rtype hostkeys-00@openssh.com want_reply 0
debug1: client_input_hostkeys: searching /Users/user/.ssh/known_hosts for xxx.xx.xxx.xxx / (none)
debug1: client_input_hostkeys: searching /Users/user/.ssh/known_hosts2 for xxx.xx.xxx.xxx / (none)
debug1: client_input_hostkeys: hostkeys file /Users/user/.ssh/known_hosts2 does not exist
debug1: client_input_hostkeys: host key found matching a different name/address, skipping UserKnownHostsFile update
debug1: Remote: /root/.ssh/authorized_keys:6: key options: agent-forwarding port-forwarding pty user-rc x11-forwarding
debug1: Remote: /root/.ssh/authorized_keys:6: key options: agent-forwarding port-forwarding pty user-rc x11-forwarding
debug2: channel_input_open_confirmation: channel 0: callback start
debug2: fd 3 setting TCP_NODELAY
debug2: client_session2_setup: id 0
debug2: channel 0: request pty-req confirm 1
debug1: Sending environment.
debug1: channel 0: setting env LANG = "en_GB.UTF-8"
debug2: channel 0: request env confirm 0
debug2: channel 0: request shell confirm 1
debug1: pledge: fork
debug2: channel_input_open_confirmation: channel 0: callback done
debug2: channel 0: open confirm rwindow 0 rmax 32768
debug2: channel_input_status_confirm: type 99 id 0
debug2: PTY allocation request accepted on channel 0
debug2: channel 0: rcvd adjust 2097152
debug2: channel_input_status_confirm: type 99 id 0
debug2: shell request accepted on channel 0
Linux pve847 6.5.11-7-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.11-7 (2023-12-05T09:44Z) x86_64
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Sat Jan 20 13:40:12 2024 from 217.42.217.91
root@pve847:~#
I've not had a chance to edit corosync yet. It's a running system, so I haven't really wanted to cause it to go down again by messing about with corosync. I will need to though; it's not really the best having to have a command running to keep it alive.
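For reference, a minimal sketch of what the suggested change might look like in the logging section of /etc/corosync/corosync.conf. The to_syslog line is an assumption about what is already in the file; keep whatever is there and only flip debug. Since the unit file above sets StandardError=null, the extra output would be expected in syslog and the journal rather than on stderr, and as mentioned earlier it is very verbose, so switching it back to off afterwards is probably sensible.

```
logging {
  # assumption: this line already exists in your file, leave it as it is
  to_syslog: yes
  # the change under discussion; set back to off once done
  debug: on
}
```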