Loss of remote internet access

gYanis

New Member
Dec 10, 2021
[Attachment: proxmox (1).jpg]
Hello,
I usually connect to the services on my servers (SSH, web, SFTP...) from the internet.
After a few days, however, the services hosted on the VMs and the Proxmox web interface are no longer reachable: "ERR_CONNECTION_REFUSED".
It is as if port forwarding had stopped working.
Only the services on the independent server (port A) remain accessible.
>> So I suspected a problem with Proxmox.

The firewall is disabled on Proxmox.
Restarting the Proxmox servers does not solve the problem.
Restarting the router does solve it.
However, when I restart the router, the Proxmox nodes also reboot.
I can't tell which one is blocking the traffic, the router or Proxmox.
The problem comes back after a few hours or days.



The VMs under Proxmox are still accessible with TeamViewer.
The network works well.

Have you ever had this problem?
Can you help me?
Thank you
 

bobmc

Well-Known Member
May 17, 2018
Accessibility via TeamViewer only proves the VM is still running.

"when restarting the router proxmox also performed a restart of the nodes" So is this a cluster setup? Why would restarting the router reset the nodes?

Are your IPs statically assigned, or is DHCP involved?
 

gYanis

New Member
Dec 10, 2021
I don't want the nodes to restart, but they do, and all the VMs reboot with them. I can't turn this off.
The Proxmox servers have fixed IP addresses.

May 14 01:39:57 srv-virt-01 pmxcfs[1473]: [dcdb] notice: members: 1/1473
May 14 01:39:57 srv-virt-01 pmxcfs[1473]: [dcdb] notice: all data is up to date
May 14 01:40:09 srv-virt-01 pvescheduler[13273]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
May 14 01:40:09 srv-virt-01 pvescheduler[13272]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
-- Reboot --
 

gYanis

New Member
Dec 10, 2021
Hello,

The quorum is working well.
The Proxmox servers reboot because I restart the router:
>> Lost connection (or link) >> Automatic reboot of the servers

I restart the router because port forwarding to the server and the VMs stops working (about every two days, sometimes after a week).
But I don't know whether the problem comes from the router, since port forwarding to another server keeps working.
A manual reboot of the servers does not solve the problem.
Rebooting the router causes the nodes to reboot, and port forwarding works again.
So it is either a port forwarding problem or a blockage due to a firewall (but everything is open for testing).
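
To narrow down whether a forwarded port is refused or silently dropped, a small probe run from outside the LAN can help. This is only a sketch: the function name `probe` is hypothetical, and the host and ports to test are whatever the router actually forwards, which the thread does not state. A "refused" result corresponds to the browser's ERR_CONNECTION_REFUSED (something answered with a reset), while "timeout" means no answer came back at all.

```python
import socket


def probe(host, port, timeout=3.0):
    """Classify one TCP connection attempt.

    Returns "open", "refused", "timeout", or "error: <details>".
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The target (or a device in front of it) actively rejected the connection.
        return "refused"
    except socket.timeout:
        # No reply within the timeout: packets are probably being dropped.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. no route to host or a name resolution failure.
        return f"error: {exc}"
```

Calling `probe(public_ip, forwarded_port)` once for the independent server's port and once for a port forwarded to a Proxmox VM, while the fault is present, would show whether the two behave differently at the router's public address.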
 

shrdlicka

Active Member
Staff member
May 2, 2022
Bonjour ;),
It sounds like a router issue to me. Can you maybe disable the automatic reboot of the proxmox server?
 

gYanis

New Member
Dec 10, 2021
I would like to disable automatic reboot.
I can't find how to do it.


I'm also thinking about the router.
 

shrdlicka

Active Member
Staff member
May 2, 2022
Try checking your logs. Restarting the Proxmox server on connection loss is not a Proxmox feature, as far as I know.
 

gYanis

New Member
Dec 10, 2021
The bug just happened again, after 4 days.
I rebooted the router, and I find this in the logs:

May 14 01:39:57 srv-virt-01 pmxcfs[1473]: [dcdb] notice: members: 1/1473
May 14 01:39:57 srv-virt-01 pmxcfs[1473]: [dcdb] notice: all data is up to date
May 14 01:40:09 srv-virt-01 pvescheduler[13273]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
May 14 01:40:09 srv-virt-01 pvescheduler[13272]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
-- Reboot --
 

gYanis

New Member
Dec 10, 2021
Active remote-viewer connections are not disconnected.
I have access to everything locally.
Remotely via the web, I no longer have access to the services hosted on the Proxmox servers.
I still have SSH access to the other servers, which are not hosted on Proxmox.
 

gYanis

New Member
Dec 10, 2021
Syslog node 1:



May 19 08:26:21 srv-virt-01 corosync[1568]: [TOTEM ] A processor failed, forming new configuration: token timed out (3650ms), waiting 4380ms for consensus.
May 19 08:26:26 srv-virt-01 corosync[1568]: [QUORUM] Sync members[1]: 1
May 19 08:26:26 srv-virt-01 corosync[1568]: [QUORUM] Sync left[1]: 2
May 19 08:26:26 srv-virt-01 corosync[1568]: [TOTEM ] A new membership (1.378) was formed. Members left: 2
May 19 08:26:26 srv-virt-01 corosync[1568]: [TOTEM ] Failed to receive the leave message. failed: 2
May 19 08:26:26 srv-virt-01 pmxcfs[1497]: [dcdb] notice: members: 1/1497
May 19 08:26:26 srv-virt-01 pmxcfs[1497]: [status] notice: members: 1/1497
May 19 08:26:26 srv-virt-01 corosync[1568]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
May 19 08:26:26 srv-virt-01 corosync[1568]: [QUORUM] Members[1]: 1
May 19 08:26:26 srv-virt-01 corosync[1568]: [MAIN ] Completed service synchronization, ready to provide service.
May 19 08:26:26 srv-virt-01 pmxcfs[1497]: [status] notice: node lost quorum
May 19 08:26:26 srv-virt-01 pmxcfs[1497]: [dcdb] crit: received write while not quorate - trigger resync
May 19 08:26:26 srv-virt-01 pmxcfs[1497]: [dcdb] crit: leaving CPG group
May 19 08:26:26 srv-virt-01 pmxcfs[1497]: [dcdb] notice: start cluster connection
May 19 08:26:26 srv-virt-01 pmxcfs[1497]: [dcdb] crit: cpg_join failed: 14
May 19 08:26:26 srv-virt-01 pmxcfs[1497]: [dcdb] crit: can't initialize service
May 19 08:26:26 srv-virt-01 pve-ha-lrm[1631]: lost lock 'ha_agent_srv-virt-01_lock - cfs lock update failed - Device or resource busy
May 19 08:26:26 srv-virt-01 pve-ha-crm[1622]: status change slave => wait_for_quorum
May 19 08:26:28 srv-virt-01 pve-ha-lrm[1631]: status change active => lost_agent_lock
May 19 08:26:32 srv-virt-01 pmxcfs[1497]: [dcdb] notice: members: 1/1497
May 19 08:26:32 srv-virt-01 pmxcfs[1497]: [dcdb] notice: all data is up to date
May 19 08:27:04 srv-virt-01 kernel: bnxt_en 0000:17:00.0 eno1np0: NIC Link is Up, 1000 Mbps full duplex, Flow control: none
May 19 08:27:04 srv-virt-01 kernel: bnxt_en 0000:17:00.0 eno1np0: EEE is not active
May 19 08:27:04 srv-virt-01 kernel: bnxt_en 0000:17:00.0 eno1np0: FEC autoneg off encoding: None
May 19 08:27:04 srv-virt-01 kernel: vmbr0: port 1(eno1np0) entered blocking state
May 19 08:27:04 srv-virt-01 kernel: vmbr0: port 1(eno1np0) entered forwarding state
May 19 08:27:05 srv-virt-01 kernel: bnxt_en 0000:17:00.0 eno1np0: NIC Link is Down
May 19 08:27:05 srv-virt-01 kernel: vmbr0: port 1(eno1np0) entered disabled state
May 19 08:27:07 srv-virt-01 kernel: bnxt_en 0000:17:00.0 eno1np0: NIC Link is Up, 1000 Mbps full duplex, Flow control: ON - receive & transmit
May 19 08:27:07 srv-virt-01 kernel: bnxt_en 0000:17:00.0 eno1np0: EEE is not active
May 19 08:27:07 srv-virt-01 kernel: bnxt_en 0000:17:00.0 eno1np0: FEC autoneg off encoding: None
May 19 08:27:07 srv-virt-01 kernel: vmbr0: port 1(eno1np0) entered blocking state
May 19 08:27:07 srv-virt-01 kernel: vmbr0: port 1(eno1np0) entered forwarding state
May 19 08:27:08 srv-virt-01 kernel: bnxt_en 0000:17:00.0 eno1np0: NIC Link is Down
May 19 08:27:08 srv-virt-01 kernel: vmbr0: port 1(eno1np0) entered disabled state
May 19 08:27:09 srv-virt-01 QEMU[2067]: kvm: warning: Spice: main:0 (0x559f01dda0a0): rcc 0x559f0204f9f0 has been unresponsive for more than 30000 ms, disconnecting
-- Reboot --
May 19 08:29:21 srv-virt-01 kernel: Linux version 5.15.35-1-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PVE 5.15.35-3 (Wed, 11 May 2022 07:57:51 +0200) ()
May 19 08:29:21 srv-virt-01 kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-5.15.35-1-pve root=/dev/mapper/pve-root ro quiet
May 19 08:29:21 srv-virt-01 kernel: KERNEL supported cpus:
 

gYanis

New Member
Dec 10, 2021
Syslog node 2:

May 19 08:17:01 srv-virt-02 CRON[1138365]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
May 19 08:17:01 srv-virt-02 CRON[1138364]: pam_unix(cron:session): session closed for user root
May 19 08:26:18 srv-virt-02 kernel: bnxt_en 0000:17:00.0 eno1np0: NIC Link is Down
May 19 08:26:18 srv-virt-02 kernel: vmbr0: port 1(eno1np0) entered disabled state
May 19 08:26:19 srv-virt-02 corosync[1508]: [KNET ] link: host: 1 link: 0 is down
May 19 08:26:19 srv-virt-02 corosync[1508]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
May 19 08:26:19 srv-virt-02 corosync[1508]: [KNET ] host: host: 1 has no active links
May 19 08:26:20 srv-virt-02 corosync[1508]: [TOTEM ] Token has not been received in 2737 ms
May 19 08:26:21 srv-virt-02 corosync[1508]: [TOTEM ] A processor failed, forming new configuration: token timed out (3650ms), waiting 4380ms for consensus.
May 19 08:26:26 srv-virt-02 corosync[1508]: [QUORUM] Sync members[1]: 2
May 19 08:26:26 srv-virt-02 corosync[1508]: [QUORUM] Sync left[1]: 1
May 19 08:26:26 srv-virt-02 corosync[1508]: [TOTEM ] A new membership (2.378) was formed. Members left: 1
May 19 08:26:26 srv-virt-02 corosync[1508]: [TOTEM ] Failed to receive the leave message. failed: 1
May 19 08:26:26 srv-virt-02 pmxcfs[1437]: [dcdb] notice: members: 2/1437
May 19 08:26:26 srv-virt-02 pmxcfs[1437]: [status] notice: members: 2/1437
May 19 08:26:26 srv-virt-02 corosync[1508]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
May 19 08:26:26 srv-virt-02 corosync[1508]: [QUORUM] Members[1]: 2
May 19 08:26:26 srv-virt-02 corosync[1508]: [MAIN ] Completed service synchronization, ready to provide service.
May 19 08:26:26 srv-virt-02 pmxcfs[1437]: [status] notice: node lost quorum
May 19 08:26:26 srv-virt-02 pmxcfs[1437]: [dcdb] crit: received write while not quorate - trigger resync
May 19 08:26:26 srv-virt-02 pmxcfs[1437]: [dcdb] crit: leaving CPG group
May 19 08:26:26 srv-virt-02 pve-ha-crm[1562]: lost lock 'ha_manager_lock - cfs lock update failed - Operation not permitted
May 19 08:26:26 srv-virt-02 pve-ha-crm[1562]: status change master => lost_manager_lock
May 19 08:26:26 srv-virt-02 pve-ha-crm[1562]: watchdog closed (disabled)
May 19 08:26:26 srv-virt-02 pve-ha-crm[1562]: status change lost_manager_lock => wait_for_quorum
May 19 08:26:26 srv-virt-02 pmxcfs[1437]: [dcdb] notice: start cluster connection
May 19 08:26:26 srv-virt-02 pmxcfs[1437]: [dcdb] crit: cpg_join failed: 14
May 19 08:26:26 srv-virt-02 pmxcfs[1437]: [dcdb] crit: can't initialize service
May 19 08:26:26 srv-virt-02 pve-ha-lrm[1572]: lost lock 'ha_agent_srv-virt-02_lock - cfs lock update failed - Device or resource busy
May 19 08:26:31 srv-virt-02 pve-ha-lrm[1572]: status change active => lost_agent_lock
May 19 08:26:32 srv-virt-02 pmxcfs[1437]: [dcdb] notice: members: 2/1437
May 19 08:26:32 srv-virt-02 pmxcfs[1437]: [dcdb] notice: all data is up to date
-- Reboot --
May 19 08:29:21 srv-virt-02 kernel: Linux version 5.15.35-1-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PVE 5.15.35-3 (Wed, 11 May 2022 07:57:51 +0200) ()
May 19 08:29:21 srv-virt-02 kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-5.15.35-1-pve root=/dev/mapper/pve-root ro quiet
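
Both syslog excerpts show each node ending up alone ("Members[1]") and then reporting "node lost quorum". The arithmetic behind that can be sketched as follows, assuming one vote per node and no QDevice or special two-node option, none of which the thread confirms. The function names are hypothetical and only illustrate the strict-majority rule.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for a strict majority of the expected votes."""
    return expected_votes // 2 + 1


def is_quorate(visible_votes: int, expected_votes: int) -> bool:
    """True if the votes a node can currently see reach the majority threshold."""
    return visible_votes >= quorum_threshold(expected_votes)


# Assumed two-node cluster, one vote each: a node that only sees itself has 1 of 2 votes.
print(quorum_threshold(2))   # majority of 2 votes
print(is_quorate(1, 2))      # node isolated from its peer
print(is_quorate(2, 2))      # both nodes see each other
```

Under these assumptions, any interruption of the link between the two nodes, such as a router or switch restart, leaves both sides without quorum at the same time. The pve-ha-lrm lines ("active => lost_agent_lock") suggest HA is in use, in which case the watchdog-based self-fencing described in the Proxmox HA documentation may be what reboots the nodes; that is a reading of the logs, not something confirmed in this thread.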
 

shrdlicka

Active Member
Staff member
May 2, 2022
May 14 01:40:09 srv-virt-01 pvescheduler[13273]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
May 14 01:40:09 srv-virt-01 pvescheduler[13272]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
This looks more like a symptom of a lost network connection.

Your network connection appears to be going down for some reason:
Code:
May 19 08:27:05 srv-virt-01 kernel: bnxt_en 0000:17:00.0 eno1np0: NIC Link is Down
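
To see how link flaps, quorum loss and reboots line up in time, the syslog can be reduced to just those events. The following is a rough sketch, not a Proxmox tool: `extract_events` and the pattern names are hypothetical, and the patterns are taken only from the messages quoted in this thread.

```python
import re

# Message fragments copied from the syslog lines quoted in this thread.
EVENT_PATTERNS = [
    ("link_down", re.compile(r"NIC Link is Down")),
    ("link_up", re.compile(r"NIC Link is Up")),
    ("quorum_lost", re.compile(r"node lost quorum|error: no quorum!")),
]

# "May 19 08:26:18 srv-virt-02 kernel: ..." -> timestamp, host, rest of the message
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+(?P<msg>.*)$"
)


def extract_events(lines):
    """Return (timestamp, host, event) tuples for the lines that match a pattern."""
    events = []
    for raw in lines:
        line = raw.strip()
        if line == "-- Reboot --":
            # journalctl boot separator: it carries no timestamp or host.
            events.append((None, None, "reboot"))
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        for name, pattern in EVENT_PATTERNS:
            if pattern.search(match.group("msg")):
                events.append((match.group("ts"), match.group("host"), name))
                break
    return events
```

Feeding it the lines of each node's syslog (for example `extract_events(open(path))`, with `path` pointing at a saved copy of the log) gives a short timeline that makes it easier to check whether the link drops before or after the router is restarted.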
 