Proxmox shutdown VM and swap ethernet port

Djiraf

New Member
Mar 21, 2022
Hi, a few days ago I ran into a new problem. Sometimes my Proxmox server-1 simply disappears from the cluster. Restarting it from the console does not help. To get server-1 working again, I just need to swap the cable from the 2nd ethernet port to the 1st. After that I can see that all VMs were shut down and then start working again, as if the server had been powered off. The syslog says nothing special. This problem happened on the night of the 7th to the 9th.
 

Attachments

  • Syslog.txt
    243.7 KB
Hi,
please post the output of pveversion -v as well as cat /etc/network/interfaces and cat /etc/pve/corosync.conf.
 
Output of pveversion -v:
root@Server-1:~# pveversion -v
proxmox-ve: 7.4-1 (running kernel: 5.15.102-1-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-5.15: 7.3-3
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph: 17.2.5-pve1
ceph-fuse: 17.2.5-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-4
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-1
libpve-rs-perl: 0.7.5
libpve-storage-perl: 7.4-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.1-1
proxmox-backup-file-restore: 2.4.1-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.6.5
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-1
pve-firewall: 4.3-1
pve-firmware: 3.6-4
pve-ha-manager: 3.6.0
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1

Interfaces:
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

iface eno0 inet manual

iface eth1 inet manual

auto vmbr0
iface vmbr0 inet static
address 192.168.1.6/24
gateway 192.168.1.1
bridge-ports eno0
bridge-stp off
bridge-fd 0


corosync:
logging {
debug: off
to_syslog: yes
}

nodelist {
node {
name: Server-1
nodeid: 1
quorum_votes: 1
ring0_addr: 192.168.1.6
}
node {
name: Server-3
nodeid: 4
quorum_votes: 1
ring0_addr: 192.168.1.10
}
node {
name: Server-4
nodeid: 3
quorum_votes: 1
ring0_addr: 192.168.1.15
}
node {
name: Server-6
nodeid: 2
quorum_votes: 1
ring0_addr: 192.168.1.19
}
}

quorum {
provider: corosync_votequorum
}

totem {
cluster_name: Expert-Servers
config_version: 8
interface {
linknumber: 0
}
ip_version: ipv4-6
link_mode: passive
secauth: on
version: 2
}
 
It might be that the kernel renamed your network interfaces in an unexpected way. From your syslog we see:
Code:
Jun  7 23:21:36 Server-1 systemd-udevd[657]: eth0: Failed to rename network interface 2 from 'eth0' to 'eno0': File exists
Jun  7 23:21:36 Server-1 systemd-udevd[657]: eth0: Failed to process device, ignoring: File exists
...
Jun  7 23:21:36 Server-1 kernel: [    7.332326] e1000e 0000:06:00.0 eno0: renamed from eth1
...

Did you maybe set a .link file with possibly clashing nomenclature? If so, note that this is not supported, see https://github.com/systemd/systemd/issues/16665#issuecomment-669160229
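As a side note: if stable names are wanted, a commonly supported approach is to match the NIC by its MAC address in a .link file and assign a custom name *outside* the kernel-reserved eno*/eth* namespaces, which avoids exactly this kind of rename clash. A minimal sketch (the MAC address, file name, and the name lan0 are placeholders, not from this thread):

```
# /etc/systemd/network/10-lan0.link  (hypothetical example)
[Match]
MACAddress=aa:bb:cc:dd:ee:ff

[Link]
Name=lan0
```

If you go this route, the bridge-ports entry in /etc/network/interfaces must be changed to the new name, and on Debian-based systems you may need to run update-initramfs -u so the rule is also applied early in boot.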
 
It happened again: https://youtu.be/NgO6bqdlGQQ
 
Could you attach the updated syslog? Also verify that the correct interface is configured as the bridge port; you can do this e.g. by blinking the port LED using ethtool -p <DEVNAME>
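For example, using the interface name from the posted config (requires root; the 10 is an optional duration in seconds):

```shell
# Blink the LED of eno0 for 10 seconds to locate the physical port
ethtool -p eno0 10
```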
 
I am using the correct ethernet port.
 
It seems like your host gets fenced and reboots. Do you have HA enabled? Please also provide the syslog from the other nodes from around that time.
 
Sorry, I don't know what HA is.
 

Attachments

  • Syslog 3 node.txt
    254.7 KB
What server are you using, and what chipset does the server use for its onboard NICs?

What's the output of lshw -c network -businfo?
What is the output of ls /etc/network?

HA stands for high availability and is a feature in Proxmox VE which can be used to automatically start VMs (that failed because of a host failure) on the remaining hosts, as long as you have shared or replicated storage. You can check via the UI whether you have High Availability enabled for resources by going to Datacenter -> HA and checking if there are VMs or CTs that have it enabled.
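You can also check this from the CLI with the standard Proxmox VE HA tool:

```shell
# List configured HA resources; empty output means no VM/CT is HA-managed
ha-manager config

# Show the current HA manager and node status
ha-manager status
```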
 
Something might be unstable in your network, please check cables, etc.

So the following seems to happen here:
  • Node 1 reboots, as seen from the syslog (reason still unknown, it might have been fenced); this happened in both cases.
  • The other two nodes which are online lose quorum, because the 4th node (Server-6) is not online, as can be seen from the syslog and your video.
  • During boot, the network interfaces get renamed incorrectly, as we again see from the logs: eth0: Failed to rename network interface 2 from 'eth0' to 'eno0': File exists in the first syslog you sent, eth1: Failed to rename network interface 3 from 'eth1' to 'eno0': File exists in the second one.
  • You switch cables, and therefore the system brings up the correct interface again, which is attached to the bridge as configured and therefore allows the node to rejoin the cluster.
So the questions remain:
  1. Why did node 1 reboot?
  2. Why does udev have this renaming conflict?
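To narrow down the second question, it can help to check for custom naming rules and to ask udev's built-in naming policy what names it would derive for the NIC (the paths are the standard locations; on a default install no custom rename files should be present):

```shell
# Look for custom .link files or udev rules that rename interfaces
ls -l /etc/systemd/network/ /etc/udev/rules.d/ 2>/dev/null

# Show the names udev's net_id builtin derives for the interface
udevadm test-builtin net_id /sys/class/net/eth1
```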
 

root@Server-1:~# lshw -c network -businfo
Bus info          Device    Class    Description
=======================================================
pci@0000:05:00.0  eno0      network  82574L Gigabit Network Connection
pci@0000:06:00.0  eth1      network  82574L Gigabit Network Connection
                  vmbr0     network  Ethernet interface
                  tap810i0  network  Ethernet interface
                  tap810i1  network  Ethernet interface
                  tap107i0  network  Ethernet interface
                  tap108i0  network  Ethernet interface
                  tap201i0  network  Ethernet interface
                  tap100i0  network  Ethernet interface
root@Server-1:~#

root@Server-1:~# ls /etc/network
if-down.d if-post-down.d if-pre-up.d if-up.d ifupdown2 interfaces interfaces.d run
 
I can't answer these questions. Do I need to replace the ethernet cable?
 
I solved this problem. The problem in my cluster was on Server-6. When I found that, I removed Server-6 from the cluster.
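For reference, removing a dead node from a Proxmox VE cluster is done with the standard pvecm tool, run on one of the remaining quorate nodes while the removed node is powered off and will not rejoin (the node name below is the one from this thread):

```shell
# Run on a remaining node while the cluster is quorate
pvecm delnode Server-6

# Verify the remaining membership and quorum
pvecm status
```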
 
