Proxmox host reboot

dhmusil

Member
Jun 10, 2020
Hey all,

We have stood up a 3-node Proxmox cluster, each node with 1.2T of local storage for guest servers. Each of the nodes is running Debian 10.

We have also stood up a 6-node Ceph cluster with ~400G of storage. Each of these nodes is running CentOS 8.

We have mounted Ceph onto two nodes of the cluster for testing, using RBD. (The end goal is a poor man's fail-over.) Steps used (a rough command-line equivalent is sketched below the list)...

a) On the Proxmox node, created a /etc/pve/priv/ceph directory.

b) Copied /etc/ceph/ceph.client.admin.keyring from one of the monitor servers into the above directory on the Proxmox server.

c) In the GUI of one of the Proxmox cluster nodes, I went to Datacenter > Storage and selected Add. The name of the storage is the same as the name of the keyring I copied over. I put all three monitors in the list.
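
For reference, here is roughly what those steps amount to on the command line. The storage ID "ceph-ext", the "cephmon1" host, the pool name, and the monitor addresses below are placeholders, so take this as a sketch rather than our exact config:

# a) + b): put the admin keyring where Proxmox looks for external RBD storage,
#          named after the storage ID chosen in the GUI
mkdir -p /etc/pve/priv/ceph
scp cephmon1:/etc/ceph/ceph.client.admin.keyring /etc/pve/priv/ceph/ceph-ext.keyring

# c): the resulting entry the GUI writes into /etc/pve/storage.cfg
rbd: ceph-ext
        content images
        krbd 0
        monhost 10.161.100.11 10.161.100.13 10.161.100.15
        pool rbd
        username admin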

The Proxmox servers seem to be happy with that.

This configuration allows us to create a guest machine using the external Ceph storage, but every time we attempt to install any OS on the guest, the Proxmox host reboots. Below is what we were able to capture from the logs just prior to the reboot.

Jul 08 15:38:27 {HOSTNAME} systemd[1]: Started 122.scope.
Jul 08 15:38:27 {HOSTNAME} systemd-udevd[30890]: Using default interface naming scheme 'v240'.
Jul 08 15:38:27 {HOSTNAME} systemd-udevd[30890]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jul 08 15:38:27 {HOSTNAME} systemd-udevd[30890]: Could not generate persistent MAC address for tap122i0: No such file or directory
Jul 08 15:38:28 {HOSTNAME} kernel: device tap122i0 entered promiscuous mode
Jul 08 15:38:28 {HOSTNAME} systemd-udevd[30890]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jul 08 15:38:28 {HOSTNAME} systemd-udevd[30890]: Could not generate persistent MAC address for fwbr122i0: No such file or directory
Jul 08 15:38:28 {HOSTNAME} systemd-udevd[30890]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jul 08 15:38:28 {HOSTNAME} systemd-udevd[30890]: Could not generate persistent MAC address for fwpr122p0: No such file or directory
Jul 08 15:38:28 {HOSTNAME} systemd-udevd[30881]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jul 08 15:38:28 {HOSTNAME} systemd-udevd[30881]: Using default interface naming scheme 'v240'.
Jul 08 15:38:28 {HOSTNAME} systemd-udevd[30881]: Could not generate persistent MAC address for fwln122i0: No such file or directory
Jul 08 15:38:28 {HOSTNAME} kernel: fwbr122i0: port 1(fwln122i0) entered blocking state
Jul 08 15:38:28 {HOSTNAME} kernel: fwbr122i0: port 1(fwln122i0) entered disabled state
Jul 08 15:38:28 {HOSTNAME} kernel: device fwln122i0 entered promiscuous mode
Jul 08 15:38:28 {HOSTNAME} kernel: fwbr122i0: port 1(fwln122i0) entered blocking state
Jul 08 15:38:28 {HOSTNAME} kernel: fwbr122i0: port 1(fwln122i0) entered forwarding state
Jul 08 15:38:28 {HOSTNAME} kernel: vmbr0: port 5(fwpr122p0) entered blocking state
Jul 08 15:38:28 {HOSTNAME} kernel: vmbr0: port 5(fwpr122p0) entered disabled state
Jul 08 15:38:28 {HOSTNAME} kernel: device fwpr122p0 entered promiscuous mode
Jul 08 15:38:28 {HOSTNAME} kernel: vmbr0: port 5(fwpr122p0) entered blocking state
Jul 08 15:38:28 {HOSTNAME} kernel: vmbr0: port 5(fwpr122p0) entered forwarding state
Jul 08 15:38:28 {HOSTNAME} kernel: fwbr122i0: port 2(tap122i0) entered blocking state
Jul 08 15:38:28 {HOSTNAME} kernel: fwbr122i0: port 2(tap122i0) entered disabled state
Jul 08 15:38:28 {HOSTNAME} kernel: fwbr122i0: port 2(tap122i0) entered blocking state
Jul 08 15:38:28 {HOSTNAME} kernel: fwbr122i0: port 2(tap122i0) entered forwarding state
Jul 08 15:38:28 {HOSTNAME} pvedaemon[5599]: <USER@pve> end task UPID:HOSTNAME:000078A0:0001260A:5F064AE3:qmstart:122:USER@pve: OK
Jul 08 15:38:29 {HOSTNAME} pvedaemon[30954]: starting vnc proxy UPID:HOSTNAME:000078EA:000126F4:5F064AE5:vncproxy:122:USER@pve:
Jul 08 15:38:29 {HOSTNAME} pvedaemon[5599]: <USER@pve> starting task UPID:HOSTNAME:000078EA:000126F4:5F064AE5:vncproxy:122:USER@pve:
Jul 08 15:39:00 {HOSTNAME} systemd[1]: Starting Proxmox VE replication runner...
Jul 08 15:39:01 {HOSTNAME} systemd[1]: pvesr.service: Succeeded.
Jul 08 15:39:01 {HOSTNAME} systemd[1]: Started Proxmox VE replication runner.
Jul 08 15:39:38 {HOSTNAME} pvedaemon[5600]: <root@pam> successful auth for user 'USER@pve'
Jul 08 15:40:00 {HOSTNAME} systemd[1]: Starting Proxmox VE replication runner...
Jul 08 15:40:01 {HOSTNAME} systemd[1]: pvesr.service: Succeeded.
Jul 08 15:40:01 {HOSTNAME} systemd[1]: Started Proxmox VE replication runner.

There is no evidence of a panic, just a reboot. It would make sense if the guest had issues, but why would it reboot the host?

I am continuing to investigate what the issue might be, but any direction would be greatly appreciated. If any additional configuration details are needed, please let me know and I will append them.
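
In case it helps, these are the commands I have been using to look for traces of the reboot; the journalctl one assumes the journal is persisted to disk, otherwise the previous boot's log is gone:

# reboot/shutdown records
last -x | head

# journal from the previous boot, jump to the end
journalctl -b -1 -e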

Thank you,

DHM
 
There have been several similar posts; a node reboot is often the result of delay on the Ceph network link.

Do you have a separate network for Ceph? (hypervisor NIC, switch, etc.)
 
Thank you for your quick response.

We have dedicated bonded NICs. One pair is for client access to the Proxmox cluster and the second pair is on the storage network back-plane. That said, I will test to verify that the storage traffic is going to the proper network. BTW, both bonds are active/active, i.e. mode=4. Would that cause an issue? Should we change it to active-passive?
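
For what it is worth, this is how I plan to double-check the bonds and the path the storage traffic takes. I am assuming the storage bond is called bond1 here; the names on our hosts may differ:

# LACP (802.3ad / mode=4) state and active slaves
cat /proc/net/bonding/bond1

# confirm traffic to a Ceph monitor leaves via the storage bond, not the client network
ip route get 10.161.100.11

# quick latency check on the storage network
ping -c 10 10.161.100.11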
 
LACP should be fine, provided the switch supports it and is correctly configured as well.

It's important that there is no gateway in between the Proxmox host and the storage. Ping should be less than 1 ms.

Can you post your network config and status ("cat /etc/network/interfaces" and "ip a")?
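
For comparison, a typical 802.3ad storage bond in /etc/network/interfaces looks something like this; the interface names and the address are only an example, not your exact setup:

auto bond1
iface bond1 inet static
        address 10.161.100.21/24
        bond-slaves eno3 eno4
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer2+3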
 
Again, thank you kindly for your response.

I have attached a rough diagram in PDF format of the architecture we have put together. The 10 and 100 networks each have redundant switches that are used for the bonds.

Here is the data you requested. I have included the data for each of the Proxmox servers. Proxmox 2 and Proxmox 3 are the machines I have been using for testing, since Proxmox 1 has production machines on it. (All local storage.)

Sorry, I left one more thing out. These are the traceroutes to the Ceph servers from the Proxmox servers.

root@prox01:~# traceroute 10.161.100.11
traceroute to 10.161.100.11 (10.161.100.11), 30 hops max, 60 byte packets
 1  11-100-161-10-in-addr.tusimple.ai (10.161.100.11)  0.231 ms  *  0.196 ms
root@prox01:~# traceroute 10.161.100.13
traceroute to 10.161.100.13 (10.161.100.13), 30 hops max, 60 byte packets
 1  13-100-161-10-in-addr.tusimple.ai (10.161.100.13)  0.253 ms  *  *
root@prox01:~# traceroute 10.161.100.15
traceroute to 10.161.100.15 (10.161.100.15), 30 hops max, 60 byte packets
 1  15-100-161-10-in-addr.tusimple.ai (10.161.100.15)  0.317 ms  0.308 ms  *

root@prox02:~# traceroute 10.161.100.11
traceroute to 10.161.100.11 (10.161.100.11), 30 hops max, 60 byte packets
 1  11-100-161-10-in-addr.tusimple.ai (10.161.100.11)  0.175 ms  *  0.094 ms
root@prox02:~# traceroute 10.161.100.13
traceroute to 10.161.100.13 (10.161.100.13), 30 hops max, 60 byte packets
 1  13-100-161-10-in-addr.tusimple.ai (10.161.100.13)  0.235 ms  *  0.210 ms
root@prox02:~# traceroute 10.161.100.15
traceroute to 10.161.100.15 (10.161.100.15), 30 hops max, 60 byte packets
 1  15-100-161-10-in-addr.tusimple.ai (10.161.100.15)  0.173 ms  0.175 ms  *

root@prox03:~# traceroute 10.161.100.11
traceroute to 10.161.100.11 (10.161.100.11), 30 hops max, 60 byte packets
 1  11-100-161-10-in-addr.tusimple.ai (10.161.100.11)  0.465 ms  0.465 ms  0.460 ms
root@prox03:~# traceroute 10.161.100.13
traceroute to 10.161.100.13 (10.161.100.13), 30 hops max, 60 byte packets
 1  13-100-161-10-in-addr.tusimple.ai (10.161.100.13)  0.134 ms  0.108 ms  0.105 ms
root@prox03:~# traceroute 10.161.100.15
traceroute to 10.161.100.15 (10.161.100.15), 30 hops max, 60 byte packets
 1  15-100-161-10-in-addr.tusimple.ai (10.161.100.15)  0.177 ms  0.163 ms  0.160 ms
 

Attachments

  • Prox_Ceph.pdf (9.9 KB)
  • proxmox-1.txt (5.6 KB)
  • proxmox-2.txt (5.5 KB)
  • proxmox-3.txt (5.5 KB)
