Node randomly reboots at around a specific time

Can you run it with -0? I just want to make sure there isn't more than one machine on the segment that thinks it has that IP address.

(I forgot to mention there are two arpings: one is in the arping package, the other in iputils-arping, which is the one I meant. They have different switches. :D)
 
Oh, that is a different story :D
I installed iputils-arping now, and arping -b 10.11.12.163 works perfectly, under 1 ms. I guess you meant -0 for the other package, since the iputils-arping one does not have this switch, right?
 
I kind of realised it midway and thought you would give me a pass on it if I just adapted my answer as if nothing had happened. :D But once you went checking the man page, I'd better tell you. So ... no duplicate MACs there?
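
If you want to check this more systematically than eyeballing the output, here is a minimal sketch that collects the distinct MAC addresses answering for each IP. It assumes the reply lines look like iputils-arping's usual "Unicast reply from 10.11.12.163 [AA:BB:CC:DD:EE:FF]  0.709ms" format; the function names are just made up for this example, so compare the regex against your own arping output first.

```python
import re

# Assumed iputils-arping reply format, e.g.
# "Unicast reply from 10.11.12.163 [AA:BB:CC:DD:EE:FF]  0.709ms"
REPLY_RE = re.compile(
    r"(?:Unicast|Broadcast) reply from (\S+) \[([0-9A-Fa-f:]{17})\]"
)


def macs_per_ip(arping_output: str) -> dict[str, set[str]]:
    """Collect the distinct MAC addresses that replied for each IP."""
    seen: dict[str, set[str]] = {}
    for line in arping_output.splitlines():
        m = REPLY_RE.search(line)
        if m:
            ip, mac = m.group(1), m.group(2).upper()  # normalise MAC case
            seen.setdefault(ip, set()).add(mac)
    return seen


def duplicates(arping_output: str) -> dict[str, set[str]]:
    """Return only the IPs that were answered by more than one MAC."""
    return {
        ip: macs
        for ip, macs in macs_per_ip(arping_output).items()
        if len(macs) > 1
    }
```

You would feed it the captured output of something like arping -c 5 10.11.12.163, e.g. via subprocess.run(..., capture_output=True, text=True).stdout. An empty result from duplicates() means every reply came from the same MAC.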

I still feel weird about that machine's behaviour before 2:12. Could you perhaps keep a continuous ping running at 1 s intervals from a screen/tmux session somewhere, to see exactly when it becomes unreachable over the network when these reboots occur?
 
I mean anything works, even just something like ping -D 10.11.12.163 > pinging.log & and then tail -f pinging.log to check ... on some machine that will not be impacted, ideally not a VM that is being HA migrated. :D
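
Once the log exists, finding the outage window is easier with a small parser than by scrolling through tail. A minimal sketch, assuming each reply line written by ping -D starts with the bracketed Unix timestamp, e.g. "[1709947000.123456] 64 bytes from 10.11.12.163: icmp_seq=42 ttl=64 time=0.31 ms"; find_gaps and the 2-second threshold are hypothetical choices for this example.

```python
import re

# Assumed `ping -D` reply line: "[<unix ts>] <n> bytes from ...: icmp_seq=<n> ..."
# Requiring "bytes from" skips "Destination Host Unreachable" error lines.
LINE_RE = re.compile(r"^\[(\d+\.\d+)\] \d+ bytes from .*icmp_seq=(\d+)")


def find_gaps(log_text: str, max_gap: float = 2.0) -> list[tuple[float, float, int]]:
    """Return (last_ok_ts, next_ok_ts, missing_replies) for each silent stretch.

    A stretch is reported when two consecutive replies are more than
    `max_gap` seconds apart or when icmp_seq numbers were skipped.
    """
    gaps = []
    prev_ts = prev_seq = None
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # header, summary or error line
        ts, seq = float(m.group(1)), int(m.group(2))
        if prev_ts is not None:
            missing = seq - prev_seq - 1  # negative after icmp_seq wraps
            if missing > 0 or ts - prev_ts > max_gap:
                gaps.append((prev_ts, ts, max(missing, 0)))
        prev_ts, prev_seq = ts, seq
    return gaps
```

Each returned pair of timestamps can be turned into local time with datetime.fromtimestamp() and lined up against the journalctl output around the reboot.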
 
no duplicate MACs there?
Correct, only 1 response, and that is the correct one.
Could you perhaps keep a continuous ping in 1 sec intervals running from a screen/tmux somewhere to see when exactly it's unreachable by the network when these reboots occur?
Yup, I will put the VMs back on it and start pinging the node. Will report back with the results when it happens again!
Thank you again for your help! :)
 
Hi again! :D

So it finally rebooted today. I attached the logs. Uptime Kuma shows a 1 min downtime from 2024-03-09 02:15:20 to 2024-03-09 02:16:24 (it was checking every 60 s; I have now set it to check every 30 s), and every VM got migrated.

Weirdly enough, I found a reboot on 27 Feb in the system logs, but I did not get any notification about it: the VMs did not get migrated, and Uptime Kuma doesn't show any downtime for that timeframe. I also attached the logs for that timeframe.

Side note: in the meantime I also had a hard drive failure in a different node (anton1), and since then I get a lot fewer replication timeouts, like a LOT fewer, so that's good. :D

+ I updated and rebooted the VM that was pinging the node and totally forgot about the ping, so I don't have any ping data :)
 

Attachments

  • feb27_journalctl.txt (14.4 KB)
  • feb27.txt (242.7 KB)
  • journalctl.txt (14.7 KB)
  • syslog.txt (326.6 KB)
