Hey everyone,
For the past four days I've been having an issue with my dedicated server (running Proxmox), hosted by OVH. The system freezes at random, and OVH then performs an automatic reboot. This has happened multiple times now, and I'm trying to identify the root cause.
OVH has already conducted three separate hardware investigations and reported no hardware issues.
The NVMe drives were replaced in December, as the previous ones were worn out.
OVH now claims it must be a software issue, but unfortunately the logs show nothing that clearly points to the root cause.
Interestingly, I had the same issue about two months ago, and what helped back then was reducing the resource limits (CPU/RAM) on the individual containers and VMs. After that, the server ran stably for a while, until now.
Here are my server specs:
- Proxmox Version: 8.4.1
- CPU: AMD Ryzen 5 3600X - 6c/12t - 3.8 GHz / 4.4 GHz
- RAM: 64 GB ECC 2666 MHz
- Storage: 2×500 GB NVMe SSD (Soft RAID)
Here are the last log entries before the freeze and reboot:
May 15 14:20:57 mio-network sshd[995589]: Received disconnect from 218.92.0.249 port 31935:11: [preauth]
May 15 14:20:57 mio-network sshd[995589]: Disconnected from authenticating user root 218.92.0.249 port 31935 [preauth]
May 15 14:20:57 mio-network sshd[995589]: PAM 2 more authentication failures; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.92.0.249 user=root
...
May 15 14:21:13 mio-network sshd[996018]: Disconnected from invalid user elena 8.217.43.77 port 49102 [preauth]
May 15 14:21:15 mio-network audit[996161]: NETFILTER_CFG table=filter family=7 entries=0 op=xt_replace pid=996161 subj=unconfined comm="ebtables-restor"
-- Reboot --
As you can see:
- There were multiple SSH brute-force attempts from random IPs (China, Russia, etc.).
- SSH logins for invalid or root users were being attempted repeatedly.
- Around the time of the freeze, ebtables-restore was executed, modifying netfilter rules.
- Shortly after, the server completely froze and OVH initiated a reboot.
There’s no clear indication of a kernel panic, OOM killer, or disk error in the logs. Just regular cron activity, audit logs, and network rule updates.
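To rule out that I simply overlooked something, here is a rough Python sketch of how the lines before each reboot marker could be scanned for typical crash keywords. The keyword list and the function names (`lines_before_reboots`, `suspicious_lines`) are my own assumptions, not an exhaustive or official list, and the marker pattern is a guess based on the excerpt above (other journalctl versions may word the marker differently).

```python
import re
from collections import deque

# Assumption: boot boundaries look like "-- Reboot --" (as in the excerpt above)
# or a similarly worded "-- Boot ... --" line.
BOOT_MARKER = re.compile(r"^-- (Re)?[Bb]oot.*--$")

# Assumption: a hand-picked set of strings that often accompany crashes.
SUSPICIOUS = re.compile(
    r"out of memory|oom-killer|kernel panic|hardware error|machine check"
    r"|i/o error|soft lockup|blocked for more than",
    re.IGNORECASE,
)


def lines_before_reboots(log_lines, window=50):
    """Yield, for each boot marker, the last `window` log lines seen before it."""
    recent = deque(maxlen=window)
    for raw in log_lines:
        line = raw.rstrip("\n")
        if BOOT_MARKER.match(line.strip()):
            yield list(recent)
            recent.clear()
        else:
            recent.append(line)


def suspicious_lines(lines):
    """Return only the lines that contain one of the crash keywords."""
    return [line for line in lines if SUSPICIOUS.search(line)]
```

One way to use it would be to dump the journal to a text file first (for example with `journalctl --no-pager > journal.txt`), open that file, and pass the file object to `lines_before_reboots`, then print whatever `suspicious_lines` returns for each chunk. An empty result would only mean none of these keywords appeared, not that the system was healthy.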
Has anyone experienced similar behavior with ebtables-restore or Proxmox freezing with no clear cause?
Additional Info:
- Root SSH login is still enabled (working on securing it).
- No monitoring for hardware issues yet (will check with smartctl).
- Running 4 LXC containers and 2 Windows VMs.
The following checks have been run so far:
- Filesystem check (fsck)
- RAM tests (memtest)
- Disk health checks (SMART)
All of these showed no errors.
Any insights, suggestions, or tools to better trace the next incident would be highly appreciated.
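In the meantime, one idea I'm considering is a small heartbeat logger that records load and free memory every few seconds and forces each line to disk, so that the last entry before a freeze shows what the system looked like. This is only a sketch: the function names, the log format, and the choice of fields (`MemAvailable`, `SwapFree`) are my own assumptions, and it relies on the Linux `/proc/loadavg` and `/proc/meminfo` files.

```python
import os
import time


def parse_meminfo(text):
    """Parse the contents of /proc/meminfo into a {field: number} dict."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def heartbeat_line(loadavg_text, meminfo_text, timestamp):
    """Build one log line from raw /proc/loadavg and /proc/meminfo contents."""
    load1 = loadavg_text.split()[0]  # first field is the 1-minute load average
    mem = parse_meminfo(meminfo_text)
    return (
        f"{timestamp} load1={load1} "
        f"MemAvailable_kB={mem.get('MemAvailable', -1)} "
        f"SwapFree_kB={mem.get('SwapFree', -1)}"
    )


def read_text(path):
    with open(path, encoding="ascii") as handle:
        return handle.read()


def run(log_path, interval_s=5, iterations=None):
    """Append a heartbeat line every `interval_s` seconds (forever if iterations is None)."""
    count = 0
    with open(log_path, "a", encoding="ascii") as log:
        while iterations is None or count < iterations:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            line = heartbeat_line(read_text("/proc/loadavg"), read_text("/proc/meminfo"), stamp)
            log.write(line + "\n")
            log.flush()
            os.fsync(log.fileno())  # push the line to disk before the next sleep
            count += 1
            time.sleep(interval_s)
```

Calling `run("/var/log/heartbeat.log")` (the path is just an example) would keep appending lines until the process is stopped. If the last line before a freeze shows memory nearly exhausted or a very high load, that would fit with the earlier observation that lowering the container and VM limits helped.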
Thanks in advance!