Just encountered the same issue: the server restarted because of some problem yesterday, and it seems like it magically switched the default route from vmbr0 to vmbr1, which is actually only used for storage and does not go to the router...
Any ideas why this could happen? Any tips to investigate...
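A minimal sketch for checking which bridge currently holds the default route. The helper name `default_route_ifaces` is made up for illustration; it only parses the text printed by `ip route`. It assumes the usual ifupdown-style `/etc/network/interfaces`, where a stray `gateway` line on the storage bridge would be one possible explanation, not a confirmed cause.

```shell
# Print the interface name of every default route found in
# `ip route` output read from stdin (hypothetical helper).
default_route_ifaces() {
  awk '$1 == "default" {
    for (i = 1; i < NF; i++)
      if ($i == "dev") print $(i + 1)
  }'
}

# Usage on the host (not run here):
#   ip route show default | default_route_ifaces
#   grep -n gateway /etc/network/interfaces
```

If more than one interface is printed, or the only one is vmbr1, the gateway configuration would be the first place I'd look.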
Sadly it still continues to crash, even after updating to PVE 8.
I could see some new info in the last crash, any ideas?
2023-08-03T03:17:04.968617+02:00 proxmox kernel: [289193.998738] CPU: 0 PID: 118 Comm: kcompactd0 Tainted: P D O 6.2.16-3-pve #1
2023-08-03T03:17:04.968617+02:00 proxmox...
And again... I even updated the BIOS once more last week as there was a new version.
I have now set up external monitoring and the host froze at exactly 04:00 in the morning.
Again nothing can be found in the log.
However, the KVM listed some USB devices disconnecting, but that could also be...
This should be fixed right?
Had spontaneous reboots recently and found this in the logs when investigating.
Apr 20 02:53:41 proxmox kernel: [20761.448291] show_signal_msg: 2 callbacks suppressed
Apr 20 02:53:41 proxmox kernel: [20761.448294] kvm[15747]: segfault at 51 ip 00007f4f21328f63 sp...
Yes, it’s still up and running, no problems or hiccups so far. Really weird. I did not update the BIOS initially, as it had been running with months of uptime before the update without any problems.
Maybe the new kernel triggers some bug in/with the old BIOS. Maybe it is just a setting I set differently...
OK, 5h up and running. Sadly I did three things, any of which might have remedied it:
- added parameter to corosync two_node:1
- updated the BIOS
- switched some USB devices around due to BIOS flashing
Let’s keep the fingers crossed.
I guess it’s stable now, but I also really would like to know what exactly...
OK, it also crashed with 6.2 in less than 2h :( (I had a screen attached this time, no output at all at the time of the hang.) This time it did not recover; I needed to power-cycle.
Running out of ideas. Maybe somebody has a tip on which logs to check, as the usual suspects don't point me to anything.
Saw a reboot again, and an SSH console I still had open showed this:
root@proxmox:~#
Message from syslogd@proxmox at Apr 2 11:10:20 ...
kernel:[ 3937.332015] NMI watchdog: Watchdog detected hard LOCKUP on cpu 4
Message from syslogd@proxmox at Apr 2 11:10:20 ...
kernel:[ 3943.640004] NMI...
Any progress? I'm having similar issues on a machine that previously ran rock solid with 6.4 but now crashes on 7.4
https://forum.proxmox.com/threads/repeated-crashes-reboots-of-host-after-update-from-6-4-to-7-4.125177/
Some more hints: this shows all the reboots without a prior shutdown. There was only one proper shutdown, when I switched kernels.
root@proxmox:~# last -xF reboot shutdown | head
reboot system boot 5.19.17-2-pve Sun Apr 2 10:03:37 2023 still running
reboot system boot 5.19.17-2-pve Sun...
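To put a number on this, here is a rough sketch that counts boots not preceded by a shutdown record. The function name `count_unclean_boots` is made up for illustration. It assumes the newest-first listing that `last -xF reboot shutdown` prints, and it treats a `reboot` line directly followed by an older `reboot` line as a restart without a clean shutdown in between.

```shell
# Count restarts that have no shutdown record before them.
# Reads `last -xF reboot shutdown` output (newest first) from stdin.
count_unclean_boots() {
  awk '
    $1 == "reboot"   { if (prev == "reboot") n++; prev = "reboot"; next }
    $1 == "shutdown" { prev = "shutdown"; next }
    END              { print n + 0 }
  '
}

# Usage on the host (not run here):
#   last -xF reboot shutdown | count_unclean_boots
```

The oldest boot in the listing is never counted, since there is no earlier record to compare it with.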
Same here after update to 7.4 from 6.4, was running rock solid for months on 6.4, nothing else changed
https://forum.proxmox.com/threads/repeated-crashes-reboots-of-host-after-update-from-6-4-to-7-4.125177/
Hi,
I updated two machines yesterday from 6.4 to 7.4; one is running fine, the other is not.
The machine that misbehaves has not been stable for more than 3 hours since it was updated.
The syslog does not show any problems, most of the time something about the hourly cron or a disk temperature...
I ran into the same problems and was playing a bit with uid/gid mapping, but it quickly becomes complex.
I followed the instructions in the first post and am wondering if "noauto" is really the right property for the fstab.
With "noauto" set, "mount -a" actually does nothing, as...
OK... it seems to be working again, but I have to continue diagnosing later.
The problem seems to be the network topology/configuration. There have been a few changes in the last week (new router, additional switch).
Both nodes have been on different switches (which both should support IGMP...
Next finding, which might help resolve it: the output from the "dead" node is missing an IP address in the members file.
Can I just edit this, or is it somehow generated or just representing the current state?
root@proxmox:~# cat /etc/pve/.members
{
"nodename": "proxmox",
"version": 18...
Found some more info that might be relevant in the journal (read from bottom):
Apr 22 10:22:10 proxmix pvesr[1618]: trying to acquire cfs lock 'file-replication_cfg' ...
Apr 22 10:22:10 proxmix pmxcfs[1368]: [dcdb] crit: cpg_send_message failed: 9
Apr 22 10:22:10 proxmix pmxcfs[1368]: [dcdb]...
There were changes in the network lately; however, the IPs all stayed the same, only DNS changed. Could that have caused a "desynchronization"?
- How can I verify corosync is running fine?
- How can I "re-initiate" the sync?
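For the first question, a minimal sketch of what I would try, assuming the standard PVE tooling is installed. The helper `cluster_is_quorate` is made up for illustration and only looks for the `Quorate:` line that `pvecm status` prints. It does not cover re-initiating the sync.

```shell
# Succeed if `pvecm status` output on stdin reports a quorate cluster
# (hypothetical helper, just a grep on the "Quorate:" line).
cluster_is_quorate() {
  grep -Eq '^Quorate:[[:space:]]+Yes'
}

# Usage on each node (not run here):
#   pvecm status | cluster_is_quorate && echo "quorate"
#   systemctl status corosync pve-cluster
#   corosync-cfgtool -s
#   journalctl -b -u corosync -u pve-cluster
```

Comparing the output of both nodes should at least show whether they still see each other.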