Alpine VM freezes with 100% CPU

Mrt12

Good day,

I have one VM that runs Alpine

Code:
# cat /etc/alpine-release
3.20.3

and it has been working fine for several years. It has 2 CPU cores and 1 GB RAM allocated. For the past couple of weeks, the VM has been freezing intermittently. When this happens, the CPU usage shown in the Proxmox GUI goes to 100%, and I can no longer log in to the machine, either via the console or via SSH. The machine also does not react to STOP, only to RESET.
Of course, because the only way to revive the machine is a RESET, the VM's dmesg is cleared by the reboot, and I don't see anything useful in its syslog. I also cannot find anything related to this in the PVE syslog.
What I can find in the VM's syslog is a flood of messages such as

Code:
 user.info : info: guest-ping called

which I think is related to the guest agent?
Disk usage is only 2%, so the lockup is not due to a full disk. RAM is also not fully used, so the VM is not swapping either.
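
For anyone who wants to repeat these two checks, here is a minimal sketch in POSIX shell that should also work with BusyBox on Alpine. The function names are illustrative and not part of any tool; the values come from `df` and `/proc/meminfo`.

```shell
#!/bin/sh
# Hypothetical helper names, chosen for illustration only.

# Print the used percentage (without the % sign) of the filesystem holding $1.
disk_used_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print the amount of swap currently in use, in kB.
swap_used_kb() {
    awk '/^SwapTotal:/ { t = $2 } /^SwapFree:/ { f = $2 } END { print t - f }' /proc/meminfo
}

# Print the memory still available to new processes, in kB.
mem_available_kb() {
    awk '/^MemAvailable:/ { print $2 }' /proc/meminfo
}

echo "disk used on /: $(disk_used_pct /)%"
echo "swap used: $(swap_used_kb) kB"
echo "mem available: $(mem_available_kb) kB"
```

A low disk percentage together with zero swap in use matches the observation above that neither a full disk nor swapping explains the freeze.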

About one or two months ago, I saw lots of messages like these in the syslog:

Code:
Aug 8 07:21:37 kern warn kernel [50706.507735] eth1: bad gso: type: 1, size: 1452
Aug 8 07:21:38 kern warn kernel [50706.829626] eth1: bad gso: type: 4, size: 1440
Aug 8 07:21:38 kern warn kernel [50706.830058] eth1: bad gso: type: 4, size: 1440
Aug 8 07:21:38 kern warn kernel [50707.532034] eth1: bad gso: type: 4, size: 1440
Aug 8 07:21:39 kern warn kernel [50707.912949] eth1: bad gso: type: 1, size: 1452
Aug 8 07:21:43 kern warn kernel [50712.035334] net_ratelimit: 15 callbacks suppressed

and the network performance of this VM was incredibly bad. I found a suggested solution which said that there is a bug in the virtio network driver and that one should use the Intel E1000E instead. I switched the VM's network adapter to the Intel E1000E, and indeed the network performance is good again, but I have the feeling that the random lockups have been occurring since then. However, I have so far not been able to reproduce the lockup. I have had htop open for about 10 days now, hoping to see something when the VM locks up, but nothing. Today it locked up again, and I couldn't see anything unusual in htop (and of course htop was frozen too).

I have already seen this thread

https://forum.proxmox.com/threads/vms-freeze-with-100-cpu.127459/page-12

but I have not found a solution there. How can I debug this sort of problem? I am sure I would see something in the VM's dmesg, but unfortunately it is cleared when the VM is reset.
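
One possible way to keep at least some evidence across a RESET is to append periodic snapshots to a file on disk and flush them immediately. The following is only a sketch under assumptions: the function name and the log path are hypothetical, and it records load and memory figures, not the kernel log itself.

```shell
#!/bin/sh
# Append one timestamped snapshot of load and memory to the file named in $1,
# then flush it to disk so it is still there after a hard RESET.
# log_snapshot is a hypothetical helper name.
log_snapshot() {
    out=$1
    {
        printf '%s load: ' "$(date '+%Y-%m-%d %H:%M:%S')"
        cat /proc/loadavg
        grep -E '^(MemAvailable|SwapFree):' /proc/meminfo
    } >> "$out"
    sync
}

# Example loop (commented out; the path is an assumption):
# while :; do log_snapshot /var/log/freeze-snapshots.log; sleep 10; done
```

After a freeze, the last entries in the file would show the load and memory state shortly before the VM stopped responding.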