[SOLVED] Proxmox freezes randomly

gamerh

Member
Jun 11, 2020
Hi,

For a couple of days my Proxmox server has been freezing at random times, and I need to hard-restart the server to get it running again.

I noticed this happens when one VM has a lot of load on it or when I start Firefox on an Ubuntu VM.

There is nothing in the logs that gives an explanation, and the VM settings are within the limits of the server.

Any help?

Regards,
Gamerh
 
Keep the Proxmox web console open, and in particular leave the host's "Summary" tab displayed.
Maybe you can get an idea of what's going on from the performance graphs.

Aside from that: how old are the components (the PSU especially)?
 
Hi tburger,

I did this, but when it happens again I see that everything is normal and then it shows a communication failure. About the hardware I am not sure, because I rent this server at a datacenter.

Also, can you tell me which log files to look into for troubleshooting?

I looked into /var/log/syslog.

Are there any more files to check or things I can do to see what's wrong?


[Attachment: 1611754073970.png]
 
I see that everything is normal
I don't think so.
Notice how the server load and also the IO wait spike, increasing by 20%?
There is something going on there, quite clearly. The question is: what?

Could you please share the VM configuration? I am especially interested in the vCPUs you have assigned.
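
The IO-wait figure from the Summary graph can also be logged on the host itself, so the last readings before a freeze survive on disk. A minimal sketch in Python, assuming a Linux host where the first line of /proc/stat uses the usual field order (user, nice, system, idle, iowait, irq, softirq, steal, ...). The function names, the 5-second interval and the sample count are illustrative choices, not part of Proxmox:

```python
import time


def parse_cpu_line(line):
    """Return the counters from the aggregate 'cpu' line of /proc/stat as ints."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def iowait_percent(before, after):
    """Share of elapsed CPU time spent in iowait between two samples, in percent."""
    deltas = [b - a for a, b in zip(before, after)]
    # Sum only the first eight counters (user .. steal); guest time is
    # already accounted for inside user/nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is iowait


def read_cpu_counters(path="/proc/stat"):
    """Read and parse the first line of /proc/stat."""
    with open(path) as handle:
        return parse_cpu_line(handle.readline())


def log_iowait(interval=5.0, samples=3):
    """Print a timestamped iowait percentage `samples` times."""
    previous = read_cpu_counters()
    for _ in range(samples):
        time.sleep(interval)
        current = read_cpu_counters()
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(f"{stamp} iowait={iowait_percent(previous, current):.1f}%", flush=True)
        previous = current
```

Calling `log_iowait(samples=1000)` with its output redirected to a file would leave a trail to read after the next hard restart. Whether the final lines reach the disk before a hard freeze is not guaranteed.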
 
Hi tburger,

It seems I overlooked that. Here are the configs of the VMs; I have put the cores in bold.


bios: ovmf
bootdisk: scsi0
cores: 1
efidisk0: local-zfs:vm-100-disk-0,size=1M
ide2: none,media=cdrom
memory: 1048
name: Firewall.old
net0: virtio=00:50:56:01:c4:8a,bridge=vmbr0
net1: virtio=4E:8B:20:4E:59:BC,bridge=vmbr1
numa: 0
ostype: other
scsi0: local-zfs:vm-100-disk-1,iothread=1,size=32G
scsihw: virtio-scsi-single
smbios1: uuid=b79aeeee-1dbf-4446-931f-a640d7152922
sockets: 1
vmgenid: d4fdb857-8aec-441a-b523-1bd40f8fb7ee

bios: ovmf
bootdisk: scsi0
cores: 2
efidisk0: local-zfs:vm-101-disk-0,size=1M
ide2: none,media=cdrom
memory: 3048
name: Webserver
net0: virtio=26:85:CD:30:B3:29,bridge=vmbr1
numa: 0
onboot: 1
ostype: l26
scsi0: local-zfs:vm-101-disk-1,iothread=1,size=200G
scsihw: virtio-scsi-single
smbios1: uuid=40aff4d6-b1c9-4000-9928-c04436cf387d
sockets: 1
startup: order=2,up=180
vmgenid: ee2fd0e3-faad-41d2-b51a-aec5843b0330
agent: 1

bios: ovmf
bootdisk: scsi0
cores: 4
efidisk0: local-zfs:vm-102-disk-0,size=1M
ide2: none,media=cdrom
memory: 4048
name: Stuff
net0: virtio=EA:74:69:40:18:FD,bridge=vmbr1,firewall=1
numa: 0
ostype: l26
scsi0: local-zfs:vm-102-disk-1,iothread=1,size=300G
scsihw: virtio-scsi-single
smbios1: uuid=98c58d0d-5e4e-4387-86b8-c114bb2b5c12
sockets: 1
vmgenid: 7959a7b9-8365-4f5c-bffd-95e292977232
agent: 1
audio0: device=AC97,driver=spice

bios: ovmf
boot: order=scsi0;ide2;net0
cores: 2
efidisk0: local-zfs:vm-103-disk-0,size=1M
ide2: none,media=cdrom
memory: 1048
name: Computer
net0: virtio=A6:80:C3:90:C0:E9,bridge=vmbr1,firewall=1
numa: 0
ostype: l26
scsi0: local-zfs:vm-103-disk-1,size=1000G
scsihw: virtio-scsi-pci
smbios1: uuid=81a75139-737d-407d-9ee6-cc9c84dc655f
sockets: 1
vga: qxl
vmgenid: 251e8a5d-ae52-4493-8614-b39a7d92a9e7

bios: ovmf
boot: order=sata0;ide2;net0
cores: 4
efidisk0: local-zfs:vm-104-disk-0,size=1M
ide2: none,media=cdrom
memory: 4096
name: Plex
net0: virtio=86:9E:C7:E8:30:55,bridge=vmbr1,firewall=1
numa: 0
onboot: 1
ostype: l26
sata0: local-zfs:vm-104-disk-1,size=10000G
scsihw: virtio-scsi-pci
smbios1: uuid=e660545f-8d32-45e1-a94e-79574abdbf31
sockets: 1
startup: order=3
vmgenid: 0f1c11d0-a86d-40bd-bf2c-00c3d9e684d4
agent: 1

bios: ovmf
boot: order=sata0;ide2;net0
cores: 2
efidisk0: local-zfs:vm-105-disk-0,size=1M
ide2: none,media=cdrom
memory: 4048
name: Firewall
net0: virtio=00:50:56:01:c4:8a,bridge=vmbr0,firewall=1
net1: virtio=2E:12:41:C0:0E:45,bridge=vmbr1,firewall=1
numa: 0
onboot: 1
ostype: l26
sata0: local-zfs:vm-105-disk-1,size=32G
scsihw: virtio-scsi-pci
smbios1: uuid=083de91b-5b92-48ae-9782-1e9f529738b9
sockets: 1
startup: order=1
vmgenid: a8217459-e635-4011-99f8-fafedf61e438





I should add that only 3 VMs are running at the same time. I also think I may have given too many cores to the VMs.

I do find it strange that the whole server freezes instead of just the VM freezing with an error code.

Regards,
Gamerh
 
Hi tburger,

I don't think it's the VMs, since it happened again with just 2 VMs running.

Regards,
Gamerh
 
My question would be: which VMs?
The 4-vCPU VMs are poison for your processor. You only have 4 physical cores. I could imagine the system has trouble scheduling those and runs into an edge case.
I'd recommend not using more than 3 vCPUs per VM (2 would be even better).
It is a small spark of hope, but that's where I would start.
Do you have access to the IPMI board to check the BIOS?
 
Hi tburger,

I took a look at the requirements of the VM and changed it to 1 core. However, the issue happened again.

I have IPMI access and I can go into the BIOS if needed.

Regards,
Gamerh
 
Well, that is odd for sure. If you have access to the IPMI board, check whether it reveals something. Hardware-related issues are sometimes logged in the events there.
Also try resetting the BIOS to defaults and start from there. Pay particular attention to things like Turbo Boost, SpeedStep and C-states. I'd recommend disabling all of them.
Is there a watchdog available in the BIOS or for the system?
 
Hi tburger,

I am not really allowed to make BIOS changes. About the watchdog, I have sent an e-mail to the datacenter.

I should add that every time this happens I can't access the server at all, not even through IPMI, and I see the CPU usage going to 100% before everything freezes.

Regards,
Gamerh
 
What do the last lines in /var/log/syslog show during that time?
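
One way to pull out the last lines of a log and highlight entries worth a closer look is sketched below in Python. The path /var/log/syslog comes from the thread; the function names are illustrative, and the keyword list is an assumption: a handful of substrings that commonly show up in Linux kernel messages about hardware errors, lockups and memory pressure, not a complete catalogue.

```python
from collections import deque

# Illustrative substrings only; extend or replace to suit the system.
SUSPICIOUS = (
    "Hardware Error",
    "Machine check",
    "soft lockup",
    "hung_task",
    "Out of memory",
    "I/O error",
)


def last_lines(lines, count=50):
    """Keep only the final `count` items of an iterable of lines."""
    return list(deque(lines, maxlen=count))


def flag_suspicious(lines, needles=SUSPICIOUS):
    """Return the lines containing any of the substrings (case-insensitive)."""
    lowered = [needle.lower() for needle in needles]
    return [line for line in lines if any(n in line.lower() for n in lowered)]


def inspect_log(path="/var/log/syslog", count=50):
    """Return (tail, flagged): the last `count` lines and the flagged subset."""
    with open(path, errors="replace") as handle:
        tail = last_lines((line.rstrip("\n") for line in handle), count)
    return tail, flag_suspicious(tail)
```

After a hard restart the very end of the file holds messages from the new boot, so `count` would need to be large enough to reach back to the lines written just before the freeze.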
Hi Entilza,

Just that it can't connect to my backup server.

It is almost as if the VM is somehow able to use more CPU resources than allocated.

Regards,
Hüseyin Kilinç
 
Sorry that it's freezing. It may be a memory issue, which is hard to debug if it's not really reproducible. You can boot Proxmox and run the memory test from the boot screen.
 
You could try another OS and see whether it is stable or not.
It could be something related to a driver, etc.
 
I should add that only 3 VMs are running at the same time. I also think I may have given too many cores to the VMs.


Hi,

At any time, the sum of all cores allocated to all running VMs must be < n-1,
where n is the total number of cores in your server!

Good luck / Bafta!
 
Hi,

The issue has gotten worse: it now happens multiple times a day with just one VM running.

Regards,
Hüseyin Kilinç
 
First suspect: a RAM module
Second: the PSU
If IPMI is not accessible when the server freezes, there is a big chance this is a hardware problem, as IPMI is OS-independent.
I have experienced that kind of trouble several times, and it was always RAM ...
Hi,

I will send the datacenter an e-mail with a request to check the RAM.

Regards,
Hüseyin Kilinç
 
I'd ask to replace this server.
This is likely a hardware defect.
At any time, the sum of all cores allocated to all running VMs must be < n-1,
If that were a requirement, no one would use virtualization. It is a good recommendation for getting "near-physical" performance, but absolutely unrealistic. The biggest VM should have fewer vCPUs than there are physical cores available; I would even go with CoreCount-2 (so 2 vCPUs in this case). But the total number of vCPUs across all VMs can be much higher. Your performance might suffer, but it will not crash the host OS.
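
The two rules of thumb discussed in this thread (total vCPUs versus physical cores, and the size of the biggest single VM) can be checked with a few lines of Python. This is a rough sketch, assuming configs in the plain `key: value` form posted earlier in the thread; it ignores anything else a real config file may contain, and the function names are made up for illustration. The physical core count of 4 is the figure mentioned above, and the choice of the three VMs marked `onboot: 1` as the running set is a guess, since the thread never says which VMs were running.

```python
def parse_vm_config(text):
    """Parse 'key: value' lines of a VM config into a dict."""
    config = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip():
            config[key.strip()] = value.strip()
    return config


def vcpus(config):
    """vCPUs of one VM: sockets * cores, each defaulting to 1 when absent."""
    return int(config.get("sockets", 1)) * int(config.get("cores", 1))


def check_allocation(configs, physical_cores):
    """Summarise vCPU allocation for the given set of running VMs."""
    per_vm = {cfg.get("name", "?"): vcpus(cfg) for cfg in configs}
    return {
        "per_vm": per_vm,
        "total": sum(per_vm.values()),
        # VMs whose vCPU count alone reaches the physical core count
        "oversized": sorted(n for n, count in per_vm.items() if count >= physical_cores),
    }


# Values taken from the configs posted in this thread (only the relevant keys).
webserver = parse_vm_config("cores: 2\nname: Webserver\nsockets: 1")
plex = parse_vm_config("cores: 4\nname: Plex\nsockets: 1")
firewall = parse_vm_config("cores: 2\nname: Firewall\nsockets: 1")

report = check_allocation([webserver, plex, firewall], physical_cores=4)
print(report)
```

For these three VMs the total is 8 vCPUs on 4 physical cores, and only Plex is flagged as oversized. In line with the reply above, the overcommitted total would be expected to cost performance rather than freeze the host.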
 
