Proxmox freezes randomly. No clue in the log

gamerh

Member
Jun 11, 2020
Hi,

I have noticed that for a while now my Proxmox server has been freezing randomly. All the VMs and the GUI are then inaccessible, and even the IPMI does not work. After 5-10 minutes everything gets back to normal.

The datacenter has already checked for hardware failures but has not found anything. The logs don't show anything either.

I also noticed that this happens mostly after some load is put on the server, but I can't see anything out of the ordinary.

Also, after the server responds again I see a high IO delay and higher CPU usage.

Any help?

Kind Regards,
Gamerh
 
and even the IPMI does not work
when you say IPMI does not work... do you mean that the IPMI is unreachable? or do you mean that it doesn't respond to keyboard input?

if the OS is hanging then it's normal it wouldn't respond to input. can you still reboot the host through IPMI?

if you're able to regain access to the server via SSH or GUI, you can enable persistent journals:
Code:
mkdir /var/log/journal
systemctl restart systemd-journald

which will keep the journals from previous boots as well (this can give us more hints to go on).
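To double-check that the journal really is persistent after those two commands, something like the following could be used. It is only a sketch: `journal_dir_status` is a made-up helper name, and it assumes journald runs with the default `Storage=auto`, where the existence of `/var/log/journal` is what switches on persistent storage.

```shell
#!/bin/sh
# Hypothetical helper (not part of Proxmox or systemd): report whether the
# directory journald uses for persistent storage exists.
# $1 = directory to check, defaults to /var/log/journal.
journal_dir_status() {
    dir="${1:-/var/log/journal}"
    if [ -d "$dir" ]; then
        echo "persistent"
    else
        echo "volatile"
    fi
}

# Check the default location on this host.
journal_dir_status
```

After the next reboot, `journalctl --list-boots` should then list more than one boot.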
 
Hi,

By "IPMI does not work" I mean that I am not able to start a session. The whole server just freezes for about 5 minutes and then works again.

This happens when there is load on the server, e.g. when I start another VM and start running tasks on it. But the VM does not exceed the server's available RAM/CPU.

I have added this code. I can reproduce the problem by starting another VM with Ubuntu and opening Firefox (no idea why, but it happens when I do that).

Kind Regards,
Gamerh
 
By "IPMI does not work" I mean that I am not able to start a session. The whole server just freezes for about 5 minutes and then works again.
that is very weird. usually the IPMI has its own network interface and is independent of the running OS. so if the IPMI isn't responding then there's likely something wrong with the hardware. have you also mentioned the IPMI issues to your hosting provider?

I have added this code. I can reproduce the problem by starting another VM with Ubuntu and opening Firefox (no idea why, but it happens when I do that).
what do you mean? can you be more specific?
 
Hi Oguz,

Yes, and we even changed the hardware, but the problem remains.

By the code I meant the code you sent me earlier:

Code:
mkdir /var/log/journal
systemctl restart systemd-journald


Kind Regards,
Gamerh
 
Yes, and we even changed the hardware, but the problem remains.
that's very strange.

By the code I meant the code you sent me earlier
okay, if you did that then your journal is being saved for each boot.

I can reproduce the problem by starting another VM with Ubuntu and opening Firefox (no idea why, but it happens when I do that).
so when you start your ubuntu VM and run firefox in it, the server hangs. do i understand correctly?

please try to reproduce the problem, and attach the output of journalctl -b 0 (the current boot) here. it also helps if you give the exact time when it happened, so that it's easier to correlate things in the log.
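For narrowing the log down to the time of the freeze, `journalctl -b 0` also accepts `--since` and `--until` directly on the host. When the journal has already been saved to a text file, a small filter like the one below could do the same. It is a sketch, assuming the default short output format (`Mon DD HH:MM:SS host ...`) and a window that stays within one day; `journal_slice` and the file name `boot.log` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of a saved journal whose time of day
# lies between start and end (inclusive, HH:MM:SS, same day).
# $1 = log file, $2 = start, $3 = end.
journal_slice() {
    # Field 3 of the short journal format is the HH:MM:SS timestamp;
    # zero-padded times compare correctly as plain strings.
    awk -v start="$2" -v end="$3" '$3 >= start && $3 <= end' "$1"
}

# Example (commented out because it needs a systemd host):
# journalctl -b 0 --no-pager > boot.log
# journal_slice boot.log 18:15:00 18:30:00
```

A slice like this keeps the attachment small while still covering a few minutes before and after the hang.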
 
Hi Oguz,

Do note that I don't need to reboot: the whole system just freezes for about 5 minutes and then works again. Should I reboot if I reproduce the issue?

Kind Regards,
Gamerh
 
Do note that I don't need to reboot: the whole system just freezes for about 5 minutes and then works again. Should I reboot if I reproduce the issue?
you can reboot to be sure (and to avoid a really long journal), and then next time it happens you can attach the journal here as described
 
Hi Oguz,

Here is the log. The issue happened on 22/05/2021 at 18:20 (I had to zip it in order to send it).

Kind Regards,
Gamerh
 

Attachments

  • log.zip (406.3 KB)
if it happened around 18:20 then it seems that the problem starts occurring when VM 103 is being started (for a backup?)

there's first:
Code:
May 22 18:18:18 private pvedaemon[3454571]: start VM 103: UPID:private:0034B66B:0417DCD3:60A92ECA:qmstart:103:root@pam:                                                                                                                             
May 22 18:18:18 private systemd[1]: Started 103.scope.                                                                                                                                                                                              
May 22 18:18:18 private systemd-udevd[3454585]: Using default interface naming scheme 'v240'.                                                                                                                                                       
May 22 18:18:18 private systemd-udevd[3454585]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.                                                                                                            
May 22 18:18:18 private systemd-udevd[3454585]: Could not generate persistent MAC address for tap103i0: No such file or directory                                                                                                                   
May 22 18:18:19 private kernel: device tap103i0 entered promiscuous mode                                                                                                                                                                            
May 22 18:18:19 private systemd-udevd[3454585]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.                                                                                                            
May 22 18:18:19 private systemd-udevd[3454585]: Could not generate persistent MAC address for fwbr103i0: No such file or directory                                                                                                                  
May 22 18:18:19 private systemd-udevd[3454585]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.                                                                                                            
May 22 18:18:19 private systemd-udevd[3454585]: Could not generate persistent MAC address for fwpr103p0: No such file or directory                                                                                                                  
May 22 18:18:19 private systemd-udevd[3454584]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.                                                                                                            
May 22 18:18:19 private systemd-udevd[3454584]: Using default interface naming scheme 'v240'.                                                                                                                                                       
May 22 18:18:19 private systemd-udevd[3454584]: Could not generate persistent MAC address for fwln103i0: No such file or directory                                                                                                                  
May 22 18:18:19 private kernel: fwbr103i0: port 1(fwln103i0) entered blocking state                                                                                                                                                                 
May 22 18:18:19 private kernel: fwbr103i0: port 1(fwln103i0) entered disabled state                                                                                                                                                                 
May 22 18:18:19 private kernel: device fwln103i0 entered promiscuous mode                                                                                                                                                                           
May 22 18:18:19 private kernel: fwbr103i0: port 1(fwln103i0) entered blocking state                                                                                                                                                                 
May 22 18:18:19 private kernel: fwbr103i0: port 1(fwln103i0) entered forwarding state                                                                                                                                                               
May 22 18:18:19 private kernel: vmbr1: port 3(fwpr103p0) entered blocking state                                                                                                                                                                     
May 22 18:18:19 private kernel: vmbr1: port 3(fwpr103p0) entered disabled state                                                                                                                                                                     
May 22 18:18:19 private kernel: device fwpr103p0 entered promiscuous mode                                                                                                                                                                           
May 22 18:18:19 private kernel: vmbr1: port 3(fwpr103p0) entered blocking state                                                                                                                                                                     
May 22 18:18:19 private kernel: vmbr1: port 3(fwpr103p0) entered forwarding state                                                                                                                                                                   
May 22 18:18:19 private kernel: fwbr103i0: port 2(tap103i0) entered blocking state                                                                                                                                                                  
May 22 18:18:19 private kernel: fwbr103i0: port 2(tap103i0) entered disabled state                                                                                                                                                                  
May 22 18:18:19 private kernel: fwbr103i0: port 2(tap103i0) entered blocking state                                                                                                                                                                  
May 22 18:18:19 private kernel: fwbr103i0: port 2(tap103i0) entered forwarding state                                                                                                                                                                
May 22 18:18:19 private pvedaemon[3384808]: <root@pam> end task UPID:private:0034B66B:0417DCD3:60A92ECA:qmstart:103:root@pam: OK

and then

Code:
May 22 18:20:05 private kernel: tg3 0000:01:00.0 eno1: Link is down                                                                                                                                                                                 
May 22 18:20:05 private kernel: vmbr0: port 1(eno1) entered disabled state                                                                                                                                                                          
May 22 18:20:31 private pvestatd[1990]: Backup: error fetching datastores - 500 Can't connect to backup.hgroepservers.com:8007 (Temporary failure in name resolution)                                                                               
May 22 18:20:31 private pvestatd[1990]: status update time (20.163 seconds)                                                                                                                                                                         
May 22 18:20:51 private pvestatd[1990]: Backup: error fetching datastores - 500 Can't connect to backup.hgroepservers.com:8007 (Temporary failure in name resolution)                                                                               
May 22 18:20:51 private pvestatd[1990]: status update time (20.127 seconds)                                                                                                                                                                         
May 22 18:20:59 private QEMU[3454580]: kvm: warning: Spice: main:0 (0x563a877608a0): rcc 0x563a8842f2c0 has been unresponsive for more than 30000 ms, disconnecting

followed by a lot of the same for about 5 minutes... so that seems to match your description of the symptoms.

it seems the backup server hostname isn't being resolved. is there a backup job scheduled at this hour?

can you normally perform backups?

what is in the config of this VM?
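To see how often these messages show up in a saved journal, a rough count like the following can help. It is a sketch only: `freeze_hints` is a made-up name, the three patterns are simply the messages quoted above, and the log is assumed to be a plain-text export of the journal.

```shell
#!/bin/sh
# Hypothetical helper: count the messages seen around the freeze in a
# saved journal. $1 = log file.
freeze_hints() {
    log="$1"
    # grep -c prints the number of matching lines; "|| true" hides its
    # non-zero exit status when there are no matches.
    link=$(grep -c 'Link is down' "$log" || true)
    slow=$(grep -c 'pvestatd.*status update time' "$log" || true)
    spice=$(grep -c 'Spice:.*unresponsive' "$log" || true)
    echo "link_down=$link slow_status=$slow spice_unresponsive=$spice"
}

# Example: freeze_hints boot.log
```

The link-down count is worth watching: if `eno1` is the uplink, name resolution for the backup server cannot work while that link is down.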
 
Hi Oguz,


There is a backup planned every day at 00:00.

In order to reproduce the issue I started VM 103.
I also started a backup manually, and this worked just fine.

Here is the config of VM 103 (this is also the Ubuntu VM that freezes the whole server after startup):

agent: 1
audio0: device=AC97,driver=spice
balloon: 0
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 1
cpu: host
cpulimit: 1
efidisk0: local-zfs:vm-103-disk-0,size=1M
freeze: 0
ide2: none,media=cdrom
kvm: 1
memory: 1048
name: Computer
net0: virtio=A6:80:C3:90:C0:E9,bridge=vmbr1,firewall=1
numa: 1
ostype: l26
protection: 0
scsi0: local-zfs:vm-103-disk-1,size=1000G
scsihw: virtio-scsi-pci
smbios1: uuid=81a75139-737d-407d-9ee6-cc9c84dc655f
sockets: 1
vga: qxl
vmgenid: 251e8a5d-ae52-4493-8614-b39a7d92a9e7

Kind Regards,
Gamerh
 