[SOLVED] Proxmox "freezing" randomly

Jebula999

Member
Apr 18, 2024
Hi there,

I have a Proxmox server with 4 VMs running.

Every few hours to every couple of days, the server just "freezes".
The machine is still online and running, but no one is home.

I need to force a shutdown and boot it back up.

The log entries around the crash are:
Code:
Jun 15 17:17:01 homelab CRON[256424]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jun 15 17:17:01 homelab CRON[256423]: pam_unix(cron:session): session closed for user root
Jun 15 18:10:03 homelab chronyd[954]: Selected source 102.130.49.195 (2.debian.pool.ntp.org)
Jun 15 18:17:01 homelab CRON[265747]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jun 15 18:17:01 homelab CRON[265748]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jun 15 18:17:01 homelab CRON[265747]: pam_unix(cron:session): session closed for user root
Jun 15 19:17:01 homelab CRON[275056]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jun 15 19:17:01 homelab CRON[275057]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jun 15 19:17:01 homelab CRON[275056]: pam_unix(cron:session): session closed for user root
Jun 15 19:31:22 homelab smartd[811]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 122 to 123
-- Boot c159405377bd470a9bac3635d1bf2abf --
Jun 15 21:56:56 homelab kernel: Linux version 6.5.11-8-pve (build@proxmox) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC PMX 6.5.11-8 (2024-01-30T12:27Z) ()
Jun 15 21:56:56 homelab kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.5.11-8-pve root=/dev/mapper/pve-root ro quiet
Jun 15 21:56:56 homelab kernel: KERNEL supported cpus:
Jun 15 21:56:56 homelab kernel:   Intel GenuineIntel
Jun 15 21:56:56 homelab kernel:   AMD AuthenticAMD
Jun 15 21:56:56 homelab kernel:   Hygon HygonGenuine
Jun 15 21:56:56 homelab kernel:   Centaur CentaurHauls
Jun 15 21:56:56 homelab kernel:   zhaoxin   Shanghai

I cannot find any reason why this would happen, and am at a loss as to where to look.


I would like to note I am very new to all this, and not sure of commands to run or how to dig deeper/troubleshoot.
Any guidance or troubleshooting tips would be appreciated.
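
For anyone at the same starting point, here is a minimal sketch of how the journal can be narrowed down to the moments before each reboot. The helper name last_before_boot is made up for this example; it only relies on the "-- Boot <id> --" separator lines that journalctl prints between boots, as seen in the log above. Note that a hard freeze often leaves nothing useful in the journal, so an empty result does not rule anything out.

```shell
# Print the last journal line logged before each reboot.
# Reads journal text on stdin and looks for the "-- Boot <id> --"
# separator lines that journalctl prints between boots.
last_before_boot() {
  awk '
    /^-- Boot / { if (prev != "") print prev; prev = ""; next }
    { prev = $0 }
  '
}

# Typical use on the host (left as comments so nothing runs by accident):
#   journalctl --list-boots                   # one line per recorded boot
#   journalctl -b -1 -e                       # end of the previous boot's log
#   journalctl -k -b -1 -p warning            # kernel messages, warning or worse
#   journalctl --no-pager | last_before_boot  # last entry before every reboot
```

The -b -1 selector means "the boot before the current one", which is usually the one that froze.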
 
This morning I woke up to Proxmox being offline completely.
The last entry in the Summary tab was at 3:30AM.
1718617951840.png

The last log entry was:

Code:
Jun 17 03:24:03 homelab systemd[1]: apt-daily.service: Deactivated successfully.
Jun 17 03:24:03 homelab systemd[1]: Finished apt-daily.service - Daily apt download activities.
Jun 17 03:26:03 homelab systemd[1]: Starting pve-daily-update.service - Daily PVE download activities...
Jun 17 03:26:05 homelab pveupdate[81176]: <root@pam> starting task UPID:homelab:00013D1E:002DB6A0:666F90AD:aptupdate::root@pam:
Jun 17 03:26:07 homelab pveupdate[81182]: command 'apt-get update' failed: exit code 100
Jun 17 03:26:07 homelab pveupdate[81176]: command 'apt-get update' failed: exit code 100
Jun 17 03:26:07 homelab pveupdate[81176]: <root@pam> end task UPID:homelab:00013D1E:002DB6A0:666F90AD:aptupdate::root@pam: command 'apt-get update' failed: exit code 100
Jun 17 03:26:07 homelab systemd[1]: pve-daily-update.service: Deactivated successfully.
Jun 17 03:26:07 homelab systemd[1]: Finished pve-daily-update.service - Daily PVE download activities.
Jun 17 03:26:07 homelab systemd[1]: pve-daily-update.service: Consumed 1.713s CPU time.
-- Boot 3bc50e52f3b44b09aa87c4d9aea4d0b1 --
Jun 17 11:15:56 homelab kernel: Linux version 6.5.11-8-pve (build@proxmox) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC PMX 6.5.11-8 (2024-01-30T12:27Z) ()
Jun 17 11:15:56 homelab kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.5.11-8-pve root=/dev/mapper/pve-root ro quiet
Jun 17 11:15:56 homelab kernel: KERNEL supported cpus:
Jun 17 11:15:56 homelab kernel:   Intel GenuineIntel
Jun 17 11:15:56 homelab kernel:   AMD AuthenticAMD
Jun 17 11:15:56 homelab kernel:   Hygon HygonGenuine
Jun 17 11:15:56 homelab kernel:   Centaur CentaurHauls
Jun 17 11:15:56 homelab kernel:   zhaoxin   Shanghai
Jun 17 11:15:56 homelab kernel: BIOS-provided physical RAM map:
 
Nothing obviously stands out.

The only thing I can think of checking is /dev/sdb. How is this drive incorporated into your PVE instance? I see smartd logged a temperature change just before the freeze. Could this drive's condition (cooling, aging) be linked to the freeze?
 
Have you checked your HDD's temperature with the command below?

smartctl -A /dev/sdb | grep -i temperature
 
/dev/sdb is one of two 3TB drives.
It's part of my TrueNAS VM.
I doubt the failure of an HDD in my TrueNAS VM would bring down all the other VMs or freeze Proxmox.

Screenshots from the drives are below.
1718619438496.png
1718619522762.png
 
I've done a few spot checks, and it always seems to be <30 degrees Celsius.
The SSD running Proxmox (/dev/sda) runs at about 40 degrees Celsius.

Code:
root@homelab:~# smartctl -A /dev/sdb | grep -i temperature
194 Temperature_Celsius     0x0022   122   111   000    Old_age   Always       -       28

1718619747997.png
 
Have you tried checking the disk's health?

smartctl -H /dev/sdb
Code:
root@homelab:~# smartctl -H /dev/sdb
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.5.11-8-pve] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
 
Not even sure what a Kernel is tbh.
Glad to meet a complete newbie to Linux - Happy times.


To update your system (assuming you don't have a subscription, which may be worth acquiring), you need to do the following:

1. In the GUI, select Node (left-panel), Updates (middle-panel) then (lower-menu) Repositories, then from the available repositories enable both no-subscription & pve-no-subscription where available.

2. Go to Updates (middle-panel) then press (on top) Refresh.

3. Finally press the Upgrade button to go through with the actual updates. (A window should open and you will need to press Enter to confirm.)

On a kernel update - you will need to reboot the whole node - as you will be advised.


Proxmox has a learning curve - but it's doable by almost anyone - & the more you learn the more you will enjoy it.

Please note that updates sometimes help & sometimes have the reverse effect.

ALWAYS HAVE CONSTANT BACKUPS OF ALL YOUR VMs & LXCs. I would suggest you familiarize yourself with the whole backup/restore structure in PVE and also decide on a specific backup location (ideally this should be another non-OS drive or even an external one, and maybe both).

BTW Proxmox is awesome as is this forum & staff.
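
As a rough illustration of the backup advice above, the sketch below prints (but does not run) one vzdump command per VM. The helper name print_backup_cmds, the storage name "local" and the VM IDs 100 and 101 are placeholders, not values from this thread; the storage you pick has to be one that accepts backups.

```shell
# Print (not run) one vzdump command per VM ID.
# Arguments: <storage> <vmid>...
# The helper name, storage name and VM IDs used here are hypothetical.
print_backup_cmds() {
  storage=$1
  shift
  for vmid in "$@"; do
    printf 'vzdump %s --storage %s --mode snapshot --compress zstd\n' "$vmid" "$storage"
  done
}

# Example: review the commands first, then run them by hand.
#   print_backup_cmds local 100 101
```

Snapshot mode keeps the VM running during the backup. The same jobs can be scheduled from the GUI under Datacenter, Backup, which is probably the easier route for a new install.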
 
Thank you for the kind words.
Now that you mention it, I've never looked into backups for the VM/Machine.
Will deep dive into that soon.

In response to your suggestions:

1. I don't see anything referring to no-subscription or pve-no-subscription,
but everything there is enabled from what I can tell.
1718636256681.png

2&3. I actually did this yesterday as I figured maybe an update would solve things.
When I follow your steps now though, I am left with:
Code:
starting apt-get update
Hit:1 http://security.debian.org bookworm-security InRelease
Hit:2 http://ftp.debian.org/debian bookworm InRelease
Hit:3 http://ftp.debian.org/debian bookworm-updates InRelease
Err:4 https://enterprise.proxmox.com/debian/ceph-quincy bookworm InRelease
  401  Unauthorized [IP: 51.91.38.34 443]
Err:5 https://enterprise.proxmox.com/debian/pve bookworm InRelease
  401  Unauthorized [IP: 51.91.38.34 443]
Reading package lists...
E: Failed to fetch https://enterprise.proxmox.com/debian/ceph-quincy/dists/bookworm/InRelease  401  Unauthorized [IP: 51.91.38.34 443]
E: The repository 'https://enterprise.proxmox.com/debian/ceph-quincy bookworm InRelease' is not signed.
E: Failed to fetch https://enterprise.proxmox.com/debian/pve/dists/bookworm/InRelease  401  Unauthorized [IP: 51.91.38.34 443]
E: The repository 'https://enterprise.proxmox.com/debian/pve bookworm InRelease' is not signed.
TASK ERROR: command 'apt-get update' failed: exit code 100

I don't recall any errors yesterday, and as you mentioned, a window came up asking if I wanted to install xyz updates using roughly 70 kB of additional space.
I accepted and proceeded to reboot afterwards.
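
The 401 Unauthorized errors in the output above come from the enterprise repositories, which need a subscription key. A minimal sketch of the usual fix from the shell is below, assuming a default PVE 8 install on Debian bookworm. The file paths and the no-subscription line in the comments are the stock ones and may differ on a customised system, and the helper name comment_enterprise is made up for this example.

```shell
# Comment out any active APT source line that points at the Proxmox
# enterprise repository. Reads a .list file on stdin and writes the
# edited text to stdout, so nothing is changed until you redirect it.
comment_enterprise() {
  sed 's|^deb \(.*enterprise\.proxmox\.com.*\)$|# deb \1|'
}

# Assumed default locations of the enterprise entries on PVE 8:
#   /etc/apt/sources.list.d/pve-enterprise.list
#   /etc/apt/sources.list.d/ceph.list
# Preview the change:
#   comment_enterprise < /etc/apt/sources.list.d/pve-enterprise.list
# No-subscription line to add to /etc/apt/sources.list:
#   deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription
```

The GUI route (Node, Updates, Repositories, Add) should end up with the same result.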
 
Might be totally unrelated, but the last line before the freeze reports a temperature of 123C...
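
A note on that reading: by default smartd logs the normalized attribute value, not degrees. In the smartctl -A output earlier in the thread the normalized VALUE for attribute 194 is 122 while the raw value is 28, so "changed from 122 to 123" most likely reflects a one-step change in the normalized value rather than a drive at 123 degrees Celsius. The sketch below pulls both numbers out of smartctl -A output; the helper name smart_temp is made up for this example, and the column positions assume the standard attribute table layout.

```shell
# Show the normalized and raw values of SMART attribute 194 from
# "smartctl -A" output read on stdin.
# Column 4 is the normalized VALUE, column 10 is RAW_VALUE
# (degrees Celsius on most drives).
smart_temp() {
  awk '$1 == 194 { print "normalized=" $4, "raw=" $10 }'
}

# On the host:
#   smartctl -A /dev/sdb | smart_temp
```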
 
Ok, so press the Add button and choose the no-subscription repositories from there. In the main window you may also want to disable the enterprise repos, since you obviously don't have a subscription.

You've been adequately warned!
This looks much better.

But shortly after downloading ~400MB worth of updates, it stops here:
Code:
55 upgraded, 5 newly installed, 2 to remove and 1 not upgraded.
Need to get 0 B/434 MB of archives.
After this operation, 1205 MB of additional disk space will be used.
Do you want to continue? [Y/n] Y
W: (pve-apt-hook) !! WARNING !!
W: (pve-apt-hook) You are attempting to remove the meta-package 'proxmox-ve'!
W: (pve-apt-hook)
W: (pve-apt-hook) If you really want to permanently remove 'proxmox-ve' from your system, run the following command
W: (pve-apt-hook)       touch '/please-remove-proxmox-ve'
W: (pve-apt-hook) run apt purge proxmox-ve to remove the meta-package
W: (pve-apt-hook) and repeat your apt invocation.
W: (pve-apt-hook)
W: (pve-apt-hook) If you are unsure why 'proxmox-ve' would be removed, please verify
W: (pve-apt-hook)       - your APT repository settings
W: (pve-apt-hook)       - that you are using 'apt full-upgrade' to upgrade your system
E: Sub-process /usr/share/proxmox-ve/pve-apt-hook returned an error code (1)
E: Failure running script /usr/share/proxmox-ve/pve-apt-hook
This doesn't look very friendly though, and I'm not sure whether to force it or not.

According to the list of updates, it wants to go from 8.1.0 to 8.2.0
But this makes it sound like it wants to remove it...

To force or not to force?

EDIT:
I just noticed this is happening to a lot of people; it's being discussed here:
Post regarding above
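
Before forcing anything, the hook's own advice is to check the repository settings and to use a full upgrade. A cautious way to see what an upgrade would do is to simulate it first. This is a minimal sketch; the helper name would_remove_pve is made up for this example, and it relies on apt-get -s printing one "Remv <package>" line for each package it would remove.

```shell
# Succeeds (exit status 0) if simulated apt output on stdin includes
# a removal of the proxmox-ve meta-package.
would_remove_pve() {
  grep -Eq '^Remv proxmox-ve( |$)'
}

# On the host (-s only simulates, nothing is installed or removed):
#   apt-get update
#   apt-get -s dist-upgrade | would_remove_pve \
#     && echo "stop: this upgrade would remove proxmox-ve"
```

If the check fires, the repository list is the first thing to revisit, rather than creating the override file the warning mentions.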
 
Me again.

Just had 3 freezes again today... this is the most I've had in one day.

I did update Proxmox 2 days ago as suggested, but it doesn't seem to have helped.

Latest log for the most recent freeze/crash:
Code:
Jun 19 17:17:01 homelab CRON[55937]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jun 19 17:17:01 homelab CRON[55935]: pam_unix(cron:session): session closed for user root
Jun 19 17:26:59 homelab smartd[828]: Device: /dev/sdc [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 120 to 121
Jun 19 17:51:23 homelab kernel: perf: interrupt took too long (3137 > 3127), lowering kernel.perf_event_max_sample_rate to 63000
Jun 19 18:17:01 homelab CRON[64522]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jun 19 18:17:01 homelab CRON[64523]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jun 19 18:17:01 homelab CRON[64522]: pam_unix(cron:session): session closed for user root
Jun 19 19:17:01 homelab CRON[73048]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jun 19 19:17:01 homelab CRON[73049]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jun 19 19:17:01 homelab CRON[73048]: pam_unix(cron:session): session closed for user root
-- Boot 7761861583d14dbd9afafc679baa10b3 --
 
