[SOLVED] Can't access web GUI. Need to restart whole Proxmox server.

amateurus

Hello,

Sometimes I can't access Proxmox through the web GUI. I have only one VM, running Ubuntu, and it becomes unavailable as well, so it appears to be "frozen" too.

I updated the BIOS and tried different kernels, but nothing helps.

CPU: Ryzen 5 3600
Kernel: Linux 5.19.7-1-pve

Code:
Mar 31 09:42:41 test pvedaemon[262956]: <root@pam> successful auth for user 'root@pam'
Mar 31 09:42:43 test pvedaemon[356586]: starting vnc proxy UPID:test:000570EA:00E4B4E1:64268EF3:vncproxy:100:root@pam:
Mar 31 09:42:43 test pvedaemon[262956]: <root@pam> starting task UPID:test:000570EA:00E4B4E1:64268EF3:vncproxy:100:root@pam:
Mar 31 09:42:44 test pvedaemon[269435]: VM 100 qmp command failed - VM 100 qmp command 'guest-ping' failed - got timeout
Mar 31 09:42:46 test pvedaemon[262956]: <root@pam> end task UPID:test:000570EA:00E4B4E1:64268EF3:vncproxy:100:root@pam: OK
Mar 31 09:42:50 test pvedaemon[262956]: VM 100 qmp command failed - VM 100 qmp command 'guest-ping' failed - got timeout
Mar 31 10:17:01 test CRON[361410]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Mar 31 11:17:01 test CRON[369828]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Mar 31 12:17:01 test CRON[378164]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Mar 31 13:05:08 test smartd[811]: Device: /dev/sda [SAT], CHECK POWER STATUS spins up disk (0x81 -> 0xff)
Mar 31 13:17:01 test CRON[386645]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Mar 31 13:17:32 test systemd[1]: Starting Daily apt download activities...
Mar 31 13:17:32 test systemd[1]: apt-daily.service: Succeeded.
Mar 31 13:17:32 test systemd[1]: Finished Daily apt download activities.
Mar 31 13:35:08 test smartd[811]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 65 to 66
Mar 31 13:35:08 test smartd[811]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 35 to 34

This is what I have in my syslog. Are there any logs I should check?
 
Hello,

I'm not 100% sure, but from the syslog you provided, the issue may be related to `/dev/sda`. Can you post the output of the `smartctl -a /dev/sda` command?
 
Thanks for the reply!

Code:
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.19.7-1-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf Pro
Device Model:     ST4000NE001-2MA101
Serial Number:    WS23R321
LU WWN Device Id: 5 000c50 0ed17b941
Firmware Version: EN01
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-4 (minor revision not indicated)
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Apr  4 14:27:19 2023 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  567) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 369) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x50bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   083   064   044    Pre-fail  Always       -       219939404
  3 Spin_Up_Time            0x0003   091   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       10
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   076   060   045    Pre-fail  Always       -       40425735
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       3321
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       10
 18 Head_Health             0x000b   100   100   050    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   065   059   040    Old_age   Always       -       35 (Min/Max 33/39)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       7
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       332
194 Temperature_Celsius     0x0022   035   041   000    Old_age   Always       -       35 (0 19 0 0 0)
195 Hardware_ECC_Recovered  0x001a   083   064   000    Old_age   Always       -       219939404
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       1
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       2950h+33m+59.876s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       6950974018
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       35519138994

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 
Thank you for the output!
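
For a quick check of a report like this, the attributes that most often point to a failing disk or a bad cable can be pulled out with a small filter. This is only a sketch: the helper name `smart_key_attrs` is hypothetical, and the choice of attribute IDs (5, 187, 197, 198, 199) is an assumption about what is worth looking at first, not a complete health check.

```shell
# Sketch: print ID, name and raw value of a few SMART attributes.
# Reads `smartctl -A` (or `smartctl -a`) output on stdin.
# The ID list is an assumption, not an official "critical attributes" set.
smart_key_attrs() {
    # Attribute rows have the flag (0x....) in column 3 and the raw value in column 10.
    awk '$1 ~ /^(5|187|197|198|199)$/ && $3 ~ /^0x/ { print $1, $2, $10 }'
}

# Usage on the host:
#   smartctl -A /dev/sda | smart_key_attrs
```

In the output posted above, the only non-zero raw value among these attributes is `UDMA_CRC_Error_Count` with a value of 1.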

The output appears to be normal. Could you please provide the syslog from the time the issue occurs again? You can limit the syslog to a time range with the journalctl tool, for example with the following command:

Bash:
journalctl --since "2023-04-04 00:00" --until "2023-04-04 08:00" > /tmp/Syslog.txt

You can adjust the date and time in the above command as needed.
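
Because the server has to be restarted after each hang, the last messages before the freeze belong to the previous boot. A minimal sketch for narrowing them down is shown here. It assumes the journal is stored persistently (a `/var/log/journal` directory exists), and the helper name `suspicious_lines` and its pattern list are hypothetical and far from exhaustive.

```shell
# Sketch: keep only log lines that often precede a hard hang.
# Reads log text on stdin. The pattern list is an assumption.
suspicious_lines() {
    grep -E -i 'hardware error|machine check|soft lockup|hard lockup|blocked for more than|out of memory|oom-killer|i/o error|kernel panic'
}

# Usage on the host:
#   journalctl --list-boots                              # find the boot that ended in the hang
#   journalctl -b -1 -k --no-pager | suspicious_lines    # kernel messages of the previous boot
#   journalctl -b -1 -n 200 --no-pager                   # last 200 lines before the restart
```

If nothing matches, that does not rule out a hardware problem; a machine that locks up hard may not get the chance to write anything to disk.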
 
This is from journalctl: the last 3 hours before the final log entry at 07:17.

Code:
Apr 05 04:17:01 test CRON[298192]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Apr 05 04:17:01 test CRON[298193]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Apr 05 04:17:01 test CRON[298192]: pam_unix(cron:session): session closed for user root
Apr 05 04:17:07 test smartd[808]: Device: /dev/sda [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 72 to 77
Apr 05 04:17:07 test smartd[808]: Device: /dev/sda [SAT], SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 72 to 77
Apr 05 04:17:07 test smartd[808]: Device: /dev/sdb [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 80 to 82
Apr 05 04:17:07 test smartd[808]: Device: /dev/sdb [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 67 to 66
Apr 05 04:17:07 test smartd[808]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 33 to 34
Apr 05 04:47:08 test smartd[808]: Device: /dev/sdb [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 66 to 67
Apr 05 04:47:08 test smartd[808]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 34 to 33
Apr 05 05:03:12 test systemd[1]: Starting Daily PVE download activities...
Apr 05 05:03:13 test pveupdate[304536]: <root@pam> starting task UPID:test:0004A59D:00C74610:642CE4F1:aptupdate::root@pam:
Apr 05 05:03:14 test pveupdate[304541]: update new package list: /var/lib/pve-manager/pkgupdates
Apr 05 05:03:15 test pveupdate[304536]: <root@pam> end task UPID:test:0004A59D:00C74610:642CE4F1:aptupdate::root@pam: OK
Apr 05 05:03:15 test systemd[1]: pve-daily-update.service: Succeeded.
Apr 05 05:03:15 test systemd[1]: Finished Daily PVE download activities.
Apr 05 05:03:15 test systemd[1]: pve-daily-update.service: Consumed 2.261s CPU time.
Apr 05 05:17:01 test CRON[306803]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Apr 05 05:17:01 test CRON[306804]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Apr 05 05:17:01 test CRON[306803]: pam_unix(cron:session): session closed for user root
Apr 05 06:16:34 test IPCC.xs[1158]: pam_unix(proxmox-ve-auth:auth): authentication failure; logname= uid=0 euid=0 tty= ruser= rhost=  user=root
Apr 05 06:16:36 test pvedaemon[1158]: authentication failure; rhost=::ffff:185.220.101.169 user=root@pam msg=Authentication failure
Apr 05 06:17:01 test CRON[315110]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Apr 05 06:17:01 test CRON[315111]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Apr 05 06:17:01 test CRON[315110]: pam_unix(cron:session): session closed for user root
Apr 05 06:17:07 test smartd[808]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 65 to 66
Apr 05 06:17:07 test smartd[808]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 35 to 34
Apr 05 06:25:01 test CRON[316234]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Apr 05 06:25:01 test CRON[316235]: (root) CMD (test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily ))
Apr 05 06:25:01 test CRON[316234]: pam_unix(cron:session): session closed for user root
Apr 05 06:27:12 test systemd[1]: Starting Daily apt upgrade and clean activities...
Apr 05 06:27:12 test systemd[1]: apt-daily-upgrade.service: Succeeded.
Apr 05 06:27:12 test systemd[1]: Finished Daily apt upgrade and clean activities.
Apr 05 07:17:01 test CRON[323474]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Apr 05 07:17:01 test CRON[323475]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Apr 05 07:17:01 test CRON[323474]: pam_unix(cron:session): session closed for user root
 
Not even SSH or ping works. I tried multiple times, but I can't connect through SSH or ping either the Proxmox host or the VM.
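
To pin down when the host stops answering and at which layer, it can be probed from another machine on the same network. The following is a rough sketch, assuming `ping`, `nc` and `curl` are installed on that machine; the function names `classify` and `probe` are hypothetical, and 8006 is the default port of the Proxmox VE web GUI.

```shell
# Sketch: probe a host from a second machine and name the likely failure layer.

# classify PING SSH GUI   (each argument is "ok" or "fail")
classify() {
    case "$1/$2/$3" in
        ok/ok/ok)       echo "all reachable" ;;
        ok/ok/fail)     echo "host up, web GUI down" ;;
        ok/fail/*)      echo "network stack up, services not answering" ;;
        fail/fail/fail) echo "host hung, powered off, or network down" ;;
        *)              echo "ping blocked or inconsistent result" ;;
    esac
}

# probe HOST   prints one timestamped status line
probe() {
    host=$1
    ping -c 1 -W 2 "$host" >/dev/null 2>&1 && p=ok || p=fail
    nc -z -w 2 "$host" 22 >/dev/null 2>&1 && s=ok || s=fail
    # -k skips certificate checks, since the GUI often uses a self-signed certificate
    curl -k -s -o /dev/null --max-time 3 "https://$host:8006/" && g=ok || g=fail
    echo "$(date '+%F %T') ping=$p ssh=$s gui=$g: $(classify "$p" "$s" "$g")"
}

# Usage (192.0.2.10 is a placeholder address):
#   while :; do probe 192.0.2.10 >> probe.log; sleep 60; done
```

The first line in `probe.log` where all three checks fail gives a timestamp to compare against the last journal entry on the host.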
 
If you have physical access to your Proxmox VE server, do you see anything on the screen? Can you log in?
 
I don't have a GPU in my Proxmox build right now, so I can't see anything on the screen when it happens.
 
I also noticed that my Ubuntu VM sometimes freezes and I can't control it through the web GUI or SSH.
Also, when I look at the VM summary, it says that Ubuntu is using 95% of its allocated RAM, but htop shows only around 2.4G/10G.
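
One possible explanation for the RAM difference, offered here as an assumption rather than a confirmed diagnosis, is that the two views count page cache differently: memory the guest uses for file cache can look "used" from outside the VM, while htop inside the guest leaves it out. A small sketch to compare both figures inside the VM follows; the helper name `mem_used_pct` is hypothetical.

```shell
# Sketch: compute two "used memory" percentages from /proc/meminfo.
#   used_incl_cache: everything that is not MemFree (cache counts as used)
#   used_excl_cache: everything that is not MemAvailable (cache mostly excluded)
# An optional argument names a different meminfo-style file.
mem_used_pct() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemFree:"      { free  = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END {
            if (total == 0) exit 1
            printf "used_incl_cache=%d%% used_excl_cache=%d%%\n",
                   (total - free) * 100 / total, (total - avail) * 100 / total
        }
    ' "${1:-/proc/meminfo}"
}

# Usage inside the Ubuntu VM:
#   mem_used_pct
```

If the first figure is close to the value in the VM summary and the second is close to what htop shows, the difference is cache and not a leak.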

I will try to reinstall Proxmox completely this week and see if it helps.
 
After reinstalling Proxmox, everything works! I don't know exactly what the problem was, but it's fixed now. Thanks!
 
