I can't access the GUI after an update

elturke

New Member
Jun 3, 2021
I installed Proxmox server without problems and set up a virtual machine that runs smoothly. After weeks of normal use I suddenly lost access to the GUI, although I still have SSH access. I reviewed many posts from other users with similar problems, but the suggested fixes did not help me.
I decided to shut down the computer and restart it. During boot it stopped and asked me to run "fsck /dev/mapper/pve-root"; after the next restart I could access the GUI again and the virtual machine started on its own as usual.
Everything then worked normally, including GUI access. I was able to back up the virtual machine to a network path external to Proxmox and left it running.
After a few hours I lost access to the GUI again.



Here is the output of "systemctl status pveproxy pvedaemon":

root@prosmox01:/# systemctl status pveproxy pvedaemon

● pveproxy.service - PVE API Proxy Server
Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2021-06-02 14:18:52 -03; 6h ago
Process: 3651 ExecStartPre=/usr/bin/pvecm updatecerts --silent (code=exited, status=1/FAILURE)
Process: 3660 ExecStart=/usr/bin/pveproxy start (code=exited, status=0/SUCCESS)
Main PID: 3678 (pveproxy)
Tasks: 4 (limit: 4915)
Memory: 134.0M
CGroup: /system.slice/pveproxy.service
├─ 3678 pveproxy
├─17567 pveproxy worker
├─17568 pveproxy worker
└─17569 pveproxy worker

Jun 02 21:15:25 prosmox01 pveproxy[3678]: worker 17542 finished
Jun 02 21:15:25 prosmox01 pveproxy[3678]: starting 1 worker(s)
Jun 02 21:15:25 prosmox01 pveproxy[3678]: worker 17568 started
Jun 02 21:15:25 prosmox01 pveproxy[17543]: worker exit
Jun 02 21:15:25 prosmox01 pveproxy[17567]: unable to open log file '/var/log/pveproxy/access.log' - Read-only f
Jun 02 21:15:25 prosmox01 pveproxy[17568]: unable to open log file '/var/log/pveproxy/access.log' - Read-only f
Jun 02 21:15:25 prosmox01 pveproxy[3678]: worker 17543 finished
Jun 02 21:15:25 prosmox01 pveproxy[3678]: starting 1 worker(s)
Jun 02 21:15:25 prosmox01 pveproxy[3678]: worker 17569 started
Jun 02 21:15:25 prosmox01 pveproxy[17569]: unable to open log file '/var/log/pveproxy/access.log' - Read-only f

● pvedaemon.service - PVE API Daemon
Loaded: loaded (/lib/systemd/system/pvedaemon.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2021-05-31 08:38:40 -03; 2 days ago
Process: 1145 ExecStart=/usr/bin/pvedaemon start (code=exited, status=0/SUCCESS)
Main PID: 1178 (pvedaemon)
Tasks: 4 (limit: 4915)
Memory: 179.0M
CGroup: /system.slice/pvedaemon.service
├─ 1178 pvedaemon
├─23459 pvedaemon worker
├─23719 pvedaemon worker
└─28350 pvedaemon worker

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.



Can anyone help me, please?

Regards.
ElTurke.
 
As you can see the filesystem gets set to read-only, which usually happens because of inconsistencies.
Also, the fsck info points in that direction.
Is your hardware okay?
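To confirm this from your SSH session while the GUI is down, you can look at the mount options of the root filesystem. A minimal sketch, assuming the usual Linux /proc/mounts layout (device, mount point, type, options); mount_mode is just a helper name made up for this post:

```shell
#!/bin/sh
# mount_mode MOUNTPOINT [MOUNTS_FILE]
# Prints "ro" or "rw" for the given mount point, read from a
# /proc/mounts-style file (defaults to /proc/mounts).
# Hypothetical helper for illustration; not part of Proxmox.
mount_mode() {
    mnt="$1"
    file="${2:-/proc/mounts}"
    awk -v m="$mnt" '
        $2 == m {
            # Column 4 holds the comma-separated mount options.
            n = split($4, o, ",")
            for (i = 1; i <= n; i++)
                if (o[i] == "ro" || o[i] == "rw") mode = o[i]
        }
        END { if (mode != "") print mode }
    ' "$file"
}
```

Calling `mount_mode /` should print ro if the root filesystem has been remounted read-only, and rw otherwise.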
 
The hardware is new: an Intel i3 10th-gen PC with 16 GB RAM, an SSD, a good power supply, and the standard SATA cables that came in the Asus motherboard box.
I understand the file inconsistency reported when Proxmox boots; it asks me to run fsck.
I am not sure where to start. My plan is to change the SATA cable and the SATA port on the motherboard and test again, then replace the SSD if the problem persists.
 
That sounds like a good start, yes.
Are the inconsistencies in the operating system files caused by hardware failures? What catches my attention is that the virtual machine keeps working well, without interruptions or problems. The Proxmox PC is in production all day and no one complains.
 
The same PC works well with the SSD under Windows 10, with no problems. Proxmox also starts fine from the SSD, but hours later the GUI stops working and I have to restart from the SSH command line.
I need help.
 
I kind of expect stuff like this to happen with virtual guests on a 50 dollar SSD ...
How many machines are you running simultaneously?
I would also guess that the S.M.A.R.T. data doesn't look very healthy. Could you provide some test results?
 
The good thing about Proxmox is that you can build a good working platform on hardware that is not necessarily expensive.
A USD 50 SSD is not bad and should not cause the problem I am seeing; at most you could say it has a shorter lifespan than a Samsung EVO.
At the moment I run only a single virtual machine in Proxmox: a Windows Server 2016 terminal server.
Could it be a Proxmox issue related to the SSD? Could it be an issue related to the Asus H410M-E motherboard in UEFI mode?
You mention that the SMART data probably doesn't look healthy.
Can you tell me which test I should run and post for you to analyze? Thanks.
 
The good thing about Proxmox is that you can build a good working platform on hardware that is not necessarily expensive.
Proxmox, like every hypervisor, relies on hardware that meets the needs of virtualization.
A USD 50 SSD is not bad and should not cause the problem I am seeing; at most you could say it has a shorter lifespan than a Samsung EVO.
Maybe it's not the cause, but not everything is about lifespan or TBW. Concurrent writes, sync writes, random access, ... There really is a lot more to a suitable disk.
At the moment I run only a single virtual machine in Proxmox: a Windows Server 2016 terminal server.
Well, that should indeed be able to run even off a cheap SSD.
Could it be a Proxmox issue related to the SSD?
Very unlikely, since it's basically Debian. However, Proxmox and its guests can put a remarkable load on your disks. How long has this setup been running? And did it run 24/7?
Could it be an issue related to the Asus H410M-E motherboard in UEFI mode?
If Windows runs without problems, I would rule that out.
You mention that the SMART data probably doesn't look healthy.
Can you tell me which test I should run and post for you to analyze? Thanks.
The best option is to run a test directly on the hypervisor:
Code:
apt install smartmontools
smartctl -t short /dev/sdX    # start a short self-test
smartctl -a /dev/sdX          # run after the test has finished
 
Here are the test results.
I ran the tests shortly after rebooting the Proxmox PC, while everything was still working fine.

root@prosmox01:/dev# smartctl -t short /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.4.73-1-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 1 minutes for test to complete.
Test will complete after Fri Jun 11 22:34:27 2021 -03
Use smartctl -X to abort test.
root@prosmox01:/dev# smartctl -a /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.4.73-1-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Phison Driven SSDs
Device Model: KINGSTON SA400S37240G
Serial Number: 50026B7783CA487B
LU WWN Device Id: 5 0026b7 783ca487b
Firmware Version: SBFKB1H5
User Capacity: 240,057,409,536 bytes [240 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
TRIM Command: Available
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-3 T13/2161-D revision 4
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Fri Jun 11 22:35:18 2021 -03
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (65535) seconds.
Offline data collection
capabilities: (0x11) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
No Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 2) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       1567
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       35
148 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
149 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
167 Write_Protect_Mode      0x0000   100   100   000    Old_age   Offline      -       0
168 SATA_Phy_Error_Count    0x0012   100   100   000    Old_age   Always       -       0
169 Bad_Block_Rate          0x0000   100   100   000    Old_age   Offline      -       26
170 Bad_Blk_Ct_Erl/Lat      0x0000   100   100   010    Old_age   Offline      -       0/18
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 MaxAvgErase_Ct          0x0000   100   100   000    Old_age   Offline      -       31 (Average 8)
181 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
182 Erase_Fail_Count        0x0000   100   100   000    Old_age   Offline      -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
192 Unsafe_Shutdown_Count   0x0012   100   100   000    Old_age   Always       -       21
194 Temperature_Celsius     0x0022   025   040   000    Old_age   Always       -       25 (Min/Max 19/40)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
199 SATA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
218 CRC_Error_Count         0x0032   100   100   000    Old_age   Always       -       0
231 SSD_Life_Left           0x0000   099   099   000    Old_age   Offline      -       99
233 Flash_Writes_GiB        0x0032   100   100   000    Old_age   Always       -       1115
241 Lifetime_Writes_GiB     0x0032   100   100   000    Old_age   Always       -       775
242 Lifetime_Reads_GiB      0x0032   100   100   000    Old_age   Always       -       809
244 Average_Erase_Count     0x0000   100   100   000    Old_age   Offline      -       8
245 Max_Erase_Count         0x0000   100   100   000    Old_age   Offline      -       31
246 Total_Erase_Count       0x0000   100   100   000    Old_age   Offline      -       65728

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      1567         -
# 2  Short offline       Completed without error       00%      1567         -
# 3  Short offline       Completed without error       00%      1567         -

Selective Self-tests/Logging not supported
 
Okay, the disk is healthy at least. I still think that the controller can't handle the load and would recommend looking for a Samsung PM or SM or an Intel DC disk.
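If you want to keep an eye on it, a few error counters can be filtered out of the smartctl output. A rough sketch: smart_counters is a hypothetical helper, and the attribute selection is my own assumption (names vary between vendors):

```shell
#!/bin/sh
# smart_counters [FILE]
# Reads `smartctl -a` (or -A) output from FILE or stdin and prints
# "NAME RAW_VALUE" for a few error-related attributes.
# The attribute selection is an assumption; names differ per vendor.
smart_counters() {
    awk '
        # Field 2 is the attribute name, field 10 the raw value.
        $2 ~ /^(Reallocated_Event_Count|Reported_Uncorrect|SATA_CRC_Error_Count|Unsafe_Shutdown_Count)$/ {
            print $2, $10
        }
    ' "$@"
}
```

Usage would be `smartctl -a /dev/sda | smart_counters`. For the output posted above this lists Reported_Uncorrect 0, Unsafe_Shutdown_Count 21, Reallocated_Event_Count 0 and SATA_CRC_Error_Count 0.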
 
The load is only the Proxmox host plus one virtual Windows Server 2016 running as a terminal server. The clients only use administrative management software; the SQL engine is on another physical server. No data is saved on this server, it only runs the ERP. While everything works, I see no CPU or RAM overload. Even when the problem appears on the Proxmox host and the GUI cannot be accessed, the virtual Windows Server is unaffected: for the users everything works normally and smoothly. I know the SSD is cheap, but it should work fine. It could be a cache configuration parameter or some other configuration problem.
 
Don't you think the fact that the guest is unaffected by the problem was worth mentioning at the beginning?

I'm not very experienced with filesystem errors. You probably have to dig in that direction, because /var/log seems to be set to read-only whereas your VM images seem to continue to work fine.
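A starting point could be the kernel log, which normally records why a filesystem was remounted read-only. A minimal sketch, assuming the root filesystem is ext4 and the messages are still in the kernel ring buffer; fs_errors is a made-up helper name and the patterns are only a selection of typical messages:

```shell
#!/bin/sh
# fs_errors [FILE]
# Filters kernel log text (FILE or stdin) for lines that typically
# accompany a filesystem being remounted read-only.
# The patterns are an illustrative selection, not an exhaustive list.
fs_errors() {
    grep -i -E 'EXT4-fs error|remount(ing)? .*read-only|I/O error|ata[0-9]+.*(failed|error)' "$@"
}
```

Run it as `dmesg | fs_errors` over SSH while the GUI is unreachable.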
I'm not much help here, though. :)
 
Exactly, that is what happens: the operating system files are put in read-only mode and the virtual machine does not notice anything; it continues to work without problems.
On the other hand, while the fault is present, the Proxmox GUI is not accessible and many of the tests return errors in the underlying Linux layer. For this reason I restarted the Proxmox PC and ran the tests while everything was working fine.
Can you tell me in which area of the forum I should post my problem?
 
You're absolutely right here, but if this is a Linux problem, maybe asking in a Debian forum can bring more details.