Proxmox Grey Question Mark

CGtheAnnoying

New Member
Jun 24, 2023
Dear all,

A few months ago I installed Proxmox VE 7.4.3, and everything worked well.

Since last week, my host keeps getting stuck showing a grey question mark. I have been unable to find the cause, and the only fix so far is to reboot the entire host.

I found nothing useful in the syslog, I'm really tired of troubleshooting, and I need some professional guidance. Note that I can still access my VM via SSH and RDP normally.

My setup is below. Note that I do not use ZFS.
This is a non-production setup; it is only a homelab.

CPU: Ryzen 7 2700X
Motherboard: ASUS EX-A320M-GAMING
RAM: 20GB (non-ECC)
GPU: GeForce 8400 GS
Storage:
NVMe 256GB (Proxmox OS)
500GB SSD (thin-provisioned, for VMs)

I have 4 CTs running Ubuntu with only 1GB of RAM,
and 1 Windows VM with 3GB of RAM and 2 cores.


Code:
Jun 24 19:42:33 HV2700 kernel: pveproxy worker[1152442]: segfault at 17a000146d1 ip 000055e48b04b7e5 sp 00007ffe2c7f2890 error 4 in perl[55e48af7d000+185000]
Jun 24 19:42:33 HV2700 kernel: Code: 25 ff c0 00 00 3d 09 80 00 00 75 d9 8b 53 08 85 d2 74 d2 48 8b 43 10 48 85 c0 74 c9 83 c2 01 89 53 08 48 8b 30 48 85 f6 74 0a <f6> 46 0e 10 0f 85 c1 02 00 00 48 8b 70 28 48 85 f6 74 0a f6 46 0e
Jun 24 19:42:33 HV2700 pveproxy[1428]: worker 1152442 finished
Jun 24 19:42:33 HV2700 pveproxy[1428]: starting 1 worker(s)
Jun 24 19:42:33 HV2700 pveproxy[1428]: worker 1164599 started
Jun 24 19:44:33 HV2700 pvedaemon[1140535]: <root@pam> successful auth for user 'root@pam'
Jun 24 19:44:33 HV2700 pveproxy[1428]: worker 1148521 finished
Jun 24 19:44:33 HV2700 pveproxy[1428]: starting 1 worker(s)
Jun 24 19:44:33 HV2700 pveproxy[1428]: worker 1165920 started
Jun 24 19:44:34 HV2700 pveproxy[1165919]: worker exit
Jun 24 19:46:31 HV2700 systemd[1]: Starting Daily apt download activities...
Jun 24 19:46:31 HV2700 systemd[1]: apt-daily.service: Succeeded.
Jun 24 19:46:31 HV2700 systemd[1]: Finished Daily apt download activities.
Jun 24 19:51:35 HV2700 kernel: pveproxy worker[1165920]: segfault at 4008 ip 000055e48b034f90 sp 00007ffe2c7f2600 error 4 in perl[55e48af7d000+185000]
Jun 24 19:51:35 HV2700 kernel: Code: 76 01 00 00 48 8b 52 10 49 89 f7 89 cd 48 8d 04 c2 4c 8b 28 48 89 44 24 20 4d 85 ed 0f 84 94 00 00 00 4c 89 eb 0f 1f 44 00 00 <48> 8b 43 08 39 28 75 78 48 63 48 04 4c 39 e1 75 6f 48 8d 78 08 4c
Jun 24 19:51:35 HV2700 pveproxy[1428]: worker 1165920 finished
Jun 24 19:51:35 HV2700 pveproxy[1428]: starting 1 worker(s)
Jun 24 19:51:35 HV2700 pveproxy[1428]: worker 1170549 started
Jun 24 19:55:33 HV2700 pvedaemon[1154549]: <root@pam> successful auth for user 'root@pam'
Jun 24 19:59:33 HV2700 pvedaemon[1140535]: <root@pam> successful auth for user 'root@pam'
Jun 24 20:03:22 HV2700 smartd[1077]: Device: /dev/sdb [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 61 to 62
Jun 24 20:03:22 HV2700 smartd[1077]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 39 to 38
Jun 24 20:03:22 HV2700 smartd[1077]: Device: /dev/nvme0, number of Error Log entries increased from 2966554 to 2970010


What is a kernel segfault?


What Is Segmentation Fault? In a nutshell, a segmentation fault is an error caused by a process attempting to access a memory region that it shouldn't. When the kernel detects such an access, it terminates the process by sending it a segmentation violation signal (SIGSEGV).
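To make the two segfault lines in the log easier to read, here is a minimal Python sketch that splits such a line into its fields and decodes the error code. The regular expression, `parse_segfault`, and `decode_error` are illustrative names written for this thread, not part of any Proxmox tool, and the bit meanings assume the usual x86 page-fault error code (bit 0: protection violation vs. unmapped page, bit 1: write vs. read, bit 2: user vs. kernel mode).

```python
import re

# Layout of a Linux x86-64 "segfault at ..." kernel message, as seen in the log above.
SEGFAULT_RE = re.compile(
    r"kernel: (?P<process>.+?)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<error>[0-9a-f]+) "
    r"in (?P<object>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_error(code):
    """Decode the low three bits of the x86 page-fault error code (assumed layout)."""
    return {
        "cause": "protection violation" if code & 1 else "page not mapped",
        "access": "write" if code & 2 else "read",
        "mode": "user" if code & 4 else "kernel",
    }


def parse_segfault(line):
    """Return the fields of one segfault line, or None if the line does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    return {
        "process": m["process"],
        "pid": int(m["pid"]),
        "fault_address": int(m["addr"], 16),
        "object": m["object"],
        "offset": ip - base,  # position of the faulting instruction inside the mapped object
        "error": decode_error(int(m["error"], 16)),
    }


# The two segfault lines from the log in this thread.
LINES = [
    "Jun 24 19:42:33 HV2700 kernel: pveproxy worker[1152442]: segfault at 17a000146d1 "
    "ip 000055e48b04b7e5 sp 00007ffe2c7f2890 error 4 in perl[55e48af7d000+185000]",
    "Jun 24 19:51:35 HV2700 kernel: pveproxy worker[1165920]: segfault at 4008 "
    "ip 000055e48b034f90 sp 00007ffe2c7f2600 error 4 in perl[55e48af7d000+185000]",
]

for line in LINES:
    info = parse_segfault(line)
    print(info["process"], info["object"], hex(info["offset"]), info["error"])
```

For both lines, error 4 decodes to a user-mode read of an unmapped page. The two crashes happen at different offsets inside perl and at unrelated fault addresses, which is at least consistent with random memory corruption rather than a single reproducible software bug, although it does not prove it.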

This is likely bad hardware: the error count on your NVMe device is constantly increasing, the memory may be bad, or it may be a combination of both.
In short, the question mark means that a process that collects status is having trouble doing so. The log entries showing segfaults explain why.
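The growing NVMe error count mentioned above can be put into numbers. The following Python sketch is only an illustration: `nvme_error_growth` and its regular expression are hypothetical helpers that assume smartd messages worded exactly like the one in this thread's log. It sums how many new error-log entries each device reported.

```python
import re

# Matches smartd messages of the form seen in the log above (assumed wording).
NVME_ERR_RE = re.compile(
    r"smartd\[\d+\]: Device: (?P<dev>\S+), "
    r"number of Error Log entries increased from (?P<old>\d+) to (?P<new>\d+)"
)


def nvme_error_growth(lines):
    """Sum the increase in error-log entries per device over the given log lines."""
    growth = {}
    for line in lines:
        m = NVME_ERR_RE.search(line)
        if m is None:
            continue  # not an error-log message
        delta = int(m["new"]) - int(m["old"])
        growth[m["dev"]] = growth.get(m["dev"], 0) + delta
    return growth


# The smartd lines from the log in this thread.
LOG = [
    "Jun 24 20:03:22 HV2700 smartd[1077]: Device: /dev/sdb [SAT], SMART Usage Attribute: "
    "194 Temperature_Celsius changed from 39 to 38",
    "Jun 24 20:03:22 HV2700 smartd[1077]: Device: /dev/nvme0, number of Error Log entries "
    "increased from 2966554 to 2970010",
]

print(nvme_error_growth(LOG))
```

For the single message in this log, the counter grew by 3456 entries between two smartd checks. Some NVMe drives log entries for harmless conditions, so the number alone does not prove the drive is failing, but a counter that keeps climbing at this rate is worth checking against the drive's own error log.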



Actually, I think you might be right. Two of my RAM sticks gave me a BSOD when I ran them on a physical Windows install. On Linux they don't crash instantly, so I thought that "maybe" Windows just doesn't like my RAM and that it works fine on Linux.

Let me give that a try and swap them for healthy RAM.

Thanks for the heads-up :).