Proxmox GUI showing question marks / hangs and freezes

clearwater

Member
Nov 15, 2019
Hi there,

Over the last week, the GUI has been showing gray question marks over every LXC, VM, and storage (screenshot attached). Running
Code:
systemctl restart pvestatd
brings the GUI back to normal for ~5-10 minutes, and then it reverts to displaying question marks. Within the last two days, two more things have started to occur:
  • Generally overnight, Proxmox freezes: the GUI becomes unreachable and returns a 'page not found' error. I can still log in via SSH, but commands hang indefinitely.
  • Several times, while I was troubleshooting the above, my entire network (including unrelated devices on my Wi-Fi) lost its connection to the internet and to the local network.
I suspect this is a hardware problem, potentially an early sign of hard drive failure, but I am unsure, so I am hoping someone can help me narrow down the issue.

Thank you

Additional details:

Syslog does not show anything out of the ordinary, except the following, occurring roughly every 10 minutes (this server is not part of a cluster):
Code:
Sep 10 20:59:43 pve ceph-crash[8480]: WARNING:__main__:post /var/lib/ceph/crash/2019-11-16_02:55:22.99375[...] as client.crash.pve failed: [errno 2] error connecting to the cluster
Kernel Version: Linux 5.4.60-1-pve #1 SMP PVE 5.4.60-1 (Mon, 31 Aug 2020 10:36:22 +0200)
PVE Manager Version: pve-manager/6.2-11/22fb4983
  • Dell R710, dual Xeon X5675, 48GB RAM, PERC H700 RAID controller
  • 1 x 500GB SSD, ZFS with compression (running Proxmox and the VM images)
  • 2 x 2TB 7200RPM SAS drives, LVM RAID0 (for storage)
  • 1 x 2TB 7200RPM drive, LVM (for storage)
  • Running 2 Windows Server 2019 VMs, 2 Ubuntu VMs, 2 Alpine Linux LXCs
 

Attachments

  • Screenshot 2020-09-10 at 8.48.37 PM.png (136.1 KB)
What's the output of systemctl status pvestatd, systemctl status pvedaemon and systemctl status pveproxy when only question marks are shown?
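For reference, the commands to run while the question marks are shown:
Code:
systemctl status pvestatd
systemctl status pvedaemon
systemctl status pveproxy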
 
Hi Mira,

Thanks for your reply. Please see attached.

Thanks
 

Attachments

  • systemctl_status_pvedaemon.png (138.5 KB)
  • systemctl_status_pveproxy.png (93.9 KB)
  • systemctl_status_pvestatd.png (78.7 KB)
Those look fine. Could you provide your storage config (/etc/pve/storage.cfg) as well as the journal (from ~10 minutes before it happens until ~5 minutes after)?
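As a sketch, one way to pull that window out of the journal - the timestamps below are placeholders, so adjust them to when the question marks actually appear:
Code:
journalctl --since "2020-09-30 23:40" --until "2020-09-30 23:55" > journal_excerpt.txt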
 
Hi Mira,

See below for storage.cfg output:
(screenshot of storage.cfg attached)

I will need to reboot to remove the unknown status and obtain the journal entry, so will provide that as soon as I am able.

I did have a CIFS share, which I have since removed. When I removed it, the unknown status went away for a day, but it has since returned. My storage always shows the '?' status, even after a reboot.

Thank you
 
Just an update: I rebooted and recorded the journal - I will send it to you directly. Some of the interesting output is included below:

This occurs many times:
Code:
Sep 30 23:50:32 pve kernel: sd 3:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Sep 30 23:50:32 pve kernel: sd 3:0:0:0: [sdc] tag#0 Sense Key : Medium Error [current]
Sep 30 23:50:32 pve kernel: sd 3:0:0:0: [sdc] tag#0 Add. Sense: Unrecovered read error
Sep 30 23:50:32 pve kernel: sd 3:0:0:0: [sdc] tag#0 CDB: Read(10) 28 00 00 00 01 30 00 00 10 00
Sep 30 23:50:32 pve kernel: blk_update_request: critical medium error, dev sdc, sector 304 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Sep 30 23:50:32 pve kernel: sd 3:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Sep 30 23:50:32 pve kernel: sd 3:0:0:0: [sdc] tag#0 Sense Key : Medium Error [current]
Sep 30 23:50:32 pve kernel: sd 3:0:0:0: [sdc] tag#0 Add. Sense: Unrecovered read error
Sep 30 23:50:32 pve kernel: sd 3:0:0:0: [sdc] tag#0 CDB: Read(10) 28 00 00 00 01 30 00 00 10 00
Sep 30 23:50:32 pve kernel: blk_update_request: critical medium error, dev sdc, sector 304 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Sep 30 23:50:32 pve kernel: sd 3:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Sep 30 23:50:32 pve kernel: sd 3:0:0:0: [sdc] tag#0 Sense Key : Medium Error [current]
Sep 30 23:50:32 pve kernel: sd 3:0:0:0: [sdc] tag#0 Add. Sense: Unrecovered read error
Sep 30 23:50:32 pve kernel: sd 3:0:0:0: [sdc] tag#0 CDB: Read(10) 28 00 00 00 01 30 00 00 10 00
Sep 30 23:50:32 pve kernel: blk_update_request: critical medium error, dev sdc, sector 304 op 0x0:(READ) flags 0x0 phys_seg 2 prio class 0
Sep 30 23:50:33 pve kernel: sd 3:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Sep 30 23:50:33 pve kernel: sd 3:0:0:0: [sdc] tag#0 Sense Key : Medium Error [current]
Sep 30 23:50:33 pve kernel: sd 3:0:0:0: [sdc] tag#0 Add. Sense: Unrecovered read error
Sep 30 23:50:33 pve kernel: sd 3:0:0:0: [sdc] tag#0 CDB: Read(10) 28 00 00 00 01 30 00 00 10 00
Sep 30 23:50:33 pve kernel: blk_update_request: critical medium error, dev sdc, sector 304 op 0x0:(READ) flags 0x0 phys_seg 2 prio class 0

Then the following occurs constantly, non-stop (this node isn't part of a cluster):
Code:
Oct 01 01:34:43 pve ceph-crash[1726]: WARNING:__main__:post /var/lib/ceph/crash/2019-11-16_02:55:22.993758Z_c9eb5e75-0d5a-4c83-8755-28471689150a as client.crash.pve failed: [errno 2] error connecting to the cluster
Oct 01 01:34:43 pve ceph-crash[1726]: WARNING:__main__:post /var/lib/ceph/crash/2019-11-16_02:55:22.993758Z_c9eb5e75-0d5a-4c83-8755-28471689150a as client.crash failed: [errno 2] error connecting to the cluster
Oct 01 01:35:00 pve systemd[1]: Starting Proxmox VE replication runner...
Oct 01 01:35:01 pve systemd[1]: pvesr.service: Succeeded.
Oct 01 01:35:01 pve systemd[1]: Started Proxmox VE replication runner.
Oct 01 01:35:13 pve ceph-crash[1726]: WARNING:__main__:post /var/lib/ceph/crash/2019-11-16_02:55:22.993758Z_c9eb5e75-0d5a-4c83-8755-28471689150a as client.admin failed:
Oct 01 01:36:00 pve systemd[1]: Starting Proxmox VE replication runner...
Oct 01 01:36:01 pve systemd[1]: pvesr.service: Succeeded.
Oct 01 01:36:01 pve systemd[1]: Started Proxmox VE replication runner.
Oct 01 01:37:00 pve systemd[1]: Starting Proxmox VE replication runner...
Oct 01 01:37:01 pve systemd[1]: pvesr.service: Succeeded.
Oct 01 01:37:01 pve systemd[1]: Started Proxmox VE replication runner.
Oct 01 01:38:00 pve systemd[1]: Starting Proxmox VE replication runner...
Oct 01 01:38:01 pve systemd[1]: pvesr.service: Succeeded.
Oct 01 01:38:01 pve systemd[1]: Started Proxmox VE replication runner.
Oct 01 01:39:00 pve systemd[1]: Starting Proxmox VE replication runner...
Oct 01 01:39:01 pve systemd[1]: pvesr.service: Succeeded.
Oct 01 01:39:01 pve systemd[1]: Started Proxmox VE replication runner.
Oct 01 01:40:00 pve systemd[1]: Starting Proxmox VE replication runner...
Oct 01 01:40:01 pve systemd[1]: pvesr.service: Succeeded.
Oct 01 01:40:01 pve systemd[1]: Started Proxmox VE replication runner.
Oct 01 01:41:00 pve systemd[1]: Starting Proxmox VE replication runner...
Oct 01 01:41:01 pve systemd[1]: pvesr.service: Succeeded.
Oct 01 01:41:01 pve systemd[1]: Started Proxmox VE replication runner.
Oct 01 01:42:00 pve systemd[1]: Starting Proxmox VE replication runner...
Oct 01 01:42:01 pve systemd[1]: pvesr.service: Succeeded.
Oct 01 01:42:01 pve systemd[1]: Started Proxmox VE replication runner.
Oct 01 01:43:00 pve systemd[1]: Starting Proxmox VE replication runner...
Oct 01 01:43:01 pve systemd[1]: pvesr.service: Succeeded.
Oct 01 01:43:01 pve systemd[1]: Started Proxmox VE replication runner.
Oct 01 01:44:00 pve systemd[1]: Starting Proxmox VE replication runner...
Oct 01 01:44:01 pve systemd[1]: pvesr.service: Succeeded.
Oct 01 01:44:01 pve systemd[1]: Started Proxmox VE replication runner.
Oct 01 01:45:00 pve systemd[1]: Starting Proxmox VE replication runner...
Oct 01 01:45:01 pve systemd[1]: pvesr.service: Succeeded.
Oct 01 01:45:01 pve systemd[1]: Started Proxmox VE replication runner.
Oct 01 01:45:13 pve ceph-crash[1726]: WARNING:__main__:post /var/lib/ceph/crash/2019-11-16_02:55:22.993758Z_c9eb5e75-0d5a-4c83-8755-28471689150a as client.crash.pve failed: [errno 2] error connecting to the cluster
Oct 01 01:45:13 pve ceph-crash[1726]: WARNING:__main__:post /var/lib/ceph/crash/2019-11-16_02:55:22.993758Z_c9eb5e75-0d5a-4c83-8755-28471689150a as client.crash failed: [errno 2] error connecting to the cluster
Oct 01 01:45:43 pve ceph-crash[1726]: WARNING:__main__:post /var/lib/ceph/crash/2019-11-16_02:55:22.993758Z_c9eb5e75-0d5a-4c83-8755-28471689150a as client.admin failed:
Oct 01 01:46:00 pve systemd[1]: Starting Proxmox VE replication runner...
Oct 01 01:46:01 pve systemd[1]: pvesr.service: Succeeded.

Link to full journal output: https://drive.google.com/file/d/13rWzl-VDiiPA_JwY6ZvlcqOEawInPhFe/view?usp=sharing

Edit: I also discovered that my Ceph status is showing an error - 1 filesystem is offline (screenshot attached: Screenshot 2020-10-01 at 4.56.58 PM.png). I am not using Ceph, so how do I disable it? It might be causing the issue.
Thanks!
 
Last edited:
If you don't use it, you can run pveceph purge (see man pveceph for more information). This will destroy all Ceph-related data and configuration files.
Regarding the issue with /dev/sdc, I would check the disk.
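A minimal sketch of both steps - note that the ceph-crash service name is inferred from the log messages above, and smartctl needs the smartmontools package (behind a PERC controller you may additionally need -d megaraid,<N>):
Code:
# destroys all Ceph data and configuration on this node
pveceph purge
# assumption: also stop the ceph-crash service that keeps re-posting the old crash report
systemctl disable --now ceph-crash.service
# check the health of the disk reporting the medium errors
smartctl -a /dev/sdc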
 
Thanks! It seems like /dev/sdc is a holdover from when I had ESXi installed, but I'm not sure how to remove it, since it doesn't show up in fdisk. Do you know how I can remove it?

Code:
sdc
├─sdc1                       vfat        ESXi            5AC3-F70B
├─sdc5                       vfat                        4E43-A96C
├─sdc6                       vfat                        4E43-A96C
├─sdc7
├─sdc8                       vfat                        4E43-A96D
└─sdc9
 
You can try the following two commands. Double-check that you have the correct device letter!
Code:
wipefs -a /dev/sdc
dd if=/dev/zero of=/dev/sdc bs=1M count=200
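Afterwards, a quick read-only check that the old signatures are gone (wipefs without options only lists what it finds, and lsblk -f shows what the kernel now sees):
Code:
wipefs /dev/sdc
lsblk -f /dev/sdc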