[SOLVED] Proxmox question marks on all machines and storage

ixproxmox

Renowned Member
Nov 25, 2015
I have two Proxmox machines in a cluster. I do not run any VMs on shared storage (though I do back up to shared NFS storage), and there is no HA, redundancy, or remote storage for any VM.

Suddenly, all machines show a question mark, and roughly every second night two of the machines (running a really light load) go down for no apparent reason. I can't connect to their consoles, but I can stop and start them. They then run fine for two days (still with question marks on all machines and all storage).

I suspect this started because the shared NFS storage filled up with backups. But again, I do not run any VMs off it, so I can't understand how that would be related. I have since freed up space on the backup NFS device.
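For anyone hitting the same thing: a quick way to check whether a hung or full NFS mount is the culprit is to stat the mount point with a timeout. The mount path below is a placeholder (Proxmox mounts storages under /mnt/pve/<storage-id>); substitute your own.

```shell
# Placeholder -- substitute your NFS storage's mount point.
NFSMOUNT=/mnt/pve/backup-nfs

# A hung NFS mount makes stat block forever; timeout turns that into a test.
if timeout 5 stat -f "$NFSMOUNT" >/dev/null 2>&1; then
    echo "mount responsive"
    df -h "$NFSMOUNT"   # check free space -- a full backup target stalls backup jobs
else
    echo "mount hung or missing"
fi
```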

pve-manager/6.2-12/b287dd27

Cluster information
-------------------
Name: XX
Config Version: 2
Transport: knet
Secure auth: on

Quorum information
------------------
Date: Tue Dec 22 01:26:41 2020
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 0x00000002
Ring ID: 1.1cd
Quorate: Yes

Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 2
Quorum: 2
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1
0x00000002 1 (local)
 
Hi,

Please post the output of pveversion -v. Have you checked your syslog or journalctl?
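To pull the relevant logs, something along these lines should work on a stock PVE install (pvestatd is the daemon that feeds status to the GUI, so its log is the first place to look when everything greys out):

```shell
# Errors since the last boot, across all units
journalctl -p err -b

# pvestatd populates the GUI status; grey question marks usually
# mean it is blocked, so check what it logged recently.
journalctl -u pvestatd --since "2 days ago"
```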
 
Note that I ran an apt update/upgrade on each server after posting here (to see whether the upgrade would fix it) - just in case that makes a difference.

:~# pveversion -v
proxmox-ve: 6.3-1 (running kernel: 5.4.34-1-pve)
pve-manager: 6.2-12 (running version: 6.2-12/b287dd27)
pve-kernel-helper: 6.3-3
pve-kernel-5.4: 6.2-7
pve-kernel-5.4.65-1-pve: 5.4.65-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.7
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.2-2
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.2-8
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.6-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.2-2
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-7
pve-xtermjs: 4.7.0-3
qemu-server: 6.2-14
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1
 
I can't say I noticed anything in the last few months; I have checked both of those logs. It also hits two machines at the same time, so it must be something they have in common, I think.

I ran the update to 6.3 on both:
proxmox-ve: 6.3-1 (running kernel: 5.4.34-1-pve)
pve-manager: 6.3-3 (running version: 6.3-3/eee5f901)
pve-kernel-5.4: 6.3-3
pve-kernel-helper: 6.3-3
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.65-1-pve: 5.4.65-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
....

It doesn't seem to have helped. Still question marks everywhere. All machines are running, but their names don't show in the list (the name does show if I go into a VM's detailed view). Some process must be stuck... My guess is something related to NFS.
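If a process is stuck on NFS, it should show up in uninterruptible sleep (state "D"). A sketch of how to check:

```shell
# Processes in uninterruptible sleep -- the classic signature of a task
# blocked on a dead NFS server. If pvestatd appears here, that would
# explain the question marks in the GUI.
ps -eo pid,stat,comm,wchan:30 | awk 'NR==1 || $2 ~ /^D/'
```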
 
Is there anything I can do to get things up and running here? I still have these question marks everywhere, and I have upgraded both nodes several times. I assume it has nothing to do with the cluster, since I have had this issue before on a single Proxmox server. All the servers work, but I can't see the machines' names unless I go into each one.
 
service pve-cluster stop
service corosync stop
service pvestatd stop
service pveproxy stop
service pvedaemon stop

service pve-cluster start
service corosync start
service pvestatd start
service pveproxy start
service pvedaemon start

Tried this before, but now it actually worked. Got one node up and working, testing it on 2nd node now...
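For what it's worth, on a systemd-based node the same restart can be written in one line; restarting pvestatd alone is often enough when only the status display (and not the cluster itself) is affected:

```shell
# Same restart, systemd style. If cluster services are healthy,
# pvestatd alone usually clears the question marks.
systemctl restart pvestatd pveproxy pvedaemon
```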
 
It worked for me.
Researching the issue, I came across a post where the poster had issues connecting to his NAS from Proxmox. I had been reworking the IP connections to my NAS, so I may have had the same issue.

Everything was back up, but the backup job looked to be hung. I clicked on the mount, clicked on the files tab, it timed out, and all the question marks came back. I reset them again and then disabled the backup job to stop it from restarting every time I did the reset. I then just had question marks on the storage items. It hung again while I was checking them. After one more reset, I was able to delete the backup NFS storage item. All the question marks are gone. Unfortunately, I get an error trying to remount the share. I have an SMB share to the same NAS that works fine.
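In case it helps the next person: the remove/re-add can also be done from the CLI with pvesm. The server address, export path, and storage ID below are placeholders, not the poster's actual values:

```shell
# Current state of all storages -- an entry stuck "inactive" here is
# what turns into a question mark in the GUI.
pvesm status

# Ask the NAS what it exports, then re-add the share.
# Address, export path, and storage ID are placeholders.
pvesm nfsscan 192.168.1.50
pvesm add nfs backup-nfs --server 192.168.1.50 --export /volume1/backup --content backup
```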
 
