[SOLVED] Proxmox question marks on all machines and storage

ixproxmox

Renowned Member
Nov 25, 2015
I have two Proxmox machines in a cluster. I do not run any VMs on shared storage (though I do back up to shared NFS storage), and there is no HA, redundancy, or remote storage for any VM.

Suddenly, all machines show a question mark, and roughly every second night two of the machines (running a really light load) go down for no apparent reason. I can't connect to their consoles, but I can stop and start them. They then run fine for two days (still with question marks on all machines and all storage).

I suspect this started because the shared NFS storage filled up with backups. But again, I do not run any VMs off it, so I can't understand how that would be related. I have since freed up space on the backup NFS device.
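For anyone hitting the same thing: a quick way to check whether a hung or full NFS mount is the culprit is to stat the mount point with a timeout. The mount path below is a placeholder (Proxmox mounts storages under /mnt/pve/<storage-id>); substitute your own.

```shell
# Placeholder -- substitute your NFS storage's mount point.
NFSMOUNT=/mnt/pve/backup-nfs

# A hung NFS mount makes stat block forever; timeout turns that into a test.
if timeout 5 stat -f "$NFSMOUNT" >/dev/null 2>&1; then
    echo "mount responsive"
    df -h "$NFSMOUNT"   # check free space -- a full backup target stalls backup jobs
else
    echo "mount hung or missing"
fi
```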

pve-manager/6.2-12/b287dd27

Cluster information
-------------------
Name: XX
Config Version: 2
Transport: knet
Secure auth: on

Quorum information
------------------
Date: Tue Dec 22 01:26:41 2020
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 0x00000002
Ring ID: 1.1cd
Quorate: Yes

Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 2
Quorum: 2
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1
0x00000002 1 (local)
 
Hi,

Please post the output of pveversion -v. Have you checked your syslog or journalctl?
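To pull the relevant logs, something along these lines should work on a stock PVE install (pvestatd is the daemon that feeds status to the GUI, so its log is the first place to look when everything greys out):

```shell
# Errors since the last boot, across all units
journalctl -p err -b

# pvestatd populates the GUI status; grey question marks usually
# mean it is blocked, so check what it logged recently.
journalctl -u pvestatd --since "2 days ago"
```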
 
Note that I ran an apt update/upgrade on each server after posting here (to see whether the upgrade would fix it) - just in case that makes a difference.

:~# pveversion -v
proxmox-ve: 6.3-1 (running kernel: 5.4.34-1-pve)
pve-manager: 6.2-12 (running version: 6.2-12/b287dd27)
pve-kernel-helper: 6.3-3
pve-kernel-5.4: 6.2-7
pve-kernel-5.4.65-1-pve: 5.4.65-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.7
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.2-2
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.2-8
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.6-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.2-2
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-7
pve-xtermjs: 4.7.0-3
qemu-server: 6.2-14
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1
 
I can't say I noticed anything in the last few months; I have checked both of those logs. It also hits two machines at the same time, so it must be something they have in common, I think.

I ran the update to 6.3 on both:
proxmox-ve: 6.3-1 (running kernel: 5.4.34-1-pve)
pve-manager: 6.3-3 (running version: 6.3-3/eee5f901)
pve-kernel-5.4: 6.3-3
pve-kernel-helper: 6.3-3
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.65-1-pve: 5.4.65-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
....

It doesn't seem to have helped. Still question marks everywhere. All machines are running, but their names don't show in the list (the name does show if I go into a VM's detailed view). Some process must be stuck... My guess is something related to NFS.
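If a process is stuck on NFS, it should show up in uninterruptible sleep (state "D"). A sketch of how to check:

```shell
# Processes in uninterruptible sleep -- the classic signature of a task
# blocked on a dead NFS server. If pvestatd appears here, that would
# explain the question marks in the GUI.
ps -eo pid,stat,comm,wchan:30 | awk 'NR==1 || $2 ~ /^D/'
```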
 
Is there anything I can do to get things up and running here? I still have these question marks everywhere, and I have upgraded both nodes several times. I assume it has nothing to do with the cluster, since I have had this issue before on a single Proxmox server. All the servers work, but I can't see the machines' names unless I go into each one.
 
service pve-cluster stop
service corosync stop
service pvestatd stop
service pveproxy stop
service pvedaemon stop

service pve-cluster start
service corosync start
service pvestatd start
service pveproxy start
service pvedaemon start

Tried this before, but now it actually worked. Got one node up and working, testing it on 2nd node now...
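For what it's worth, on a systemd-based node the same restart can be written in one line; restarting pvestatd alone is often enough when only the status display (and not the cluster itself) is affected:

```shell
# Same restart, systemd style. If cluster services are healthy,
# pvestatd alone usually clears the question marks.
systemctl restart pvestatd pveproxy pvedaemon
```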
 
It worked for me.
Researching the issue, I came across a post where the poster had issues connecting to his NAS from Proxmox. I had been reworking the IP connections to my NAS, so I may have had the same issue.

Everything was back up, but the backup job looked to be hung. I clicked on the mount, clicked on the files tab, it timed out, and all the question marks came back. I reset them again and then disabled the backup job to stop it from restarting every time I did the reset. I then just had question marks on the storage items. It hung again while I was checking them. After one more reset, I was able to delete the backup NFS storage item. All the question marks are gone. Unfortunately, I get an error trying to remount the share. I have an SMB share to the same NAS that works fine.
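In case it helps the next person: the remove/re-add can also be done from the CLI with pvesm. The server address, export path, and storage ID below are placeholders, not the poster's actual values:

```shell
# Current state of all storages -- an entry stuck "inactive" here is
# what turns into a question mark in the GUI.
pvesm status

# Ask the NAS what it exports, then re-add the share.
# Address, export path, and storage ID are placeholders.
pvesm nfsscan 192.168.1.50
pvesm add nfs backup-nfs --server 192.168.1.50 --export /volume1/backup --content backup
```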
 
