Job status pve question

Maksimus

Member
May 16, 2022
We connected the storage to the server (hardware) via Fibre Channel and set up multipath, but any time we try to work with the storage, Proxmox goes into question-mark status.
The VMs inside continue to run normally, but none of the Proxmox tasks work: SSH is available, the console is also reachable via the GUI, but every other section fails with a timeout error.
(Attachment: Screenshot_20.png)

The contents of /etc/multipath.conf:
Code:
defaults {
    user_friendly_names yes
    find_multipaths yes
}

To get back to the normal green-checkmark status, we have to restart the server itself.
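For reference, this is roughly how we verify what multipath actually sees on the FC LUNs (a minimal sketch, assuming the standard multipath-tools package; device names will differ per setup):

Code:
# list the multipath maps and the state of each FC path
multipath -ll
# show the resulting /dev/mapper devices and anything (e.g. LVM) on top
lsblk
# confirm which configuration multipathd actually loaded
multipathd show config | head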
 
You should start by examining the logs via "journalctl -n 1000", and please include the output of "pveversion -verbose".
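For example (pvestatd is the daemon that feeds the status icons in the GUI; treating these as the relevant units is my assumption):

Code:
# current boot only, limited to the PVE daemons
journalctl -b -u pvestatd -u pvedaemon -u pveproxy -u pve-cluster
# or look further back in time
journalctl --since "2 hours ago" -u pvestatd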


The output of journalctl -n 1000 is in the attachment.

Code:
root@HOST800:~# pveversion -verbose
proxmox-ve: 7.4-1 (running kernel: 5.15.102-1-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-5.15: 7.3-3
pve-kernel-5.15.102-1-pve: 5.15.102-1
pve-kernel-5.15.85-1-pve: 5.15.85-1
pve-kernel-5.15.74-1-pve: 5.15.74-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-4
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-1
libpve-rs-perl: 0.7.5
libpve-storage-perl: 7.4-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.3.3-1
proxmox-backup-file-restore: 2.3.3-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.6.4
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-1
pve-firewall: 4.3-1
pve-firmware: 3.6-4
pve-ha-manager: 3.6.0
pve-i18n: 2.11-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1
 


Your cluster seems to be either misconfigured or in some sort of bad state.
The last 1000 lines (-n 1000) are not enough to find when the issue started; you should look further back in the history and/or reboot to start a new log cycle.

I would also recommend employing a firewall if you must expose your PVE host to the internet.

At some point you may need to run "pvecm updatecerts --force", but I think it's too early to say when.

good luck


 
There are a few methods floating around on the web, for example https://easycomputertutorial.com/restart-proxmox-services/
However, I am not endorsing doing that - too many unknowns about your environment.
Output from the following commands might be useful, though by no means decisive:
pvecm nodes
pvecm status

You only mentioned a single node; is it the only one? Check the status with "systemctl | grep pve": has anything failed? The symptoms you have provided are too generic to determine what is broken with any certainty.
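Something along these lines would be a quick health check (a minimal sketch; I am assuming the standard service names shipped with PVE 7):

Code:
# list any failed units
systemctl --failed
# check the core PVE services explicitly
systemctl status pve-cluster pvedaemon pveproxy pvestatd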


 
It looks like SSH is the issue, or at least one of them.
Do an "ssh servername" from and to each server and confirm the connection with yes; the key will then be written to the authxxxx file. The next time you log in, no password or user input should be required. Please check that.

You have a lot of these:
Mar 29 15:47:35 HOST800 sshd[1780111]: pam_unix(sshd:auth): check pass; user unknown
Mar 29 15:47:35 HOST800 sshd[1780111]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=public ip
Mar 29 15:47:37 HOST800 sshd[1780111]: Failed password for invalid user admin from public ip port 2512 ssh2

Something seems not right with the user: are you using the user "admin"?
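For illustration, the check I mean looks like this (HOST800 is just the hostname from the logs above; on a cluster you would repeat it between every pair of nodes):

Code:
# log in from one node to the other; accept the host key with "yes" on the first connect
ssh root@HOST800 'hostname'
# a second login should then succeed without any password or prompt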
 
We do not use the admin user. And check SSH to which servers? This server is standalone, without a cluster (the storage is connected to it via Fibre Channel).
 
This morning I started testing again: I created an LVM disk on the storage and tried to migrate a VM onto it, but at the moment of the migration everything hung.
journal2.txt was captured immediately after the task hung (more precisely, after the GUI stopped updating information),
journal3.txt after the question marks appeared in the GUI.
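Next time it hangs I will also try to capture, from the still-working SSH session, whether the block/LVM layer itself is stuck (a rough sketch on my side, assuming standard LVM/device-mapper tooling, nothing PVE-specific):

Code:
# do the LVM commands return at all, or do they block?
pvs; vgs; lvs
# state of the multipath maps and device-mapper devices
multipath -ll
dmsetup info -c
# kernel complaints about hung I/O
dmesg | grep -i "blocked for more than"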
 

