[SOLVED] Proxmox node shows a question mark, filesystem command `lvs` hangs

chrigiboy

Hello,
One server in my cluster always shows a question mark. All the other servers appear to work fine. When I restart pvestatd, the server comes back online for about 5 minutes. The restart itself takes 3-4 minutes instead of the few seconds it takes on every other server. The two local disks and one network storage are never shown as online, yet I can open the network storage path in the shell without any problem: # ls /mnt/pve/nas/.
Do you have a tip on what I can do to get the server fully back? Unfortunately I cannot reboot it.
On top of that, a backup has hung. Locally I don't see the process, but the cluster still shows a running backup on the affected server. When I open it, it only shows NA/NA/NA.

# pvecm status
Cluster information
-------------------
Name: FC1
Config Version: 21
Transport: knet
Secure auth: on

Quorum information
------------------
Date: Tue Mar 1 09:37:27 2022
Quorum provider: corosync_votequorum
Nodes: 11
Node ID: 0x0000000a
Ring ID: 1.7451a
Quorate: Yes

Votequorum information
----------------------
Expected votes: 11
Highest expected: 11
Total votes: 11
Quorum: 6
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 192.168.50.60
0x00000002 1 192.168.50.62
0x00000003 1 192.168.50.59
0x00000004 1 192.168.50.57
0x00000005 1 192.168.50.64
0x00000006 1 192.168.50.65
0x00000007 1 192.168.50.66
0x00000008 1 192.168.50.51
0x00000009 1 192.168.50.68
0x0000000a 1 192.168.50.61 (local)
0x0000000b 1 192.168.50.67




# pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.11.22-5-pve)
pve-manager: 7.1-10 (running version: 7.1-10/6ddebafe)
pve-kernel-helper: 7.1-8
pve-kernel-5.13: 7.1-6
pve-kernel-5.11: 7.0-10
pve-kernel-5.4: 6.4-4
pve-kernel-5.0: 6.0-11
pve-kernel-5.13.19-3-pve: 5.13.19-7
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.11.22-5-pve: 5.11.22-10
pve-kernel-5.11.22-2-pve: 5.11.22-4
pve-kernel-5.4.124-1-pve: 5.4.124-2
pve-kernel-5.4.114-1-pve: 5.4.114-1
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 15.2.15-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.1
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-6
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-2
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.0-15
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.3.0-1
proxmox-backup-client: 2.1.5-1
proxmox-backup-file-restore: 2.1.5-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-5
pve-cluster: 7.1-3
pve-container: 4.1-3
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-4
pve-ha-manager: 3.3-3
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.1-1
pve-xtermjs: 4.16.0-1
qemu-server: 7.1-4
smartmontools: 7.2-pve2
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.2-pve1


● pvestatd.service - PVE Status Daemon
Loaded: loaded (/lib/systemd/system/pvestatd.service; enabled; vendor preset: enabled)
Active: failed (Result: timeout) since Tue 2022-03-01 09:30:50 CET; 9min ago
Process: 3437406 ExecStart=/usr/bin/pvestatd start (code=exited, status=0/SUCCESS)
Process: 3437653 ExecStop=/usr/bin/pvestatd stop (code=exited, status=0/SUCCESS)
Main PID: 3437407 (code=exited, status=0/SUCCESS)
CPU: 3.080s

Mar 01 09:28:31 px11 systemd[1]: Started PVE Status Daemon.
Mar 01 09:29:18 px11 systemd[1]: Stopping PVE Status Daemon...
Mar 01 09:29:19 px11 pvestatd[3437407]: received signal TERM
Mar 01 09:29:19 px11 pvestatd[3437407]: server closing
Mar 01 09:29:19 px11 pvestatd[3437407]: server stopped
Mar 01 09:30:50 px11 systemd[1]: pvestatd.service: State 'stop-sigterm' timed out. Killing.
Mar 01 09:30:50 px11 systemd[1]: pvestatd.service: Killing process 3437536 (vgs) with signal SIGKILL.
Mar 01 09:30:50 px11 systemd[1]: pvestatd.service: Failed with result 'timeout'.
Mar 01 09:30:50 px11 systemd[1]: Stopped PVE Status Daemon.
Mar 01 09:30:50 px11 systemd[1]: pvestatd.service: Consumed 3.080s CPU time.



syslog:
Feb 28 08:45:27 px11 systemd[1]: pvestatd.service: State 'stop-sigterm' timed out. Killing.
Feb 28 08:45:27 px11 systemd[1]: pvestatd.service: Killing process 2965096 (vgs) with signal SIGKILL.
Feb 28 08:45:27 px11 systemd[1]: pvestatd.service: Failed with result 'timeout'.
Feb 28 08:45:27 px11 systemd[1]: pvestatd.service: Consumed 6.351s CPU time.
Feb 28 08:45:28 px11 pvestatd[3120017]: starting server
Feb 28 09:42:03 px11 pvestatd[3120017]: received signal TERM
Feb 28 09:42:03 px11 pvestatd[3120017]: server closing
Feb 28 09:42:03 px11 pvestatd[3120017]: server stopped
Feb 28 09:43:34 px11 systemd[1]: pvestatd.service: State 'stop-sigterm' timed out. Killing.
Feb 28 09:43:34 px11 systemd[1]: pvestatd.service: Killing process 3120121 (vgs) with signal SIGKILL.
Feb 28 09:43:34 px11 systemd[1]: pvestatd.service: Failed with result 'timeout'.
Feb 28 09:43:34 px11 systemd[1]: pvestatd.service: Consumed 3.404s CPU time.
Feb 28 09:44:50 px11 pvestatd[3133875]: starting server
Feb 28 09:52:26 px11 pvestatd[3133875]: received signal TERM
Feb 28 09:52:26 px11 pvestatd[3133875]: server closing
Feb 28 09:52:26 px11 pvestatd[3133875]: server stopped
Feb 28 09:53:57 px11 systemd[1]: pvestatd.service: State 'stop-sigterm' timed out. Killing.
Feb 28 09:53:57 px11 systemd[1]: pvestatd.service: Killing process 3133985 (vgs) with signal SIGKILL.
Feb 28 09:53:57 px11 systemd[1]: pvestatd.service: Failed with result 'timeout'.
Feb 28 09:53:57 px11 systemd[1]: pvestatd.service: Consumed 3.161s CPU time.
Feb 28 09:53:58 px11 pvestatd[3136129]: starting server
Feb 28 10:07:04 px11 pvestatd[3140143]: send HUP to 3136129
Feb 28 10:07:04 px11 pvestatd[3136129]: received signal HUP
Feb 28 10:08:25 px11 pvestatd[3136129]: received signal TERM
Feb 28 10:08:25 px11 pvestatd[3136129]: server closing
Feb 28 10:08:25 px11 pvestatd[3136129]: server stopped
Feb 28 10:09:56 px11 systemd[1]: pvestatd.service: State 'stop-sigterm' timed out. Killing.
Feb 28 10:09:56 px11 systemd[1]: pvestatd.service: Killing process 3136263 (vgs) with signal SIGKILL.
Feb 28 10:09:56 px11 systemd[1]: pvestatd.service: Failed with result 'timeout'.
Feb 28 10:09:56 px11 systemd[1]: pvestatd.service: Consumed 4.355s CPU time.
Feb 28 10:09:57 px11 pvestatd[3141128]: starting server
Mar 1 08:41:58 px11 pvestatd[3141128]: received signal TERM
Mar 1 08:41:58 px11 pvestatd[3141128]: server closing
Mar 1 08:41:58 px11 pvestatd[3141128]: server stopped
Mar 1 08:43:29 px11 systemd[1]: pvestatd.service: State 'stop-sigterm' timed out. Killing.
Mar 1 08:43:29 px11 systemd[1]: pvestatd.service: Killing process 3141239 (vgs) with signal SIGKILL.
Mar 1 08:43:29 px11 systemd[1]: pvestatd.service: Failed with result 'timeout'.
Mar 1 08:43:29 px11 systemd[1]: pvestatd.service: Consumed 8.728s CPU time.
Mar 1 08:43:30 px11 pvestatd[3427769]: starting server
Mar 1 09:23:23 px11 pvestatd[3427769]: received signal TERM
Mar 1 09:23:23 px11 pvestatd[3427769]: server closing
Mar 1 09:23:23 px11 pvestatd[3427769]: server stopped
Mar 1 09:24:54 px11 systemd[1]: pvestatd.service: State 'stop-sigterm' timed out. Killing.
Mar 1 09:24:54 px11 systemd[1]: pvestatd.service: Killing process 3427894 (vgs) with signal SIGKILL.
Mar 1 09:24:54 px11 systemd[1]: pvestatd.service: Failed with result 'timeout'.
Mar 1 09:24:54 px11 systemd[1]: pvestatd.service: Consumed 3.243s CPU time.
Mar 1 09:24:55 px11 pvestatd[3436449]: starting server
Mar 1 09:26:49 px11 pvestatd[3436449]: received signal TERM
Mar 1 09:26:49 px11 pvestatd[3436449]: server closing
Mar 1 09:26:49 px11 pvestatd[3436449]: server stopped
Mar 1 09:28:20 px11 systemd[1]: pvestatd.service: State 'stop-sigterm' timed out. Killing.
Mar 1 09:28:20 px11 systemd[1]: pvestatd.service: Killing process 3436620 (vgs) with signal SIGKILL.
Mar 1 09:28:20 px11 systemd[1]: pvestatd.service: Failed with result 'timeout'.
Mar 1 09:28:20 px11 systemd[1]: pvestatd.service: Consumed 3.093s CPU time.
Mar 1 09:28:31 px11 pvestatd[3437407]: starting server
Mar 1 09:29:19 px11 pvestatd[3437407]: received signal TERM
Mar 1 09:29:19 px11 pvestatd[3437407]: server closing
Mar 1 09:29:19 px11 pvestatd[3437407]: server stopped
Mar 1 09:30:50 px11 systemd[1]: pvestatd.service: State 'stop-sigterm' timed out. Killing.
Mar 1 09:30:50 px11 systemd[1]: pvestatd.service: Killing process 3437536 (vgs) with signal SIGKILL.
Mar 1 09:30:50 px11 systemd[1]: pvestatd.service: Failed with result 'timeout'.
Mar 1 09:30:50 px11 systemd[1]: pvestatd.service: Consumed 3.080s CPU time.
Mar 1 09:30:58 px11 pvestatd[3438007]: Unknown option: verbose
Mar 1 09:31:09 px11 pvestatd[3438054]: Unknown option: verbose
Mar 1 09:31:26 px11 pvestatd[3438115]: starting server
 
lvs -a
seems to produce no output for minutes. I suspect something has hung there. Can I reload something without all the disks and clients blowing up in my face?
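To confirm that the LVM scan is really what hangs, without freezing yet another shell, the scan can be wrapped in a timeout and any processes stuck in uninterruptible sleep listed afterwards. A sketch (the 10-second limit is an arbitrary choice, not a PVE default):

```shell
# Run the scan under a hard deadline so a blocked device cannot hang the shell
# (`timeout` exits with status 124 when the deadline is hit).
if timeout 10 lvs -a >/dev/null 2>&1; then
    echo "lvs completed normally"
else
    echo "lvs hung or failed; processes in uninterruptible sleep (state D):"
    # D-state processes are typically blocked on storage I/O and cannot be killed.
    ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /^D/'
fi
```

A process that shows up here in state D is waiting on the kernel (often a dead device or unreachable NFS server) and will ignore signals until the I/O completes or fails.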
 
Does anyone have an idea? It's really awkward having the server sit there with a ? icon. It isn't running any backups either. Ping works with no packet loss in <0.4 seconds.
 
lvs -a
seems to produce no output for minutes
Are all disks working? In other words, does dmesg, /var/log/kern.log, or /var/log/syslog show anything pointing to hardware problems?

What storages are configured, or have ever been configured, since this node has been up?
 
The problem appeared when we added another Proxmox node. The machines on the affected system are still running; no disk problems are detectable.

I only have a local-lvm and a network NAS. None of the other servers have any problem with the NAS.


[33390940.427078] call_decode: 50 callbacks suppressed
[33390940.427079] nfs: server 192.168.50.70 OK
[33390940.524174] nfs: server 192.168.50.70 OK
[33390940.534772] nfs: server 192.168.50.70 OK
[33390940.541188] nfs: server 192.168.50.70 OK
[33390940.656402] nfs: server 192.168.50.70 OK
[33390940.656950] nfs: server 192.168.50.70 OK
[33390940.748667] nfs: server 192.168.50.70 OK
[33390940.802795] nfs: server 192.168.50.70 OK
[33390941.051713] nfs: server 192.168.50.70 OK
[33390941.126295] rpc_check_timeout: 114 callbacks suppressed
[33390941.126296] nfs: server 192.168.50.70 not responding, still trying
[33390941.126559] nfs: server 192.168.50.70 not responding, still trying
[33390941.126627] nfs: server 192.168.50.70 not responding, still trying
[33390941.206166] nfs: server 192.168.50.70 not responding, still trying
[33390941.206463] nfs: server 192.168.50.70 not responding, still trying
[33390941.367676] nfs: server 192.168.50.70 not responding, still trying
[33390941.367830] nfs: server 192.168.50.70 not responding, still trying
[33390941.367970] nfs: server 192.168.50.70 not responding, still trying
[33390941.459601] nfs: server 192.168.50.70 not responding, still trying
[33390941.459680] nfs: server 192.168.50.70 OK
[33390941.459786] nfs: server 192.168.50.70 not responding, still trying
[33390945.439398] call_decode: 54 callbacks suppressed
[33390945.439399] nfs: server 192.168.50.70 OK
[33390945.519484] nfs: server 192.168.50.70 OK
[33390945.568369] nfs: server 192.168.50.70 OK
[33390950.990147] nfs: server 192.168.50.70 OK
[33390951.093480] nfs: server 192.168.50.70 OK
[33390951.134314] nfs: server 192.168.50.70 OK
[33390951.165033] rpc_check_timeout: 232 callbacks suppressed
[33390951.165033] nfs: server 192.168.50.70 not responding, still trying
[33390951.165302] nfs: server 192.168.50.70 not responding, still trying
[33390951.165405] nfs: server 192.168.50.70 not responding, still trying
[33390951.297494] nfs: server 192.168.50.70 not responding, still trying
[33390951.297697] nfs: server 192.168.50.70 not responding, still trying
[33390951.297893] nfs: server 192.168.50.70 not responding, still trying
[33390951.298043] nfs: server 192.168.50.70 not responding, still trying
[33390951.349507] nfs: server 192.168.50.70 OK
[33390951.353900] nfs: server 192.168.50.70 not responding, still trying
[33390951.354031] nfs: server 192.168.50.70 not responding, still trying
[33390951.354159] nfs: server 192.168.50.70 not responding, still trying
[33390955.542095] call_decode: 108 callbacks suppressed
***
[33390955.599086] nfs: server 192.168.50.70 OK
[33390957.868963] rpc_check_timeout: 123 callbacks suppressed
[33390957.868964] nfs: server 192.168.50.70 not responding, still trying
[33390957.869203] nfs: server 192.168.50.70 not responding, still trying
[33390958.035804] nfs: server 192.168.50.70 not responding, still trying
[33390958.035980] nfs: server 192.168.50.70 not responding, still trying
[33390958.036098] nfs: server 192.168.50.70 not responding, still trying
[33390958.036239] nfs: server 192.168.50.70 not responding, still trying
[33390958.282618] nfs: server 192.168.50.70 not responding, still trying
[33390958.282785] nfs: server 192.168.50.70 not responding, still trying
[33390958.282923] nfs: server 192.168.50.70 not responding, still trying
[33390958.283028] nfs: server 192.168.50.70 not responding, still trying
[33390960.669849] call_decode: 181 callbacks suppressed
[33390960.669850] nfs: server 192.168.50.70 OK
[33390960.675588] nfs: server 192.168.50.70 OK
[33390960.813765] nfs: server 192.168.50.70 OK
[33390960.919176] nfs: server 192.168.50.70 OK
[33390960.919474] nfs: server 192.168.50.70 OK
[33390960.932354] nfs: server 192.168.50.70 OK
[33390961.135154] nfs: server 192.168.50.70 OK
[33390961.233769] nfs: server 192.168.50.70 OK
[33390961.239428] nfs: server 192.168.50.70 OK
[33390961.239904] nfs: server 192.168.50.70 OK
[33390963.191676] rpc_check_timeout: 234 callbacks suppressed
[33390963.191677] nfs: server 192.168.50.70 not responding, still trying
[33390963.191960] nfs: server 192.168.50.70 not responding, still trying
[33390963.192194] nfs: server 192.168.50.70 not responding, still trying
[33390963.192319] nfs: server 192.168.50.70 not responding, still trying
[33390963.308179] nfs: server 192.168.50.70 not responding, still trying
[33390963.308489] nfs: server 192.168.50.70 not responding, still trying
[33390963.308597] nfs: server 192.168.50.70 not responding, still trying
[33390963.439906] nfs: server 192.168.50.70 not responding, still trying
[33390963.440007] nfs: server 192.168.50.70 not responding, still trying
[33390963.440202] nfs: server 192.168.50.70 not responding, still trying
[33390965.678065] call_decode: 53 callbacks suppressed
[33390965.678065] nfs: server 192.168.50.70 OK
[33390965.680578] nfs: server 192.168.50.70 OK
[33390965.841874] nfs: server 192.168.50.70 OK
[33390965.946713] nfs: server 192.168.50.70 OK
[33390966.149734] nfs: server 192.168.50.70 OK
[33390966.328818] nfs: server 192.168.50.70 OK
[33390966.391680] nfs: server 192.168.50.70 OK
[33390966.393323] nfs: server 192.168.50.70 OK
[33390966.619726] nfs: server 192.168.50.70 OK
[33390966.732116] nfs: server 192.168.50.70 OK
[33390968.206334] rpc_check_timeout: 259 callbacks suppressed
[33390968.206335] nfs: server 192.168.50.70 not responding, still trying
[33390968.206433] nfs: server 192.168.50.70 not responding, still trying
[33390968.206514] nfs: server 192.168.50.70 not responding, still trying
[33390968.206563] nfs: server 192.168.50.70 not responding, still trying
[33390968.206640] nfs: server 192.168.50.70 not responding, still trying
[33390968.206722] nfs: server 192.168.50.70 not responding, still trying
[33390968.251891] nfs: server 192.168.50.70 not responding, still trying
[33390968.252046] nfs: server 192.168.50.70 not responding, still trying
[33390968.252109] nfs: server 192.168.50.70 not responding, still trying
[33390968.252125] nfs: server 192.168.50.70 not responding, still trying
[33390970.778598] call_decode: 96 callbacks suppressed
[33390970.778599] nfs: server 192.168.50.70 OK
[33390970.778608] nfs: server 192.168.50.70 OK
[33390970.778722] nfs: server 192.168.50.70 OK
[33390970.778780] nfs: server 192.168.50.70 OK
[33390970.778836] nfs: server 192.168.50.70 OK
[33390970.778874] nfs: server 192.168.50.70 OK
[33390970.778926] nfs: server 192.168.50.70 OK
[33390970.779026] nfs: server 192.168.50.70 OK
[33390970.779059] nfs: server 192.168.50.70 OK
[33390970.779099] nfs: server 192.168.50.70 OK
[33390975.837623] call_decode: 615 callbacks suppressed
[33390975.837624] nfs: server 192.168.50.70 OK
[33390975.837653] nfs: server 192.168.50.70 OK
[33390975.837660] nfs: server 192.168.50.70 OK
[33390975.837690] nfs: server 192.168.50.70 OK
[33390975.837743] nfs: server 192.168.50.70 OK
[33390975.837815] nfs: server 192.168.50.70 OK
[33390975.837877] nfs: server 192.168.50.70 OK
[33390975.837928] nfs: server 192.168.50.70 OK
[33390975.837989] nfs: server 192.168.50.70 OK
[33390975.838039] nfs: server 192.168.50.70 OK
[33390990.327416] call_decode: 376 callbacks suppressed
[33390990.327417] nfs: server 192.168.50.70 OK
[33390990.327429] nfs: server 192.168.50.70 OK
[33390990.341193] nfs: server 192.168.50.70 OK
[33390990.341347] nfs: server 192.168.50.70 OK
[33390990.341406] nfs: server 192.168.50.70 OK
[33390990.341459] nfs: server 192.168.50.70 OK
[33390990.341515] nfs: server 192.168.50.70 OK
[33390990.341544] nfs: server 192.168.50.70 OK
[33390990.341585] nfs: server 192.168.50.70 OK
[33390990.341645] nfs: server 192.168.50.70 OK
[33390996.145075] call_decode: 66 callbacks suppressed
[33390996.145076] nfs: server 192.168.50.70 OK
[33390996.145085] nfs: server 192.168.50.70 OK
[33390996.145087] nfs: server 192.168.50.70 OK
[33390996.145099] nfs: server 192.168.50.70 OK
[33390996.145110] nfs: server 192.168.50.70 OK
[33390996.145122] nfs: server 192.168.50.70 OK
[33390996.145137] nfs: server 192.168.50.70 OK
[33390996.145142] nfs: server 192.168.50.70 OK
[33390996.571399] nfs: server 192.168.50.70 OK
[33390996.571406] nfs: server 192.168.50.70 OK
[33391001.564154] call_decode: 74 callbacks suppressed
[33391001.564155] nfs: server 192.168.50.70 OK
[33391001.564163] nfs: server 192.168.50.70 OK
[33391001.564171] nfs: server 192.168.50.70 OK
[33391001.564173] nfs: server 192.168.50.70 OK
[33391001.564176] nfs: server 192.168.50.70 OK
[33391001.564185] nfs: server 192.168.50.70 OK
[33391001.564194] nfs: server 192.168.50.70 OK
[33391001.564225] nfs: server 192.168.50.70 OK
[33391001.564232] nfs: server 192.168.50.70 OK
[33391001.564238] nfs: server 192.168.50.70 OK
[33391008.880130] call_decode: 322 callbacks suppressed
[33391008.880130] nfs: server 192.168.50.70 OK
[33391008.880136] nfs: server 192.168.50.70 OK
[33391008.880137] nfs: server 192.168.50.70 OK
[33391008.880147] nfs: server 192.168.50.70 OK
[33391008.880158] nfs: server 192.168.50.70 OK
[33391008.880169] nfs: server 192.168.50.70 OK
[33391008.880178] nfs: server 192.168.50.70 OK
[33391008.880202] nfs: server 192.168.50.70 OK
[33391008.880204] nfs: server 192.168.50.70 OK
[33391008.880206] nfs: server 192.168.50.70 OK





Mar 2 12:31:05 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:06 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:06 px1 pve-firewall[1498]: status update error: ipset_restore_cmdlist: Try `ipset help' for more information.
Mar 2 12:31:06 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:06 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:07 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:07 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:08 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:08 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:09 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:09 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:09 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:10 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:10 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:16 px1 pve-firewall[1498]: status update error: ipset_restore_cmdlist: Try `ipset help' for more information.
Mar 2 12:31:26 px1 pve-firewall[1498]: status update error: ipset_restore_cmdlist: Try `ipset help' for more information.
Mar 2 12:31:36 px1 pve-firewall[1498]: status update error: ipset_restore_cmdlist: Try `ipset help' for more information.
Mar 2 12:31:40 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:41 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:41 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:42 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:42 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:42 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:43 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:43 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:44 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:44 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:44 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:44 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:45 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:45 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:46 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:46 px1 pve-firewall[1498]: status update error: ipset_restore_cmdlist: Try `ipset help' for more information.
Mar 2 12:31:46 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:46 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:47 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:47 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:48 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:48 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:48 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:49 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:49 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:49 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:31:56 px1 pve-firewall[1498]: status update error: ipset_restore_cmdlist: Try `ipset help' for more information.
Mar 2 12:32:06 px1 pve-firewall[1498]: status update error: ipset_restore_cmdlist: Try `ipset help' for more information.
Mar 2 12:32:16 px1 pve-firewall[1498]: status update error: ipset_restore_cmdlist: Try `ipset help' for more information.
Mar 2 12:32:19 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:32:20 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:32:20 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:32:20 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:32:21 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:32:21 px1 pmxcfs[24437]: [status] notice: received log
Mar 2 12:32:26 px1 pve-firewall[1498]: status update error: ipset_restore_cmdlist: Try `ipset help' for more information.
Mar 2 12:32:36 px1 pve-firewall[1498]: status update error: ipset_restore_cmdlist: Try `ipset help' for more information.
Mar 2 12:32:41 px1 pmxcfs[24437]: [status] notice: received log
One thing I should still mention: the LVM-thin pool sits on a RAID controller. Everything there is OK, though, and the systems have now been running in this state for 5 days and 'everything' works. Except for the question mark, and the fact that I can neither migrate VMs nor create a backup of a VM.
 
Hmm. By "neither migrate nor back up", do you always mean migrating to, or backing up onto, the NFS?
 
Well, I do have a ? icon. I can open the machines, but I can't trigger anything, and the statistics are gone too. So not much works any more. Even if the disks were on the NFS, I'd still be missing a second piece of hardware. Besides, with this many machines even a 10 Gbit link would be too slow to move the data in time. Ideal would be if I could restart a specific service, lose no data, and not have to reboot the server to get everything working again.

[Attachment: vm-px.png]
Ah okay. A question mark by itself doesn't have to mean that nothing works, but the "Loading..." points to a different problem. I assume you see the same effect regardless of whether you click a VM on the node or the node px11 itself?

Do you still have access to the node, e.g. via SSH? Are the following services running?
Code:
systemctl status pve-cluster
systemctl status corosync
systemctl status pvedaemon
systemctl status pveproxy
 
I can click the node px11 on the left, switch to 'Summary', and see CPU and RAM usage in real time.
If I then also restart pvestatd, access to all VMs works for about 5 minutes. Everything turns green except all the storages: local, local-lvm and nas.


# systemctl status pve-cluster
systemctl status corosync
systemctl status pvedaemon
systemctl status pveproxy
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2022-03-01 09:44:33 CET; 1 day 7h ago
Process: 3440895 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
Main PID: 3440896 (pmxcfs)
Tasks: 7 (limit: 618985)
Memory: 47.2M
CPU: 9min 43.669s
CGroup: /system.slice/pve-cluster.service
└─3440896 /usr/bin/pmxcfs

Mar 02 17:08:42 px11 pmxcfs[3440896]: [status] notice: received log
Mar 02 17:08:43 px11 pmxcfs[3440896]: [status] notice: received log
Mar 02 17:08:43 px11 pmxcfs[3440896]: [status] notice: received log
Mar 02 17:08:44 px11 pmxcfs[3440896]: [status] notice: received log
Mar 02 17:08:44 px11 pmxcfs[3440896]: [status] notice: received log
Mar 02 17:08:44 px11 pmxcfs[3440896]: [status] notice: received log
Mar 02 17:08:46 px11 pmxcfs[3440896]: [status] notice: received log
Mar 02 17:08:46 px11 pmxcfs[3440896]: [status] notice: received log
Mar 02 17:08:46 px11 pmxcfs[3440896]: [status] notice: received log
Mar 02 17:09:17 px11 pmxcfs[3440896]: [status] notice: received log
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2022-02-25 21:20:08 CET; 4 days ago
Docs: man:corosync
man:corosync.conf
man:corosync_overview
Main PID: 2399093 (corosync)
Tasks: 9 (limit: 618985)
Memory: 156.5M
CPU: 4h 36min 48.320s
CGroup: /system.slice/corosync.service
└─2399093 /usr/sbin/corosync -f

Feb 25 21:20:13 px11 corosync[2399093]: [KNET ] pmtud: PMTUD link change for host: 1 link: 0 from 469 to 1397
Feb 25 21:20:13 px11 corosync[2399093]: [KNET ] pmtud: PMTUD link change for host: 8 link: 0 from 469 to 1397
Feb 25 21:20:13 px11 corosync[2399093]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 25 21:20:15 px11 corosync[2399093]: [QUORUM] Sync members[11]: 1 2 3 4 5 6 7 8 9 10 11
Feb 25 21:20:15 px11 corosync[2399093]: [QUORUM] Sync joined[10]: 1 2 3 4 5 6 7 8 9 11
Feb 25 21:20:15 px11 corosync[2399093]: [TOTEM ] A new membership (1.7451a) was formed. Members joined: 1 2 3 4 5 6 7 8 9 11
Feb 25 21:20:15 px11 corosync[2399093]: [QUORUM] This node is within the primary component and will provide service.
Feb 25 21:20:15 px11 corosync[2399093]: [QUORUM] Members[11]: 1 2 3 4 5 6 7 8 9 10 11
Feb 25 21:20:15 px11 corosync[2399093]: [MAIN ] Completed service synchronization, ready to provide service.
Feb 28 21:16:45 px11 corosync[2399093]: [TOTEM ] Retransmit List: 30f000
● pvedaemon.service - PVE API Daemon
Loaded: loaded (/lib/systemd/system/pvedaemon.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2022-02-28 08:43:22 CET; 2 days ago
Process: 3119625 ExecStart=/usr/bin/pvedaemon start (code=exited, status=0/SUCCESS)
Process: 3140139 ExecReload=/usr/bin/pvedaemon restart (code=exited, status=0/SUCCESS)
Main PID: 3119627 (pvedaemon)
Tasks: 15 (limit: 618985)
Memory: 439.4M
CPU: 2min 4.232s
CGroup: /system.slice/pvedaemon.service
├─3119627 pvedaemon
├─3119628 pvedaemon worker
├─3119629 pvedaemon worker
├─3119630 pvedaemon worker
├─3120971 task UPID:px11:002F9F4B:233E8992:621C7E9C:vzdump:125:root@pam:
├─3120977 /sbin/vgs --separator : --noheadings --units b --unbuffered --nosuffix --options vg_name,vg_size,vg_free,lv_count
├─3131856 /sbin/vgs --separator : --noheadings --units b --unbuffered --nosuffix --options vg_name,vg_size,vg_free,lv_count
├─3132256 /sbin/vgs --separator : --noheadings --units b --unbuffered --nosuffix --options vg_name,vg_size,vg_free,lv_count
├─3132637 /sbin/vgs --separator : --noheadings --units b --unbuffered --nosuffix --options vg_name,vg_size,vg_free,lv_count
├─3140150 pvedaemon worker
├─3140151 pvedaemon worker
├─3140152 pvedaemon worker
├─3140153 /sbin/vgs --separator : --noheadings --units b --unbuffered --nosuffix --options vg_name,vg_size,vg_free,lv_count,pv_name,pv_size,pv_free
├─3140160 /sbin/vgs --separator : --noheadings --units b --unbuffered --nosuffix --options vg_name,vg_size,vg_free,lv_count
└─3140748 /sbin/vgs --separator : --noheadings --units b --unbuffered --nosuffix --options vg_name,vg_size,vg_free,lv_count,pv_name,pv_size,pv_free

Feb 28 10:07:03 px11 pvedaemon[3119627]: server closing
Feb 28 10:07:03 px11 pvedaemon[3119627]: server shutdown (restart)
Feb 28 10:07:03 px11 systemd[1]: Reloaded PVE API Daemon.
Feb 28 10:07:04 px11 pvedaemon[3119627]: restarting server
Feb 28 10:07:04 px11 pvedaemon[3119627]: starting 3 worker(s)
Feb 28 10:07:04 px11 pvedaemon[3119627]: worker 3140150 started
Feb 28 10:07:05 px11 pvedaemon[3119627]: worker 3140151 started
Feb 28 10:07:05 px11 pvedaemon[3119627]: worker 3140152 started
Feb 28 10:07:14 px11 pvedaemon[3140152]: <root@pam> successful auth for user 'root@pam'
Feb 28 10:07:44 px11 pvedaemon[3140152]: <root@pam> successful auth for user 'root@pam'
● pveproxy.service - PVE API Proxy Server
Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2022-03-01 09:44:44 CET; 1 day 7h ago
Process: 3440946 ExecStartPre=/usr/bin/pvecm updatecerts --silent (code=exited, status=0/SUCCESS)
Process: 3440979 ExecStart=/usr/bin/pveproxy start (code=exited, status=0/SUCCESS)
Process: 3614259 ExecReload=/usr/bin/pveproxy restart (code=exited, status=0/SUCCESS)
Main PID: 3440982 (pveproxy)
Tasks: 4 (limit: 618985)
Memory: 228.1M
CPU: 6min 30.237s
CGroup: /system.slice/pveproxy.service
├─3440982 pveproxy
├─3752879 pveproxy worker
├─3763358 pveproxy worker
└─3770413 pveproxy worker

Mar 02 16:59:13 px11 pveproxy[3770413]: proxy detected vanished client connection
Mar 02 16:59:43 px11 pveproxy[3770413]: proxy detected vanished client connection
Mar 02 17:01:39 px11 pveproxy[3752879]: proxy detected vanished client connection
Mar 02 17:02:20 px11 pveproxy[3770413]: proxy detected vanished client connection
Mar 02 17:02:52 px11 pveproxy[3770413]: proxy detected vanished client connection
Mar 02 17:03:24 px11 pveproxy[3763358]: proxy detected vanished client connection
Mar 02 17:04:29 px11 pveproxy[3763358]: proxy detected vanished client connection
Mar 02 17:06:00 px11 pveproxy[3770413]: proxy detected vanished client connection
Mar 02 17:06:18 px11 pveproxy[3770413]: proxy detected vanished client connection
Mar 02 17:08:01 px11 pveproxy[3770413]: proxy detected vanished client connection
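The pvedaemon CGroup above is telling: several `/sbin/vgs` workers are piled up, which matches the hanging `lvs -a`. One way to verify is to run that exact scan by hand, under a timeout so it cannot block the shell (a sketch; the 15-second limit is an arbitrary choice):

```shell
# Reproduce the scan that pvestatd/pvedaemon spawn (same options as in the
# CGroup listing); `timeout` aborts it if the LVM layer is blocked.
timeout 15 /sbin/vgs --separator : --noheadings --units b --unbuffered \
    --nosuffix --options vg_name,vg_size,vg_free,lv_count \
  || echo "vgs did not return within 15s -- the LVM layer itself is blocked"
```

If this times out, the GUI question mark is just a symptom: pvestatd's storage probe never returns, so the status updates stop.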
 
Hey folks, I found the problem.
With ps aux | grep lv I listed the processes and found this hung one:
root 2395224 0.0 0.0 23752 20660 ? S Feb25 0:00 /sbin/lvcreate -aly -V 67108864k --name vm-152-disk-0 --thinpool pve/data

I killed the process and the system is back online. WITHOUT A REBOOT.
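For anyone hitting the same symptom, the hunt can be sketched like this (the PID in the comments is the one from this thread, shown purely as an illustration):

```shell
# List LVM-related processes with state and age; a job stuck for days stands
# out by its long ETIME. The bracketed first letter keeps grep from matching
# its own command line.
ps -eo pid,stat,etime,cmd | grep -E '[l]vcreate|[l]vs|[v]gs' \
  || echo "no LVM processes running"

# Once the stuck PID is identified:
# kill 2395224        # polite SIGTERM first
# kill -9 2395224     # last resort; a true D-state process ignores even SIGKILL
```

Killing a stuck `lvcreate` frees the LVM locks it holds, which is why the queued `vgs`/`lvs` scans (and with them pvestatd) immediately recover.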
 
