Hello,
in my cluster, one server always shows a question mark. All the other servers seem to work. When I restart pvestatd, the server is back online for about 5 minutes. On all the other servers the restart takes only a few seconds, but on this one it takes 3-4 minutes. The 2 local disks and the 1 network storage, however, are never shown as online. I can open the network storage path in the shell without any problem: # ls /mnt/pve/nas/.
Do you have a tip on what I can do to get the server fully back? Unfortunately, I cannot reboot the server.
In addition, a backup has hung. I don't see the process locally, but the cluster still shows a running backup on the affected server. When I open it, it only shows NA/NA/NA.
-------------------
Name: FC1
Config Version: 21
Transport: knet
Secure auth: on
Quorum information
------------------
Date: Tue Mar 1 09:37:27 2022
Quorum provider: corosync_votequorum
Nodes: 11
Node ID: 0x0000000a
Ring ID: 1.7451a
Quorate: Yes
Votequorum information
----------------------
Expected votes: 11
Highest expected: 11
Total votes: 11
Quorum: 6
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
0x00000001 1 192.168.50.60
0x00000002 1 192.168.50.62
0x00000003 1 192.168.50.59
0x00000004 1 192.168.50.57
0x00000005 1 192.168.50.64
0x00000006 1 192.168.50.65
0x00000007 1 192.168.50.66
0x00000008 1 192.168.50.51
0x00000009 1 192.168.50.68
0x0000000a 1 192.168.50.61 (local)
0x0000000b 1 192.168.50.67
# pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.11.22-5-pve)
pve-manager: 7.1-10 (running version: 7.1-10/6ddebafe)
pve-kernel-helper: 7.1-8
pve-kernel-5.13: 7.1-6
pve-kernel-5.11: 7.0-10
pve-kernel-5.4: 6.4-4
pve-kernel-5.0: 6.0-11
pve-kernel-5.13.19-3-pve: 5.13.19-7
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.11.22-5-pve: 5.11.22-10
pve-kernel-5.11.22-2-pve: 5.11.22-4
pve-kernel-5.4.124-1-pve: 5.4.124-2
pve-kernel-5.4.114-1-pve: 5.4.114-1
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 15.2.15-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.1
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-6
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-2
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.0-15
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.3.0-1
proxmox-backup-client: 2.1.5-1
proxmox-backup-file-restore: 2.1.5-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-5
pve-cluster: 7.1-3
pve-container: 4.1-3
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-4
pve-ha-manager: 3.3-3
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.1-1
pve-xtermjs: 4.16.0-1
qemu-server: 7.1-4
smartmontools: 7.2-pve2
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.2-pve1
● pvestatd.service - PVE Status Daemon
Loaded: loaded (/lib/systemd/system/pvestatd.service; enabled; vendor preset: enabled)
Active: failed (Result: timeout) since Tue 2022-03-01 09:30:50 CET; 9min ago
Process: 3437406 ExecStart=/usr/bin/pvestatd start (code=exited, status=0/SUCCESS)
Process: 3437653 ExecStop=/usr/bin/pvestatd stop (code=exited, status=0/SUCCESS)
Main PID: 3437407 (code=exited, status=0/SUCCESS)
CPU: 3.080s
Mar 01 09:28:31 px11 systemd[1]: Started PVE Status Daemon.
Mar 01 09:29:18 px11 systemd[1]: Stopping PVE Status Daemon...
Mar 01 09:29:19 px11 pvestatd[3437407]: received signal TERM
Mar 01 09:29:19 px11 pvestatd[3437407]: server closing
Mar 01 09:29:19 px11 pvestatd[3437407]: server stopped
Mar 01 09:30:50 px11 systemd[1]: pvestatd.service: State 'stop-sigterm' timed out. Killing.
Mar 01 09:30:50 px11 systemd[1]: pvestatd.service: Killing process 3437536 (vgs) with signal SIGKILL.
Mar 01 09:30:50 px11 systemd[1]: pvestatd.service: Failed with result 'timeout'.
Mar 01 09:30:50 px11 systemd[1]: Stopped PVE Status Daemon.
Mar 01 09:30:50 px11 systemd[1]: pvestatd.service: Consumed 3.080s CPU time.
syslog:
Feb 28 08:45:27 px11 systemd[1]: pvestatd.service: State 'stop-sigterm' timed out. Killing.
Feb 28 08:45:27 px11 systemd[1]: pvestatd.service: Killing process 2965096 (vgs) with signal SIGKILL.
Feb 28 08:45:27 px11 systemd[1]: pvestatd.service: Failed with result 'timeout'.
Feb 28 08:45:27 px11 systemd[1]: pvestatd.service: Consumed 6.351s CPU time.
Feb 28 08:45:28 px11 pvestatd[3120017]: starting server
Feb 28 09:42:03 px11 pvestatd[3120017]: received signal TERM
Feb 28 09:42:03 px11 pvestatd[3120017]: server closing
Feb 28 09:42:03 px11 pvestatd[3120017]: server stopped
Feb 28 09:43:34 px11 systemd[1]: pvestatd.service: State 'stop-sigterm' timed out. Killing.
Feb 28 09:43:34 px11 systemd[1]: pvestatd.service: Killing process 3120121 (vgs) with signal SIGKILL.
Feb 28 09:43:34 px11 systemd[1]: pvestatd.service: Failed with result 'timeout'.
Feb 28 09:43:34 px11 systemd[1]: pvestatd.service: Consumed 3.404s CPU time.
Feb 28 09:44:50 px11 pvestatd[3133875]: starting server
Feb 28 09:52:26 px11 pvestatd[3133875]: received signal TERM
Feb 28 09:52:26 px11 pvestatd[3133875]: server closing
Feb 28 09:52:26 px11 pvestatd[3133875]: server stopped
Feb 28 09:53:57 px11 systemd[1]: pvestatd.service: State 'stop-sigterm' timed out. Killing.
Feb 28 09:53:57 px11 systemd[1]: pvestatd.service: Killing process 3133985 (vgs) with signal SIGKILL.
Feb 28 09:53:57 px11 systemd[1]: pvestatd.service: Failed with result 'timeout'.
Feb 28 09:53:57 px11 systemd[1]: pvestatd.service: Consumed 3.161s CPU time.
Feb 28 09:53:58 px11 pvestatd[3136129]: starting server
Feb 28 10:07:04 px11 pvestatd[3140143]: send HUP to 3136129
Feb 28 10:07:04 px11 pvestatd[3136129]: received signal HUP
Feb 28 10:08:25 px11 pvestatd[3136129]: received signal TERM
Feb 28 10:08:25 px11 pvestatd[3136129]: server closing
Feb 28 10:08:25 px11 pvestatd[3136129]: server stopped
Feb 28 10:09:56 px11 systemd[1]: pvestatd.service: State 'stop-sigterm' timed out. Killing.
Feb 28 10:09:56 px11 systemd[1]: pvestatd.service: Killing process 3136263 (vgs) with signal SIGKILL.
Feb 28 10:09:56 px11 systemd[1]: pvestatd.service: Failed with result 'timeout'.
Feb 28 10:09:56 px11 systemd[1]: pvestatd.service: Consumed 4.355s CPU time.
Feb 28 10:09:57 px11 pvestatd[3141128]: starting server
Mar 1 08:41:58 px11 pvestatd[3141128]: received signal TERM
Mar 1 08:41:58 px11 pvestatd[3141128]: server closing
Mar 1 08:41:58 px11 pvestatd[3141128]: server stopped
Mar 1 08:43:29 px11 systemd[1]: pvestatd.service: State 'stop-sigterm' timed out. Killing.
Mar 1 08:43:29 px11 systemd[1]: pvestatd.service: Killing process 3141239 (vgs) with signal SIGKILL.
Mar 1 08:43:29 px11 systemd[1]: pvestatd.service: Failed with result 'timeout'.
Mar 1 08:43:29 px11 systemd[1]: pvestatd.service: Consumed 8.728s CPU time.
Mar 1 08:43:30 px11 pvestatd[3427769]: starting server
Mar 1 09:23:23 px11 pvestatd[3427769]: received signal TERM
Mar 1 09:23:23 px11 pvestatd[3427769]: server closing
Mar 1 09:23:23 px11 pvestatd[3427769]: server stopped
Mar 1 09:24:54 px11 systemd[1]: pvestatd.service: State 'stop-sigterm' timed out. Killing.
Mar 1 09:24:54 px11 systemd[1]: pvestatd.service: Killing process 3427894 (vgs) with signal SIGKILL.
Mar 1 09:24:54 px11 systemd[1]: pvestatd.service: Failed with result 'timeout'.
Mar 1 09:24:54 px11 systemd[1]: pvestatd.service: Consumed 3.243s CPU time.
Mar 1 09:24:55 px11 pvestatd[3436449]: starting server
Mar 1 09:26:49 px11 pvestatd[3436449]: received signal TERM
Mar 1 09:26:49 px11 pvestatd[3436449]: server closing
Mar 1 09:26:49 px11 pvestatd[3436449]: server stopped
Mar 1 09:28:20 px11 systemd[1]: pvestatd.service: State 'stop-sigterm' timed out. Killing.
Mar 1 09:28:20 px11 systemd[1]: pvestatd.service: Killing process 3436620 (vgs) with signal SIGKILL.
Mar 1 09:28:20 px11 systemd[1]: pvestatd.service: Failed with result 'timeout'.
Mar 1 09:28:20 px11 systemd[1]: pvestatd.service: Consumed 3.093s CPU time.
Mar 1 09:28:31 px11 pvestatd[3437407]: starting server
Mar 1 09:29:19 px11 pvestatd[3437407]: received signal TERM
Mar 1 09:29:19 px11 pvestatd[3437407]: server closing
Mar 1 09:29:19 px11 pvestatd[3437407]: server stopped
Mar 1 09:30:50 px11 systemd[1]: pvestatd.service: State 'stop-sigterm' timed out. Killing.
Mar 1 09:30:50 px11 systemd[1]: pvestatd.service: Killing process 3437536 (vgs) with signal SIGKILL.
Mar 1 09:30:50 px11 systemd[1]: pvestatd.service: Failed with result 'timeout'.
Mar 1 09:30:50 px11 systemd[1]: pvestatd.service: Consumed 3.080s CPU time.
Mar 1 09:30:58 px11 pvestatd[3438007]: Unknown option: verbose
Mar 1 09:31:09 px11 pvestatd[3438054]: Unknown option: verbose
Mar 1 09:31:26 px11 pvestatd[3438115]: starting server
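In every stop cycle above, systemd has to SIGKILL a vgs process that pvestatd started, which suggests the LVM query is what hangs. A sketch like the following could check this by hand. It assumes GNU coreutils timeout is available (it is on a standard Debian-based PVE node); the function name check_hang and the 30-second limit are my own illustrative choices, not part of Proxmox.

```shell
#!/bin/sh
# check_hang LIMIT CMD [ARGS...]  (hypothetical helper, not a Proxmox tool)
# Runs CMD with a time limit and reports whether it returned.
# GNU timeout sends SIGTERM after LIMIT seconds and exits 124; with -k 5 it
# sends SIGKILL 5 s later if CMD ignored SIGTERM (exit 137 = 128+9). The logs
# suggest vgs did not react to SIGTERM, hence -k. A process stuck in
# uninterruptible I/O wait may block even this.
check_hang() {
  limit="$1"
  shift
  timeout -k 5 "$limit" "$@" >/dev/null 2>&1
  rc=$?
  case "$rc" in
    124|137) echo "HUNG: $* (no result after ${limit}s)" ;;
    *)       echo "finished: $* (exit code $rc)" ;;
  esac
}

# The LVM query that systemd keeps killing, and the NAS mount path.
check_hang 30 vgs
check_hang 30 ls /mnt/pve/nas/
```

If vgs reports HUNG while the ls finishes, that would point at LVM device scanning rather than the network storage.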