After some GUI or CLI operations on containers (moving a volume to another storage, for example), the whole node sometimes goes "crazy": the GUI shows every container with a grey question mark and no name, and no container command works at all until I reboot the host!
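When this happens, I suppose the first things to check before rebooting would be whether pvestatd and the other PVE services are still responsive, and whether anything is stuck in uninterruptible sleep (D state). A minimal sketch of what I have in mind, assuming the standard PVE unit names:

Code:
# systemctl status pvestatd pvedaemon pveproxy pve-cluster
# ps axo pid,stat,wchan:20,cmd | awk '$2 ~ /^D/'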
Here is an example case:
In the logs below, you can see that the start command failed and that the node seems to be "broken" in some way.
What can I do to investigate?
Code:
# pveversion --verbose
proxmox-ve: 5.2-2 (running kernel: 4.13.13-5-pve)
pve-manager: 5.2-8 (running version: 5.2-8/fdf39912)
pve-kernel-4.15: 5.2-6
pve-kernel-4.13: 5.2-2
pve-kernel-4.15.18-3-pve: 4.15.18-22
pve-kernel-4.15.18-2-pve: 4.15.18-21
pve-kernel-4.15.18-1-pve: 4.15.18-19
pve-kernel-4.15.17-3-pve: 4.15.17-14
pve-kernel-4.15.17-2-pve: 4.15.17-10
pve-kernel-4.15.17-1-pve: 4.15.17-9
pve-kernel-4.15.15-1-pve: 4.15.15-6
pve-kernel-4.13.16-4-pve: 4.13.16-51
pve-kernel-4.13.16-3-pve: 4.13.16-50
pve-kernel-4.13.16-2-pve: 4.13.16-48
pve-kernel-4.13.16-1-pve: 4.13.16-46
pve-kernel-4.13.13-6-pve: 4.13.13-42
pve-kernel-4.13.13-5-pve: 4.13.13-38
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-38
libpve-guest-common-perl: 2.0-17
libpve-http-server-perl: 2.0-10
libpve-storage-perl: 5.0-25
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-1
lxcfs: 3.0.0-1
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-19
pve-cluster: 5.0-30
pve-container: 2.0-26
pve-docs: 5.2-8
pve-firewall: 3.0-14
pve-firmware: 2.0-5
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.2-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-33
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.9-pve1~bpo9
Code:
# pvestatd status
running
Code:
# ps auxwwwf | grep pve
root 3322 0.0 0.1 499260 80384 ? Ss janv.31 235:08 pve-firewall
root 31065 0.0 0.1 538512 114424 ? Ss janv.31 3:07 pvedaemon
root 12356 0.0 0.1 548868 112372 ? S 13:12 0:06 \_ pvedaemon worker
root 21108 0.0 0.1 548960 112396 ? S 13:39 0:05 \_ pvedaemon worker
root 10402 0.0 0.1 548868 112236 ? S 13:52 0:04 \_ pvedaemon worker
www-data 31215 0.0 0.1 540052 115848 ? Ss janv.31 4:53 pveproxy
www-data 2546 0.0 0.1 550876 114568 ? S 14:19 0:01 \_ pveproxy worker
www-data 3452 0.0 0.1 550468 113744 ? S 14:28 0:00 \_ pveproxy worker
www-data 11403 0.0 0.1 550168 113544 ? S 15:11 0:00 \_ pveproxy worker
root 31261 0.3 0.1 497380 79448 ? Ss janv.31 1014:42 pvestatd
root 16470 0.0 0.0 12788 948 pts/3 S+ 15:32 0:00 \_ grep --color=auto pve
root 6618 0.0 0.1 514908 84320 ? Ss août29 0:08 pve-ha-lrm
root 9418 0.0 0.1 515276 84712 ? Ss août29 0:05 pve-ha-crm
root 14716 0.0 0.0 89900 1920 ? Ssl 06:26 0:01 /usr/sbin/pvefw-logger
Code:
# pvecm status
Quorum information
------------------
Date: Thu Aug 30 15:33:39 2018
Quorum provider: corosync_votequorum
Nodes: 4
Node ID: 0x00000001
Ring ID: 1/496
Quorate: Yes
Votequorum information
----------------------
Expected votes: 4
Highest expected: 4
Total votes: 4
Quorum: 3
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
0x00000001 1 192.168.10.21 (local)
0x00000003 1 192.168.10.22
0x00000002 1 192.168.10.50
0x00000004 1 192.168.10.51
Code:
# cat '/var/log/pve/tasks/F/UPID:proxmox5-01:000054FB:6CAA9F15:5B87E3DF:vzstart:202:tom@pam:'
Job for pve-container@202.service failed because a timeout was exceeded.
See "systemctl status pve-container@202.service" and "journalctl -xe" for details.
TASK ERROR: command 'systemctl start pve-container@202' failed: exit code 1
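The error message itself points at systemctl and the journal, so I guess the next step is something along these lines (the time window matches the failed start in the daemon.log below):

Code:
# systemctl status pve-container@202.service
# journalctl -u pve-container@202.service --since "14:32" --until "14:35"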
/var/log/daemon.log:
Code:
Aug 30 14:23:01 proxmox5-01 pvedaemon[7314]: shutdown CT 202: UPID:proxmox5-01:00001C92:6CA9C075:5B87E1A5:vzshutdown:202:grobs@pam:
Aug 30 14:23:01 proxmox5-01 pvedaemon[7314]: command 'lxc-stop -n 202 --timeout 60' failed: exit code 1
Aug 30 14:23:02 proxmox5-01 pvedaemon[21108]: unable to get PID for CT 202 (not running?)
Aug 30 14:23:03 proxmox5-01 pvedaemon[21108]: unable to get PID for CT 202 (not running?)
Aug 30 14:28:15 proxmox5-01 pveproxy[21632]: worker exit
Aug 30 14:28:15 proxmox5-01 pveproxy[31215]: worker 21632 finished
Aug 30 14:28:15 proxmox5-01 pveproxy[31215]: starting 1 worker(s)
Aug 30 14:28:15 proxmox5-01 pveproxy[31215]: worker 3452 started
Aug 30 14:29:20 proxmox5-01 pmxcfs[27565]: [dcdb] notice: data verification successful
Aug 30 14:32:31 proxmox5-01 pvedaemon[21755]: starting CT 202: UPID:proxmox5-01:000054FB:6CAA9F15:5B87E3DF:vzstart:202:grobs@pam:
Aug 30 14:32:31 proxmox5-01 systemd[1]: Starting PVE LXC Container: 202...
Aug 30 14:32:31 proxmox5-01 systemd-udevd[21775]: Could not generate persistent MAC address for vethPEWIF2: No such file or directory
Aug 30 14:33:01 proxmox5-01 pveproxy[1983]: proxy detected vanished client connection
Aug 30 14:33:40 proxmox5-01 pmxcfs[27565]: [status] notice: received log
Aug 30 14:34:01 proxmox5-01 systemd[1]: pve-container@202.service: Start operation timed out. Terminating.
Aug 30 14:34:01 proxmox5-01 systemd[1]: Failed to start PVE LXC Container: 202.
Aug 30 14:34:01 proxmox5-01 systemd[1]: pve-container@202.service: Unit entered failed state.
Aug 30 14:34:01 proxmox5-01 systemd[1]: pve-container@202.service: Failed with result 'timeout'.
Aug 30 14:34:01 proxmox5-01 pvedaemon[21755]: command 'systemctl start pve-container@202' failed: exit code 1
Aug 30 14:34:32 proxmox5-01 pveproxy[2546]: proxy detected vanished client connection
Aug 30 14:36:03 proxmox5-01 pveproxy[1983]: proxy detected vanished client connection
Aug 30 14:37:34 proxmox5-01 pveproxy[1983]: proxy detected vanished client connection
/var/log/kern.log:
Code:
Aug 30 14:23:01 proxmox5-01 pvedaemon[12356]: <grobs@pam> starting task UPID:proxmox5-01:00001C92:6CA9C075:5B87E1A5:vzshutdown:202:grobs@pam:
Aug 30 14:23:01 proxmox5-01 kernel: [18231069.108043] vmbr2: port 3(veth202i0) entered disabled state
Aug 30 14:23:01 proxmox5-01 kernel: [18231069.109240] device veth202i0 left promiscuous mode
Aug 30 14:23:01 proxmox5-01 kernel: [18231069.110170] vmbr2: port 3(veth202i0) entered disabled state
Aug 30 14:23:03 proxmox5-01 pvedaemon[12356]: <grobs@pam> end task UPID:proxmox5-01:00001C92:6CA9C075:5B87E1A5:vzshutdown:202:grobs@pam: OK
Aug 30 14:23:14 proxmox5-01 pvedaemon[16550]: <grobs@pam> move volume CT 202: move --volume rootfs --storage local-zfs
Aug 30 14:23:14 proxmox5-01 pvedaemon[10402]: <grobs@pam> starting task UPID:proxmox5-01:000040A6:6CA9C55A:5B87E1B2:move_volume:202:grobs@pam:
Aug 30 14:23:54 proxmox5-01 pvedaemon[10402]: <grobs@pam> end task UPID:proxmox5-01:000040A6:6CA9C55A:5B87E1B2:move_volume:202:grobs@pam: OK
Aug 30 14:32:31 proxmox5-01 pvedaemon[21108]: <grobs@pam> starting task UPID:proxmox5-01:000054FB:6CAA9F15:5B87E3DF:vzstart:202:grobs@pam:
Aug 30 14:32:31 proxmox5-01 kernel: [18231639.308536] IPv6: ADDRCONF(NETDEV_UP): veth202i0: link is not ready
Aug 30 14:32:32 proxmox5-01 kernel: [18231639.615995] vmbr2: port 3(veth202i0) entered blocking state
Aug 30 14:32:32 proxmox5-01 kernel: [18231639.616907] vmbr2: port 3(veth202i0) entered disabled state
Aug 30 14:32:32 proxmox5-01 kernel: [18231639.617818] device veth202i0 entered promiscuous mode
Aug 30 14:34:01 proxmox5-01 pvedaemon[21108]: <grobs@pam> end task UPID:proxmox5-01:000054FB:6CAA9F15:5B87E3DF:vzstart:202:grobs@pam: command 'systemctl start pve-container@202' failed: exit code 1
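Since the start failure comes right after the rootfs was moved to local-zfs, I also wonder whether the new volume itself is in order. Something like this should confirm that the container config points at the new storage and that the dataset actually exists (the subvol name is just my assumption of the usual naming pattern):

Code:
# pct config 202
# zfs list | grep subvol-202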
Code:
# uname -a
Linux proxmox5-01 4.13.13-5-pve #1 SMP PVE 4.13.13-38 (Fri, 26 Jan 2018 10:47:09 +0100) x86_64 GNU/Linux
Code:
proxmox5-01:/home/tom# dpkg -l | grep pve-kernel
ii pve-firmware 2.0-5 all Binary firmware code for the pve-kernel
ii pve-kernel-4.13 5.2-2 all Latest Proxmox VE Kernel Image
ii pve-kernel-4.13.13-5-pve 4.13.13-38 amd64 The Proxmox PVE Kernel Image
ii pve-kernel-4.13.13-6-pve 4.13.13-42 amd64 The Proxmox PVE Kernel Image
ii pve-kernel-4.13.16-1-pve 4.13.16-46 amd64 The Proxmox PVE Kernel Image
ii pve-kernel-4.13.16-2-pve 4.13.16-48 amd64 The Proxmox PVE Kernel Image
ii pve-kernel-4.13.16-3-pve 4.13.16-50 amd64 The Proxmox PVE Kernel Image
ii pve-kernel-4.13.16-4-pve 4.13.16-51 amd64 The Proxmox PVE Kernel Image
ii pve-kernel-4.15 5.2-6 all Latest Proxmox VE Kernel Image
ii pve-kernel-4.15.15-1-pve 4.15.15-6 amd64 The Proxmox PVE Kernel Image
ii pve-kernel-4.15.17-1-pve 4.15.17-9 amd64 The Proxmox PVE Kernel Image
ii pve-kernel-4.15.17-2-pve 4.15.17-10 amd64 The Proxmox PVE Kernel Image
ii pve-kernel-4.15.17-3-pve 4.15.17-14 amd64 The Proxmox PVE Kernel Image
ii pve-kernel-4.15.18-1-pve 4.15.18-19 amd64 The Proxmox PVE Kernel Image
ii pve-kernel-4.15.18-2-pve 4.15.18-21 amd64 The Proxmox PVE Kernel Image
ii pve-kernel-4.15.18-3-pve 4.15.18-22 amd64 The Proxmox PVE Kernel Image
Code:
# uptime
16:52:19 up 211 days, 2:33, 1 user, load average: 2,00, 2,25, 3,12
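If it helps, next time this happens I can also try to start the container in the foreground with LXC debug logging to capture why the start times out; roughly like this (the log file path is just an example):

Code:
# lxc-start -n 202 -F -l DEBUG -o /tmp/lxc-202.log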
Regards