Hi! I have a 2-node cluster.
I upgraded to 7.14.2 last week, and since then I've had problems with communication between the nodes: 401 errors, question marks in the web UI, etc. All VMs kept working fine, though. That lasted until yesterday, when the nodes stopped communicating with each other entirely, and I rebooted one of them (the one that had stopped responding).
Since then, that node takes very long to boot (20-30 minutes), the web UI does not start, and only one VM starts automatically (I have 3 configured to start on boot). I tried to upgrade both nodes to 8.1.3, but the upgrade does not complete on the bad node. journalctl/dmesg always show timeouts:
Code:
[ 725.573099] INFO: task pvestatd:2101 blocked for more than 604 seconds.
[ 725.573113] Tainted: P O 6.5.11-6-pve #1
[ 725.573121] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Code:
[ 604.741985] INFO: task pvecm:2132 blocked for more than 483 seconds.
[ 604.741993] Tainted: P O 6.5.11-6-pve #1
[ 604.742000] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 604.742008] task:pvecm state:D stack:0 pid:2132 ppid:1 flags:0x00004006
[ 604.742013] Call Trace:
[ 604.742014] <TASK>
[ 604.742017] __schedule+0x3fd/0x1450
[ 604.742021] ? srso_return_thunk+0x5/0x10
[ 604.742025] ? __wake_up_common_lock+0x8b/0xd0
[ 604.742035] schedule+0x63/0x110
[ 604.742039] request_wait_answer+0x1be/0x2a0
[ 604.742044] ? __pfx_autoremove_wake_function+0x10/0x10
[ 604.742050] fuse_simple_request+0x19d/0x2d0
[ 604.742055] fuse_create_open+0x243/0x570
[ 604.742070] fuse_atomic_open+0x139/0x180
[ 604.742075] path_openat+0x70f/0x1180
[ 604.742084] do_filp_open+0xaf/0x170
[ 604.742097] do_sys_openat2+0xb3/0xe0
[ 604.742101] ? handle_mm_fault+0xad/0x360
[ 604.742106] __x64_sys_openat+0x6c/0xa0
[ 604.742111] do_syscall_64+0x5b/0x90
[ 604.742115] ? srso_return_thunk+0x5/0x10
[ 604.742119] ? exc_page_fault+0x94/0x1b0
[ 604.742124] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Code:
[ 483.909330] INFO: task pvecm:2132 blocked for more than 362 seconds.
[ 483.909338] Tainted: P O 6.5.11-6-pve #1
[ 483.909345] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 483.909353] task:pvecm state:D stack:0 pid:2132 ppid:1 flags:0x00004006
Code:
[ 604.741773] INFO: task pvestatd:2101 blocked for more than 483 seconds.
[ 604.741789] Tainted: P O 6.5.11-6-pve #1
[ 604.741796] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Code:
Cluster information
-------------------
Name: computing
Config Version: 2
Transport: knet
Secure auth: on
Quorum information
------------------
Date: Thu Nov 30 09:56:18 2023
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 0x00000001
Ring ID: 1.1717
Quorate: Yes
Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 2
Quorum: 2
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
0x00000001 1 10.1.3.3 (local)
0x00000002 1 10.1.3.5
Code:
Job for pveproxy.service failed because a timeout was exceeded.
See "systemctl status pveproxy.service" and "journalctl -xeu pveproxy.service" for details.
dpkg: error processing package pve-manager (--configure):
installed pve-manager package post-installation script subprocess returned error exit status 1
dpkg: dependency problems prevent configuration of proxmox-ve:
proxmox-ve depends on pve-manager (>= 8.0.4); however:
Package pve-manager is not configured yet.
dpkg: error processing package proxmox-ve (--configure):
dependency problems - leaving unconfigured
Errors were encountered while processing:
pve-manager
proxmox-ve
E: Sub-process /usr/bin/dpkg returned an error code (1)