problem with UI

emmanuelv · New Member · Dec 9, 2021
Hello,

I recently upgraded my server to 7.1 (pve-manager/7.1-7/df5740ad, running kernel: 5.13.19-2-pve). It was working properly until yesterday, when the server crashed with a
Code:
watchdog: BUG: soft lockup - CPU#6 stuck for 8855s! [kcompactd0:66]

After the reboot I am able to SSH into the server, but the PVE UI is not reachable.
Code:
systemctl status 'pve*'
does not report any noticeable error.

Bash:
root@atlas:~# ss -tlpn
State      Recv-Q     Send-Q         Local Address:Port         Peer Address:Port    Process                                                                                                                                       
LISTEN     0          4096                 0.0.0.0:8008              0.0.0.0:*        users:(("docker-proxy",pid=1865,fd=4))                                                                                                       
LISTEN     0          4096                 0.0.0.0:9000              0.0.0.0:*        users:(("docker-proxy",pid=1821,fd=4))                                                                                                       
LISTEN     0          4096                 0.0.0.0:111               0.0.0.0:*        users:(("rpcbind",pid=1287,fd=4),("systemd",pid=1,fd=36))                                                                                    
LISTEN     0          4096               127.0.0.1:85                0.0.0.0:*        users:(("pvedaemon worke",pid=2473,fd=6),("pvedaemon worke",pid=2472,fd=6),("pvedaemon worke",pid=2471,fd=6),("pvedaemon",pid=2470,fd=6))    
LISTEN     0          4096                 0.0.0.0:406               0.0.0.0:*        users:(("docker-proxy",pid=1932,fd=4))                                                                                                       
LISTEN     0          128                  0.0.0.0:22                0.0.0.0:*        users:(("sshd",pid=1440,fd=3))                                                                                                               
LISTEN     0          100                127.0.0.1:25                0.0.0.0:*        users:(("master",pid=1662,fd=13))                                                                                                            
LISTEN     0          4096                 0.0.0.0:8000              0.0.0.0:*        users:(("docker-proxy",pid=1891,fd=4))                                                                                                       
LISTEN     0          4096                       *:8006                    *:*        users:(("pveproxy worker",pid=2571,fd=6),("pveproxy worker",pid=2570,fd=6),("pveproxy worker",pid=2569,fd=6),("pveproxy",pid=2568,fd=6))     
LISTEN     0          4096                    [::]:8008                 [::]:*        users:(("docker-proxy",pid=1874,fd=4))                                                                                                       
LISTEN     0          4096                    [::]:9000                 [::]:*        users:(("docker-proxy",pid=1827,fd=4))                                                                                                       
LISTEN     0          4096                    [::]:111                  [::]:*        users:(("rpcbind",pid=1287,fd=6),("systemd",pid=1,fd=38))                                                                                    
LISTEN     0          4096                    [::]:406                  [::]:*        users:(("docker-proxy",pid=1938,fd=4))                                                                                                       
LISTEN     0          128                     [::]:22                   [::]:*        users:(("sshd",pid=1440,fd=4))                                                                                                               
LISTEN     0          4096                       *:3128                    *:*        users:(("spiceproxy work",pid=2575,fd=6),("spiceproxy",pid=2574,fd=6))                                                                    
LISTEN     0          100                    [::1]:25                   [::]:*        users:(("master",pid=1662,fd=14))                                                                                                            
LISTEN     0          4096                    [::]:8000                 [::]:*        users:(("docker-proxy",pid=1897,fd=4))

I wonder why the daemon and the proxy are not bound to the same address?
The hostname is properly defined in /etc/hosts.
iptables has not been touched, and there is no DROP input rule anyway.
curl run on the server itself gets an empty reply back.
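For reference, the check I mean is roughly this (assuming the default GUI port 8006 and the self-signed certificate, hence -k):
Bash:
# query the web UI locally; -k skips verification of the self-signed cert
curl -k https://localhost:8006/
# this currently fails with: curl: (52) Empty reply from server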

I don't know where else to look; any help would be greatly appreciated.

thanks
emmanuel
 
hi,

I wonder why the daemon and the proxy are not bound to the same address?
pvedaemon is the higher-privileged daemon that performs the privileged actions; pveproxy serves the GUI/API and runs as the www-data user.
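you can see this split in your ss output: pveproxy listens on *:8006 for the GUI/API, while pvedaemon only listens on 127.0.0.1:85 and is reached through pveproxy. a quick sketch to check both from the shell (assuming the default ports):
Bash:
# the GUI/API endpoint served by pveproxy; -k skips the self-signed cert check
curl -k https://localhost:8006/
# both services should show as active
systemctl status pvedaemon pveproxy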

Code:
watchdog: BUG: soft lockup - CPU#6 stuck for 8855s! [kcompactd0:66]

could you post the syslog entries from around this? there might be other relevant messages before that error.

you can enable persistent journaling to keep journals from consecutive boots, along these lines:
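Bash:
# journald switches to persistent storage when this directory exists
mkdir -p /var/log/journal
# restart journald so it picks the directory up (a reboot also works)
systemctl restart systemd-journald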

After the reboot I am able to SSH into the server, but the PVE UI is not reachable.
what error do you get when trying to reach the GUI?
you can run journalctl -f while trying to access it and post the output here.
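i.e. something like this, with 8006 being the default GUI port:
Bash:
# follow the logs live in one SSH session ...
journalctl -f
# ... then reload https://<your-host>:8006 in the browser and watch for errors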

also please be aware that we don't recommend running docker on the PVE host itself, as docker can mess with the network configuration of the host.
 
the syslog is full of exceptions, mostly this one:

Code:
Dec  8 17:57:03 atlas pve-firewall[2074]: status update error: command 'iptables-save' failed: got signal 11
Dec  8 17:57:13 atlas kernel: [422603.502114] general protection fault, probably for non-canonical address 0xf7ff9fee0e2cee00: 0000 [#9526] SMP NOPTI
Dec  8 17:57:13 atlas kernel: [422603.503871] CPU: 4 PID: 849468 Comm: pve-firewall Tainted: P      D   IO      5.13.19-2-pve #1
Dec  8 17:57:13 atlas kernel: [422603.505626] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./C3758D4I-4L, BIOS P1.60 09/17/2019
Dec  8 17:57:13 atlas kernel: [422603.507413] RIP: 0010:lock_page_memcg+0x26/0xb0
Dec  8 17:57:13 atlas kernel: [422603.509216] Code: 00 0f 1f 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 49 89 fc 53 48 8b 47 08 48 8d 50 ff a8 01 4c 0f 45 e2 0f 1f 44 00 00 eb 3d <8b> 83 00 0e 00 00 85 c0 7e 4c 4c 8d ab 40 04 00 00 4c 89 ef e8 11
Dec  8 17:57:13 atlas kernel: [422603.512982] RSP: 0018:ffffb682e5cf39b8 EFLAGS: 00010286
Dec  8 17:57:13 atlas kernel: [422603.514882] RAX: f7ff9fee0e2ce000 RBX: f7ff9fee0e2ce000 RCX: 0000000000000000
Dec  8 17:57:13 atlas kernel: [422603.516799] RDX: fffff38684e8c3c7 RSI: 0000000000000000 RDI: fffff38684e8c400
Dec  8 17:57:13 atlas kernel: [422603.518724] RBP: ffffb682e5cf39d0 R08: 000055fe82635000 R09: 00000000ffffffff
Dec  8 17:57:13 atlas kernel: [422603.520663] R10: ffffffffffffffeb R11: 0000000000000000 R12: fffff38684e8c400
Dec  8 17:57:13 atlas kernel: [422603.522604] R13: ffff9fee0735b1a8 R14: 000055fe82636000 R15: ffffb682e5cf3ba8
Dec  8 17:57:13 atlas kernel: [422603.524555] FS:  0000000000000000(0000) GS:ffff9ff55fd00000(0000) knlGS:0000000000000000
Dec  8 17:57:13 atlas kernel: [422603.526534] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec  8 17:57:13 atlas kernel: [422603.528518] CR2: 00007fe68f7a02e0 CR3: 000000071b596000 CR4: 00000000003526e0
Dec  8 17:57:13 atlas kernel: [422603.530519] Call Trace:
Dec  8 17:57:13 atlas kernel: [422603.532505]  page_remove_rmap+0x18/0x330
Dec  8 17:57:13 atlas kernel: [422603.534495]  unmap_page_range+0x7c0/0xe80
Dec  8 17:57:13 atlas kernel: [422603.536459]  unmap_single_vma+0x7f/0xf0
Dec  8 17:57:13 atlas kernel: [422603.538399]  unmap_vmas+0x77/0xf0
Dec  8 17:57:13 atlas kernel: [422603.540326]  exit_mmap+0xab/0x1f0
Dec  8 17:57:13 atlas kernel: [422603.542240]  mmput+0x5f/0x140
Dec  8 17:57:13 atlas kernel: [422603.544149]  begin_new_exec+0x4d7/0xa50
Dec  8 17:57:13 atlas kernel: [422603.546062]  load_elf_binary+0x730/0x16f0
Dec  8 17:57:13 atlas kernel: [422603.547944]  ? __kernel_read+0x19d/0x2c0
Dec  8 17:57:13 atlas kernel: [422603.549779]  ? aa_get_task_label+0x49/0xd0
Dec  8 17:57:13 atlas kernel: [422603.551571]  ? ima_bprm_check+0x89/0xb0
Dec  8 17:57:13 atlas kernel: [422603.553309]  bprm_execve+0x27f/0x660
Dec  8 17:57:13 atlas kernel: [422603.555002]  do_execveat_common+0x192/0x1c0
Dec  8 17:57:13 atlas kernel: [422603.556652]  __x64_sys_execve+0x39/0x50
Dec  8 17:57:13 atlas kernel: [422603.558254]  do_syscall_64+0x61/0xb0
Dec  8 17:57:13 atlas kernel: [422603.559807]  ? handle_mm_fault+0xda/0x2c0
Dec  8 17:57:13 atlas kernel: [422603.561312]  ? exit_to_user_mode_prepare+0x37/0x1b0
Dec  8 17:57:13 atlas kernel: [422603.562778]  ? irqentry_exit_to_user_mode+0x9/0x20
Dec  8 17:57:13 atlas kernel: [422603.564204]  ? irqentry_exit+0x19/0x30
Dec  8 17:57:13 atlas kernel: [422603.565580]  ? exc_page_fault+0x8f/0x170
Dec  8 17:57:13 atlas kernel: [422603.566949]  ? asm_exc_page_fault+0x8/0x30
Dec  8 17:57:13 atlas kernel: [422603.568288]  entry_SYSCALL_64_after_hwframe+0x44/0xae
Dec  8 17:57:13 atlas kernel: [422603.569602] RIP: 0033:0x7fe68f72e6c7
Dec  8 17:57:13 atlas kernel: [422603.570898] Code: Unable to access opcode bytes at RIP 0x7fe68f72e69d.

I restarted the server again this afternoon, and now the UI responds. I still don't know what the problem was, but it seems to have disappeared...

thanks for your help.
 
