Every time I try to install the Linux 6.2 kernel, the OS breaks

talormanda
New Member
May 18, 2023
Running 4x Intel Core i3-8109U @ 3.00GHz and 32GB of RAM on a NUC (boxnuc8i3bek1)

Every time I upgrade to the Linux 6.2 kernel, I run into issues where the OS fails to restart and I have to force it off. This is from a fresh install using all default options. Now the local node (proxmox) has a grey question mark over it in the UI. Is there something I am doing wrong? I tried to restart from the menu; it took an extremely long time and then this error appeared on the screen:

Bash:
watchdog  watchdog0: watchdog did not stop!

Is there a log somewhere I can look at to see why this happened? I have done a fresh install 4 or 5 times now, and everything goes smoothly until I get the 6.2 kernel on there. When it restarted and came back up, it seems to be okay so far.

UI:
1684382614895.png

Error on the screen before turning off:
IMG_20230516_204736.jpg
 
Hi,
please provide the journal since reboot by running journalctl -b > journal.txt and attach the generated file.
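As a side note: since you are having to force the machine off, make sure the journal is stored persistently (on Proxmox VE it usually already is; check whether /var/log/journal exists), so the log of the crashed boot survives and can be read afterwards with journalctl -b -1. The relevant setting is:

```ini
# /etc/systemd/journald.conf -- keep logs on disk across reboots
[Journal]
Storage=persistent
```

Restart journald afterwards with systemctl restart systemd-journald.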
 
I just installed a VM and had it running for an hour or so, then I stopped the VM and left Proxmox idle with nothing else active. When I went back to get this journal for you, the PVE host appeared to be down again:

1684395443504.png

1684395500572.png

I am attempting to restart it cleanly again from the UI and will get you the journal as soon as I can. It appears to happen when the PVE host goes idle.
 
I had to force shutdown the pve by holding the button on my NUC. Here is the file.
 

Attachments

  • journal.txt
    112.8 KB · Views: 3
After forcing the machine off and getting the journal file for you, I left it running idle overnight. I came back to find that it's no longer pingable, and this is on the screen.

20230518_121854.jpg
 
It looks like memory corruption. Have you tested the memory?
Do you mean the memtest that shows up at boot?

Edit: I downloaded memtest86 version 10.4 and am currently running that against my 2x16GB sticks.
 
Memtest86 was a success - any thoughts on the journal file @Chris ?

1684469141038.png
 

Attachments

  • journal.txt
    117.1 KB · Views: 1
Yes, sorry, but since you rebooted the system, the logs no longer contain the errors as initially intended. Please provide the logs from a time range around when the errors happen: journalctl --since <DATETIME> --until <DATETIME> > journal.txt
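For example, if the crash happened around 01:24, a window like the following would work (the timestamps here are just placeholders for illustration; journalctl accepts the "YYYY-MM-DD HH:MM:SS" format):

```shell
# Hypothetical window around a crash; widen the range as needed
journalctl --since "2023-05-19 01:15:00" --until "2023-05-19 01:35:00" > journal.txt
```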
 

I am pretty sure this is when a crash occurred today. Around May 19 01:24 it locked up and I had to force power it off. Right at May 19 01:29:35 you can see where it started to boot again.
 

Attachments

  • journal.txt
    266.7 KB · Views: 4
Update: I saw some posts about C-states and people disabling them. This gave me a hunch, and I disabled Intel Turbo Boost in the BIOS. So far, it hasn't crashed in hours. I am going to wipe and reload the system to see if that has resolved the problem. I am curious whether anything in the logs syncs up with my findings.
 
@Chris Another update. I reinstalled and put the 6.2 kernel on again. After about 35 minutes, the device locked up. It stopped pinging, the screen was frozen at the "proxmox login:" prompt, and nothing was responding. I had to force restart it.

I am attaching the log from this timeframe. My restart was around [May 19 18:05:52] and the crash / loss of ping started around [May 19 18:40:35].

If you look in the journal log around this time, you can see some issues start to appear. I have no idea why this is happening. I have Secure Boot disabled, TPM disabled, Turbo Boost disabled, no VMs added, CPU usage is basically 0, and memtest86 passed a 10-hour test.

Code:
May 19 18:40:35 proxmox kernel: BUG: unable to handle page fault for address: 00000000000f424b
May 19 18:40:35 proxmox kernel: #PF: supervisor write access in kernel mode
May 19 18:40:35 proxmox kernel: #PF: error_code(0x0002) - not-present page
May 19 18:40:35 proxmox kernel: PGD 0 P4D 0
May 19 18:40:35 proxmox kernel: Oops: 0002 [#1] PREEMPT SMP PTI
May 19 18:40:35 proxmox kernel: CPU: 0 PID: 518 Comm: watchdog-mux Tainted: P           O       6.2.11-2-pve #1
May 19 18:40:35 proxmox kernel: Hardware name: Intel(R) Client Systems NUC8i3BEK/NUC8BEB, BIOS BECFL357.86A.0092.2023.0214.1114 02/14/2023
May 19 18:40:35 proxmox kernel: RIP: 0010:osq_lock+0x3d/0x160
May 19 18:40:35 proxmox kernel: Code: 48 89 d3 48 83 ec 10 65 8b 05 ab e9 0c 69 83 c0 01 65 48 03 1d ec 73 0b 69 c7 43 10 00 00 00 00 48 c7 03 00 00 00 00 89 43 14 <87> 07 85 c0 0f 84 cf 00 00 00 83 e8 01 49 89 fc 48 98 48 3d ff 1f
May 19 18:40:35 proxmox kernel: RSP: 0018:ffffa8d3410a7d20 EFLAGS: 00010286
May 19 18:40:35 proxmox kernel: RAX: 0000000000000001 RBX: ffff944d9dc324c0 RCX: 0000000000000000
May 19 18:40:35 proxmox kernel: RDX: 00000000000324c0 RSI: 0000000000000000 RDI: 00000000000f424b
May 19 18:40:35 proxmox kernel: RBP: ffffa8d3410a7d40 R08: 0000000000000001 R09: 0000000000000000
May 19 18:40:35 proxmox kernel: R10: 0000000000000001 R11: 0000000000000000 R12: 00000000000f423f
May 19 18:40:35 proxmox kernel: R13: 00000000000f424b R14: ffff944646800000 R15: 0000000000000000
May 19 18:40:35 proxmox kernel: FS:  00007fe69b4ce540(0000) GS:ffff944d9dc00000(0000) knlGS:0000000000000000
May 19 18:40:35 proxmox kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 19 18:40:35 proxmox kernel: CR2: 00000000000f424b CR3: 000000010e1ae005 CR4: 00000000003706f0

Here are more logs from a recent crash:

Code:
May 20 01:34:08 proxmox kernel: INFO: task vgs:12452 blocked for more than 120 seconds.
May 20 01:34:08 proxmox kernel:       Tainted: P        W  O       6.2.11-2-pve #1
May 20 01:34:08 proxmox kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 20 01:34:08 proxmox kernel: task:vgs             state:D stack:0     pid:12452 ppid:826    flags:0x00000000
May 20 01:34:08 proxmox kernel: Call Trace:
May 20 01:34:08 proxmox kernel:  <TASK>
May 20 01:34:08 proxmox kernel:  __schedule+0x3ac/0x14b0
May 20 01:34:08 proxmox kernel:  schedule+0x68/0x100
May 20 01:34:08 proxmox kernel:  schedule_timeout+0x14b/0x160
May 20 01:34:08 proxmox kernel:  ? __percpu_ref_switch_mode+0xe7/0x1e0
May 20 01:34:08 proxmox kernel:  __wait_for_common+0x8f/0x190
May 20 01:34:08 proxmox kernel:  ? __pfx_schedule_timeout+0x10/0x10
May 20 01:34:08 proxmox kernel:  wait_for_completion+0x24/0x30
May 20 01:34:08 proxmox kernel:  __x64_sys_io_destroy+0xb9/0x110
May 20 01:34:08 proxmox kernel:  do_syscall_64+0x59/0x90
May 20 01:34:08 proxmox kernel:  ? exit_to_user_mode_prepare+0x37/0x180
May 20 01:34:08 proxmox kernel:  ? syscall_exit_to_user_mode+0x26/0x50
May 20 01:34:08 proxmox kernel:  ? do_syscall_64+0x69/0x90
May 20 01:34:08 proxmox kernel:  ? do_syscall_64+0x69/0x90
May 20 01:34:08 proxmox kernel:  entry_SYSCALL_64_after_hwframe+0x72/0xdc
May 20 01:34:08 proxmox kernel: RIP: 0033:0x7f5a18334f29
May 20 01:34:08 proxmox kernel: RSP: 002b:00007ffdcff1e048 EFLAGS: 00000246 ORIG_RAX: 00000000000000cf
May 20 01:34:08 proxmox kernel: RAX: ffffffffffffffda RBX: 00007f5a17e42f90 RCX: 00007f5a18334f29
May 20 01:34:08 proxmox kernel: RDX: 00000000000000b1 RSI: 00007f5a1840ebe0 RDI: 00007f5a18702000
May 20 01:34:08 proxmox kernel: RBP: 00007f5a18702000 R08: 00007f5a1840ebe0 R09: 0000000000000000
May 20 01:34:08 proxmox kernel: R10: 000055930f353c60 R11: 0000000000000246 R12: 0000000000000000
May 20 01:34:08 proxmox kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
May 20 01:34:08 proxmox kernel:  </TASK>
May 20 01:35:26 proxmox pvedaemon[829]: <root@pam> starting task UPID:proxmox:000033B1:0005A720:64685C1E:vncshell::root@pam:
May 20 01:35:26 proxmox pvedaemon[13233]: starting termproxy UPID:proxmox:000033B1:0005A720:64685C1E:vncshell::root@pam:
May 20 01:35:26 proxmox pvedaemon[11475]: <root@pam> successful auth for user 'root@pam'
May 20 01:35:26 proxmox login[13241]: pam_unix(login:session): session opened for user root(uid=0) by (uid=0)
May 20 01:35:26 proxmox systemd-logind[495]: New session 5 of user root.
May 20 01:35:26 proxmox systemd[1]: Started Session 5 of user root.
May 20 01:35:26 proxmox login[13246]: ROOT LOGIN  on '/dev/pts/1'
May 20 01:35:28 proxmox systemd[1]: session-5.scope: Succeeded.
May 20 01:35:28 proxmox systemd-logind[495]: Session 5 logged out. Waiting for processes to exit.
May 20 01:35:28 proxmox systemd-logind[495]: Removed session 5.
May 20 01:35:28 proxmox pvedaemon[829]: <root@pam> end task UPID:proxmox:000033B1:0005A720:64685C1E:vncshell::root@pam: OK
May 20 01:35:45 proxmox kernel: rcu: rcu_implicit_dynticks_qs: grp: 0-0 level: 0 ->gp_seq 0 ->completedqs 0
May 20 01:35:45 proxmox kernel: rcu: rcu_implicit_dynticks_qs: 0:0 ->qsmask 0x0 ->qsmaskinit 0x0 ->qsmaskinitnext 0x0 ->rcu_gp_init_mask 0x0
May 20 01:35:45 proxmox kernel: rcu: rcu_implicit_dynticks_qs 2: . online: -1192(0) offline: -1200(8)
May 20 01:36:08 proxmox kernel: INFO: task vgs:12452 blocked for more than 241 seconds.
May 20 01:36:08 proxmox kernel:       Tainted: P        W  O       6.2.11-2-pve #1
May 20 01:36:08 proxmox kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 20 01:36:08 proxmox kernel: task:vgs             state:D stack:0     pid:12452 ppid:826    flags:0x00000000
May 20 01:36:08 proxmox kernel: Call Trace:
May 20 01:36:08 proxmox kernel:  <TASK>
May 20 01:36:08 proxmox kernel:  __schedule+0x3ac/0x14b0
May 20 01:36:08 proxmox kernel:  schedule+0x68/0x100
May 20 01:36:08 proxmox kernel:  schedule_timeout+0x14b/0x160
May 20 01:36:08 proxmox kernel:  ? __percpu_ref_switch_mode+0xe7/0x1e0
May 20 01:36:08 proxmox kernel:  __wait_for_common+0x8f/0x190
May 20 01:36:08 proxmox kernel:  ? __pfx_schedule_timeout+0x10/0x10
May 20 01:36:08 proxmox kernel:  wait_for_completion+0x24/0x30
May 20 01:36:08 proxmox kernel:  __x64_sys_io_destroy+0xb9/0x110
May 20 01:36:08 proxmox kernel:  do_syscall_64+0x59/0x90
May 20 01:36:08 proxmox kernel:  ? exit_to_user_mode_prepare+0x37/0x180
May 20 01:36:08 proxmox kernel:  ? syscall_exit_to_user_mode+0x26/0x50
May 20 01:36:08 proxmox kernel:  ? do_syscall_64+0x69/0x90
May 20 01:36:08 proxmox kernel:  ? do_syscall_64+0x69/0x90
May 20 01:36:08 proxmox kernel:  entry_SYSCALL_64_after_hwframe+0x72/0xdc
May 20 01:36:08 proxmox kernel: RIP: 0033:0x7f5a18334f29
May 20 01:36:08 proxmox kernel: RSP: 002b:00007ffdcff1e048 EFLAGS: 00000246 ORIG_RAX: 00000000000000cf
May 20 01:36:08 proxmox kernel: RAX: ffffffffffffffda RBX: 00007f5a17e42f90 RCX: 00007f5a18334f29
May 20 01:36:08 proxmox kernel: RDX: 00000000000000b1 RSI: 00007f5a1840ebe0 RDI: 00007f5a18702000
May 20 01:36:08 proxmox kernel: RBP: 00007f5a18702000 R08: 00007f5a1840ebe0 R09: 0000000000000000
May 20 01:36:08 proxmox kernel: R10: 000055930f353c60 R11: 0000000000000246 R12: 0000000000000000
May 20 01:36:08 proxmox kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
May 20 01:36:08 proxmox kernel:  </TASK>
May 20 01:36:15 proxmox pveproxy[10686]: proxy detected vanished client connection
 

Attachments

  • journal.txt
    225.4 KB · Views: 1
Okay, the stack traces you posted seem rather uncorrelated, so this is probably not directly related to the kernel. I stumbled upon this kernel bug report https://bugzilla.kernel.org/show_bug.cgi?id=215337 and there is also a rather long thread https://forum.proxmox.com/threads/vm-freezes-irregularly.111494/ which might be unrelated, but they all point in the direction of issues with power states.

Therefore, let's see which power states are available at runtime. You should be able to get info about that using cpupower:
Bash:
apt install linux-cpupower
cpupower idle-info
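If deep package C-states do turn out to be the trigger, a common workaround (a sketch, not verified on this exact model; the paths are the standard Linux cpuidle sysfs interface) is to cap the deepest states, either temporarily at runtime or persistently via the kernel command line:

```shell
# Runtime test: disable idle states deeper than C6 on all CPUs
# (state indices vary per machine, so match on the 'name' file)
for state in /sys/devices/system/cpu/cpu*/cpuidle/state*; do
    case "$(cat "$state/name")" in
        C7s|C8|C9|C10) echo 1 > "$state/disable" ;;
    esac
done
```

If that stabilizes things, the persistent equivalent is adding intel_idle.max_cstate=1 (the driver in use on this box is intel_idle) to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then running update-grub and rebooting.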
 

Here is that output:
Bash:
root@proxmox:~# cpupower idle-info
CPUidle driver: intel_idle
CPUidle governor: menu
analyzing CPU 0:

Number of idle states: 9
Available idle states: POLL C1 C1E C3 C6 C7s C8 C9 C10
POLL:
Flags/Description: CPUIDLE CORE POLL IDLE
Latency: 0
Usage: 7399
Duration: 151823
C1:
Flags/Description: MWAIT 0x00
Latency: 2
Usage: 3302
Duration: 2734712
C1E:
Flags/Description: MWAIT 0x01
Latency: 10
Usage: 6639
Duration: 3534769
C3:
Flags/Description: MWAIT 0x10
Latency: 70
Usage: 6001
Duration: 733782
C6:
Flags/Description: MWAIT 0x20
Latency: 85
Usage: 21105
Duration: 11914686
C7s:
Flags/Description: MWAIT 0x33
Latency: 124
Usage: 162
Duration: 289665
C8:
Flags/Description: MWAIT 0x40
Latency: 200
Usage: 209644
Duration: 667159875
C9:
Flags/Description: MWAIT 0x50
Latency: 480
Usage: 2403
Duration: 16220611
C10:
Flags/Description: MWAIT 0x60
Latency: 890
Usage: 149114
Duration: 3524724526
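
Side note on reading that output: the Duration values are in microseconds, so almost all of the idle time is going into the deep C8/C10 states. A quick awk sketch (fed with a few of the numbers above) to get the percentages:

```shell
# Sum per-state idle residency from 'cpupower idle-info' style output.
# Sample values are the POLL/C8/C10 numbers from the output above;
# C10 dominates (~84% of idle time in this sample).
out=$(awk '
/^[A-Z][0-9A-Za-z]*:$/ { state = substr($0, 1, length($0) - 1) }
/^Duration:/           { dur[state] = $2; total += $2 }
END {
    for (s in dur)
        printf "%-4s %12d us  %5.1f%%\n", s, dur[s], 100 * dur[s] / total
}' <<'EOF'
POLL:
Duration: 151823
C8:
Duration: 667159875
C10:
Duration: 3524724526
EOF
)
printf '%s\n' "$out"
```

That lines up with the deep-package-C-state theory: when idle, this machine is sitting in C8/C10 the overwhelming majority of the time.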

Also, I wanted to let you know about something. I took out the 32GB of RAM that passed a 10-hour memtest86, and put in the 4GB stick that came with the NUC. So far, it has been running for many hours without any crashes or errors in the syslog. I am in the middle of testing each stick by itself to see if that is part of the problem. The cpupower output above is from when I had the 4GB stick in.

Edit to the above:
I left the 4GB stick in overnight for a few hours in slot 1, then a few hours again in slot 2 - no crashes.
I then put 1x 16GB stick in and left it for 1.5 hours - no issues.
I then put the 2nd 16GB stick in and left it for 3 hours - no issues.
I put both sticks of 16GB in and so far no issues at all, 100% - I am continuing to monitor this for a few hours.
I am very confused, since the memory test passed!

The only error I have seen since putting in the 4 GB stick is:
Code:
May 20 10:19:01 proxmox kernel: i915 0000:00:02.0: [drm] *ERROR* DPCD write failed at:0x5c0
May 20 10:19:01 proxmox kernel: i915 0000:00:02.0: [drm] *ERROR* Failed to write infoframes

I am not sure what that means; possibly related to HDMI / DisplayPort? Should we worry about this?
 
