Kernel Panic, whole server crashes about every day

Kabbone

New Member
Jul 27, 2021
I ran into the same problem. After upgrading from 6.4 to 7 last weekend, I experienced multiple hangs during high I/O.
Disabling io_uring by setting "aio=native" has solved the issue for me for now.
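For reference, the aio mode is an option on the VM's disk line. Here is a minimal Python sketch that sets it on a drive spec. It assumes the drive spec has the form `<volume>,<key>=<value>,...`, as on a `scsi0:` line of the VM config. `set_drive_option` and the example volume name are hypothetical:

```python
def set_drive_option(drive_spec: str, key: str, value: str) -> str:
    """Return drive_spec with key=value set, replacing any existing value.

    Assumes a Proxmox-style drive spec such as
    "local-lvm:vm-100-disk-0,cache=none,size=32G" (hypothetical example).
    """
    volume, *options = drive_spec.split(",")
    # Keep every option except an existing one with the same key.
    kept = [opt for opt in options if opt.split("=", 1)[0] != key]
    kept.append(f"{key}={value}")
    return ",".join([volume, *kept])


if __name__ == "__main__":
    spec = "local-lvm:vm-100-disk-0,cache=none,size=32G"
    print(set_drive_option(spec, "aio", "native"))
```

You could then apply the resulting string with something like `qm set 100 --scsi0 <new spec>`, where VM ID 100 is also just an example. The change most likely takes effect only after the VM has been fully stopped and started again.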

Kernel (with 6.4 and 7): 5.11
CPU: Intel i5-6500T (no intel-microcode pkg installed)
RAM: 16GB DDR4
BIOS: newest available (01/2020 Fujitsu Esprimo D756)
VM: Root-fs on Proxmox Host LVM ext4
2 block devices passed through (dm-crypt, Btrfs RAID-1)
Application: Proxmox Backup on an NFS share (located on the Btrfs RAID-1 mentioned above) hosted in a VM

EDIT: all drives use the virtio driver
 

wiresandenergy

New Member
Jun 18, 2021
aio=native and the amd64-microcode package seem to be working here as well. I was experiencing this exact issue.

Specs:
AMD Ryzen 5950X
128GB DDR4 3200
Nvidia GeForce 1080ti
Asus B550-F Motherboard
LSI 9221-8i HBA

Just so I'm clear: aio=native is a temporary workaround until the underlying issue is resolved?
 

gouthamravee

Member
May 16, 2019
Hi folks, I've been having similar crashes on two Proxmox servers.
One runs an Intel Core i5-3570K and the other an AMD Ryzen 7 1800X. I have two other Proxmox servers, also on version 7 (Ryzen 2200G and Intel Core i3-3220T), that do not crash. The only thing the two crashing servers have in common is that both have large multi-terabyte arrays (not ZFS). I monitor both with Netdata and always see a huge I/O spike on the disks when they crash.

I've set up kernel crash logs on both, but neither has provided any information.

This week things have been okay on one server, but the other one crashed recently. I'm going to use this weekend to set up remote crash logging per https://pve.proxmox.com/wiki/Kernel_Crash_Trace_Log

Wanted to post this so the Proxmox team is aware it might not be limited to AMD systems.
The disk-heavy VMs on both servers have the HDDs passed through to them using virtio, no cache.

I'll post more info as soon as I've collected it.
 

Southsko

New Member
Jul 29, 2021
gouthamravee said:
Wanted to post this so the Proxmox team is aware it might not be limited to AMD systems.
The disk-heavy VMs on both servers have the HDDs passed through to them using virtio, no cache.


Dude! I had the same problem. When I disabled virtio passthrough to a VM, my system was fine. I ended up using NFS instead.

This happened on a system upgraded from 6 and on a brand-new 7 install.
 

Attachment: Screenshot_20210728-190808_Photos-04.jpeg

t.lamprecht

Proxmox Staff Member
Jul 28, 2015
South Tyrol/Italy
FYI, there's a newer kernel, package pve-kernel-5.11.22-3-pve version 5.11.22-6, which fixes an issue with unexpected EAGAIN errors that the io_uring kernel code received from some subsystems' softirq code paths.

Anyhow, please upgrade to that kernel and reboot into it; at the time of writing, the package is available in the pve-no-subscription repository.
It's hard to say in general, as quite a mix of kernel oopses has been posted in this thread, but at least some of the issues reported here should be gone.
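You can check whether the running kernel already includes that fix. cocoboig's output later in this thread shows that `uname -r` reports only the ABI name (5.11.22-3-pve), while the package version (5.11.22-6) appears in the `uname -v` string. Here is a minimal Python sketch that reads that string. It assumes the `PVE <version>` format seen there. `pve_package_version` is a hypothetical helper, and the plain numeric comparison is a simplification of Debian version ordering:

```python
from __future__ import annotations

import platform
import re


def pve_package_version(kernel_version: str) -> tuple[int, ...] | None:
    """Extract the package version from a PVE `uname -v` string.

    Assumes a format like
    "#1 SMP PVE 5.11.22-6 (Wed, 28 Jul 2021 10:51:12 +0200)".
    """
    match = re.search(r"PVE (\d+)\.(\d+)\.(\d+)-(\d+)", kernel_version)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


# pve-kernel-5.11.22-3-pve package version 5.11.22-6, per the post above.
FIXED = (5, 11, 22, 6)

if __name__ == "__main__":
    # platform.version() is the same string `uname -v` prints.
    running = pve_package_version(platform.version())
    if running is None:
        print("not a PVE kernel, or an unexpected version format")
    elif running >= FIXED:
        print("running", running, "which is 5.11.22-6 or newer")
    else:
        print("running", running, "which is older; installed but not rebooted yet?")
```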
 

luckyluk83

New Member
Jul 14, 2021
I've had a kernel panic every day for the past week or so.
It hangs at random, or while making a backup and then copying some files.

My VMs are:
OMV managing the storage
a few Ubuntu servers
Win10Pro
Pop!_OS
Manjaro

My config:
Xeon E5-2698 v3
128GB ECC DDR4 2133
2x120GB SSD
6X250GB SSD
2x16TB HDD
10x4TB SAS HDD on LSI 2008

Today I upgraded to the newest kernel, as advised in t.lamprecht's post.
 

gouthamravee

Member
May 16, 2019
t.lamprecht said:
FYI, there's a newer kernel, package pve-kernel-5.11.22-3-pve version 5.11.22-6, which fixes an issue with unexpected EAGAIN errors that the io_uring kernel code received from some subsystems' softirq code paths.

Anyhow, please upgrade to that kernel and reboot into it; at the time of writing, the package is available in the pve-no-subscription repository.
It's hard to say in general, as quite a mix of kernel oopses has been posted in this thread, but at least some of the issues reported here should be gone.
Thank you!

I tried to set up remote kernel debugging last night but wasn't successful.
Will try again this weekend, but for now I've updated all my servers to the latest kernel and rebooted.

As a side note, one of the servers has also been intermittently losing its network connection, though it only seems to affect SSH. I can still reach the VMs, but Proxmox itself drops all SSH connections and usually appears as offline on the Proxmox dashboard.

I initially thought it was the Ethernet cable, but replacing it with a known-good cable didn't fix the issue.
It didn't happen for a while, but after updating and rebooting it just happened again.
 

luckyluk83

New Member
Jul 14, 2021
I've done the update and the server didn't even last 2 hours.
I'll go back to 6.4 for now and see how it goes without updating the kernel.
 

timproxmox

New Member
Dec 18, 2019
Maybe not related, but we rolled back to 6.4 after having multipath issues with PVE 7. This wasn't clear at first, as the symptoms were that the VMs panicked and stopped responding, requiring reboots and volume repairs.
 

t.lamprecht

Proxmox Staff Member
Jul 28, 2015
South Tyrol/Italy
timproxmox said:
This wasn't clear at first, as the symptoms were that the VMs panicked and stopped responding, requiring reboots and volume repairs.
The issue in this thread affects only the host kernel. If the host crashes, the VMs won't be happy either, but VMs crashing while the host keeps running means you have a different issue.

luckyluk83 said:
I've done the update and the server didn't even last 2 hours.
In what sense? You only talked about things hanging. Did you look at the host and check whether it still worked? Did other VMs still work? Were there any error messages or panics in the syslog/dmesg? Any actual information would at least help us look into it.
 

luckyluk83

New Member
Jul 14, 2021
t.lamprecht, I think I've found the solution for the Proxmox host hanging with the last 2 kernel versions.
For a few months I've had a Turbo Unlock BIOS on my Rampage V X99 motherboard for the E5-2698 v3. Everything worked fine until the second-to-last kernel. Then yesterday I upgraded Proxmox to the newest kernel, as you advised, but I had problems even sooner than before. So I went back to the original BIOS, and the system has now been working fine for more than 12 hours with 10 VMs running, using 70% of the CPU and 60GB of RAM. The disks are busy, as is the network. Maybe something in the last kernel didn't like the missing microcode in the Turbo Unlock BIOS?
 

gouthamravee

Member
May 16, 2019
After updating to pve-kernel-5.11.22-3-pve as suggested by @t.lamprecht, I haven't had any kernel panics on the two nodes that had that problem.
I started all the VMs on both nodes that hit the disks hard and ran a few tasks known to cause high disk I/O. So far, high disk I/O has been stable.

I have been having other issues, though; see this post: https://forum.proxmox.com/threads/proxmox-7-losing-only-ssh-connection-to-node.93634/#post-407449
I'm an idiot for this ^

Specs on the nodes

Node 2 : (NVR and backup)
PCPartPicker Part List

CPU: AMD Ryzen 7 1800X 3.6 GHz 8-Core Processor ($315.00 @ Amazon)
CPU Cooler: Noctua NH-L9a-AM4 33.84 CFM CPU Cooler ($44.95 @ Amazon)
Motherboard: Gigabyte B450M DS3H V2 Micro ATX AM4 Motherboard ($64.00 @ Amazon)
Memory: Corsair Vengeance LPX 16 GB (2 x 8 GB) DDR4-2400 CL16 Memory ($86.99 @ Amazon)
Memory: PNY XLR8 16 GB (2 x 8 GB) DDR4-2666 CL16 Memory
Storage: Crucial MX500 500 GB 2.5" Solid State Drive ($49.30 @ Newegg)
Storage: Western Digital Caviar Green 3 TB 3.5" 5400RPM Internal Hard Drive
Storage: Western Digital Caviar Green 3 TB 3.5" 5400RPM Internal Hard Drive
Storage: Western Digital Caviar Green 3 TB 3.5" 5400RPM Internal Hard Drive
Storage: Western Digital Blue 6 TB 3.5" 5400RPM Internal Hard Drive ($147.09 @ Amazon)
Power Supply: Thermaltake Smart Series 430 W 80+ Certified ATX Power Supply ($38.24 @ Amazon)
Custom: Cudy Gigabit PCI-E/PCI Express Gigabit Network Adapters, 10/100/1000Mbps, 32-bit Gigabit PCIe Ethernet Adapter, Free Driver on Windows 10/8.1/8 (PE10) ($14.99 @ Amazon)
Custom: 2U (2 x 5.25" + 9 x 3.5" HDD Bay) Rackmount Chassis
Total: $760.56
Prices include shipping, taxes, and discounts when available
Generated by PCPartPicker 2021-07-30 11:24 EDT-0400


Node 4 : (NAS and media management)
PCPartPicker Part List

CPU: Intel Core i5-3570K 3.4 GHz Quad-Core Processor ($252.59 @ Amazon)
CPU Cooler: Thermaltake CLP0556 39.7 CFM Sleeve Bearing CPU Cooler ($14.99 @ Amazon)
Motherboard: Gigabyte GA-Z77X-UD3H ATX LGA1155 Motherboard
Memory: G.Skill Ripjaws X Series 32 GB (4 x 8 GB) DDR3-1600 CL9 Memory ($169.99 @ Newegg)
Storage: Western Digital Blue 3 TB 3.5" 5400RPM Internal Hard Drive ($129.00 @ Amazon)
Storage: Western Digital Blue 3 TB 3.5" 5400RPM Internal Hard Drive ($129.00 @ Amazon)
Storage: Western Digital Red 4 TB 3.5" 5400RPM Internal Hard Drive
Storage: Western Digital Red 4 TB 3.5" 5400RPM Internal Hard Drive
Storage: Western Digital RE 4 TB 3.5" 7200RPM Internal Hard Drive
Storage: Western Digital RE 4 TB 3.5" 7200RPM Internal Hard Drive
Storage: Western Digital RE 4 TB 3.5" 7200RPM Internal Hard Drive
Storage: Western Digital RE 4 TB 3.5" 7200RPM Internal Hard Drive
Storage: Hitachi Ultrastar 7K4000 4 TB 3.5" 7200RPM Internal Hard Drive
Storage: Hitachi Ultrastar 7K4000 4 TB 3.5" 7200RPM Internal Hard Drive
Storage: Hitachi Ultrastar 7K4000 4 TB 3.5" 7200RPM Internal Hard Drive
Storage: Hitachi Ultrastar 7K4000 4 TB 3.5" 7200RPM Internal Hard Drive
Storage: Hitachi Ultrastar 7K4000 4 TB 3.5" 7200RPM Internal Hard Drive
Power Supply: Rosewill Fortress 650 W 80+ Platinum Certified ATX Power Supply
Custom: SAS9211-8I 8PORT Int 6GB Sata+sas Pcie 2.0 ($149.90 @ Amazon)
Custom: SAS9211-8I 8PORT Int 6GB Sata+sas Pcie 2.0 ($149.90 @ Amazon)
Custom: 4U (6 x 5.25" + 7 x 3.5" HDD Bay) Rackmount Chassis
Total: $995.37
Prices include shipping, taxes, and discounts when available
Generated by PCPartPicker 2021-07-30 12:06 EDT-0400
 

t.lamprecht

Proxmox Staff Member
Jul 28, 2015
South Tyrol/Italy
luckyluk83 said:
For a few months I've had a Turbo Unlock BIOS on my Rampage V X99 motherboard for the E5-2698 v3. Everything worked fine until the second-to-last kernel. Then yesterday I upgraded Proxmox to the newest kernel, as you advised, but I had problems even sooner than before. So I went back to the original BIOS, and the system has now been working fine for more than 12 hours with 10 VMs running, using 70% of the CPU and 60GB of RAM.
Yeah, some BIOS firmware features can definitely interfere with system stability, and a newer kernel can also make such interference surface as new issues.

luckyluk83 said:
Maybe something in the last kernel didn't like the missing microcode in the Turbo Unlock BIOS?
The very last kernel adds just a single patch in SCSI-related io_uring code, fixing the issue this thread is (or should be) about:
https://git.proxmox.com/?p=pve-kernel.git;a=commitdiff;h=437b51a73b3fbfe4e5b708316c685060214a21cc

The patch is very small and actually fixes problems of this kind, so I would be really surprised if it were the cause of the regression you're seeing.

A bit before that, some more stable updates were pulled in, and those could include a patch that regresses on your system. I skimmed through the list, and nothing obvious (clock, stepping, scheduler, or power related) stood out.
For now I'd recommend keeping the BIOS closer to the default, and that turbo setting disabled.
 

cocoboig

New Member
May 25, 2020
After upgrading to Proxmox 7, my system starts to malfunction after a day or so, and the syslog shows:

Code:
[15595.468370] perf: interrupt took too long (2504 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[25015.394686] perf: interrupt took too long (3132 > 3130), lowering kernel.perf_event_max_sample_rate to 63750
[58117.333752] perf: interrupt took too long (3919 > 3915), lowering kernel.perf_event_max_sample_rate to 51000
[88329.013489] INFO: task pvesr:503547 blocked for more than 120 seconds.
[88329.013522]       Tainted: P           O      5.11.22-3-pve #1
[88329.013541] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[88329.013565] task:pvesr           state:D stack:    0 pid:503547 ppid:     1 flags:0x00000000
[88329.013570] Call Trace:
[88329.013574]  __schedule+0x2ca/0x880
[88329.013582]  schedule+0x4f/0xc0
[88329.013584]  rwsem_down_write_slowpath+0x212/0x590
[88329.013591]  down_write+0x43/0x50
[88329.013594]  filename_create+0x7e/0x160
[88329.013600]  do_mkdirat+0x58/0x140
[88329.013604]  __x64_sys_mkdir+0x1b/0x20
[88329.013607]  do_syscall_64+0x38/0x90
[88329.013611]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[88329.013615] RIP: 0033:0x7f52d25b1b07
[88329.013618] RSP: 002b:00007ffe695daaf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
[88329.013621] RAX: ffffffffffffffda RBX: 000055dd275d12a0 RCX: 00007f52d25b1b07
[88329.013622] RDX: 000055dd257b7a05 RSI: 00000000000001ff RDI: 000055dd2b63a400
[88329.013624] RBP: 0000000000000000 R08: 000055dd2ba7f228 R09: 0000000000000000
[88329.013625] R10: 0000000000000008 R11: 0000000000000246 R12: 000055dd2b63a400
[88329.013626] R13: 000055dd288ca5f8 R14: 000055dd2b862e58 R15: 00000000000001ff
[88449.842943] INFO: task pvesr:503547 blocked for more than 241 seconds.
[88449.842973]       Tainted: P           O      5.11.22-3-pve #1
[88449.842992] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[88449.843015] task:pvesr           state:D stack:    0 pid:503547 ppid:     1 flags:0x00000000
[88449.843019] Call Trace:
[88449.843023]  __schedule+0x2ca/0x880
[88449.843031]  schedule+0x4f/0xc0
[88449.843033]  rwsem_down_write_slowpath+0x212/0x590
[88449.843040]  down_write+0x43/0x50
[88449.843043]  filename_create+0x7e/0x160
[88449.843049]  do_mkdirat+0x58/0x140
[88449.843052]  __x64_sys_mkdir+0x1b/0x20
[88449.843056]  do_syscall_64+0x38/0x90
[88449.843059]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[88449.843064] RIP: 0033:0x7f52d25b1b07
[88449.843066] RSP: 002b:00007ffe695daaf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
[88449.843069] RAX: ffffffffffffffda RBX: 000055dd275d12a0 RCX: 00007f52d25b1b07
[88449.843071] RDX: 000055dd257b7a05 RSI: 00000000000001ff RDI: 000055dd2b63a400
[88449.843072] RBP: 0000000000000000 R08: 000055dd2ba7f228 R09: 0000000000000000
[88449.843073] R10: 0000000000000008 R11: 0000000000000246 R12: 000055dd2b63a400
[88449.843074] R13: 000055dd288ca5f8 R14: 000055dd2b862e58 R15: 00000000000001ff
[88570.672693] INFO: task pvesr:503547 blocked for more than 362 seconds.
[88570.672722]       Tainted: P           O      5.11.22-3-pve #1
[88570.672741] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[88570.672765] task:pvesr           state:D stack:    0 pid:503547 ppid:     1 flags:0x00000000
[88570.672769] Call Trace:
[88570.672773]  __schedule+0x2ca/0x880
[88570.672780]  schedule+0x4f/0xc0
[88570.672782]  rwsem_down_write_slowpath+0x212/0x590
[88570.672789]  down_write+0x43/0x50
[88570.672792]  filename_create+0x7e/0x160
[88570.672797]  do_mkdirat+0x58/0x140
[88570.672801]  __x64_sys_mkdir+0x1b/0x20
[88570.672804]  do_syscall_64+0x38/0x90
[88570.672807]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[88570.672812] RIP: 0033:0x7f52d25b1b07
[88570.672814] RSP: 002b:00007ffe695daaf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
[88570.672817] RAX: ffffffffffffffda RBX: 000055dd275d12a0 RCX: 00007f52d25b1b07
[88570.672818] RDX: 000055dd257b7a05 RSI: 00000000000001ff RDI: 000055dd2b63a400
[88570.672820] RBP: 0000000000000000 R08: 000055dd2ba7f228 R09: 0000000000000000
[88570.672821] R10: 0000000000000008 R11: 0000000000000246 R12: 000055dd2b63a400
[88570.672822] R13: 000055dd288ca5f8 R14: 000055dd2b862e58 R15: 00000000000001ff
[88691.502613] INFO: task pvesr:503547 blocked for more than 483 seconds.
[88691.502646]       Tainted: P           O      5.11.22-3-pve #1
[88691.502665] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[88691.502689] task:pvesr           state:D stack:    0 pid:503547 ppid:     1 flags:0x00000000
[88691.502694] Call Trace:
[88691.502698]  __schedule+0x2ca/0x880
[88691.502705]  schedule+0x4f/0xc0
[88691.502708]  rwsem_down_write_slowpath+0x212/0x590
[88691.502714]  down_write+0x43/0x50
[88691.502717]  filename_create+0x7e/0x160
[88691.502723]  do_mkdirat+0x58/0x140
[88691.502727]  __x64_sys_mkdir+0x1b/0x20
[88691.502730]  do_syscall_64+0x38/0x90
[88691.502734]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[88691.502738] RIP: 0033:0x7f52d25b1b07
[88691.502741] RSP: 002b:00007ffe695daaf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
[88691.502744] RAX: ffffffffffffffda RBX: 000055dd275d12a0 RCX: 00007f52d25b1b07
[88691.502746] RDX: 000055dd257b7a05 RSI: 00000000000001ff RDI: 000055dd2b63a400
[88691.502747] RBP: 0000000000000000 R08: 000055dd2ba7f228 R09: 0000000000000000
[88691.502748] R10: 0000000000000008 R11: 0000000000000246 R12: 000055dd2b63a400
[88691.502750] R13: 000055dd288ca5f8 R14: 000055dd2b862e58 R15: 00000000000001ff
[88812.332695] INFO: task pvesr:503547 blocked for more than 604 seconds.
[88812.332725]       Tainted: P           O      5.11.22-3-pve #1
[88812.332743] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[88812.332765] task:pvesr           state:D stack:    0 pid:503547 ppid:     1 flags:0x00000000
[88812.332769] Call Trace:
[88812.332773]  __schedule+0x2ca/0x880
[88812.332779]  schedule+0x4f/0xc0
[88812.332782]  rwsem_down_write_slowpath+0x212/0x590
[88812.332788]  down_write+0x43/0x50
[88812.332790]  filename_create+0x7e/0x160
[88812.332812]  do_mkdirat+0x58/0x140
[88812.332815]  __x64_sys_mkdir+0x1b/0x20
[88812.332819]  do_syscall_64+0x38/0x90
[88812.332822]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[88812.332827] RIP: 0033:0x7f52d25b1b07
[88812.332829] RSP: 002b:00007ffe695daaf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
[88812.332832] RAX: ffffffffffffffda RBX: 000055dd275d12a0 RCX: 00007f52d25b1b07
[88812.332833] RDX: 000055dd257b7a05 RSI: 00000000000001ff RDI: 000055dd2b63a400
[88812.332834] RBP: 0000000000000000 R08: 000055dd2ba7f228 R09: 0000000000000000
[88812.332836] R10: 0000000000000008 R11: 0000000000000246 R12: 000055dd2b63a400
[88812.332837] R13: 000055dd288ca5f8 R14: 000055dd2b862e58 R15: 00000000000001ff
[88933.162930] INFO: task pvesr:503547 blocked for more than 724 seconds.
[88933.163015]       Tainted: P           O      5.11.22-3-pve #1
[88933.163074] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[88933.163131] task:pvesr           state:D stack:    0 pid:503547 ppid:     1 flags:0x00000000
[88933.163135] Call Trace:
[88933.163139]  __schedule+0x2ca/0x880
[88933.163146]  schedule+0x4f/0xc0
[88933.163149]  rwsem_down_write_slowpath+0x212/0x590
[88933.163155]  down_write+0x43/0x50
[88933.163158]  filename_create+0x7e/0x160
[88933.163164]  do_mkdirat+0x58/0x140
[88933.163167]  __x64_sys_mkdir+0x1b/0x20
[88933.163170]  do_syscall_64+0x38/0x90
[88933.163174]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[88933.163178] RIP: 0033:0x7f52d25b1b07
[88933.163181] RSP: 002b:00007ffe695daaf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
[88933.163183] RAX: ffffffffffffffda RBX: 000055dd275d12a0 RCX: 00007f52d25b1b07
[88933.163185] RDX: 000055dd257b7a05 RSI: 00000000000001ff RDI: 000055dd2b63a400
[88933.163186] RBP: 0000000000000000 R08: 000055dd2ba7f228 R09: 0000000000000000
[88933.163188] R10: 0000000000000008 R11: 0000000000000246 R12: 000055dd2b63a400
[88933.163189] R13: 000055dd288ca5f8 R14: 000055dd2b862e58 R15: 00000000000001ff
[89053.993122] INFO: task pvesr:503547 blocked for more than 845 seconds.
[89053.993153]       Tainted: P           O      5.11.22-3-pve #1
[89053.993172] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[89053.993196] task:pvesr           state:D stack:    0 pid:503547 ppid:     1 flags:0x00000000
[89053.993200] Call Trace:
[89053.993203]  __schedule+0x2ca/0x880
[89053.993210]  schedule+0x4f/0xc0
[89053.993213]  rwsem_down_write_slowpath+0x212/0x590
[89053.993218]  down_write+0x43/0x50
[89053.993221]  filename_create+0x7e/0x160
[89053.993227]  do_mkdirat+0x58/0x140
[89053.993230]  __x64_sys_mkdir+0x1b/0x20
[89053.993233]  do_syscall_64+0x38/0x90
[89053.993237]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[89053.993241] RIP: 0033:0x7f52d25b1b07
[89053.993243] RSP: 002b:00007ffe695daaf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
[89053.993246] RAX: ffffffffffffffda RBX: 000055dd275d12a0 RCX: 00007f52d25b1b07
[89053.993248] RDX: 000055dd257b7a05 RSI: 00000000000001ff RDI: 000055dd2b63a400
[89053.993249] RBP: 0000000000000000 R08: 000055dd2ba7f228 R09: 0000000000000000
[89053.993250] R10: 0000000000000008 R11: 0000000000000246 R12: 000055dd2b63a400
[89053.993252] R13: 000055dd288ca5f8 R14: 000055dd2b862e58 R15: 00000000000001ff
[89174.823608] INFO: task pvesr:503547 blocked for more than 966 seconds.
[89174.823692]       Tainted: P           O      5.11.22-3-pve #1
[89174.823752] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[89174.823819] task:pvesr           state:D stack:    0 pid:503547 ppid:     1 flags:0x00000000
[89174.823823] Call Trace:
[89174.823827]  __schedule+0x2ca/0x880
[89174.823834]  schedule+0x4f/0xc0
[89174.823836]  rwsem_down_write_slowpath+0x212/0x590
[89174.823843]  down_write+0x43/0x50
[89174.823845]  filename_create+0x7e/0x160
[89174.823851]  do_mkdirat+0x58/0x140
[89174.823854]  __x64_sys_mkdir+0x1b/0x20
[89174.823857]  do_syscall_64+0x38/0x90
[89174.823861]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[89174.823865] RIP: 0033:0x7f52d25b1b07
[89174.823868] RSP: 002b:00007ffe695daaf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
[89174.823870] RAX: ffffffffffffffda RBX: 000055dd275d12a0 RCX: 00007f52d25b1b07
[89174.823872] RDX: 000055dd257b7a05 RSI: 00000000000001ff RDI: 000055dd2b63a400
[89174.823873] RBP: 0000000000000000 R08: 000055dd2ba7f228 R09: 0000000000000000
[89174.823875] R10: 0000000000000008 R11: 0000000000000246 R12: 000055dd2b63a400
[89174.823876] R13: 000055dd288ca5f8 R14: 000055dd2b862e58 R15: 00000000000001ff
[89295.654089] INFO: task pvesr:503547 blocked for more than 1087 seconds.
[89295.654136]       Tainted: P           O      5.11.22-3-pve #1
[89295.654155] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[89295.654178] task:pvesr           state:D stack:    0 pid:503547 ppid:     1 flags:0x00000000
[89295.654182] Call Trace:
[89295.654186]  __schedule+0x2ca/0x880
[89295.654193]  schedule+0x4f/0xc0
[89295.654195]  rwsem_down_write_slowpath+0x212/0x590
[89295.654202]  down_write+0x43/0x50
[89295.654205]  filename_create+0x7e/0x160
[89295.654211]  do_mkdirat+0x58/0x140
[89295.654214]  __x64_sys_mkdir+0x1b/0x20
[89295.654217]  do_syscall_64+0x38/0x90
[89295.654221]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[89295.654226] RIP: 0033:0x7f52d25b1b07
[89295.654228] RSP: 002b:00007ffe695daaf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
[89295.654231] RAX: ffffffffffffffda RBX: 000055dd275d12a0 RCX: 00007f52d25b1b07
[89295.654232] RDX: 000055dd257b7a05 RSI: 00000000000001ff RDI: 000055dd2b63a400
[89295.654234] RBP: 0000000000000000 R08: 000055dd2ba7f228 R09: 0000000000000000
[89295.654235] R10: 0000000000000008 R11: 0000000000000246 R12: 000055dd2b63a400
[89295.654236] R13: 000055dd288ca5f8 R14: 000055dd2b862e58 R15: 00000000000001ff
[89416.488541] INFO: task pvesr:503547 blocked for more than 1208 seconds.
[89416.488627]       Tainted: P           O      5.11.22-3-pve #1
[89416.488685] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[89416.488758] task:pvesr           state:D stack:    0 pid:503547 ppid:     1 flags:0x00000000
[89416.488770] Call Trace:
[89416.488777]  __schedule+0x2ca/0x880
[89416.488792]  schedule+0x4f/0xc0
[89416.488800]  rwsem_down_write_slowpath+0x212/0x590
[89416.488824]  down_write+0x43/0x50
[89416.488827]  filename_create+0x7e/0x160
[89416.488833]  do_mkdirat+0x58/0x140
[89416.488836]  __x64_sys_mkdir+0x1b/0x20
[89416.488839]  do_syscall_64+0x38/0x90
[89416.488843]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[89416.488847] RIP: 0033:0x7f52d25b1b07
[89416.488849] RSP: 002b:00007ffe695daaf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
[89416.488852] RAX: ffffffffffffffda RBX: 000055dd275d12a0 RCX: 00007f52d25b1b07
[89416.488853] RDX: 000055dd257b7a05 RSI: 00000000000001ff RDI: 000055dd2b63a400
[89416.488855] RBP: 0000000000000000 R08: 000055dd2ba7f228 R09: 0000000000000000
[89416.488856] R10: 0000000000000008 R11: 0000000000000246 R12: 000055dd2b63a400
[89416.488857] R13: 000055dd288ca5f8 R14: 000055dd2b862e58 R15: 00000000000001ff

Kernel used:
Linux proxmox 5.11.22-3-pve #1 SMP PVE 5.11.22-6 (Wed, 28 Jul 2021 10:51:12 +0200) x86_64 GNU/Linux

CPU:
Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz

Memory:
64GB

I updated the virtio drivers in the VM to the latest version, as recommended.

I have 3 servers with Proxmox 7; only the one that sometimes has high I/O (ZFS replication and sync) malfunctions.

I think the "pvesr" (Proxmox VE Storage Replication) process isn't working as expected.
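Each of these reports comes from the kernel's hung-task watchdog. `pvesr` is stuck in state D (uninterruptible sleep) inside `mkdir`, waiting to acquire a write lock (`rwsem_down_write_slowpath`), and the report repeats roughly every 120 seconds. To see how long each task stayed blocked across a long syslog, here is a minimal Python sketch. It assumes the exact message format shown above, and `longest_hangs` is a hypothetical helper:

```python
import re
from collections import defaultdict

# Matches hung-task reports such as:
# "[88329.013489] INFO: task pvesr:503547 blocked for more than 120 seconds."
HUNG_TASK = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def longest_hangs(lines):
    """Map (command, pid) to the longest reported blocked time in seconds."""
    worst = defaultdict(int)
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            key = (match["comm"], int(match["pid"]))
            worst[key] = max(worst[key], int(match["secs"]))
    return dict(worst)


if __name__ == "__main__":
    sample = [
        "[88329.013489] INFO: task pvesr:503547 blocked for more than 120 seconds.",
        "[88449.842943] INFO: task pvesr:503547 blocked for more than 241 seconds.",
        "[88329.013582]  schedule+0x4f/0xc0",  # call-trace lines are ignored
    ]
    print(longest_hangs(sample))
```

On a real host, you could pass it an open syslog file or the lines of `journalctl -k` output instead of the sample list.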
 

passedpawn1986

New Member
Jul 10, 2021
t.lamprecht said:
FYI, there's a newer kernel, package pve-kernel-5.11.22-3-pve version 5.11.22-6, which fixes an issue with unexpected EAGAIN errors that the io_uring kernel code received from some subsystems' softirq code paths.

Anyhow, please upgrade to that kernel and reboot into it; at the time of writing, the package is available in the pve-no-subscription repository.
It's hard to say in general, as quite a mix of kernel oopses has been posted in this thread, but at least some of the issues reported here should be gone.
Thank you! I updated to this yesterday and removed ,aio=native from my VM drive config.
So far it seems to be running well.
 

galeido

New Member
Aug 2, 2021
We have the same problem with version 7. Machines running kernel version 5.11.22-6 crash constantly, while 5.11.22-5 seems to work properly. Crashes also happen with the other 5.11.22-<minor> versions.

Intel-based hypervisors
 

Fabian_E

Proxmox Staff Member
Aug 1, 2019
Hi,
galeido said:
We have the same problem with version 7. Machines running kernel version 5.11.22-6 crash constantly, while 5.11.22-5 seems to work properly. Crashes also happen with the other 5.11.22-<minor> versions.

Intel-based hypervisors
So the only 5.11.22 kernel that's working is 5.11.22-5? Do you have a crash trace from the syslog? If you can't get one otherwise, try using netconsole.
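netconsole sends kernel log messages as UDP datagrams to another machine. This means an oops can be captured even when the frozen host never gets to write it to its own syslog. Here is a minimal Python receiver sketch. It assumes netconsole on the crashing host is already configured to target the collecting machine. `receive_netconsole` is a hypothetical helper, and the port you listen on must match the target port in that configuration (6666 is netconsole's documented default):

```python
from __future__ import annotations

import socket


def receive_netconsole(sock: socket.socket, max_messages: int) -> list[str]:
    """Collect up to max_messages UDP datagrams, one list entry per datagram.

    The socket should have a timeout set; otherwise recvfrom blocks forever.
    """
    messages: list[str] = []
    while len(messages) < max_messages:
        try:
            data, _sender = sock.recvfrom(4096)
        except socket.timeout:
            break  # nothing more arrived within the socket's timeout
        messages.append(data.decode("utf-8", errors="replace"))
    return messages


if __name__ == "__main__":
    # Loopback demo: the second socket stands in for the kernel's netconsole.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as rx, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as tx:
        rx.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        rx.settimeout(0.5)
        tx.sendto(b"[  123.456] example kernel line\n", rx.getsockname())
        print(receive_netconsole(rx, max_messages=10))
```

On an actual collector, you would bind to `("0.0.0.0", 6666)` instead of the loopback address and write each message to a file as it arrives.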
 

galeido

New Member
Aug 2, 2021
Yes, the only stable kernel seems to be 5.11.22-5; for example, hosts on 5.11.22-1 seem to get stuck under load. To add to the oddity, we have one host that doesn't run stably even on the -5 kernel; it crashes when load is generated over ZFS or NFS. I can't replicate that problem on other machines with the same kernel version.

I'll try to capture a crash trace later this week; there's nothing significant visible in the syslog. The machine just freezes.

PS: The problem also seems to be limited to machines running ZFS + NFS. Machines with hardware RAID + NFS work normally.
 
