My server keeps crashing, how do I fix it?

frenk970

Jan 20, 2020
Good morning,
I have problems with my server (CPU i9-9900K, 32 GB RAM, 512 GB Sabrent NVMe SSD for the OS, 2 × 512 GB Crucial SATA SSDs as a 1 TB zfspool for the VMs, GTX 1660 GPU passed through to the Windows VM). It keeps crashing, and the message I get says the problem lies in some sectors of the NVMe disk, but I'm not sure that's really the cause.
My VMs are: 1 Ubuntu 20.10 (4 GB RAM, 128 GB disk), 1 Ubuntu 20.04 (4 GB RAM, 128 GB disk), 1 Windows Server 2019 (16 GB RAM with 8 GB ballooning, 300 GB disk) and 1 HassOS VM (3 GB RAM, 128 GB disk).

I hope it's not because too much RAM is in use, but it crashes even when all the VMs are turned off. Is the NVMe disk not well supported by Proxmox, or do I need to optimize something? Can anyone help me understand?

I'm trying to build a configuration that I can later reproduce for users (for now this is just a test server).
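A quick check of the memory numbers above, as a small Python sketch (assuming the "8 GB ballooning" means the Windows VM's minimum memory is 8 GB): the VMs can claim up to 27 GB of the 32 GB, leaving about 5 GB for Proxmox itself and the ZFS cache of the zfspool, or about 13 GB if Windows balloons down to its minimum. Since the crashes also happen with every VM off, VM memory alone is unlikely to be the whole story.

```python
from __future__ import annotations

# Memory configured for each VM in the post above, in GB. Assumption: the
# Windows VM uses ballooning with 16 GB maximum and 8 GB minimum.
vms = {
    "Ubuntu 20.10": 4,
    "Ubuntu 20.04": 4,
    "Windows Server 2019": 16,
    "HassOS": 3,
}
HOST_RAM_GB = 32


def memory_headroom(vm_ram: dict[str, int], host_ram: int) -> int:
    """GB left for the host after every VM gets the memory listed for it."""
    return host_ram - sum(vm_ram.values())


assigned = sum(vms.values())
print(f"assigned to VMs: {assigned} GB, left for the host: {memory_headroom(vms, HOST_RAM_GB)} GB")
# Same calculation with the Windows VM ballooned down to its 8 GB minimum.
ballooned = {**vms, "Windows Server 2019": 8}
print(f"left for the host (ballooned): {memory_headroom(ballooned, HOST_RAM_GB)} GB")
```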
 
Can you please post the exact error that you got?
 
I'm running some tests, and it hasn't crashed for 2 or 3 days since I increased the swap partition. I hope my theory isn't correct: that I have too little RAM, and that when the system starts using swap it fills it up, can no longer write to it, and the whole system crashes.
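One way to test the swap theory is to record memory and swap usage over time and see whether swap is nearly full just before a crash. On Linux, running out of RAM and swap usually makes the kernel's OOM killer terminate a process (often the biggest VM) and log it, rather than freezing the whole host, so a full swap would normally leave traces in the log. A minimal sketch, assuming Python 3 on the Proxmox host; `parse_meminfo` and `swap_used_percent` are hypothetical helper names, not part of Proxmox.

```python
from __future__ import annotations

import os


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: first numeric value}.

    Most fields are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    values: dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            values[key.strip()] = int(parts[0])
    return values


def swap_used_percent(info: dict[str, int]) -> float:
    """Percentage of configured swap currently in use (0.0 if there is no swap)."""
    total = info.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return 100.0 * (total - info.get("SwapFree", 0)) / total


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    # Run this periodically (e.g. from cron) and keep the output to see a trend.
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    print(f"MemAvailable: {info.get('MemAvailable', 0) // 1024} MiB, "
          f"swap used: {swap_used_percent(info):.1f}%")
```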
 
From the log, how can I tell when it crashed? The email only arrives after everything has started up again.
This is the email that arrives when it crashes:

This message was generated by the smartd daemon running on:

host name: pve
DNS domain: local

The following warning/error was logged by the smartd daemon:

Device: /dev/nvme0, number of Error Log entries increased from 244 to 245

Device info:
Sabrent, S/N:17A807051CBE02057204, FW:RKT303.3, 512 GB

For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
The original message about this issue was sent at Sat Mar 20 11:42:58 2021 CET
Another message will be sent in 24 hours if the problem persists.
 
Post the output of those 2 commands above, please.

Here are the outputs of the two commands:
root@pve:~# smartctl -a /dev/nvme0
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.4.103-1-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number: Sabrent
Serial Number: 17A807051CBE02057204
Firmware Version: RKT303.3
PCI Vendor/Subsystem ID: 0x1987
IEEE OUI Identifier: 0x6479a7
Total NVM Capacity: 512,110,190,592 [512 GB]
Unallocated NVM Capacity: 0
Controller ID: 1
NVMe Version: 1.3
Number of Namespaces: 1
Namespace 1 Size/Capacity: 512,110,190,592 [512 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 6479a7 366143df74
Local Time is: Mon Apr 12 22:55:07 2021 CEST
Firmware Updates (0x12): 1 Slot, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005d): Comp DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x08): Telmtry_Lg
Maximum Data Transfer Size: 512 Pages
Warning Comp. Temp. Threshold: 75 Celsius
Critical Comp. Temp. Threshold: 80 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     6.80W       -        -    0  0  0  0        0       0
 1 +     5.74W       -        -    1  1  1  1        0       0
 2 +     5.21W       -        -    2  2  2  2        0       0
 3 -   0.0490W       -        -    3  3  3  3     2000    2000
 4 -   0.0018W       -        -    4  4  4  4    25000   25000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         1

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 29 Celsius
Available Spare: 100%
Available Spare Threshold: 5%
Percentage Used: 9%
Data Units Read: 4,384,807 [2.24 TB]
Data Units Written: 8,810,231 [4.51 TB]
Host Read Commands: 63,736,580
Host Write Commands: 110,961,858
Controller Busy Time: 843
Power Cycles: 192
Power On Hours: 5,279
Unsafe Shutdowns: 184
Media and Data Integrity Errors: 0
Error Information Log Entries: 245
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0

Error Information (NVMe Log 0x01, 16 of 63 entries)
No Errors Logged
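For anyone reading the health section above: the fields that usually say the most about a failing NVMe drive are Critical Warning, Available Spare versus its threshold, and Media and Data Integrity Errors, and here they all look fine (0x00, 100% vs 5%, 0). Error Information Log Entries (245) is a lifetime count of entries in the controller's error log, which can also grow from commands the controller rejected, so on its own it doesn't prove the flash is bad; `nvme error-log /dev/nvme0` from nvme-cli shows what those entries are. Unsafe Shutdowns is 184 out of 192 power cycles, which fits a host that is crashing or being reset rather than shut down cleanly. A minimal Python sketch that pulls these fields out of saved `smartctl -a` text; the field list and the `media_looks_bad` heuristic are my own illustrative choices, not something smartmontools defines.

```python
from __future__ import annotations

import re

# Health fields from the "SMART/Health Information" section of `smartctl -a`
# that say the most about whether an NVMe drive is wearing out or failing.
WATCH = {
    "Critical Warning",
    "Available Spare",
    "Available Spare Threshold",
    "Percentage Used",
    "Media and Data Integrity Errors",
    "Error Information Log Entries",
    "Unsafe Shutdowns",
}

# Either a hex value (0x00) or a decimal that may contain thousands separators.
NUMBER = re.compile(r"0x[0-9a-fA-F]+|\d[\d,]*")


def parse_health(text: str) -> dict[str, int]:
    """Return {field: integer value} for each watched field found in the text."""
    health: dict[str, int] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        m = NUMBER.search(value)
        if sep and key in WATCH and m:
            token = m.group(0)
            health[key] = int(token, 16) if token.startswith("0x") else int(token.replace(",", ""))
    return health


def media_looks_bad(health: dict[str, int]) -> bool:
    """Heuristic: a critical warning bit, any media error, or spare below threshold."""
    return (
        health.get("Critical Warning", 0) != 0
        or health.get("Media and Data Integrity Errors", 0) > 0
        or health.get("Available Spare", 100) < health.get("Available Spare Threshold", 0)
    )
```

Usage, assuming the output was saved with `smartctl -a /dev/nvme0 > smart.txt`: `media_looks_bad(parse_health(open("smart.txt").read()))`.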

root@pve:~# dmesg --level=err,warn
[ 0.000000] secureboot: Secure boot could not be determined (mode 0)
[ 0.007586] secureboot: Secure boot could not be determined (mode 0)
[ 0.204608] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
[ 1.033782] platform eisa.0: EISA: Cannot allocate resource for mainboard
[ 1.033783] platform eisa.0: Cannot allocate resource for EISA slot 1
[ 1.033784] platform eisa.0: Cannot allocate resource for EISA slot 2
[ 1.033784] platform eisa.0: Cannot allocate resource for EISA slot 3
[ 1.033785] platform eisa.0: Cannot allocate resource for EISA slot 4
[ 1.033785] platform eisa.0: Cannot allocate resource for EISA slot 5
[ 1.033786] platform eisa.0: Cannot allocate resource for EISA slot 6
[ 1.033786] platform eisa.0: Cannot allocate resource for EISA slot 7
[ 1.033787] platform eisa.0: Cannot allocate resource for EISA slot 8
[ 1.249229] nvme nvme0: missing or invalid SUBNQN field.
[ 1.344629] acpi PNP0C14:02: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:01)
[ 1.349901] usb: port power management may be unreliable
[ 1.729454] ata1.00: supports DRM functions and may not be fully accessible
[ 1.734271] ata1.00: supports DRM functions and may not be fully accessible
[ 1.734293] ata3.00: supports DRM functions and may not be fully accessible
[ 1.736174] ata3.00: supports DRM functions and may not be fully accessible
[ 2.804887] hid-generic 0003:1532:0531.0002: No inputs registered, leaving
[ 3.653605] spl: loading out-of-tree module taints kernel.
[ 3.654705] znvpair: module license 'CDDL' taints kernel.
[ 3.654706] Disabling lock debugging due to kernel taint
[ 4.002017] sd 6:0:0:0: [sdd] No Caching mode page found
[ 4.002023] sd 6:0:0:0: [sdd] Assuming drive cache: write through
[ 4.711057] nvidia-gpu 0000:01:00.3: i2c timeout error e0000000
[ 4.711074] ucsi_ccg 0-0008: i2c_transfer failed -110
[ 4.711088] ucsi_ccg 0-0008: ucsi_ccg_init failed - -110
[ 4.711104] ucsi_ccg: probe of 0-0008 failed with error -110
[ 5.335052] usb 5-1.4: current rate 16000 is different from the runtime rate 24000
[ 5.340775] usb 5-1.4: current rate 16000 is different from the runtime rate 32000
[ 5.347047] usb 5-1.4: current rate 16000 is different from the runtime rate 48000
[ 37.447068] new mount options do not match the existing superblock, will be ignored
[ 37.794381] Started bpfilter
[ 38.842825] usb 5-1.4: current rate 16000 is different from the runtime rate 48000
[ 38.850711] usb 5-1.4: current rate 16000 is different from the runtime rate 48000
[ 38.873697] usb 5-1.4: current rate 16000 is different from the runtime rate 48000
[ 39.063637] xhci_hcd 0000:02:00.0: WARNING: Host System Error
[ 39.063662] DMAR: DRHD: handling fault status reg 2
[ 39.064239] DMAR: [DMA Read] Request device [02:00.0] PASID ffffffff fault addr fff02000 [fault reason 06] PTE Read access is not set
[ 39.079643] xhci_hcd 0000:02:00.0: Host halt failed, -110
[ 45.340613] usb 5-1.4: timeout: still 12 active urbs on EP #84
[ 46.340658] usb 5-1.4: timeout: still 12 active urbs on EP #84
[ 49.536655] xhci_hcd 0000:02:00.0: xHCI host not responding to stop endpoint command.
[ 49.552661] xhci_hcd 0000:02:00.0: Host halt failed, -110
[ 49.552662] xhci_hcd 0000:02:00.0: xHCI host controller not responding, assume dead
[ 49.552692] xhci_hcd 0000:02:00.0: HC died; cleaning up
[ 50.298281] xhci_hcd 0000:02:00.0: Host halt failed, -110
[ 50.298283] xhci_hcd 0000:02:00.0: Host controller not halted, aborting reset.
[ 56.205405] xpad 1-4:1.0: xpad_try_sending_next_out_packet - usb_submit_urb failed with result -2
[ 255.861236] kvm [4419]: vcpu0, guest rIP: 0xfffff8036af3501b ignored rdmsr: 0x1ad
[ 255.861295] kvm [4419]: vcpu0, guest rIP: 0xfffff8036af3501b ignored rdmsr: 0x1a2
[ 255.861352] kvm [4419]: vcpu1, guest rIP: 0xfffff8036af3501b ignored rdmsr: 0x1ad
[ 255.861387] kvm [4419]: vcpu1, guest rIP: 0xfffff8036af3501b ignored rdmsr: 0x1a2
[ 255.861462] kvm [4419]: vcpu2, guest rIP: 0xfffff8036af3501b ignored rdmsr: 0x1ad
[ 255.861488] kvm [4419]: vcpu2, guest rIP: 0xfffff8036af3501b ignored rdmsr: 0x1a2
[ 255.861541] kvm [4419]: vcpu3, guest rIP: 0xfffff8036af3501b ignored rdmsr: 0x1ad
[ 255.861567] kvm [4419]: vcpu3, guest rIP: 0xfffff8036af3501b ignored rdmsr: 0x1a2
[ 255.910189] kvm [4419]: vcpu4, guest rIP: 0xfffff8036af3501b ignored rdmsr: 0x1ad
[ 255.910217] kvm [4419]: vcpu4, guest rIP: 0xfffff8036af3501b ignored rdmsr: 0x1a2
[ 309.645365] kvm_get_msr_common: 6 callbacks suppressed
[ 309.645366] kvm [4419]: vcpu0, guest rIP: 0xfffff8036af3501b ignored rdmsr: 0x1ad
[ 309.645395] kvm [4419]: vcpu0, guest rIP: 0xfffff8036af3501b ignored rdmsr: 0x1a2
[ 309.666966] kvm [4419]: vcpu1, guest rIP: 0xfffff8036af3501b ignored rdmsr: 0x1ad
[ 309.666995] kvm [4419]: vcpu1, guest rIP: 0xfffff8036af3501b ignored rdmsr: 0x1a2
[ 309.667718] kvm [4419]: vcpu2, guest rIP: 0xfffff8036af3501b ignored rdmsr: 0x1ad
[ 309.667753] kvm [4419]: vcpu2, guest rIP: 0xfffff8036af3501b ignored rdmsr: 0x1a2
[ 309.668441] kvm [4419]: vcpu3, guest rIP: 0xfffff8036af3501b ignored rdmsr: 0x1ad
[ 309.668480] kvm [4419]: vcpu3, guest rIP: 0xfffff8036af3501b ignored rdmsr: 0x1a2
[ 309.668533] kvm [4419]: vcpu4, guest rIP: 0xfffff8036af3501b ignored rdmsr: 0x1ad
[ 309.668561] kvm [4419]: vcpu4, guest rIP: 0xfffff8036af3501b ignored rdmsr: 0x1a2
[ 2140.610800] kvm_get_msr_common: 6 callbacks suppressed
[ 2140.610801] kvm [4419]: vcpu0, guest rIP: 0xfffff8036af4501b ignored rdmsr: 0x1ad
[ 2140.610924] kvm [4419]: vcpu0, guest rIP: 0xfffff8036af4501b ignored rdmsr: 0x1a2
[ 2140.620807] kvm [4419]: vcpu1, guest rIP: 0xfffff8036af4501b ignored rdmsr: 0x1ad
[ 2140.620836] kvm [4419]: vcpu1, guest rIP: 0xfffff8036af4501b ignored rdmsr: 0x1a2
[ 2140.620870] kvm [4419]: vcpu2, guest rIP: 0xfffff8036af4501b ignored rdmsr: 0x1ad
[ 2140.620894] kvm [4419]: vcpu2, guest rIP: 0xfffff8036af4501b ignored rdmsr: 0x1a2
[ 2140.634778] kvm [4419]: vcpu3, guest rIP: 0xfffff8036af4501b ignored rdmsr: 0x1ad
[ 2140.634806] kvm [4419]: vcpu3, guest rIP: 0xfffff8036af4501b ignored rdmsr: 0x1a2
[ 2140.634881] kvm [4419]: vcpu4, guest rIP: 0xfffff8036af4501b ignored rdmsr: 0x1ad
[ 2140.634904] kvm [4419]: vcpu4, guest rIP: 0xfffff8036af4501b ignored rdmsr: 0x1a2
[ 3971.482443] kvm_get_msr_common: 6 callbacks suppressed
[ 3971.482444] kvm [4419]: vcpu0, guest rIP: 0xfffff8036b03501b ignored rdmsr: 0x1ad
[ 3971.482483] kvm [4419]: vcpu0, guest rIP: 0xfffff8036b03501b ignored rdmsr: 0x1a2
[ 3971.482527] kvm [4419]: vcpu1, guest rIP: 0xfffff8036b03501b ignored rdmsr: 0x1ad
[ 3971.482551] kvm [4419]: vcpu1, guest rIP: 0xfffff8036b03501b ignored rdmsr: 0x1a2
[ 3971.489821] kvm [4419]: vcpu2, guest rIP: 0xfffff8036b03501b ignored rdmsr: 0x1ad
[ 3971.489845] kvm [4419]: vcpu2, guest rIP: 0xfffff8036b03501b ignored rdmsr: 0x1a2
[ 3971.496449] kvm [4419]: vcpu3, guest rIP: 0xfffff8036b03501b ignored rdmsr: 0x1ad
[ 3971.496486] kvm [4419]: vcpu3, guest rIP: 0xfffff8036b03501b ignored rdmsr: 0x1a2
[ 3971.502108] kvm [4419]: vcpu4, guest rIP: 0xfffff8036b03501b ignored rdmsr: 0x1ad
[ 3971.502144] kvm [4419]: vcpu4, guest rIP: 0xfffff8036b03501b ignored rdmsr: 0x1a2
[ 5802.699553] kvm_get_msr_common: 6 callbacks suppressed
[ 5802.699554] kvm [4419]: vcpu0, guest rIP: 0xfffff8036af3501b ignored rdmsr: 0x1ad
[ 5802.699582] kvm [4419]: vcpu0, guest rIP: 0xfffff8036af3501b ignored rdmsr: 0x1a2
[ 5802.699648] kvm [4419]: vcpu1, guest rIP: 0xfffff8036af3501b ignored rdmsr: 0x1ad
[ 5802.699675] kvm [4419]: vcpu1, guest rIP: 0xfffff8036af3501b ignored rdmsr: 0x1a2
[ 5802.699746] kvm [4419]: vcpu2, guest rIP: 0xfffff8036af3501b ignored rdmsr: 0x1ad
[ 5802.699770] kvm [4419]: vcpu2, guest rIP: 0xfffff8036af3501b ignored rdmsr: 0x1a2
[ 5802.699922] kvm [4419]: vcpu3, guest rIP: 0xfffff8036af3501b ignored rdmsr: 0x1ad
[ 5802.699946] kvm [4419]: vcpu3, guest rIP: 0xfffff8036af3501b ignored rdmsr: 0x1a2
[ 5802.699976] kvm [4419]: vcpu4, guest rIP: 0xfffff8036af3501b ignored rdmsr: 0x1ad
[ 5802.700007] kvm [4419]: vcpu4, guest rIP: 0xfffff8036af3501b ignored rdmsr: 0x1a2
[ 7635.295340] kvm_get_msr_common: 6 callbacks suppressed
[ 7635.295341] kvm [4419]: vcpu0, guest rIP: 0xfffff8036af4501b ignored rdmsr: 0x1ad
[ 7635.295966] kvm [4419]: vcpu0, guest rIP: 0xfffff8036af4501b ignored rdmsr: 0x1a2
[ 7635.296605] kvm [4419]: vcpu1, guest rIP: 0xfffff8036af4501b ignored rdmsr: 0x1ad
[ 7635.297139] kvm [4419]: vcpu1, guest rIP: 0xfffff8036af4501b ignored rdmsr: 0x1a2
[ 7635.297714] kvm [4419]: vcpu2, guest rIP: 0xfffff8036af4501b ignored rdmsr: 0x1ad
[ 7635.298223] kvm [4419]: vcpu2, guest rIP: 0xfffff8036af4501b ignored rdmsr: 0x1a2
[ 7635.407047] kvm [4419]: vcpu3, guest rIP: 0xfffff8036af4501b ignored rdmsr: 0x1ad
[ 7635.407554] kvm [4419]: vcpu3, guest rIP: 0xfffff8036af4501b ignored rdmsr: 0x1a2
[ 7635.408088] kvm [4419]: vcpu4, guest rIP: 0xfffff8036af4501b ignored rdmsr: 0x1ad
[ 7635.408592] kvm [4419]: vcpu4, guest rIP: 0xfffff8036af4501b ignored rdmsr: 0x1a2
[ 9465.370559] kvm_get_msr_common: 6 callbacks suppressed
[ 9465.370560] kvm [4419]: vcpu0, guest rIP: 0xfffff8036b03501b ignored rdmsr: 0x1ad
[ 9465.371133] kvm [4419]: vcpu0, guest rIP: 0xfffff8036b03501b ignored rdmsr: 0x1a2
[ 9465.371694] kvm [4419]: vcpu1, guest rIP: 0xfffff8036b03501b ignored rdmsr: 0x1ad
[ 9465.372190] kvm [4419]: vcpu1, guest rIP: 0xfffff8036b03501b ignored rdmsr: 0x1a2
[ 9465.372680] kvm [4419]: vcpu2, guest rIP: 0xfffff8036b03501b ignored rdmsr: 0x1ad
[ 9465.373148] kvm [4419]: vcpu2, guest rIP: 0xfffff8036b03501b ignored rdmsr: 0x1a2
[ 9465.373628] kvm [4419]: vcpu3, guest rIP: 0xfffff8036b03501b ignored rdmsr: 0x1ad
[ 9465.374097] kvm [4419]: vcpu3, guest rIP: 0xfffff8036b03501b ignored rdmsr: 0x1a2
[ 9465.395315] kvm [4419]: vcpu4, guest rIP: 0xfffff8036b03501b ignored rdmsr: 0x1ad
[ 9465.395775] kvm [4419]: vcpu4, guest rIP: 0xfffff8036b03501b ignored rdmsr: 0x1a2
[11297.788648] kvm_get_msr_common: 6 callbacks suppressed
[11297.788649] kvm [4419]: vcpu0, guest rIP: 0xfffff8036af4501b ignored rdmsr: 0x1ad
[11297.789189] kvm [4419]: vcpu0, guest rIP: 0xfffff8036af4501b ignored rdmsr: 0x1a2
[11297.789721] kvm [4419]: vcpu1, guest rIP: 0xfffff8036af4501b ignored rdmsr: 0x1ad
[11297.790188] kvm [4419]: vcpu1, guest rIP: 0xfffff8036af4501b ignored rdmsr: 0x1a2
[11297.790868] kvm [4419]: vcpu2, guest rIP: 0xfffff8036af4501b ignored rdmsr: 0x1ad
[11297.791333] kvm [4419]: vcpu2, guest rIP: 0xfffff8036af4501b ignored rdmsr: 0x1a2
[11297.791836] kvm [4419]: vcpu3, guest rIP: 0xfffff8036af4501b ignored rdmsr: 0x1ad
[11297.792295] kvm [4419]: vcpu3, guest rIP: 0xfffff8036af4501b ignored rdmsr: 0x1a2
[11297.792829] kvm [4419]: vcpu4, guest rIP: 0xfffff8036af4501b ignored rdmsr: 0x1ad
[11297.793284] kvm [4419]: vcpu4, guest rIP: 0xfffff8036af4501b ignored rdmsr: 0x1a2
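Most of this output is the Windows guest reading model-specific registers (0x1ad, 0x1a2) that KVM ignores; those messages are usually harmless and simply repeat roughly every 30 minutes. The part that stands out is around 39 s: a DMAR (IOMMU) fault for device [02:00.0], followed by xhci_hcd 0000:02:00.0 reporting "Host System Error" and finally "HC died". A minimal Python sketch for pulling such lines out of a long log; the keyword lists are assumptions based on this particular log, and `filter_dmesg.py` is a hypothetical file name.

```python
from __future__ import annotations

import re
import sys

# Repetitive messages treated as noise here: the Windows guest reading MSRs
# that KVM ignores, plus the rate-limit summaries KVM prints for them.
NOISE = re.compile(r"ignored rdmsr|callbacks suppressed")

# Keywords worth a closer look when hunting a host crash. This list is an
# assumption based on the log above, not an exhaustive rule set.
SUSPECT = re.compile(r"xhci_hcd|DMAR|nvme|HC died|not responding", re.IGNORECASE)


def interesting_lines(dmesg: str) -> list[str]:
    """Keep lines that match a suspect keyword and are not known noise."""
    return [
        line
        for line in dmesg.splitlines()
        if not NOISE.search(line) and SUSPECT.search(line)
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: dmesg --level=err,warn > dmesg.txt; python3 filter_dmesg.py dmesg.txt
    with open(sys.argv[1]) as f:
        for line in interesting_lines(f.read()):
            print(line)
```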
 
lspci
apt-get install nvme-cli
nvme error-log /dev/nvme0
nvme error-log /dev/nvme0n1

Can you post the outputs (except for the apt install)?

And by the way, something is wrong with your USB controller: either the kernel doesn't support it or it's broken, I don't know.

You could install pve-kernel-5.11; it will probably help with the USB issue, but your description sounds more like the NVMe drive is dying. I would back up everything on it while you still can. I'm not 100% sure, it could also be the USB controller, but in my opinion the system shouldn't crash or freeze just because of the USB controller.

Cheers
 

root@pve:~# lspci
00:00.0 Host bridge: Intel Corporation 8th Gen Core 8-core Desktop Processor Host Bridge/DRAM Registers [Coffee Lake S] (rev 0d)
00:01.0 PCI bridge: Intel Corporation Skylake PCIe Controller (x16) (rev 0d)
00:02.0 VGA compatible controller: Intel Corporation Device 3e98 (rev 02)
00:12.0 Signal processing controller: Intel Corporation Cannon Lake PCH Thermal Controller (rev 10)
00:14.0 USB controller: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller (rev 10)
00:14.2 RAM memory: Intel Corporation Cannon Lake PCH Shared SRAM (rev 10)
00:16.0 Communication controller: Intel Corporation Cannon Lake PCH HECI Controller (rev 10)
00:17.0 SATA controller: Intel Corporation Cannon Lake PCH SATA AHCI Controller (rev 10)
00:1b.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port (rev f0)
00:1b.4 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port (rev f0)
00:1d.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port (rev f0)
00:1f.0 ISA bridge: Intel Corporation Z390 Chipset LPC/eSPI Controller (rev 10)
00:1f.3 Audio device: Intel Corporation Cannon Lake PCH cAVS (rev 10)
00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
00:1f.5 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH SPI Controller (rev 10)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (7) I219-V (rev 10)
01:00.0 VGA compatible controller: NVIDIA Corporation TU116 [GeForce GTX 1660] (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 1aeb (rev a1)
01:00.2 USB controller: NVIDIA Corporation Device 1aec (rev a1)
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device 1aed (rev a1)
02:00.0 USB controller: VIA Technologies, Inc. VL805 USB 3.0 Host Controller (rev 01)
04:00.0 Non-Volatile memory controller: Phison Electronics Corporation E12 NVMe Controller (rev 01)
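Cross-referencing the two outputs: the DMAR fault in the dmesg output names device [02:00.0], and this list shows that 02:00.0 is the VIA VL805 USB 3.0 controller, the same device whose xhci_hcd driver later reported "HC died". A minimal Python sketch that does this lookup automatically; it assumes plain `lspci` output without the PCI domain prefix (as printed above, not `lspci -D`), and the helper names are hypothetical.

```python
from __future__ import annotations

import re

# A PCI address as printed by plain `lspci` (bus:device.function, no domain).
ADDR = r"[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]"


def parse_lspci(text: str) -> dict[str, str]:
    """Map each PCI address in `lspci` output to its description."""
    devices: dict[str, str] = {}
    for line in text.splitlines():
        m = re.match(rf"({ADDR})\s+(.+)", line.strip())
        if m:
            devices[m.group(1)] = m.group(2)
    return devices


def dmar_fault_devices(dmesg: str) -> set[str]:
    """Collect device addresses named in DMAR (IOMMU) fault messages."""
    return set(re.findall(rf"Request device \[({ADDR})\]", dmesg))


def explain_faults(lspci: str, dmesg: str) -> dict[str, str]:
    """Pair each faulting device address with its lspci description."""
    devices = parse_lspci(lspci)
    return {addr: devices.get(addr, "unknown device") for addr in dmar_fault_devices(dmesg)}
```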
 
I installed pve-kernel-5.11 and I'm now running kernel 5.11.7-1-pve; let's see how it behaves compared to 5.4.103-1-pve.
 
Your NVMe drive looks okay.

02:00.0 USB controller: VIA Technologies, Inc. VL805 USB 3.0 Host Controller (rev 01)

It's probably this thing. If it crashes again, try unplugging it; it looks like a PCIe USB card to me.
 
No, that can't be it: this is a physical card that I pass through to the Windows VM to handle some devices that don't work when I pass them through individually from Proxmox.
Besides, it crashed even before I installed that card.
 
Since upgrading to kernel 5.11 (pve-kernel-5.11.7-1-pve: 5.11.7-1~bpo10), it has run for 3 days and 8 hours without crashing.
 
I had something similar after upgrading to the latest version 6.3-6 + pve-kernel 5.4.106-1.
After a reboot it just hung at "Loading initial ramdisk": the disk LED flickered sometimes, then it hung solid. There was also one occurrence where the LED was flickering and it just stopped in that state. After an additional reset the host booted fine. I could reproduce this multiple times: after I had worked in a Windows client for some time, the host froze up again, with no strange log entries visible. I'm also passing through an AMD GPU and some USB controllers and wireless devices to the Windows client.

The solution was also to install the test kernel version 5.11 (pve-kernel-5.11.7-1-pve: 5.11.7-1~bpo10); the machine is as solid as before now.
 
