PVE 7.4 host freezes randomly

Pandaaaa906

New Member
May 15, 2022
CPU: AMD R9 5900HX
MEM: 32G

It just freezes randomly, sometimes a couple of hours and sometimes only minutes after boot.

A couple of days ago I found that it would freeze every time I moved a VM disk from one physical disk to another.
I think I fixed that by changing the dirty* options in /etc/sysctl.conf,
but I still get random freezes.
Code:
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
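
For anyone following along, here is a rough sketch of one way to confirm that values like these are really active after editing the file (`sysctl -p` reloads /etc/sysctl.conf). The helper name `check_dirty` and its two optional parameters are made up for illustration; it only compares the configured values with what the running kernel reports under /proc/sys/vm.

```shell
# Compare vm.dirty_* values in a sysctl config file with the running kernel's values.
# check_dirty is a hypothetical helper; conf and procdir are parameters so the
# check can also be pointed at copies of the files.
check_dirty() {
    conf="${1:-/etc/sysctl.conf}"
    procdir="${2:-/proc/sys/vm}"
    for key in dirty_background_ratio dirty_ratio; do
        # Last assignment in the file for this key, with the "vm.<key> =" part stripped
        want=$(sed -n "s/^vm\.$key[[:space:]]*=[[:space:]]*//p" "$conf" | tail -n 1)
        # Value the kernel is using right now
        have=$(cat "$procdir/$key" 2>/dev/null)
        if [ "$want" = "$have" ]; then
            echo "vm.$key: $have (matches $conf)"
        else
            echo "vm.$key: running=$have configured=$want"
        fi
    done
}

# Typical use on the host, as root:
#   sysctl -p      # reload /etc/sysctl.conf
#   check_dirty    # then compare against /proc/sys/vm
```

If the two values differ after a reload, the setting is probably being overridden somewhere else, for example by a file under /etc/sysctl.d/.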

All of this seems to have started after I added two more disks and set up ZFS raidz1.
But it still freezes even after I export the ZFS pool and with no VM started.
Could someone help me? Where should I look?

I have tried:
resetting the BIOS settings to default
removing all settings in /etc/modprobe.d and /etc/modules


BIOS reset, GRUB config reset, and kdump enabled

pageshot of 'pve - Proxmox Virtual Environment' @ 2023-05-27-1154'22.png

/etc/default/grub
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet nomodeset crashkernel=1024M"
# GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt pcie_acs_override=downstream,multifunction video=vesafb:off video=efifb:off video=simplefb:off video=vesa:off initcall_blacklist=sysfb_init"

I have kdump enabled, but it captured nothing
Code:
dmesg -HT | grep crash
[Sat May 27 10:02:00 2023] Command line: BOOT_IMAGE=/boot/vmlinuz-6.2.6-1-pve root=/dev/mapper/pve-root ro quiet nomodeset crashkernel=1024M crashkernel=384M-:128M
[Sat May 27 10:02:00 2023] Reserving 128MB of memory at 3552MB for crashkernel (System RAM: 30617MB)
[Sat May 27 10:02:00 2023] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.2.6-1-pve root=/dev/mapper/pve-root ro quiet nomodeset crashkernel=1024M crashkernel=384M-:128M
[Sat May 27 10:02:04 2023] pstore: Using crash dump compression: deflate
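
One thing that stands out in this output: the command line carries two crashkernel= parameters, and the 128MB reservation corresponds to the second one (384M-:128M), not to the 1024M set in /etc/default/grub. The second one may have been appended by a file under /etc/default/grub.d/, but that is an assumption worth checking on the host. A small sketch to show which value comes last on a command line; `last_crashkernel` is a made-up helper name.

```shell
# Print the value of the last crashkernel= parameter found in a kernel command line.
# last_crashkernel is a hypothetical helper, not part of kdump-tools.
last_crashkernel() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^crashkernel=//p' | tail -n 1
}

# Command line copied from the dmesg output above.
# On the live host this would be: last_crashkernel "$(cat /proc/cmdline)"
cmdline='BOOT_IMAGE=/boot/vmlinuz-6.2.6-1-pve root=/dev/mapper/pve-root ro quiet nomodeset crashkernel=1024M crashkernel=384M-:128M'
last_crashkernel "$cmdline"   # prints 384M-:128M
```

If a larger reservation is wanted, the place that adds the second parameter would need to be changed, followed by `update-grub` and a reboot.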

Code:
ll /var/crash/
total 4.0K
-rw-r--r-- 1 root root   0 May 27 10:02 kdump_lock
-rw-r--r-- 1 root root 276 May 27 10:02 kexec_cmd

dmesg shows one error, but it seems irrelevant.
Code:
dmesg -HT | grep error
[Sat May 27 10:02:00 2023] ACPI Error: Aborting method \_SB.GPIO._EVT due to previous error (AE_NOT_EXIST) (20221020/psparse-529)

Code:
lsblk
NAME                         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                            8:0    0 111.8G  0 disk
├─sda1                         8:1    0   512M  0 part
└─sda2                         8:2    0 111.3G  0 part
sdb                            8:16   0   3.6T  0 disk
├─sdb1                         8:17   0    16M  0 part
└─sdb2                         8:18   0   3.6T  0 part
sdc                            8:32   0  10.9T  0 disk
├─sdc1                         8:33   0  10.9T  0 part
└─sdc9                         8:41   0     8M  0 part
sdd                            8:48   0  10.9T  0 disk
├─sdd1                         8:49   0  10.9T  0 part
└─sdd9                         8:57   0     8M  0 part
sde                            8:64   0  10.9T  0 disk
├─sde1                         8:65   0  10.9T  0 part
└─sde9                         8:73   0     8M  0 part
zd0                          230:0    0  14.9T  0 disk
└─zd0p1                      230:1    0  14.9T  0 part
nvme1n1                      259:0    0 931.5G  0 disk
└─nvme1n1p1                  259:1    0 931.5G  0 part /mnt/pve/rc20-1t
nvme0n1                      259:2    0 931.5G  0 disk
├─nvme0n1p1                  259:3    0  1007K  0 part
├─nvme0n1p2                  259:4    0   512M  0 part /boot/efi
└─nvme0n1p3                  259:5    0   931G  0 part
  ├─pve-swap                 253:0    0     8G  0 lvm  [SWAP]
  ├─pve-root                 253:1    0    96G  0 lvm  /
  ├─pve-data_tmeta           253:2    0   8.1G  0 lvm
  │ └─pve-data-tpool         253:4    0 794.8G  0 lvm
  │   ├─pve-data             253:5    0 794.8G  1 lvm
  │   ├─pve-vm--100--disk--0 253:6    0   256G  0 lvm
  │   ├─pve-vm--102--disk--0 253:7    0    64G  0 lvm
  │   └─pve-vm--100--disk--1 253:8    0     4M  0 lvm
  └─pve-data_tdata           253:3    0 794.8G  0 lvm
    └─pve-data-tpool         253:4    0 794.8G  0 lvm
      ├─pve-data             253:5    0 794.8G  1 lvm
      ├─pve-vm--100--disk--0 253:6    0   256G  0 lvm
      ├─pve-vm--102--disk--0 253:7    0    64G  0 lvm
      └─pve-vm--100--disk--1 253:8    0     4M  0 lvm

Code:
pveversion -v
proxmox-ve: 7.4-1 (running kernel: 6.2.6-1-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-5.15: 7.4-3
pve-kernel-6.2.6-1-pve: 6.2.6-1
pve-kernel-5.15.107-2-pve: 5.15.107-2
pve-kernel-5.15.74-1-pve: 5.15.74-1
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph-fuse: 17.2.6-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx4
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-3
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-1
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.6
libpve-storage-perl: 7.4-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.2-1
proxmox-backup-file-restore: 2.4.2-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.1-1
proxmox-widget-toolkit: 3.7.0
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-2
pve-firewall: 4.3-2
pve-firmware: 3.6-5
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.11-pve1
 
Hello,

Can you check the syslog from the time the server froze and look for any interesting messages?
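
A rough sketch of one way to pull those lines out, assuming the host was reset after the freeze and that each boot in the log starts with the kernel's "Linux version" banner. The helper name `lines_before_last_boot` and the default path /var/log/syslog are assumptions. Where a persistent journal is enabled, `journalctl -b -1 -e` should show the end of the previous boot as well.

```shell
# Print the N lines logged just before the most recent boot in a syslog file.
# lines_before_last_boot is a hypothetical helper; it assumes every boot begins
# with the kernel's "Linux version" banner line.
lines_before_last_boot() {
    log="${1:-/var/log/syslog}"
    n="${2:-20}"
    # Line number of the last boot banner in the file
    boot=$(grep -n 'Linux version' "$log" | tail -n 1 | cut -d: -f1)
    [ -n "$boot" ] || { echo "no boot banner found in $log" >&2; return 1; }
    # Nothing was logged before the banner if it is the first line
    [ "$boot" -gt 1 ] || return 0
    start=$((boot - n))
    [ "$start" -ge 1 ] || start=1
    sed -n "${start},$((boot - 1))p" "$log"
}

# Typical use on the host after a freeze and reset:
#   lines_before_last_boot /var/log/syslog 50
```

Whatever was written last before the banner is the closest thing to a trace of the freeze; if those lines look like normal activity, the hang probably happened before anything could be flushed to disk.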
 
I get random freezes on a 5900HX too, on PVE 8.0.4.
After applying the non-free-firmware split it improved a lot, but LXC makes the node unstable again.
There are no logs or screen output; the node just becomes unresponsive.