PVE 6.1 hard freezing with BTRFS scrub

Adam Talbot

Member
Apr 11, 2018
Not sure exactly what's going on here. I have 10 SSDs in a BTRFS RAID6 configuration. Whenever I start a scrub, the whole system hard freezes; even the blinking cursor stops blinking. I get no kernel panic or anything else to go on. PVE 6.0 did not have this problem. I have rebuilt this server a few times with a clean install of 6.1.

I know that 5.3.10-1-pve was VERY unstable with BTRFS and caused my system to hard lock up. 5.3.13-1-pve is much better, but it still freezes.
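Since PVE 6.0 did not show the problem and its 5.0 kernels (5.0.15-1-pve and 5.0.21-5-pve) are still installed according to the package list below, one cheap test is to boot one of them once and re-run the scrub. A minimal sketch, assuming GRUB is the boot loader (the `BOOT_IMAGE=` entry in /proc/cmdline suggests it is) and that `/etc/default/grub` is set to `GRUB_DEFAULT=saved`, which `grub-reboot` requires. The helper `list_grub_entries` is hypothetical, its parser is simplified, and menu titles differ between installs, so copy the exact title from its output.

```shell
#!/bin/sh
# Hypothetical helper: print GRUB menu entry titles from grub.cfg. Entries inside
# a submenu are printed as "submenu title>entry title", the form grub-reboot accepts.
# Simplified parser: assumes single-quoted titles and that each top-level block
# (including a submenu) closes with a "}" at the start of a line.
list_grub_entries() {
  awk -F"'" '
    /^submenu /         { sub_title = $2; next }
    /^}/                { sub_title = ""; next }
    /^[ \t]*menuentry / {
      if (sub_title != "") print sub_title ">" $2
      else print $2
    }
  ' "${1:-/boot/grub/grub.cfg}"
}

# Usage (as root), after setting GRUB_DEFAULT=saved and running update-grub:
#   list_grub_entries
#   grub-reboot '<title printed above for the 5.0.21-5-pve kernel>'
#   reboot
```

`grub-reboot` only affects the next boot, so if the older kernel also hangs, a plain reboot returns to the default 5.3 kernel.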

Any ideas on how I can enable debugging, or set up kdump, to catch this bug? Option B: is there a pre-release/alpha kernel I can try?
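On the kdump question: kdump only saves a vmcore when the kernel actually panics, and a silent freeze may never get that far on its own. A common debugging setup is to arm kdump and let the kernel's lockup detectors escalate to a panic. The sketch below assumes Debian's `kdump-tools` package (which reserves memory with a `crashkernel=` boot parameter and saves dumps under /var/crash); the helper `write_crash_sysctls` and the file name `90-crash-debug.conf` are hypothetical. These settings are for debugging only: `hung_task_panic` in particular may panic on I/O that is merely slow, such as a long scrub.

```shell
#!/bin/sh
# Hypothetical helper: write a sysctl drop-in that turns soft/hard lockups,
# hung tasks and oopses into panics, so an armed kdump kernel can save a vmcore.
write_crash_sysctls() {
  cat > "$1" <<'EOF'
# Debug-only settings; remove this file once the hang is understood.
kernel.nmi_watchdog = 1
kernel.softlockup_panic = 1
kernel.hardlockup_panic = 1
kernel.hung_task_panic = 1
kernel.panic_on_oops = 1
EOF
}

# Typical sequence on the PVE host (as root):
#   apt install kdump-tools        # answer "yes" when asked to enable kdump
#   update-grub && reboot          # boot with the crashkernel= reservation
#   kdump-config show              # should report that the crash kernel is loaded
#   write_crash_sysctls /etc/sysctl.d/90-crash-debug.conf
#   sysctl --system                # apply the drop-in without rebooting
#   btrfs scrub start /data        # reproduce; after the reboot, look in /var/crash
```

If no dump appears, the kernel's netconsole module can stream log messages to another machine, so the last lines printed before the freeze are not lost.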

System info.
Code:
root@nas:~# uname -a
Linux nas 5.3.13-1-pve #1 SMP PVE 5.3.13-1 (Thu, 05 Dec 2019 07:18:14 +0100) x86_64 GNU/Linux

root@nas:~# cat /etc/debian_version
10.2

root@nas:~# btrfs fi show
Label: none  uuid: 7f8fbec9-f779-4af6-8ea5-ec296b48dc82
        Total devices 10 FS bytes used 1.96TiB
        devid    1 size 450.00GiB used 253.00GiB path /dev/sdc1
        devid    2 size 450.00GiB used 253.00GiB path /dev/sdd1
        devid    3 size 450.00GiB used 253.00GiB path /dev/sde1
        devid    4 size 450.00GiB used 253.00GiB path /dev/sdf1
        devid    5 size 450.00GiB used 253.00GiB path /dev/sdg1
        devid    6 size 450.00GiB used 253.00GiB path /dev/sdh1
        devid    7 size 450.00GiB used 253.00GiB path /dev/sdi1
        devid    8 size 450.00GiB used 253.00GiB path /dev/sdj1
        devid    9 size 450.00GiB used 253.01GiB path /dev/sdk1
        devid   10 size 450.00GiB used 253.01GiB path /dev/sdl1

root@nas:~# dpkg -l | grep pve-
ii  libpve-access-control                6.0-5                       all          Proxmox VE access control library
ii  libpve-apiclient-perl                3.0-2                       all          Proxmox VE API client library
ii  libpve-cluster-api-perl              6.1-2                       all          Proxmox Virtual Environment cluster Perl API modules.
ii  libpve-cluster-perl                  6.1-2                       all          Proxmox Virtual Environment cluster Perl modules.
ii  libpve-common-perl                   6.0-9                       all          Proxmox VE base library
ii  libpve-guest-common-perl             3.0-3                       all          Proxmox VE common guest-related modules
ii  libpve-http-server-perl              3.0-3                       all          Proxmox Asynchrounous HTTP Server Implementation
ii  libpve-storage-perl                  6.1-2                       all          Proxmox VE storage management library
ii  libpve-u2f-server-perl               1.1-1                       amd64        Perl bindings for libu2f-server
ii  pve-cluster                          6.1-2                       amd64        "pmxcfs" distributed cluster filesystem for Proxmox Virtual Environment.
ii  pve-container                        3.0-14                      all          Proxmox VE Container management tool
ii  pve-docs                             6.1-3                       all          Proxmox VE Documentation
ii  pve-edk2-firmware                    2.20191127-1                all          edk2 based firmware modules for virtual machines
ii  pve-firewall                         4.0-9                       amd64        Proxmox VE Firewall
ii  pve-firmware                         3.0-4                       all          Binary firmware code for the pve-kernel
ii  pve-ha-manager                       3.0-8                       amd64        Proxmox VE HA Manager
ii  pve-i18n                             2.0-3                       all          Internationalization support for Proxmox VE
ii  pve-kernel-5.0                       6.0-11                      all          Latest Proxmox VE Kernel Image
ii  pve-kernel-5.0.15-1-pve              5.0.15-1                    amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-5.0.21-5-pve              5.0.21-10                   amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-5.3                       6.1-1                       all          Latest Proxmox VE Kernel Image
ii  pve-kernel-5.3.10-1-pve              5.3.10-1                    amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-5.3.13-1-pve              5.3.13-1                    amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-helper                    6.1-1                       all          Function for various kernel maintenance tasks.
ii  pve-manager                          6.1-3                       amd64        Proxmox Virtual Environment Management Tools
ii  pve-qemu-kvm                         4.1.1-2                     amd64        Full virtualization on x86 hardware
ii  pve-xtermjs                          3.13.2-1                    all          HTML/JS Shell client


root@nas:~# cat /etc/fstab
UUID=bed1e672-ad90-49c0-bacc-6a7d635ac503 / xfs defaults,discard,noatime,nodiratime 0 1
UUID=DA73-D466 /boot/efi vfat defaults 0 1
UUID=7f8fbec9-f779-4af6-8ea5-ec296b48dc82 /data btrfs defaults,nofail,noatime,ssd,discard,compress=lzo 0 2
/data/vmfs.img  /data/vmfs_pve xfs defaults,discard,noatime,nodiratime 0 2
proc /proc proc defaults 0 0

root@nas:~# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.3.13-1-pve root=UUID=bed1e672-ad90-49c0-bacc-6a7d635ac503 ro intel_iommu=on

root@nas:~# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       46 bits physical, 48 bits virtual
CPU(s):              16
On-line CPU(s) list: 0-15
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               62
Model name:          Intel(R) Xeon(R) CPU E5-2637 v2 @ 3.50GHz
Stepping:            4
CPU MHz:             1723.883
CPU max MHz:         3800.0000
CPU min MHz:         1200.0000
BogoMIPS:            7000.48
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            15360K
NUMA node0 CPU(s):   0-3,8-11
NUMA node1 CPU(s):   4-7,12-15
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts flush_l1d

root@nas:/etc/apt# pveversion -v
proxmox-ve: 6.1-2 (running kernel: 5.3.13-1-pve)
pve-manager: 6.1-3 (running version: 6.1-3/37248ce6)
pve-kernel-5.3: 6.1-1
pve-kernel-helper: 6.1-1
pve-kernel-5.0: 6.0-11
pve-kernel-5.3.13-1-pve: 5.3.13-1
pve-kernel-5.3.10-1-pve: 5.3.10-1
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-5
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-9
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.1-2
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve3
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-1
pve-cluster: 6.1-2
pve-container: 3.0-14
pve-docs: 6.1-3
pve-edk2-firmware: 2.20191127-1
pve-firewall: 4.0-9
pve-firmware: 3.0-4
pve-ha-manager: 3.0-8
pve-i18n: 2.0-3
pve-qemu-kvm: 4.1.1-2
pve-xtermjs: 3.13.2-1
qemu-server: 6.1-3
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve2

It's a Supermicro motherboard: https://www.supermicro.com/products/motherboard/Xeon/C600/X9DRW-3LN4F_.cfm
 
Last edited:
Working under the assumption that there was some bad interaction between BTRFS and the VMs, I moved the VM storage (vmfs) off the BTRFS volume. After a clean reboot, the system still hard-locked within 5 minutes of starting a btrfs scrub. I am running out of ideas. I might need to fall back to Proxmox 6.0.
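One way to narrow the scrub hang down further is to scrub one device at a time, in the foreground, read-only and at idle I/O priority, writing a progress line (and syncing it) before each device so the last entry shows where the freeze happened. A sketch under those assumptions: `scrub_one_by_one` and the log path are hypothetical, the device names come from the `btrfs fi show` output in the first post, and the idle class (`-c 3`) is only honored by I/O schedulers that support priorities. The `BTRFS` variable allows substituting another binary or a stub for a dry run.

```shell
#!/bin/sh
# Hypothetical helper: scrub each device of a btrfs filesystem separately,
# read-only (-r), in the foreground (-B), at idle I/O priority (-c 3).
# A line is appended to the log and flushed with sync before and after each
# device, so after a freeze the last line names the device being scrubbed.
BTRFS=${BTRFS:-btrfs}

scrub_one_by_one() {
  log=$1; shift
  for dev in "$@"; do
    echo "$(date '+%F %T') start $dev" >> "$log"
    sync
    if "$BTRFS" scrub start -B -r -c 3 "$dev"; then
      echo "$(date '+%F %T') done $dev" >> "$log"
    else
      echo "$(date '+%F %T') failed $dev" >> "$log"
    fi
    sync
  done
}

# Example (as root); keep the log on the XFS root, not on the btrfs volume:
#   scrub_one_by_one /root/scrub-progress.log /dev/sd[c-l]1
```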
 
Working under the assumption that I was having some bad interaction between BTRFS and VM's I pulled the VM storage (vmfs) off of the BTRFS volume. After a clean reboot system still hard hung with in 5m of btrfs scrub started.... I am running out of ideas. Might need to fall back to proxmox 6.0.
Or stop using btrfs, which is unsupported in Proxmox. If you insist on using btrfs, avoid its RAID5 and RAID6 profiles like the plague: they are unstable and have been for years. Read more here: https://forum.proxmox.com/threads/proxmox-with-zfs-or-btrfs.50962/
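If the data stays on btrfs, an existing RAID6 filesystem can be moved to a non-parity profile with a balance instead of a rebuild. A sketch, assuming RAID10 for data and RAID1 for metadata (the three- and four-copy RAID1C3/RAID1C4 profiles need kernel 5.5 or later, so they are not available on 5.3); the helper name is hypothetical. RAID10 stores two full copies, so by rough arithmetic about 1.96 TiB used needs about 3.9 TiB raw out of the roughly 4.4 TiB provided by 10 × 450 GiB. That should fit, but the margin is thin, so check the usage output first and have a backup: a long balance on a kernel that already hangs under heavy I/O is itself a risk.

```shell
#!/bin/sh
# Hypothetical helper: convert a mounted btrfs filesystem away from RAID5/6
# to RAID10 data and RAID1 metadata with a single balance.
BTRFS=${BTRFS:-btrfs}

convert_off_raid56() {
  mnt=$1
  # Show allocation first; the conversion needs enough unallocated space.
  "$BTRFS" filesystem usage "$mnt" || return 1
  "$BTRFS" balance start -dconvert=raid10 -mconvert=raid1 "$mnt"
}

# Example (as root):  convert_off_raid56 /data
# Progress can be checked from another shell with:  btrfs balance status /data
```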