AMD EPYC 7401P 24-Core and SSDs for ZFS

Seems my Proxmox install is regularly crashing/rebooting, at least from what journalctl says:

Code:
Dec 23 10:13:00 epyc1 systemd[1]: Starting Proxmox VE replication runner...
Dec 23 10:13:00 epyc1 systemd[1]: Started Proxmox VE replication runner.
-- Reboot --
Dec 23 10:14:51 epyc1 kernel: random: get_random_bytes called from start_kernel+0x42/0x4f3 with crng_init
Dec 23 10:14:51 epyc1 kernel: Linux version 4.13.13-2-pve (root@nora) (gcc version 6.3.0 20170516 (Debian
Dec 23 10:14:51 epyc1 kernel: Command line: BOOT_IMAGE=/ROOT/pve-1@/boot/vmlinuz-4.13.13-2-pve root=ZFS=r
Dec 23 10:14:51 epyc1 kernel: KERNEL supported cpus:
Dec 23 10:14:51 epyc1 kernel:   Intel GenuineIntel
Dec 23 10:14:51 epyc1 kernel:   AMD AuthenticAMD
Dec 23 10:14:51 epyc1 kernel:   Centaur CentaurHauls

Any ideas how to debug this further? coredumpctl doesn't list any coredumps even after being enabled, and nothing can be found in the journal.
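When a box hard-resets like this, the last messages often never make it to disk. A minimal first step (assuming a stock PVE/Debian systemd setup) is to make the journal persistent so you can read the tail of the crashed boot afterwards:

```shell
# Make the systemd journal persistent so messages logged right before
# a crash survive the reboot (journald falls back to volatile /run
# storage when this directory is absent).
mkdir -p /var/log/journal

# After the next boot:
#   journalctl --list-boots     # each recorded boot gets an index
#   journalctl -b -1 -e         # tail of the previous (crashed) boot
# An abrupt end with no shutdown messages points to a hard reset
# (power, firmware, MCE) rather than a clean kernel panic.
```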
 
We will run the system itself on 2 additional SSDs in software RAID 1 and want to use the 2x 1.92 TB SSDs for a dedicated ZFS pool.
@proxmox team: Any idea why this happens with ZFS? Do you change anything in the PVE Kernel that might cause this issue?

Shouldn't the ZFS memory consumption honor the settings? Why doesn't it seem to honor them?
Why does it freeze the system?
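For reference, the ARC cap is set via zfs module parameters in /etc/modprobe.d/zfs.conf. A minimal sketch of computing the option lines (the 8 GiB / 2 GiB values are purely illustrative, not a recommendation, and the limits only apply after `update-initramfs -u` and a reboot):

```shell
# Compute illustrative ARC limits in bytes and print the option lines
# that belong in /etc/modprobe.d/zfs.conf.
arc_max=$((8 * 1024 * 1024 * 1024))   # 8 GiB cap (example value)
arc_min=$((2 * 1024 * 1024 * 1024))   # 2 GiB floor (example value)
echo "options zfs zfs_arc_max=${arc_max}"
echo "options zfs zfs_arc_min=${arc_min}"
```

After rebooting, `arcstat` or /proc/spl/kstat/zfs/arcstats shows whether the ARC actually stays under the cap.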
 
What kind of disks are you using? I had EX41S-SSD servers running with ZFS in RAID-1 with SSDs for 2 months without any ZFS-related crashes.
Did you install the ZFS RAID-1 with the PVE ISO or by hand via installimage/rescue?
 
Seems my Proxmox Install is regularly crashing/rebooting at least from what journalctl says: [...]
Single-node setup, right?
 
Regarding the reported problems with PVE and ZFS storage:

I just set up a nested PVE cluster with 1 disk for the system + 2 disks for ZFS RAID1. Limited the ZFS RAM usage, deployed some LXC containers, activated replication tasks, and now let's see how this goes.
Update:
So far so good.
 
Seems like Hetzner solved the crashes by reducing the RAM clock to 2400 MHz. Runs fine now.
 
So the ZFS in RAID1 is stable now?

I'd like to try PVE 5.1+ on the SX131 machine using RAIDZ2. Since the rescue image does not support ZFS, I'll have to install via ISO. But this is no issue, since the Hetzner personnel connect a pen drive with PVE to machines if asked nicely.
 
So the ZFS in RAID1 is stable now? [...]

ZFS in RAID 1 works perfectly fine. In the Hetzner KVM/LARA console you can mount an ISO image, so you won't need a pen drive.
 
I know how to mount ISOs. But the LARA doesn't always work properly, and for that reason Hetzner already has a set of pen drives prepared with various current images. Just ask for one, and it's even faster than having to mount anything.
Thanks for the hint though. :)
 
I will set up a node into a cluster this week and can provide feedback afterwards.

Running a Ryzen CPU with Proxmox for some weeks already without issues.
 
Hi,

The problem comes when the node goes idle and can't recover from that state.

I fixed all my problems with the latest kernel and by disabling C-states.
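One common way to disable deep C-states is via the kernel command line (flag choice varies; `processor.max_cstate=1` and `idle=nomwait` are the ones often suggested for early Zen idle hangs, and this snippet only demonstrates the edit against a sample line, not your real config):

```shell
# Sketch: append C-state workarounds to GRUB_CMDLINE_LINUX_DEFAULT.
# Shown against a sample line so the sed expression can be checked
# safely before touching /etc/default/grub.
flags="processor.max_cstate=1 idle=nomwait"
sample='GRUB_CMDLINE_LINUX_DEFAULT="quiet"'
echo "$sample" | sed "s/\"\$/ ${flags}\"/"
# -> GRUB_CMDLINE_LINUX_DEFAULT="quiet processor.max_cstate=1 idle=nomwait"

# For the real system: run the same sed with -i on /etc/default/grub,
# then apply with update-grub and reboot.
```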


For me it is now stable.
 
The new server runs stable and fast in Falkenstein. Nice to see 48 cores in the dashboard. :)
No adjustment required so far.
 
