AMD EPYC 7401P 24-Core and SSDs for ZFS

spyfly

Active Member
Jul 22, 2017
30
0
26
22
Seems my Proxmox install is regularly crashing/rebooting, at least from what journalctl says:

Code:
Dec 23 10:13:00 epyc1 systemd[1]: Starting Proxmox VE replication runner...
Dec 23 10:13:00 epyc1 systemd[1]: Started Proxmox VE replication runner.
-- Reboot --
Dec 23 10:14:51 epyc1 kernel: random: get_random_bytes called from start_kernel+0x42/0x4f3 with crng_init
Dec 23 10:14:51 epyc1 kernel: Linux version 4.13.13-2-pve (root@nora) (gcc version 6.3.0 20170516 (Debian
Dec 23 10:14:51 epyc1 kernel: Command line: BOOT_IMAGE=/ROOT/pve-1@/boot/vmlinuz-4.13.13-2-pve root=ZFS=r
Dec 23 10:14:51 epyc1 kernel: KERNEL supported cpus:
Dec 23 10:14:51 epyc1 kernel:   Intel GenuineIntel
Dec 23 10:14:51 epyc1 kernel:   AMD AuthenticAMD
Dec 23 10:14:51 epyc1 kernel:   Centaur CentaurHauls

Any ideas on how to debug this further? coredumpctl doesn't list any core dumps even after being enabled, and nothing relevant shows up in the journal.
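One thing worth checking: if the box is hard-resetting, the log of the crashed boot only survives when journald storage is persistent (on some installs it logs to volatile /run/log/journal only). A minimal sketch using standard systemd options, nothing PVE-specific assumed:

```
# /etc/systemd/journald.conf (a drop-in under journald.conf.d works too)
[Journal]
Storage=persistent
```

After `systemctl restart systemd-journald`, the next crash can be inspected with `journalctl -b -1 -e` (previous boot, jump to end). If even that shows nothing before the "-- Reboot --" marker, it points at a hard hang/reset rather than a kernel panic that got logged.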
 

DerDanilo

Renowned Member
Jan 21, 2017
457
109
63
We will run the system itself on 2 additional SSDs in software RAID 1 and want to use the 2x 1,92TB SSDs for a dedicated ZFS pool.
@proxmox team: Any idea why this happens with ZFS? Do you change anything in the PVE kernel that might cause this issue?

Shouldn't ZFS memory consumption honor the configured limits? Why doesn't it seem to honor them?
Why does it freeze the system?
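For reference, the usual way to cap ZFS memory (the ARC) on PVE is a module option rather than a runtime setting; a sketch assuming you want roughly an 8 GiB cap (the byte value is just an example, size it for your RAM):

```
# /etc/modprobe.d/zfs.conf
# Cap the ZFS ARC at 8 GiB (value is in bytes).
options zfs zfs_arc_max=8589934592
```

Since root is on ZFS here, refresh the initramfs afterwards (`update-initramfs -u`) and reboot; the live value can be checked in `/sys/module/zfs/parameters/zfs_arc_max`. Note the ARC counts as used memory in tools like `free`, which is often mistaken for a leak.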
 

DerDanilo

Renowned Member
Jan 21, 2017
457
109
63
spyfly said:
Seems my Proxmox install is regularly crashing/rebooting, at least from what journalctl says [log snipped]. Any ideas how to debug this further?

Single-node setup, right?
 

DerDanilo

Renowned Member
Jan 21, 2017
457
109
63
Regarding the reported problems with PVE and ZFS storage:

I just set up a nested PVE cluster with 1 disk for the system + 2 disks for ZFS RAID 1, limited the ZFS RAM usage, deployed some LXC containers, and activated replication tasks. Now let's see how this goes.
Update:
So far so good.
 
Last edited:

DerDanilo

Renowned Member
Jan 21, 2017
457
109
63
So the ZFS in RAID 1 is stable now?

I'd like to try using PVE 5.1+ on the SX 131 machine with RAIDZ2. Since the rescue image does not support ZFS, I'll have to install via ISO. But this is no issue, since the Hetzner personnel will connect a pen drive with PVE to the machine if asked nicely.
 

spyfly

Active Member
Jul 22, 2017
30
0
26
22
DerDanilo said:
So the ZFS in RAID 1 is stable now? Since the rescue image does not support ZFS, I'll have to install via ISO. But this is no issue, since the Hetzner personnel will connect a pen drive with PVE if asked nicely.

ZFS in RAID 1 works perfectly fine. In the Hetzner KVM/LARA console you can mount an ISO image, so you won't need a pen drive.
 

DerDanilo

Renowned Member
Jan 21, 2017
457
109
63
I know how to mount ISOs. But the LARA doesn't always work properly, and for that reason Hetzner already has a set of pen drives prepared with the latest images. Just ask for one; it's even faster than having to mount anything.
Thanks for the hint though. :)
 

DerDanilo

Renowned Member
Jan 21, 2017
457
109
63
I will add a node to a cluster this week and can provide feedback afterwards.

I've been running a Ryzen CPU with Proxmox for some weeks already without issues.
 

J.Carlos

Member
Oct 23, 2017
21
2
8
29
Hi,

The problem comes when the node goes idle and can't recover from that state.

I fixed all my problems with the latest kernel and by disabling C-states.


For me it is now stable.
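For anyone wanting to try the same workaround without touching the BIOS: one common way to keep the CPU out of deep C-states is via kernel parameters. A sketch for a GRUB-booted system (the exact parameters that help can vary by platform, so treat this as a starting point):

```
# /etc/default/grub -- keep the CPU out of deep idle states
GRUB_CMDLINE_LINUX_DEFAULT="quiet processor.max_cstate=1 idle=nomwait"
```

Run `update-grub` afterwards and reboot; verify the parameters took effect with `cat /proc/cmdline`. On boards that expose it, the BIOS option "Power Supply Idle Control" set to "Typical Current Idle" addresses the same idle-hang symptom on Zen CPUs.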
 
