I thought this was related to privileged containers, but later I caught similar freezes on unprivileged containers too. What was common in all cases was a host update shortly beforehand:
Start-Date: 2024-03-02 14:47:02
Commandline: apt dist-upgrade
Install: pve-kernel-5.15.143-1-pve:amd64...
I have repeatedly run into a situation where updating the host (kernel and some other components) causes the host to freeze when a container is stopped.
Is there any way to solve this problem, other than first stopping all containers before...
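As an illustration of that workaround, here is a minimal sketch that stops every running container via the standard pct CLI before starting the upgrade; the parsing of the pct list output (VMID in the first column, status in the second) is an assumption of the sketch, not something taken from the posts above.

#!/usr/bin/env python3
# Sketch: stop all running LXC containers before a host upgrade.
# Assumes the standard Proxmox `pct` CLI and that `pct list` prints a header
# line followed by rows with the VMID first and the status second.
import subprocess

def running_containers():
    out = subprocess.run(["pct", "list"], capture_output=True, text=True, check=True).stdout
    for line in out.splitlines()[1:]:      # skip the header line
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "running":
            yield fields[0]                # VMID

for vmid in running_containers():
    print(f"stopping CT {vmid}")
    subprocess.run(["pct", "stop", vmid], check=True)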
Maybe the following Python script will help the Proxmox developers: https://github.com/jimsalterjrs/ioztat - it just parses /proc/spl/kstat/zfs/<pool>/objset-* (on ZFS 0.8):
# ioztat -xS -s operations
operations throughput opsize
dataset read write...
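For anyone curious what ioztat actually reads, here is a minimal sketch (not ioztat itself) that dumps the per-dataset read/write counters from those objset files; it assumes the usual kstat layout of two header lines followed by "name type data" rows with fields such as dataset_name, reads, nread, writes and nwritten.

#!/usr/bin/env python3
# Sketch of what ioztat reads: per-dataset I/O counters from the objset kstats.
# Assumes the usual kstat layout: two header lines, then "name type data" rows.
import glob

for path in sorted(glob.glob("/proc/spl/kstat/zfs/*/objset-*")):
    stats = {}
    with open(path) as f:
        for line in f.readlines()[2:]:     # skip the two kstat header lines
            parts = line.split()
            if len(parts) >= 3:
                stats[parts[0]] = parts[2]
    print(stats.get("dataset_name", path),
          "reads:", stats.get("reads", "?"), "nread:", stats.get("nread", "?"),
          "writes:", stats.get("writes", "?"), "nwritten:", stats.get("nwritten", "?"))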
Thanks, I know about this document, and I don't find the NUMA architecture of EPYC (Zen 2) unpleasant; on the contrary, the many customization options make it flexible.
About Ceph, the current benchmark shows the following (still no tuning, just waited a few days):
# rados bench -p bench 30 -t 256 -b 1024 write...
@FXKai thx!
Our network stack: Cisco Nexus 3172 (with n9k firmware) with Intel 82599ES
CPU NUMA on 2xAMD EPYC 7702 64-Core Processor:
NUMA node0 CPU(s): 0-63,128-191
NUMA node1 CPU(s): 64-127,192-255
Hi FXKai!
Can you clarify the following, please:
1) which switch are you using in your cluster?
2) are the Mellanox drivers the ones shipped with PVE, or installed separately?
3) do you use DPDK?
I have the same problem with:
# ceph --version
ceph version 16.2.7 (f9aa029788115b5df5eeee328f584156565ee5b7) pacific (stable)
8 nodes (CPU 2x EPYC 64 cores / RAM 2 TB / Eth 2x10 Gbit/s LACP), fresh install of PVE 7.1
2 NVMe SSDPE2KE076T8 7.68 TB per node used for Ceph, each NVMe device split into 4 pieces...
Confirmed.
Updated from
# pveversion
pve-manager/7.1-10/6ddebafe (running kernel: 5.13.19-6-pve)
to
# pveversion
pve-manager/7.1-12/b3c09de3 (running kernel: 5.13.19-6-pve)
after this message:
ovs-vswitchd.service is a disabled or a static unit not running, not starting it.
the network is disabled...
Also, for comparison, XFS (direct) I/O:
# mount | grep xfs | grep test
/dev/nvme6n1p1 on /mnt/test1 type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
# fio --time_based --name=benchmark --size=15G --runtime=30 --filename=/mnt/test1/test.file --ioengine=libaio --randrepeat=0...
Only increasing zfs_dirty_data_max (4294967296 -> 10737418240 -> 21474836480 -> 42949672960) compensates for the performance penalty, but the background writes to the NVMe devices remain just as slow, ~10k IOPS per device:
# fio --time_based --name=benchmark --size=15G --runtime=30 --filename=/mnt/zfs/g-fio.test...
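For reference, a small sketch of how that tunable can be inspected and raised at runtime through the ZFS module parameter (assuming it is writable on the running kernel); the 40 GiB target simply mirrors the last value tried above, and making it persistent would normally go through /etc/modprobe.d instead, which this sketch doesn't cover.

#!/usr/bin/env python3
# Sketch: read and raise zfs_dirty_data_max at runtime (run as root).
# The 40 GiB target mirrors the last value tried in the post above.
PARAM = "/sys/module/zfs/parameters/zfs_dirty_data_max"
TARGET = 40 * 1024**3    # 42949672960 bytes

with open(PARAM) as f:
    print("current:", f.read().strip())

with open(PARAM, "w") as f:
    f.write(str(TARGET))

with open(PARAM) as f:
    print("new:", f.read().strip())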
AND:
man fio:
I/O size
size=int
The total size of file I/O for each thread of this job. Fio will run until this many bytes has been transferred, unless runtime is limited by other options (such as runtime, for instance, or increased/decreased by...
FYI (about poor ZFS performance with 4k):
ZFS (NVME SSD x4 in RAIDZ1 and 1 NVME SSD for LOG):
# zpool get all | egrep 'ashift|trim'
zfs-p1 ashift 13 local
zfs-p1 autotrim on local
# zfs get...
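To put a number on that ashift value: ashift is the base-2 logarithm of the pool's minimum allocation size, so ashift=13 means 8 KiB blocks, and every 4 KiB random write from the fio runs above only half-fills an allocation unit; a trivial check (plain arithmetic, not tied to any particular pool):

# ashift is log2 of the pool's minimum allocation size
ashift = 13
alloc = 2 ** ashift          # 8192 bytes = 8 KiB
io = 4096                    # 4k random writes, as in the fio runs above
print(alloc, io / alloc)     # 8192 0.5 -> each 4k write covers only half a block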