One more piece of info:
With my setup I get a kernel panic, have a look at the attached screenshot.
My system is a Skylake Fujitsu D3417-B mainboard with 64 GB RAM and one Xeon E3-1245 v5.
Proxmox is installed on the new 128 GB NVMe SSD with ext4.
Currently only one VM, a Windows Server 2008 R2 machine with 16 GB of virtual RAM, is running on this system.
Code:
...
nvme0n1 259:0 0 119.2G 0 disk
├─nvme0n1p1 259:1 0 1007K 0 part
├─nvme0n1p2 259:2 0 127M 0 part
└─nvme0n1p3 259:3 0 119.1G 0 part
├─pve-root 251:0 0 29.8G 0 lvm /
├─pve-swap 251:1 0 4G 0 lvm
├─pve-data_tmeta 251:2 0 72M 0 lvm
│ └─pve-data 251:4 0 70.5G 0 lvm
└─pve-data_tdata 251:3 0 70.5G 0 lvm
└─pve-data 251:4 0 70.5G 0 lvm
As you can see, I gave it only 4 GB for swap.
In addition there is a ZFS RAID 10 pool with 4x 2 TB 4Kn hard drives, as described above.
I also get ACPI errors:
Code:
root@oprox:~# dmesg | grep -i error
....
[ 0.581966] ACPI Error: [\_SB_.PCI0.LPCB.H_EC.ECAV] Namespace lookup failure, AE_NOT_FOUND (20150930/psargs-359)
[ 0.581970] ACPI Error: Method parse/execution failed [\_TZ.FNCL] (Node ffff880fed91f4d8), AE_NOT_FOUND (20150930/psparse-542)
[ 0.581976] ACPI Error: Method parse/execution failed [\_TZ.FN02._ON] (Node ffff880fed91f168), AE_NOT_FOUND (20150930/psparse-542)
[ 0.589970] ACPI Error: [\_SB_.PCI0.LPCB.H_EC.ECAV] Namespace lookup failure, AE_NOT_FOUND (20150930/psargs-359)
[ 0.589975] ACPI Error: Method parse/execution failed [\_TZ.FNCL] (Node ffff880fed91f4d8), AE_NOT_FOUND (20150930/psparse-542)
[ 0.589980] ACPI Error: Method parse/execution failed [\_TZ.FN02._ON] (Node ffff880fed91f168), AE_NOT_FOUND (20150930/psparse-542)
[ 0.610004] ACPI Error: [\_SB_.PCI0.LPCB.H_EC.ECAV] Namespace lookup failure, AE_NOT_FOUND (20150930/psargs-359)
-------------------------------------------------------------------------
Well, with all these errors and the kernel panic, I made some adjustments; on the one hand to document what is going on, on the other hand to hear your opinions.
1:
SSD check with
Code:
$ smartctl /dev/nvme0 -x
SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning: 0x00
Temperature: 45 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 1,303,319 [667 GB]
Data Units Written: 799,662 [409 GB]
Host Read Commands: 12,615,280
Host Write Commands: 5,916,573
Controller Busy Time: 92
Power Cycles: 23
Power On Hours: 373
Unsafe Shutdowns: 2
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged
As you can see, there are no problems with the NVMe SSD.
2:
I set up a weekly cron job for TRIM:
Code:
root@oprox:~# cat /etc/cron.weekly/fstrim
#!/bin/sh
## trim the root / file system, which is located on the NVMe SSD
## /sbin/fstrim --all || true
LOG=/var/log/batched_discard.log
echo "*** $(date -R) ***" >> $LOG
/sbin/fstrim -v / >> $LOG
##/sbin/fstrim -v /home >> $LOG
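One thing to double-check: scripts in /etc/cron.weekly are executed via run-parts, so the script has to be executable, otherwise it is silently skipped. A quick sketch with plain Debian tools:
Code:
## make the script executable, otherwise cron.weekly skips it
$ chmod 755 /etc/cron.weekly/fstrim
## dry run: list the scripts run-parts would execute
$ run-parts --test /etc/cron.weekly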
3:
Concerning the ACPI errors:
The only real solution to this problem is either a BIOS update or a suitable kernel, and both are already up to date on my machine.
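For reference, this is how both versions can be checked (dmidecode has to be run as root):
Code:
$ uname -r
# dmidecode -s bios-version
# dmidecode -s bios-release-date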
Therefore, as a workaround, I first disabled all sleep and hibernation modes:
Code:
$ systemctl mask sleep.target suspend.target hibernate.target hybrid-sleep.target
In the BIOS I
- disabled Intel AMT (Intel Active Management Technology)
- left the "Package C State Limit" in auto mode
4:
Now I have completely reworked the swap setup.
4a: I moved the swap from the SSD to the ZFS RAID 10.
Firstly to take load off the SSD; besides, with 64 GB I have enough RAM anyway, and it is easier to create a bigger swap volume on the ZFS pool:
Code:
$ swapoff -a
root@oprox:~$ zfs create -V 16G -b $(getconf PAGESIZE) \
-o logbias=throughput -o sync=always \
-o primarycache=metadata -o secondarycache=none \
-o com.sun:auto-snapshot=false r10pool/swap
$ mkswap -f /dev/zvol/r10pool/swap
$ swapon /dev/zvol/r10pool/swap
And I replaced the "swap" line in /etc/fstab with:
Code:
/dev/zvol/r10pool/swap none swap defaults 0 0
4b: I configured the swappiness.
To see what is going on:
Code:
$ cat /proc/sys/vm/swappiness => 60
You can set it permanently in /etc/sysctl.conf, and with sysctl you can change it immediately.
I changed it to 1, the lowest value (0 = swapping disabled); the recommended value in the wiki is 10.
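A sketch of both steps (the value 1 is simply the one I chose):
Code:
## permanent, in /etc/sysctl.conf:
vm.swappiness = 1
## apply immediately, without a reboot:
$ sysctl -w vm.swappiness=1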
Have a look at:
Code:
$ cat /proc/swaps
$ cat /proc/sys/vm/swappiness
$ free -hm
5:
Next, I limited the ZFS ARC cache:
Code:
root@oprox:~# cat /etc/modprobe.d/zfs.conf
## RAM to be used by the ZFS ARC (zfs_arc_max):
## in bytes, for example:
## 2GB=2147483648
## 4GB=4294967296
## 8GB=8589934592
## 12GB=12884901888
options zfs zfs_arc_min=2147483648
options zfs zfs_arc_max=12884901888
Now you have to make sure the new module options are applied at the next boot.
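On Proxmox/Debian that means updating the initramfs and then rebooting; alternatively, zfs_arc_max can also be changed on the fly via the module parameters in /sys (the byte value below is my 12GB limit from above):
Code:
# update-initramfs -u
# reboot
## or at runtime, takes effect immediately:
# echo 12884901888 > /sys/module/zfs/parameters/zfs_arc_max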
Afterwards, have a look at:
Code:
$ arcstat.py 3 4
$ cat /proc/spl/kstat/zfs/arcstats
$ cat /sys/module/zfs/parameters/zfs_arc_max
Now I have to test everything.
I would be pleased to get some feedback.
Regards,
maxprox