For some reason, my recently installed "new" NAS reboots randomly, approximately 1-2 times per hour.
System:
- Supermicro X10SLL-F
- Intel Xeon E3-1270 V3
- 4 x 8GB Unbuffered ECC DDR3
- 2 x Crucial MX100 256GB
- 1 x IBM Exp ServeRAID M1015 (LSI 9220-8i) -> PCI-e pass-through to NAS VM
- 1 x MELLANOX CONNECTX-2 EN 10GBE
IPMI/BMC event log: no warning/error
Proxmox Installation
Community repository with the latest updates applied
root@pve72:~# pveversion -v
Code:
proxmox-ve: 6.2-1 (running kernel: 5.3.18-3-pve)
pve-manager: 6.2-4 (running version: 6.2-4/9824574a)
pve-kernel-5.4: 6.2-2
pve-kernel-helper: 6.2-2
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.41-1-pve: 5.4.41-1
pve-kernel-5.3.18-3-pve: 5.3.18-3
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 2.0.1-1+pve8
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libproxmox-acme-perl: 1.0.4
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-2
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-8
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve2
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-1
pve-cluster: 6.1-8
pve-container: 3.1-6
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-2
pve-qemu-kvm: 5.0.0-2
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-2
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.4-pve1
Issue appears on:
- pve-kernel-5.4.41-1-pve
Limit ZFS memory usage
Following https://pve.proxmox.com/wiki/ZFS_on_Linux#_limit_zfs_memory_usage, I explicitly limit ZFS memory usage (the ARC) to 4 GB of the total 32 GB:
root@pve72:~# cat /etc/modprobe.d/zfs.conf
Code:
# Limit RAM usage to 4GB maximum
options zfs zfs_arc_max=4294967296
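As a sanity check on the value above, here is a minimal sketch that recomputes the byte count for 4 GiB and, if the ZFS module is loaded, prints the limit the running kernel actually uses. The helper name gib_to_bytes is made up for illustration; the /sys/module/zfs/parameters/zfs_arc_max path is the usual location for ZFS on Linux module parameters. If I recall the wiki page correctly, it also notes that with a ZFS root file system the initramfs has to be refreshed (update-initramfs -u) before the option takes effect at boot, so treat that as something to verify rather than a given.

```shell
#!/bin/sh
# Hypothetical helper: convert a size in GiB to bytes.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# 4 GiB expressed in bytes, the value used in /etc/modprobe.d/zfs.conf.
gib_to_bytes 4

# Show the limit currently in effect, if the zfs module is loaded.
param=/sys/module/zfs/parameters/zfs_arc_max
if [ -r "$param" ]; then
    echo "live zfs_arc_max: $(cat "$param")"
else
    echo "zfs module not loaded (or parameter not readable)"
fi
```

If the live value differs from the configured one after a reboot, the modprobe option was probably not picked up at module load time.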
Reduce Swappiness
For some reason, this setting is ignored: after each reboot, sysctl vm.swappiness returns 60 (the default).
root@pve72:~# cat /etc/sysctl.d/swappiness.conf
Code:
# Reduce swappiness to avoid high IO load
vm.swappiness = 10
Putting the same line in /etc/sysctl.conf yields the same result (the setting is ignored).
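One way to narrow this down is to list every configuration file that assigns vm.swappiness, since a file read later can override an earlier one. The sketch below is only a diagnostic aid under assumptions: the function name find_sysctl_key is made up, and the directory list is the commonly documented sysctl.d search path, which may differ on a given system. It does not explain why the value is ignored here.

```shell
#!/bin/sh
# Hypothetical helper: print "file:line" for every uncommented
# assignment of the given sysctl key in the given files.
find_sysctl_key() {
    key=$1
    shift
    # Escape dots so "vm.swappiness" is matched literally.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    grep -Hs -E "^[[:space:]]*${pattern}[[:space:]]*=" "$@"
}

# Assumed search locations; unmatched globs are silently skipped by grep -s.
find_sysctl_key vm.swappiness \
    /etc/sysctl.conf \
    /etc/sysctl.d/*.conf \
    /run/sysctl.d/*.conf \
    /usr/lib/sysctl.d/*.conf || true

# Value currently in effect in the running kernel.
cat /proc/sys/vm/swappiness 2>/dev/null || true
```

If only the expected file shows up and the live value is still 60, the setting is probably being reset by something outside sysctl.d after boot, which this check cannot identify.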
VMs:
- A single Gentoo Linux 5.4.x VM (NAS) with 4 vCPUs and 12 GB dedicated RAM
CPU temperatures:
Fairly hot (about 70-75 °C) during a ZFS snapshot transfer from the old NAS to the new NAS. I tried increasing the fan PWM duty cycle to 100% and opening the chassis to give the CPU more ventilation.
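To watch the temperatures from the host while a transfer is running, something like the following sketch could be used. It assumes the kernel exposes sensors under /sys/class/hwmon with values in millidegrees Celsius (the usual hwmon convention); the helper name millideg_to_c is made up for illustration, and the loop simply prints nothing on a system without such sensors.

```shell
#!/bin/sh
# Hypothetical helper: convert millidegrees Celsius to whole degrees.
millideg_to_c() {
    echo $(( $1 / 1000 ))
}

# Print every readable hwmon temperature input with its label, if any.
for f in /sys/class/hwmon/hwmon*/temp*_input; do
    [ -r "$f" ] || continue
    raw=$(cat "$f" 2>/dev/null) || continue
    [ -n "$raw" ] || continue
    label=$f
    label_file=${f%_input}_label
    if [ -r "$label_file" ]; then
        label=$(cat "$label_file")
    fi
    echo "$label: $(millideg_to_c "$raw") C"
done
```

Running this in a loop (for example under watch) during the snapshot transfer would show whether the readings climb just before a reboot.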
Current investigation
I'm currently testing the older pve-kernel-5.3.18-3-pve to see whether the issue also appears there.
I have several such systems (Xeon E3 v3) and, until now, have had no problems with them. The current NAS runs on an E5 v2, though.
For reference, this is the hardware and software configuration of my current virtualized NAS:
- Supermicro X9SRL-F
- 1 x Intel Xeon E5-2680 v2
- 8 x 32GB ECC Registered DDR3
root@pve04:/tools_nfs/Proxmox# pveversion -v
Code:
proxmox-ve: 6.1-2 (running kernel: 5.3.18-3-pve)
pve-manager: 6.1-8 (running version: 6.1-8/806edfe1)
pve-kernel-helper: 6.1-8
pve-kernel-5.3: 6.1-6
pve-kernel-5.3.18-3-pve: 5.3.18-3
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libpve-access-control: 6.0-6
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.0-17
libpve-guest-common-perl: 3.0-5
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-5
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 3.2.1-1
lxcfs: 4.0.1-pve1
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-3
pve-cluster: 6.1-4
pve-container: 3.0-23
pve-docs: 6.1-6
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.0-10
pve-firmware: 3.0-7
pve-ha-manager: 3.0-9
pve-i18n: 2.0-4
pve-qemu-kvm: 4.1.1-4
pve-xtermjs: 4.3.0-1
qemu-server: 6.1-7
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1
The two hosts run different versions of QEMU, Proxmox VE, the kernel, and ZFS.
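To see exactly which packages differ between the two hosts, the two pveversion -v outputs can be compared line by line. This is a minimal sketch, assuming each output was saved to a text file; the function name diff_versions and the file names in the usage comment are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: compare two saved "pveversion -v" outputs and
# print only the packages whose version differs or that exist on one side.
# Usage (file names are placeholders): diff_versions pve04.txt pve72.txt
diff_versions() {
    awk '
        {
            # Split "name: version" at the first ": " only.
            pos = index($0, ": ")
            if (pos == 0) next
            name = substr($0, 1, pos - 1)
            ver  = substr($0, pos + 2)
            if (NR == FNR) { a[name] = ver } else { b[name] = ver }
        }
        END {
            for (p in a) {
                if (!(p in b)) {
                    print p ": " a[p] " -> (absent)"
                } else if (a[p] != b[p]) {
                    print p ": " a[p] " -> " b[p]
                }
            }
            for (p in b) {
                if (!(p in a)) { print p ": (absent) -> " b[p] }
            }
        }
    ' "$1" "$2" | LC_ALL=C sort
}
```

With the reference host as the first argument and the affected host as the second, each output line reads "package: reference version -> affected version", which makes it easier to pick candidates for a downgrade test.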
Additional information
On the affected host, with the 5.4.x kernel, I see the following line in /var/log/messages just before a new system boot:
Code:
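vfio-pci 0000:05:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff
The device at 0000:05:00.0 is the IBM M1015 / LSI SAS2008 controller that is passed through to the NAS VM.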
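To confirm which device a vfio-pci log line refers to, the PCI address can be pulled out of the message and handed to lspci. This is a sketch under assumptions: the function name pci_addr_from_log is made up, and the lookup only runs if lspci (pciutils) is installed. It identifies the device and its bound kernel driver; it does not establish whether this message is related to the reboots.

```shell
#!/bin/sh
# Hypothetical helper: extract a PCI address of the form
# dddd:bb:dd.f from log lines read on standard input.
pci_addr_from_log() {
    sed -n 's/.*\([0-9a-fA-F]\{4\}:[0-9a-fA-F]\{2\}:[0-9a-fA-F]\{2\}\.[0-7]\).*/\1/p'
}

line='vfio-pci 0000:05:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff'
addr=$(printf '%s\n' "$line" | pci_addr_from_log)
echo "$addr"

# Show vendor/device IDs and the kernel driver in use for that address.
if command -v lspci >/dev/null 2>&1; then
    lspci -nnk -s "$addr" || true
fi
```

The same function can be fed from grep vfio-pci /var/log/messages to list every passed-through device that logged a message before a reboot.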