Good day,
For the last 2-3 days I've been stuck: my TrueNAS VM keeps crashing randomly (with error logs), and sometimes the Proxmox 7 host itself reboots (without any error logs, or at least I don't know where to find them). This was not happening before, or at least I can't recall when it last happened; now it happens almost every 4-5 hours.
I have a PCI passthrough for that VM (SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon]) with 8 disks attached, configured in the VM as a ZFS R2 volume with 1 spare.
The VM logs shown below suggest that the problem is related to the disks, but when I ran smartctl -t long / smartctl -a on each of them individually I couldn't find any errors. "All functions" and "ROM-bar" are disabled for that passthrough.
At this point I'm considering managing the ZFS volume on the host itself to avoid the passthrough, just to test whether it still crashes. That would be a fair amount of effort for me, though, and the main reason I set it up this way was that TrueNAS sends clear email notifications when one of the disks is degrading or failing (that has happened in the past), and I don't know whether Proxmox itself can do that.
Any advice or recommendations would be appreciated.
VM configuration:
Code:
root@pve:/etc/pve/qemu-server# cat 1000.conf
agent: 1
bios: ovmf
boot: order=scsi0;ide2
cores: 16
hostpci0: 0000:04:00.0,rombar=0
hotplug: disk,network
ide2: local:iso/TrueNAS-12.0-U2.1.iso,media=cdrom,size=917476K
memory: 16384
name: TrueNas
net0: virtio=DE:DA:F0:37:CF:C9,bridge=vmbr0,firewall=1
numa: 1
ostype: l26
protection: 1
scsi0: R1_1.6TB_SSD_EVO860:vm-1000-disk-0,cache=writeback,discard=on,size=32G
scsihw: virtio-scsi-pci
smbios1: uuid=zzz
sockets: 1
startup: order=2,up=30
vga: std
vmgenid: zzz
Here are some samples of the crashes I get in TrueNAS:
Code:
cat /data/crash/info.0
Dump header from device: /dev/da8p1
Architecture: amd64
Architecture Version: 4
Dump Length: 602112
Blocksize: 512
Compression: none
Dumptime: Sun Jul 18 22:09:22 2021
Hostname: truenas.zzzzzz
Magic: FreeBSD Text Dump
Version String: FreeBSD 12.2-RELEASE-p6 df578562304(HEAD) TRUENAS
Panic String: general protection fault
Dump Parity: 4158703732
Bounds: 0
Dump Status: good
cat /data/crash/info.1
Dump header from device: /dev/da6p1
Architecture: amd64
Architecture Version: 4
Dump Length: 630784
Blocksize: 512
Compression: none
Dumptime: Sat Jul 17 18:55:51 2021
Hostname: truenas.zzzzzz
Magic: FreeBSD Text Dump
Version String: FreeBSD 12.2-RELEASE-p6 df578562304(HEAD) TRUENAS
Panic String: page fault
Dump Parity: 1491579462
Bounds: 1
Dump Status: good
cat /data/crash/info.2
Dump header from device: /dev/da7p1
Architecture: amd64
Architecture Version: 4
Dump Length: 624128
Blocksize: 512
Compression: none
Dumptime: Sun Jul 18 01:45:36 2021
Hostname: truenas.zzzzzz
Magic: FreeBSD Text Dump
Version String: FreeBSD 12.2-RELEASE-p6 df578562304(HEAD) TRUENAS
Panic String: page fault
Dump Parity: 1334293062
Bounds: 2
Dump Status: good
cat /data/crash/info.3
Dump header from device: /dev/da8p1
Architecture: amd64
Architecture Version: 4
Dump Length: 631296
Blocksize: 512
Compression: none
Dumptime: Sun Jul 18 21:06:08 2021
Hostname: truenas.zzzzzz
Magic: FreeBSD Text Dump
Version String: FreeBSD 12.2-RELEASE-p6 df578562304(HEAD) TRUENAS
Panic String: privileged instruction fault
Dump Parity: 517715731
Bounds: 4
Dump Status: good
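To compare the dumps side by side, a small helper along these lines could print one line per crash. This is only a sketch: the function name `summarize_crashes` is made up for illustration, and it assumes every `info.N` file contains the `Dumptime:` and `Panic String:` fields shown above.

```shell
# Hypothetical helper: print "file | dump time | panic string" for each
# info.N file in a crash directory (defaults to /data/crash).
summarize_crashes() {
    dir="${1:-/data/crash}"
    for f in "$dir"/info.*; do
        # Skip the unexpanded glob when no info files exist.
        [ -f "$f" ] || continue
        awk -F': ' -v name="${f##*/}" '
            /Dumptime:/     { t = $2 }   # e.g. "Sun Jul 18 22:09:22 2021"
            /Panic String:/ { p = $2 }   # e.g. "page fault"
            END             { printf "%s | %s | %s\n", name, t, p }
        ' "$f"
    done
}
```

Calling `summarize_crashes /data/crash` on the TrueNAS VM would list the four dumps above with their times and panic strings, which makes it easier to see that the panic type differs from crash to crash.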
On the host itself, I can't find any error logs. In the case below, the host rebooted at around 00:40.
kernel.log :
Code:
...
Jul 18 21:27:07 pve kernel: [31256.199735] fwbr1000i0: port 2(tap1000i0) entered forwarding state
Jul 19 00:44:02 pve kernel: [ 0.000000] Linux version 5.11.22-1-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PVE 5.11.22-2 (Fri, 02 Jul 2021 16:22:45 +0200) ()
...
syslog :
Code:
...
Jul 19 00:42:00 pve systemd[1]: Starting Proxmox VE replication runner...
Jul 19 00:42:01 pve systemd[1]: pvesr.service: Succeeded.
Jul 19 00:42:01 pve systemd[1]: Finished Proxmox VE replication runner.
-- Reboot --
Jul 19 00:44:00 pve kernel: Linux version 5.11.22-1-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PVE 5.11.22-2 (Fri, 02 Jul 2021 16:22:45 +0200) ()
Jul 19 00:44:00 pve kernel: Command line: initrd=\EFI\proxmox\5.11.22-1-pve\initrd.img-5.11.22-1-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs
...
messages :
Code:
...
Jul 18 21:27:07 pve kernel: [31256.199735] fwbr1000i0: port 2(tap1000i0) entered forwarding state
Jul 19 00:44:02 pve kernel: [ 0.000000] Linux version 5.11.22-1-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PVE 5.11.22-2 (Fri, 02 Jul 2021 16:22:45 +0200) ()
...
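The excerpts above were picked out by hand. As a rough sketch, the last entry logged before each boot could also be extracted automatically, to see how long the host was silent before it rebooted. The function name `last_before_boot` is hypothetical, and it assumes the kernel boot banner always contains `Linux version` as in the lines above.

```shell
# Hypothetical helper: print the last line logged before each kernel
# boot banner ("... kernel: ... Linux version ...") in a log file.
last_before_boot() {
    awk '
        # A boot banner marks the start of a new boot: emit what came before.
        / kernel: .*Linux version / { if (prev != "") print prev; next }
        # Ignore elision markers and journal-style reboot separators.
        $0 != "..." && $0 != "-- Reboot --" { prev = $0 }
    ' "$1"
}
```

For example, `last_before_boot /var/log/syslog` (the path is an assumption; use whichever log file is being inspected) would print the `pvesr.service` line from 00:42:01 for the reboot shown above.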