[SOLVED] Problems with zfs after major proxmox update

jcesclapez

Mar 2, 2021
My problem started after upgrading Proxmox to version 6.


I had four nodes running Proxmox 5.x with an uptime of about 600 days. After upgrading to 6.3, one node (and only that one) freezes about every two days.

This is the syslog output:

Mar 2 11:06:38 genespx4 kernel: [145843.806614] INFO: task zvol:607 blocked for more than 120 seconds.
Mar 2 11:06:38 genespx4 kernel: [145843.806679] Tainted: P O 5.4.98-1-pve #1
Mar 2 11:06:38 genespx4 kernel: [145843.806723] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 2 11:06:38 genespx4 kernel: [145843.806779] zvol D 0 607 2 0x80004000
Mar 2 11:06:38 genespx4 kernel: [145843.806784] Call Trace:
Mar 2 11:06:38 genespx4 kernel: [145843.806802] __schedule+0x2e6/0x6f0
Mar 2 11:06:38 genespx4 kernel: [145843.806805] schedule+0x33/0xa0
Mar 2 11:06:38 genespx4 kernel: [145843.806819] cv_wait_common+0x104/0x130 [spl]
Mar 2 11:06:38 genespx4 kernel: [145843.806827] ? wait_woken+0x80/0x80
Mar 2 11:06:38 genespx4 kernel: [145843.806836] __cv_wait+0x15/0x20 [spl]
Mar 2 11:06:38 genespx4 kernel: [145843.806964] zil_commit_impl+0x241/0xdb0 [zfs]
Mar 2 11:06:38 genespx4 kernel: [145843.807076] zil_commit+0x3d/0x60 [zfs]
Mar 2 11:06:38 genespx4 kernel: [145843.807181] zvol_write+0x325/0x4e0 [zfs]
Mar 2 11:06:38 genespx4 kernel: [145843.807192] taskq_thread+0x2f7/0x4e0 [spl]
Mar 2 11:06:38 genespx4 kernel: [145843.807200] ? wake_up_q+0x80/0x80
Mar 2 11:06:38 genespx4 kernel: [145843.807306] ? zvol_os_create_minor+0x7a0/0x7a0 [zfs]
Mar 2 11:06:38 genespx4 kernel: [145843.807312] kthread+0x120/0x140
Mar 2 11:06:38 genespx4 kernel: [145843.807321] ? task_done+0xb0/0xb0 [spl]
Mar 2 11:06:38 genespx4 kernel: [145843.807324] ? kthread_park+0x90/0x90
Mar 2 11:06:38 genespx4 kernel: [145843.807329] ret_from_fork+0x35/0x40
Mar 2 11:06:38 genespx4 kernel: [145843.807347] INFO: task z_wr_iss:1058 blocked for more than 120 seconds.
Mar 2 11:06:38 genespx4 kernel: [145843.807398] Tainted: P O 5.4.98-1-pve #1
Mar 2 11:06:38 genespx4 kernel: [145843.807441] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 2 11:06:38 genespx4 kernel: [145843.807496] z_wr_iss D 0 1058 2 0x80004000

Mar 2 11:08:44 genespx4 zed: eid=1464 class=deadman pool='rpool' vdev=sdb1 size=49152 offset=578887012352 priority=3 err=0 flags=0x180880 bookmark=142:1:2:4
Mar 2 11:08:44 genespx4 zed: eid=1465 class=deadman pool='rpool' vdev=sdb1 size=49152 offset=773487357952 priority=3 err=0 flags=0x180880 bookmark=142:1:2:4
Mar 2 11:08:44 genespx4 zed: eid=1466 class=deadman pool='rpool' vdev=sdb1 size=24576 offset=581936488448 priority=3 err=0 flags=0x180880 bookmark=142:1:1:4943
Mar 2 11:08:45 genespx4 zed: eid=1467 class=deadman pool='rpool' vdev=sdb1 size=24576 offset=780181196800 priority=3 err=0 flags=0x180880 bookmark=142:1:1:4943

Code:
root@genespx4:~# pveversion -v
proxmox-ve: 6.3-1 (running kernel: 5.4.98-1-pve)
pve-manager: 6.3-4 (running version: 6.3-4/0a38c56f)
pve-kernel-5.4: 6.3-5
pve-kernel-helper: 6.3-5
pve-kernel-5.4.98-1-pve: 5.4.98-1
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-4.15: 5.4-19
pve-kernel-4.15.18-30-pve: 4.15.18-58
pve-kernel-4.15.18-21-pve: 4.15.18-48
pve-kernel-4.15.18-20-pve: 4.15.18-46
pve-kernel-4.15.18-12-pve: 4.15.18-36
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.0-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.0.7
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-4
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
openvswitch-switch: 2.12.3-1
proxmox-backup-client: 1.0.8-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-5
pve-cluster: 6.2-1
pve-container: 3.3-4
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-2
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.2.0-2
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-5
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.3-pve1


I tried giving ZFS more RAM in modprobe.d/zfs.conf, and also set the ZFS module option zfs_vdev_scheduler=none, as suggested in another Proxmox forum post.
And of course I checked the disks (SMART) and ran performance tests.
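
For reference, a RAM limit of this kind is normally set through the zfs_arc_max module parameter, which takes a value in bytes. The following is only a sketch: the helper name zfs_arc_line and the 8 GiB figure are made up for illustration and are not the values used on this node.

```shell
# Hypothetical helper: print a modprobe "options" line that caps the
# ZFS ARC at the given size in GiB (zfs_arc_max expects bytes).
zfs_arc_line() {
    gib=$1
    echo "options zfs zfs_arc_max=$(( gib * 1024 * 1024 * 1024 ))"
}

# Example (8 GiB is an assumed value):
#   zfs_arc_line 8
#   -> options zfs zfs_arc_max=8589934592
```

The printed line would go into modprobe.d/zfs.conf and is read the next time the zfs module is loaded (on systems that load zfs from the initramfs, that typically also means refreshing it with update-initramfs -u).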

I can't reproduce this error on demand; it just happens over time.


Does anyone know what may be happening?

Thanks in advance.
 
In addition to what @avw asked:
How is the pool configured:
Code:
zpool status
any other messages in `dmesg` or the system journal?
you could also try to scrub the zpool (keep in mind that this causes some I/O and could make the situation worse)

the issue does not directly seem related to the other thread you linked (different hardware - there the issue might be related to qemu)
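
One way to check the journal for further hung-task reports is to pull the task name, PID and timeout out of the "blocked for more than" lines. This is a rough sketch; blocked_tasks is a made-up helper name, and it assumes the message format shown in the first post.

```shell
# Hypothetical helper: read kernel log text on stdin and print
# "<task> <pid> <seconds>" for every hung-task report.
blocked_tasks() {
    sed -n 's/.*INFO: task \([^:]*\):\([0-9]*\) blocked for more than \([0-9]*\) seconds.*/\1 \2 \3/p'
}

# Example: count reports per task in the current boot's kernel log.
#   journalctl -k | blocked_tasks | sort | uniq -c
```

If tasks other than the zvol and z_wr_iss threads show up, that would hint that the stall is not limited to ZFS writes.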
 
Thanks for the reply.

zpool status output
Code:
root@genespx4:~# zpool status
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 00:38:04 with 0 errors on Sun Feb 28 13:01:32 2021
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sda     ONLINE       0     0     0
            sdb     ONLINE       0     0     0

errors: No known data errors

I can run zpool scrub rpool, and it finishes without any problems.


last dmesg output

Code:
[    5.724652] device-mapper: thin: Data device (dm-3) discard unsupported: Disabling discard passdown.
[    5.924415] raid6: sse2x4   gen()  4940 MB/s
[    5.972417] raid6: sse2x4   xor()  3272 MB/s
[    6.020414] raid6: sse2x2   gen()  4182 MB/s
[    6.068408] raid6: sse2x2   xor()  2804 MB/s
[    6.116410] raid6: sse2x1   gen()  5456 MB/s
[    6.164404] raid6: sse2x1   xor()  5468 MB/s
[    6.164405] raid6: using algorithm sse2x1 gen() 5456 MB/s
[    6.164406] raid6: .... xor() 5468 MB/s, rmw enabled
[    6.164407] raid6: using ssse3x2 recovery algorithm
[    6.165710] xor: automatically using best checksumming function   avx       
[    6.176283] Btrfs loaded, crc32c=crc32c-intel
[    6.545562] EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null)
[    7.481404] systemd[1]: Inserted module 'autofs4'
[    7.608214] systemd[1]: systemd 241 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid)
[    7.628595] systemd[1]: Detected architecture x86-64.
[    7.657698] systemd[1]: Set hostname to <genespx4>.
[    8.906777] systemd[1]: Listening on RPCbind Server Activation Socket.
[    8.907043] systemd[1]: Listening on initctl Compatibility Named Pipe.
[    8.907222] systemd[1]: Listening on udev Kernel Socket.
[    8.907520] systemd[1]: Listening on Journal Audit Socket.
[    8.908806] systemd[1]: Created slice User and Session Slice.
[    8.908964] systemd[1]: Started Forward Password Requests to Wall Directory Watch.
[    8.909222] systemd[1]: Listening on Journal Socket.
[    8.917281] random: crng init done
[    8.917284] random: 7 urandom warning(s) missed due to ratelimiting
[    8.967074] EXT4-fs (dm-1): re-mounted. Opts: errors=remount-ro
[    9.040927] RPC: Registered named UNIX socket transport module.
[    9.040930] RPC: Registered udp transport module.
[    9.040932] RPC: Registered tcp transport module.
[    9.040933] RPC: Registered tcp NFSv4.1 backchannel transport module.
[    9.049704] Loading iSCSI transport class v2.0-870.
[    9.092207] iscsi: registered transport (tcp)
[    9.220138] iscsi: registered transport (iser)
[    9.293994] systemd-journald[584]: Received request to flush runtime journal from PID 1
[    9.358658] spl: loading out-of-tree module taints kernel.
[    9.382185] znvpair: module license 'CDDL' taints kernel.
[    9.382187] Disabling lock debugging due to kernel taint
[    9.529193] The 'zfs_vdev_scheduler' module option is not supported.
[    9.543739] power_meter ACPI000D:00: Found ACPI power meter.
[    9.543801] power_meter ACPI000D:00: Ignoring unsafe software power cap!
[    9.543809] power_meter ACPI000D:00: hwmon_device_register() is deprecated. Please convert the driver to use hwmon_device_register_with_info().
[    9.552730] IPMI message handler: version 39.2
[    9.561731] ipmi device interface
[    9.570971] ipmi_si: IPMI System Interface driver
[    9.571023] ipmi_si dmi-ipmi-si.0: ipmi_platform: probing via SMBIOS
[    9.571029] ipmi_platform: ipmi_si: SMBIOS: io 0xca2 regsize 1 spacing 1 irq 0
[    9.571031] ipmi_si: Adding SMBIOS-specified kcs state machine
[    9.571162] ipmi_si IPI0001:00: ipmi_platform: probing via ACPI
[    9.571260] ipmi_si IPI0001:00: ipmi_platform: [io  0x0ca2-0x0ca3] regsize 1 spacing 1 irq 0
[    9.571263] ipmi_si dmi-ipmi-si.0: Removing SMBIOS-specified kcs state machine in favor of ACPI
[    9.571265] ipmi_si: Adding ACPI-specified kcs state machine
[    9.571431] ipmi_si: Trying ACPI-specified kcs state machine at i/o address 0xca2, slave address 0x20, irq 0
[    9.590817] dca service started, version 1.12.1
[    9.733404] ZFS: Loaded module v2.0.3-pve1, ZFS pool version 5000, ZFS filesystem version 5
[    9.737816] input: PC Speaker as /devices/platform/pcspkr/input/input4
[    9.741049] ioatdma: Intel(R) QuickData Technology Driver 5.00
[    9.742004] RAPL PMU: API unit is 2^-32 Joules, 2 fixed counters, 163840 ms ovfl timer
[    9.742007] RAPL PMU: hw unit of domain pp0-core 2^-16 Joules
[    9.742009] RAPL PMU: hw unit of domain package 2^-16 Joules
[    9.787447] cryptd: max_cpu_qlen set to 1000
[    9.862597] AVX version of gcm_enc/dec engaged.
[    9.862600] AES CTR mode by8 optimization enabled
[    9.960636] mgag200 0000:01:00.1: remove_conflicting_pci_framebuffers: bar 0: 0xf5000000 -> 0xf5ffffff
[    9.960641] mgag200 0000:01:00.1: remove_conflicting_pci_framebuffers: bar 1: 0xf7de0000 -> 0xf7de3fff
[    9.960643] mgag200 0000:01:00.1: remove_conflicting_pci_framebuffers: bar 2: 0xf7000000 -> 0xf77fffff
[    9.960647] mgag200 0000:01:00.1: vgaarb: deactivate vga console
[    9.962024] Console: switching to colour dummy device 80x25
[    9.971034] [TTM] Zone  kernel: Available graphics memory: 49474464 KiB
[    9.971037] [TTM] Zone   dma32: Available graphics memory: 2097152 KiB
[    9.971038] [TTM] Initializing pool allocator
[    9.971046] [TTM] Initializing DMA pool allocator
[   10.010629] fbcon: mgag200drmfb (fb0) is primary device
[   10.052804] ipmi_si IPI0001:00: IPMI message handler: Found new BMC (man_id: 0x00000b, prod_id: 0x2000, dev_id: 0x13)
[   10.101066] ipmi_si IPI0001:00: IPMI kcs interface initialized
[   10.102623] ipmi_ssif: IPMI SSIF Interface driver
[   10.122845] EDAC sbridge: Seeking for: PCI ID 8086:3ca0
[   10.122890] EDAC sbridge: Seeking for: PCI ID 8086:3ca0
[   10.122908] EDAC sbridge: Seeking for: PCI ID 8086:3ca0
[   10.122916] EDAC sbridge: Seeking for: PCI ID 8086:3ca8
[   10.122933] EDAC sbridge: Seeking for: PCI ID 8086:3ca8
[   10.122944] EDAC sbridge: Seeking for: PCI ID 8086:3ca8
[   10.122949] EDAC sbridge: Seeking for: PCI ID 8086:3c71
[   10.122966] EDAC sbridge: Seeking for: PCI ID 8086:3c71
[   10.122977] EDAC sbridge: Seeking for: PCI ID 8086:3c71
[   10.122981] EDAC sbridge: Seeking for: PCI ID 8086:3caa
[   10.122999] EDAC sbridge: Seeking for: PCI ID 8086:3caa
[   10.123010] EDAC sbridge: Seeking for: PCI ID 8086:3caa
[   10.123014] EDAC sbridge: Seeking for: PCI ID 8086:3cab
[   10.123032] EDAC sbridge: Seeking for: PCI ID 8086:3cab
[   10.123043] EDAC sbridge: Seeking for: PCI ID 8086:3cab
[   10.123047] EDAC sbridge: Seeking for: PCI ID 8086:3cac
[   10.123065] EDAC sbridge: Seeking for: PCI ID 8086:3cac
[   10.123077] EDAC sbridge: Seeking for: PCI ID 8086:3cac
[   10.123080] EDAC sbridge: Seeking for: PCI ID 8086:3cad
[   10.123098] EDAC sbridge: Seeking for: PCI ID 8086:3cad
[   10.123109] EDAC sbridge: Seeking for: PCI ID 8086:3cad
[   10.123113] EDAC sbridge: Seeking for: PCI ID 8086:3cb8
[   10.123132] EDAC sbridge: Seeking for: PCI ID 8086:3cb8
[   10.123143] EDAC sbridge: Seeking for: PCI ID 8086:3cb8
[   10.123145] EDAC sbridge: Seeking for: PCI ID 8086:3cf4
[   10.123177] EDAC sbridge: Seeking for: PCI ID 8086:3cf4
[   10.123188] EDAC sbridge: Seeking for: PCI ID 8086:3cf4
[   10.123194] EDAC sbridge: Seeking for: PCI ID 8086:3cf6
[   10.123210] EDAC sbridge: Seeking for: PCI ID 8086:3cf6
[   10.123221] EDAC sbridge: Seeking for: PCI ID 8086:3cf6
[   10.123226] EDAC sbridge: Seeking for: PCI ID 8086:3cf5
[   10.123243] EDAC sbridge: Seeking for: PCI ID 8086:3cf5
[   10.123254] EDAC sbridge: Seeking for: PCI ID 8086:3cf5
[   10.123462] EDAC MC0: Giving out device to module sb_edac controller Sandy Bridge SrcID#0_Ha#0: DEV 0000:1f:0e.0 (INTERRUPT)
[   10.123716] EDAC MC1: Giving out device to module sb_edac controller Sandy Bridge SrcID#1_Ha#0: DEV 0000:3f:0e.0 (INTERRUPT)
[   10.123718] EDAC sbridge:  Ver: 1.1.2 
[   10.132458] Console: switching to colour frame buffer device 128x48
[   10.135123] mgag200 0000:01:00.1: fb0: mgag200drmfb frame buffer device
[   10.164419] intel_rapl_common: Found RAPL domain package
[   10.164422] intel_rapl_common: Found RAPL domain core
[   10.165235] intel_rapl_common: Found RAPL domain package
[   10.165239] intel_rapl_common: Found RAPL domain core
[   10.176498] Adding 8388604k swap on /dev/mapper/pve-swap.  Priority:-2 extents:1 across:8388604k FS
[   10.200497] [drm] Initialized mgag200 1.0.0 20110418 for 0000:01:00.1 on minor 0
[   11.720733]  zd0: p1
[   11.727259]  zd16: p1 p2
[   11.740422]  zd32: p1
[   11.749599]  zd48: p1 p2 < p5 >
[   11.757833]  zd64: p1 p2
[   11.768172]  zd80: p1
[   11.775377]  zd96: p1
[   12.428204] audit: type=1400 audit(1615176213.150:2): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/bin/lxc-start" pid=2228 comm="apparmor_parser"
[   12.432769] audit: type=1400 audit(1615176213.158:3): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=2227 comm="apparmor_parser"
[   12.432777] audit: type=1400 audit(1615176213.158:4): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=2227 comm="apparmor_parser"
[   12.439389] audit: type=1400 audit(1615176213.162:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/bin/man" pid=2229 comm="apparmor_parser"
[   12.439398] audit: type=1400 audit(1615176213.162:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="man_filter" pid=2229 comm="apparmor_parser"
[   12.439403] audit: type=1400 audit(1615176213.162:7): apparmor="STATUS" operation="profile_load" profile="unconfined" name="man_groff" pid=2229 comm="apparmor_parser"
[   12.467348] audit: type=1400 audit(1615176213.190:8): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/sbin/tcpdump" pid=2225 comm="apparmor_parser"
[   12.516549] audit: type=1400 audit(1615176213.242:9): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxc-container-default" pid=2226 comm="apparmor_parser"
[   12.516557] audit: type=1400 audit(1615176213.242:10): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxc-container-default-cgns" pid=2226 comm="apparmor_parser"
[   12.516562] audit: type=1400 audit(1615176213.242:11): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxc-container-default-with-mounting" pid=2226 comm="apparmor_parser"
[   12.613215] softdog: initialized. soft_noboot=0 soft_margin=60 sec soft_panic=0 (nowayout=0)
[   12.635099] new mount options do not match the existing superblock, will be ignored
[   13.618302] openvswitch: Open vSwitch switching datapath
[   14.913295] device ovs-system entered promiscuous mode
[   15.129358] device vmbr0 entered promiscuous mode
[   16.058061] device eno1 entered promiscuous mode
[   16.058847] device eno3 entered promiscuous mode
[   16.067623] device bond0 entered promiscuous mode
[   16.650207] bpfilter: Loaded bpfilter_umh pid 2671
[   16.650744] Started bpfilter
[   18.445753] sctp: Hash tables configured (bind 2048/2048)
[   19.674757] tg3 0000:02:00.0 eno1: Link is up at 1000 Mbps, full duplex
[   19.674780] tg3 0000:02:00.0 eno1: Flow control is off for TX and off for RX
[   19.674784] tg3 0000:02:00.0 eno1: EEE is disabled
[   19.674820] IPv6: ADDRCONF(NETDEV_CHANGE): eno1: link becomes ready
[   19.811434] tg3 0000:02:00.2 eno3: Link is up at 1000 Mbps, full duplex
[   19.811438] tg3 0000:02:00.2 eno3: Flow control is off for TX and off for RX
[   19.811440] tg3 0000:02:00.2 eno3: EEE is disabled
[   19.811467] IPv6: ADDRCONF(NETDEV_CHANGE): eno3: link becomes ready
[   27.559490] L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.
[   28.468137] device tap140i0 entered promiscuous mode
[   31.051983] FS-Cache: Loaded
[   31.108950] FS-Cache: Netfs 'nfs' registered for caching
[   31.372960] NFS: Registering the id_resolver key type
[   31.372972] Key type id_resolver registered
[   31.372973] Key type id_legacy registered
[   33.692598] device tap142i0 entered promiscuous mode
[12655.687538] perf: interrupt took too long (2518 > 2500), lowering kernel.perf_event_max_sample_rate to 79250
[13784.871475] perf: interrupt took too long (3155 > 3147), lowering kernel.perf_event_max_sample_rate to 63250
[15173.053684] perf: interrupt took too long (3948 > 3943), lowering kernel.perf_event_max_sample_rate to 50500
[17057.982819] perf: interrupt took too long (4953 > 4935), lowering kernel.perf_event_max_sample_rate to 40250
[20451.371027] perf: interrupt took too long (6193 > 6191), lowering kernel.perf_event_max_sample_rate to 32250
[30582.312588] perf: interrupt took too long (7750 > 7741), lowering kernel.perf_event_max_sample_rate to 25750

Regards.
 
Nothing stands out :/
However if the systems were running without a single reboot for 600 days - maybe the disk did not like the reboot?
 

Maybe, I don't know.

Then what can I do? Replace sdb?

For now I have scheduled a reboot every day at 5:00 AM and it works well; without it the server fails after about two and a half days.


It freezes.


Regards.
 
Is it always after 2.5 days? And is the stack trace that gets printed always exactly the same?
Yes, more or less after 2.5 or 3 days.

It always prints the same syslog messages as in post #1.



I ran a lot of stress tools such as CrystalDiskMark in the VMs and it never crashed; I can't reproduce it.


Regards.
 
New logs

It crashed just now, so the daily reboot doesn't help.

Code:
Mar 16 13:08:59 genespx4 kernel: [29159.379210] mce: CPU13: Core temperature above threshold, cpu clock throttled (total events = 1)
Mar 16 13:08:59 genespx4 kernel: [29159.379211] mce: CPU29: Core temperature above threshold, cpu clock throttled (total events = 1)
Mar 16 13:08:59 genespx4 kernel: [29159.379214] mce: CPU8: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 16 13:08:59 genespx4 kernel: [29159.379215] mce: CPU11: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 16 13:08:59 genespx4 kernel: [29159.379217] mce: CPU10: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 16 13:08:59 genespx4 kernel: [29159.379219] mce: CPU14: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 16 13:08:59 genespx4 kernel: [29159.379220] mce: CPU30: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 16 13:08:59 genespx4 kernel: [29159.379223] mce: CPU25: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 16 13:08:59 genespx4 kernel: [29159.379224] mce: CPU9: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 16 13:08:59 genespx4 kernel: [29159.379225] mce: CPU24: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 16 13:08:59 genespx4 kernel: [29159.379227] mce: CPU12: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 16 13:08:59 genespx4 kernel: [29159.379228] mce: CPU28: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 16 13:08:59 genespx4 kernel: [29159.379229] mce: CPU26: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 16 13:08:59 genespx4 kernel: [29159.379231] mce: CPU31: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 16 13:08:59 genespx4 kernel: [29159.379232] mce: CPU15: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 16 13:08:59 genespx4 kernel: [29159.379233] mce: CPU27: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 16 13:08:59 genespx4 kernel: [29159.379234] mce: CPU29: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 16 13:08:59 genespx4 kernel: [29159.379236] mce: CPU13: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 16 13:08:59 genespx4 kernel: [29159.380171] mce: CPU13: Core temperature/speed normal
Mar 16 13:08:59 genespx4 kernel: [29159.380179] mce: CPU29: Core temperature/speed normal
Mar 16 13:08:59 genespx4 kernel: [29159.380180] mce: CPU27: Package temperature/speed normal
Mar 16 13:08:59 genespx4 kernel: [29159.380182] mce: CPU25: Package temperature/speed normal
Mar 16 13:08:59 genespx4 kernel: [29159.380183] mce: CPU24: Package temperature/speed normal
Mar 16 13:08:59 genespx4 kernel: [29159.380184] mce: CPU9: Package temperature/speed normal
Mar 16 13:08:59 genespx4 kernel: [29159.380185] mce: CPU12: Package temperature/speed normal
Mar 16 13:08:59 genespx4 kernel: [29159.380185] mce: CPU28: Package temperature/speed normal
Mar 16 13:08:59 genespx4 kernel: [29159.380187] mce: CPU30: Package temperature/speed normal
Mar 16 13:08:59 genespx4 kernel: [29159.380187] mce: CPU8: Package temperature/speed normal
Mar 16 13:08:59 genespx4 kernel: [29159.380188] mce: CPU26: Package temperature/speed normal
Mar 16 13:08:59 genespx4 kernel: [29159.380189] mce: CPU11: Package temperature/speed normal
Mar 16 13:08:59 genespx4 kernel: [29159.380190] mce: CPU14: Package temperature/speed normal
Mar 16 13:08:59 genespx4 kernel: [29159.380191] mce: CPU15: Package temperature/speed normal
Mar 16 13:08:59 genespx4 kernel: [29159.380191] mce: CPU10: Package temperature/speed normal
Mar 16 13:08:59 genespx4 kernel: [29159.380192] mce: CPU31: Package temperature/speed normal
Mar 16 13:08:59 genespx4 kernel: [29159.380193] mce: CPU29: Package temperature/speed normal
Mar 16 13:08:59 genespx4 kernel: [29159.380193] mce: CPU13: Package temperature/speed normal

Code:
Mar 16 13:08:01 genespx4 zed: eid=478 class=deadman pool='rpool' vdev=sdb1 size=40960 offset=786817560576 priority=3 err=0 flags=0x40080c80 delay=28752527ms
Mar 16 13:08:01 genespx4 zed: eid=479 class=deadman pool='rpool' vdev=sdb1 size=8192 offset=786817568768 priority=3 err=0 flags=0x380880 bookmark=292:1:0:2006197
Mar 16 13:08:02 genespx4 zed: eid=480 class=deadman pool='rpool' vdev=sdb1 size=40960 offset=786817560576 priority=3 err=0 flags=0x40080c80 delay=28752527ms
Mar 16 13:08:02 genespx4 zed: eid=481 class=deadman pool='rpool' vdev=sdb1 size=8192 offset=786817560576 priority=3 err=0 flags=0x380880 bookmark=292:1:0:2006196
Mar 16 13:08:02 genespx4 zed: eid=482 class=deadman pool='rpool' vdev=sdb1 size=40960 offset=786817560576 priority=3 err=0 flags=0x40080c80 delay=28752527ms
Mar 16 13:08:02 genespx4 zed: eid=483 class=deadman pool='rpool' vdev=sdb1 size=8192 offset=786817585152 priority=3 err=0 flags=0x380880 bookmark=292:1:0:2006327
Mar 16 13:08:03 genespx4 zed: eid=484 class=deadman pool='rpool' vdev=sdb1 size=40960 offset=786817560576 priority=3 err=0 flags=0x40080c80 delay=28752527ms
Mar 16 13:08:03 genespx4 zed: eid=485 class=deadman pool='rpool' vdev=sdb1 size=8192 offset=786817576960 priority=3 err=0 flags=0x380880 bookmark=292:1:0:2006326
Mar 16 13:08:03 genespx4 zed: eid=486 class=deadman pool='rpool' vdev=sdb1 size=40960 offset=786817560576 priority=3 err=0 flags=0x40080c80 delay=28752527ms
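
To put the delay= values in these deadman events into perspective, they can be converted from milliseconds to minutes. A small sketch follows; deadman_delays is a made-up helper name, and reading delay= as the time the I/O has been outstanding is an assumption about how ZED reports it.

```shell
# Hypothetical helper: read syslog text on stdin and, for each ZED
# deadman event that carries a delay= field, print the vdev and the
# delay in milliseconds and whole minutes.
deadman_delays() {
    awk '/class=deadman/ {
        vdev = ""; delay = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^vdev=/) vdev = substr($i, 6)
            if ($i ~ /^delay=[0-9]+ms$/) delay = substr($i, 7, length($i) - 8)
        }
        if (delay != "") printf "%s %s ms = %d min\n", vdev, delay, int(delay / 60000)
    }'
}

# Example:
#   grep zed /var/log/syslog | deadman_delays
```

For the events above this gives 479 minutes, roughly eight hours, which is close to the kernel uptime in the mce lines (29159 s). Under the assumption stated above, that would mean the write to sdb1 had been stuck since shortly after the 5:00 AM reboot.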

The first logs I posted are still appearing.

Regards.