Are these kernel panics? Now what?

proksmoks

Hi, for a while now I've been having trouble with my Proxmox setup. It's probably not Proxmox-related and my hardware is a bit exotic, but maybe someone can point me in the right direction.

After a few days, sometimes weeks, I get what I think are kernel panics (??), dmesg gives:

Code:
[844132.205609] Call Trace:
[844132.205614]  __schedule+0x2e6/0x6f0
[844132.205616]  schedule+0x33/0xa0
[844132.205621]  spl_panic+0xf9/0xfb [spl]
[844132.205625]  ? spl_kmem_cache_alloc+0x7c/0x770 [spl]
[844132.205628]  ? spl_kmem_cache_alloc+0x14d/0x770 [spl]
[844132.205630]  ? __wake_up_common_lock+0x8c/0xc0
[844132.205677]  ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[844132.205701]  arc_buf_type.isra.23+0x4e/0x50 [zfs]
[844132.205745]  arc_change_state.isra.29+0x27/0x480 [zfs]
[844132.205765]  arc_freed+0xa7/0xc0 [zfs]
[844132.205816]  zio_free_sync+0x52/0x100 [zfs]
[844132.205849]  spa_free_sync_cb+0x3b/0x50 [zfs]
[844132.205881]  ? spa_avz_build+0xf0/0xf0 [zfs]
[844132.205901]  bplist_iterate+0xd1/0x140 [zfs]
[844132.205949]  spa_sync+0x5c9/0xff0 [zfs]
[844132.205951]  ? mutex_lock+0x12/0x30
[844132.205998]  ? spa_txg_history_init_io+0x104/0x110 [zfs]
[844132.206032]  txg_sync_thread+0x2e1/0x4a0 [zfs]
[844132.206066]  ? txg_thread_exit.isra.13+0x60/0x60 [zfs]
[844132.206070]  thread_generic_wrapper+0x74/0x90 [spl]
[844132.206072]  kthread+0x120/0x140
[844132.206075]  ? __thread_exit+0x20/0x20 [spl]
[844132.206076]  ? kthread_park+0x90/0x90
[844132.206078]  ret_from_fork+0x35/0x40

There are usually more messages like this, but I don't know what I'm looking at. I'm left with a weirdly unresponsive PVE node. Most of the 10+ containers keep functioning normally, but most won't respond to a shutdown -h command. Killing processes with kill -9 $pid doesn't work either. The node won't reboot or shut down, so I have to resort to a power cycle.
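For what it's worth, processes that ignore kill -9 are usually stuck in uninterruptible sleep inside the kernel. A quick, generic way to check (not Proxmox-specific):

```shell
# Processes in "D" state (uninterruptible sleep) are blocked inside
# the kernel, usually on I/O, and ignore every signal -- including
# SIGKILL. If txg_sync and container processes show up here, that
# would explain why kill -9 and shutdown hang.
ps -eo pid,stat,comm | awk '$2 ~ /^D/ {print $1, $3}'
```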

I've let memtest86 run its course overnight: zero errors. The hardware is a Fujitsu 3643 board and an i5-8400T CPU (an engineering sample!) that should give me a very low-power setup.

Any hints very much appreciated.
 
Hi,

The actual error message should be a couple of lines above this (maybe starting with "INFO"). This only shows the call trace which led to the issue. Could you post the full message and the output of pveversion -v?
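As a sketch of what to look for: `grep -B` prints the context lines preceding a match, which is where the real error ("INFO: task ... blocked") sits relative to the "Call Trace:" marker. A demo on a captured snippet (the sample lines here are just an illustration):

```shell
# grep -B N prints N lines of context before each match. On the
# live system the equivalent would be: dmesg | grep -B 5 'Call Trace:'
printf '%s\n' \
  'INFO: task txg_sync:1033 blocked for more than 1087 seconds.' \
  'txg_sync        D    0  1033      2 0x80004000' \
  'Call Trace:' \
  ' __schedule+0x2e6/0x6f0' |
grep -B 2 'Call Trace:'
```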
 
This particular message wasn't written to the logs (because of the problems?). I grepped through /var/log/*, but the last message like this was:

Code:
Nov 29 09:37:00 pve systemd[1]: Starting Proxmox VE replication runner...
Nov 29 09:37:00 pve systemd[1]: pvesr.service: Succeeded.
Nov 29 09:37:00 pve systemd[1]: Started Proxmox VE replication runner.
Nov 29 09:37:11 pve kernel: [844011.373667] INFO: task txg_sync:1033 blocked for more than 1087 seconds.
Nov 29 09:37:11 pve kernel: [844011.373672]       Tainted: P           O      5.4.128-1-pve #1
Nov 29 09:37:11 pve kernel: [844011.373673] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 29 09:37:11 pve kernel: [844011.373675] txg_sync        D    0  1033      2 0x80004000
Nov 29 09:37:11 pve kernel: [844011.373677] Call Trace:
Nov 29 09:37:11 pve kernel: [844011.373681]  __schedule+0x2e6/0x6f0
Nov 29 09:37:11 pve kernel: [844011.373684]  schedule+0x33/0xa0
Nov 29 09:37:11 pve kernel: [844011.373689]  spl_panic+0xf9/0xfb [spl]
Nov 29 09:37:11 pve kernel: [844011.373692]  ? spl_kmem_cache_alloc+0x7c/0x770 [spl]
Nov 29 09:37:11 pve kernel: [844011.373695]  ? spl_kmem_cache_alloc+0x14d/0x770 [spl]
Nov 29 09:37:11 pve kernel: [844011.373697]  ? __wake_up_common_lock+0x8c/0xc0
Nov 29 09:37:11 pve kernel: [844011.373742]  ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
Nov 29 09:37:11 pve kernel: [844011.373762]  arc_buf_type.isra.23+0x4e/0x50 [zfs]
Nov 29 09:37:11 pve kernel: [844011.373782]  arc_change_state.isra.29+0x27/0x480 [zfs]
Nov 29 09:37:11 pve kernel: [844011.373803]  arc_freed+0xa7/0xc0 [zfs]
Nov 29 09:37:11 pve kernel: [844011.373841]  zio_free_sync+0x52/0x100 [zfs]
Nov 29 09:37:11 pve kernel: [844011.373933]  ? spa_avz_build+0xf0/0xf0 [zfs]
Nov 29 09:37:11 pve kernel: [844011.373957]  bplist_iterate+0xd1/0x140 [zfs]
Nov 29 09:37:11 pve kernel: [844011.374020]  spa_sync+0x5c9/0xff0 [zfs]
Nov 29 09:37:11 pve kernel: [844011.374055]  ? spa_txg_history_init_io+0x104/0x110 [zfs]
Nov 29 09:37:11 pve kernel: [844011.374123]  ? txg_thread_exit.isra.13+0x60/0x60 [zfs]
Nov 29 09:37:11 pve kernel: [844011.374128]  kthread+0x120/0x140
Nov 29 09:37:11 pve kernel: [844011.374132]  ? kthread_park+0x90/0x90
Nov 29 09:38:00 pve systemd[1]: Starting Proxmox VE replication runner...
Nov 29 09:38:00 pve systemd[1]: pvesr.service: Succeeded.

There are several more "txg_sync:1033 blocked for more than X seconds" messages before this one.
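A side note on those messages: the "blocked for more than X seconds" threshold is a kernel sysctl, and the message is a hung-task warning (txg_sync is stuck) rather than a true panic. A quick way to inspect it, assuming the hung-task detector is built into the kernel:

```shell
# The hung-task detector warns when a task sits in uninterruptible
# sleep past this threshold (120 s by default on many kernels).
cat /proc/sys/kernel/hung_task_timeout_secs 2>/dev/null \
  || echo "hung-task detector not available on this kernel"
```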

pveversion:

Code:
root@pve:~# pveversion -v
proxmox-ve: 6.4-1 (running kernel: 5.4.128-1-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
pve-kernel-5.4: 6.4-5
pve-kernel-helper: 6.4-5
pve-kernel-5.4.128-1-pve: 5.4.128-2
pve-kernel-5.4.114-1-pve: 5.4.114-1
pve-kernel-5.4.106-1-pve: 5.4.106-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-3
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.13-2
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-1
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.2-4
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.5-pve1~bpo10+1

I notice txg_sync points to ZFS. I run the pool on two 2.5-inch drives (again, to keep power usage down). They're in a ZFS mirror, and one of them is connected over USB 3.

I also notice that apt update followed by apt list --upgradable shows quite a few upgradable packages.
 
I notice txg_sync points to ZFS. I run the pool on two 2.5-inch drives (again, to keep power usage down). They're in a ZFS mirror, and one of them is connected over USB 3.
This could lead to the timeouts you saw. Please post the output of zpool status -v (in CODE tags).
 
Code:
root@pve:~# zpool status -v
  pool: vijver
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: resilvered 2.54T in 09:39:43 with 0 errors on Tue Nov 16 03:25:03 2021
config:

        NAME        STATE     READ WRITE CKSUM
        vijver      ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdc1    ONLINE       0     0     1

errors: No known data errors

Please note that the resilvering happened when I replaced a failed 2.5" drive. I'm not sure what the CKSUM column means, but I'm worried.

zpool history shows nothing special, but zpool events shows this:
Code:
Dec  1 2021 01:01:37.483128881 ereport.fs.zfs.checksum

Meanwhile I'm clinging to the "No known data errors" statement and cherishing my backups.
 
That output shows what I expected. You can get rid of the cksum error as described: run zpool clear and then a manual scrub to get an up-to-date integrity check.

While scrubbing, monitor the disk I/O with dstat, iostat, or another monitoring tool, especially the I/O times. I suspect the USB device is slower, maybe much slower.
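If iostat isn't installed, a rough per-disk view can be read straight from /proc/diskstats (a minimal sketch; the device names sdb/sdc are taken from the zpool status output above):

```shell
# /proc/diskstats, per device: field 4 = reads completed,
# field 8 = writes completed, field 13 = milliseconds spent doing
# I/O. Sample it twice during the scrub; if the USB disk's io_ms
# grows much faster than the internal disk's, the USB path is the
# bottleneck.
awk '$3 ~ /^sd[a-z]$/ {printf "%s reads=%s writes=%s io_ms=%s\n", $3, $4, $8, $13}' /proc/diskstats
```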
 
That output shows what I expected. You can get rid of the cksum error as described: run zpool clear and then a manual scrub to get an up-to-date integrity check.

While scrubbing, monitor the disk I/O with dstat, iostat, or another monitoring tool, especially the I/O times. I suspect the USB device is slower, maybe much slower.
Alright, I'll let you know what happens.
 
Well, good morning, the scrub finished:
Code:
Every 10.0s: zpool status -vv                                                                                                               pve: Fri Dec  3 09:53:08 2021
  pool: vijver
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub repaired 112K in 14:35:50 with 0 errors on Fri Dec  3 02:53:21 2021

config:
        NAME        STATE     READ WRITE CKSUM
        vijver      ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdc1    ONLINE       0     0     7

errors: No known data errors

The checksum error seems to be generated at 1 AM every day, and the scrub seems to have added a few more. I can't find anything in the logs; I looked at cron jobs, USB driver messages around 1 AM, and other activity, but saw nothing.

I checked the Solaris ZFS Administration Guide, particularly the "Determining the Type of Device Failure" section, and I think low numbers in the CKSUM column, especially when they're not accompanied by driver messages (i.e. USB in my case), are not too worrisome. Still, that guide is for Solaris, not ZFS on Linux.
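One way to narrow down that 1 AM trigger is to list everything scheduled for that hour. A rough sketch, assuming a Debian-style cron layout (systemd timers would need a separate `systemctl list-timers` check):

```shell
# In a crontab line the second field is the hour, so match entries
# whose hour field is exactly "1" (i.e. jobs firing at 01:xx).
grep -hE '^[0-9*/,-]+[[:space:]]+1[[:space:]]' \
  /etc/crontab /etc/cron.d/* 2>/dev/null \
  || echo "no 1 AM cron entries found"
```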

I have a lot of apt updates to do on my node, I think that will be next on my list.
 
I think low numbers in the CKSUM column
I like systems without errors, and it is not normal to have those.
The checksum error seems to be generated at 1 AM every day, and the scrub seems to have added a few more. I can't find anything in the logs; I looked at cron jobs, USB driver messages around 1 AM, and other activity, but saw nothing.
If the I/O is just slow, it will not show up in any log. There are a lot of errors that drivers do not detect; most silent data corruption is, by definition, invisible to the driver. That is why ZFS checks for it specifically, by reading the data back and comparing checksums; if they don't match, you have a problem. You should not be seeing such errors at all.

As already stated, I think the problem is in the I/O path. I would never run a pool with a mix of internal and external devices; external ones are more likely to misbehave, as you can see in your case. I recommend moving to an all-internal setup.
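The detection mechanism described above can be illustrated with plain sha256sum (a toy sketch; ZFS does this per block, with fletcher4 or sha256, transparently on every read):

```shell
# "Write": store the data plus its checksum.
printf 'important data' > /tmp/zfs_demo_block
sha256sum /tmp/zfs_demo_block > /tmp/zfs_demo_block.sum
# Simulate silent corruption that the disk driver never reports:
printf 'important dat4' > /tmp/zfs_demo_block
# "Read": re-hash and compare -- this mismatch is what ends up
# counted in the CKSUM column.
sha256sum -c /tmp/zfs_demo_block.sum >/dev/null 2>&1 \
  && echo "block OK" || echo "CHECKSUM ERROR detected"
```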
 
I like systems without errors, and it is not normal to have those.

If the I/O is just slow, it will not show up in any log. There are a lot of errors that drivers do not detect; most silent data corruption is, by definition, invisible to the driver. That is why ZFS checks for it specifically, by reading the data back and comparing checksums; if they don't match, you have a problem. You should not be seeing such errors at all.

As already stated, I think the problem is in the I/O path. I would never run a pool with a mix of internal and external devices; external ones are more likely to misbehave, as you can see in your case. I recommend moving to an all-internal setup.
Yeah, I hear you. In a quest to bring down power consumption I "upgraded" to 2.5-inch disks.

(In case anyone wonders what I'm doing: since I couldn't find a replacement 5TB 2.5-inch disk, I bought a WD Elements external disk. These have the USB interface directly on the PCB; there is no enclosure containing a SATA-to-USB3 adapter. So unless I start swapping PCBs, I can only connect this drive through USB.)
 
(In case anyone wonders what I'm doing: since I couldn't find a replacement 5TB 2.5-inch disk, I bought a WD Elements external disk. These have the USB interface directly on the PCB; there is no enclosure containing a SATA-to-USB3 adapter. So unless I start swapping PCBs, I can only connect this drive through USB.)
Oh, I was not aware that direct USB is a thing.
 
As 2.5" HDD external drives were some 25% cheaper than exactly the same HDD without the SATA-to-USB adapter, enclosure, and cable, this was, for a short time, a source of inexpensive 2.5" hard disks - if you were willing to forego the warranty.

Here's a (complete) teardown of a WD drive on YouTube:

https://youtu.be/wP4l_L81NKw

I realize the quest for lower power consumption has gone too far, and I now have my sights set on a 3.5" NAS HDD as a replacement.

Still, the 2.5" drives are quieter, too.
 
As 2.5" HDD external drives were some 25% cheaper than exactly the same HDD without the SATA-to-USB adapter, enclosure, and cable, this was, for a short time, a source of inexpensive 2.5" hard disks - if you were willing to forego the warranty.

Here's a (complete) teardown of a WD drive on YouTube:

https://youtu.be/wP4l_L81NKw

I realize the quest for lower power consumption has gone too far, and I now have my sights set on a 3.5" NAS HDD as a replacement.

Still, the 2.5" drives are quieter, too.

There are 5TB 2.5" Seagate Barracuda Compute drives. However, the 3-5TB 2.5" drives are 15mm high and don't fit everywhere.
 
There are 5TB 2.5" Seagate Barracuda Compute drives. However, the 3-5TB 2.5" drives are 15mm high and don't fit everywhere.

Behold, and be awestruck by my well-laid-out setup.

that_bears_the_hallmark_of_a_real_pro.JPG

I have no problem with 15mm. Or 16mm.

Right now a Seagate 5TB disk, the ST5000LM000 is sold locally for 105 euros ($119) in external form and for 125 euros ($141) as a naked disk.
 
PicoPSU - and a PicoUPS, with an unenclosed motorcycle battery on the floor. :)

The "unenclosed PSU" is actually the UPS, with a brick psu and a battery hooked up to it. It works a treat.
 
