Hi,
I have a single node running Proxmox 5.4-13, and tonight it stopped working. I had to hard reboot the node...
I have three ZFS pools (one for Proxmox in a RAID 1 mirror, one for my HDD disks in RAIDZ2, and one for my SSD disks in RAIDZ2). All pools are online and scrubs complete without errors.
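For reference, I verified the pool health roughly like this (the pool name `rpool` below is just an example, not my actual layout):

```shell
# Show detailed health, scrub results, and any per-vdev read/write/checksum errors
zpool status -v

# Quick overall check: should print ONLINE for every pool
zpool list -H -o name,health

# Kick off a manual scrub on one pool, e.g. the root pool
zpool scrub rpool
```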
My pveversion output is:
Code:
proxmox-ve: 5.4-2 (running kernel: 4.15.18-24-pve)
pve-manager: 5.4-13 (running version: 5.4-13/aee6f0ec)
pve-kernel-4.15: 5.4-12
pve-kernel-4.15.18-24-pve: 4.15.18-52
pve-kernel-4.15.18-21-pve: 4.15.18-48
pve-kernel-4.15.18-16-pve: 4.15.18-41
pve-kernel-4.15.18-11-pve: 4.15.18-34
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.18-8-pve: 4.15.18-28
pve-kernel-4.15.18-7-pve: 4.15.18-27
pve-kernel-4.15.17-1-pve: 4.15.17-9
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-12
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-56
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-14
libpve-storage-perl: 5.0-44
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-7
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
openvswitch-switch: 2.7.0-3
proxmox-widget-toolkit: 1.0-28
pve-cluster: 5.0-38
pve-container: 2.0-41
pve-docs: 5.4-2
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-22
pve-firmware: 2.0-7
pve-ha-manager: 2.0-9
pve-i18n: 1.1-4
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 3.0.1-4
pve-xtermjs: 3.12.0-1
qemu-server: 5.0-54
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.13-pve1~bpo2
Tonight, for the second time in 20 days, all VMs (a mix of Linux and Windows) became unresponsive and I could not stop them. The only thing I could do was hard reboot the node. In the kernel logs, and on the attached monitor, I saw:
Code:
Jan 22 00:32:42 dt-prox1 kernel: [1408979.021318] ------------[ cut here ]------------
Jan 22 00:32:42 dt-prox1 kernel: [1408979.021320] kernel BUG at mm/slub.c:296!
Jan 22 00:32:42 dt-prox1 kernel: [1408979.021350] invalid opcode: 0000 [#1] SMP PTI
Jan 22 00:32:42 dt-prox1 kernel: [1408979.021368] Modules linked in: veth tcp_diag inet_diag ebtable_filter ebtables ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables ipt_REJECT nf_reject_ipv4 xt_physdev xt_tcpudp xt_comment xt_addrtype xt_multiport xt_conntrack xt_set xt_mark ip_set_hash_net ip_set iptable_filter softdog openvswitch nsh nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack libcrc32c nfnetlink_log nfnetlink 8021q garp mrp ipmi_ssif intel_rapl skx_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_cstate snd_pcm intel_rapl_perf snd_timer snd soundcore ast ttm pcspkr drm_kms_helper drm i2c_algo_bit fb_sys_fops syscopyarea
Jan 22 00:32:42 dt-prox1 kernel: [1408979.021627] sysfillrect sysimgblt joydev input_leds lpc_ich shpchp mei_me mei ioatdma wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad mac_hid vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi sunrpc scsi_transport_iscsi ip_tables x_tables autofs4 ses enclosure zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) btrfs xor zstd_compress raid6_pq hid_generic usbmouse usbkbd usbhid hid ixgbe mdio igb(O) dca mpt3sas raid_class scsi_transport_sas i40e ptp pps_core i2c_i801 ahci libahci
Jan 22 00:32:42 dt-prox1 kernel: [1408979.021812] CPU: 17 PID: 49577 Comm: z_wr_int_7 Tainted: P O 4.15.18-21-pve #1
Jan 22 00:32:42 dt-prox1 kernel: [1408979.021841] Hardware name: Supermicro SSG-6029P-E1CR12L/X11DPH-T, BIOS 2.1 06/15/2018
Jan 22 00:32:42 dt-prox1 kernel: [1408979.021872] RIP: 0010:__slab_free+0x1a2/0x330
Jan 22 00:32:42 dt-prox1 kernel: [1408979.021889] RSP: 0018:ffffb8a524cd7a70 EFLAGS: 00010246
Jan 22 00:32:42 dt-prox1 kernel: [1408979.021908] RAX: ffff9484779a6f60 RBX: ffff9484779a6f60 RCX: 00000001002a0029
Jan 22 00:32:42 dt-prox1 kernel: [1408979.021933] RDX: ffff9484779a6f60 RSI: ffffe57cbede6980 RDI: ffff9464bf407600
Jan 22 00:32:42 dt-prox1 kernel: [1408979.021957] RBP: ffffb8a524cd7b10 R08: 0000000000000001 R09: ffffffffc01b9c2c
Jan 22 00:32:42 dt-prox1 kernel: [1408979.021981] R10: ffffb8a524cd7b30 R11: 0000000000000000 R12: ffff9484779a6f60
Jan 22 00:32:42 dt-prox1 kernel: [1408979.022006] R13: ffff9464bf407600 R14: ffffe57cbede6980 R15: ffff9484779a6f60
Jan 22 00:32:42 dt-prox1 kernel: [1408979.022917] FS: 0000000000000000(0000) GS:ffff9484beec0000(0000) knlGS:0000000000000000
Jan 22 00:32:42 dt-prox1 kernel: [1408979.023725] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 22 00:32:42 dt-prox1 kernel: [1408979.024512] CR2: 00005643be998248 CR3: 00000006ea20a005 CR4: 00000000007626e0
Jan 22 00:32:42 dt-prox1 kernel: [1408979.025508] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 22 00:32:42 dt-prox1 kernel: [1408979.026512] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jan 22 00:32:42 dt-prox1 kernel: [1408979.027670] PKRU: 55555554
Jan 22 00:32:42 dt-prox1 kernel: [1408979.028637] Call Trace:
Jan 22 00:32:42 dt-prox1 kernel: [1408979.029555] ? __update_load_avg_blocked_se.isra.36+0xd1/0x150
Jan 22 00:32:42 dt-prox1 kernel: [1408979.030529] ? __mutex_lock.isra.5+0x474/0x500
Jan 22 00:32:42 dt-prox1 kernel: [1408979.031455] ? ttwu_do_wakeup+0x1e/0x140
Jan 22 00:32:42 dt-prox1 kernel: [1408979.032393] kmem_cache_free+0x1af/0x1e0
Jan 22 00:32:42 dt-prox1 kernel: [1408979.033237] ? kmem_cache_free+0x1af/0x1e0
Jan 22 00:32:42 dt-prox1 kernel: [1408979.034098] spl_kmem_cache_free+0x13c/0x1c0 [spl]
Jan 22 00:32:42 dt-prox1 kernel: [1408979.034875] arc_hdr_destroy+0xa7/0x1b0 [zfs]
Jan 22 00:32:42 dt-prox1 kernel: [1408979.035688] arc_freed+0x69/0xc0 [zfs]
Jan 22 00:32:42 dt-prox1 kernel: [1408979.036467] zio_free_sync+0x41/0x100 [zfs]
Jan 22 00:32:42 dt-prox1 kernel: [1408979.037251] zio_free+0x90/0xd0 [zfs]
Jan 22 00:32:42 dt-prox1 kernel: [1408979.037997] dsl_free+0x11/0x20 [zfs]
Jan 22 00:32:42 dt-prox1 kernel: [1408979.038727] dsl_dataset_block_kill+0x257/0x490 [zfs]
Jan 22 00:32:42 dt-prox1 kernel: [1408979.039432] ? kmem_cache_free+0x1af/0x1e0
Jan 22 00:32:42 dt-prox1 kernel: [1408979.040143] ? kmem_cache_free+0x1af/0x1e0
Jan 22 00:32:42 dt-prox1 kernel: [1408979.040870] dbuf_write_done+0x162/0x1b0 [zfs]
Jan 22 00:32:42 dt-prox1 kernel: [1408979.041562] arc_write_done+0x86/0x3f0 [zfs]
Jan 22 00:32:42 dt-prox1 kernel: [1408979.042225] zio_done+0x2d0/0xe60 [zfs]
Jan 22 00:32:42 dt-prox1 kernel: [1408979.042839] ? kfree+0x165/0x180
Jan 22 00:32:42 dt-prox1 kernel: [1408979.043509] ? spl_kmem_free+0x33/0x40 [spl]
Jan 22 00:32:42 dt-prox1 kernel: [1408979.044228] zio_execute+0x95/0xf0 [zfs]
Jan 22 00:32:42 dt-prox1 kernel: [1408979.044904] taskq_thread+0x2ae/0x4d0 [spl]
Jan 22 00:32:42 dt-prox1 kernel: [1408979.045474] ? wake_up_q+0x80/0x80
Jan 22 00:32:42 dt-prox1 kernel: [1408979.046051] ? zio_reexecute+0x390/0x390 [zfs]
Jan 22 00:32:42 dt-prox1 kernel: [1408979.046591] kthread+0x105/0x140
Jan 22 00:32:42 dt-prox1 kernel: [1408979.047111] ? task_done+0xb0/0xb0 [spl]
Jan 22 00:32:42 dt-prox1 kernel: [1408979.047613] ? kthread_create_worker_on_cpu+0x70/0x70
Jan 22 00:32:42 dt-prox1 kernel: [1408979.048178] ret_from_fork+0x35/0x40
Jan 22 00:32:42 dt-prox1 kernel: [1408979.048702] Code: ff ff ff 75 5d 48 83 bd 68 ff ff ff 00 0f 84 b9 fe ff ff 48 8b b5 60 ff ff ff 48 8b bd 68 ff ff ff e8 63 85 79 00 e9 a1 fe ff ff <0f> 0b 80 4d ab 80 4c 8b 45 88 31 d2 4c 8b 4d a8 4c 89 f6 e8 66
Jan 22 00:32:42 dt-prox1 kernel: [1408979.049735] RIP: __slab_free+0x1a2/0x330 RSP: ffffb8a524cd7a70
Jan 22 00:32:42 dt-prox1 kernel: [1408979.050243] ---[ end trace 1503c118398c1e08 ]---
I think it is a ZFS problem, but what actually happened?
Please help me...
Thank you very much