Hello everyone,
I've been using Proxmox for some time and I really like it. Your posts and advice here in this forum have been very useful to me throughout all the setups and optimizations I did for my VMs.
The time has come for me to write in this forum myself, unfortunately with a plea for help.
I recently created a new ZFS pool on a couple of disks I have and moved all my VMs to it. ZFS is really nice as a whole, but since I started using it I've been getting these weird hangs. At first I thought it was related to the ARC configuration, so I read some documentation, again here in the forum, and tuned the dirty ratio and the ARC size. That seemed to work for a while, but the problem came back today. This time I caught it while it was piling up, and it turns out to be some kind of iowait issue. When I looked at the logs (/var/log/messages) I found the following dump:
Code:
Aug 28 12:06:11 pve kernel: [98445.956030] fwbr116i0: port 2(tap116i0) entered forwarding state
Aug 28 12:52:08 pve kernel: [101203.260579] fwbr116i0: port 2(tap116i0) entered disabled state
Aug 28 12:52:08 pve kernel: [101203.293818] fwbr116i0: port 1(fwln116i0) entered disabled state
Aug 28 12:52:08 pve kernel: [101203.293890] vmbr0: port 9(fwpr116p0) entered disabled state
Aug 28 12:52:08 pve kernel: [101203.296943] device fwln116i0 left promiscuous mode
Aug 28 12:52:08 pve kernel: [101203.296948] fwbr116i0: port 1(fwln116i0) entered disabled state
Aug 28 12:52:08 pve kernel: [101203.343154] device fwpr116p0 left promiscuous mode
Aug 28 12:52:08 pve kernel: [101203.343156] vmbr0: port 9(fwpr116p0) entered disabled state
Aug 28 18:19:35 pve kernel: [120849.445985] PGD 0 P4D 0
Aug 28 18:19:35 pve kernel: [120849.446005] Oops: 0000 [#1] SMP PTI
Aug 28 18:19:35 pve kernel: [120849.446032] CPU: 5 PID: 1326 Comm: z_wr_iss Tainted: P O 5.4.55-1-pve #1
Aug 28 18:19:35 pve kernel: [120849.446080] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D16 Series/Z10PE-D16 Series, BIOS 3502 11/24/2017
Aug 28 18:19:35 pve kernel: [120849.446205] RIP: 0010:lz4_compress_zfs+0x3e5/0x7e0 [zfs]
Aug 28 18:19:35 pve kernel: [120849.446239] Code: c8 48 29 df 49 29 d8 c1 ee 13 66 41 89 7c 75 00 69 31 b1 79 37 9e c1 ee 13 49 8d 74 75 00 0f b7 3e 66 44 89 06 8b 31 48 01 df <39> 37 0f 85 86 03 00 00 c6 00 00 4c 8d 40 01 48 89 ce 4c 8d 59 04
Aug 28 18:19:35 pve kernel: [120849.446338] RSP: 0018:ffff972249bd7cb0 EFLAGS: 00010202
Aug 28 18:19:35 pve kernel: [120849.446358] RAX: ffff8b0bf02a411b RBX: 0000000023658001 RCX: ffff8afdddb4a178
Aug 28 18:19:35 pve kernel: [120849.446383] RDX: ffff8afdddb4b000 RSI: 00000000bac58001 RDI: 0000000023658001
Aug 28 18:19:35 pve kernel: [120849.446409] RBP: ffff972249bd7d08 R08: ffff8afdba4f2177 R09: ffff8afdddb4aff4
Aug 28 18:19:35 pve kernel: [120849.446434] R10: ffff8b0bf02a4e00 R11: ffff8afdddb4a178 R12: 0000000000001000
Aug 28 18:19:35 pve kernel: [120849.446460] R13: ffff8b0be7b2c000 R14: ffff8afdddb4a112 R15: 0000000000000023
Aug 28 18:19:35 pve kernel: [120849.446485] FS: 0000000000000000(0000) GS:ffff8b0bff540000(0000) knlGS:0000000000000000
Aug 28 18:19:35 pve kernel: [120849.446514] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 28 18:19:35 pve kernel: [120849.446535] CR2: 0000000023658001 CR3: 00000010de20a006 CR4: 00000000003626e0
Aug 28 18:19:35 pve kernel: [120849.446560] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 28 18:19:35 pve kernel: [120849.446585] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Aug 28 18:19:35 pve kernel: [120849.446610] Call Trace:
Aug 28 18:19:35 pve kernel: [120849.446670] zio_compress_data+0x85/0xc0 [zfs]
Aug 28 18:19:35 pve kernel: [120849.446732] zio_write_compress+0x426/0x770 [zfs]
Aug 28 18:19:35 pve kernel: [120849.446794] zio_execute+0x99/0xf0 [zfs]
Aug 28 18:19:35 pve kernel: [120849.446817] taskq_thread+0x2ec/0x4d0 [spl]
Aug 28 18:19:35 pve kernel: [120849.446838] ? wake_up_q+0x80/0x80
Aug 28 18:19:35 pve kernel: [120849.446895] ? zio_taskq_member.isra.12.constprop.17+0x70/0x70 [zfs]
Aug 28 18:19:35 pve kernel: [120849.446922] kthread+0x120/0x140
Aug 28 18:19:35 pve kernel: [120849.446940] ? task_done+0xb0/0xb0 [spl]
Aug 28 18:19:35 pve kernel: [120849.446958] ? kthread_park+0x90/0x90
Aug 28 18:19:35 pve kernel: [120849.446975] ret_from_fork+0x35/0x40
Aug 28 18:19:35 pve kernel: [120849.446991] Modules linked in: tcp_diag inet_diag rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables msr iptable_filter bpfilter bonding softdog nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm irqbypass crct10dif_pclmul zfs(PO) crc32_pclmul ghash_clmulni_intel aesni_intel zunicode(PO) crypto_simd zlua(PO) cryptd glue_helper zavl(PO) rapl icp(PO) ipmi_ssif dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio ast drm_vram_helper intel_cstate ttm pcspkr drm_kms_helper drm fb_sys_fops syscopyarea sysfillrect sysimgblt mei_me mei ipmi_si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter mac_hid zcommon(PO) znvpair(PO) spl(O) vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp sunrpc libiscsi_tcp libiscsi scsi_transport_iscsi nct6775 hwmon_vid coretemp ip_tables x_tables autofs4 btrfs xor zstd_compress hid_generic
Aug 28 18:19:35 pve kernel: [120849.447026] usbhid hid raid6_pq libcrc32c xhci_pci ehci_pci ahci i2c_i801 lpc_ich ehci_hcd xhci_hcd igb libahci i2c_algo_bit dca wmi
Aug 28 18:19:35 pve kernel: [120849.452519] CR2: 0000000023658001
Aug 28 18:19:35 pve kernel: [120849.453471] ---[ end trace c626264b918d11a4 ]---
Aug 28 18:19:35 pve kernel: [120849.461752] RIP: 0010:lz4_compress_zfs+0x3e5/0x7e0 [zfs]
Aug 28 18:19:35 pve kernel: [120849.462666] Code: c8 48 29 df 49 29 d8 c1 ee 13 66 41 89 7c 75 00 69 31 b1 79 37 9e c1 ee 13 49 8d 74 75 00 0f b7 3e 66 44 89 06 8b 31 48 01 df <39> 37 0f 85 86 03 00 00 c6 00 00 4c 8d 40 01 48 89 ce 4c 8d 59 04
Aug 28 18:19:35 pve kernel: [120849.464543] RSP: 0018:ffff972249bd7cb0 EFLAGS: 00010202
Aug 28 18:19:35 pve kernel: [120849.465478] RAX: ffff8b0bf02a411b RBX: 0000000023658001 RCX: ffff8afdddb4a178
Aug 28 18:19:35 pve kernel: [120849.466442] RDX: ffff8afdddb4b000 RSI: 00000000bac58001 RDI: 0000000023658001
Aug 28 18:19:35 pve kernel: [120849.467372] RBP: ffff972249bd7d08 R08: ffff8afdba4f2177 R09: ffff8afdddb4aff4
Aug 28 18:19:35 pve kernel: [120849.468310] R10: ffff8b0bf02a4e00 R11: ffff8afdddb4a178 R12: 0000000000001000
Aug 28 18:19:35 pve kernel: [120849.469253] R13: ffff8b0be7b2c000 R14: ffff8afdddb4a112 R15: 0000000000000023
Aug 28 18:19:35 pve kernel: [120849.470191] FS: 0000000000000000(0000) GS:ffff8b0bff540000(0000) knlGS:0000000000000000
Aug 28 18:19:35 pve kernel: [120849.471119] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 28 18:19:35 pve kernel: [120849.472046] CR2: 0000000023658001 CR3: 00000010de20a006 CR4: 00000000003626e0
Aug 28 18:19:35 pve kernel: [120849.472961] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 28 18:19:35 pve kernel: [120849.473850] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
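For reference, these are roughly the ARC and dirty-ratio tunables I mentioned above. I'm quoting the values from memory, so they may not be exact, but they should be close:

Code:
# /etc/modprobe.d/zfs.conf - cap the ARC (value approximate, from memory)
options zfs zfs_arc_max=8589934592

# /etc/sysctl.d/90-zfs-dirty.conf - lower the dirty page ratios
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5

After changing zfs.conf I ran update-initramfs -u and rebooted so the module option would be picked up at boot.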
I searched the internet, but all I found was someone who had a memory problem. I am already checking the RAM with memtest86+; I'm more than halfway through and still no errors. That's why I decided to ask whether the above dump makes sense to any of you and whether you can point me to a possible solution.
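One more thing I noticed: the oops is in the lz4 compression path (lz4_compress_zfs), so if the pool layout or compression settings matter, I can post the output of the following (pool name replaced with a placeholder):

Code:
zpool status -v <poolname>
zfs get -r compression,recordsize <poolname>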
Thanks a lot in advance.