GFS2 crashes when listing VM disks on shared storage

elyor

Member
Sep 22, 2022
Our Proxmox cluster is configured with shared storage on a directory, mounted through multipath (SAN -> multipathd -> block device (sdb) -> gfs2 -> /mnt/storage).
When we open the VM Disks view of this storage, gfs2 crashes and the server freezes.
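
For reference, a rough sketch of how this kind of stack is typically put together; the multipath device name, storage ID, and journal count below are placeholders, not our exact values, and the lock table name is inferred from the fsid in the log:

Code:
# multipath presents the SAN LUN as a single mapper device (device name is an example)
multipath -ll
# format with the DLM lock protocol; lock table = <clustername>:<fsname>, one journal per node
mkfs.gfs2 -p lock_dlm -t Prodata-cls01:ssd2 -j 3 /dev/mapper/san-lun0
# mount it and register it as a shared directory storage in Proxmox
mount -t gfs2 /dev/mapper/san-lun0 /mnt/storage
pvesm add dir ssd2 --path /mnt/storage --shared 1 --content images,rootdir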

Code:
Jun 16 10:40:43 srv09 kernel: [2465185.770466] gfs2: fsid=Prodata-cls01:ssd2.2: G:  s:SH n:2/e84a0b8d f:Iqob t:SH d:EX/0 a:0 v:0 r:4 m:50 p:3
Jun 16 10:40:43 srv09 kernel: [2465185.771337] gfs2: fsid=Prodata-cls01:ssd2.2:  H: s:SH f:HD e:0 p:1908636 [qemu-img] gfs2_file_read_iter+0x105/0x400 [gfs2]
Jun 16 10:40:43 srv09 kernel: [2465185.772146] gfs2: fsid=Prodata-cls01:ssd2.2:  I: n:12/3897166733 t:8 f:0x00 d:0x00000000 s:11144855552 p:21
Jun 16 10:40:43 srv09 kernel: [2465185.772903] ------------[ cut here ]------------
Jun 16 10:40:43 srv09 kernel: [2465185.772905] kernel BUG at fs/gfs2/glock.c:329!
Jun 16 10:40:43 srv09 kernel: [2465185.773658] invalid opcode: 0000 [#1] SMP PTI
Jun 16 10:40:43 srv09 kernel: [2465185.774382] CPU: 0 PID: 121290 Comm: kworker/0:1H Tainted: P           O      5.15.85-1-pve #1
Jun 16 10:40:43 srv09 kernel: [2465185.775105] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 10/16/2020
Jun 16 10:40:43 srv09 kernel: [2465185.775924] Workqueue: dlm_callback dlm_callback_work [dlm]
Jun 16 10:40:43 srv09 kernel: [2465185.776687] RIP: 0010:demote_incompat_holders+0xe2/0x100 [gfs2]
Jun 16 10:40:43 srv09 kernel: [2465185.777467] Code: af 66 83 7b 22 01 75 a8 41 f6 45 20 20 74 a1 f6 43 20 20 0f 85 54 ff ff ff eb 95 ba 01 00 00 00 4c 89 f6 31 ff e8 de e5 ff ff <0f> 0b 83 ea 02 66 83 fa 01 0f 86 34 ff ff ff e9 72 ff ff ff 66 2e
Jun 16 10:40:43 srv09 kernel: [2465185.778992] RSP: 0018:ffffb22d33407cd0 EFLAGS: 00010246
Jun 16 10:40:43 srv09 kernel: [2465185.779772] RAX: 0000000000000000 RBX: ffffb22d59613da0 RCX: ffff934dbf620588
Jun 16 10:40:43 srv09 kernel: [2465185.780667] RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff934dbf620580
Jun 16 10:40:43 srv09 kernel: [2465185.781539] RBP: ffffb22d33407cf0 R08: 0000000000000003 R09: 0000000000000001
Jun 16 10:40:43 srv09 kernel: [2465185.782307] R10: ffff92efddd008c0 R11: ffffffffc09b80c0 R12: ffff934f6b49c498
Jun 16 10:40:43 srv09 kernel: [2465185.783089] R13: ffffb22d33407d08 R14: ffff934f6b49c458 R15: 0000000000000032
Jun 16 10:40:43 srv09 kernel: [2465185.783919] FS:  0000000000000000(0000) GS:ffff934dbf600000(0000) knlGS:0000000000000000
Jun 16 10:40:43 srv09 kernel: [2465185.784768] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 16 10:40:43 srv09 kernel: [2465185.785571] CR2: 000055e1878b15d8 CR3: 0000007d3f210001 CR4: 00000000003726f0
Jun 16 10:40:43 srv09 kernel: [2465185.786325] Call Trace:
Jun 16 10:40:43 srv09 kernel: [2465185.787090]  <TASK>
Jun 16 10:40:43 srv09 kernel: [2465185.787808]  gfs2_glock_cb+0xcc/0x170 [gfs2]
Jun 16 10:40:43 srv09 kernel: [2465185.788566]  ? sync_wait_cb+0x20/0x20 [gfs2]
Jun 16 10:40:43 srv09 kernel: [2465185.789320]  gdlm_bast+0x1a/0x50 [gfs2]
Jun 16 10:40:43 srv09 kernel: [2465185.790034]  dlm_callback_work+0x138/0x2c0 [dlm]
Jun 16 10:40:43 srv09 kernel: [2465185.790753]  ? gdlm_put_lock+0x160/0x160 [gfs2]
Jun 16 10:40:43 srv09 kernel: [2465185.791481]  process_one_work+0x22b/0x3d0
Jun 16 10:40:43 srv09 kernel: [2465185.792137]  worker_thread+0x53/0x420
Jun 16 10:40:43 srv09 kernel: [2465185.792808]  ? process_one_work+0x3d0/0x3d0
Jun 16 10:40:43 srv09 kernel: [2465185.793525]  kthread+0x12a/0x150
Jun 16 10:40:43 srv09 kernel: [2465185.794155]  ? set_kthread_struct+0x50/0x50
Jun 16 10:40:43 srv09 kernel: [2465185.794790]  ret_from_fork+0x22/0x30
Jun 16 10:40:43 srv09 kernel: [2465185.795400]  </TASK>
Jun 16 10:40:43 srv09 kernel: [2465185.796033] Modules linked in: udp_diag tcp_diag inet_diag ebt_arp xt_mac act_police cls_basic sch_ingress sch_htb gfs2 ebtable_filter ebtables ip6table_raw ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables iptable_raw ipt_REJECT nf_reject_ipv4 xt_mark xt_set xt_physdev xt_addrtype xt_comment xt_tcpudp xt_multiport xt_conntrack iptable_filter bpfilter ip_set_hash_net ip_set dlm sctp ip6_udp_tunnel udp_tunnel nf_tables nfnetlink_cttimeout bonding tls openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c softdog nfnetlink_log nfnetlink dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ipmi_ssif intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd mgag200 rapl drm_kms_helper cec rc_core i2c_algo_bit intel_cstate serio_raw pcspkr fb_sys_fops syscopyarea sysfillrect sysimgblt joydev input_leds zfs(PO) hpilo
Jun 16 10:40:43 srv09 kernel: [2465185.796093]  ioatdma dca zunicode(PO) zzstd(O) zlua(O) acpi_ipmi zavl(PO) ipmi_si ipmi_devintf icp(PO) ipmi_msghandler acpi_tad mac_hid acpi_power_meter zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drm sunrpc ip_tables x_tables autofs4 simplefb uas usb_storage hid_generic usbkbd usbmouse usbhid hid crc32_pclmul psmouse lpfc nvmet_fc nvmet xhci_pci nvme_fc i2c_i801 xhci_pci_renesas nvme_fabrics uhci_hcd ehci_pci lpc_ich i2c_smbus tg3 xhci_hcd nvme_core ehci_hcd scsi_transport_fc hpsa scsi_transport_sas wmi
Jun 16 10:40:43 srv09 kernel: [2465185.805598] ---[ end trace 64d53356f2819600 ]---
Jun 16 10:40:43 srv09 kernel: [2465185.816566] RIP: 0010:demote_incompat_holders+0xe2/0x100 [gfs2]
Jun 16 10:40:43 srv09 kernel: [2465185.817473] Code: af 66 83 7b 22 01 75 a8 41 f6 45 20 20 74 a1 f6 43 20 20 0f 85 54 ff ff ff eb 95 ba 01 00 00 00 4c 89 f6 31 ff e8 de e5 ff ff <0f> 0b 83 ea 02 66 83 fa 01 0f 86 34 ff ff ff e9 72 ff ff ff 66 2e
Jun 16 10:40:43 srv09 kernel: [2465185.819174] RSP: 0018:ffffb22d33407cd0 EFLAGS: 00010246
Jun 16 10:40:43 srv09 kernel: [2465185.820054] RAX: 0000000000000000 RBX: ffffb22d59613da0 RCX: ffff934dbf620588
Jun 16 10:40:43 srv09 kernel: [2465185.820834] RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff934dbf620580
Jun 16 10:40:43 srv09 kernel: [2465185.821630] RBP: ffffb22d33407cf0 R08: 0000000000000003 R09: 0000000000000001
Jun 16 10:40:43 srv09 kernel: [2465185.822463] R10: ffff92efddd008c0 R11: ffffffffc09b80c0 R12: ffff934f6b49c498
Jun 16 10:40:43 srv09 kernel: [2465185.823295] R13: ffffb22d33407d08 R14: ffff934f6b49c458 R15: 0000000000000032
Jun 16 10:40:43 srv09 kernel: [2465185.824154] FS:  0000000000000000(0000) GS:ffff934dbf600000(0000) knlGS:0000000000000000
Jun 16 10:40:43 srv09 kernel: [2465185.825015] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 16 10:40:43 srv09 kernel: [2465185.825768] CR2: 000055e1878b15d8 CR3: 0000007d3f210001 CR4: 00000000003726f0

Additionally, we use DLM with gfs2.
Software versions:
gfs2-utils 3.3.0-2
dlm-controld 4.1.0-1
multipath-tools 0.8.5-2+deb11u1
proxmox-ve 7.2-1
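
(These were taken from the installed package list; roughly the same information can be gathered with commands like the following — the grep pattern is just an example.)

Code:
uname -r
pveversion -v
dpkg -l | grep -E 'gfs2-utils|dlm-controld|multipath-tools'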

Why did this happen, and how can we solve this problem?

Thanks to all.
 
If I am reading the trace correctly, this is an assertion failing in the GFS2 glock (cluster locking) code while it handles a callback from DLM, the distributed lock manager. Since GFS2 is not officially supported by Proxmox, you should consult your vendor. If you are maintaining GFS2 on your own, you might want to check in with the GFS2 mailing list; as far as I know, Red Hat is actively developing GFS2.
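
If you take it to the mailing list, it will likely help to include the DLM and glock state from the affected node. A rough sketch of what I would collect (the debugfs path assumes debugfs is mounted, and the wildcard avoids guessing the exact lock table name):

Code:
# list DLM lockspaces and show daemon status
dlm_tool ls
dlm_tool status
# dump GFS2 glock state from debugfs
cat /sys/kernel/debug/gfs2/*/glocks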

Good Luck!

P.S. I see that you are running PVE 7.2; you may want to update to the latest 7.4, which will bring in other, newer packages (e.g., multipath). You also don't list the kernel version, so you may want to bring that up to date as well. I don't keep track of GFS2 versioning, but the folks on that mailing list should be able to tell you whether it needs updating.
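
For what it's worth, a typical in-place update within the 7.x series looks roughly like this, assuming your apt repositories are already set up for PVE 7:

Code:
apt update
apt dist-upgrade
pveversion -v   # confirm the new PVE package versions
uname -r        # running kernel; reboot to pick up a newer pve-kernel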


Blockbridge: Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
