Issue with qemu-ga/fsfreeze and NFSD running in Guest

Lephisto

Well-Known Member
Jun 22, 2019
181
18
58
48
Hi,

I have a reproduceable Issue with the qemu-guest-agent and the NFS server running inside a guest VM. Each night, when snapshot backups are run, and thus fsfreeze is issues to the Guest VM, I have a nasty Kernel debug stuff in my logs:

Code:
Mar 26 01:30:00 publikore-data qemu-ga: info: guest-ping called
Mar 26 01:30:00 publikore-data qemu-ga: info: guest-fsfreeze called
Mar 26 01:30:01 publikore-data kernel: [10844965.818729] ------------[ cut here ]------------
Mar 26 01:30:01 publikore-data kernel: [10844965.818732] WARNING: CPU: 1 PID: 811 at fs/ext4/ext4_jbd2.c:75 ext4_journal_check_start+0x68/0xb0
Mar 26 01:30:01 publikore-data kernel: [10844965.818739] Modules linked in: nfnetlink tls binfmt_misc nls_iso8859_1 joydev input_leds serio_raw sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua drm nfsd auth_rpcgss efi_pstore nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear virtio_net net_failover psmouse failover virtio_scsi floppy
Mar 26 01:30:01 publikore-data kernel: [10844965.818766] CPU: 1 PID: 811 Comm: nfsd Tainted: G        W         5.15.0-89-generic #99-Ubuntu
Mar 26 01:30:01 publikore-data kernel: [10844965.818768] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
Mar 26 01:30:01 publikore-data kernel: [10844965.818770] RIP: 0010:ext4_journal_check_start+0x68/0xb0
Mar 26 01:30:01 publikore-data kernel: [10844965.818772] Code: 02 00 00 31 c0 48 85 d2 74 07 8b 02 83 e0 02 75 24 4c 8b 65 f8 c9 c3 cc cc cc cc 4c 8b 65 f8 b8 fb ff ff ff c9 c3 cc cc cc cc <0f> 0b eb ce b8 e2 ff ff ff eb dc 44 8b 42 10 68 08 50 c1 91 45 31
Mar 26 01:30:01 publikore-data kernel: [10844965.818773] RSP: 0018:ffffad69c0abbad8 EFLAGS: 00010246
Mar 26 01:30:01 publikore-data kernel: [10844965.818775] RAX: ffff8ab5c8a38000 RBX: 0000000000000001 RCX: 0000000000000002
Mar 26 01:30:01 publikore-data kernel: [10844965.818777] RDX: 0000000000000000 RSI: 0000000000001766 RDI: ffff8ab5c8b29000
Mar 26 01:30:01 publikore-data kernel: [10844965.818778] RBP: ffffad69c0abbae0 R08: 0000000000000000 R09: ffffffff90a7e73a
Mar 26 01:30:01 publikore-data kernel: [10844965.818779] R10: ffff8ab5c434ad68 R11: 0000000000000000 R12: ffff8ab5c8b29000
Mar 26 01:30:01 publikore-data kernel: [10844965.818780] R13: 0000000000000000 R14: 0000000000000008 R15: ffff8ab5c8b29000
Mar 26 01:30:01 publikore-data kernel: [10844965.818783] FS:  0000000000000000(0000) GS:ffff8ab6f7d00000(0000) knlGS:0000000000000000
Mar 26 01:30:01 publikore-data kernel: [10844965.818785] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 26 01:30:01 publikore-data kernel: [10844965.818786] CR2: 00007f5d91b721a0 CR3: 0000000103500000 CR4: 00000000000006e0
Mar 26 01:30:01 publikore-data kernel: [10844965.818790] Call Trace:
Mar 26 01:30:01 publikore-data kernel: [10844965.818792]  <TASK>
Mar 26 01:30:01 publikore-data kernel: [10844965.818795]  ? show_trace_log_lvl+0x1d6/0x2ea
Mar 26 01:30:01 publikore-data kernel: [10844965.818800]  ? show_trace_log_lvl+0x1d6/0x2ea
Mar 26 01:30:01 publikore-data kernel: [10844965.818803]  ? __ext4_journal_start_sb+0x35/0x130
Mar 26 01:30:01 publikore-data kernel: [10844965.818805]  ? show_regs.part.0+0x23/0x29
Mar 26 01:30:01 publikore-data kernel: [10844965.818807]  ? show_regs.cold+0x8/0xd
Mar 26 01:30:01 publikore-data kernel: [10844965.818809]  ? ext4_journal_check_start+0x68/0xb0
Mar 26 01:30:01 publikore-data kernel: [10844965.818811]  ? __warn+0x8c/0x100
Mar 26 01:30:01 publikore-data kernel: [10844965.818814]  ? ext4_journal_check_start+0x68/0xb0
Mar 26 01:30:01 publikore-data kernel: [10844965.818816]  ? report_bug+0xa4/0xd0
Mar 26 01:30:01 publikore-data kernel: [10844965.818821]  ? handle_bug+0x39/0x90
Mar 26 01:30:01 publikore-data kernel: [10844965.818825]  ? exc_invalid_op+0x19/0x70
Mar 26 01:30:01 publikore-data kernel: [10844965.818827]  ? asm_exc_invalid_op+0x1b/0x20
Mar 26 01:30:01 publikore-data kernel: [10844965.818830]  ? ext4_dirty_inode+0x3a/0x80
Mar 26 01:30:01 publikore-data kernel: [10844965.818833]  ? ext4_journal_check_start+0x68/0xb0
Mar 26 01:30:01 publikore-data kernel: [10844965.818835]  ? ext4_journal_check_start+0x13/0xb0
Mar 26 01:30:01 publikore-data kernel: [10844965.818836]  __ext4_journal_start_sb+0x35/0x130
Mar 26 01:30:01 publikore-data kernel: [10844965.818838]  ext4_dirty_inode+0x3a/0x80
Mar 26 01:30:01 publikore-data kernel: [10844965.818840]  __mark_inode_dirty+0x5e/0x330
Mar 26 01:30:01 publikore-data kernel: [10844965.818843]  generic_update_time+0x6c/0xd0
Mar 26 01:30:01 publikore-data kernel: [10844965.818845]  file_update_time+0x127/0x140
Mar 26 01:30:01 publikore-data kernel: [10844965.818848]  ? mk_fsid+0x110/0x110 [nfsd]
Mar 26 01:30:01 publikore-data kernel: [10844965.818869]  file_modified+0x27/0x40
Mar 26 01:30:01 publikore-data kernel: [10844965.818870]  ext4_buffered_write_iter+0x57/0x170
Mar 26 01:30:01 publikore-data kernel: [10844965.818873]  ext4_file_write_iter+0x43/0x60
Mar 26 01:30:01 publikore-data kernel: [10844965.818875]  do_iter_readv_writev+0x14a/0x1b0
Mar 26 01:30:01 publikore-data kernel: [10844965.818878]  do_iter_write+0x8c/0x160
Mar 26 01:30:01 publikore-data kernel: [10844965.818881]  vfs_iter_write+0x19/0x30
Mar 26 01:30:01 publikore-data kernel: [10844965.818883]  nfsd_vfs_write+0x149/0x610 [nfsd]
Mar 26 01:30:01 publikore-data kernel: [10844965.818901]  ? nfs4_put_stid+0xa4/0xc0 [nfsd]
Mar 26 01:30:01 publikore-data kernel: [10844965.818923]  nfsd4_write+0x130/0x1b0 [nfsd]
Mar 26 01:30:01 publikore-data kernel: [10844965.818941]  nfsd4_proc_compound+0x41c/0x770 [nfsd]
Mar 26 01:30:01 publikore-data kernel: [10844965.818958]  ? nfsd_cache_lookup+0x3b7/0x4a0 [nfsd]
Mar 26 01:30:01 publikore-data kernel: [10844965.818977]  nfsd_dispatch+0x16c/0x270 [nfsd]
Mar 26 01:30:01 publikore-data kernel: [10844965.818994]  svc_process_common+0x3da/0x720 [sunrpc]
Mar 26 01:30:01 publikore-data kernel: [10844965.819021]  ? nfsd_svc+0x190/0x190 [nfsd]
Mar 26 01:30:01 publikore-data kernel: [10844965.819035]  svc_process+0xbc/0x100 [sunrpc]
Mar 26 01:30:01 publikore-data kernel: [10844965.819057]  nfsd+0xed/0x150 [nfsd]
Mar 26 01:30:01 publikore-data kernel: [10844965.819071]  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
Mar 26 01:30:01 publikore-data kernel: [10844965.819085]  kthread+0x12a/0x150
Mar 26 01:30:01 publikore-data kernel: [10844965.819089]  ? set_kthread_struct+0x50/0x50
Mar 26 01:30:01 publikore-data kernel: [10844965.819091]  ret_from_fork+0x22/0x30
Mar 26 01:30:01 publikore-data kernel: [10844965.819096]  </TASK>
Mar 26 01:30:01 publikore-data kernel: [10844965.819096] ---[ end trace a8672e992bf3892a ]---
Mar 26 01:30:01 publikore-data kernel: [10844965.819115] ------------[ cut here ]------------

The VM is running Ubuntu 22.04 Kernel 5.15.0-89-generic

I suspect that somehow after fsfreezing the nfs server is still able to write stuff. Has anyone else experienced this and how could I mitigate this?

regards
 
Mar 26 01:30:01 publikore-data kernel: [10844965.818811] ? __warn+0x8c/0x100
Mar 26 01:30:01 publikore-data kernel: [10844965.818814] ? ext4_journal_check_start+0x68/0xb0
Mar 26 01:30:01 publikore-data kernel: [10844965.818816] ? report_bug+0xa4/0xd0 <<<<<<<<<<<<<<<<<<<<<<<
Mar 26 01:30:01 publikore-data kernel: [10844965.818821] ? handle_bug+0x39/0x90 <<<<<<<<<<<<<<<<<<<<<
Mar 26 01:30:01 publikore-data kernel: [10844965.818825] ? exc_invalid_op+0x19/0x70
This appears to be a Kernel bug in EXT4, probably related to fs_freeze.
You should report it to linux-ext4 mailing list: https://ext4.wiki.kernel.org/index.php/Mailinglists


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox