Issue with qemu-ga/fsfreeze and NFSD running in Guest

Lephisto

Well-Known Member
Jun 22, 2019
154
16
58
47
Hi,

I have a reproduceable Issue with the qemu-guest-agent and the NFS server running inside a guest VM. Each night, when snapshot backups are run, and thus fsfreeze is issues to the Guest VM, I have a nasty Kernel debug stuff in my logs:

Code:
Mar 26 01:30:00 publikore-data qemu-ga: info: guest-ping called
Mar 26 01:30:00 publikore-data qemu-ga: info: guest-fsfreeze called
Mar 26 01:30:01 publikore-data kernel: [10844965.818729] ------------[ cut here ]------------
Mar 26 01:30:01 publikore-data kernel: [10844965.818732] WARNING: CPU: 1 PID: 811 at fs/ext4/ext4_jbd2.c:75 ext4_journal_check_start+0x68/0xb0
Mar 26 01:30:01 publikore-data kernel: [10844965.818739] Modules linked in: nfnetlink tls binfmt_misc nls_iso8859_1 joydev input_leds serio_raw sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua drm nfsd auth_rpcgss efi_pstore nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear virtio_net net_failover psmouse failover virtio_scsi floppy
Mar 26 01:30:01 publikore-data kernel: [10844965.818766] CPU: 1 PID: 811 Comm: nfsd Tainted: G        W         5.15.0-89-generic #99-Ubuntu
Mar 26 01:30:01 publikore-data kernel: [10844965.818768] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
Mar 26 01:30:01 publikore-data kernel: [10844965.818770] RIP: 0010:ext4_journal_check_start+0x68/0xb0
Mar 26 01:30:01 publikore-data kernel: [10844965.818772] Code: 02 00 00 31 c0 48 85 d2 74 07 8b 02 83 e0 02 75 24 4c 8b 65 f8 c9 c3 cc cc cc cc 4c 8b 65 f8 b8 fb ff ff ff c9 c3 cc cc cc cc <0f> 0b eb ce b8 e2 ff ff ff eb dc 44 8b 42 10 68 08 50 c1 91 45 31
Mar 26 01:30:01 publikore-data kernel: [10844965.818773] RSP: 0018:ffffad69c0abbad8 EFLAGS: 00010246
Mar 26 01:30:01 publikore-data kernel: [10844965.818775] RAX: ffff8ab5c8a38000 RBX: 0000000000000001 RCX: 0000000000000002
Mar 26 01:30:01 publikore-data kernel: [10844965.818777] RDX: 0000000000000000 RSI: 0000000000001766 RDI: ffff8ab5c8b29000
Mar 26 01:30:01 publikore-data kernel: [10844965.818778] RBP: ffffad69c0abbae0 R08: 0000000000000000 R09: ffffffff90a7e73a
Mar 26 01:30:01 publikore-data kernel: [10844965.818779] R10: ffff8ab5c434ad68 R11: 0000000000000000 R12: ffff8ab5c8b29000
Mar 26 01:30:01 publikore-data kernel: [10844965.818780] R13: 0000000000000000 R14: 0000000000000008 R15: ffff8ab5c8b29000
Mar 26 01:30:01 publikore-data kernel: [10844965.818783] FS:  0000000000000000(0000) GS:ffff8ab6f7d00000(0000) knlGS:0000000000000000
Mar 26 01:30:01 publikore-data kernel: [10844965.818785] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 26 01:30:01 publikore-data kernel: [10844965.818786] CR2: 00007f5d91b721a0 CR3: 0000000103500000 CR4: 00000000000006e0
Mar 26 01:30:01 publikore-data kernel: [10844965.818790] Call Trace:
Mar 26 01:30:01 publikore-data kernel: [10844965.818792]  <TASK>
Mar 26 01:30:01 publikore-data kernel: [10844965.818795]  ? show_trace_log_lvl+0x1d6/0x2ea
Mar 26 01:30:01 publikore-data kernel: [10844965.818800]  ? show_trace_log_lvl+0x1d6/0x2ea
Mar 26 01:30:01 publikore-data kernel: [10844965.818803]  ? __ext4_journal_start_sb+0x35/0x130
Mar 26 01:30:01 publikore-data kernel: [10844965.818805]  ? show_regs.part.0+0x23/0x29
Mar 26 01:30:01 publikore-data kernel: [10844965.818807]  ? show_regs.cold+0x8/0xd
Mar 26 01:30:01 publikore-data kernel: [10844965.818809]  ? ext4_journal_check_start+0x68/0xb0
Mar 26 01:30:01 publikore-data kernel: [10844965.818811]  ? __warn+0x8c/0x100
Mar 26 01:30:01 publikore-data kernel: [10844965.818814]  ? ext4_journal_check_start+0x68/0xb0
Mar 26 01:30:01 publikore-data kernel: [10844965.818816]  ? report_bug+0xa4/0xd0
Mar 26 01:30:01 publikore-data kernel: [10844965.818821]  ? handle_bug+0x39/0x90
Mar 26 01:30:01 publikore-data kernel: [10844965.818825]  ? exc_invalid_op+0x19/0x70
Mar 26 01:30:01 publikore-data kernel: [10844965.818827]  ? asm_exc_invalid_op+0x1b/0x20
Mar 26 01:30:01 publikore-data kernel: [10844965.818830]  ? ext4_dirty_inode+0x3a/0x80
Mar 26 01:30:01 publikore-data kernel: [10844965.818833]  ? ext4_journal_check_start+0x68/0xb0
Mar 26 01:30:01 publikore-data kernel: [10844965.818835]  ? ext4_journal_check_start+0x13/0xb0
Mar 26 01:30:01 publikore-data kernel: [10844965.818836]  __ext4_journal_start_sb+0x35/0x130
Mar 26 01:30:01 publikore-data kernel: [10844965.818838]  ext4_dirty_inode+0x3a/0x80
Mar 26 01:30:01 publikore-data kernel: [10844965.818840]  __mark_inode_dirty+0x5e/0x330
Mar 26 01:30:01 publikore-data kernel: [10844965.818843]  generic_update_time+0x6c/0xd0
Mar 26 01:30:01 publikore-data kernel: [10844965.818845]  file_update_time+0x127/0x140
Mar 26 01:30:01 publikore-data kernel: [10844965.818848]  ? mk_fsid+0x110/0x110 [nfsd]
Mar 26 01:30:01 publikore-data kernel: [10844965.818869]  file_modified+0x27/0x40
Mar 26 01:30:01 publikore-data kernel: [10844965.818870]  ext4_buffered_write_iter+0x57/0x170
Mar 26 01:30:01 publikore-data kernel: [10844965.818873]  ext4_file_write_iter+0x43/0x60
Mar 26 01:30:01 publikore-data kernel: [10844965.818875]  do_iter_readv_writev+0x14a/0x1b0
Mar 26 01:30:01 publikore-data kernel: [10844965.818878]  do_iter_write+0x8c/0x160
Mar 26 01:30:01 publikore-data kernel: [10844965.818881]  vfs_iter_write+0x19/0x30
Mar 26 01:30:01 publikore-data kernel: [10844965.818883]  nfsd_vfs_write+0x149/0x610 [nfsd]
Mar 26 01:30:01 publikore-data kernel: [10844965.818901]  ? nfs4_put_stid+0xa4/0xc0 [nfsd]
Mar 26 01:30:01 publikore-data kernel: [10844965.818923]  nfsd4_write+0x130/0x1b0 [nfsd]
Mar 26 01:30:01 publikore-data kernel: [10844965.818941]  nfsd4_proc_compound+0x41c/0x770 [nfsd]
Mar 26 01:30:01 publikore-data kernel: [10844965.818958]  ? nfsd_cache_lookup+0x3b7/0x4a0 [nfsd]
Mar 26 01:30:01 publikore-data kernel: [10844965.818977]  nfsd_dispatch+0x16c/0x270 [nfsd]
Mar 26 01:30:01 publikore-data kernel: [10844965.818994]  svc_process_common+0x3da/0x720 [sunrpc]
Mar 26 01:30:01 publikore-data kernel: [10844965.819021]  ? nfsd_svc+0x190/0x190 [nfsd]
Mar 26 01:30:01 publikore-data kernel: [10844965.819035]  svc_process+0xbc/0x100 [sunrpc]
Mar 26 01:30:01 publikore-data kernel: [10844965.819057]  nfsd+0xed/0x150 [nfsd]
Mar 26 01:30:01 publikore-data kernel: [10844965.819071]  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
Mar 26 01:30:01 publikore-data kernel: [10844965.819085]  kthread+0x12a/0x150
Mar 26 01:30:01 publikore-data kernel: [10844965.819089]  ? set_kthread_struct+0x50/0x50
Mar 26 01:30:01 publikore-data kernel: [10844965.819091]  ret_from_fork+0x22/0x30
Mar 26 01:30:01 publikore-data kernel: [10844965.819096]  </TASK>
Mar 26 01:30:01 publikore-data kernel: [10844965.819096] ---[ end trace a8672e992bf3892a ]---
Mar 26 01:30:01 publikore-data kernel: [10844965.819115] ------------[ cut here ]------------

The VM is running Ubuntu 22.04 Kernel 5.15.0-89-generic

I suspect that somehow after fsfreezing the nfs server is still able to write stuff. Has anyone else experienced this and how could I mitigate this?

regards
 
Mar 26 01:30:01 publikore-data kernel: [10844965.818811] ? __warn+0x8c/0x100
Mar 26 01:30:01 publikore-data kernel: [10844965.818814] ? ext4_journal_check_start+0x68/0xb0
Mar 26 01:30:01 publikore-data kernel: [10844965.818816] ? report_bug+0xa4/0xd0 <<<<<<<<<<<<<<<<<<<<<<<
Mar 26 01:30:01 publikore-data kernel: [10844965.818821] ? handle_bug+0x39/0x90 <<<<<<<<<<<<<<<<<<<<<
Mar 26 01:30:01 publikore-data kernel: [10844965.818825] ? exc_invalid_op+0x19/0x70
This appears to be a Kernel bug in EXT4, probably related to fs_freeze.
You should report it to linux-ext4 mailing list: https://ext4.wiki.kernel.org/index.php/Mailinglists


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!