I have a problem with backups: disk or kernel?

frenk970

Hello everyone,

I’m experiencing issues when running backups on Proxmox. Every time I attempt a backup, the machine returns the errors shown in the screenshot.

[Attached screenshot: 1739261717344.jpeg]

I suspect the problem might be related to the disk, but I also wonder whether it could be caused by the kernel. Has anyone encountered a similar issue, or does anyone have suggestions on how to fix it?

Proxmox Version: 8.3.3
Kernel Version: 6.11.11-1-pve

Thanks in advance for your help!
 
Could you post the full message? E.g., by connecting via SSH, running `dmesg -w`, and then triggering the issue?
 
I'll try, but the problem doesn't show up right away. I have to wait until the backup reaches the machine that's giving me problems, and it has 350 GB to back up, so it's not immediate.
 
Here is the problem again:
[ 1469.558926] ------------[ cut here ]------------
[ 1469.558928] kernel BUG at fs/ext4/inode.c:1877!
[ 1469.558942] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 1469.558948] CPU: 11 PID: 285 Comm: kworker/u32:4 Tainted: P O 6.11.11-1-pve #1
[ 1469.558958] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z390 Phantom Gaming 4S, BIOS P1.40 12/05/2019
[ 1469.558967] Workqueue: writeback wb_workfn (flush-8:16)
[ 1469.558977] RIP: 0010:mpage_submit_folio+0xb6/0xc0
[ 1469.558985] Code: 48 89 de e8 6c 8c 02 00 85 c0 75 09 49 8b 54 24 08 48 83 2a 01 5b 41 5c 41 5d 41 5e 5d 31 d2 31 c9 31 f6 31 ff c3 cc cc cc cc <0f> 0b 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90
[ 1469.559000] RSP: 0018:ffffa41d80a978b0 EFLAGS: 00010206
[ 1469.559007] RAX: 0000000000d1bfc3 RBX: ffff8ae00c239e38 RCX: 0000000000001000
[ 1469.559013] RDX: ffff8ae00c239e39 RSI: fffff2a9416e4d80 RDI: ffffa41d80a97a90
[ 1469.559020] RBP: ffffa41d80a979c0 R08: ffff8ae00c239e38 R09: 0000000000000000
[ 1469.559027] R10: 0000000000001000 R11: 000000000000000c R12: 0000000000d1bfc4
[ 1469.559041] R13: 0000000000df9fc4 R14: fffff2a9416e4d80 R15: ffffa41d80a97a90
[ 1469.559049] FS: 0000000000000000(0000) GS:ffff8ae03df80000(0000) knlGS:0000000000000000
[ 1469.559056] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1469.559062] CR2: 0000006bbf11725a CR3: 0000000e2fe36003 CR4: 00000000003726f0
[ 1469.559069] Call Trace:
[ 1469.559073] <TASK>
[ 1469.559077] ? show_regs+0x6d/0x80
[ 1469.559084] ? die+0x37/0xa0
[ 1469.559088] ? do_trap+0xd4/0xf0
[ 1469.559093] ? do_error_trap+0x71/0xb0
[ 1469.559097] ? mpage_submit_folio+0xb6/0xc0
[ 1469.559103] ? exc_invalid_op+0x52/0x80
[ 1469.559108] ? mpage_submit_folio+0xb6/0xc0
[ 1469.559114] ? asm_exc_invalid_op+0x1b/0x20
[ 1469.559121] ? mpage_submit_folio+0xb6/0xc0
[ 1469.559127] ? mpage_map_and_submit_buffers+0x1ae/0x340
[ 1469.559137] ? wake_up_q+0x50/0xa0
[ 1469.559144] ext4_do_writepages+0x770/0xe10
[ 1469.559151] ext4_writepages+0xb5/0x190
[ 1469.559156] do_writepages+0xcd/0x1f0
[ 1469.559160] ? wakeup_preempt+0x68/0x80
[ 1469.559166] ? ttwu_do_activate+0x74/0x250
[ 1469.559171] __writeback_single_inode+0x44/0x370
[ 1469.559177] writeback_sb_inodes+0x211/0x510
[ 1469.559184] __writeback_inodes_wb+0x54/0x100
[ 1469.559190] ? queue_io+0x82/0x120
[ 1469.559195] wb_writeback+0x2df/0x350
[ 1469.559200] wb_workfn+0x368/0x4d0
[ 1469.559205] ? __schedule+0x433/0x1500
[ 1469.559211] ? add_timer+0x20/0x40
[ 1469.559217] process_one_work+0x173/0x350
[ 1469.559224] worker_thread+0x306/0x440
[ 1469.559231] ? __pfx_worker_thread+0x10/0x10
[ 1469.559236] kthread+0xef/0x120
[ 1469.559240] ? __pfx_kthread+0x10/0x10
[ 1469.559247] ret_from_fork+0x44/0x70
[ 1469.559254] ? __pfx_kthread+0x10/0x10
[ 1469.559259] ret_from_fork_asm+0x1b/0x30
[ 1469.559265] </TASK>
[ 1469.559268] Modules linked in: veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter cmac nls_utf8 cifs cifs_arc4 nls_ucs2_utils rdma_cm iw_cm ib_cm ib_core cifs_md4 netfs scsi_transport_iscsi nf_tables nvme_fabrics nvme_keyring bonding tls qrtr softdog nfnetlink_log nfnetlink binfmt_misc rc_hauppauge em28xx_rc si2157 si2168 i2c_mux em28xx_dvb dvb_core joydev intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common em28xx tveeprom videodev mc input_leds cdc_acm intel_tcc_cooling snd_sof_pci_intel_cnl snd_sof_intel_hda_common soundwire_intel snd_sof_intel_hda_mlink soundwire_cadence snd_sof_intel_hda snd_hda_codec_hdmi snd_sof_pci snd_sof_xtensa_dsp snd_sof x86_pkg_temp_thermal intel_powerclamp snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match kvm_intel snd_soc_acpi soundwire_generic_allocation snd_hda_codec_realtek soundwire_bus i915 snd_hda_codec_generic snd_soc_core kvm snd_compress ac97_bus snd_pcm_dmaengine
[ 1469.559304] snd_hda_intel crct10dif_pclmul snd_intel_dspcfg polyval_clmulni snd_intel_sdw_acpi polyval_generic ghash_clmulni_intel sha256_ssse3 snd_hda_codec mei_hdcp sha1_ssse3 mei_pxp aesni_intel drm_buddy snd_hda_core ttm crypto_simd snd_hwdep cryptd drm_display_helper snd_pcm rapl cmdlinepart intel_pmc_core snd_timer cec ucsi_ccg intel_vsec spi_nor mei_me snd rc_core typec_ucsi pmt_telemetry intel_cstate intel_wmi_thunderbolt mtd wmi_bmof typec pcspkr ee1004 soundcore i2c_algo_bit intel_pch_thermal mei pmt_class acpi_tad acpi_pad mac_hid zfs(PO) spl(O) vhost_net vhost vhost_iotlb tap nfsd nct6775 nct6775_core hwmon_vid auth_rpcgss coretemp nfs_acl vfio_pci lockd vfio_pci_core grace irqbypass vfio_iommu_type1 vfio iommufd sunrpc efi_pstore dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq hid_logitech_hidpp hid_logitech_dj hid_generic usbmouse usbkbd usbhid hid dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c ixgbe nvme xhci_pci xhci_pci_renesas crc32_pclmul nvme_core e1000e
[ 1469.559738] xfrm_algo ahci spi_intel_pci xhci_hcd i2c_i801 dca spi_intel nvme_auth i2c_smbus i2c_nvidia_gpu mdio libahci i2c_ccgx_ucsi video wmi
[ 1469.561651] ---[ end trace 0000000000000000 ]---
[ 1469.738915] RIP: 0010:mpage_submit_folio+0xb6/0xc0
[ 1469.739405] Code: 48 89 de e8 6c 8c 02 00 85 c0 75 09 49 8b 54 24 08 48 83 2a 01 5b 41 5c 41 5d 41 5e 5d 31 d2 31 c9 31 f6 31 ff c3 cc cc cc cc <0f> 0b 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90
[ 1469.739871] RSP: 0018:ffffa41d80a978b0 EFLAGS: 00010206
[ 1469.740260] RAX: 0000000000d1bfc3 RBX: ffff8ae00c239e38 RCX: 0000000000001000
[ 1469.740620] RDX: ffff8ae00c239e39 RSI: fffff2a9416e4d80 RDI: ffffa41d80a97a90
[ 1469.740985] RBP: ffffa41d80a979c0 R08: ffff8ae00c239e38 R09: 0000000000000000
[ 1469.741347] R10: 0000000000001000 R11: 000000000000000c R12: 0000000000d1bfc4
[ 1469.741709] R13: 0000000000df9fc4 R14: fffff2a9416e4d80 R15: ffffa41d80a97a90
[ 1469.742062] FS: 0000000000000000(0000) GS:ffff8ae03df80000(0000) knlGS:0000000000000000
[ 1469.742476] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1469.742847] CR2: 0000006bbf11725a CR3: 00000001959aa001 CR4: 00000000003726f0
[ 1469.743238] ------------[ cut here ]------------
[ 1469.743583] WARNING: CPU: 11 PID: 285 at kernel/exit.c:821 do_exit+0x8e5/0xaf0
[ 1469.743929] Modules linked in: veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter cmac nls_utf8 cifs cifs_arc4 nls_ucs2_utils rdma_cm iw_cm ib_cm ib_core cifs_md4 netfs scsi_transport_iscsi nf_tables nvme_fabrics nvme_keyring bonding tls qrtr softdog nfnetlink_log nfnetlink binfmt_misc rc_hauppauge em28xx_rc si2157 si2168 i2c_mux em28xx_dvb dvb_core joydev intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common em28xx tveeprom videodev mc input_leds cdc_acm intel_tcc_cooling snd_sof_pci_intel_cnl snd_sof_intel_hda_common soundwire_intel snd_sof_intel_hda_mlink soundwire_cadence snd_sof_intel_hda snd_hda_codec_hdmi snd_sof_pci snd_sof_xtensa_dsp snd_sof x86_pkg_temp_thermal intel_powerclamp snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match kvm_intel snd_soc_acpi soundwire_generic_allocation snd_hda_codec_realtek soundwire_bus i915 snd_hda_codec_generic snd_soc_core kvm snd_compress ac97_bus snd_pcm_dmaengine
[ 1469.743965] snd_hda_intel crct10dif_pclmul snd_intel_dspcfg polyval_clmulni snd_intel_sdw_acpi polyval_generic ghash_clmulni_intel sha256_ssse3 snd_hda_codec mei_hdcp sha1_ssse3 mei_pxp aesni_intel drm_buddy snd_hda_core ttm crypto_simd snd_hwdep cryptd drm_display_helper snd_pcm rapl cmdlinepart intel_pmc_core snd_timer cec ucsi_ccg intel_vsec spi_nor mei_me snd rc_core typec_ucsi pmt_telemetry intel_cstate intel_wmi_thunderbolt mtd wmi_bmof typec pcspkr ee1004 soundcore i2c_algo_bit intel_pch_thermal mei pmt_class acpi_tad acpi_pad mac_hid zfs(PO) spl(O) vhost_net vhost vhost_iotlb tap nfsd nct6775 nct6775_core hwmon_vid auth_rpcgss coretemp nfs_acl vfio_pci lockd vfio_pci_core grace irqbypass vfio_iommu_type1 vfio iommufd sunrpc efi_pstore dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq hid_logitech_hidpp hid_logitech_dj hid_generic usbmouse usbkbd usbhid hid dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c ixgbe nvme xhci_pci xhci_pci_renesas crc32_pclmul nvme_core e1000e
[ 1469.745857] xfrm_algo ahci spi_intel_pci xhci_hcd i2c_i801 dca spi_intel nvme_auth i2c_smbus i2c_nvidia_gpu mdio libahci i2c_ccgx_ucsi video wmi
[ 1469.748425] CPU: 11 PID: 285 Comm: kworker/u32:4 Tainted: P D O 6.11.11-1-pve #1
[ 1469.748885] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z390 Phantom Gaming 4S, BIOS P1.40 12/05/2019
[ 1469.749334] Workqueue: writeback wb_workfn (flush-8:16)
[ 1469.749810] RIP: 0010:do_exit+0x8e5/0xaf0
[ 1469.750271] Code: e9 3a f8 ff ff 48 8b bb f8 09 00 00 31 f6 e8 92 e0 ff ff e9 ee fd ff ff 4c 89 ee bf 05 06 00 00 e8 60 3d 01 00 e9 66 f8 ff ff <0f> 0b e9 94 f7 ff ff 0f 0b e9 4d f7 ff ff 48 89 df e8 b5 31 14 00
[ 1469.750762] RSP: 0018:ffffa41d80a97ec8 EFLAGS: 00010282
[ 1469.751235] RAX: 0000000000000000 RBX: ffff8ad0d6dfd280 RCX: 0000000000000000
[ 1469.751727] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 1469.752214] RBP: ffffa41d80a97f20 R08: 0000000000000000 R09: 0000000000000000
[ 1469.752702] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8ad0c2961f80
[ 1469.753187] R13: 000000000000000b R14: ffff8ad0d6e04a40 R15: ffff8ad0d6dfd280
[ 1469.753683] FS: 0000000000000000(0000) GS:ffff8ae03df80000(0000) knlGS:0000000000000000
[ 1469.754165] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1469.754663] CR2: 0000006bbf11725a CR3: 00000001959aa001 CR4: 00000000003726f0
[ 1469.755147] Call Trace:
[ 1469.755643] <TASK>
[ 1469.756122] ? show_regs+0x6d/0x80
[ 1469.756622] ? __warn+0x89/0x160
[ 1469.757115] ? do_exit+0x8e5/0xaf0
[ 1469.757617] ? report_bug+0x17e/0x1b0
[ 1469.758108] ? handle_bug+0x46/0x90
[ 1469.758604] ? exc_invalid_op+0x18/0x80
[ 1469.759100] ? asm_exc_invalid_op+0x1b/0x20
[ 1469.759586] ? do_exit+0x8e5/0xaf0
[ 1469.760082] ? do_exit+0x72/0xaf0
[ 1469.760574] ? __pfx_worker_thread+0x10/0x10
[ 1469.761073] ? kthread+0xef/0x120
[ 1469.761564] make_task_dead+0x83/0x170
[ 1469.762060] rewind_stack_and_make_dead+0x17/0x20
[ 1469.762558] </TASK>
[ 1469.763087] ---[ end trace 0000000000000000 ]---
 
When the job failed in the GUI, it showed this:
INFO: aborting backup job
ERROR: VM 105 qmp command 'backup-cancel' failed - unable to connect to VM 105 qmp socket - timeout after 5988 retries
INFO: resuming VM again
 
What is the backup target? Does it work if you use the 6.8-based kernel?
 
Running `sudo smartctl -a /dev/nvme0n1` on the OS disk gives the output below. I know that the disk has some errors.

smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.11.11-1-pve] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number: Sabrent
Serial Number: 17A807051CBE02057204
Firmware Version: RKT303.3
PCI Vendor/Subsystem ID: 0x1987
IEEE OUI Identifier: 0x6479a7
Total NVM Capacity: 512,110,190,592 [512 GB]
Unallocated NVM Capacity: 0
Controller ID: 1
NVMe Version: 1.3
Number of Namespaces: 1
Namespace 1 Size/Capacity: 512,110,190,592 [512 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 6479a7 366143df74
Local Time is: Tue Feb 11 11:26:29 2025 CET
Firmware Updates (0x12): 1 Slot, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005d): Comp DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x08): Telmtry_Lg
Maximum Data Transfer Size: 512 Pages
Warning Comp. Temp. Threshold: 75 Celsius
Critical Comp. Temp. Threshold: 80 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     6.80W       -        -    0  0  0  0        0       0
 1 +     5.74W       -        -    1  1  1  1        0       0
 2 +     5.21W       -        -    2  2  2  2        0       0
 3 -   0.0490W       -        -    3  3  3  3     2000    2000
 4 -   0.0018W       -        -    4  4  4  4    25000   25000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         1

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 27 Celsius
Available Spare: 100%
Available Spare Threshold: 5%
Percentage Used: 82%
Data Units Read: 46,020,369 [23.5 TB]
Data Units Written: 56,221,650 [28.7 TB]
Host Read Commands: 2,548,334,510
Host Write Commands: 1,876,313,417
Controller Busy Time: 5,391
Power Cycles: 1,423
Power On Hours: 38,047
Unsafe Shutdowns: 1,353
Media and Data Integrity Errors: 0
Error Information Log Entries: 598
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0

Error Information (NVMe Log 0x01, 16 of 63 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0        598     0  0x000c  0x4004  0x028            0     0     -
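
For anyone comparing several drives, the health counters in a report like this can be pulled out programmatically. This is only a minimal Python sketch: the function name `parse_health` and the choice of fields are my own assumptions, not part of smartmontools, and it only extracts the numbers as printed without judging them.

```python
import re

# A few lines copied from the SMART/Health section of the report above.
SAMPLE = """\
Critical Warning: 0x00
Available Spare: 100%
Percentage Used: 82%
Unsafe Shutdowns: 1,353
Media and Data Integrity Errors: 0
Error Information Log Entries: 598
"""

# Hypothetical selection of fields worth tracking; adjust as needed.
FIELDS = (
    "Critical Warning",
    "Available Spare",
    "Percentage Used",
    "Unsafe Shutdowns",
    "Media and Data Integrity Errors",
    "Error Information Log Entries",
)


def parse_health(text):
    """Return the selected 'Key: value' lines as a dict of integers."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if not sep or key not in FIELDS or not value:
            continue
        if value.lower().startswith("0x"):
            # Hexadecimal flags such as "0x00".
            result[key] = int(value, 16)
        else:
            # Drop thousands separators and unit suffixes: "1,353", "82%".
            result[key] = int(re.sub(r"[^0-9]", "", value.split()[0]))
    return result


if __name__ == "__main__":
    for name, number in parse_health(SAMPLE).items():
        print(f"{name}: {number}")
```

The same function should work on the full text of `smartctl -a`, since lines without a matching key are skipped.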
 
Then the most likely cause is that some on-disk file system structure is corrupt because of the failing disk.
 
So if I replace the disk, do I have to reinstall Proxmox, or can I clone the disk? The problem is that I can't make new backups of the VMs, and the backups I have are too old (2 weeks) to restore.
 
If it is only your backups that are stored on that disk, you can just create backups on a new, working disk. If your VM disks *and* backups are stored on the broken disk, then you can only try to recover what is still readable (this is the reason why storing data and backups on the same device is not a good idea).
 
The disk that reports problems is the OS disk, but for 2 weeks I haven't understood why backups fail. The backups were written to a separate disk and then copied to the NAS, but they are too old for what the VMs do.
What I don't understand is why Proxmox otherwise runs fine and is stable, but when it's time to make backups the job freezes, and to stop the backup for good you have to restart everything.
 
Could you give more details about your storage, guest and backup setup? I am still not sure which filesystem/partition is triggering the issue ;) I would try backing up to a different target. The stack trace above is for writing, not reading, so to me it would seem to be the backup target disk that is causing problems. That would also align with only backup jobs triggering it, and not regular guest operations.
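
One hint for narrowing this down: the `Workqueue: writeback wb_workfn (flush-8:16)` line in the trace names the device being flushed as a major:minor pair. Below is a minimal Python sketch for decoding it. It assumes the conventional Linux numbering where major 8 is the SCSI/SATA disk driver with 16 minors per disk; the helper names are made up for illustration. On the host itself, the MAJ:MIN column of `lsblk` is the authoritative mapping.

```python
import re
import string


def flush_device(line):
    """Extract (major, minor) from a 'flush-MAJ:MIN' writeback tag."""
    match = re.search(r"flush-(\d+):(\d+)", line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def sd_name(major, minor):
    """Map a major-8 device number to its usual sdX[N] name.

    Assumption: major 8 covers sda..sdp, 16 minors per disk,
    where minor % 16 == 0 is the whole disk and 1..15 are partitions.
    """
    if major != 8:
        return None
    disk, part = divmod(minor, 16)
    name = "sd" + string.ascii_lowercase[disk]
    return name + (str(part) if part else "")


if __name__ == "__main__":
    tag = "Workqueue: writeback wb_workfn (flush-8:16)"
    print(sd_name(*flush_device(tag)))
```

Under that assumption, 8:16 is the whole-disk device `sdb`, so it would be worth checking what is mounted from that disk.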
 
The problem occurs with any disk I back up to. I also tried my NAS connected via Samba, but no luck. It's as if the job itself has problems and freezes out of nowhere.
 
okay. please clearly describe your setup:

- storage.cfg
- lsblk/mount
- vm config
- backup settings
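
The items in the list above can be gathered in one go. This is a rough Python sketch under a few assumptions: it runs as root on the Proxmox host, the VM ID is 105 (taken from the `backup-cancel` error earlier in the thread), and the backup job definitions live in `/etc/pve/jobs.cfg`, which may differ on your installation. The function names are hypothetical.

```python
import subprocess


def setup_commands(vmid):
    """Commands whose output covers the requested setup details."""
    return [
        ["cat", "/etc/pve/storage.cfg"],  # storage.cfg
        ["lsblk"],                        # block devices
        ["mount"],                        # mounted filesystems
        ["qm", "config", str(vmid)],      # VM config
        ["cat", "/etc/pve/jobs.cfg"],     # backup jobs (assumed location)
    ]


def collect(vmid, run=subprocess.run):
    """Run each command and return one text report to paste into a post."""
    parts = []
    for cmd in setup_commands(vmid):
        header = "$ " + " ".join(cmd)
        try:
            proc = run(cmd, capture_output=True, text=True)
            body = proc.stdout + proc.stderr
        except OSError as exc:
            # Command missing, e.g. when not run on a Proxmox host.
            body = f"(could not run: {exc})\n"
        parts.append(header + "\n" + body)
    return "\n".join(parts)


if __name__ == "__main__":
    print(collect(105))
```

Review the output before posting, since storage definitions can contain server names or share paths you may not want to publish.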
 
okay - and a backup of which VM is triggering the issue?
 
there is no VM 106 in your screenshot or list...