System freezes during backup

tehNiemer

Feb 26, 2024
I'm having regular, but unpredictable, issues with my whole system locking up during vzdump backups. Sometimes I can still reach the host through the GUI, but all the VMs and containers show question marks; other times I can't reach it at all, and the same goes for SSH. I've seen a few threads about this issue, but none of the solutions suggested in them seem to work for me.

I've tried lowering the bwlimit and backing up to a CIFS share, an NFS share, and PBS (running in an LXC on this machine with ZFS storage passed through as a directory). None of it fixes the issue. Any insight would be appreciated.

Here's the backup log from one of the times it froze. That time I was able to stop the backup and reboot from the GUI.
https://gist.github.com/tehniemer/2b0b090f917aee8336af5e5061a7e9a8

And a chunk of the syslog from a similar, but separate, incident:
https://gist.github.com/tehniemer/b0f33fb3985e1ed49f7f8037966e3bc4

And finally, a chunk of the kernel log from a time I completely lost access to the host and had to do a hard reboot.
https://gist.github.com/tehniemer/8adf4fcb85226b978355e0a628e4d841
 
Hi,
it sounds like there might be an issue with your network card/driver:
Code:
2024-04-11T15:57:21.380776-05:00 phpve01 kernel: [80743.007688] ------------[ cut here ]------------
2024-04-11T15:57:21.535536-05:00 phpve01 kernel: [80743.007693] NETDEV WATCHDOG: eno1 (e1000e): transmit queue 0 timed out 6184 ms
2024-04-11T15:57:21.535546-05:00 phpve01 kernel: [80743.007710] WARNING: CPU: 6 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x260/0x270
2024-04-11T15:57:21.535547-05:00 phpve01 kernel: [80743.007716] Modules linked in: dm_snapshot tcp_diag inet_diag bluetooth ecdh_generic ecc cmac nls_utf8 cifs cifs_arc4 rdma_cm iw_cm ib_cm ib_core cifs_md4 nf_conntrack_netlink xt_nat xt_tcpudp xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat overlay cfg80211 nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache netfs veth ebtable_filter ebtables ip_set nf_tables scsi_transport_iscsi bonding tls softdog ip6table_filter ip6table_raw ip6_tables sunrpc nfnetlink_log iptable_filter iptable_raw nfnetlink bpfilter binfmt_misc snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio intel_rapl_msr intel_rapl_common intel_tcc_cooling snd_sof_pci_intel_cnl x86_pkg_temp_thermal snd_sof_intel_hda_common intel_powerclamp coretemp soundwire_intel snd_sof_intel_hda_mlink soundwire_cadence kvm_intel snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda kvm snd_hda_ext_core
2024-04-11T15:57:22.110715-05:00 phpve01 kernel: [80743.007760]  snd_soc_acpi_intel_match snd_soc_acpi soundwire_generic_allocation soundwire_bus crct10dif_pclmul polyval_clmulni snd_soc_core polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 snd_compress aesni_intel ac97_bus snd_pcm_dmaengine crypto_simd i915 cryptd mei_hdcp mei_pxp snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi rapl drm_buddy snd_hda_codec ttm drm_display_helper snd_hda_core snd_hwdep cec snd_pcm cmdlinepart rc_core snd_timer intel_cstate spi_nor snd drm_kms_helper apex(OE) mei_me wmi_bmof soundcore mtd gasket(OE) cp210x i2c_algo_bit pcspkr ee1004 mei usbserial intel_pch_thermal joydev input_leds acpi_tad acpi_pad mac_hid zfs(PO) spl(O) vhost_net vhost vhost_iotlb tap vfio_pci vfio_pci_core irqbypass vfio_iommu_type1 vfio iommufd drm efi_pstore dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq simplefb hid_logitech_hidpp hid_logitech_dj hid_generic usbkbd usbmouse usbhid hid dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c nvme xhci_pci
2024-04-11T15:57:22.110719-05:00 phpve01 kernel: [80743.007802]  xhci_pci_renesas crc32_pclmul i2c_i801 spi_intel_pci e1000e nvme_core xhci_hcd spi_intel i2c_smbus ahci igc bnx2 nvme_common libahci video wmi pinctrl_cannonlake
2024-04-11T15:57:22.110719-05:00 phpve01 kernel: [80743.007811] CPU: 6 PID: 0 Comm: swapper/6 Tainted: P           OE      6.5.13-5-pve #1
2024-04-11T15:57:22.110719-05:00 phpve01 kernel: [80743.007812] Hardware name: Supermicro Super Server/X12SAE, BIOS 2.8 11/01/2023
2024-04-11T15:57:22.110720-05:00 phpve01 kernel: [80743.007813] RIP: 0010:dev_watchdog+0x260/0x270
2024-04-11T15:57:22.110720-05:00 phpve01 kernel: [80743.007815] Code: ff ff 48 89 df c6 05 68 ef 77 01 01 e8 19 7d f9 ff 44 8b 45 cc 44 89 f9 48 89 de 48 89 c2 48 c7 c7 98 bc c3 bb e8 b0 74 33 ff <0f> 0b e9 1d ff ff ff 66 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90
2024-04-11T15:57:22.110721-05:00 phpve01 kernel: [80743.007816] RSP: 0000:ffffb53b002c8e40 EFLAGS: 00010246
2024-04-11T15:57:22.110721-05:00 phpve01 kernel: [80743.007818] RAX: 0000000000000000 RBX: ffff8cff92614000 RCX: 0000000000000000
2024-04-11T15:57:22.110721-05:00 phpve01 kernel: [80743.007819] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
2024-04-11T15:57:22.110721-05:00 phpve01 kernel: [80743.007820] RBP: ffffb53b002c8e78 R08: 0000000000000000 R09: 0000000000000000
2024-04-11T15:57:22.110722-05:00 phpve01 kernel: [80743.007821] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8cff926144c8
2024-04-11T15:57:22.110722-05:00 phpve01 kernel: [80743.007821] R13: ffff8cff9261441c R14: 0000000000000000 R15: 0000000000000000
2024-04-11T15:57:22.110725-05:00 phpve01 kernel: [80743.007822] FS:  0000000000000000(0000) GS:ffff8d0aac380000(0000) knlGS:0000000000000000
2024-04-11T15:57:22.110726-05:00 phpve01 kernel: [80743.007823] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2024-04-11T15:57:22.110726-05:00 phpve01 kernel: [80743.007824] CR2: 00000000005a4ba7 CR3: 000000099057e006 CR4: 00000000003726e0
2024-04-11T15:57:22.110727-05:00 phpve01 kernel: [80743.007825] Call Trace:
2024-04-11T15:57:22.110727-05:00 phpve01 kernel: [80743.007826]  <IRQ>
2024-04-11T15:57:22.110727-05:00 phpve01 kernel: [80743.007829]  ? show_regs+0x6d/0x80
2024-04-11T15:57:22.177605-05:00 phpve01 kernel: [80743.007832]  ? __warn+0x89/0x160
2024-04-11T15:57:22.177613-05:00 phpve01 kernel: [80743.007835]  ? dev_watchdog+0x260/0x270
2024-04-11T15:57:22.177614-05:00 phpve01 kernel: [80743.007837]  ? report_bug+0x17e/0x1b0
2024-04-11T15:57:22.177614-05:00 phpve01 kernel: [80743.007839]  ? handle_bug+0x46/0x90
2024-04-11T15:57:22.177615-05:00 phpve01 kernel: [80743.007842]  ? exc_invalid_op+0x18/0x80
2024-04-11T15:57:22.177615-05:00 phpve01 kernel: [80743.007843]  ? asm_exc_invalid_op+0x1b/0x20
2024-04-11T15:57:22.177617-05:00 phpve01 kernel: [80743.007846]  ? dev_watchdog+0x260/0x270
2024-04-11T15:57:22.177618-05:00 phpve01 kernel: [80743.007848]  ? __pfx_dev_watchdog+0x10/0x10
2024-04-11T15:57:22.177618-05:00 phpve01 kernel: [80743.007850]  call_timer_fn+0x29/0x160
2024-04-11T15:57:22.177618-05:00 phpve01 kernel: [80743.007852]  ? __pfx_dev_watchdog+0x10/0x10
2024-04-11T15:57:22.177619-05:00 phpve01 kernel: [80743.007854]  __run_timers+0x259/0x310
2024-04-11T15:57:22.177619-05:00 phpve01 kernel: [80743.007856]  run_timer_softirq+0x1d/0x40
2024-04-11T15:57:22.177619-05:00 phpve01 kernel: [80743.007858]  __do_softirq+0xd1/0x303
2024-04-11T15:57:22.177619-05:00 phpve01 kernel: [80743.007860]  __irq_exit_rcu+0x75/0xa0
2024-04-11T15:57:22.177620-05:00 phpve01 kernel: [80743.007862]  irq_exit_rcu+0xe/0x20
2024-04-11T15:57:22.177620-05:00 phpve01 kernel: [80743.007863]  sysvec_apic_timer_interrupt+0x92/0xd0
2024-04-11T15:57:22.249784-05:00 phpve01 kernel: [80743.007864]  </IRQ>
2024-04-11T15:57:22.249794-05:00 phpve01 kernel: [80743.007865]  <TASK>
2024-04-11T15:57:22.249795-05:00 phpve01 kernel: [80743.007865]  asm_sysvec_apic_timer_interrupt+0x1b/0x20
2024-04-11T15:57:22.249796-05:00 phpve01 kernel: [80743.007867] RIP: 0010:cpuidle_enter_state+0xce/0x470
2024-04-11T15:57:22.249796-05:00 phpve01 kernel: [80743.007869] Code: c5 0f ff e8 64 f6 ff ff 8b 53 04 49 89 c6 0f 1f 44 00 00 31 ff e8 c2 c1 0e ff 80 7d d7 00 0f 85 e7 01 00 00 fb 0f 1f 44 00 00 <45> 85 ff 0f 88 83 01 00 00 49 63 d7 4c 89 f1 48 8d 04 52 48 8d 04
2024-04-11T15:57:22.249797-05:00 phpve01 kernel: [80743.007870] RSP: 0000:ffffb53b00163e50 EFLAGS: 00000246
2024-04-11T15:57:22.249797-05:00 phpve01 kernel: [80743.007871] RAX: 0000000000000000 RBX: ffffd53affb80168 RCX: 0000000000000000
2024-04-11T15:57:22.268087-05:00 phpve01 kernel: [80743.007872] RDX: 0000000000000006 RSI: 0000000000000000 RDI: 0000000000000000
2024-04-11T15:57:22.268097-05:00 phpve01 kernel: [80743.007872] RBP: ffffb53b00163e88 R08: 0000000000000000 R09: 0000000000000000
2024-04-11T15:57:22.268098-05:00 phpve01 kernel: [80743.007873] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
2024-04-11T15:57:22.268098-05:00 phpve01 kernel: [80743.007873] R13: ffffffffbc66a2e0 R14: 0000496f724a88dd R15: 0000000000000001
2024-04-11T15:57:22.268099-05:00 phpve01 kernel: [80743.007875]  cpuidle_enter+0x2e/0x50
2024-04-11T15:57:22.268099-05:00 phpve01 kernel: [80743.007878]  call_cpuidle+0x23/0x60
2024-04-11T15:57:22.268101-05:00 phpve01 kernel: [80743.007880]  do_idle+0x202/0x260
2024-04-11T15:57:22.268101-05:00 phpve01 kernel: [80743.007882]  cpu_startup_entry+0x2a/0x30
2024-04-11T15:57:22.268102-05:00 phpve01 kernel: [80743.007883]  start_secondary+0x119/0x140
2024-04-11T15:57:22.268102-05:00 phpve01 kernel: [80743.007885]  secondary_startup_64_no_verify+0x190/0x19b
2024-04-11T15:57:22.268102-05:00 phpve01 kernel: [80743.007888]  </TASK>
2024-04-11T15:57:22.268103-05:00 phpve01 kernel: [80743.007888] ---[ end trace 0000000000000000 ]---
2024-04-11T15:57:22.268103-05:00 phpve01 kernel: [80743.007902] e1000e 0000:00:1f.6 eno1: Reset adapter unexpectedly
2024-04-11T15:57:22.268111-05:00 phpve01 kernel: [80743.097367] vmbr0: port 1(eno1) entered disabled state
2024-04-11T15:57:24.978111-05:00 phpve01 kernel: [80746.846180] e1000e 0000:00:1f.6 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
2024-04-11T15:57:24.978121-05:00 phpve01 kernel: [80746.846262] vmbr0: port 1(eno1) entered blocking state
2024-04-11T15:57:24.978121-05:00 phpve01 kernel: [80746.846265] vmbr0: port 1(eno1) entered forwarding state

Do you have the latest BIOS updates installed? The 6.8 opt-in kernel might also be worth giving a shot.
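
To see how often the adapter hangs and whether it lines up with the backup window, the watchdog lines can be pulled out of a saved kernel log. The following is only a rough sketch, not an official Proxmox tool: the names `WatchdogEvent` and `find_watchdog_events` are made up for illustration, and the regular expression assumes the message format shown in the log above. Some kernels print the message without the millisecond value, and some log formats omit the bracketed uptime, so both fields are treated as optional.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class WatchdogEvent(NamedTuple):
    """One 'NETDEV WATCHDOG' message (hypothetical helper type)."""
    uptime_s: Optional[float]   # bracketed kernel uptime, if present
    iface: str                  # e.g. "eno1"
    driver: str                 # e.g. "e1000e"
    queue: int                  # transmit queue index
    timeout_ms: Optional[int]   # reported stall time, if present


# Assumed format, taken from the log excerpt in this thread:
#   [80743.007693] NETDEV WATCHDOG: eno1 (e1000e): transmit queue 0 timed out 6184 ms
WATCHDOG_RE = re.compile(
    r"(?:\[\s*(?P<uptime>\d+\.\d+)\]\s+)?"
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out(?: (?P<ms>\d+) ms)?"
)


def find_watchdog_events(lines: Iterable[str]) -> List[WatchdogEvent]:
    """Return every transmit-queue timeout found in the given log lines."""
    events = []
    for line in lines:
        m = WATCHDOG_RE.search(line)
        if m is None:
            continue
        events.append(
            WatchdogEvent(
                uptime_s=float(m["uptime"]) if m["uptime"] else None,
                iface=m["iface"],
                driver=m["driver"],
                queue=int(m["queue"]),
                timeout_ms=int(m["ms"]) if m["ms"] else None,
            )
        )
    return events


# Two lines copied from the log above, used as sample input.
SAMPLE = [
    "2024-04-11T15:57:21.535536-05:00 phpve01 kernel: [80743.007693] "
    "NETDEV WATCHDOG: eno1 (e1000e): transmit queue 0 timed out 6184 ms",
    "2024-04-11T15:57:22.268103-05:00 phpve01 kernel: [80743.007902] "
    "e1000e 0000:00:1f.6 eno1: Reset adapter unexpectedly",
]

for event in find_watchdog_events(SAMPLE):
    print(event.iface, event.driver, event.queue, event.timeout_ms)
```

Passing an open log file instead of `SAMPLE` works the same way, since the function only needs an iterable of lines. Comparing the timestamps of the matches with the vzdump task start times should show whether the hangs only occur under backup load.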
 
Interesting. I do have two Ethernet ports, one 1G and one 2.5G. I'm currently using the 2.5G, so I'll switch to the 1G and see if that makes any difference.

Looks like gasket, which I need for a TPU accelerator, isn't quite ready on 6.8, so I'll have to hold off on that kernel.
 
