Proxmox disconnects from the network daily, reboot necessary every couple of hours

rene415

Jun 29, 2021
Hello,

I am having some issues with my server. It happened to me a while ago with another build I had, but this time I really haven't done what I did to the last one.

The problem is that my server keeps disconnecting from the network, and I notice that the blinking HDD light on my case goes out. When I try to access the server, it keeps disconnecting. I usually have to restart the server, but I don't know why this happens. It usually happens late at night when I go to sleep, or sometimes just an hour after booting back up. I know it is bad to restart the server like that from time to time, but I have no other way of accessing it.

One thing I noticed is that whenever I restart the machine and check for updates, there is a new update. I install it, and the server lasts a few days until I have to restart it and install new updates again.

I am not sure what it could be. I checked the syslog, and I do see several messages that pop up in the last minutes before the connection drops.


Such as:
"entered blocking state"
"kernel: RIP: 0010:lock_page_memcg+0x23/0xa0"
"kernel: RIP: 0033:0x7f9226d95699"

I just don't know how to deal with it.


Any suggestions or ideas would be greatly appreciated.
 

Attachment: Proxmox.png
Please post the complete trace and a few lines around it; then we might have a better idea where the issue is.
(Please post it as text inside code tags.)
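If the journal is persistent, the kernel messages from the previous boot can be exported with `journalctl -k -b -1 > kernel.log` (the file name is only an example). The minimal Python sketch below, standard library only, then prints every `BUG:`, `RIP:` or `Call Trace:` line together with the lines around it, similar to `grep -B/-A`, and merges overlapping windows so each trace appears once. The function name, the pattern and the default window sizes are assumptions for illustration; widen the window if a trace gets cut off.

```python
import re
import sys
from typing import List


def extract_with_context(lines: List[str], pattern: str,
                         before: int = 5, after: int = 40) -> List[List[str]]:
    """Return one chunk of lines per match of `pattern`, like `grep -B/-A`.

    Overlapping or adjacent windows are merged, so a trace that contains
    several matching lines (BUG:, RIP:, Call Trace:) is returned only once.
    """
    regex = re.compile(pattern)
    windows: List[List[int]] = []  # [start, end) index pairs
    for i, line in enumerate(lines):
        if not regex.search(line):
            continue
        start = max(0, i - before)
        end = min(len(lines), i + after + 1)
        if windows and start <= windows[-1][1]:
            # Overlaps the previous window: extend it instead of adding a new one.
            windows[-1][1] = max(windows[-1][1], end)
        else:
            windows.append([start, end])
    return [lines[s:e] for s, e in windows]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 traces.py kernel.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        log_lines = fh.read().splitlines()
    for chunk in extract_with_context(log_lines, r"BUG:|RIP:|Call Trace:"):
        print("\n".join(chunk))
        print("-" * 40)
```

Pasting the printed chunks into code tags gives the context that was asked for without scrolling through the whole journal.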
 
Here are the logs that I keep getting. I'm not sure where it starts; I kept seeing kernel issues, so I think it starts there.



Code:
May 28 08:51:33 prometheus kernel: BUG: Bad page state in process swapper/2  pfn:18aa29
May 28 08:51:33 prometheus kernel: page:00000000492de447 refcount:2 mapcount:1 mapping:00000000e9f6a074 index:0x9 pfn:0x18aa29
May 28 08:51:33 prometheus kernel: memcg:ffff98eca0ece000
May 28 08:51:33 prometheus kernel: aops:ext4_da_aops ino:6090b dentry name:"netstandard.dll"
May 28 08:51:33 prometheus kernel: flags: 0x17ffffc0020016(referenced|uptodate|lru|mappedtodisk|node=0|zone=2|lastcpupid=0x1fffff)
May 28 08:51:33 prometheus kernel: raw: 0017ffffc0020016 dead000000000100 dead000000000122 ffff98ecc0926308
May 28 08:51:33 prometheus kernel: raw: 0000000000000009 0000000000000000 0000000200000000 ffff98eca0ece000
May 28 08:51:33 prometheus kernel: page dumped because: page still charged to cgroup
May 28 08:51:33 prometheus kernel: Modules linked in: tcp_diag inet_diag nft_limit xt_LOG nf_log_syslog xt_limit xt_comment xt_tcpudp nft_chain_nat xt_MASQUERADE nf_nat xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nft_counter binfmt_misc veth nls_utf8 cifs cifs_arc4 cifs_md4 fscache netfs nf_tables bonding tls softdog nfnetlink_log nfnetlink snd_hda_codec_hdmi intel_rapl_msr intel_rapl_common intel_tcc_cooling x86_pkg_temp_thermal snd_hda_codec_realtek intel_powerclamp snd_hda_codec_generic coretemp ledtrig_audio mei_hdcp kvm_intel kvm irqbypass crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi rapl intel_cstate i915 pcspkr snd_hda_codec intel_wmi_thunderbolt efi_pstore snd_hda_core snd_hwdep ttm snd_pcm mxm_wmi snd_timer drm_kms_helper snd soundcore ee1004 cec rc_core i2c_algo_bit mei_me fb_sys_fops syscopyarea sysfillrect sysimgblt mei intel_pch_thermal mac_hid zfs(PO) acpi_pad zunicode(PO) zzstd(O) zlua(O)
May 28 08:51:33 prometheus kernel:  zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drm sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic xor zstd_compress ses enclosure scsi_transport_sas raid6_pq libcrc32c simplefb uas usb_storage crc32_pclmul i2c_i801 i2c_smbus xhci_pci ahci alx xhci_pci_renesas mdio xhci_hcd libahci wmi video
May 28 08:51:33 prometheus kernel: CPU: 2 PID: 0 Comm: swapper/2 Tainted: P           O      5.15.35-1-pve #1
May 28 08:51:33 prometheus kernel: Hardware name: MSI MS-7977/Z170A-G45 GAMING (MS-7977), BIOS 2.D0 07/02/2018
May 28 08:51:33 prometheus kernel: Call Trace:
May 28 08:51:33 prometheus kernel:  <IRQ>
May 28 08:51:33 prometheus kernel:  dump_stack_lvl+0x4a/0x5f
May 28 08:51:33 prometheus kernel:  dump_stack+0x10/0x12
May 28 08:51:33 prometheus kernel:  bad_page.cold+0x63/0x94
May 28 08:51:33 prometheus kernel:  check_free_page_bad+0x66/0x70
May 28 08:51:33 prometheus kernel:  free_pcppages_bulk+0x1c3/0x390
May 28 08:51:33 prometheus kernel:  free_unref_page_commit.constprop.0+0x12b/0x170
May 28 08:51:33 prometheus kernel:  free_unref_page+0xdf/0x180
May 28 08:51:33 prometheus kernel:  __put_page+0x70/0xd0
May 28 08:51:33 prometheus kernel:  skb_release_data+0x109/0x170
May 28 08:51:33 prometheus kernel:  consume_skb+0x3b/0xb0
May 28 08:51:33 prometheus kernel:  validate_xmit_skb+0x1ea/0x360
May 28 08:51:33 prometheus kernel:  validate_xmit_skb_list+0x4d/0x70
May 28 08:51:33 prometheus kernel:  sch_direct_xmit+0x145/0x390
May 28 08:51:33 prometheus kernel:  __qdisc_run+0x15d/0x5b0
May 28 08:51:33 prometheus kernel:  net_tx_action+0x11a/0x290
May 28 08:51:33 prometheus kernel:  __do_softirq+0xd9/0x2e6
May 28 08:51:33 prometheus kernel:  irq_exit_rcu+0x8c/0xb0
May 28 08:51:33 prometheus kernel:  common_interrupt+0x8a/0xa0
May 28 08:51:33 prometheus kernel:  </IRQ>
May 28 08:51:33 prometheus kernel:  <TASK>
May 28 08:51:33 prometheus kernel:  asm_common_interrupt+0x1e/0x40
May 28 08:51:33 prometheus kernel: RIP: 0010:cpuidle_enter_state+0xd9/0x620
May 28 08:51:33 prometheus kernel: Code: 3d 24 54 83 75 e8 d7 8c 71 ff 49 89 c7 0f 1f 44 00 00 31 ff e8 38 99 71 ff 80 7d d0 00 0f 85 5a 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 f6 0f 88 66 01 00 00 4d 63 ee 49 83 fd 09 0f 87 e1 03 00 00
May 28 08:51:33 prometheus kernel: RSP: 0018:ffffbf4f800e7e38 EFLAGS: 00000246
May 28 08:51:33 prometheus kernel: RAX: ffff98f196d30b40 RBX: ffffdf4f7fd00000 RCX: 0000000000000000
May 28 08:51:33 prometheus kernel: RDX: 0000000000000315 RSI: 00000000248799d1 RDI: 0000000000000000
May 28 08:51:33 prometheus kernel: RBP: ffffbf4f800e7e88 R08: 00000624c5190d79 R09: 0000000000030d40
May 28 08:51:33 prometheus kernel: R10: 0000000000000007 R11: 071c71c71c71c71c R12: ffffffff8bcd3a80
May 28 08:51:33 prometheus kernel: R13: 0000000000000003 R14: 0000000000000003 R15: 00000624c5190d79
May 28 08:51:33 prometheus kernel:  ? cpuidle_enter_state+0xc8/0x620
May 28 08:51:33 prometheus kernel:  cpuidle_enter+0x2e/0x40
May 28 08:51:33 prometheus kernel:  do_idle+0x209/0x2b0
May 28 08:51:33 prometheus kernel:  cpu_startup_entry+0x20/0x30
May 28 08:51:33 prometheus kernel:  start_secondary+0x12a/0x180
May 28 08:51:33 prometheus kernel:  secondary_startup_64_no_verify+0xc2/0xcb
May 28 08:51:33 prometheus kernel:  </TASK>
May 28 08:51:33 prometheus kernel: BUG: Bad page state in process swapper/2  pfn:18aa2a
May 28 08:51:33 prometheus kernel: page:00000000d0fb933c refcount:2 mapcount:1 mapping:00000000e9f6a074 index:0xa pfn:0x18aa2a
May 28 08:51:33 prometheus systemd-journald[371]: Missed 402 kernel messages
May 28 08:51:33 prometheus kernel: flags: 0x17ffffc0020016(referenced|uptodate|lru|mappedtodisk|node=0|zone=2|lastcpupid=0x1fffff)
May 28 08:51:33 prometheus kernel: raw: 0017ffffc0020016 dead000000000100 dead000000000122 ffff98ecc0926308
May 28 08:51:33 prometheus kernel: raw: 0000000000000012 0000000000000000 0000000200000000 ffff98eca0ece000
May 28 08:51:33 prometheus kernel: page dumped because: page still charged to cgroup
May 28 08:51:33 prometheus kernel: Modules linked in: tcp_diag inet_diag nft_limit xt_LOG nf_log_syslog xt_limit xt_comment xt_tcpudp nft_chain_nat xt_MASQUERADE nf_nat xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nft_counter binfmt_misc veth nls_utf8 cifs cifs_arc4 cifs_md4 fscache netfs nf_tables bonding tls softdog nfnetlink_log nfnetlink snd_hda_codec_hdmi intel_rapl_msr intel_rapl_common intel_tcc_cooling x86_pkg_temp_thermal snd_hda_codec_realtek intel_powerclamp snd_hda_codec_generic coretemp ledtrig_audio mei_hdcp kvm_intel kvm irqbypass crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi rapl intel_cstate i915 pcspkr snd_hda_codec intel_wmi_thunderbolt efi_pstore snd_hda_core snd_hwdep ttm snd_pcm mxm_wmi snd_timer drm_kms_helper snd soundcore ee1004 cec rc_core i2c_algo_bit mei_me fb_sys_fops syscopyarea sysfillrect sysimgblt mei intel_pch_thermal mac_hid zfs(PO) acpi_pad zunicode(PO) zzstd(O) zlua(O)
May 28 08:51:33 prometheus kernel:  zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drm sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic xor zstd_compress ses enclosure scsi_transport_sas raid6_pq libcrc32c simplefb uas usb_storage crc32_pclmul i2c_i801 i2c_smbus xhci_pci ahci alx xhci_pci_renesas mdio xhci_hcd libahci wmi video
May 28 08:51:33 prometheus kernel: CPU: 2 PID: 0 Comm: swapper/2 Tainted: P    B      O      5.15.35-1-pve #1
May 28 08:51:33 prometheus kernel: Hardware name: MSI MS-7977/Z170A-G45 GAMING (MS-7977), BIOS 2.D0 07/02/2018
May 28 08:51:33 prometheus kernel: Call Trace:
May 28 08:51:33 prometheus kernel:  <IRQ>
May 28 08:51:33 prometheus kernel:  dump_stack_lvl+0x4a/0x5f
May 28 08:51:33 prometheus kernel:  dump_stack+0x10/0x12
May 28 08:51:33 prometheus kernel:  bad_page.cold+0x63/0x94
May 28 08:51:33 prometheus kernel:  check_free_page_bad+0x66/0x70
May 28 08:51:33 prometheus kernel:  free_pcppages_bulk+0x1c3/0x390
May 28 08:51:33 prometheus kernel:  free_unref_page_commit.constprop.0+0x12b/0x170
May 28 08:51:33 prometheus kernel:  free_unref_page+0xdf/0x180
May 28 08:51:33 prometheus kernel:  __put_page+0x70/0xd0
May 28 08:51:33 prometheus kernel:  skb_release_data+0x109/0x170
May 28 08:51:33 prometheus kernel:  consume_skb+0x3b/0xb0
May 28 08:51:33 prometheus kernel:  validate_xmit_skb+0x1ea/0x360
May 28 08:51:33 prometheus kernel:  validate_xmit_skb_list+0x4d/0x70
May 28 08:51:33 prometheus kernel:  sch_direct_xmit+0x145/0x390
May 28 08:51:33 prometheus kernel:  __qdisc_run+0x15d/0x5b0
May 28 08:51:33 prometheus kernel:  net_tx_action+0x11a/0x290
May 28 08:51:33 prometheus kernel:  __do_softirq+0xd9/0x2e6
May 28 08:51:33 prometheus kernel:  irq_exit_rcu+0x8c/0xb0
May 28 08:51:33 prometheus kernel:  common_interrupt+0x8a/0xa0
May 28 08:51:33 prometheus kernel:  </IRQ>
May 28 08:51:33 prometheus kernel:  <TASK>
May 28 08:51:33 prometheus kernel:  asm_common_interrupt+0x1e/0x40
May 28 08:51:33 prometheus kernel: RIP: 0010:cpuidle_enter_state+0xd9/0x620
May 28 08:51:33 prometheus kernel: Code: 3d 24 54 83 75 e8 d7 8c 71 ff 49 89 c7 0f 1f 44 00 00 31 ff e8 38 99 71 ff 80 7d d0 00 0f 85 5a 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 f6 0f 88 66 01 00 00 4d 63 ee 49 83 fd 09 0f 87 e1 03 00 00
May 28 08:51:33 prometheus kernel: RSP: 0018:ffffbf4f800e7e38 EFLAGS: 00000246
May 28 08:51:33 prometheus kernel: RAX: ffff98f196d30b40 RBX: ffffdf4f7fd00000 RCX: 0000000000000000
May 28 08:51:33 prometheus kernel: RDX: 0000000000000315 RSI: 00000000248799d1 RDI: 0000000000000000
May 28 08:51:33 prometheus kernel: RBP: ffffbf4f800e7e88 R08: 00000624c5190d79 R09: 0000000000030d40
May 28 08:51:33 prometheus kernel: R10: 0000000000000007 R11: 071c71c71c71c71c R12: ffffffff8bcd3a80
May 28 08:51:33 prometheus kernel: R13: 0000000000000003 R14: 0000000000000003 R15: 00000624c5190d79
May 28 08:51:33 prometheus kernel:  ? cpuidle_enter_state+0xc8/0x620
May 28 08:51:33 prometheus kernel:  cpuidle_enter+0x2e/0x40
May 28 08:51:33 prometheus kernel:  do_idle+0x209/0x2b0
May 28 08:51:33 prometheus kernel:  cpu_startup_entry+0x20/0x30
May 28 08:51:33 prometheus kernel:  start_secondary+0x12a/0x180
May 28 08:51:33 prometheus kernel:  secondary_startup_64_no_verify+0xc2/0xcb
May 28 08:51:33 prometheus kernel:  </TASK>
May 28 08:51:33 prometheus kernel: BUG: Bad page state in process swapper/2  pfn:18aa33
May 28 08:51:33 prometheus kernel: page:000000005c6b5a23 refcount:2 mapcount:1 mapping:00000000e9f6a074 index:0x13 pfn:0x18aa33
May 28 08:51:33 prometheus kernel: memcg:ffff98eca0ece000
May 28 08:51:33 prometheus kernel: aops:ext4_da_aops ino:6090b dentry name:"netstandard.dll"
May 28 08:51:33 prometheus kernel: flags: 0x17ffffc0020016(referenced|uptodate|lru|mappedtodisk|node=0|zone=2|lastcpupid=0x1fffff)
May 28 08:51:33 prometheus kernel: raw: 0017ffffc0020016 dead000000000100 dead000000000122 ffff98ecc0926308
May 28 08:51:33 prometheus kernel: raw: 0000000000000013 0000000000000000 0000000200000000 ffff98eca0ece000
May 28 08:51:33 prometheus kernel: page dumped because: page still charged to cgroup
May 28 08:51:33 prometheus kernel: Modules linked in: tcp_diag inet_diag nft_limit xt_LOG nf_log_syslog xt_limit xt_comment xt_tcpudp nft_chain_nat xt_MASQUERADE nf_nat xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nft_counter binfmt_misc veth nls_utf8 cifs cifs_arc4 cifs_md4 fscache netfs nf_tables bonding tls softdog nfnetlink_log nfnetlink snd_hda_codec_hdmi intel_rapl_msr intel_rapl_common intel_tcc_cooling x86_pkg_temp_thermal snd_hda_codec_realtek intel_powerclamp snd_hda_codec_generic coretemp ledtrig_audio mei_hdcp kvm_intel kvm irqbypass crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi rapl intel_cstate i915 pcspkr snd_hda_codec intel_wmi_thunderbolt efi_pstore snd_hda_core snd_hwdep ttm snd_pcm mxm_wmi snd_timer drm_kms_helper snd soundcore ee1004 cec rc_core i2c_algo_bit mei_me fb_sys_fops syscopyarea sysfillrect sysimgblt mei intel_pch_thermal mac_hid zfs(PO) acpi_pad zunicode(PO) zzstd(O) zlua(O)
May 28 08:51:33 prometheus kernel:  zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drm sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic xor zstd_compress ses enclosure scsi_transport_sas raid6_pq libcrc32c simplefb uas usb_storage crc32_pclmul i2c_i801 i2c_smbus xhci_pci ahci alx xhci_pci_renesas mdio xhci_hcd libahci wmi video
May 28 08:51:33 prometheus kernel: CPU: 2 PID: 0 Comm: swapper/2 Tainted: P    B      O      5.15.35-1-pve #1
May 28 08:51:33 prometheus kernel: Hardware name: MSI MS-7977/Z170A-G45 GAMING (MS-7977), BIOS 2.D0 07/02/2018
May 28 08:51:33 prometheus kernel: Call Trace:
May 28 08:51:33 prometheus kernel:  <IRQ>
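The excerpt above repeats the same kind of report several times. As a hedged sketch (Python, standard library only), the helper below counts the `Bad page state` reports in an exported log, collects the `dentry name` and `page dumped because` reason attached to each one, and decodes the taint letters on the `CPU:` lines. The taint meanings are taken from the kernel's "Tainted kernels" admin-guide page and cover only the letters that appear in this log; the function and variable names are made up for illustration.

```python
import re
import sys
from collections import Counter
from typing import Dict, List

# Taint letters that appear in this log, with their meaning from the kernel's
# "Tainted kernels" admin-guide page. Other letters are not decoded here.
TAINT_FLAGS = {
    "P": "proprietary module was loaded",
    "O": "externally-built (out-of-tree) module was loaded",
    "B": "bad page referenced or unexpected page flags",
}

BUG_RE = re.compile(r"BUG: Bad page state in process (\S+)\s+pfn:([0-9a-f]+)")
DENTRY_RE = re.compile(r'dentry name:"([^"]+)"')
REASON_RE = re.compile(r"page dumped because: (.+)$")


def taint_letters(line: str) -> List[str]:
    """Letters between 'Tainted:' and the kernel version, e.g. ['P', 'B', 'O']."""
    m = re.search(r"Tainted: (.*)", line)
    if not m:
        return []
    letters: List[str] = []
    for token in m.group(1).split():
        if token[0].isdigit():  # kernel version such as 5.15.35-1-pve ends the flags
            break
        letters.extend(token)
    return letters


def summarize_bad_pages(lines: List[str]) -> Dict[str, object]:
    reports = []  # one dict per "Bad page state" report, in log order
    taint = set()
    for line in lines:
        bug = BUG_RE.search(line)
        if bug:
            reports.append({"process": bug.group(1), "pfn": bug.group(2),
                            "file": None, "reason": None})
            continue
        taint.update(taint_letters(line))
        if not reports:
            continue  # lines before the first report are not attributed
        current = reports[-1]
        dentry = DENTRY_RE.search(line)
        if dentry and current["file"] is None:
            current["file"] = dentry.group(1)
        reason = REASON_RE.search(line)
        if reason and current["reason"] is None:
            current["reason"] = reason.group(1).strip()
    return {
        "count": len(reports),
        "pfns": [r["pfn"] for r in reports],
        "files": Counter(r["file"] for r in reports if r["file"]),
        "reasons": Counter(r["reason"] for r in reports if r["reason"]),
        "taint": {t: TAINT_FLAGS.get(t, "not decoded") for t in sorted(taint)},
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 badpages.py kernel.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        print(summarize_bad_pages(fh.read().splitlines()))
```

Run over the full exported log, this shows at a glance whether every report points at the same file and reason (in the excerpt above, `netstandard.dll` and `page still charged to cgroup`). Reports whose detail lines were dropped, as after the `Missed 402 kernel messages` line, simply have no file recorded.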
 
Hmm, this looks unfamiliar...

* Do you have containers (pct/lxc) running on that machine? If yes, do you have any non-default configs enabled for those containers?
* Have you made any modifications to the configuration compared to a plain PVE system?

Otherwise:
* The BIOS is from 2018. I'd recommend upgrading it, since these things can be caused by an outdated BIOS combined with a new kernel.
* Last but not least, I'd suggest running a memtest for an extended period; maybe it's a broken memory stick.
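To confirm which BIOS is installed before and after flashing, the version and release date can be read from sysfs without extra tools. The sketch below assumes the usual `/sys/class/dmi/id/bios_version` and `bios_date` files exist (they do on most x86 systems, but this is not guaranteed); `dmidecode -s bios-version` reports the same value if dmidecode is installed. The function names are hypothetical.

```python
from datetime import date, datetime
from pathlib import Path
from typing import Optional, Tuple


def parse_bios_date(raw: str) -> Optional[date]:
    """Parse an SMBIOS-style BIOS date (mm/dd/yyyy, or mm/dd/yy on old boards)."""
    for fmt in ("%m/%d/%Y", "%m/%d/%y"):
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    return None


def read_bios_info(dmi_dir: str = "/sys/class/dmi/id") -> Tuple[Optional[str], Optional[date]]:
    """Return (bios_version, bios_date) from sysfs, or None for missing fields."""
    base = Path(dmi_dir)

    def read_field(name: str) -> Optional[str]:
        try:
            return (base / name).read_text().strip()
        except OSError:  # file missing or not readable on this system
            return None

    version = read_field("bios_version")
    raw_date = read_field("bios_date")
    return version, (parse_bios_date(raw_date) if raw_date else None)


if __name__ == "__main__":
    version, released = read_bios_info()
    print(f"BIOS version: {version}, release date: {released}")
```

On the board from the log, the `Hardware name` line already shows `BIOS 2.D0 07/02/2018`; after an upgrade, the values printed here should change accordingly.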

I hope this helps!
 