Proxmox Host - random freeze(?)

Semmo

New Member
May 27, 2019
26
0
1
33
I still have no crash log, but since I disabled the C-states there have been no more freezes... this is with an Intel i7-3770.

But there was also a kernel update... hmmm...
 

n1nj4888

Member
Jan 13, 2019
109
2
18
39
I also haven’t had any similar crashes in the last couple of weeks. I haven’t changed anything c-state related in the BIOS but I did move all VMs off the suspect node a couple of days ago (so that it was idle to increase chances of a c-state crash) and still no crash...

Possibly the kernel update has fixed this?
 

n1nj4888

I still have no crash log, but since I disabled the C-states there have been no more freezes... this is with an Intel i7-3770.

But there was also a kernel update... hmmm...
Have you tried re-enabling the c-state changes to see whether it still crashes?
 

JanN

New Member
Mar 11, 2019
1
0
1
58
I've read several times that C-states deeper than C1 can cause freezes/crashes on Linux servers...
BR
Jan
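
For anyone who wants to check this on their own host before touching the BIOS, here is a minimal sketch that lists the idle states the kernel exposes for one CPU. It assumes the usual cpuidle sysfs layout under /sys/devices/system/cpu/cpu0/cpuidle; the function name list_cstates is made up for this example, and the directory is a parameter only so the function can be tried against a test directory.

```shell
#!/bin/sh
# Print the name and "disable" flag of every idle state exposed for one CPU.
# Assumption: the standard cpuidle sysfs layout (stateN/name, stateN/disable).
list_cstates() {
    cpuidle_dir="${1:-/sys/devices/system/cpu/cpu0/cpuidle}"
    for state in "$cpuidle_dir"/state*; do
        # Skip the unexpanded glob when no idle states are exposed.
        [ -r "$state/name" ] || continue
        printf '%s %s disabled=%s\n' \
            "$(basename "$state")" \
            "$(cat "$state/name")" \
            "$(cat "$state/disable" 2>/dev/null || echo '?')"
    done
}

# Kernel command-line options commonly used to cap C-states at C1
# (assumption: appended to GRUB_CMDLINE_LINUX_DEFAULT, then update-grub):
#   intel_idle.max_cstate=1 processor.max_cstate=1
```

Calling list_cstates without arguments on the host shows which states are available. Whether capping them helps with this particular freeze is still an open question in this thread.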
 

n1nj4888

No I didn't because I'm so happy that it works ;) have you tried disabling your c-states?
Nope! I’ve left the c-states as is and haven’t had another crash even though I’ve experimented both with putting the suspect node under load and completely free for prolonged periods of time... still no crash...

I suspect the kernel upgrade fixed it...
 

Semmo

The problem is back... after about 3 weeks with no problems. I installed the latest updates a few days ago and I'm currently on this kernel:

"Linux proxmox 4.15.18-18-pve #1 SMP PVE 4.15.18-44 (Wed, 03 Jul 2019 11:19:13 +0200) x86_64 GNU/Linux"

Is anybody else seeing the problem again?
 

n1nj4888

I’ve had no problems for the last few weeks and, just before reading this post, I updated to the following kernel.

Linux pve-host1 4.15.18-18-pve #1 SMP PVE 4.15.18-44 (Wed, 03 Jul 2019 11:19:13 +0200) x86_64 GNU/Linux

I’ve got netconsole running in case of any similar kernel panics and will post any results here but my setup has been solid for the last few weeks...
 

Semmo

I’ve had no problems for the last few weeks and, just before reading this post, I updated to the following kernel.

Linux pve-host1 4.15.18-18-pve #1 SMP PVE 4.15.18-44 (Wed, 03 Jul 2019 11:19:13 +0200) x86_64 GNU/Linux

I’ve got netconsole running in case of any similar kernel panics and will post any results here but my setup has been solid for the last few weeks...
I switched back to 4.14.18-17-pve just to try it out. I still cannot use netconsole because it's a dedicated server at a hoster and I don't want to rent another one with a VLAN...

If you do capture an error, it would be nice to see it.
thx
 

Semmo

The older kernel doesn't help :(

Is it possible to run netconsole over a VPN? Or will the VPN go down the moment the kernel panics?
Or is there any way to encrypt the netconsole traffic?
 

Semmo

And this is a never-ending story of bugs...
https://github.com/zfsonlinux/zfs/issues/6476

It's not working for me either. I trigger the crash but get no file in /var/crash -.- Is anyone else here running kdump with a ZFS root? Is it possible to store the crash dump on an SMB/CIFS share?


EDIT:

I have now updated to VE 6.0 and can no longer install or use kdump-tools:

Code:
Selecting previously unselected package kdump-tools.
(Reading database ... 91643 files and directories currently installed.)
Preparing to unpack .../kdump-tools_1%3a1.6.5-1_amd64.deb ...
Unpacking kdump-tools (1:1.6.5-1) ...
Setting up kdump-tools (1:1.6.5-1) ...

Creating config file /etc/default/kdump-tools with new version
dpkg: error processing package kdump-tools (--configure):
 installed kdump-tools package post-installation script subprocess returned error exit status 1
Processing triggers for man-db (2.8.5-2) ...
Processing triggers for systemd (241-5) ...
Errors were encountered while processing:
 kdump-tools
E: Sub-process /usr/bin/dpkg returned an error code (1)
I'm too tired after all those hours with no results... maybe someone else has this issue too.

thanks in advance.
 

Semmo

After some research I found something: https://github.com/zfsonlinux/zfs/issues/6476
You have to add MODULES=most to "/usr/share/initramfs-tools/conf-hooks.d/zfs", otherwise kdump-tools won't install. After that I tried the sysrq trigger but still got no kdump file in /var/crash.
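
A minimal sketch of that change, for anyone who wants to reproduce it. The helper name ensure_modules_most is made up for this example, and the steps in the trailing comment are an assumption about the order described in this post, not a verified procedure.

```shell
#!/bin/sh
# Make sure the given initramfs-tools config file contains MODULES=most.
# Hypothetical helper; it takes the file path as an argument so it can be
# tried on a copy first.
ensure_modules_most() {
    conf="$1"
    if grep -q '^MODULES=most$' "$conf"; then
        return 0    # already set, nothing to do
    fi
    if grep -q '^MODULES=' "$conf"; then
        # Rewrite an existing MODULES= line (GNU sed, as shipped on Debian).
        sed -i 's/^MODULES=.*/MODULES=most/' "$conf"
    else
        echo 'MODULES=most' >> "$conf"
    fi
}

# On the host (as root), the steps from this post would then be roughly:
#   ensure_modules_most /usr/share/initramfs-tools/conf-hooks.d/zfs
#   apt install kdump-tools
#   echo c > /proc/sysrq-trigger   # WARNING: crashes the machine immediately
```

Note that a file under /usr/share may be overwritten by the next package update, so the change might have to be reapplied.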

So I installed Proxmox in a VM twice, once with ext4 and once with ZFS. I made the same changes in both (only the MODULES=most wasn't needed with ext4) and yes, with ZFS kdump doesn't work. The system crashes but no dump is written. With ext4 it boots the crash kernel, dumps the files and then reboots properly... so it seems to be a problem with ZFS.

This still doesn't help me find out why my host is crashing, but now I know that a kdump/ZFS bug is the reason I can't track it down... (again)
Since netconsole is not an option for my hosted server (and it doesn't seem to support VLANs, so a VLAN switch plus another hosted server is not an option either), I have no way to find a solution.

@n1nj4888 What was your last step before the crashes stopped? Have you ever captured a crash since then? Are you running the new version?

BTW: It crashes with the VE 6.0 too... :(
 

n1nj4888

@n1nj4888 What was your last step before the crashes stopped? Have you ever captured a crash since then? Are you running the new version?
I can’t really recall now. I was only getting crashes every so often on 5.4 and after I put netconsole on, I didn’t see any further crashes (I’m not suggesting netconsole affected this), even after I did the kernel upgrade to the last 5.4 version I mentioned above. I do recall doing some BIOS updates around the time so perhaps that could have improved the situation?

I’ve since moved to PVE6 on ZFS boot rather than ext4 and again haven’t seen any further crashes. I haven’t implemented netconsole on PVE6 yet. Indeed, I’d have to do a little more research on how to set that up, given that ZFS boot on UEFI now uses systemd-boot instead of GRUB.
 

RCK

Active Member
Oct 20, 2009
52
0
26
@n1nj4888
Thanks for the detailed description! But unfortunately my host is a single root server and I have no access to another machine in the same network.
Hello,

I managed to use netconsole with a remote Debian server :)
First, verify that you can send UDP messages to your netconsole receiver:
- set up rsyslog on your Debian server as n1nj4888 described (port 5555)
- open UDP port 5555 on every firewall between your Proxmox host and the Debian rsyslog server
- test the connection with the following command on your Proxmox host (this needs bash)
echo "This is my udp data" > /dev/udp/your.debian.ip/5555

Next, add loglevel=7 to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and regenerate the GRUB config:
GRUB_CMDLINE_LINUX_DEFAULT="quiet loglevel=7 ...
update-grub

Finally, start netconsole after the system has booted:
- find your local Proxmox IP (192.168.0.99)
ip addr |grep 'inet '
- find your gateway (192.168.0.254)
netstat -rn | grep ^0.0.0.0
- find the MAC address of the gateway (14:dd:a9:4b:b1:10)
ping -c 1 192.168.0.254 > /dev/null
arp -n 192.168.0.254
- load the module with your IP, your network interface, and the gateway's MAC
modprobe netconsole netconsole=5555@192.168.0.99/vmbr0,5555@your.debian.ip/14:dd:a9:4b:b1:10

And it's working :)
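
The steps above can be sketched as a small script that assembles the netconsole parameter instead of typing it by hand. This is only a sketch: the function names are made up for this example, and the two parsers assume iproute2-style output ("ip route show default" and "ip neigh show <gateway>") rather than the netstat/arp commands used in the post. The script only prints the modprobe command, it does not run it.

```shell
#!/bin/sh
# Build the netconsole= parameter in the form
#   src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac
netconsole_param() {
    src_port="$1"; src_ip="$2"; dev="$3"
    tgt_port="$4"; tgt_ip="$5"; tgt_mac="$6"
    printf '%s@%s/%s,%s@%s/%s\n' \
        "$src_port" "$src_ip" "$dev" "$tgt_port" "$tgt_ip" "$tgt_mac"
}

# Read "ip route show default" output on stdin and print the gateway IP.
gateway_from_route() {
    awk '/^default/ { print $3; exit }'
}

# Read "ip neigh show <gateway>" output on stdin and print the MAC address
# that follows the "lladdr" keyword.
mac_from_neigh() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "lladdr") { print $(i + 1); exit } }'
}

# Example with the values from the post (your.debian.ip is a placeholder):
param=$(netconsole_param 5555 192.168.0.99 vmbr0 5555 your.debian.ip 14:dd:a9:4b:b1:10)
echo "modprobe netconsole netconsole=$param"
```

On a real host the gateway and MAC could be filled in with something like "ip route show default | gateway_from_route" and "ip neigh show 192.168.0.254 | mac_from_neigh" after pinging the gateway once.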
 

RCK

I have now updated to VE 6.0 and can no longer install or use kdump-tools:
Selecting previously unselected package kdump-tools.
If you follow my previous post, you will be able to set up netconsole over the internet without a VPN on Proxmox 6.0 :)
 

Semmo

I finally captured something when my server froze... but now I don't know what to do with it :/ This is my netconsole output from right before the crash:

Code:
[616084.497241] general protection fault: 0000 [#2] SMP PTI
[616084.497243] kvm[11243]: segfault at 2c ip 00007f36cb5eea48 sp 00007fff0e5562c0 error 4 in libglib-2.0.so.0.5800.3[7f36cb5bc000+7e000]
[616084.497252] CPU: 2 PID: 27547 Comm: Server thread Tainted: P      D    O      5.0.21-2-pve #1
[616084.497258] Hardware name: System manufacturer System Product Name/P8H77-M PRO, BIOS 9012 09/18/2018
[616084.497259] Code: 20 00 00 00 00 48 c7 44 24 28 00 00 00 00 c7 44 24 18 01 00 00 00 4c 89 f6 4c 89 e7 e8 c1 d5 ff ff 85 c0 74 4d 48 8b 74 24 08 <8b> 46 2c 89 c2 83 e2 41 83 fa 01 75 df 85 ed 74 06 44 3b 6e 28 7c
[616084.497264] RIP: 0010:sched_clock_cpu+0x0/0xc0
[616084.497268] Code: 72 05 00 48 89 43 08 e8 ee c9 f6 ff 48 89 03 48 03 05 f4 4e 77 01 48 2b 43 08 5b 48 89 05 e0 4e 77 01 5d c3 66 0f 1f 44 00 00 <55> 48 89 e5 41 54 53 0f 1f 44 00 00 e8 bf c9 f6 ff 48 03 05 c8 4e
[616084.497274] RSP: 0018:ffffabfcfca5bb48 EFLAGS: 00010046
[616084.497277] RAX: 0000000000000002 RBX: ffffcbfcbfa9c080 RCX: 0000000000000000
[616084.497280] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000002
[616084.497283] RBP: ffffabfcfca5bb68 R08: 0000000000000002 R09: 0000000000022ac0
[616084.497286] R10: ffffcbfcbfa9c248 R11: 0000000000000001 R12: 00000000fffffffb
[616084.497290] R13: 0000000000000000 R14: 0000000000000000 R15: ffff9ba19daea800
[616084.497303] FS:  00007ff2570fb700(0000) GS:ffff9ba23f880000(0000) knlGS:0000000000000000
[616084.497306] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[616084.497309] CR2: 000000000425f2e8 CR3: 00000007f7910005 CR4: 00000000001626e0
[616084.497313] Call Trace:
[616084.497318]  ? record_times+0x1b/0xc0
[616084.497322]  psi_task_change+0xf2/0x220
[616084.497344]  deactivate_task+0xe0/0x120
[616084.497357]  __schedule+0x115/0x870
[616084.497362]  ? hrtimer_start_range_ns+0x1b5/0x2c0
[616084.497366]  schedule+0x2c/0x70
[616084.497379]  futex_wait_queue_me+0xc4/0x120
[616084.497381]  futex_wait+0x15b/0x250
[616084.497384]  ? __hrtimer_init+0xc0/0xc0
[616084.497387]  do_futex+0x3cd/0xc50
[616084.497390]  ? __seccomp_filter+0x73/0x630
[616084.497394]  ? pick_next_task_fair+0x270/0x6e0
[616084.497399]  ? _copy_from_user+0x3e/0x60
[616084.497403]  __x64_sys_futex+0x143/0x180
[616084.497408]  do_syscall_64+0x5a/0x110
[616084.497412]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[616084.497416] RIP: 0033:0x7ff2be1a6ed9
[616084.497419] Code: 89 54 24 68 48 85 c0 0f 88 c4 01 00 00 e8 3f 31 00 00 4c 8d 54 24 60 41 89 c0 31 d2 8b 74 24 4c 4c 89 ef b8 ca 00 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 db 02 00 00 44 89 c7 e8 73 31 00 00 48 8b
[616084.497425] RSP: 002b:00007ff2570fa690 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
[616084.497429] RAX: ffffffffffffffda RBX: 00007ff2b8c681b0 RCX: 00007ff2be1a6ed9
[616084.497432] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007ff2b8c681dc
[616084.497436] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007ffc6a2ed080
[616084.497438] R10: 00007ff2570fa6f0 R11: 0000000000000246 R12: 00007ff2b8c68188
[616084.497442] R13: 00007ff2b8c681dc R14: 00007ff2570fa7b0 R15: 00007ff2b8c681d4
[616084.497459] Modules linked in: tcp_diag inet_diag binfmt_misc veth ebtable_filter ebtables ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables xt_mac ipt_REJECT nf_reject_ipv4 xt_set xt_physdev xt_addrtype xt_multiport xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_comment xt_tcpudp xt_mark ip_set_hash_net ip_set arc4 md4 cmac nls_utf8 cifs ccm fscache iptable_filter bpfilter softdog nfnetlink_log nfnetlink snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio i915 intel_rapl x86_pkg_temp_thermal intel_powerclamp kvmgt coretemp vfio_mdev kvm_intel mdev vfio_iommu_type1 vfio kvm drm_kms_helper irqbypass snd_hda_intel drm snd_hda_codec crct10dif_pclmul i2c_algo_bit crc32_pclmul snd_hda_core fb_sys_fops syscopyarea ghash_clmulni_intel snd_hwdep sysfillrect aesni_intel sysimgblt snd_pcm aes_x86_64 crypto_simd snd_timer cryptd snd glue_helper soundcore mei_me ie31200_edac input_leds serio_raw mei eeepc_wmi intel_cstate asus_wmi sparse_keymap pcspkr
[616084.497496]  intel_rapl_perf wmi_bmof mac_hid vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp sunrpc libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zlua(PO) zcommon(PO) znvpair(PO) zavl(PO) icp(PO) spl(O) btrfs xor zstd_compress raid6_pq libcrc32c netconsole i2c_i801 ahci lpc_ich r8168(O) libahci wmi video
[616084.497533] ---[ end trace cb12480aec780cbe ]---
[616084.497537] RIP: 0010:__x86_indirect_thunk_rax+0x0/0x20
[616084.497549] Code: 89 c8 e9 3c ba c5 ff c1 e1 03 01 d1 89 ca e9 67 c2 c5 ff 48 8d 0c c8 e9 dc bf c5 ff b9 f2 ff ff ff 30 c0 e9 17 c2 c5 ff 90 90 <e8> 07 00 00 00 f3 90 0f ae e8 eb f9 48 89 04 24 c3 66 66 2e 0f 1f
[616084.497553] RSP: 0018:ffffabfcd18cba80 EFLAGS: 00010282
[616084.497555] RAX: ffffffff95cca0b0 RBX: ffff9ba2095dbbc0 RCX: ffff9ba2095dbbc8
[616084.497558] RDX: ffffabfcd18cbc00 RSI: ffff9ba2095dbbc8 RDI: ffff9ba151224d00
[616084.497561] RBP: ffffabfcd18cba90 R08: ffff9ba151224d01 R09: 0000000000000004
[616084.497564] R10: ffff9ba151224d00 R11: 0000000000000000 R12: 0000000000000019
[616084.497567] R13: 0000000000000000 R14: 0000000000000000 R15: ffffabfcd18cbb0c
[616084.497579] FS:  00007ff2570fb700(0000) GS:ffff9ba23f880000(0000) knlGS:0000000000000000
[616084.497583] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[616084.497585] CR2: 000000000425f2e8 CR3: 00000007f7910005 CR4: 00000000001626e0
Does anyone have an idea? The only thing I can make sense of is the second line:
[616084.497243] kvm[11243]: segfault at 2c ip 00007f36cb5eea48 sp 00007fff0e5562c0 error 4 in libglib-2.0.so.0.5800.3[7f36cb5bc000+7e000]
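
For what it's worth, that segfault line can be decoded a bit further. A rough sketch, assuming the mapping shown in brackets starts at the beginning of the library file: "segfault at 2c" is the address that was accessed (very close to NULL), "error 4" is the x86 page-fault code for a user-mode read of a page that is not mapped, and subtracting the mapping base from "ip" gives the offset of the faulting instruction inside libglib. The function name segfault_offset is made up for this example.

```shell
#!/bin/sh
# Compute the offset of the faulting instruction inside the mapped library:
# offset = ip - base, where base is the first number in "[base+size]".
segfault_offset() {
    line="$1"
    ip=$(printf '%s\n' "$line" | sed -n 's/.* ip \([0-9a-f]*\) .*/\1/p')
    base=$(printf '%s\n' "$line" | sed -n 's/.*\[\([0-9a-f]*\)+[0-9a-f]*\]$/\1/p')
    printf '0x%x\n' "$(( 0x$ip - 0x$base ))"
}

line='kvm[11243]: segfault at 2c ip 00007f36cb5eea48 sp 00007fff0e5562c0 error 4 in libglib-2.0.so.0.5800.3[7f36cb5bc000+7e000]'
segfault_offset "$line"    # prints 0x32a48
```

With debug symbols for libglib installed, that offset could be passed to a tool such as addr2line to get a function name. Note that this line is a user-space crash in a kvm process, while the general protection fault printed around it is a kernel oops, so the two may or may not share a cause.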
 
