Server crashes after VM starts

nxet

Member
Sep 13, 2020
I'm running into a problem that I cannot figure out on my own.
Everything worked great until a week or so ago, and there were no changes to any configuration.
It's worth noting I'm running relatively old hardware, so I'm starting to think this might be a hardware failure, but I really can't make sense of the logs and I'd appreciate some hints.

Sep 26 18:38:33 pmox kernel: [ 151.413709] device tap2021i0 entered promiscuous mode
...
Sep 26 18:38:33 pmox kernel: [ 151.494605] fwbr2021i0: port 2(tap2021i0) entered forwarding state
Sep 26 18:38:44 pmox kernel: [ 161.953136] page:ffffee5586ff9840 refcount:0 mapcount:0 mapping:0000080000000000 index:0x0
Sep 26 18:38:44 pmox kernel: [ 161.953154] PGD 0 P4D 0
Sep 26 18:38:44 pmox kernel: [ 161.953158] Oops: 0000 [#1] SMP NOPTI
Sep 26 18:38:44 pmox kernel: [ 161.953162] CPU: 1 PID: 2347 Comm: kvm Tainted: P O 5.4.60-1-pve #1
Sep 26 18:38:44 pmox kernel: [ 161.953166] Hardware name: System manufacturer System Product Name/M3A78-CM, BIOS 2801 08/23/2010
Sep 26 18:38:44 pmox kernel: [ 161.953174] RIP: 0010:__dump_page.cold.5+0x238/0x2a0
Sep 26 18:38:44 pmox kernel: [ 161.953177] Code: ff ff 48 8b 43 08 a8 01 74 16 48 83 e8 01 f6 40 18 01 74 11 48 c7 c6 56 18 38 a5 e9 7e fe ff ff 48 89 d8 eb e9 4d 85 ed 74 38 <49> 8b 45 00 49 8b 75 70 48 85 c0 74 37 48 8b 80 38 01 00 00 48 85
Sep 26 18:38:44 pmox kernel: [ 161.953184] RSP: 0018:ffffb53dc4bab598 EFLAGS: 00010006
...
Sep 26 18:38:44 pmox kernel: [ 161.953210] CR2: 0000080000000000 CR3: 00000001e1bd8000 CR4: 00000000000006e0
Sep 26 18:38:44 pmox kernel: [ 161.953213] Call Trace:
Sep 26 18:38:44 pmox kernel: [ 161.953221] bad_page.cold.118+0x59/0xb3
...
Sep 26 18:38:44 pmox kernel: [ 161.953356] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Sep 26 18:38:44 pmox kernel: [ 161.953359] RIP: 0033:0x7f7a4445ff59
Sep 26 18:38:44 pmox kernel: [ 161.953363] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 07 6f 0c 00 f7 d8 64 89 01 48
Sep 26 18:38:44 pmox kernel: [ 161.953368] RSP: 002b:00007fff75614788 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
...
Sep 26 18:38:44 pmox kernel: [ 161.953384] R13: 0000000000000000 R14: 00007fff756147d0 R15: 0000000000000001
Sep 26 18:38:44 pmox kernel: [ 161.953388] Modules linked in: veth ebtable_filter ebtables ip6table_raw ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables iptable_raw ipt_REJECT nf_reject_ipv4 xt_mark xt_NFLOG xt_limit xt_set xt_physdev xt_addrtype xt_comment xt_multiport xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_tcpudp ip_set_hash_net ip_set iptable_filter bpfilter softdog nfnetlink_log nfnetlink dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio radeon ppdev ttm drm_kms_helper edac_mce_amd snd_hda_codec_hdmi pcspkr serio_raw snd_hda_codec_via snd_hda_codec_generic ledtrig_audio kvm_amd drm snd_hda_intel ccp snd_intel_dspcfg kvm irqbypass snd_hda_codec k10temp i2c_algo_bit snd_hda_core fb_sys_fops snd_hwdep syscopyarea snd_pcm sysfillrect sysimgblt snd_timer snd soundcore parport_pc asus_atk0110 parport mac_hid zfs(PO) zunicode(PO) zlua(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
Sep 26 18:38:44 pmox kernel: [ 161.953428] sunrpc hwmon_vid ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq libcrc32c pata_acpi ohci_pci psmouse pata_atiixp i2c_piix4 ahci ohci_hcd ehci_pci r8169 ehci_hcd libahci realtek floppy
Sep 26 18:38:44 pmox kernel: [ 161.953465] CR2: 0000080000000000
Sep 26 18:38:44 pmox kernel: [ 161.953468] ---[ end trace 3938b33683fffb05 ]---
Sep 26 18:38:44 pmox kernel: [ 161.953472] RIP: 0010:__dump_page.cold.5+0x238/0x2a0
Sep 26 18:38:44 pmox kernel: [ 161.953475] Code: ff ff 48 8b 43 08 a8 01 74 16 48 83 e8 01 f6 40 18 01 74 11 48 c7 c6 56 18 38 a5 e9 7e fe ff ff 48 89 d8 eb e9 4d 85 ed 74 38 <49> 8b 45 00 49 8b 75 70 48 85 c0 74 37 48 8b 80 38 01 00 00 48 85
Sep 26 18:38:44 pmox kernel: [ 161.953481] RSP: 0018:ffffb53dc4bab598 EFLAGS: 00010006
...
Sep 26 18:38:44 pmox kernel: [ 161.953505] CR2: 0000080000000000 CR3: 00000001e1bd8000 CR4: 00000000000006e0


By the way, the VM is set to autostart with a 120 s delay, and as you can see, as soon as it comes up the whole system goes up in flames.
The whole output looks quite cryptic to me, but the RIP: 0010:__dump_page.cold.5+0x238/0x2a0 line really got me thinking this might be a problem with either the RAM (2x 4 GB DDR2 800 MHz sticks) or the ZFS pool (running on 3x *old* 2 TB drives).
But then again I feel quite lost with all this information.
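
For readers who feel equally lost, here is a minimal Python sketch that pulls the few fields that usually matter out of an oops like the one above: the crashing process, the kernel version, the kernel-side RIP, the faulting address in CR2, and whether `bad_page` appears in the call trace. The function name `summarize_oops` and the `SAMPLE` list are made up for this illustration, and the regex assumes the syslog prefix format shown in this post. The sketch also counts the set bits in CR2; a single set bit is sometimes read as a hint of a flipped bit in memory, but on its own it proves nothing.

```python
import re

# Syslog prefix as it appears in this thread:
# "Sep 26 18:38:44 pmox kernel: [ 161.953162] <message>"
PREFIX = re.compile(r"^\w{3}\s+\d+\s+[\d:]+\s+\S+\s+kernel:\s+\[\s*[\d.]+\]\s?(.*)$")


def summarize_oops(lines):
    """Collect a few fields from kernel oops lines; the first match of each wins."""
    summary = {"bad_page_in_trace": False}
    for line in lines:
        match = PREFIX.match(line)
        if not match:
            continue
        msg = match.group(1).strip()
        if msg.startswith("Oops:"):
            summary.setdefault("oops", msg)
        elif msg.startswith("CPU:"):
            comm = re.search(r"Comm:\s+(\S+)", msg)
            kernel = re.search(r"(\S+)\s+#\d+$", msg)
            if comm:
                summary.setdefault("comm", comm.group(1))
            if kernel:
                summary.setdefault("kernel", kernel.group(1))
        elif msg.startswith("RIP: 0010:"):
            # Segment 0010 is kernel code on x86-64; 0033 would be user space.
            summary.setdefault("kernel_rip", msg[len("RIP: 0010:"):])
        elif msg.startswith("CR2:"):
            # On x86, CR2 holds the address that caused the page fault.
            summary.setdefault("fault_address", msg.split()[1])
        elif msg.startswith("bad_page"):
            summary["bad_page_in_trace"] = True
    if "fault_address" in summary:
        address = int(summary["fault_address"], 16)
        summary["fault_address_set_bits"] = bin(address).count("1")
    return summary


# A handful of lines copied from the log in this post.
SAMPLE = [
    "Sep 26 18:38:44 pmox kernel: [ 161.953158] Oops: 0000 [#1] SMP NOPTI",
    "Sep 26 18:38:44 pmox kernel: [ 161.953162] CPU: 1 PID: 2347 Comm: kvm Tainted: P O 5.4.60-1-pve #1",
    "Sep 26 18:38:44 pmox kernel: [ 161.953174] RIP: 0010:__dump_page.cold.5+0x238/0x2a0",
    "Sep 26 18:38:44 pmox kernel: [ 161.953210] CR2: 0000080000000000 CR3: 00000001e1bd8000 CR4: 00000000000006e0",
    "Sep 26 18:38:44 pmox kernel: [ 161.953221] bad_page.cold.118+0x59/0xb3",
    "Sep 26 18:38:44 pmox kernel: [ 161.953359] RIP: 0033:0x7f7a4445ff59",
]

if __name__ == "__main__":
    for key, value in summarize_oops(SAMPLE).items():
        print(f"{key}: {value}")
```

To run it against a full log, pass the lines of the file instead of `SAMPLE`, for example `summarize_oops(open("log.txt"))`.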

Thanks again for any pointer in the right direction!
 

That seems likely. It's worth testing the memory.
 
Thanks for your input. I did indeed run memtest for about an hour, and no issues were reported.

Would you suggest running it again, maybe for a solid 24 h? Would that make any difference? Given that the server crashes about 2 min after boot, I ruled out overheating as a cause, but then again I’m open to suggestions because my home lab has been off for too long already!
 
Well, memtest is not the most accurate tool. I was thinking more of removing DIMMs and seeing if the error persists.
 
So I tried swapping DIMMs; luckily I had a couple of spares.
Running with just one of the two DIMMs (4 GB of RAM in total) worked "fine", and both DIMMs performed nearly identically. After testing each one for about 2 h, I paired a third DIMM with one of the two already installed.
The machine has now been running for over 14 h and there doesn't seem to be any issue. I'd love to say for certain that it was indeed one of the old DIMMs, but I'll keep you posted in a couple of days after some more usage.
Thanks again for your help!
 
Also check whether the DIMMs run at the correct speed. It should be selected via SPD by default, but on my end that also didn't work at some point.
Running PC8500 at PC10666 can also provoke some side effects ;)
 
Unfortunately the RAM is running at 400 MHz, while the modules would support up to a whopping 800 MHz. I believe there's something wrong with the motherboard, though: I already upgraded the BIOS to the latest (2010) release, and the configuration doesn't seem to allow anything higher than 400 MHz (this was already the case before I used this machine for Proxmox).

Either way, the server has shown no major issues so far, and this problem seems to be resolved for now: replacing the DIMMs definitely helped.
Unfortunately I was dumb enough to upgrade the kernel as soon as a new one was available, so I guess we'll never know whether there was also some issue in the previous version clashing with the old hardware I'm forcing it to run on.

Thanks again for your help, much appreciated.
 
You are very welcome. Thanks for the update on your progress.
One note, though, on your RAM timings: I only mentioned the potential overclocking of modules as one possible issue.
The same can hit you when the clock speed is too low (IIRC I had that once on an Opteron-based system).
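
As a side note on the speed figures in this thread, the sketch below works through the usual DDR naming arithmetic in Python. It assumes a standard 64-bit (8-byte) module and two transfers per I/O clock cycle; the function names are made up for this example. Under those assumptions a DDR2-800 module has a 400 MHz I/O clock and a peak rate of 6400 MB/s (hence "PC2-6400"), so a reading of 400 MHz could simply be the clock rather than the transfer rate. Whether the BIOS or tool on this particular board displays the clock or the transfer rate is not something the thread establishes.

```python
def peak_bandwidth_mb_s(transfers_mt_s, bus_bytes=8):
    """Peak module bandwidth in MB/s: transfers per second times bus width."""
    return transfers_mt_s * bus_bytes


def io_clock_mhz(transfers_mt_s):
    """I/O bus clock in MHz: DDR moves data on both clock edges."""
    return transfers_mt_s / 2


if __name__ == "__main__":
    # DDR2-800 is the module type mentioned by the original poster.
    print("DDR2-800 peak MB/s:", peak_bandwidth_mb_s(800))
    print("DDR2-800 I/O clock MHz:", io_clock_mhz(800))
```

The same arithmetic explains the "PC8500" and "PC10666" labels: they are rounded peak bandwidth figures for modules at roughly 1066 and 1333 MT/s respectively.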
 
