Proxmox crash: PTI related?

Inglebard

Hi,

My Proxmox host crashed during the backup of a VM:

Code:
Feb 27 23:43:02 bsi-proxmox kernel: [127818.541593] BUG: unable to handle kernel paging request at ffff93e009eaaa0f
Feb 27 23:43:02 bsi-proxmox kernel: [127818.541627] IP: _raw_spin_trylock+0x9/0x30
Feb 27 23:43:02 bsi-proxmox kernel: [127818.541643] PGD 6a731067
Feb 27 23:43:02 bsi-proxmox kernel: [127818.541644] P4D 6a731067
Feb 27 23:43:02 bsi-proxmox kernel: [127818.541654] PUD 0
Feb 27 23:43:02 bsi-proxmox kernel: [127818.541665]
Feb 27 23:43:02 bsi-proxmox kernel: [127818.541682] Oops: 0000 [#1] SMP PTI
Feb 27 23:43:02 bsi-proxmox kernel: [127818.541696] Modules linked in: nfsv3 nfs_acl nfs lockd grace fscache ip_set ip6table_filter ip6_tables iptable_filter softdog nfnetlink_log nfnetlink nls_iso8859_1 dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_hda_codec_hdmi kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc snd aesni_intel i915 aes_x86_64 hci_uart crypto_simd soundcore btbcm glue_helper wmi_bmof serdev cryptd eeepc_wmi btqca mxm_wmi asus_wmi btintel bluetooth sparse_keymap drm_kms_helper drm intel_cstate pcspkr i2c_algo_bit intel_rapl_perf ecdh_generic mei_me fb_sys_fops mei syscopyarea sysfillrect sysimgblt intel_lpss_acpi

proxmox-ve: 5.1-41 (running kernel: 4.13.13-6-pve)
pve-manager: 5.1-46 (running version: 5.1-46/ae8241d4)
pve-kernel-4.13.13-6-pve: 4.13.13-41
pve-kernel-4.13.13-5-pve: 4.13.13-38
pve-kernel-4.13.13-4-pve: 4.13.13-35
pve-kernel-4.13.13-3-pve: 4.13.13-34
pve-kernel-4.13.13-2-pve: 4.13.13-33
pve-kernel-4.13.13-1-pve: 4.13.13-31
pve-kernel-4.13.8-3-pve: 4.13.8-30
pve-kernel-4.13.8-2-pve: 4.13.8-28
pve-kernel-4.13.8-1-pve: 4.13.8-27
pve-kernel-4.13.4-1-pve: 4.13.4-26
corosync: 2.4.2-pve3
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-common-perl: 5.0-28
libpve-guest-common-perl: 2.0-14
libpve-http-server-perl: 2.0-8
libpve-storage-perl: 5.0-17
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 2.1.1-2
lxcfs: 2.0.8-2
novnc-pve: 0.6-4
proxmox-widget-toolkit: 1.0-11
pve-cluster: 5.0-20
pve-container: 2.0-19
pve-docs: 5.1-16
pve-firewall: 3.0-5
pve-firmware: 2.0-3
pve-ha-manager: 2.0-5
pve-i18n: 1.0-4
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.9.1-9
pve-xtermjs: 1.0-2
qemu-server: 5.0-22
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.6-pve1~bpo9

Any idea why? Could it be related to the latest kernel update?
 
Could you check the logs for a more complete trace? Can you reproduce the issue, or was it a one-off?
 
Hi,
Where can I find the logs?
No, I can't reproduce the issue. It's an automatic backup, and it has happened only once since the update.

journal, /var/log/messages, ...
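
To make it easier to pull a complete trace out of one of these logs, here is a minimal sketch that keeps only the kernel lines around the first crash marker. It assumes syslog-style lines containing " kernel: " as in the excerpt from the first post; the function name `crash_context` and the marker list are illustrative, taken from the traces seen in this thread, not an exhaustive set.

```python
import re

# Markers seen in the traces in this thread; extend the pattern as needed.
CRASH_MARKERS = re.compile(r"BUG:|Oops:|general protection fault|Call Trace:")


def crash_context(lines, before=5, after=60):
    """Return the kernel log lines around the first crash marker.

    `lines` is any iterable of syslog-style lines (an open file works).
    Returns an empty list when no marker is found.
    """
    # Keep only messages logged by the kernel, dropping other daemons.
    kernel_lines = [ln.rstrip("\n") for ln in lines if " kernel: " in ln]
    for i, ln in enumerate(kernel_lines):
        if CRASH_MARKERS.search(ln):
            start = max(0, i - before)
            return kernel_lines[start:i + after + 1]
    return []
```

It could be used as `crash_context(open("/var/log/syslog", errors="replace"))`; the path is an assumption and depends on where the host writes its kernel messages.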
 
There is nothing at the time of the crash in /var/log/messages, and I already posted the content of syslog in my first post.
I can't find any more information.
 
Then let's hope it was a fluke. If it happens again, please try to collect as much information as possible about the circumstances (and of course all traces, error messages, etc.).
 
Hi,
It happened again. Here is more detail:

Code:
Mar 25 23:35:09 bsi-proxmox kernel: general protection fault: 0000 [#1] SMP PTI
Mar 25 23:35:09 bsi-proxmox kernel: Modules linked in: nfsv3 nfs_acl nfs lockd grace fscache ip_set ip6table_filter ip6_tables iptable_filter softdog nfnetlink_log nfnetlink nls_iso8859_1 dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec i915 eeepc_wmi snd_hda_core asus_wmi intel_rapl snd_hwdep hci_uart snd_pcm sparse_keymap x86_pkg_temp_thermal intel_powerclamp wmi_bmof btbcm serdev coretemp snd_timer mxm_wmi kvm_intel kvm drm_kms_helper drm snd irqbypass soundcore crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i2c_algo_bit pcbc aesni_intel btqca aes_x86_64 btintel crypto_simd glue_helper cryptd intel_cstate intel_rapl_perf fb_sys_fops syscopyarea pcspkr sysfillrect sysimgblt bluetooth ecdh_generic intel_lpss_acpi mei_me
Mar 25 23:35:10 bsi-proxmox kernel:  intel_lpss acpi_als mei mac_hid shpchp video kfifo_buf acpi_pad industrialio wmi zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core sunrpc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear i2c_i801 r8169 mii ahci libahci i2c_hid hid
Mar 25 23:35:10 bsi-proxmox kernel: CPU: 3 PID: 54 Comm: kswapd0 Tainted: P           O    4.13.13-6-pve #1
Mar 25 23:35:10 bsi-proxmox kernel: Hardware name: System manufacturer System Product Name/PRIME B250M-K, BIOS 0809 07/07/2017
Mar 25 23:35:10 bsi-proxmox kernel: task: ffff98f6f0d7df00 task.stack: ffffbee000920000
Mar 25 23:35:10 bsi-proxmox kernel: RIP: 0010:rmap_get_first+0x27/0x60 [kvm]
Mar 25 23:35:10 bsi-proxmox kernel: RSP: 0018:ffffbee0009238d0 EFLAGS: 00010286
Mar 25 23:35:10 bsi-proxmox kernel: RAX: ffff98f62b913b9c RBX: 2b06009dc83a029e RCX: 0000000000074100
Mar 25 23:35:10 bsi-proxmox kernel: RDX: ffff98f6c08e01c0 RSI: ffffbee0009238f8 RDI: ffffbee00cc5c000
Mar 25 23:35:10 bsi-proxmox kernel: RBP: ffffbee0009238d8 R08: 0000000000000001 R09: 0000000000000000
Mar 25 23:35:10 bsi-proxmox kernel: R10: ffffe1cfc2f04100 R11: 0000000000000000 R12: ffff98f6c1b00000
Mar 25 23:35:10 bsi-proxmox kernel: R13: ffffffffc0aae700 R14: 0000000000074100 R15: 0000000000000001
Mar 25 23:35:10 bsi-proxmox kernel: FS:  0000000000000000(0000) GS:ffff98f6f6d80000(0000) knlGS:0000000000000000
Mar 25 23:35:10 bsi-proxmox kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 25 23:35:10 bsi-proxmox kernel: CR2: 00007ffc51919b80 CR3: 000000007f40a005 CR4: 00000000003626e0
Mar 25 23:35:10 bsi-proxmox kernel: Call Trace:
Mar 25 23:35:10 bsi-proxmox kernel:  kvm_age_rmapp+0x3b/0x170 [kvm]
Mar 25 23:35:10 bsi-proxmox kernel:  ? mark_spte_for_access_track+0xc0/0xc0 [kvm]
Mar 25 23:35:10 bsi-proxmox kernel:  kvm_handle_hva_range+0x12f/0x1a0 [kvm]
Mar 25 23:35:10 bsi-proxmox kernel:  kvm_age_hva+0x17/0x20 [kvm]
Mar 25 23:35:10 bsi-proxmox kernel:  kvm_mmu_notifier_clear_flush_young+0x44/0x80 [kvm]
Mar 25 23:35:10 bsi-proxmox kernel:  __mmu_notifier_clear_flush_young+0x61/0x90
Mar 25 23:35:10 bsi-proxmox kernel:  page_referenced_one+0xee/0x190
Mar 25 23:35:10 bsi-proxmox kernel:  rmap_walk_anon+0x113/0x270
Mar 25 23:35:10 bsi-proxmox kernel:  rmap_walk+0x48/0x60
Mar 25 23:35:10 bsi-proxmox kernel:  page_referenced+0x10d/0x170
Mar 25 23:35:10 bsi-proxmox kernel:  ? invalid_page_referenced_vma+0x80/0x80
Mar 25 23:35:10 bsi-proxmox kernel:  ? page_get_anon_vma+0x80/0x80
Mar 25 23:35:10 bsi-proxmox kernel:  shrink_active_list+0x1db/0x420
Mar 25 23:35:10 bsi-proxmox kernel:  shrink_node_memcg+0x3c2/0x780
Mar 25 23:35:10 bsi-proxmox kernel:  shrink_node+0xe1/0x310
Mar 25 23:35:10 bsi-proxmox kernel:  ? shrink_node+0xe1/0x310
Mar 25 23:35:10 bsi-proxmox kernel:  kswapd+0x386/0x770
Mar 25 23:35:10 bsi-proxmox kernel:  kthread+0x10c/0x140
Mar 25 23:35:10 bsi-proxmox kernel:  ? mem_cgroup_shrink_node+0x180/0x180
Mar 25 23:35:10 bsi-proxmox kernel:  ? kthread_create_on_node+0x70/0x70
Mar 25 23:35:10 bsi-proxmox kernel:  ret_from_fork+0x35/0x40
Mar 25 23:35:10 bsi-proxmox kernel: Code: 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 48 8b 07 48 85 c0 74 40 a8 01 74 1b 48 83 e0 fe c7 46 08 00 00 00 00 48 89 06 48 8b 18 <48> 8b 3b 48 85 ff 75 14 0f 0b 48 c7 06 00 00 00 00 48 8b 1f 48
Mar 25 23:35:10 bsi-proxmox kernel: RIP: rmap_get_first+0x27/0x60 [kvm] RSP: ffffbee0009238d0
 
That looks like a totally different trace. Did it happen during a backup as well? What were the general load and memory situation at that time? Are there any other error messages or log parts that could be relevant?
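
For comparing traces like these two, a rough sketch that extracts the faulting symbol from the IP:/RIP: line may help. It assumes the line formats shown in this thread (`IP: symbol+0xOFF/0xLEN` and `RIP: 0010:symbol+0xOFF/0xLEN [module]`); the helper name `faulting_symbol` is hypothetical.

```python
import re

# Matches both "IP: sym+0x9/0x30" and "RIP: 0010:sym+0x27/0x60 [kvm]".
RIP_PATTERN = re.compile(
    r"\bR?IP: (?:[0-9a-f]{4}:)?"          # optional code segment prefix
    r"([A-Za-z_][\w.]*)"                  # faulting symbol
    r"\+0x[0-9a-f]+/0x[0-9a-f]+"          # offset/length
    r"(?: \[(\w+)\])?"                    # optional module name
)


def faulting_symbol(lines):
    """Return (symbol, module) from the first IP:/RIP: line, or None.

    `module` is None when the symbol belongs to the core kernel.
    """
    for line in lines:
        match = RIP_PATTERN.search(line)
        if match:
            return match.group(1), match.group(2)
    return None
```

Run on the two excerpts in this thread, it would report `_raw_spin_trylock` in the core kernel for the first crash and `rmap_get_first` in the `kvm` module for the second, which is the difference noted above.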
 
Yes, during a backup; it crashed at 5%.

This time it rebooted; last time it was completely frozen.

Edit:

The Proxmox host has 4 GB RAM + 4 GB swap. There is one VM on it with 2.5 GB.
Backups are written to a NAS via NFS, and the VM runs from another dedicated NAS, also over NFS.

A zabbix-agent is running on the host; it reported 800 MB+ of RAM available just before the crash.
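
To record the memory situation more precisely next time, something like the following sketch could be run on the host shortly before the backup starts. It only parses the standard `Name: value kB` layout of /proc/meminfo; the function names `parse_meminfo` and `memory_summary` are illustrative, and how it gets scheduled is left open.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of integer values (kB for most keys)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_summary(text):
    """Return RAM and swap figures in MiB for the keys relevant here."""
    info = parse_meminfo(text)
    wanted = ("MemTotal", "MemAvailable", "SwapTotal", "SwapFree")
    return {key: info[key] // 1024 for key in wanted if key in info}
```

For example, `memory_summary(open("/proc/meminfo").read())` returns a small dict that can be written to a log together with a timestamp.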
 