Proxmox host crashes fairly often, I think it may have to do with how I am mounting my hard drive to the VM

diericx

I recently added a VM that mounts a hard drive and runs all of my home media services (Deluge for downloading and seeding, Sonarr and Radarr for library management, and an SMB server for network access to the hard drive). After adding it, my host started crashing randomly and fairly often. I tried looking through the logs, but the host seems fine one second and then just stops.

I can't remember exactly how I mounted my drive onto the VM, but my fstab looks like this:
Code:
UUID=a3505d34-d68f-4af4-88a3-3a9334bf0491 / ext4 defaults 0 0
/swap.img    none    swap    sw    0    0
UUID=fa605d83-106e-4143-bb20-deec7461f08c /mnt/media ext4 auto,nofail,noatime,rw,user 0 0

And my VM hardware config looks like this:
Screen Shot 2020-03-18 at 10.00.52 AM.png

I'm very hesitant to mess with the drive (worried I might wipe it somehow and I have all my content there) so I really want to try to narrow down what the cause might be. Are there any other logs I can check?

I'll attach my /var/log/messages file. You can see that at Mar 17 22:49:58 it crashes.

This VM used to be just a Samba server until I added the other services via Docker, at which point this issue started. My hunch is that it's Deluge, because it is constantly using the drive for downloading and seeding. I will try stopping Deluge tonight or tomorrow night to see if the server stops crashing, but I'd still like to find the root cause so I can keep Deluge running.
 

Attachments

  • messages-trunc.log
    291.5 KB · Views: 11
Mar 17 21:30:28 host kernel: [ 135.660589] ------------[ cut here ]------------
Mar 17 21:30:28 host kernel: [ 135.660832] NETDEV WATCHDOG: enp0s31f6 (e1000e): transmit queue 0 timed out
Mar 17 21:30:28 host kernel: [ 135.661082] WARNING: CPU: 4 PID: 2523 at net/sched/sch_generic.c:461 dev_watchdog+0x221/0x230
Mar 17 21:30:28 host kernel: [ 135.661333] Modules linked in: binfmt_misc veth ebtable_filter ebtables ip_set ip6table_filter ip6_tables iptable_filter bpfilter softdog nfnetlink_log nfnetlink snd_hda_codec_hdmi intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel btusb snd_hda_codec_realtek btrtl btbcm b43 aesni_intel snd_hda_codec_generic btintel ledtrig_audio cordic mac80211 aes_x86_64 crypto_simd cryptd glue_helper snd_hda_intel intel_cstate bluetooth snd_hda_codec snd_hda_core input_leds ecdh_generic cfg80211 intel_rapl_perf snd_hwdep snd_pcm intel_wmi_thunderbolt snd_timer pcspkr ssb snd mei_me mei soundcore mxm_wmi mac_hid acpi_pad vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vfio_pci vfio_virqfd irqbypass vfio_iommu_type1 vfio sunrpc ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zlua(PO) zcommon(PO) znvpair(PO) zavl(PO) icp(PO) spl(O) btrfs xor zstd_compress usbmouse hid_generic
Mar 17 21:30:28 host kernel: [ 135.661352] usbkbd usbhid hid raid6_pq libcrc32c i2c_i801 ahci e1000e bcma libahci wmi video
Mar 17 21:30:28 host kernel: [ 135.664379] CPU: 4 PID: 2523 Comm: kvm Tainted: P O 5.0.15-1-pve #1
Mar 17 21:30:28 host kernel: [ 135.664788] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD5 TH/Z170X-UD5 TH-CF, BIOS F20 12/14/2016
Mar 17 21:30:28 host kernel: [ 135.665212] RIP: 0010:dev_watchdog+0x221/0x230
Mar 17 21:30:28 host kernel: [ 135.665636] Code: 00 49 63 4e e0 eb 92 4c 89 ef c6 05 5a a2 ef 00 01 e8 03 2b fc ff 89 d9 4c 89 ee 48 c7 c7 90 03 7b be 48 89 c2 e8 91 d5 78 ff <0f> 0b eb c0 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48
Mar 17 21:30:28 host kernel: [ 135.666543] RSP: 0018:ffff9913beb03e68 EFLAGS: 00010286
Mar 17 21:30:28 host kernel: [ 135.666999] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
Mar 17 21:30:28 host kernel: [ 135.667459] RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff9913beb16440
Mar 17 21:30:28 host kernel: [ 135.667925] RBP: ffff9913beb03e98 R08: 0000000000000000 R09: 0000000000000432
Mar 17 21:30:28 host kernel: [ 135.668394] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000001
Mar 17 21:30:28 host kernel: [ 135.668887] R13: ffff9913af6cc000 R14: ffff9913af6cc4c0 R15: ffff9913afcdbe80
Mar 17 21:30:28 host kernel: [ 135.669364] FS: 00007f235a5ff700(0000) GS:ffff9913beb00000(0000) knlGS:ffff8be8bfd00000
Mar 17 21:30:28 host kernel: [ 135.669850] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 17 21:30:28 host kernel: [ 135.670339] CR2: 00007f4c49d83000 CR3: 00000007cad52004 CR4: 00000000003626e0
Mar 17 21:30:28 host kernel: [ 135.670840] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 17 21:30:28 host kernel: [ 135.671338] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mar 17 21:30:28 host kernel: [ 135.671836] Call Trace:
Mar 17 21:30:28 host kernel: [ 135.672335] <IRQ>
Mar 17 21:30:28 host kernel: [ 135.672834] ? pfifo_fast_enqueue+0x120/0x120
Mar 17 21:30:28 host kernel: [ 135.673337] call_timer_fn+0x30/0x130
Mar 17 21:30:28 host kernel: [ 135.673845] run_timer_softirq+0x3e4/0x420
Mar 17 21:30:28 host kernel: [ 135.674347] ? ktime_get+0x3c/0xa0
Mar 17 21:30:28 host kernel: [ 135.674844] ? native_apic_msr_write+0x2b/0x30
Mar 17 21:30:28 host kernel: [ 135.675349] ? lapic_next_event+0x20/0x30
Mar 17 21:30:28 host kernel: [ 135.675856] ? clockevents_program_event+0x93/0xf0
Mar 17 21:30:28 host kernel: [ 135.676369] __do_softirq+0xdc/0x2f3
Mar 17 21:30:28 host kernel: [ 135.676886] irq_exit+0xc0/0xd0
Mar 17 21:30:28 host kernel: [ 135.677396] smp_apic_timer_interrupt+0x79/0x140
Mar 17 21:30:28 host kernel: [ 135.677915] apic_timer_interrupt+0xf/0x20
Mar 17 21:30:28 host kernel: [ 135.678431] </IRQ>
Mar 17 21:30:28 host kernel: [ 135.678946] RIP: 0010:vmx_handle_external_intr+0x62/0xa0 [kvm_intel]
Mar 17 21:30:28 host kernel: [ 135.679475] Code: 0f b7 50 06 8b 48 08 0f b7 00 48 c1 e2 10 48 c1 e1 20 48 09 ca 48 09 d0 48 89 e2 48 83 e4 f0 6a 18 52 9c 6a 10 e8 be 74 c0 fc <5b> 5d c3 81 3d 31 53 02 00 11 01 00 00 76 14 65 48 8b 15 4f b8 a2
Mar 17 21:30:28 host kernel: [ 135.680587] RSP: 0018:ffffa9e962e03d08 EFLAGS: 00000086 ORIG_RAX: ffffffffffffff13
Mar 17 21:30:28 host kernel: [ 135.681163] RAX: ffffffffbe001a60 RBX: ffff991337a08000 RCX: ffffffff00000000
Mar 17 21:30:28 host kernel: [ 135.681753] RDX: ffffa9e962e03d08 RSI: fffffe60cf89fef8 RDI: ffff991337a08000
Mar 17 21:30:28 host kernel: [ 135.682334] RBP: ffffa9e962e03d10 R08: 0000000000000000 R09: 0000000000000000
Mar 17 21:30:28 host kernel: [ 135.682908] R10: 0000000000000000 R11: 0000000000000000 R12: ffff991337a0a438
Mar 17 21:30:28 host kernel: [ 135.683475] R13: ffff991337a08030 R14: ffff991337a08000 R15: 0000000000000000
Mar 17 21:30:28 host kernel: [ 135.684035] ? __irqentry_text_start+0x8/0x8
Mar 17 21:30:28 host kernel: [ 135.684590] kvm_arch_vcpu_ioctl_run+0x6a8/0x1b00 [kvm]
Mar 17 21:30:28 host kernel: [ 135.685123] ? _copy_to_user+0x2b/0x40
Mar 17 21:30:28 host kernel: [ 135.685640] ? kvm_vm_ioctl+0x6a1/0x960 [kvm]
Mar 17 21:30:28 host kernel: [ 135.686187] kvm_vcpu_ioctl+0x24b/0x610 [kvm]
Mar 17 21:30:28 host kernel: [ 135.686714] ? do_futex+0xc7/0xc60
Mar 17 21:30:28 host kernel: [ 135.687177] do_vfs_ioctl+0xa9/0x640
Mar 17 21:30:28 host kernel: [ 135.687626] ksys_ioctl+0x67/0x90
Mar 17 21:30:28 host kernel: [ 135.688060] __x64_sys_ioctl+0x1a/0x20
Mar 17 21:30:28 host kernel: [ 135.688485] do_syscall_64+0x5a/0x110
Mar 17 21:30:28 host kernel: [ 135.688905] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Mar 17 21:30:28 host kernel: [ 135.689312] RIP: 0033:0x7f2369e2d427
Mar 17 21:30:28 host kernel: [ 135.689708] Code: 00 00 90 48 8b 05 69 aa 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 aa 0c 00 f7 d8 64 89 01 48
Mar 17 21:30:28 host kernel: [ 135.690523] RSP: 002b:00007f235a5fa678 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Mar 17 21:30:28 host kernel: [ 135.690935] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f2369e2d427
Mar 17 21:30:28 host kernel: [ 135.691352] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000001b
Mar 17 21:30:28 host kernel: [ 135.691769] RBP: 0000000000000000 R08: 000055f75b771310 R09: 0000000000000001
Mar 17 21:30:28 host kernel: [ 135.692194] R10: 0000000000000001 R11: 0000000000000246 R12: 00007f235caf21c0
Mar 17 21:30:28 host kernel: [ 135.692613] R13: 000055f75b74d5c0 R14: 00007f235d52e000 R15: 0000000000000000
Mar 17 21:30:28 host kernel: [ 135.693022] ---[ end trace a0b89a1a27edfd8e ]---
It seems that your 1 GbE port (enp0s31f6, e1000e driver) has issues. It's best to upgrade the OS (new kernel) and the firmware.
 
Interesting, I didn't notice that. If you don't mind me asking, how did you find that in the entire log file? I'd love to be able to read through those logs, but they're pretty hard to digest. Do you search for keywords like "Warning"? And why does it only log a warning for a potentially catastrophic error?

Is it possible to update without a subscription? I am getting 401 Unauthorized when running this:
apt-get update

Also, what firmware are you referring to? Firmware for the port?
 
Also, what firmware are you referring to? Firmware for the port?
Any. Starting from the BIOS, to the CPU microcode, down to each individual hardware component.
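
On finding such entries in a large log: one approach is to scan for the markers the kernel typically prints around warnings and crashes. Below is a minimal Python sketch of that idea. The function name `find_kernel_events` and the marker list are my own illustrative assumptions, not a complete or official set; the sample lines are taken from the trace posted above.

```python
import re

# Strings that commonly show up in kernel warning/oops reports.
# Illustrative assumption: this list is not exhaustive.
MARKERS = re.compile(r"cut here|WARNING:|BUG:|Oops|NETDEV WATCHDOG|Call Trace")


def find_kernel_events(lines):
    """Return (line_number, text) pairs for lines that contain a marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if MARKERS.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


# Small sample: two lines from the trace in this thread plus one
# unrelated (made-up) line that should not match.
sample = [
    "Mar 17 21:30:28 host kernel: [ 135.660589] ------------[ cut here ]------------",
    "Mar 17 21:30:28 host kernel: [ 135.660832] NETDEV WATCHDOG: enp0s31f6 (e1000e): transmit queue 0 timed out",
    "Mar 17 21:30:29 host systemd[1]: Started a hypothetical unrelated service.",
]

for number, text in find_kernel_events(sample):
    print(f"{number}: {text}")
```

To run it against a real file, pass an open file object instead of the sample list, for example `find_kernel_events(open("/var/log/messages", errors="replace"))`. The line numbers it prints give you a starting point for reading the surrounding context.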
 
Hi,

Were you able to resolve this problem? I have exactly the same issue, and it would be interesting to understand whether it was at the hardware level or somewhere else.

Thx in advance.

Cheers, M.
 
