Proxmox keeps crashing/rebooting

proximity

Jul 19, 2019
I had a crash in my desktop VM running on Proxmox. Since then it won't start anymore: a few seconds after I start it, the host machine reboots.

The host is still running Proxmox VE 6.4, so I thought upgrading to 7 might solve it. I wanted to make backups first, but when I run vzdump the host crashes as well, after only a few percent of backup progress.

Any suggestions on how to debug and fix this?

thx!
 
This is the output from /var/log/messages:
Code:
Oct 20 11:06:44 pve kernel: [   10.283075] igb 0000:06:00.0 enp6s0: igb: enp6s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Oct 20 11:06:44 pve kernel: [   10.390769] vmbr0: port 1(enp6s0) entered blocking state
Oct 20 11:06:44 pve kernel: [   10.390795] vmbr0: port 1(enp6s0) entered forwarding state
Oct 20 11:06:44 pve kernel: [   10.391006] IPv6: ADDRCONF(NETDEV_CHANGE): vmbr0: link becomes ready
Oct 20 11:06:55 pve kernel: [   21.259243]  zd112: p1 p2 < p5 >
Oct 20 11:37:23 pve kernel: [ 1848.590082] VFIO - User Level meta-driver version: 0.3
Oct 20 11:37:23 pve kernel: [ 1848.668739] vfio-pci 0000:0c:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
Oct 20 11:37:23 pve kernel: [ 1848.852684] xhci_hcd 0000:0e:00.3: remove, state 4
Oct 20 11:37:23 pve kernel: [ 1848.852701] usb usb6: USB disconnect, device number 1
Oct 20 11:37:23 pve kernel: [ 1848.852706] usb 6-2: USB disconnect, device number 2
Oct 20 11:37:23 pve kernel: [ 1848.852710] usb 6-2.4: USB disconnect, device number 3
Oct 20 11:37:23 pve kernel: [ 1848.852714] usb 6-2.4.4: USB disconnect, device number 4
Oct 20 11:37:23 pve kernel: [ 1848.853221] xhci_hcd 0000:0e:00.3: USB bus 6 deregistered
Oct 20 11:37:23 pve kernel: [ 1848.853230] xhci_hcd 0000:0e:00.3: remove, state 1
Oct 20 11:37:23 pve kernel: [ 1848.853235] usb usb5: USB disconnect, device number 1
Oct 20 11:37:23 pve kernel: [ 1848.853240] usb 5-2: USB disconnect, device number 2
Oct 20 11:37:23 pve kernel: [ 1848.853243] usb 5-2.4: USB disconnect, device number 3
Oct 20 11:37:23 pve kernel: [ 1848.853247] usb 5-2.4.1: USB disconnect, device number 4
Oct 20 11:37:23 pve kernel: [ 1848.853455] usb 5-2.4.3: USB disconnect, device number 5
Oct 20 11:37:24 pve kernel: [ 1849.060179] usb 5-2.4.4: USB disconnect, device number 6
Oct 20 11:37:24 pve kernel: [ 1849.060193] usb 5-2.4.4.1: USB disconnect, device number 7
Oct 20 11:37:24 pve kernel: [ 1849.233052] xhci_hcd 0000:0e:00.3: USB bus 5 deregistered
Oct 20 11:37:24 pve kernel: [ 1849.251522] vfio-pci 0000:0e:00.3: refused to change power state from D0 to D3hot
Oct 20 11:37:24 pve kernel: [ 1849.605411] device tap103i0 entered promiscuous mode
Oct 20 11:37:24 pve kernel: [ 1849.624971] fwbr103i0: port 1(fwln103i0) entered blocking state
Oct 20 11:37:24 pve kernel: [ 1849.624979] fwbr103i0: port 1(fwln103i0) entered disabled state
Oct 20 11:37:24 pve kernel: [ 1849.625012] device fwln103i0 entered promiscuous mode
Oct 20 11:37:24 pve kernel: [ 1849.625035] fwbr103i0: port 1(fwln103i0) entered blocking state
Oct 20 11:37:24 pve kernel: [ 1849.625064] fwbr103i0: port 1(fwln103i0) entered forwarding state
Oct 20 11:37:24 pve kernel: [ 1849.626852] vmbr0: port 2(fwpr103p0) entered blocking state
Oct 20 11:37:24 pve kernel: [ 1849.626859] vmbr0: port 2(fwpr103p0) entered disabled state
Oct 20 11:37:24 pve kernel: [ 1849.626887] device fwpr103p0 entered promiscuous mode
Oct 20 11:37:24 pve kernel: [ 1849.626909] vmbr0: port 2(fwpr103p0) entered blocking state
Oct 20 11:37:24 pve kernel: [ 1849.626936] vmbr0: port 2(fwpr103p0) entered forwarding state
Oct 20 11:37:24 pve kernel: [ 1849.628698] fwbr103i0: port 2(tap103i0) entered blocking state
Oct 20 11:37:24 pve kernel: [ 1849.628710] fwbr103i0: port 2(tap103i0) entered disabled state
Oct 20 11:37:24 pve kernel: [ 1849.628767] fwbr103i0: port 2(tap103i0) entered blocking state
Oct 20 11:37:24 pve kernel: [ 1849.628796] fwbr103i0: port 2(tap103i0) entered forwarding state
Oct 20 11:37:25 pve kernel: [ 1850.329481] vfio-pci 0000:0c:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
Oct 20 11:37:25 pve kernel: [ 1850.332258] vfio-pci 0000:0c:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0x0000
Oct 20 11:37:25 pve kernel: [ 1850.352964] vfio-pci 0000:0e:00.3: enabling device (0000 -> 0002)
Oct 20 11:38:07 pve kernel: [    0.000000] Linux version 5.11.22-5-pve (build@proxmox) (gcc (Debian 8.3.0-6) 8.3.0, GNU ld (GNU Binutils for Debian) 2.31.1) #1 SMP PVE 5.11.22-10~bpo10+1 (Tue, 28 Sep 2021 10:30:51 +0200) ()

And this is the output when running the backup:
Code:
INFO: starting new backup job: vzdump 103 --mode stop --dumpdir /data2
INFO: Starting Backup of VM 103 (qemu)
INFO: Backup started at 2021-10-20 11:37:23
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: VM Name: Kids
INFO: include disk 'scsi0' 'data2:vm-103-disk-0' 100G
INFO: include disk 'efidisk0' 'data2:vm-103-disk-1' 1M
INFO: creating vzdump archive '/data2/vzdump-qemu-103-2021_10_20-11_37_23.vma'
INFO: starting kvm to execute backup task
INFO: started backup task '092b1630-d52a-448c-af85-8bf967947843'
INFO:   2% (2.4 GiB of 100.0 GiB) in 3s, read: 817.0 MiB/s, write: 764.7 MiB/s
INFO:   4% (4.6 GiB of 100.0 GiB) in 6s, read: 765.6 MiB/s, write: 715.8 MiB/s
 
It may show more, such as hardware-related messages.

Keep a couple of screens running something like the following:
Code:
journalctl -ef -p 7

Code:
watch -n 10 'dmesg -T | tail -n 40'

Perhaps you'll catch something.
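Because the host reboots, anything on a live screen goes away with it, so it can also help to read the previous boot's log afterwards. A minimal sketch, assuming the systemd journal on this host is not already persistent: the journald drop-in below (the file name persistent.conf is only an example) keeps the journal on disk across reboots. Restart the service with systemctl restart systemd-journald, and after the next crash run journalctl -b -1 -p 4 to list warnings and errors from the boot that crashed.

```ini
# /etc/systemd/journald.conf.d/persistent.conf (example file name)
# Store the journal under /var/log/journal so it survives a reboot
[Journal]
Storage=persistent
```

If the machine resets abruptly, the last messages may never reach the disk, so an empty log just before the reboot does not by itself rule out a hardware problem.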
 
