Proxmox keeps crashing/rebooting

proximity

My desktop image, which runs in a VM on Proxmox, crashed. Since then the VM won't start anymore: a few seconds after I start it, the host machine reboots.

The host is still running 6.4, so I thought upgrading to 7 might solve it. I wanted to make backups first, but when I run vzdump the host also crashes after a few percent of backup progress.

Any suggestions on how to debug and fix this?

thx!
 
This is the output from the messages log:
Code:
Oct 20 11:06:44 pve kernel: [   10.283075] igb 0000:06:00.0 enp6s0: igb: enp6s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Oct 20 11:06:44 pve kernel: [   10.390769] vmbr0: port 1(enp6s0) entered blocking state
Oct 20 11:06:44 pve kernel: [   10.390795] vmbr0: port 1(enp6s0) entered forwarding state
Oct 20 11:06:44 pve kernel: [   10.391006] IPv6: ADDRCONF(NETDEV_CHANGE): vmbr0: link becomes ready
Oct 20 11:06:55 pve kernel: [   21.259243]  zd112: p1 p2 < p5 >
Oct 20 11:37:23 pve kernel: [ 1848.590082] VFIO - User Level meta-driver version: 0.3
Oct 20 11:37:23 pve kernel: [ 1848.668739] vfio-pci 0000:0c:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
Oct 20 11:37:23 pve kernel: [ 1848.852684] xhci_hcd 0000:0e:00.3: remove, state 4
Oct 20 11:37:23 pve kernel: [ 1848.852701] usb usb6: USB disconnect, device number 1
Oct 20 11:37:23 pve kernel: [ 1848.852706] usb 6-2: USB disconnect, device number 2
Oct 20 11:37:23 pve kernel: [ 1848.852710] usb 6-2.4: USB disconnect, device number 3
Oct 20 11:37:23 pve kernel: [ 1848.852714] usb 6-2.4.4: USB disconnect, device number 4
Oct 20 11:37:23 pve kernel: [ 1848.853221] xhci_hcd 0000:0e:00.3: USB bus 6 deregistered
Oct 20 11:37:23 pve kernel: [ 1848.853230] xhci_hcd 0000:0e:00.3: remove, state 1
Oct 20 11:37:23 pve kernel: [ 1848.853235] usb usb5: USB disconnect, device number 1
Oct 20 11:37:23 pve kernel: [ 1848.853240] usb 5-2: USB disconnect, device number 2
Oct 20 11:37:23 pve kernel: [ 1848.853243] usb 5-2.4: USB disconnect, device number 3
Oct 20 11:37:23 pve kernel: [ 1848.853247] usb 5-2.4.1: USB disconnect, device number 4
Oct 20 11:37:23 pve kernel: [ 1848.853455] usb 5-2.4.3: USB disconnect, device number 5
Oct 20 11:37:24 pve kernel: [ 1849.060179] usb 5-2.4.4: USB disconnect, device number 6
Oct 20 11:37:24 pve kernel: [ 1849.060193] usb 5-2.4.4.1: USB disconnect, device number 7
Oct 20 11:37:24 pve kernel: [ 1849.233052] xhci_hcd 0000:0e:00.3: USB bus 5 deregistered
Oct 20 11:37:24 pve kernel: [ 1849.251522] vfio-pci 0000:0e:00.3: refused to change power state from D0 to D3hot
Oct 20 11:37:24 pve kernel: [ 1849.605411] device tap103i0 entered promiscuous mode
Oct 20 11:37:24 pve kernel: [ 1849.624971] fwbr103i0: port 1(fwln103i0) entered blocking state
Oct 20 11:37:24 pve kernel: [ 1849.624979] fwbr103i0: port 1(fwln103i0) entered disabled state
Oct 20 11:37:24 pve kernel: [ 1849.625012] device fwln103i0 entered promiscuous mode
Oct 20 11:37:24 pve kernel: [ 1849.625035] fwbr103i0: port 1(fwln103i0) entered blocking state
Oct 20 11:37:24 pve kernel: [ 1849.625064] fwbr103i0: port 1(fwln103i0) entered forwarding state
Oct 20 11:37:24 pve kernel: [ 1849.626852] vmbr0: port 2(fwpr103p0) entered blocking state
Oct 20 11:37:24 pve kernel: [ 1849.626859] vmbr0: port 2(fwpr103p0) entered disabled state
Oct 20 11:37:24 pve kernel: [ 1849.626887] device fwpr103p0 entered promiscuous mode
Oct 20 11:37:24 pve kernel: [ 1849.626909] vmbr0: port 2(fwpr103p0) entered blocking state
Oct 20 11:37:24 pve kernel: [ 1849.626936] vmbr0: port 2(fwpr103p0) entered forwarding state
Oct 20 11:37:24 pve kernel: [ 1849.628698] fwbr103i0: port 2(tap103i0) entered blocking state
Oct 20 11:37:24 pve kernel: [ 1849.628710] fwbr103i0: port 2(tap103i0) entered disabled state
Oct 20 11:37:24 pve kernel: [ 1849.628767] fwbr103i0: port 2(tap103i0) entered blocking state
Oct 20 11:37:24 pve kernel: [ 1849.628796] fwbr103i0: port 2(tap103i0) entered forwarding state
Oct 20 11:37:25 pve kernel: [ 1850.329481] vfio-pci 0000:0c:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
Oct 20 11:37:25 pve kernel: [ 1850.332258] vfio-pci 0000:0c:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0x0000
Oct 20 11:37:25 pve kernel: [ 1850.352964] vfio-pci 0000:0e:00.3: enabling device (0000 -> 0002)
Oct 20 11:38:07 pve kernel: [    0.000000] Linux version 5.11.22-5-pve (build@proxmox) (gcc (Debian 8.3.0-6) 8.3.0, GNU ld (GNU Binutils for Debian) 2.31.1) #1 SMP PVE 5.11.22-10~bpo10+1 (Tue, 28 Sep 2021 10:30:51 +0200) ()
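
In the log above, the kernel uptime counter jumps from 1850.35 straight back to 0.000000 with no shutdown messages in between, and the last entries before the jump are the vfio-pci lines for 0000:0c:00.0 and 0000:0e:00.3. To look for the same pattern across a longer log, a small script can help. The following is only a sketch: the function name `find_reboots` and the regular expression are my own assumptions, and it relies on every kernel line having the `host kernel: [uptime]` layout shown above.

```python
import re

# Matches syslog-style kernel lines such as
# "Oct 20 11:38:07 pve kernel: [    0.000000] Linux version ..."
KERNEL_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+\S+\s+kernel:\s+"
    r"\[\s*(?P<uptime>\d+\.\d+)\]"
)


def find_reboots(lines, context=3):
    """Return a list of (boot_stamp, last_kernel_lines_before_boot).

    A reboot is assumed whenever the kernel uptime counter goes
    backwards from one kernel line to the next.
    """
    reboots = []
    history = []          # kernel lines seen since the previous boot
    prev_uptime = None
    for line in lines:
        match = KERNEL_LINE.match(line)
        if not match:
            continue      # skip non-kernel lines
        uptime = float(match.group("uptime"))
        if prev_uptime is not None and uptime < prev_uptime:
            reboots.append((match.group("stamp"), history[-context:]))
            history = []
        prev_uptime = uptime
        history.append(line.rstrip("\n"))
    return reboots
```

Calling `find_reboots(open(path))` on the log file would list each boot together with the last few kernel messages logged before it, which is where a device involved in the crash would most likely show up.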

And this is the output when running the backup:
Code:
INFO: starting new backup job: vzdump 103 --mode stop --dumpdir /data2
INFO: Starting Backup of VM 103 (qemu)
INFO: Backup started at 2021-10-20 11:37:23
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: VM Name: Kids
INFO: include disk 'scsi0' 'data2:vm-103-disk-0' 100G
INFO: include disk 'efidisk0' 'data2:vm-103-disk-1' 1M
INFO: creating vzdump archive '/data2/vzdump-qemu-103-2021_10_20-11_37_23.vma'
INFO: starting kvm to execute backup task
INFO: started backup task '092b1630-d52a-448c-af85-8bf967947843'
INFO:   2% (2.4 GiB of 100.0 GiB) in 3s, read: 817.0 MiB/s, write: 764.7 MiB/s
INFO:   4% (4.6 GiB of 100.0 GiB) in 6s, read: 765.6 MiB/s, write: 715.8 MiB/s
 
It may show more, such as hardware-related messages.

Keep a couple of terminals open, each running something like the following:
Code:
journalctl -ef -p 7

Code:
watch -n 10 dmesg -T

Perhaps you'll catch something just before the host goes down.
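
If the host resets hard, whatever was still sitting in a buffer may never reach the disk, so it can help to force each log line out as it arrives. Below is a minimal sketch of that idea. The function name `copy_with_fsync` is hypothetical, and syncing after every line is slow, so this is meant only for a short debugging session.

```python
import os


def copy_with_fsync(src, path):
    """Append every line from src to path, forcing each one to disk.

    Returns the number of lines written.
    """
    count = 0
    with open(path, "a", encoding="utf-8") as out:
        for line in src:
            out.write(line)
            out.flush()             # push Python's buffer to the OS
            os.fsync(out.fileno())  # ask the OS to write it to the disk
            count += 1
    return count
```

One way to use it would be to save it in a script that ends with `copy_with_fsync(sys.stdin, sys.argv[1])` (after `import sys`) and pipe `journalctl -f -p 7` into that script, giving a file on a disk that survives the reboot as the argument.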