[SOLVED] HA-managed VM restarted several times with no visible reason

kolesya

Well-Known Member
Howdy. We have a PVE cluster:
pveversion -V
proxmox-ve: 7.1-1 (running kernel: 5.11.22-5-pve)
pve-manager: 7.1-6 (running version: 7.1-6/4e61e21c)
pve-kernel-5.13: 7.1-4
pve-kernel-helper: 7.1-4
pve-kernel-5.11: 7.0-10
pve-kernel-5.4: 6.4-6
pve-kernel-5.13.19-1-pve: 5.13.19-2
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.11.22-5-pve: 5.11.22-10
pve-kernel-5.4.140-1-pve: 5.4.140-1
pve-kernel-5.4.44-1-pve: 5.4.44-1
pve-kernel-4.4.134-1-pve: 4.4.134-112
pve-kernel-4.4.35-1-pve: 4.4.35-77
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: 0.8.36+pve1
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-14
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.0-3
libpve-storage-perl: 7.0-15
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.1.1-1
proxmox-backup-file-restore: 2.1.1-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.4-3
pve-cluster: 7.1-2
pve-container: 4.1-2
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-3
pve-ha-manager: 3.3-1
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.0-2
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-4
smartmontools: 7.2-pve2
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3

& FreeNAS storage connected via NFS.
 
One of the VMs restarted several times.
All I can find in the logs for the last 3 restarts:

1. Nov 22 14:58:15 node1 QEMU[1688058]: kvm: ../softmmu/physmem.c:3193: address_space_unmap: Assertion `mr != NULL' failed.
Nov 22 14:58:15 node1 kernel: [3986137.675473] vmbr2: port 3(tap100i0) entered disabled state
Nov 22 14:58:15 node1 kernel: [3986137.675670] vmbr2: port 3(tap100i0) entered disabled state
Nov 22 14:58:15 node1 systemd[1]: 100.scope: Succeeded.
Nov 22 14:58:15 node1 systemd[1]: 100.scope: Consumed 3w 1d 4h 20min 51.899s CPU time.
Nov 22 14:58:16 node1 qmeventd[2855253]: Starting cleanup for 100
Nov 22 14:58:16 node1 qmeventd[2855253]: Finished cleanup for 100
Nov 22 14:58:20 node1 pve-ha-lrm[2855280]: starting service vm:100

2. Nov 22 15:52:10 node1 QEMU[2855301]: kvm: ../softmmu/physmem.c:3193: address_space_unmap: Assertion `mr != NULL' failed.
Nov 22 15:52:11 node1 kernel: [3989372.993092] vmbr2: port 3(tap100i0) entered disabled state
Nov 22 15:52:11 node1 kernel: [3989372.993244] vmbr2: port 3(tap100i0) entered disabled state
Nov 22 15:52:11 node1 systemd[1]: 100.scope: Succeeded.
Nov 22 15:52:11 node1 systemd[1]: 100.scope: Consumed 31min 7.749s CPU time.
Nov 22 15:52:11 node1 qmeventd[2867897]: Starting cleanup for 100
Nov 22 15:52:11 node1 qmeventd[2867897]: Finished cleanup for 100
Nov 22 15:52:12 node1 pve-ha-lrm[2867903]: starting service vm:100

3. Nov 24 11:18:08 node1 QEMU[3327010]: kvm: ../block/block-backend.c:1189: blk_wait_while_drained: Assertion `blk->in_flight > 0' failed.
Nov 24 11:18:08 node1 kernel: [4145731.659682] vmbr2: port 3(tap100i0) entered disabled state
Nov 24 11:18:08 node1 kernel: [4145731.659909] vmbr2: port 3(tap100i0) entered disabled state
Nov 24 11:18:08 node1 systemd[1]: 100.scope: Succeeded.
Nov 24 11:18:08 node1 systemd[1]: 100.scope: Consumed 7h 37min 5.517s CPU time.
Nov 24 11:18:09 node1 qmeventd[3505459]: Starting cleanup for 100
Nov 24 11:18:09 node1 qmeventd[3505459]: Finished cleanup for 100
Nov 24 11:18:14 node1 pve-ha-lrm[3505503]: starting service vm:100
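
For reference, the lines above can be pulled out of the journal with a filter along these lines (the date range here is just an example):

journalctl --since "2021-11-22" --until "2021-11-25" | grep -E "QEMU\[|tap100i0|qmeventd|pve-ha-lrm"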
 
The VM crashed (the assertion messages indicate the reasons). Could you post the VM and storage configs as well?
 
qm config 100
agent: 1
balloon: 0
boot: cdn
bootdisk: virtio0
cores: 6
hotplug: disk,network,usb,cpu
ide2: none,media=cdrom
memory: 34816
name: <vm-name>
net0: virtio=62:A9:85:05:0E:C4,bridge=vmbr2
numa: 0
ostype: l26
scsihw: virtio-scsi-pci
smbios1: uuid=140b3542-e740-4819-8591-6fc977c00a2d
sockets: 2
virtio0: sas-ssd:100/vm-100-disk-0.qcow2,size=120G
virtio1: sas-ssd:100/vm-100-disk-1.qcow2,backup=0,size=850G
virtio2: nlsas-hdd:100/vm-100-disk-0.qcow2,backup=0,size=200G
virtio3: sas-ssd:100/vm-100-disk-2.qcow2,backup=0,size=500G
virtio4: nlsas-hdd:100/vm-100-disk-1.qcow2,backup=0,size=300G
vmgenid: 5137291d-7e1a-44b1-b636-a5b2a305519c
 
nfs: sas-ssd
export /mnt/sas_ssd/ssd-nfs
path /mnt/pve/sas-ssd
server <ip address>
content images
options vers=3

nfs: nlsas-hdd
export /mnt/nlsas-hdd
path /mnt/pve/nlsas-hdd
server <ip address>
content iso,images
options vers=3
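
The effective mount options can differ from what storage.cfg requests; if it matters, they can be double-checked on the node with something like:

findmnt -t nfs
# or, with per-mount NFS details:
nfsstat -m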
 
I'd suggest switching to virtio-scsi instead of virtio-block, and installing the latest round of updates, including the kernel (there have been some issues with io_uring that might have caused this behaviour).
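
As a rough sketch of what that change could look like for the first disk (scsihw is already virtio-scsi-pci in this config; note the guest will then see /dev/sdX instead of /dev/vdX, so check /etc/fstab and bootloader references first):

before:
boot: cdn
bootdisk: virtio0
virtio0: sas-ssd:100/vm-100-disk-0.qcow2,size=120G

after:
boot: order=scsi0;ide2;net0
scsi0: sas-ssd:100/vm-100-disk-0.qcow2,size=120G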
 
We always update the nodes ASAP, but we're usually unable to reboot them for a long time, so the new kernel doesn't get booted.
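
For what it's worth, the gap shows up when comparing the running kernel with what's installed, e.g.:

uname -r                           # kernel currently booted
dpkg -l 'pve-kernel-*' | grep ^ii  # kernel packages installed on disk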
 
Seems to be solved, perhaps after one of the updates. Will watch it for some more days...
 
