Having a strange issue. Environment is a 5-node cluster with Ceph underneath, all SSD. About 73 VMs running, with plenty of RAM and CPU available.
A guest, which is a CentOS 7.9 Linux box (kernel 3.10), occasionally locks up around the time the backup runs. We back up nightly to a remote PBS. Guest tools are installed on the guest and communicating with the host. It doesn't lock up every night, maybe 1 or 2 times a week. When I say lock up, the login prompt on the console does let you type a username, but it never returns a password prompt. The console is spammed as shown here. From the research I have done, it appears to be some sort of disk subsystem inaccessibility issue. However, many other VMs run on this cluster and on that node with no issues and uptimes of hundreds of days. I will say that we don't run a lot of CentOS (mostly Debian and Windows). I haven't power-cycled it yet, so if there are any commands I can run on the hypervisor to help troubleshoot, let me know and I will run them. Any help or pointers appreciated!

Code:
root@pvea2:~# qm status 142 --verbose
balloon: 17179869184
ballooninfo:
	actual: 17179869184
	free_mem: 3602145280
	last_update: 1704892454
	major_page_faults: 1568
	max_mem: 17179869184
	mem_swapped_in: 0
	mem_swapped_out: 0
	minor_page_faults: 722860004
	total_mem: 16655044608
blockstat:
	scsi0:
		account_failed: 1
		account_invalid: 1
		failed_flush_operations: 0
		failed_rd_operations: 0
		failed_unmap_operations: 0
		failed_wr_operations: 0
		failed_zone_append_operations: 0
		flush_operations: 1045232
		flush_total_time_ns: 1325208395575
		idle_time_ns: 31477011384646
		invalid_flush_operations: 0
		invalid_rd_operations: 0
		invalid_unmap_operations: 0
		invalid_wr_operations: 0
		invalid_zone_append_operations: 0
		rd_bytes: 1180863488
		rd_merged: 0
		rd_operations: 63496
		rd_total_time_ns: 84561163209
		timed_stats:
		unmap_bytes: 0
		unmap_merged: 0
		unmap_operations: 0
		unmap_total_time_ns: 0
		wr_bytes: 178115051520
		wr_highest_offset: 322119630848
		wr_merged: 0
		wr_operations: 10838567
		wr_total_time_ns: 2099085771014
		zone_append_bytes: 0
		zone_append_merged: 0
		zone_append_operations: 0
		zone_append_total_time_ns: 0
cpus: 12
disk: 0
diskread: 1180863488
diskwrite: 178115051520
freemem: 3602145280
maxdisk: 0
maxmem: 17179869184
mem: 13052899328
name: PNET-voipmonitor.voice.planet.net
netin: 759955683
netout: 397782855
nics:
	tap142i0:
		netin: 274156480
		netout: 249630649
	tap142i1:
		netin: 485799203
		netout: 148152206
pid: 4011717
proxmox-support:
	backup-max-workers: 1
	pbs-dirty-bitmap: 1
	pbs-dirty-bitmap-migration: 1
	pbs-dirty-bitmap-savevm: 1
	pbs-library-version: 1.4.1 (UNKNOWN)
	pbs-masterkey: 1
	pbs-masterkey: 1
	query-bitmap-info: 1
qmpstatus: running
running-machine: pc-i440fx-8.1+pve0
running-qemu: 8.1.2
status: running
uptime: 512854
vmid: 142
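
In case it helps anyone reading the counters, here is a rough Python sketch that turns the scsi0 numbers above into average per-operation latencies. The values are copied from the output; the helper name `avg_latency_ms` is just something I made up for illustration. These are averages over the whole QEMU process lifetime, so a short stall during the backup window would probably be hidden in them.

```python
# Counters copied from the scsi0 blockstat section of the qm status output above.
blockstat = {
    "rd_operations": 63496,
    "rd_total_time_ns": 84561163209,
    "wr_operations": 10838567,
    "wr_total_time_ns": 2099085771014,
    "flush_operations": 1045232,
    "flush_total_time_ns": 1325208395575,
}


def avg_latency_ms(stats, prefix):
    """Average time per operation in milliseconds (hypothetical helper).

    Divides <prefix>_total_time_ns by <prefix>_operations; returns 0.0
    when no operations of that kind were recorded.
    """
    ops = stats[f"{prefix}_operations"]
    if ops == 0:
        return 0.0
    return stats[f"{prefix}_total_time_ns"] / ops / 1e6


for prefix in ("rd", "wr", "flush"):
    print(f"{prefix}: {avg_latency_ms(blockstat, prefix):.3f} ms average per operation")
```

Running the same calculation on two snapshots taken before and after a backup, and diffing the counters, would give the latency for just that window instead of the lifetime average.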
