Virtual machines freeze with no console/VNC output.

Lamarus

Well-Known Member
Sep 18, 2017
Hi! I have a cluster running Proxmox VE 7.2-7 and Ceph 16.2.9.

About a month ago, some virtual machines in my cluster with the CIS hardening role applied (https://github.com/ansible-lockdown/UBUNTU20-CIS) started to hang for no apparent reason. Only resetting the machine helps. There is nothing in the guests' logs that would suggest a cause. In the Ceph logs, there are errors about osd.54 like:

...
** File Read Latency Histogram By Level [default] **
2023-06-11T02:00:57.664+0600 7f69beb2e700 0 bluestore(/var/lib/ceph/osd/ceph-54) log_latency slow operation observed for kv_final, latency = 5.295015335s
2023-06-11T02:00:57.668+0600 7f69beb2e700 0 bluestore(/var/lib/ceph/osd/ceph-54) log_latency_fn slow operation observed for _txc_committed_kv, latency = 5.264364243s, txc = 0x55d756e7c000
2023-06-11T02:00:57.668+0600 7f69beb2e700 0 bluestore(/var/lib/ceph/osd/ceph-54) log_latency_fn slow operation observed for _txc_committed_kv, latency = 5.263929367s, txc = 0x55d6f2c1fc00
2023-06-11T02:00:57.668+0600 7f69beb2e700 0 bluestore(/var/lib/ceph/osd/ceph-54) log_latency_fn slow operation observed for _txc_committed_kv, latency = 5.148843288s, txc = 0x55d773da0000
2023-06-11T02:01:12.929+0600 7f69c8dda700 0 bad crc in data 4237898677 != exp 706411430 from v1:192.168.160.5:0/171261133
2023-06-11T02:01:20.709+0600 7f69ae31c700 0 bluestore(/var/lib/ceph/osd/ceph-54) log_latency slow operation observed for submit_transact, latency = 5.904376507s
2023-06-11T02:01:20.709+0600 7f69beb2e700 0 bluestore(/var/lib/ceph/osd/ceph-54) log_latency slow operation observed for kv_final, latency = 5.099318504s
2023-06-11T02:01:20.709+0600 7f69beb2e700 0 bluestore(/var/lib/ceph/osd/ceph-54) log_latency_fn slow operation observed for _txc_committed_kv, latency = 5.101370811s, txc = 0x55d7d2caa380
2023-06-11T02:01:20.709+0600 7f69beb2e700 0 bluestore(/var/lib/ceph/osd/ceph-54) log_latency_fn slow operation observed for _txc_committed_kv, latency = 5.905891418s, txc = 0x55d761e26000
...

many lines like:
2023-06-11T02:01:52.926+0600 7f69c8dda700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f69af31e700' had timed out after 15.000000954s

and after that:
2023-06-11T02:02:23.654+0600 7f69af31e700 0 bluestore(/var/lib/ceph/osd/ceph-54) log_latency slow operation observed for submit_transact, latency = 45.771850586s
2023-06-11T02:02:23.654+0600 7f69af31e700 1 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f69af31e700' had timed out after 15.000000954s
2023-06-11T02:02:23.654+0600 7f69beb2e700 0 bluestore(/var/lib/ceph/osd/ceph-54) log_latency_fn slow operation observed for _txc_committed_kv, latency = 45.773029327s, txc = 0x55d7aa9bdc00
2023-06-11T02:02:34.362+0600 7f69b9b24700 4 rocksdb: [db_impl/db_impl_write.cc:1665] [default] New memtable created with log file: #318321. Immutable memtables: 0.
2023-06-11T02:02:34.366+0600 7f69c0340700 4 rocksdb: (Original Log Time 2023/06/11-02:02:34.367705) [db_impl/db_impl_compaction_flush.cc:2611] Compaction nothing to do
2023-06-11T02:02:34.366+0600 7f69bfb3f700 4 rocksdb: (Original Log Time 2023/06/11-02:02:34.367851) [db_impl/db_impl_compaction_flush.cc:2190] Calling FlushMemTableToOutputFile with column family [default], flush slots available 1, compaction slots available 1, flush slots scheduled 1, compaction slots scheduled 0
2023-06-11T02:02:34.366+0600 7f69bfb3f700 4 rocksdb: [flush_job.cc:318] [default] [JOB 17566] Flushing memtable with next log file: 318321


but the OSD does not drop out of the Ceph cluster and continues to work afterwards. Please help me figure out what the cause may be.
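To see whether the slow-operation messages line up in time with the VM hangs, the OSD log can be bucketed by hour. A minimal sketch, assuming the default Ceph log path for osd.54 on its host node (the `ceph daemon`/`ceph health` calls obviously need a running cluster):

```shell
# Bucket the slow-operation messages by hour (the ISO timestamp's first
# 13 characters are date + hour, e.g. "2023-06-11T02"):
grep 'slow operation observed' /var/log/ceph/ceph-osd.54.log \
  | cut -c1-13 | sort | uniq -c

# Detail on the slowest recent ops, via the OSD admin socket:
ceph daemon osd.54 dump_historic_slow_ops
# Cluster-wide view of any currently reported slow ops:
ceph health detail
```

If the hourly buckets cluster around the same times the VMs froze, that points at the OSD (or its disk) rather than at QEMU.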

UPDATE 1: I changed the default SCSI controller to VirtIO and the VMs have stopped hanging hard, for now.
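For reference, the controller type can also be switched from the CLI; a sketch using VM 136 from this thread as an example (the change only takes effect after a full VM stop/start, not a reboot from inside the guest):

```shell
# Switch the emulated SCSI controller to VirtIO SCSI
qm set 136 --scsihw virtio-scsi-pci
# Verify the setting was written to the VM config
qm config 136 | grep scsihw
```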

 
Hi,
do the messages in the Ceph logs correspond with the time the hangs happened? When you get a hanging VM, please follow the steps described here: https://forum.proxmox.com/threads/vms-freeze-with-100-cpu.127459/post-561792
Maybe it's the same issue, but even if not, it might give a hint why the VM is stuck.

What does the load on the node hosting OSD 54 look like?

Maybe upgrading to the latest version of Proxmox VE 7.4 helps?
 
Hi! And many thanks for your reply. I will check everything and come back with the info.
 
Yesterday one of the VMs hung up. There were reports of disk problems in the syslog shortly before, but no disk dropped out of the cluster:
Jun 13 20:36:33 petr-stor4 smartd[1489]: Device: /dev/bus/0 [megaraid_disk_18] [SAT], 40 Currently unreadable (pending) sectors
Jun 13 20:36:33 petr-stor4 smartd[1489]: Device: /dev/bus/0 [megaraid_disk_18] [SAT], 40 Offline uncorrectable sectors

Debug info:

Code:
strace -c -p $(cat /var/run/qemu-server/136.pid)
strace: Process 27329 attached
^Cstrace: Process 27329 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 24.89    0.028130          16      1661           ppoll
 22.44    0.025361           7      3335           clock_gettime
 21.11    0.023857           3      6344           write
 10.22    0.011548         384        30           sendmsg
  9.11    0.010302           3      3322           gettimeofday
  7.81    0.008832           5      1553           recvmsg
  4.40    0.004975           3      1631           read
  0.01    0.000011           1         6           close
  0.01    0.000006           1         6           getsockname
  0.01    0.000006           1         6           accept4
  0.00    0.000004           0        12           fcntl
------ ----------- ----------- --------- --------- ----------------
100.00    0.113032           6     17906           total



gdb --batch --ex 't a a bt' -p $(cat /var/run/qemu-server/136.pid)
[New LWP 27330]
[New LWP 27495]
[New LWP 27496]
[New LWP 27498]
[New LWP 27628]
[New LWP 27684]
[New LWP 27685]
[New LWP 27741]
[New LWP 27747]
[New LWP 27748]
[New LWP 27807]
[New LWP 28399]
[New LWP 34468]
[New LWP 34933]
[New LWP 47079]
[New LWP 231416]

warning: Could not load vsyscall page because no executable was specified
0x00007f9b4beab4f6 in ?? ()

Thread 17 (LWP 231416 "iou-wrk-27495"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 16 (LWP 47079 "iou-wrk-27329"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 15 (LWP 34933 "iou-wrk-27329"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 14 (LWP 34468 "iou-wrk-27495"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 13 (LWP 28399 "iou-wrk-27496"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 12 (LWP 27807 "iou-wrk-27329"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 11 (LWP 27748 "iou-wrk-27329"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 10 (LWP 27747 "iou-wrk-27495"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 9 (LWP 27741 "iou-wrk-27496"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 8 (LWP 27685 "iou-wrk-27496"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 7 (LWP 27684 "iou-wrk-27496"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 6 (LWP 27628 "iou-wrk-27495"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 5 (LWP 27498 "kvm"):
#0  0x00007f9b4bf8e7b2 in ?? ()
#1  0x00000000000009b8 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 4 (LWP 27496 "kvm"):
#0  0x00007f9b4beaccc7 in ?? ()
#1  0x00005556571c6f87 in ?? ()
#2  0x0000000000000400 in ?? ()
#3  0x0000000000000000 in ?? ()

Thread 3 (LWP 27495 "kvm"):
#0  0x00007f9b4beaccc7 in ?? ()
#1  0x00005556571c6f87 in ?? ()
#2  0x0000000000000014 in ?? ()
#3  0x0000000000000000 in ?? ()

Thread 2 (LWP 27330 "kvm"):
#0  0x00007f9b4beb09b9 in ?? ()
#1  0x000055565733a9fa in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 1 (LWP 27329 "kvm"):
#0  0x00007f9b4beab4f6 in ?? ()
#1  0xffffffff6488bebd in ?? ()
#2  0x0000555658de67f0 in ?? ()
#3  0x0000000000000008 in ?? ()
#4  0x0000000000000000 in ?? ()
[Inferior 1 (process 27329) detached]


qm config 136
agent: 0
boot: order=scsi0;ide2;net0
cores: 2
cpu: host
description: Main System%3A MongoDB%3A 192.168.151.58/24 Template%3A ubuntu22-cis-level-1-sysadmin
ide2: none,media=cdrom
memory: 4096
meta: creation-qemu=6.2.0,ctime=1660818675
name: dev-mongo
net0: virtio=42:39:14:73:A0:CB,bridge=vmbr1
net1: virtio=A2:4B:E1:87:35:F4,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: hdd_storage:vm-136-disk-0,aio=native,size=50G
scsi1: hdd_storage:vm-136-disk-1,aio=native,size=100G
smbios1: uuid=4287ffa7-54f8-4d91-81ed-56a5be544982
sockets: 1
vmgenid: e3598efd-b6fd-453c-be8e-33691b959d60


pveversion -v
proxmox-ve: 7.2-1 (running kernel: 5.15.35-2-pve)
pve-manager: 7.2-7 (running version: 7.2-7/d0dd0e85)
pve-kernel-5.15: 7.2-4
pve-kernel-helper: 7.2-4
pve-kernel-5.4: 6.4-16
pve-kernel-5.15.35-2-pve: 5.15.35-5
pve-kernel-5.4.178-1-pve: 5.4.178-1
pve-kernel-5.4.174-2-pve: 5.4.174-2
pve-kernel-4.4.134-1-pve: 4.4.134-112
pve-kernel-4.4.98-5-pve: 4.4.98-105
pve-kernel-4.4.95-1-pve: 4.4.95-99
ceph: 16.2.9-pve1
ceph-fuse: 16.2.9-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: 0.8.36+pve1
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-3
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-2
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-3
libpve-storage-perl: 7.2-5
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.12-1
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.3-1
proxmox-backup-file-restore: 2.2.3-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-1
pve-container: 4.2-1
pve-docs: 7.2-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.4-2
pve-ha-manager: 3.3-4
pve-i18n: 2.7-2
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.4-pve1
 
Yesterday one of the VMs hung up. There were reports of disk problems in the syslog shortly before, but no disk dropped out of the cluster:
Jun 13 20:36:33 petr-stor4 smartd[1489]: Device: /dev/bus/0 [megaraid_disk_18] [SAT], 40 Currently unreadable (pending) sectors
Jun 13 20:36:33 petr-stor4 smartd[1489]: Device: /dev/bus/0 [megaraid_disk_18] [SAT], 40 Offline uncorrectable sectors
What is the output of smartctl -a /dev/bus/0? AFAIU, if there are no other messages about IO errors, this might not be critical, but it is definitely a hint to keep a closer eye on the drive.
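A sketch of how such a drive can be queried: disks behind a MegaRAID controller are addressed with smartctl's `-d megaraid,N` option, where N here is assumed from the `megaraid_disk_18` identifier in the smartd log line above.

```shell
# Full SMART report for the disk behind the MegaRAID controller:
smartctl -a -d megaraid,18 /dev/bus/0

# The attributes most relevant to the smartd warnings above:
smartctl -a -d megaraid,18 /dev/bus/0 \
  | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'
```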

Debug info:

Code:
strace -c -p $(cat /var/run/qemu-server/136.pid)
[... strace summary quoted from the post above ...]
Pretty safe to assume that it's not the same issue as in the other thread. Judging from just this, QEMU itself seems to be running as usual. What do the system logs within the guest show?

Code:
gdb --batch --ex 't a a bt' -p $(cat /var/run/qemu-server/136.pid)
[... full backtrace quoted from the post above ...]
Unfortunately, there is no debug information there. Did you install the pve-qemu-kvm-dbg package? Or has the VM been running for a long time (i.e. was it started with an earlier version of pve-qemu-kvm than the currently installed one)?
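A sketch of the usual procedure to get a readable backtrace, assuming the symbols should match the running binary: install the debug package mentioned above, then stop/start the VM so it runs the binary the symbols were built for.

```shell
# Install QEMU debug symbols on the PVE node
apt install pve-qemu-kvm-dbg

# After the next freeze, re-run the backtrace (VM 136 as an example):
gdb --batch --ex 't a a bt' -p $(cat /var/run/qemu-server/136.pid)
```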
 
Proxmox and Ceph have been running on our Supermicro servers for 5 years. Nothing in the hardware or firmware has changed. And here is a trace of another VM:


Code:
strace -c -p $(cat /var/run/qemu-server/140.pid)
strace: Process 28829 attached
^Cstrace: Process 28829 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  0.00    0.000000           0      1623           read
  0.00    0.000000           0      6312           write
  0.00    0.000000           0         6           close
  0.00    0.000000           0        30           sendmsg
  0.00    0.000000           0      1545           recvmsg
  0.00    0.000000           0         6           getsockname
  0.00    0.000000           0        12           fcntl
  0.00    0.000000           0      3320           gettimeofday
  0.00    0.000000           0      3333           clock_gettime
  0.00    0.000000           0      1660           ppoll
  0.00    0.000000           0         6           accept4
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000           0     17853           total


gdb --batch --ex 't a a bt' -p $(cat /var/run/qemu-server/140.pid)
[New LWP 28830]
[New LWP 29005]
[New LWP 29006]
[New LWP 29007]
[New LWP 29008]
[New LWP 29011]
[New LWP 29600]
[New LWP 29956]
[New LWP 29976]
[New LWP 30016]
[New LWP 30017]
[New LWP 30658]
[New LWP 30803]
[New LWP 30805]
[New LWP 30864]
[New LWP 30987]
[New LWP 31052]
[New LWP 37168]
[New LWP 1748628]
[New LWP 3149346]
[New LWP 4106043]
[New LWP 929403]
[New LWP 1240957]
[New LWP 2853908]
[New LWP 3359996]
[New LWP 206050]

warning: Could not load vsyscall page because no executable was specified
0x00007feb627524f6 in ?? ()

Thread 27 (LWP 206050 "iou-wrk-29005"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 26 (LWP 3359996 "iou-wrk-28829"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 25 (LWP 2853908 "iou-wrk-28829"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 24 (LWP 1240957 "iou-wrk-28829"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 23 (LWP 929403 "iou-wrk-28829"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 22 (LWP 4106043 "iou-wrk-29005"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 21 (LWP 3149346 "iou-wrk-29006"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 20 (LWP 1748628 "iou-wrk-29008"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 19 (LWP 37168 "iou-wrk-29007"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 18 (LWP 31052 "iou-wrk-29007"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 17 (LWP 30987 "iou-wrk-29006"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 16 (LWP 30864 "iou-wrk-29005"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 15 (LWP 30805 "iou-wrk-29005"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 14 (LWP 30803 "iou-wrk-29006"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 13 (LWP 30658 "iou-wrk-29008"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 12 (LWP 30017 "iou-wrk-29007"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 11 (LWP 30016 "iou-wrk-29008"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 10 (LWP 29976 "iou-wrk-29008"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 9 (LWP 29956 "iou-wrk-29006"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 8 (LWP 29600 "iou-wrk-29007"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 7 (LWP 29011 "kvm"):
#0  0x00007feb628357b2 in ?? ()
#1  0x0000000000000058 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 6 (LWP 29008 "kvm"):
#0  0x00007feb62753cc7 in ?? ()
#1  0x000055aaa9407f87 in ?? ()
#2  0x0000000000000400 in ?? ()
#3  0x0000000000000000 in ?? ()

Thread 5 (LWP 29007 "kvm"):
#0  0x00007feb62753cc7 in ?? ()
#1  0x000055aaa9407f87 in ?? ()
#2  0x0000000000000014 in ?? ()
#3  0x0000000000000000 in ?? ()

Thread 4 (LWP 29006 "kvm"):
#0  0x00007feb62753cc7 in ?? ()
#1  0x000055aaa9407f87 in ?? ()
#2  0x0000000000000001 in ?? ()
#3  0x0000000000000000 in ?? ()

Thread 3 (LWP 29005 "kvm"):
#0  0x00007feb62753cc7 in ?? ()
#1  0x000055aaa9407f87 in ?? ()
#2  0x0000000000000008 in ?? ()
#3  0x0000000000000000 in ?? ()

Thread 2 (LWP 28830 "kvm"):
#0  0x00007feb627579b9 in ?? ()
#1  0x000055aaa957b9fa in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 1 (LWP 28829 "kvm"):
#0  0x00007feb627524f6 in ?? ()
#1  0xffffffff64896acb in ?? ()
#2  0x000055aaaaa90400 in ?? ()
#3  0x0000000000000009 in ?? ()
#4  0x0000000000000000 in ?? ()
[Inferior 1 (process 28829) detached]


qm config 140
agent: 0
balloon: 0
boot: order=scsi0;ide2;net0
cores: 2
description: Main System%3A  CockroachBD%0AIP%3A           192.168.151.20/24%0ATemplate%3A     ubuntu22-cis-level-1
ide2: none,media=cdrom
memory: 4096
meta: creation-qemu=6.2.0,ctime=1660818675
name: dev-cockroach-1
net0: virtio=62:E5:12:C3:A5:3A,bridge=vmbr1
net1: virtio=EE:1D:3C:C0:59:42,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: hdd_storage:vm-140-disk-0,aio=native,iothread=1,size=50G
scsi1: hdd_storage:vm-140-disk-1,aio=native,iothread=1,size=300G
smbios1: uuid=cf37ccc9-5bcf-48db-8f5f-8f7369ec5e71
sockets: 2
vmgenid: b3051043-c96b-47a0-8a21-1c97dbfe2e95
 
After many config tests, this config turned out to be stable on our installation. I don't know why.

View attachment 54170
Doesn't seem too different from the other configurations you posted. The CPU count is just 1 instead of 2 and the storage is ceph_storage instead of hdd_storage. Is that a new storage?

I also noticed there was no scsihw setting in the other two configs. For backwards compatibility, a rather old emulated LSI controller is used in that case, which might also not have been ideal.

One of the other configurations had iothread on the disks (but missing the required VirtIO SCSI single setting?) and one of them had CPU type host.
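A sketch of the combination hinted at above: `iothread=1` on a disk only takes full effect together with the VirtIO SCSI single controller, which gives each disk its own controller and IO thread (VM 140 from this thread is used as the example):

```shell
# Set the controller type that iothread=1 is meant to be paired with;
# the VM needs a full stop/start afterwards.
qm set 140 --scsihw virtio-scsi-single
```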
 
