[SOLVED] FreeBSD VMs' deinit issues under latest Proxmox

Hello,

Since upgrading from Proxmox VE 6.4 to 7.1 we have experienced many issues with FreeBSD VMs. Please keep in mind that we had no issues on 6.4.

For example, on our dovecot server (FreeBSD 13, latest patch release), during busier times dovecot processes start getting stuck in deinit, and the load average jumps from below 1 to over 200. Nothing is actually using any CPU, so this definitely looks like a disk I/O issue. Dovecot processes just keep starting and never exit, and we end up with thousands of processes stuck in deinit. The only solution is to reboot the VM.
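
For anyone hitting the same thing, this is roughly how I confirm the pile-up from inside the guest (the grep pattern matches the state dovecot reports in its process titles, and the PID is just a placeholder):

Code:
# count dovecot workers whose title reports deinit
ps axww | grep -c '[d]einit'
# inspect what one of them is blocked on (wait channel); 12345 is a placeholder PID
procstat -t 12345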

There are no errors logged anywhere, neither on the Proxmox host nor in the FreeBSD VM.

The Proxmox summary page shows excessively high CPU usage and disk I/O, while network traffic is low:

[Attachments: cpu.png, disk.png, network.png]

This happens with both raw and qcow2 VM disks. I have tried switching from the default io_uring to native and threads, as well as combinations of no cache and writeback, using VirtIO SCSI single.
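
For anyone wanting to try the same combinations, this is roughly how I switched them with qm set (VM 188 and the volume name match the config further down; aio/cache changes only take effect after a full stop and start of the VM):

Code:
# native AIO with no cache on the big data disk
qm set 188 --scsi1 storage:vm-188-disk-0,aio=native,cache=none,discard=on,format=raw,size=2T,ssd=1
# threads + writeback (the combination currently in the config)
qm set 188 --scsi1 storage:vm-188-disk-0,aio=threads,cache=writeback,discard=on,format=raw,size=2T,ssd=1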

On the FreeBSD VM, I have tried different timecounters, from HPET and TSC-low to kvmclock.
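
Switching the timecounter is done inside the guest via sysctl; a quick sketch (which choices are actually available depends on the VM's hardware config):

Code:
# list the timecounters the guest detected
sysctl kern.timecounter.choice
# switch at runtime, e.g. to TSC-low
sysctl kern.timecounter.hardware=TSC-low
# persist across reboots
echo 'kern.timecounter.hardware=TSC-low' >> /etc/sysctl.conf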

I've also disabled memory ballooning, just in case.

I have also tried different CPU types, from host (the actual host processor) to kvm64.
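
Both of those are plain qm set changes, for reference (again VM 188; CPU type changes need a cold restart to apply):

Code:
# disable ballooning
qm set 188 --balloon 0
# try the generic CPU model instead of passing the host CPU through
qm set 188 --cpu kvm64
# and back to host with the AES flag, as in the current config
qm set 188 --cpu host,flags=+aes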

This happens randomly, usually during busier times. Sometimes it happens within a few hours, sometimes it takes days.

I have also tried pve-kernel-5.13.19-1-pve and pve-kernel-5.15.7-1-pve.
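
In case it helps anyone reproduce this: the kernels the bootloader knows about can be listed with proxmox-boot-tool, and a specific version installed via apt; which one actually boots is then picked from the boot menu on the next reboot (a sketch; newer pve-kernel-helper versions may also offer a pin subcommand, but I'm not relying on that here):

Code:
# see which kernels the bootloader knows about
proxmox-boot-tool kernel list
# make sure a specific kernel package is installed
apt install pve-kernel-5.15.7-1-pve
# then select it in the GRUB/systemd-boot menu at boot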

I believe this has something to do with the I/O error issue that was occurring on Linux VMs; pve-qemu-kvm_6.1.0-3 doesn't fix it on FreeBSD VMs.

vm config:

Code:
qm config 188
agent: 1
balloon: 0
boot: cdn
bootdisk: scsi0
cores: 24
cpu: host,flags=+aes
machine: q35
memory: 49152
name: garibaldi
net0: virtio=0A:06:9A:F4:7A:01,bridge=vmbr0,firewall=1,queues=8
numa: 0
onboot: 1
ostype: l26
scsi0: local-zfs:vm-188-disk-0,aio=threads,cache=writeback,discard=on,format=raw,size=256G,ssd=1
scsi1: storage:vm-188-disk-0,aio=threads,backup=0,cache=writeback,discard=on,format=raw,size=2T,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=4479ea4e-6825-42fd-bee4-1a194dacf635
sockets: 1
vmgenid: 9ead784d-701c-4817-8614-9cc019ebe2f6

pveversion:

Code:
pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.13.19-2-pve)
pve-manager: 7.1-8 (running version: 7.1-8/5b267f33)
pve-kernel-5.15: 7.1-7
pve-kernel-helper: 7.1-6
pve-kernel-5.13: 7.1-5
pve-kernel-5.4: 6.4-11
pve-kernel-5.15.7-1-pve: 5.15.7-1
pve-kernel-5.15.5-1-pve: 5.15.5-1
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.4.157-1-pve: 5.4.157-1
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-14
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.0-4
libpve-storage-perl: 7.0-15
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.3.0-1
proxmox-backup-client: 2.1.2-1
proxmox-backup-file-restore: 2.1.2-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-4
pve-cluster: 7.1-3
pve-container: 4.1-3
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-4
pve-ha-manager: 3.3-1
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.0-3
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-4
smartmontools: 7.2-pve2
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3


Also notice that the disk-written counter is at almost 3.4 TB while incoming network traffic is only 6.7 GB, so it should be impossible for the guest to have legitimately written that much data:

Code:
qm status 188 --verbose
blockstat:
        scsi0:
                account_failed: 1
                account_invalid: 1
                failed_flush_operations: 0
                failed_rd_operations: 0
                failed_unmap_operations: 0
                failed_wr_operations: 0
                flush_operations: 1
                flush_total_time_ns: 269077
                idle_time_ns: 1478509445
                invalid_flush_operations: 0
                invalid_rd_operations: 0
                invalid_unmap_operations: 0
                invalid_wr_operations: 0
                rd_bytes: 12456033280
                rd_merged: 0
                rd_operations: 422602
                rd_total_time_ns: 145656193055
                timed_stats:
                unmap_bytes: 0
                unmap_merged: 0
                unmap_operations: 0
                unmap_total_time_ns: 0
                wr_bytes: 80045617152
                wr_highest_offset: 268511662080
                wr_merged: 0
                wr_operations: 1967992
                wr_total_time_ns: 226722352806
        scsi1:
                account_failed: 1
                account_invalid: 1
                failed_flush_operations: 0
                failed_rd_operations: 0
                failed_unmap_operations: 0
                failed_wr_operations: 0
                flush_operations: 1
                flush_total_time_ns: 382650
                idle_time_ns: 909307514
                invalid_flush_operations: 0
                invalid_rd_operations: 0
                invalid_unmap_operations: 0
                invalid_wr_operations: 0
                rd_bytes: 58119712768
                rd_merged: 0
                rd_operations: 1252234
                rd_total_time_ns: 383142803225
                timed_stats:
                unmap_bytes: 0
                unmap_merged: 0
                unmap_operations: 0
                unmap_total_time_ns: 0
                wr_bytes: 3310958258176
                wr_highest_offset: 2188121796608
                wr_merged: 0
                wr_operations: 100482026
                wr_total_time_ns: 11465880717734
cpus: 24
disk: 0
diskread: 70575746048
diskwrite: 3391003875328
maxdisk: 274877906944
maxmem: 51539607552
mem: 45357627372
name: garibaldi
netin: 6770949174
netout: 42921508171
nics:
        tap188i0:
                netin: 6770949174
                netout: 42921508171
pid: 3885990
proxmox-support:
        pbs-dirty-bitmap: 1
        pbs-dirty-bitmap-migration: 1
        pbs-dirty-bitmap-savevm: 1
        pbs-library-version: 1.2.0 (6e555bc73a7dcfb4d0b47355b958afd101ad27b5)
        pbs-masterkey: 1
        query-bitmap-info: 1
qmpstatus: running
running-machine: pc-q35-6.1+pve0
running-qemu: 6.1.0
status: running
uptime: 94978
vmid: 188
 
I have seen similar situations on my mail servers (exim + dovecot). Load gets high and queues get stuck. People can no longer fetch mail (not only does no new mail come in, connections to dovecot simply time out).
But the same issue occurs with other VMs (my poudriere build box, for instance).
I also tried different cache and async I/O settings.

In 99% of cases, manually issuing a sync from inside the VM 'fixed' things, and processes all of a sudden resumed processing their workloads.

It is as if the OS never gets around to actually flushing data to disk, filling up queues and buffers, which causes userland processes to wait in vain until a timeout occurs?
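
That theory is at least checkable from inside the guest: FreeBSD exposes the dirty buffer count via sysctl, so a crude watchdog can log when buffers pile up and force a sync. A minimal sketch, where the threshold of 1000 is arbitrary and just for illustration:

Code:
#!/bin/sh
# watch for dirty buffers piling up and force a flush (threshold is arbitrary)
while :; do
    dirty=$(sysctl -n vfs.numdirtybuffers)
    if [ "$dirty" -gt 1000 ]; then
        logger "numdirtybuffers=${dirty}, forcing sync"
        sync
    fi
    sleep 30
done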

Since I have not seen these issues with FreeBSD 12 based VMs yet (though the few I still have are not busy, so it may just be coincidence), my working assumption has been that FreeBSD 13 introduced some change or optimization that somehow triggers this.
But from what you describe, it was not an issue with older qemu/kvm?
 