Proxmox VE 8.0 released!

We also managed to confirm that the issue is caused by the backup. We are using PBS, with the datastore on a TrueNAS Core server over NFS.
We triggered two off-schedule backups, which caused a few random VMs to freeze; some databases were damaged (although the VMs were still responding via SSH).
Backing up to NFS (indirectly) might add some latency that can stall the guest a bit, especially if IO-thread is turned off (enabling it is not really useful for increasing bandwidth, but can help a lot to reduce latency and stalls for the VM overall). Anyhow, the odd thing here is that it worked OK for you with Proxmox VE 7. Can you post some details about the affected VMs (config and what runs in them, i.e., OS and roughly the applications that probably cause most of the load)?
 
Thanks for engaging!

Some details on the backup infra:
  • The PBS server is a VM on the PVE cluster
  • The TrueNAS server has 128 GB of RAM (plenty of ARC)
  • The ZFS pool is a striped mirror of HDDs
The example VM will be 4142; its config:
Code:
root@pvelw11:~# cat /etc/pve/qemu-server/4142.conf
agent: 1,fstrim_cloned_disks=1
boot: order=scsi0;net0
cores: 32
cpu: x86-64-v2-AES
memory: 65536
name: prod-lws142-dbcl42
net0: virtio=AA:94:57:24:A1:B7,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: l26
scsi0: ceph:vm-4142-disk-0,discard=on,iothread=1,size=32G,ssd=1
scsi1: ceph:vm-4142-disk-1,discard=on,iothread=1,size=100G,ssd=1
scsi2: ceph:vm-4142-disk-2,discard=on,iothread=1,size=40G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=01079099-6335-4743-b82a-1cc6094961d3
sockets: 1
vmgenid: 7bb5b2a3-ad50-4252-9bc1-30918035e25f

As a general remark, we use virtio-scsi-single and iothreads on all VMs; we removed the iothreads on a few VMs for testing.
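
For reference, a sketch of how the flag can be flipped per disk with qm set (the whole drive string has to be repeated, otherwise the other options are dropped, and I believe the change only takes effect after a full VM stop/start):
Code:
# sketch for the example VM; repeat for scsi1/scsi2 as needed
qm set 4142 --scsi0 ceph:vm-4142-disk-0,discard=on,iothread=0,size=32G,ssd=1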

Here's an extract from the latest backup job on this VM:
Code:
INFO: VM Name: prod-lws142-dbcl42
INFO: include disk 'scsi0' 'ceph:vm-4142-disk-0' 32G
INFO: include disk 'scsi1' 'ceph:vm-4142-disk-1' 100G
INFO: include disk 'scsi2' 'ceph:vm-4142-disk-2' 40G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/4142/2023-11-21T13:59:49Z'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'a11eae08-59d3-42ac-8553-9cfe8c6a248c'
INFO: resuming VM again
INFO: scsi0: dirty-bitmap status: OK (440.0 MiB of 32.0 GiB dirty)
INFO: scsi1: dirty-bitmap status: OK (14.1 GiB of 100.0 GiB dirty)
INFO: scsi2: dirty-bitmap status: OK (548.0 MiB of 40.0 GiB dirty)
INFO: using fast incremental mode (dirty-bitmap), 15.1 GiB dirty of 172.0 GiB total
INFO: 100% (15.1 GiB of 15.1 GiB) in 3s, read: 5.0 GiB/s, write: 5.0 GiB/s
INFO: backup is sparse: 4.00 MiB (0%) total zero data
INFO: backup was done incrementally, reused 156.93 GiB (91%)
INFO: transferred 15.09 GiB in 4 seconds (3.8 GiB/s)
INFO: adding notes to backup
INFO: Finished Backup of VM 4142 (00:02:03)
INFO: Backup finished at 2023-11-21 14:01:52
INFO: Backup job finished successfully

The VM itself is running Ubuntu 20.04.6 LTS (all VMs are running Ubuntu).
We use Docker everywhere; most of the servers with issues are MySQL servers (currently doing replication only).

Code:
       Name                    Command                State    
-----------------------------------------------------------------
mysql-cluster-node   docker-entrypoint.sh mysqld   Up
pmm-client           /entrypoint.py                Up            
portainer_agent      ./agent                       Up            
watchtower           /watchtower                   Up (healthy)
 
Reviewing the backup jobs, I just noticed one VM that had an error:

Code:
INFO: Starting Backup of VM 4138 (qemu)
INFO: Backup started at 2023-11-21 13:20:11
INFO: status = running
INFO: VM Name: prod-lws138-dbcl33
INFO: include disk 'scsi0' 'ceph:vm-4138-disk-0' 32G
INFO: include disk 'scsi1' 'ceph:vm-4138-disk-1' 80G
INFO: include disk 'scsi2' 'ceph:vm-4138-disk-2' 40G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/4138/2023-11-21T13:20:11Z'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
ERROR: VM 4138 qmp command 'guest-fsfreeze-thaw' failed - got timeout
INFO: started backup task 'b28cd3c5-8d68-48ee-bf85-fe5e1c554a28'
INFO: resuming VM again
INFO: scsi0: dirty-bitmap status: OK (900.0 MiB of 32.0 GiB dirty)
INFO: scsi1: dirty-bitmap status: OK (17.9 GiB of 80.0 GiB dirty)
INFO: scsi2: dirty-bitmap status: OK (2.6 GiB of 40.0 GiB dirty)
INFO: using fast incremental mode (dirty-bitmap), 21.4 GiB dirty of 152.0 GiB total
INFO:  93% (20.0 GiB of 21.4 GiB) in 3s, read: 6.7 GiB/s, write: 6.7 GiB/s
INFO:  95% (20.5 GiB of 21.4 GiB) in 6s, read: 160.0 MiB/s, write: 141.3 MiB/s
INFO:  98% (21.0 GiB of 21.4 GiB) in 9s, read: 156.0 MiB/s, write: 156.0 MiB/s
INFO: 100% (21.4 GiB of 21.4 GiB) in 12s, read: 141.3 MiB/s, write: 117.3 MiB/s
INFO: backup is sparse: 64.00 MiB (0%) total zero data
INFO: backup was done incrementally, reused 130.77 GiB (86%)
INFO: transferred 21.38 GiB in 12 seconds (1.8 GiB/s)
INFO: adding notes to backup
INFO: Finished Backup of VM 4138 (00:03:14)

VM Config
Code:
root@pvelw13:~# cat /etc/pve/qemu-server/4138.conf
agent: 1,fstrim_cloned_disks=1
boot: order=scsi0;net0
cores: 32
cpu: x86-64-v2-AES
memory: 65536
name: prod-lws138-dbcl33
net0: virtio=F6:16:67:6C:4A:46,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: l26
scsi0: ceph:vm-4138-disk-0,discard=on,iothread=1,size=32G,ssd=1
scsi1: ceph:vm-4138-disk-1,discard=on,iothread=1,size=80G,ssd=1
scsi2: ceph:vm-4138-disk-2,discard=on,iothread=1,size=40G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=bf8d41e0-9e74-4555-987a-4bf0ab5880ec
sockets: 1
vmgenid: 4463668e-7858-44ec-82d7-4338e9a99b64

Same OS and Docker containers as the previous example
Code:
       Name                    Command                State     
-----------------------------------------------------------------
mysql-cluster-node   docker-entrypoint.sh mysqld   Up
pmm-client           /entrypoint.py                Up             
portainer_agent      ./agent                       Up             
watchtower           /watchtower                   Up (healthy)
 
Hi team,

We just finished upgrading to version 8....
We are running a 3-node cluster with Ceph and are using the no-subscription repo on this cluster.

The syslog on all nodes has tons of the following:

Code:
Nov 18 19:38:40 pvelw11 ceph-crash[2163]: WARNING:ceph-crash:post /var/lib/ceph/crash/2023-11-17T17:42:59.187044Z_5d617bc9-8bbe-45f6-8f69-2b46318e0e39 as client.admin failed: 2023-11-18T19:38:39.986+0000 7f2d3d26f6c0 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.admin.keyring: (13) Permission denied
Nov 18 19:38:40 pvelw11 ceph-crash[2163]: 2023-11-18T19:38:39.994+0000 7f2d3d26f6c0 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.admin.keyring: (13) Permission denied
Nov 18 19:38:40 pvelw11 ceph-crash[2163]: 2023-11-18T19:38:39.994+0000 7f2d3d26f6c0 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.admin.keyring: (13) Permission denied
Nov 18 19:38:40 pvelw11 ceph-crash[2163]: 2023-11-18T19:38:39.994+0000 7f2d3d26f6c0 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.admin.keyring: (13) Permission denied
Nov 18 19:38:40 pvelw11 ceph-crash[2163]: 2023-11-18T19:38:39.994+0000 7f2d3d26f6c0 -1 monclient: keyring not found
Nov 18 19:38:40 pvelw11 ceph-crash[2163]: [errno 13] RADOS permission denied (error connecting to the cluster)

The file is in place and I don't see any permission issues (same as on the other cluster we have).

Code:
root@pvelw11:/etc/pve/priv# ls -la
total 5
drwx------ 2 root www-data    0 Sep 22  2022 .
drwxr-xr-x 2 root www-data    0 Jan  1  1970 ..
drwx------ 2 root www-data    0 Sep 22  2022 acme
-rw------- 1 root www-data 1675 Nov 18 07:16 authkey.key
-rw------- 1 root www-data 1573 Nov 17 17:46 authorized_keys
drwx------ 2 root www-data    0 Sep 23  2022 ceph
-rw------- 1 root www-data  151 Sep 23  2022 ceph.client.admin.keyring
-rw------- 1 root www-data  228 Sep 23  2022 ceph.mon.keyring
-rw------- 1 root www-data 4500 Nov 17 17:46 known_hosts
drwx------ 2 root www-data    0 Sep 22  2022 lock
drwx------ 2 root www-data    0 Oct 19  2022 metricserver
-rw------- 1 root www-data 3243 Sep 22  2022 pve-root-ca.key
-rw------- 1 root www-data    3 Oct 19  2022 pve-root-ca.srl
drwx------ 2 root www-data    0 Sep 30  2022 storage
-rw------- 1 root www-data    2 Jul  2 16:10 tfa.cfg

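As a side check (assumption on my part: ceph-crash may run as the unprivileged 'ceph' service user rather than root, and /etc/pve/priv is drwx------ root), one could test readability from that user's point of view:
Code:
# if ceph-crash does not run as root, it cannot even enter /etc/pve/priv
sudo -u ceph cat /etc/pve/priv/ceph.client.admin.keyring
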
Here's the content of the crash report

Code:
root@pvelw11:~# cat /var/lib/ceph/crash/2023-11-17T17\:42\:59.187044Z_5d617bc9-8bbe-45f6-8f69-2b46318e0e39/meta
{
    "crash_id": "2023-11-17T17:42:59.187044Z_5d617bc9-8bbe-45f6-8f69-2b46318e0e39",
    "timestamp": "2023-11-17T17:42:59.187044Z",
    "process_name": "ceph-osd",
    "entity_name": "osd.5",
    "ceph_version": "17.2.6",
    "utsname_hostname": "pvelw11",
    "utsname_sysname": "Linux",
    "utsname_release": "5.15.131-1-pve",
    "utsname_version": "#1 SMP PVE 5.15.131-2 (2023-11-14T11:32Z)",
    "utsname_machine": "x86_64",
    "os_name": "Debian GNU/Linux 12 (bookworm)",
    "os_id": "12",
    "os_version_id": "12",
    "os_version": "12 (bookworm)",
    "assert_condition": "end_time - start_time_func < cct->_conf->osd_fast_shutdown_timeout",
    "assert_func": "int OSD::shutdown()",
    "assert_file": "./src/osd/OSD.cc",
    "assert_line": 4368,
    "assert_thread_name": "signal_handler",
    "assert_msg": "./src/osd/OSD.cc: In function 'int OSD::shutdown()' thread 7f1e97a32700 time 2023-11-17T17:42:59.177646+0000\n./src/osd/OSD.cc: 4368: FAILED ceph_assert(end_time - start_time_func < cct->_conf->osd_fast_shutdown_timeout)\n",
    "backtrace": [
        "/lib/x86_64-linux-gnu/libpthread.so.0(+0x13140) [0x7f1e9b8ed140]",
        "gsignal()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x17e) [0x55b20d071042]",
        "/usr/bin/ceph-osd(+0xc25186) [0x55b20d071186]",
        "(OSD::shutdown()+0x1364) [0x55b20d169764]",
        "(SignalHandler::entry()+0x648) [0x55b20d7f2dc8]",
        "/lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7) [0x7f1e9b8e1ea7]",
        "clone()"
    ]
}

I don't see any issues in the Ceph status:

Code:
root@pvelw11:~# ceph -s
  cluster:
    id:     a447dbaf-a9ea-442f-a072-cb5b333afe73
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum pvelw13,pvelw12,pvelw11 (age 26h)
    mgr: pvelw13(active, since 26h), standbys: pvelw12, pvelw11
    osd: 6 osds: 6 up (since 26h), 6 in (since 13M)

  data:
    pools:   2 pools, 33 pgs
    objects: 462.67k objects, 1.7 TiB
    usage:   4.5 TiB used, 30 TiB / 35 TiB avail
    pgs:     33 active+clean

  io:
    client:   273 KiB/s rd, 23 MiB/s wr, 18 op/s rd, 3.10k op/s wr

Looking for assistance, please.

Thanks
Code:
rm -rf /var/lib/ceph/crash/*
see https://forum.proxmox.com/threads/c...or-calling-conf_read_file.134061/#post-591716
seems to help. The crash reports are old and their timestamps match the update date.
After restarting the manager and monitor, nothing like this is logged anymore.
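
If you prefer to keep the old reports instead of deleting them, moving them into the posted/ subdirectory should also stop the retries (my understanding is that ceph-crash only re-posts reports that are not yet under posted/):
Code:
# keep the crash reports but stop ceph-crash from re-posting them
mkdir -p /var/lib/ceph/crash/posted
mv /var/lib/ceph/crash/2023-* /var/lib/ceph/crash/posted/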
 
I am having similar problems to @hepo. In our case it is also the backup causing the hang, but only after the upgrade to PVE 8. For testing we switched to the following configuration:
* Run PBS directly on one node (let's call it node 3) with an SSD RAID 1 mirror (for these tests, while waiting for new disks, this is the mirror the OS runs on as well)
* Backup over the Ceph network (100 GBit interlink) from node 2 to node 3
* PBS is limited to 300 MB/s because otherwise the backup via Ceph and the 100 GBit interlink would saturate the disks easily, leading to other problems
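(For reference only, since I am not sure it matters for the comparison: a PBS traffic-control rule for such a limit would look roughly like the following; the rule name and network are placeholders, not our exact setup.)
Code:
# rough sketch of a PBS traffic-control rule; name and network are placeholders
proxmox-backup-manager traffic-control create backup-limit --network 0.0.0.0/0 --rate-in 300MB --rate-out 300MB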

We also use Docker on all machines; interestingly, for now only this machine seems affected. It still runs CentOS 7.

@hepo Want to set up a call so we can go through our machines and maybe find a common cause?
 
Hi,
* PBS is limited to 300 MB/s because otherwise the backup via Ceph and the 100 GBit interlink would saturate the disks easily, leading to other problems
For VM backups, it's better to set a bwlimit as part of the backup job (in addition to or instead of the PBS limit), because then the reading on the source side will be limited and there's less potential to interfere with guest IO.

If you already have pve-qemu-kvm>=8.1, and you have iothread enabled on your disks, you might want to try disabling that and see if it helps.
 
Hi Fiona,

thank you for your response. A few questions if you don't mind:

For VM backups, it's better to set a bwlimit as part of the backup job (in addition to or instead of the PBS limit), because then the reading on the source side will be limited and there's less potential to interfere with guest IO.

I assume I can only set this in vzdump.conf, or is it exposed in the UI somewhere? While I am in there, any other settings to change?

If you already have pve-qemu-kvm>=8.1, and you have iothread enabled on your disks, you might want to try disabling that and see if it helps.

Yes, I updated to 8.1.2-4 to be precise. Would the changes in -5/-6 be relevant to that as well (https://github.com/proxmox/pve-qemu/blob/master/debian/changelog)? What is the recommended configuration for disks on Ceph nowadays? I thought iothread was recommended. Just for completeness, my current disk settings are: virtio-scsi-single, SSD emulation, iothread. The Ceph cluster is backed by NVMe storage.

Thank you very much,
Florian

P.S.: How is your backup fleecing investigation going?
 
I assume I can only set this in vzdump.conf, or is it exposed in the UI somewhere? While I am in there, any other settings to change?
Yes, it is not currently exposed in the UI. So set it either in the vzdump.conf file or for a specific job with e.g. pvesh set /cluster/backup/backup-e9ee601b-41ad --bwlimit <limit in KiB/s>; see /etc/pve/jobs.cfg for the correct backup ID. You can also try lowering the number of workers with --performance 'max-workers=4'.
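
For example, a cluster-wide default in /etc/vzdump.conf could look like this (the values are only illustrative, adapt them to your setup):
Code:
# /etc/vzdump.conf, illustrative values only
# bwlimit is in KiB/s; 307200 KiB/s = 300 MiB/s
bwlimit: 307200
performance: max-workers=4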

Yes, I updated to 8.1.2-4 to be precise. Would the changes in -5/-6 be relevant to that as well (https://github.com/proxmox/pve-qemu/blob/master/debian/changelog)?
Version 8.1.2-6 is identical to 8.1.2-4 except for a fix for a rare issue during disk resize with iothread active. All versions >= 8.1 unfortunately contain an issue with iothread that can lead to hanging IO in the guest. 8.1.2-5 contained an attempted fix for that, but it was incomplete and caused a much more common issue: https://forum.proxmox.com/threads/pve-100-cpu-on-all-kvm-while-vms-are-idle-at-0-5-cpu.138140/

What is the recommended configuration for disks on Ceph nowadays? I thought iothread was recommended.
Yes, it would be, except for the aforementioned issue. If turning iothread off helps, it might be that you are affected by it.
Just for completeness, my current disk settings are: virtio-scsi-single, SSD emulation, iothread. The Ceph cluster is backed by NVMe storage.
If you have SSDs, then it makes sense, yes ;)

P.S.: How is your backup fleecing investigation going?
Not too bad. I'm hoping to send an initial RFC to the mailing list later this week.
 
"happy" to see more people reporting this issue as well as the issue being recognised and work being done to remediate...
@apollo13 we have reinstalled the cluster back to version 7 since this issue was not resolved for more than a month.
Happy to sit on a call to discuss setups and potential resolution, however I am not in a position to do any testing anymore.
 
"happy" to see more people reporting this issue as well as the issue being recognised and work being done to remediate...
@apollo13 we have reinstalled the cluster back to version 7 since this issue was not resolved for more than a month.
Happy to sit on a call to discuss setups and potential resolution, however I am not in a position to do any testing anymore.

Ok, I guess there isn't much to compare then; I will update this thread in a week or so on whether disabling iothread helped. Interestingly enough, I couldn't trigger the issue reliably here :/
 