Proxmox Virtual Environment 9.1 available!

Wanted to provide an update on my 9.1.2 move. I too hit the ISO upload bug; download via URL is the only way to get ISOs loaded at this time. Error shown:

starting file import from: /var/tmp/pveupload-93b7660434769d3896ec263ed8f3647c
TASK ERROR: failed to stat '/var/tmp/pveupload-93b7660434769d3896ec263ed8f3647c'
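For anyone who wants the "download from URL" workaround from the shell instead of the GUI, something like the sketch below should work via the API (the node name "pve1", storage "local", and the ISO URL are placeholders for your setup; the echo makes it a dry run):

```shell
# Hedged sketch: queue a "download from URL" task from the CLI.
# "pve1", "local" and the URL are placeholders; adjust for your cluster.
node=pve1
storage=local
iso_url="https://example.com/debian.iso"
# echo makes this a dry run; remove it to actually submit the task
echo pvesh create /nodes/$node/storage/$storage/download-url \
    --content iso --filename debian.iso --url "$iso_url"
```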
 
Wanted to provide an update with my 9.1.2 move. I too found the ISO upload bug. Download via url is the only way to get ISOs loaded at this time. Error shown:
Hi,

please open a separate thread and provide more details. Is the upload done on the same node you are logged in to, or on another one? Is there any information in the system logs/journal? Note that 9.1.2 is not the latest available version (anymore).
 
Hi,

please open a separate thread and provide more details. Is the upload done on the same node you are logged in to, or on another one? Is there any information in the system logs/journal? Note that 9.1.2 is not the latest available version (anymore).
Doesn't look like it's needed; 9.1.4 fixed whatever bug it was.
 
https://pve.proxmox.com/wiki/Upgrade_from_8_to_9 does not mention QDevice, as far as I can see... Is there a need and/or a recommended process to upgrade it? Just upgrade it to Trixie as well?
Imho this should be enough, since the qdevice daemon is by design lightweight and configuration-less, so I would expect a flawless upgrade:
QDevice Technical Overview
The Corosync Quorum Device (QDevice) is a daemon which runs on each cluster node. It provides a configured number of votes to the cluster’s quorum subsystem, based on an externally running third-party arbitrator’s decision. Its primary use is to allow a cluster to sustain more node failures than standard quorum rules allow. This can be done safely as the external device can see all nodes and thus choose only one set of nodes to give its vote. This will only be done if said set of nodes can have quorum (again) after receiving the third-party vote.

Currently, only QDevice Net is supported as a third-party arbitrator. This is a daemon which provides a vote to a cluster partition, if it can reach the partition members over the network. It will only give votes to one partition of a cluster at any time. It’s designed to support multiple clusters and is almost configuration and state free. New clusters are handled dynamically and no configuration file is needed on the host running a QDevice.

The only requirements for the external host are that it needs network access to the cluster and to have a corosync-qnetd package available. We provide a package for Debian based hosts, and other Linux distributions should also have a package available through their respective package manager.

Note: Unlike corosync itself, a QDevice connects to the cluster over TCP/IP. The daemon can also run outside the LAN of the cluster and isn't limited to the low-latency requirements of corosync.

https://pve.proxmox.com/wiki/Cluster_Manager#_corosync_external_vote_support
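To make the quoted quorum behaviour concrete: a two-node cluster has 2 expected votes and loses quorum as soon as one node is down, while a QDevice raises the expected votes to 3, so the surviving node plus the arbitrator's vote still form a majority. A minimal sketch of the majority rule (my own illustration, not Proxmox code):

```shell
# Majority rule corosync applies: quorate iff votes_present > expected_votes / 2
quorate() { [ "$1" -gt $(( $2 / 2 )) ] && echo yes || echo no; }

quorate 1 2   # 2-node cluster, one node down, no QDevice -> no
quorate 2 3   # surviving node + QDevice vote out of 3 expected -> yes
```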

Maybe there are some edge cases not mentioned in the docs, or ones I'm not aware of, but I can't remember any discussion of problems with the corosync-qnetd daemon (which provides the service on the QDevice node) regarding the update from PVE 8 to 9 and from Debian Bookworm to Trixie.

Probably you could even get away with waiting longer before updating the QDevice host to Trixie (although, to be honest, I don't see much of a point in that; YMMV).

In the two-node cluster in my homelab I didn't have any problems either, but of course one example in a homelab doesn't say much.
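If you do upgrade the QDevice host, the Debian side should just be the usual sources switch plus a full-upgrade. A sketch, demonstrated here on a throwaway file since the real edit touches /etc/apt/sources.list (and any files under sources.list.d/):

```shell
# Demo on a temp file; on the real QDevice host, edit /etc/apt/sources.list,
# then run: apt update && apt full-upgrade
tmp=$(mktemp)
echo "deb http://deb.debian.org/debian bookworm main" > "$tmp"
sed -i 's/bookworm/trixie/g' "$tmp"
cat "$tmp"    # the entry now points at trixie
rm -f "$tmp"
# afterwards, verify the arbitrator is still running:
# systemctl status corosync-qnetd
```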
 
Will snapshots as volume chains leave tech preview in the next release? Curious, as I am considering a hardware refresh, and in my tests performance on FC-attached storage is way better than Ceph's. Snapshots are the last hurdle for me.
 
Finally got around to upgrading from 8, which was pretty easy (thanks!). A couple of notes on the upgrade docs...

If you don't disable the audit messages, which appear every 10 seconds, any prompt waiting for input scrolls off the console screen in about 30-40 seconds. For instance, at 77% the upgrade stops and asks whether to replace /etc/crontab (the new version changes the PATH order; not sure if that alone would trigger the prompt, and we also have one entry we added). Since I got a phone call right then :rolleyes: it wasn't immediately obvious that it was waiting for input, and it may seem "hung."

The upgrade also prompts to replace a systemd-boot configuration file, which isn't mentioned in the upgrade doc either. Presumably that's not relevant, since it will be removed immediately after the upgrade, but it does stop and ask.
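For unattended runs, a generic dpkg mechanism (not something the upgrade doc prescribes, so use with care) keeps the locally modified config files instead of prompting; dpkg then saves the new upstream versions as *.dpkg-dist for review afterwards. Dry-run sketch:

```shell
# Keep existing config files (e.g. the edited /etc/crontab) instead of prompting;
# dpkg writes the new upstream versions as *.dpkg-dist for later review.
# echo makes this a dry run; remove it to perform the real upgrade.
echo apt full-upgrade \
    -o 'Dpkg::Options::=--force-confdef' \
    -o 'Dpkg::Options::=--force-confold'
```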

After the upgrade, pve8to9 shows it is missing one vote ("WARN: total votes < expected votes"), but pvecm status on the other nodes, and the web GUI, show all votes. The warning from pve8to9 disappears after a node reboot.

I was viewing the last node's web GUI. After the upgrade and a force-reload, on a node's Summary page, choosing any period longer than Day showed an empty graph dated 12/31/69 18:00, i.e. the Unix epoch (day 0). However, this cleared up in a few minutes, so it is presumably a side effect of the upgrade.

In Tag View, right-clicking a tag shows "Tag 'tagname' (undefined)" for all tags...?
 
Hi, we are having problems with vzdump full backups via CIFS to a Samba network share since upgrading our 3-node cluster from 8 to 9.1.

The backup runs via 10GbE to a Samba server and worked fine for several years.

Now I'm getting kernel traces like the one below at regular intervals, which do not always seem to affect a successful backup run.

I already tried to reduce the write pressure on the Samba share by reducing zstd threads to 1, but it still happens on one host:

I have not yet found existing reports/tickets for this, though ChatGPT suggested there may have been a major rework of the CIFS client with kernel 6.17. Further searching indeed confirms this: https://kernelnewbies.org/Linux_6.10
"SMB netfs, cifs: Delegate high-level I/O to netfslib" (followed by a long list of commit links)

Maybe somebody knows more about this or has made similar observations?

I was so happy for so long that we got rid of NFS...


Code:
[445562.837074] INFO: task task UPID:pve2::2013136 blocked for more than 122 seconds.
[445562.837086]       Tainted: P           O        6.17.13-2-pve #1
[445562.837089] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[445562.837091] task:task UPID:pve2: state:D stack:0     pid:2013136 tgid:2013136 ppid:1      task_flags:0x400140 flags:0x00004002
[445562.837128] Call Trace:
[445562.837131]  <TASK>
[445562.837137]  __schedule+0x468/0x1310
[445562.837150]  ? dbuf_find+0x254/0x260 [zfs]
[445562.837591]  schedule+0x27/0xf0
[445562.837598]  schedule_preempt_disabled+0x15/0x30
[445562.837603]  __mutex_lock.constprop.0+0x508/0xa20
[445562.837612]  __mutex_lock_slowpath+0x13/0x20
[445562.837617]  mutex_lock+0x3b/0x50
[445562.837623]  netfs_writepages+0x93/0x3f0 [netfs]
[445562.837672]  do_writepages+0xc4/0x180
[445562.837680]  filemap_fdatawrite_wbc+0x58/0x80
[445562.837689]  __filemap_fdatawrite_range+0x6c/0xa0
[445562.837702]  filemap_write_and_wait_range+0x5b/0x130
[445562.837711]  cifs_flush+0x91/0x130 [cifs]
[445562.837889]  filp_flush+0x3f/0xb0
[445562.837897]  __x64_sys_close+0x33/0x90
[445562.837902]  x64_sys_call+0x1742/0x2330
[445562.837910]  do_syscall_64+0x80/0x8f0
[445562.837921]  ? __f_unlock_pos+0x12/0x20
[445562.837928]  ? ksys_write+0x8d/0xf0
[445562.837936]  ? ptep_set_access_flags+0x4a/0x70
[445562.837942]  ? wp_page_reuse+0x97/0xc0
[445562.837948]  ? do_wp_page+0x92e/0xef0
[445562.837954]  ? zpl_iter_write+0x167/0x1e0 [zfs]
[445562.838287]  ? ___pte_offset_map+0x1c/0x180
[445562.838295]  ? __handle_mm_fault+0xadb/0xfd0
[445562.838304]  ? count_memcg_events+0xd7/0x1a0
[445562.838311]  ? handle_mm_fault+0x254/0x370
[445562.838317]  ? do_user_addr_fault+0x2f8/0x830
[445562.838325]  ? irqentry_exit_to_user_mode+0x2e/0x290
[445562.838330]  ? irqentry_exit+0x43/0x50
[445562.838335]  ? exc_page_fault+0x90/0x1b0
[445562.838339]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[445562.838345] RIP: 0033:0x7de2b27eb687
[445562.838351] RSP: 002b:00007fff40c3e850 EFLAGS: 00000202 ORIG_RAX: 0000000000000003
[445562.838358] RAX: ffffffffffffffda RBX: 00007de2b271f200 RCX: 00007de2b27eb687
[445562.838361] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000011
[445562.838364] RBP: 0000000000000011 R08: 0000000000000000 R09: 0000000000000000
[445562.838367] R10: 0000000000000000 R11: 0000000000000202 R12: 000062eb13873790
[445562.838370] R13: 0000000000000000 R14: 000062eb137c9180 R15: 0000000000000001
[445562.838377]  </TASK>
[445562.838553] INFO: task task UPID:pve2::2013136 is blocked on a mutex likely owned by task zstd:2035508.
[445562.838559] task:zstd            state:S stack:0     pid:2035508 tgid:2035508 ppid:2013136 task_flags:0x440000 flags:0x00004002
[445562.838567] Call Trace:
[445562.838570]  <TASK>
[445562.838574]  __schedule+0x468/0x1310
[445562.838583]  ? lock_timer_base+0x73/0xa0
[445562.838593]  schedule+0x27/0xf0
[445562.838600]  schedule_timeout+0x89/0x110
[445562.838608]  ? __pfx_process_timeout+0x10/0x10
[445562.838616]  wait_woken+0x7f/0x90
[445562.838624]  sk_stream_wait_memory+0x287/0x400
[445562.838633]  ? __pfx_woken_wake_function+0x10/0x10
[445562.838640]  tcp_sendmsg_locked+0x4ce/0x13a0
[445562.838651]  tcp_sendmsg+0x2c/0x50
[445562.838656]  inet_sendmsg+0x42/0x80
[445562.838664]  sock_sendmsg+0x116/0x140
[445562.838674]  smb_send_kvec+0x94/0x1d0 [cifs]
[445562.838794]  __smb_send_rqst+0x419/0x710 [cifs]
[445562.838922]  smb_send_rqst+0x14f/0x1a0 [cifs]
[445562.839047]  cifs_call_async+0x14f/0x310 [cifs]
[445562.839157]  ? __pfx_smb2_writev_callback+0x10/0x10 [cifs]
[445562.839281]  smb2_async_writev+0x395/0x6a0 [cifs]
[445562.839402]  cifs_issue_write+0x87/0x1c0 [cifs]
[445562.839512]  ? cifs_issue_write+0x87/0x1c0 [cifs]
[445562.839622]  netfs_do_issue_write+0x3b/0xc0 [netfs]
[445562.839664]  netfs_advance_write+0x10d/0x320 [netfs]
[445562.839696]  ? rolling_buffer_append+0x4d/0xf0 [netfs]
[445562.839727]  netfs_write_folio+0x29a/0x880 [netfs]
[445562.839760]  netfs_writepages+0x119/0x3f0 [netfs]
[445562.839790]  do_writepages+0xc4/0x180
[445562.839798]  filemap_fdatawrite_wbc+0x58/0x80
[445562.839805]  __filemap_fdatawrite_range+0x6c/0xa0
[445562.839826]  filemap_write_and_wait_range+0x5b/0x130
[445562.839834]  cifs_flush+0x91/0x130 [cifs]
[445562.839945]  filp_flush+0x3f/0xb0
[445562.839951]  __x64_sys_close+0x33/0x90
[445562.839957]  x64_sys_call+0x1742/0x2330
[445562.839964]  do_syscall_64+0x80/0x8f0
[445562.839973]  ? netfs_buffered_write_iter_locked+0xa0/0xc0 [netfs]
[445562.840006]  ? cifs_put_writer+0x5f/0x70 [cifs]
[445562.840108]  ? cifs_strict_writev+0x1bd/0x350 [cifs]
[445562.840199]  ? rw_verify_area+0x57/0x190
[445562.840204]  ? vfs_write+0x274/0x490
[445562.840212]  ? ksys_write+0xd9/0xf0
[445562.840219]  ? __x64_sys_write+0x19/0x30
[445562.840224]  ? x64_sys_call+0x79/0x2330
[445562.840228]  ? do_syscall_64+0xb8/0x8f0
[445562.840234]  ? cpu_clock_sample_group+0xbd/0x180
[445562.840244]  ? posix_cpu_clock_get+0x6c/0xb0
[445562.840249]  ? _copy_to_user+0x31/0x60
[445562.840258]  ? put_timespec64+0x3c/0x70
[445562.840268]  ? __x64_sys_clock_gettime+0xa4/0xe0
[445562.840288]  ? x64_sys_call+0x19b8/0x2330
[445562.840292]  ? do_syscall_64+0xb8/0x8f0
[445562.840297]  ? do_syscall_64+0xb8/0x8f0
[445562.840303]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
 