That link no longer works. There is now the Proxmox Cluster File System (pmxcfs).
I tested LVM on iSCSI on my home cluster and it seems to work fine, but that doesn't really guarantee it's safe for production.
The secret is the /etc/ksmtuned.conf setting
KSM_THRES_COEF=20
which reserves 20% of RAM for your host OS. It requires restarting ksmtuned.service.
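For reference, a minimal sketch of the resulting config (the comments are my reading of how ksmtuned uses the value; check the comments shipped in your own ksmtuned.conf):

```
# /etc/ksmtuned.conf
# ksmd begins merging pages once free memory falls below this
# percentage of total RAM, effectively keeping ~20% free for the host
KSM_THRES_COEF=20
```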
I'm not sure if that's always safe while VMs are running.
A totally different explanation could be NUMA. If you have multiple CPUs and VMs using lots of memory, it could be that one NUMA node is low on memory while the others still have plenty. See for example https://sitano.github.io/2014/08/20/numa-swap/
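To check whether one node is the culprit, per-node memory counters can be read straight from sysfs on any Linux host (node numbering varies per machine); a quick sketch:

```shell
# Show total vs. free memory per NUMA node; one node sitting near zero
# free while others have plenty matches the swap scenario described above.
grep -H -E 'MemTotal|MemFree' /sys/devices/system/node/node*/meminfo
```

If numactl is installed, `numactl --hardware` gives a friendlier summary of the same data.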
Peter Maloney did some more work on estiname-size.py, also adding optional JSON output:
https://github.com/petermaloney/pve/blob/main/pbs/estiname-size.py
When trying to grow one particular disk for a VM, the "please wait" takes about 4 seconds, while the operation itself measures between 0.025 and 0.040 seconds. Maybe there's some lock involved somewhere?
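One way to narrow this down is to time the underlying command separately from the GUI task, so the command's own cost can be split from any lock-waiting in the management layer. A minimal sketch with a stand-in command (on a Proxmox host you would substitute the real resize or size query, e.g. a qemu-img call):

```python
import subprocess
import time

def timed_run(cmd):
    """Run a command and return its wall-clock duration in seconds."""
    start = time.monotonic()
    subprocess.run(cmd, check=True, capture_output=True)
    return time.monotonic() - start

# Stand-in command; substitute e.g. ["qemu-img", "info", "/path/to/disk.qcow2"]
# on the host to see whether the command itself or the task wrapper is slow.
print(f"{timed_run(['true']):.3f}s")
```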
For people looking for "Could not determine current size of volume", this is the place :)
I'm having the same problem with a qcow on an NFS mount; qemu-img info returns instantly with correct data.
Great, that does the trick indeed; it just threw a warning like
lvremove snapshot 'pve/snap_vm-101-disk-0_premigration' error: Failed to find logical volume "pve/snap_vm-101-disk-0_premigration"
which was exactly the problem.
In my case I got it fixed by running, on one node:
pveceph mgr destroy pvetest1
pveceph mgr create
and waiting a bit. After this, the "got timeout(500)" and other timeout issues went away.