Interesting. Yeah, I have to say, in the many years I've used Linux (~25) I'm not used to seeing behaviour like this, where one thing locks out everything else. At worst things slow down, but hard stops or timeouts of this size are surprising to me.
Thanks a lot for sharing your experience. We're still having the same issues, but we disable features like replication while doing backups. I'll have a look at how the disks are configured and whether we can reproduce it with the help of your post here.
Thanks for the hints. I'll try that. The network is most probably saturated. We have separate switches for the servers and our office. I didn't expect the network to be so saturated that the small amount of cluster communication traffic wouldn't get through, but it seems this backup affects or...
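If the backup traffic really is drowning out corosync, a dedicated link just for cluster traffic is probably the next thing to try. From memory (so please check against the PVE docs rather than my post), the relevant part of /etc/pve/corosync.conf would look roughly like this; the 10.10.10.x addresses are placeholders for a separate NIC/subnet, and config_version has to be bumped when editing:

nodelist {
  node {
    name: vmhost02
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.10.2   # existing network (placeholder address)
    ring1_addr: 10.10.10.2     # dedicated cluster link (placeholder address)
  }
  node {
    name: vmhost03
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.10.3
    ring1_addr: 10.10.10.3
  }
}
totem {
  # existing settings (cluster_name, config_version, ...) stay as they are
  interface {
    linknumber: 0
  }
  interface {
    linknumber: 1
  }
}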
We updated both nodes yesterday, and this morning we have the same separation.
Attached is what I think is the start of the issue on node 1 (called vmhost02; node 2 is called vmhost03), at about 02:00 I guess.
We back up to our tape system every day at 02:00. node1 also replicates VMs to node2...
I got to the office and found the same situation again. I can report that whenever the daily backup by PVE gets executed (with about 12 VMs) at 00:00, one of the replication jobs gets a timeout at exactly that time. I don't know whether that correlates with the quorum loss. Further replications and...
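If it helps anyone reproducing this: capping the backup bandwidth is one knob to test, so the backup can't starve replication and corosync traffic. As far as I know vzdump reads a global limit from /etc/vzdump.conf (value in KiB/s), or it can be passed per job with --bwlimit; something like:

# /etc/vzdump.conf - global limit, here roughly 50 MiB/s (value is KiB/s)
bwlimit: 51200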
Same again today; I restarted both corosync services to reconnect the nodes.
What should I post configuration-wise, or regarding the status of these "events"?
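I assume the usual suspects would be something like this (happy to post any of it):

pvecm status                                  # quorum/membership as seen by each node
corosync-cfgtool -s                           # link status per ring
cat /etc/pve/corosync.conf                    # cluster/ring configuration
pveversion -v                                 # package versions on both nodes
journalctl -u corosync -u pve-cluster --since "01:50" --until "02:30"   # around the backup window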
I think I have a similar or the same issue here.
Two-node cluster (a new R740 and an R340) over one switch, losing quorum within a day or so. At first, restarting corosync.service on one node was enough; today I had to restart it on both.
Ask me anything.
The VMs were completely blocked. Even the Proxmox UI was stalled for a long time. I/O delay was at about 30-40%.
We'll collect more factual information about this.
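Concretely, what I think is worth capturing while it happens (commands from memory, exact flags may differ by version):

iostat -x 2                  # per-device utilisation and await (sysstat package)
zpool iostat -v rpool 2      # per-vdev throughput; -l adds latency columns on newer ZFS
cat /proc/pressure/io        # I/O pressure stall info (needs a reasonably recent kernel)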
Thank you. While I understand that this option might help somewhere else, I also want to point out that the behaviour above occurs when I move disks with the built-in GUI command "Move disk". So using "-t…" might have an effect on some operations, but it does not help in the situation...
I moved all my disks to an external storage, rebuilt the pool, and am now moving the disks back... surprise: I get the same behaviour.
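For what it's worth, if the GUI move can't be throttled, the CLI side apparently accepts a bandwidth limit; I haven't verified that my version supports it, and the VM ID, disk and storage names below are just placeholders:

qm move_disk 1044 scsi0 local-zfs --bwlimit 51200   # KiB/s, so roughly 50 MiB/s
# or cluster-wide defaults in /etc/pve/datacenter.cfg:
# bwlimit: move=51200,migration=51200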
If I didn't know better, I'd think I'm experiencing the very same thing described here...
It is a
Dell R740, 1x Xeon Gold 6146 @ 3.2 GHz (12 cores / 24 threads)
PERC H740P Mini (HBA Mode)
128GB RAM
sda, sdb: SAS 3, TOSHIBA, AL15SEB24EQY, 2.4TB, 10k
sdc, sdd: SAS 3, SAMSUNG, MZILS960HEHP0D3, SSD, 960GB
root@vmhost02:~# zpool list
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
rpool 2.17T 747G 1.44T - - 5% 33% 1.07x ONLINE -
root@vmhost02:~# zpool status -v
pool: rpool
state: ONLINE
status: Some supported features are not...
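For completeness, these are the pool properties I can pull next if useful; the DEDUP column at 1.07x in the zpool list output above is something I want to double-check, since dedup costs RAM and write latency for very little gain at that ratio (commands from memory; arc_summary ships with the ZFS userland tools as far as I know):

zfs get dedup,compression,sync,atime rpool   # properties that directly affect write latency
zpool status -D rpool                        # dedup table (DDT) summary
arc_summary | head -n 30                     # ARC size and hit rates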
I'm migrating VMDK disks to zvols, and while running something like
qemu-img convert -f vmdk sv0044_2.vmdk -O raw /dev/zvol/rpool/vm-1044-disk-1
I get quite high I/O delay on the Proxmox node, and even VMs that are already running become unresponsive and log errors like the excerpt below (a gentler variant of the command is sketched after it):
[ 3019.135857] sd 2:0:0:1: [sdb] abort
[...
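For reference, qemu-img also has -t / -T flags to pick the cache mode for the destination / source, and the whole thing can be wrapped in ionice; a lower-impact variant might look like the line below. This is untested on my side, and ionice only has an effect with schedulers that honour I/O priorities (e.g. BFQ), so treat it as something to try, not a fix:

ionice -c3 qemu-img convert -p -f vmdk -t none sv0044_2.vmdk -O raw /dev/zvol/rpool/vm-1044-disk-1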