I noticed that during a PBS backup of a large MySQL VM (with the guest agent installed), MySQL is unable to serve queries (SELECT, UPDATE, INSERT)...
I thought that the filesystem freeze triggered by qemu-ga guest-fsfreeze lasts only a few seconds, just long enough to get a consistent FS.
Is it the normal...
I have recently replaced all my HDDs with 10 SSDs for my ZFS datastore.
It is now extremely fast, especially verify jobs (even if the value of verifying snapshots on a ZFS pool is debatable).
The weird thing is that PBS reports fragmentation on the related pool:
AFAIK, on an SSD...
It seems that the latest PVE 7.1 changes the way TOTP is recorded in /etc/pve/priv/tfa.cfg.
I have a problem with that because the 'Secret' is now shown
The V2 method allows root to know the secret keys of all users. Some of them use the same TOTP secret for several services...
We usually create KVM VMs, but for historical reasons we still have some old containers: OpenVZ containers that were migrated to LXC back in PVE 4.
Now with PVE 7, some LXCs refuse to start properly (mainly due to old systemd < 232). Also, LXC live migration is impossible.
So I'm wondering if...
I tried to upgrade a 6-node PVE 7 cluster yesterday.
We use Ceph and HA for all VMs.
I was able to upgrade 4 nodes without issue.
But on the fifth node, I lost the whole cluster. All nodes rebooted!
Syslogs for nodes 22 to 27:
Update OK for nodes 22, 23, 24 and 27
Upgrade of node 25...
Yesterday an updated PVE7 node crashed - Dell R440 - Xeon(R) Silver 4210 CPU @ 2.20GHz (2 Sockets) with 256 GB RAM
The weird thing is that the network was still OK, and corosync reported all 6 nodes as reachable.
The GUI showed the affected node grayed out, while its VMs continued to respond to ping...
I recently upgraded a 6-node Ceph PVE cluster from 6.4 to 7.
We noticed higher CPU usage and performance issues.
I want to recreate the whole cluster.
What I plan to do for each node:
- Migrate all VMs to another node
- Stop all OSDs and then remove them from the config
- Shut down the node.
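The per-node steps above could be sketched as a dry-run plan. This is only an illustration, assuming the standard `qm`, `ceph`, `systemctl` and `pveceph` command-line tools; the VM IDs, OSD IDs and the node name `pve23` are hypothetical, and the helper `node_drain_plan` is not part of any PVE tooling. It only builds the command strings and executes nothing, and it does not cover waiting for Ceph to rebalance between steps.

```python
from typing import List


def node_drain_plan(vmids: List[int], osd_ids: List[int], target_node: str) -> List[str]:
    """Build (but do not run) the shell commands for emptying one node.

    All IDs and the target node name are supplied by the caller; the values
    used in the example below are hypothetical.
    """
    plan: List[str] = []

    # Step 1: migrate every VM to another node (online migration).
    for vmid in vmids:
        plan.append(f"qm migrate {vmid} {target_node} --online")

    # Step 2a: mark each OSD out and stop its service.
    for osd in osd_ids:
        plan.append(f"ceph osd out {osd}")
        plan.append(f"systemctl stop ceph-osd@{osd}.service")

    # Step 2b: once all OSDs are stopped, remove them from the config.
    for osd in osd_ids:
        plan.append(f"pveceph osd destroy {osd}")

    # Step 3: shut down the node.
    plan.append("shutdown -h now")
    return plan


if __name__ == "__main__":
    for cmd in node_drain_plan([101, 102], [3, 4], "pve23"):
        print(cmd)
```

Printing the plan first makes it possible to review the order of operations before running anything by hand.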
I recently upgraded a 6-node PVE cluster to PVE 7.
Users are reporting issues (lag, browsers frozen for 1 min) on several LAMP VMs...
Everything seems to be OK inside the VMs, and the PVE nodes' resources look fine too...
I plan to upgrade a switch stack (2 switches), which means a downtime of 5 min (or more), I guess.
6 PVE Ceph nodes are connected to this stack, with nearly 80 KVM VMs in HA.
Even if I shut down all VMs for 10 min, I think the PVE nodes would reboot, and I don't know what could be the...
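The reboot concern comes down to quorum arithmetic, which can be sketched as follows. This assumes one vote per node and that corosync traffic passes through the affected stack (the post does not confirm either); the function names are made up for the illustration. With the default majority rule, a node needs to see more than half of the total votes, and nodes with active HA resources that lose quorum are expected to fence themselves.

```python
def quorum_threshold(total_votes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return total_votes // 2 + 1


def has_quorum(visible_votes: int, total_votes: int) -> bool:
    """True if a partition seeing `visible_votes` still holds a majority."""
    return visible_votes >= quorum_threshold(total_votes)


if __name__ == "__main__":
    total = 6  # six nodes, assumed one vote each
    print("votes needed:", quorum_threshold(total))
    # During the stack upgrade each node would only see itself.
    print("isolated node quorate:", has_quorum(1, total))
```

Under these assumptions an isolated node holds 1 of 6 votes while 4 are required, so no node would remain quorate while the stack is down.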
I recently noticed that my Debian 10 VMs report high disk usage (80-100%) in atop.
Here are some screenshots of the same VM (completely idle) running under several PVE kernels
Any ideas?
Running some basic 'dd' write tests inside the VM...
On a relatively up-to-date PVE node, I run some PBS backups.
Backups are made in snapshot mode, and qemu-guest-agent is installed on all VMs (the QEMU Guest Agent option is enabled as well).
I noticed yesterday that during the backup, I get a lot of iowait on the VM.
Backup started @8:41 and ended...
I have the following rules applied at datacenter level:
Everything is working as expected for the host nodes
But not for a VM (so far, only host nodes are protected by the PVE FW)
When one of my Zabbix proxies tries to ping the VM, it is rejected, as you can see in the logs
I've recently moved all my backup jobs to a dedicated PBS platform.
Thus, I removed CephFS on all involved nodes. However, I noticed that the CephFS mount under /mnt/pve is still there even after rebooting the node. When browsing /mnt/pve/, the bash shell crashes...
Is it normal to leave CephFS...
I don't know why, but since the upgrade to v6.3-2 (with Ceph Octopus), day after day, my backups are no longer using dirty-bitmaps (PBS v1.0-5).
In all logs I have: existing bitmap was invalid and has been cleared.
What could be done to check what is going on?