Really? I noticed snmpd stopped responding when we had an NFS server unavailable, but I didn't check whether it was using soft mounts. If processes don't fail and we can't kill them when a CIFS or NFS volume goes away, that's a major problem.
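For anyone following along, the difference comes down to the NFS mount options. A rough sketch of the soft-mount variant we'd want to test (server name and paths are placeholders, not our setup):

# /etc/fstab entry - 'soft' lets I/O return an error instead of hanging forever
nas01:/export/backup  /mnt/backup  nfs  soft,timeo=150,retrans=3  0  0
# the default 'hard' behaviour leaves processes stuck in D state until the server comes back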
Hi
Using Ceph as your VM storage would be a good approach if you wanted to use drives in the PVE nodes themselves. But you'd need 10GbE networking at least, and probably two separate networks for Ceph public and cluster traffic. Ceph performs well and provides you with good data...
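If it's useful, the public/cluster split is just two settings in ceph.conf; a minimal sketch with placeholder subnets (not our actual addressing):

[global]
    public_network  = 10.10.10.0/24   # client and monitor traffic (example subnet)
    cluster_network = 10.10.20.0/24   # OSD replication and heartbeat traffic (example subnet)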
During some testing tonight sending backups to a CIFS volume, there was a problem with samba on the target server (it consumed over 80GB of RAM and swap, which trashed the server). The unavailability of the CIFS server impacted the backups that were running. I'd expected the backup tasks to...
Just following up on this again. We installed samba on our primary backup target and exposed the raid set as a CIFS volume. A full backup to the CIFS target worked perfectly. No CPU spikes or network drop-outs.
Interestingly, though, the network throughput to the target looked significantly...
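For reference, the share itself was nothing exotic; roughly this in smb.conf (share name, path and user are just examples, not our exact config):

[backup]
    path = /mnt/raid/backup      # mount point of the raid set (example path)
    read only = no
    valid users = backupuser     # account referenced by the PVE CIFS storage entry (example)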
I just found something very interesting. I set up a CIFS share on the same FreeNAS box we've been testing against and used that as the backup target. I sat on the console of a VM and watched its CPU graph while it was being backed up. No spikes, and it never became unresponsive. Also, none of...
Hi sumsum
Thanks for your post. That is exactly the same situation we are seeing on 6.1-2, kernel 5.3.10-1-pve, and pve-manager 6.1-3. And yes, disabling the guest agent doesn't improve things for us either. I've had a console session open during the backups and the console freezes for many...
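To be clear about what we tried, by disabling the guest agent I mean switching the option off on the VM itself, roughly (VM ID is just an example):

qm set 101 --agent 0         # turn off the QEMU guest agent option for VM 101
qm config 101 | grep agent   # confirm the setting took effect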
Hi
IO on the entire platform is minimal at the moment as we're still evaluating Proxmox. The storage side of the platform is Ceph using 100% NVMe drives, with a 40GbE public network and a 10GbE cluster network supporting about 8 test VMs. There are also dual PVE cluster networks and a network for...
Happy new year everybody.
We've migrated some production VMs to our new Proxmox cluster. The only issue we are seeing is that our monitoring system is generating "host unreachable" alerts for the VMs each night during backups. We assumed it was related to pings getting dropped due to network...
Some further info in case it's useful. Looking into this, the slow ops log messages started after a brief network outage yesterday (a spanning tree topology change from a failover test). The messages below were logged in addition to the slow ops messages. They were logged once per minute after...
Just some more data for this: we had the same issue on one of our nodes today. We're running 14.2.5-pve1. "ceph -s" showed
13 slow ops, oldest one blocked for 74234 sec, mon.hv4 has slow ops
On node hv4 we were seeing
Dec 22 13:17:58 hv4 ceph-mon[2871]: 2019-12-22 13:17:58.475...
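In case it helps anyone else, one thing that may be worth trying (just a suggestion from our side, not a confirmed fix) is restarting the monitor that's reporting the stuck ops:

systemctl restart ceph-mon@hv4    # restart the affected monitor (hv4 in our case)
ceph -s                           # check whether the slow ops warning clears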
Hi
That's interesting. These nodes were only reinstalled a few days ago. I expected they'd be up to date, so perhaps those new packages only became available from the test repo I just enabled.
The boxes were working fine before, so the new firmware must be a requirement of the new kernel. Surely apt would pull...
On the other nodes in the cluster I did
root@ed-hv4:~# apt update && apt install pve-kernel-5.3 pve-firmware
Hit:2 http://ftp.au.debian.org/debian buster InRelease
Get:3 http://ftp.au.debian.org/debian buster-updates InRelease [49.3 kB]
Get:1 http://security-cdn.debian.org buster/updates...
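If anyone wants to check the same thing before upgrading, the standard apt tooling shows whether a newer pve-firmware is actually available:

apt-cache policy pve-firmware                          # installed vs candidate version
apt list --upgradable 2>/dev/null | grep -i firmware   # anything firmware-related pending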
I've gone back through my scroll buffers and found some of the upgrades. Details from the first node (the one that failed) are below. The firmware package wasn't updated. I'll include the details from one of the other nodes in another post as it won't let me include them both in the one post (too...
Hi
Just FYI, we lost our Broadcom-based 10GbE NICs (bnx2x) after installing the new kernel. We had to bring up a link on an onboard 1GbE NIC and install the pve-firmware package to get the node back on the network.
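Rough outline of the recovery in case anyone hits the same thing (interface name and addresses are examples, yours will differ):

ip link set eno1 up                      # bring up the onboard 1GbE NIC
ip addr add 192.168.1.50/24 dev eno1     # temporary management address
ip route add default via 192.168.1.1     # temporary default gateway
apt update && apt install pve-firmware   # pull in the missing bnx2x firmware
reboot                                   # reload the NIC driver with the firmware present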
Thanks
David
...
Hi
We worked this out over the weekend. The issue with the OSDs not starting was related to ownership of the device mappers. They were still owned by root, so running ceph-osd as the 'ceph' user was failing with permission denied.
We resolved the problem and tested the process a couple of...
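For anyone in the same spot, a sketch of the sort of commands involved (device and OSD numbers are examples, and not necessarily the exact steps we ran):

ls -l /dev/dm-*                        # confirm which mapper devices are still root-owned
chown ceph:ceph /dev/dm-0 /dev/dm-1    # hand the example devices back to the ceph user
ceph-volume lvm activate --all         # re-activate the OSDs
systemctl status ceph-osd@0            # check an OSD service came up (example OSD ID)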
Hi
We're testing PVE and Ceph, trying out failure conditions. We're simulating a failed cluster node, where we want to reinstall the node and bring up the existing Ceph OSDs. There are a few forum threads and notes about trying to reinstall a node, but nothing clear or complete that we've been able to...