Same issue here upgrading from 6.4 to 7.2. I needed to install ifupdown2 prior to rebooting. Miss that step and you have to get physical console access to the node and comment out the 'auto ...' lines in the interfaces file to regain access via the vmbrX interface.
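For anyone else hitting this, a rough sketch of the order that worked for me (ifupdown2 is the stock package from the Proxmox/Debian repos; adjust to your own upgrade procedure):

apt update
apt install ifupdown2    # do this BEFORE rebooting into 7.x so the vmbrX bridges come back up
# then continue with the dist-upgrade and reboot as per the upgrade guide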
The upgrade documentation states, "The...
I've noticed something similar on our 5-node cluster with a 10Gbps migration network. When I update my nodes I use the following process: migrate all VMs from node 1 to node 2, update and reboot node 1, then migrate the VMs back to node 1. If my nodes have been running for a while I see slow...
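Roughly what that cycle looks like from the CLI, in case it helps (VM IDs and node names are just examples; I normally use the GUI bulk migrate):

qm migrate 101 node2 --online    # drain node1, repeat per VM or bulk migrate
apt update && apt dist-upgrade   # on node1
reboot                           # node1
qm migrate 101 node1 --online    # run from node2 once node1 is back up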
Thank you!
The first command yields the same output on my production and updated POC clusters. The second command does show 5.2.0-4 versus 5.2.0-3, so it appears to have worked. I live-migrated the VMs back and forth on the POC. I'll update and do the same migration test on the production cluster now.
I've noticed the same phenomenon with one of our 2012R2 servers. It happened at the same time we had the issue reported in this thread, but it has also happened before this thread's issue came up. I noticed it the first time after a Windows update.
We were affected too. 5 node cluster running ceph on servers about 18 months old. 600TB of spinners.
I've disabled Proxmox backups to NFS and to a PBS until the issue is resolved, as it appears to have been triggered during a backup.
Thank you for the reply. Doesn't sound like something I'd use.
Just for reference, is there a way to tell if it's actually running? I added the following to my ceph.conf file on a POC cluster but wasn't sure if it was truly enabled?? I didn't see any increase in RAM usage beyond what was...
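For anyone else wondering the same thing, this is roughly how I'd check whether an option from ceph.conf is actually in effect on a running daemon (the option name below is just a placeholder for whatever you added, and the command has to run on the node hosting that OSD):

ceph daemon osd.0 config get <option_name>              # value the running OSD is actually using
ceph daemon osd.0 config diff | grep -i <option_name>   # or list everything that differs from defaults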
Has anyone ever tried using this feature? I've added it to the [global] section of the ceph.conf on my POC cluster but I'm not sure how to tell if it's actually working.
TIA
I had heard there were issues with 14.2.5 so I've been waiting for a new release. I see Nautilus 14.2.6 has been released. Will there be a corresponding release from Proxmox?
TIA
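In the meantime, a quick way to see what's running versus what's installed (as I understand it):

ceph versions                   # Ceph release each daemon in the cluster is running
ceph --version                  # Ceph version installed on this node
pveversion -v | grep -i ceph    # Ceph-related packages as Proxmox sees them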
FYI - Last night I applied the latest updates from the no-subscription repository. Now I get the following when I bulk-migrate VMs from node to node.
Check VM 309: precondition check passed
Migrating VM 309
Use of uninitialized value $val in pattern match (m//) at /usr/share/perl5/PVE/RESTHandler.pm...
New drive installed. Since the OSD was already down and out I destroyed it, shut down the node, and replaced this non-hot-swappable drive in the mid-bay of the server. Booted it back up, tested the drive, recreated the OSD, and associated it with the NVMe for DB/WAL. Worked like a charm!
Thx...
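For the record, this is roughly the CLI equivalent of what I did (the OSD ID and device names are examples, adjust to your node):

pveceph osd destroy <osd-id> --cleanup                # remove the failed OSD and wipe its disk
pveceph osd create /dev/sdX --db_dev /dev/nvme0n1     # recreate on the new disk with DB/WAL on the shared NVMe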
Can anyone confirm whether destroying an OSD via the GUI will also destroy the associated DB/WAL I initially created on the NVMe? Then just create the replacement OSD on the new drive referencing the NVMe as before??
TIA!
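While waiting for an answer, this is how I'd check what is actually left on the NVMe before and after the destroy (assuming the DB LVs were created by ceph-volume):

ceph-volume lvm list    # shows each OSD's data device and any block.db LV on the NVMe
lsblk /dev/nvme0n1      # quick look at which LVs still sit on the NVMe itself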
Dell confirmed the failing drive via iDRAC. The replacement is on the way. Is the process for replacing a drive with an associated DB/WAL via the version 6 GUI documented somewhere?
These are 8TB spinners in (5) brand new Dell R740xd servers. I'll contact Dell and get this drive replaced. I'll also check the Raw_Read_Errors on the other spinners. I did have a couple of the other adjacent drives go down and out last week. I <think> it was the other drives with Raw_Read_Errors.
Just ran a smartctl and got the following. I also ran it on the other 16 drives in this node. sdm and sdn also have Raw_Read_Errors. The rest look clean. I might have a few drives which need replacing??
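For anyone following along, the per-disk checks I ran were roughly these (device names match the ones above):

smartctl -H /dev/sdm                       # overall SMART health verdict
smartctl -A /dev/sdm | grep -i raw_read    # just the Raw_Read_Error_Rate attribute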
I have an OSD which keeps toggling to down and out. Here's what I'm seeing in the syslog. Any clue here why this would be happening?
Sep 23 02:22:26 SeaC01N02 kernel: [533115.376053] sd 0:0:16:0: [sdo] tag#262 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Sep 23 02:22:26 SeaC01N02...
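In case it helps anyone with the same symptom, the other places I'd look (the OSD ID is a placeholder, the device name is from my log above):

ceph osd tree | grep -i down    # which OSDs are currently down/out
journalctl -u ceph-osd@16 -f    # follow that OSD's daemon log on its node
dmesg -T | grep sdo             # kernel-level errors for the backing disk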
Thanks for the reply Alwin!
1. Increasing the RAM on the Windows VM did indeed allow me to get through my copy process without hitting the wall. Of course, doubling the size of the copy job got me back to a wall, just further on in the process. :) I now understand the caching going on on...
Here are my rados results for the tests shown. I don't see client throughput or IOPS anywhere near that high?? Not sure what could be going on with my config. Just trying to get an idea of where to start looking...
rados -p testpool bench 60 write -b 4M -t16 --no-cleanup
Total time run...
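For reference, the full sequence I ran was roughly this (testpool is just a pool I created for benchmarking):

rados -p testpool bench 60 write -b 4M -t 16 --no-cleanup   # write test, keep the objects
rados -p testpool bench 60 seq -t 16                        # sequential reads of those objects
rados -p testpool bench 60 rand -t 16                       # random reads
rados -p testpool cleanup                                   # remove the benchmark objects afterwards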