I re-added p38 and it worked this time.
Now I will move VMs over and update and reboot old nodes.
Somehow, something went wrong when adding it the first time.
I guess the HTTP GUI process for adding nodes could be improved to address issues like mine.
I removed node p38, and now the GUI shows the cluster.
Now back to the original plan.
Hopefully someone will find this information useful in the future; that's why I'm writing all this down.
It is not a bug, but expected behavior. You told PVE to delete the VM with its disk, and it did just that.
The data was actually removed only after the running VM was shut down, because the VM process still held an open file handle on the disk file, which allowed you to transfer the data off beforehand.
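The mechanism can be demonstrated in isolation: on Linux, a deleted file stays readable through any process that still holds it open. The file names below are made up for the demo; with a real VM you would look for the deleted disk image under /proc/&lt;kvm-pid&gt;/fd instead.

```shell
#!/bin/sh
# Demo: a "deleted" file is still recoverable via an open file descriptor.
echo "vm disk data" > /tmp/disk.img   # stand-in for the VM disk image
exec 3< /tmp/disk.img                 # keep it open, like the running VM does
rm /tmp/disk.img                      # "delete the VM with its disk"
# The inode still exists; copy it back through the open descriptor:
cat "/proc/$$/fd/3" > /tmp/recovered.img
exec 3<&-                             # now the last reference is gone
cat /tmp/recovered.img                # → vm disk data
```

Once the process closes its last descriptor (the VM shuts down), the inode is freed and the data is really gone.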
I wanted to follow the procedure as described above, but noticed I no longer have a cluster, while the cluster still works?!?
So the GUI shows no cluster, but it does show the cluster nodes, and pvecm reports as usual.
root@p35:~# pvecm status
Cluster information
-------------------
Name: XYZ
Config...
Hi @tom
thank you for your suggestion. I was just in the process of updating.
I want to add a new node to the cluster so I can live-migrate VMs to it, then update and reboot the old node.
Repeat the process with all nodes.
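For reference, the per-node rotation I have in mind looks roughly like this (the VM ID, node name, and cluster IP are placeholders, not from my setup):

```shell
# On the new node: join it to the existing cluster (IP of an existing member).
pvecm add 192.0.2.35              # placeholder cluster IP

# On an old node: live-migrate each VM to the freshly joined node.
qm migrate 100 p38 --online       # VMID 100 and target node p38 are examples

# With the old node empty, update and reboot it.
apt update && apt dist-upgrade -y
reboot
```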
But I can not do that, because I can not add the new node to migrate the VMs to...
I ran out of time to continue debugging.
I will provide more info from the new node, which is now running separately from this cluster network, so I can look at its files.
However, I think something went horribly wrong when joining the cluster, and it might even be a bug.
Looking at the logs on the primary cluster node, where the new cluster node was joined, I see these errors:
Jan 29 14:44:38 p35 pmxcfs[6037]: [dcdb] notice: wrote new corosync config '/etc/corosync/corosync.conf' (version = 4)
Jan 29 14:44:38 p35 corosync[6204]: [CFG ] Config reload requested by...
I did start it again, to get some more errors.
Once it started, the cluster on the old nodes stopped working. pvecm status did not return any value as long as the new node was online, and this was logged:
Jan 29 15:06:19 p35 corosync[6204]: [TOTEM ] A new membership (1.55d) was formed. Members
Jan 29...
Hi,
I did what I have done many times before: added a node to an existing cluster. After adding it, the whole cluster went down (VMs were still running, but PVE stopped).
Here is how it looked on one node:
[Fri Jan 29 14:46:36 2021] INFO: task pvesr:42198 blocked for more than 120 seconds.
[Fri Jan 29 14:46:36...
Uf... you can not.
ZFS does not (yet) support removing devices.
You basically created a RAID 0 (ZFS striped over two disks), and if you remove one, half of the data will be missing.
There are two options:
- Create a new pool and send the data over.
- Create a RAID 10 with 4 disks, by adding a mirror...
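Assuming the pool is called `tank` and the disks are `sda`..`sdd` (all placeholders), the two options would look roughly like:

```shell
# Option 1: new pool, then replicate the data over with send/receive.
zpool create tank2 mirror /dev/sdc /dev/sdd
zfs snapshot -r tank@move
zfs send -R tank@move | zfs receive -F tank2

# Option 2: turn the stripe into RAID 10 by attaching a mirror to each vdev.
zpool attach tank /dev/sda /dev/sdc   # sdc becomes a mirror of sda
zpool attach tank /dev/sdb /dev/sdd   # sdd becomes a mirror of sdb
```

Option 2 keeps the pool online the whole time; option 1 needs enough spare disks to hold a full copy.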
Tuxis, tnx for the info.
I know the SLOG is used only for sync writes, so it really should not be a choke point that slows down the SSDs as well.
tburger, tnx for the ideas.
I have this SAS controller:
02:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT...
Since you are using failing SATA drives, when a failure occurs, the system retries access to the data many times.
While it is doing so, it can not use the other SATA drives, and other arrays can also be affected.
I understand that you use different SATA controllers, but obviously they somehow influence each other.
If...
I agree I could do some more testing, but I do not have the time at the moment, nor do I wish to experiment on a production cluster.
I will just pull these HDDs out and create HDD-only nodes. When I have the time, I will set up another node to test this scenario.
I also think I know what your problem is and will reply...
I have the same assumption, but I have no idea how to monitor queue depth on the hardware (SATA/SAS) controller. I might be able to monitor and adjust queue depth in ZFS, as I vaguely remember such options.
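Two places I would start looking, assuming the sysstat package is installed; the tunables are from the ZFS-on-Linux module, and their names and defaults may differ between versions:

```shell
# Per-device queue depth: watch the 'aqu-sz' (average queue size) column.
iostat -x 5

# ZFS per-vdev I/O scheduler limits live under the module parameters:
cat /sys/module/zfs/parameters/zfs_vdev_max_active
cat /sys/module/zfs/parameters/zfs_vdev_async_write_max_active
# Lowering async-write concurrency can keep a busy HDD pool from
# saturating a shared controller (example value, tune to taste):
echo 5 > /sys/module/zfs/parameters/zfs_vdev_async_write_max_active
```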
They are Seagate Enterprise Capacity 3.5 HDD 6TB 7200RPM 12Gb/s SAS 128 MB Cache Internal...
Hi guys,
a few days ago, I did a backup import onto the HDD pool on a server that also has an SSD pool.
After a few minutes, all guest VMs (they run only on the SSD pool) started reporting problems with hung tasks, etc., and services on them stopped working.
The host had high IOWait and low CPU usage...