I have narrowed down the issue some more.
It seems the issue starts when the hosts are running backups of the virtual machines.
I start getting I/O errors on some of my virtual machines, and the file systems get corrupted on some of them (those with larger disk images, ranging from 250 to...
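The thread doesn't mention it, but a common way to reduce backup I/O pressure on shared storage is to cap vzdump's bandwidth and I/O priority in /etc/vzdump.conf. The values below are purely illustrative, not taken from this thread:

```text
# /etc/vzdump.conf - illustrative values, not from this thread
# bwlimit is in KiB/s; ionice lowers the I/O priority of the backup process
bwlimit: 51200
ionice: 7
```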
Update:
Switching the VM disk images to writethrough cache seems to improve the situation; however, fsyncs to the FUSE file system are noticeably slower on Proxmox 4 than they were on 2.x.
Previous system: Proxmox 2.x.
Upgraded to 4.x.
Additional info: the VM disk images are on a shared folder mounted with FUSE (MooseFS).
This setup has run without issues for several years on Proxmox 2.x.
VMs under heavy load lose disk access.
Linux VM logs:
How do I stop and then start all Proxmox services on a clustered node?
I still have VMs running on the cluster, and only want to restart services on specific nodes.
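A minimal sketch of restarting the core services on one node, assuming the standard Proxmox 4.x unit names (running KVM guests are separate processes, so they should survive a daemon restart, but verify on a non-critical node first). It prints the commands rather than running them; drop the echo to actually restart:

```shell
# Sketch only: restart the core Proxmox VE daemons on a single node.
# Service names are the usual PVE 4.x units; adjust for your version.
services="pve-cluster pvedaemon pvestatd pveproxy spiceproxy"
for s in $services; do
  echo "systemctl restart $s"   # remove the echo to actually restart
done
```

pve-cluster goes first since the other daemons depend on the /etc/pve mount it provides.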
The web interface is completely down at the moment.
Random services are crashing on both servers I restarted (no VMs running, and I am unable to start any VM on them after the restart).
[ 720.224373] Call Trace:
[ 720.224376] [<ffffffff8185c215>] schedule+0x35/0x80
[ 720.224378] [<ffffffff8185c4ce>]...
It seems some services are not starting:
vwk-prox04:~# service pve-manager status
● pve-manager.service - PVE VM Manager
Loaded: loaded (/lib/systemd/system/pve-manager.service; enabled)
Active: activating (start) since Mon 2017-04-24 09:51:37 SAST; 1h 11min ago
Main PID: 2275 (pvesh)...
The server has just rebooted; however, I still get the following error:
[ 240.224061] INFO: task spiceproxy:1829 blocked for more than 120 seconds.
[ 240.224116] Tainted: P O 4.4.49-1-pve #1
[ 240.224160] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message...
Unless Proxmox has some built-in NFS mount that I am unaware of, there should not be any.
vwk-prox01:~# pvesm status
MFS dir 1 57430241024 41122953984 16307287040 72.11%
local dir 1 98952796 87066000 6837248 93.22%
local-lvm lvmthin 1...
I can SSH in and run the command on all servers, no issues.
On the servers with the error, however, I am unable to access the /etc/pve/qemu-server folder.
The terminal stops responding completely.
Fortunately, all VMs are still responsive.
I have shut down all the VMs on one of the hosts, and tried...
I think I finally got it!
I forced the removal of the physical volume on the old drive:
pvremove /dev/sdb1 --force --force
That cleared the duplicate VG name, and everything seems to still be up and running.
Lol yeah, that's putting it mildly.
I have 7 servers with exactly the same issue. If I screw up, I do so with consistency.
Fortunately this issue has no bad side effects, except for severely reduced local storage space.
In most cases I should be able to determine the old drives reliably.
Sure, gotcha - the UUID I need to remove is rj3oK2-fJha-Dnj0-ssBF-yBX5-T5Hf-copeiM
vgrename rj3oK2-fJha-Dnj0-ssBF-yBX5-T5Hf-copeiM pve_old
WARNING: Duplicate VG name pve: rj3oK2-fJha-Dnj0-ssBF-yBX5-T5Hf-copeiM (created here) takes precedence over 71h4As-Gd2h-MwFn-tjRo-QJHn-qx06-JL4Ag3...
Ah, sorry - I am unable to rename because of a missing PV, and unable to remove the missing PV because of the duplicate VG name.
vgimportclone -n pve_old /dev/sdb1
WARNING: Duplicate VG name pve: rj3oK2-fJha-Dnj0-ssBF-yBX5-T5Hf-copeiM (created here) takes precedence over 71h4As-Gd2h-MwFn-tjRo-QJHn-qx06-JL4Ag3...
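For anyone hitting the same wall: a quick way to confirm which VG names are duplicated is to pipe `vgs -o vg_name,vg_uuid --noheadings` through awk. The sample output below is hard-coded from this thread's UUIDs so the pipeline can be shown without a live LVM setup; on a real host you would substitute the actual `vgs` call:

```shell
# Detect duplicate VG names from vgs output.
# Sample is hard-coded from this thread; on a real host use:
#   vgs -o vg_name,vg_uuid --noheadings
vgs_output="pve rj3oK2-fJha-Dnj0-ssBF-yBX5-T5Hf-copeiM
pve 71h4As-Gd2h-MwFn-tjRo-QJHn-qx06-JL4Ag3"
echo "$vgs_output" | awk '{print $1}' | sort | uniq -d
```

Any name this prints appears more than once, so it has to be renamed or removed (here, by UUID) before the usual VG commands behave predictably.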