host frozen

ssomm

Hi, we put a single Proxmox server with local RAID10 storage into production a couple of weeks ago. We have four running VMs with local images. Everything worked fine until this morning: the server was in a frozen state and the VMs were obviously unreachable. From the GUI I couldn't find anything in the task history nor in the syslog. I'm not able to find an explanation for this problem. Any suggestion will be very appreciated, thanks
 
Hi,
there are a lot of possibilities...
BIOS updates, faulty power supply, faulty RAM, overheating (dirty fans).
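A few commands that can help narrow those tests down (just a sketch, assuming ipmitool, lm-sensors and memtester are installed on the node; none of these names come from this thread):

ipmitool sel list    # hardware event log: PSU faults, fan failures, thermal trips
sensors              # current temperatures and fan speeds
journalctl -b -1 -e  # end of the previous boot's journal (needs a persistent journal)
memtester 1024M 1    # in-OS RAM test of 1 GiB, one pass (memtest86+ is more thorough)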

Udo

Thanks Udo for the reply. I'm checking the system but, at the moment, I can exclude overheating and a faulty power supply because this is one node in a Dell C6100 chassis and the other nodes had no problems; power and fans are shared between nodes. I will check the RAM with memtest as soon as possible, but it takes so long... I have to migrate the VMs to another node before checking it. I suppose the quickest way to do that is to build a cluster, so I have a second question, if possible: can I create a cluster on a running Proxmox server with running VMs, add a second clean Proxmox server, migrate the VMs to the clean one, and then check the RAM of the first server? (I read a similar question in this forum but the answer wasn't clear to me.) Thanks again
 

Hi,
yes, you can create a cluster with running VMs; only the nodes which join the cluster must be empty.
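For reference, a minimal sketch of the commands involved (the cluster name and IP are placeholders, not taken from this thread):

pvecm create mycluster            # on the existing node; its VMs keep running
pvecm add <IP-of-existing-node>   # on the new, empty node: join via the first node's IP
pvecm status                      # verify that both nodes are in the cluster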

Nevertheless, it's always a good idea to have a valid backup of the VMs (and an additional backup of /etc/pve).
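For example (the VMID and storage name are placeholders):

vzdump 100 --mode snapshot --storage local --compress lzo  # full backup of VM 100
tar czf /root/pve-etc-backup.tgz /etc/pve                  # copy of the cluster config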

Udo
 
Working on the same server as ssomm...
it seems there's a problem right after creating the cluster, at the very first step, on the first node :(

pvecm create cluster_name

After the reboot we see that Proxmox starts, but none of the VMs start because of a corosync error.
In short: /etc/pve/corosync.conf contains the server's old IP address, 192.168.100.158, instead of 192.168.1.241 (and obviously so does the copied file /etc/corosync.conf). How do we correct this issue? We found the procedure below for changing the read-only file but, as we have 5 VMs on the first node, we'd like a go/no-go from an expert :)
https://pve.proxmox.com/wiki/Editing_corosync.conf
service pve-cluster stop  # stop the normal cluster filesystem first
service corosync stop
pmxcfs -l                 # restart the cluster filesystem in local mode
# ... edit /etc/pve/corosync.conf (and increase config_version) ...
killall pmxcfs            # stop the local-mode instance again

service corosync start
service pve-cluster start
service pvedaemon restart
service pveproxy restart
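For the edit itself, these are roughly the fields that would have to change in our case (a sketch of the corosync 2.x format PVE uses; the node name is a placeholder):

totem {
  cluster_name: cluster_name
  config_version: 2            # must be increased on every edit, or the change is rejected
  interface {
    ringnumber: 0
    bindnetaddr: 192.168.1.0   # network of the corrected address
  }
}
nodelist {
  node {
    name: node1                # placeholder
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.1.241  # was 192.168.100.158
  }
}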

PS: as a workaround, for now we changed the file in /etc (which has rw access) and ran service corosync start ... but if we reboot, Proxmox will replace it with the wrong version from /etc/pve.
Thanks for any help
Nicola
 
