node nosync after 1.8 upgrade

lozair

Member
Nov 4, 2008
Hi all,
After upgrading from 1.7 to 1.8 I have a node which won't sync to the cluster.
I have removed the node from the cluster and added it again, but I always get the same result:
On the master node pveca -l :

CID----IPADDRESS----ROLE-STATE--------UPTIME---LOAD----MEM---DISK
 1 : 10.16.2.1        M     A          16:01    0.05     3%     4%
 2 : 10.16.2.2        N     A          16:34    0.37    24%     3%
 3 : 10.16.2.3        N     A          17:54    0.78    29%     3%
 4 : 10.16.2.4        N     A          23:59    0.23    36%     3%
 6 : 10.16.2.6        N     A    1 day 00:46    0.09    24%     3%
 7 : 10.16.2.5        N     S          22:29    0.15    26%     3%


On the failed node pveca -l :
CID----IPADDRESS----ROLE-STATE--------UPTIME---LOAD----MEM---DISK
 1 : 10.16.2.1        M     A          16:03    0.13     3%     4%
 2 : 10.16.2.2        N     A          16:37    1.70    24%     3%
 3 : 10.16.2.3        N     A          17:56    0.17    29%     3%
 4 : 10.16.2.4        N     ERROR: 500 Server closed connection without sending any data back
 6 : 10.16.2.6        N     A    1 day 00:49    0.05    24%     3%
 7 : 10.16.2.5        N     S          22:31    0.70    26%     3%


Thanks for your help

Regards
 
Hi,
I have the following in the messages file:
Jul 4 09:00:30 virt5 pvemirror[19387]: starting cluster syncronization
Jul 4 09:00:30 virt5 pvemirror[19387]: syncing master configuration from '10.16.2.1'
Jul 4 09:00:31 virt5 pvemirror[19387]: syncing vzlist from '10.16.2.4' failed: 500 Server closed connection without sending any data back
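
Since the error is about the request to '10.16.2.4', I suppose the next check is whether the pve daemon is still up on that node. Something like this (init script name assumed from a standard 1.x install, so double-check on your side):

(on 10.16.2.4)
# ps aux | grep pve
# /etc/init.d/pvedaemon restart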
 
We use shared storage on a SAN.
Our VMs use LVM disks on the SAN.
We use only KVM virtualisation, no OpenVZ.
Everything seems OK on the cluster; all our 50 VMs are running smoothly.
Deleting the failing node and adding it again doesn't resolve the problem.
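
For reference, the remove/re-add sequence I used was roughly this (CID 7 is the failing node in the list above; syntax as I remember it from the 1.x docs, so double-check before running):

(on the master, 10.16.2.1)
# pveca -d 7

(on the failing node, 10.16.2.5)
# pveca -a -h 10.16.2.1
# pveca -l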
 
yep, all seems ok:

virt5:~# pvesm list
backup   dir 0 1   83214928    8881340 11%
guests   lvm 0 1 1572851712 1279275008 81%
local    dir 0 1   83214928    8881340 11%
 
Try to remove the following file on the failing node:

# rm /var/lib/pve-manager/vzlist

Does that help?
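
If it does not, watch the next sync run on the failing node and see whether the vzlist error is still logged (judging by your log, pvemirror runs the sync periodically, so a minute or two of the messages file should be enough):

# tail -f /var/log/messages | grep pvemirror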
 
I have removed the file but nothing is better.
The node still shows the same nosync status.
 
It seems that the "virt5" node has a specific problem with the "virt4" node.
How does the sync mechanism work? I had a look at the Perl PVE:: modules.
It seems everything is encapsulated over the SSH port, is that correct?
I tried an scp transfer between the hosts and it works fine.
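
The ssh/scp test I did looked roughly like this, run as root from virt5 (pveversion on the remote side is just an extra check to see what the other node reports, not something taken from the sync code):

# ssh 10.16.2.4 hostname
# ssh 10.16.2.4 pveversion
# scp /etc/hosts 10.16.2.4:/tmp/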
 
OK, the problem is resolved.
The failing node was not updated; the update procedure had failed and we didn't notice.
All the other nodes are on 1.8 and this node was still on 1.7.
Re-updating the node solved the problem.
Everything is synchronized now.
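
For anyone hitting the same thing: a quick way to spot such a version mismatch is to compare pveversion on every node, e.g. from the master (the IPs are from our cluster):

# for ip in 10.16.2.1 10.16.2.2 10.16.2.3 10.16.2.4 10.16.2.5 10.16.2.6; do printf '%s: ' "$ip"; ssh $ip pveversion; done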

Thanks for your help