Hi to all
If somebody can help me, I would be very grateful.
I have a serious problem with a VM that is under HA:
1- The VM crashes about once a week, or becomes extremely slow after about a week of uptime:
- I can't connect to the VM by SSH while it is slow.
- When I connect by SSH to the PVE host proxmox7 while the VM is slow, I see that the VM is using all cores of my host, at 162% CPU.
I first run "qm stop 112" from the CLI (because the "shutdown" option gives me a timeout). Once the VM is no longer running, I run htop on the PVE host proxmox7 and see that some other process shows very high usage.
2- The PVE host shows these messages in the "Syslog" tab of the PVE GUI:
Code:
May 26 10:45:01 kvm7 /USR/SBIN/CRON[15429]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
May 26 10:45:07 proxmox7 pmxcfs[2903]: [status] notice: received log
May 26 10:45:33 proxmox7 rgmanager[15461]: [pvevm] VM 112 is running
May 26 10:45:43 proxmox7 rgmanager[15489]: [pvevm] VM 112 is running
May 26 10:46:23 proxmox7 rgmanager[15545]: [pvevm] VM 112 is running
May 26 10:46:43 proxmox7 rgmanager[15580]: [pvevm] VM 112 is running
May 26 10:47:03 proxmox7 rgmanager[15615]: [pvevm] VM 112 is running
May 26 10:47:43 proxmox7 rgmanager[15670]: [pvevm] VM 112 is running
May 26 10:47:53 proxmox7 rgmanager[15698]: [pvevm] VM 112 is running
May 26 10:48:23 proxmox7 rgmanager[15746]: [pvevm] VM 112 is running
May 26 10:49:03 proxmox7 rgmanager[15795]: [pvevm] VM 112 is running
May 26 10:49:13 proxmox7 rgmanager[15823]: [pvevm] VM 112 is running
May 26 10:49:53 proxmox7 rgmanager[15878]: [pvevm] VM 112 is running
May 26 10:50:13 proxmox7 rgmanager[15913]: [pvevm] VM 112 is running
May 26 10:50:33 proxmox7 rgmanager[15954]: [pvevm] VM 112 is running
May 26 10:51:14 proxmox7 rgmanager[16003]: [pvevm] VM 112 is running
May 26 10:51:23 proxmox7 rgmanager[16037]: [pvevm] VM 112 is running
May 26 10:51:53 proxmox7 rgmanager[16079]: [pvevm] VM 112 is running
May 26 10:52:23 proxmox7 rgmanager[16127]: [pvevm] VM 112 is running
May 26 10:52:33 proxmox7 rgmanager[16155]: [pvevm] VM 112 is running
May 26 10:53:03 proxmox7 rgmanager[16197]: [pvevm] VM 112 is running
May 26 10:53:23 proxmox7 rgmanager[16238]: [pvevm] VM 112 is running
May 26 10:53:43 proxmox7 rgmanager[16273]: [pvevm] VM 112 is running
May 26 10:54:13 proxmox7 rgmanager[16315]: [pvevm] VM 112 is running
May 26 10:54:23 proxmox7 rgmanager[16349]: [pvevm] VM 112 is running
May 26 10:54:44 proxmox7 rgmanager[16384]: [pvevm] VM 112 is running
... (the same message keeps repeating)
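To put numbers on how often the message repeats, here is a minimal Python sketch that measures the gaps between consecutive "VM 112 is running" lines. The sample lines are copied from the excerpt above; the function name `check_intervals` and the placeholder year are my own assumptions (syslog timestamps carry no year).

```python
from datetime import datetime

# Sample lines copied from the syslog excerpt above.
LOG = """\
May 26 10:45:33 proxmox7 rgmanager[15461]: [pvevm] VM 112 is running
May 26 10:45:43 proxmox7 rgmanager[15489]: [pvevm] VM 112 is running
May 26 10:46:23 proxmox7 rgmanager[15545]: [pvevm] VM 112 is running
May 26 10:46:43 proxmox7 rgmanager[15580]: [pvevm] VM 112 is running
May 26 10:47:03 proxmox7 rgmanager[15615]: [pvevm] VM 112 is running
"""

# Placeholder year (assumption): only the differences between stamps matter.
PLACEHOLDER_YEAR = "2000"


def check_intervals(log_text, needle="VM 112 is running"):
    """Return the gaps in seconds between consecutive matching log lines."""
    stamps = []
    for line in log_text.splitlines():
        if needle in line:
            # The first 15 characters are the syslog timestamp, e.g. "May 26 10:45:33".
            stamp = datetime.strptime(
                PLACEHOLDER_YEAR + " " + line[:15], "%Y %b %d %H:%M:%S"
            )
            stamps.append(stamp)
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


intervals = check_intervals(LOG)
print(intervals)
```

On these five lines the gaps are between 10 and 40 seconds, so the status message is logged several times per minute.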
3- When I configured this VM for HA, it did not boot automatically (with "service rgmanager" and "join_fence" started), so I had to click "Start" in the PVE GUI to start it.
- This is the relevant part of my cluster.conf:
Code:
<rm>
  <pvevm autostart="1" vmid="112" domain="VM-Mail"/>
  <pvevm autostart="1" vmid="113" domain="VM-Order"/>
  <failoverdomains>
    <failoverdomain name="VM-Mail" restricted="1" ordered="1" nofailback="1">
      <failoverdomainnode name="proxmox7" priority="1"/>
      <failoverdomainnode name="proxmox8" priority="10"/>
    </failoverdomain>
    <failoverdomain name="VM-Order" restricted="1" ordered="1" nofailback="1">
      <failoverdomainnode name="proxmox8" priority="1"/>
      <failoverdomainnode name="proxmox7" priority="10"/>
    </failoverdomain>
  </failoverdomains>
</rm>
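As a sanity check on this fragment, the following Python sketch verifies that every "pvevm" entry points at a failover domain that is actually defined, and reports which node each ordered domain prefers (in an ordered failover domain the lowest priority number is the most preferred). The helper names `undefined_domains` and `preferred_node` are hypothetical, and the script only checks the consistency of the XML. It says nothing about why rgmanager does not start the VM.

```python
import xml.etree.ElementTree as ET

# The <rm> fragment copied from the cluster.conf excerpt above.
RM_FRAGMENT = """
<rm>
  <pvevm autostart="1" vmid="112" domain="VM-Mail"/>
  <pvevm autostart="1" vmid="113" domain="VM-Order"/>
  <failoverdomains>
    <failoverdomain name="VM-Mail" restricted="1" ordered="1" nofailback="1">
      <failoverdomainnode name="proxmox7" priority="1"/>
      <failoverdomainnode name="proxmox8" priority="10"/>
    </failoverdomain>
    <failoverdomain name="VM-Order" restricted="1" ordered="1" nofailback="1">
      <failoverdomainnode name="proxmox8" priority="1"/>
      <failoverdomainnode name="proxmox7" priority="10"/>
    </failoverdomain>
  </failoverdomains>
</rm>
"""


def undefined_domains(rm_xml):
    """Return (vmid, domain) pairs whose domain has no <failoverdomain> definition."""
    root = ET.fromstring(rm_xml)
    defined = {d.get("name") for d in root.iter("failoverdomain")}
    return [
        (vm.get("vmid"), vm.get("domain"))
        for vm in root.iter("pvevm")
        if vm.get("domain") not in defined
    ]


def preferred_node(rm_xml, domain):
    """Return the node with the lowest priority number in the given domain."""
    root = ET.fromstring(rm_xml)
    for d in root.iter("failoverdomain"):
        if d.get("name") == domain:
            nodes = d.findall("failoverdomainnode")
            best = min(nodes, key=lambda n: int(n.get("priority")))
            return best.get("name")
    return None


print(undefined_domains(RM_FRAGMENT))
print(preferred_node(RM_FRAGMENT, "VM-Mail"))
print(preferred_node(RM_FRAGMENT, "VM-Order"))
```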
4- These are the package versions on my PVE nodes:
proxmox-ve-2.6.32: 3.2-126 (running kernel: 2.6.32-29-pve)
pve-manager: 3.2-4 (running version: 3.2-4/e24a91c1)
pve-kernel-2.6.32-27-pve: 2.6.32-121
pve-kernel-2.6.32-28-pve: 2.6.32-124
pve-kernel-2.6.32-29-pve: 2.6.32-126
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.5-1
pve-cluster: 3.0-12
qemu-server: 3.1-16
pve-firmware: 1.1-3
libpve-common-perl: 3.0-18
libpve-access-control: 3.0-11
libpve-storage-perl: 3.0-19
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-6
vzctl: 4.0-1pve5
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.7-8
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.2-1
5- This is the configuration of my VM:
boot: c
bootdisk: virtio0
cores: 4
cpu: host
ide2: none,media=cdrom
memory: 8192
name: Centos63-x64-Mail
net0: virtio=866:B3:2F:0A:40,bridge=vmbr1
net1: virtio=D2F:CF:14B:6B,bridge=vmbr0
ostype: l26
sockets: 1
virtio0: Storage_HA-01-proxmox7-proxmox8:vm-112-disk-1,cache=directsync,size=571756M
General notes:
- The servers are Dell.
- My RAID controller is hardware-based (MegaRAID SAS 2008) and has no cache memory, but I don't think that is what makes the VM hang.
- Fencing is not the problem for me (at least for now).
- The problem VM runs CentOS 6.3 with its original kernel. When this VM ran on PVE 2.3 it never had problems, even when it was under HA.
- My PVE hosts and the VM have used the deadline I/O scheduler since I upgraded PVE to version 3.2 (before that, both the hosts and the VM were configured with cfq).
- The LVM volume group "pve" has no free space on my PVE host, but Tom (a Proxmox staff member) told me in the past that free space in this VG is only needed for live backups of CTs running in this VG.
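For anyone who wants to double-check the scheduler note above: Linux lists the available I/O schedulers in `/sys/block/<device>/queue/scheduler` and marks the active one with square brackets. A minimal Python sketch for picking out the active one follows. The function name `active_scheduler` and the sample string are assumptions for illustration, not output captured from my hosts.

```python
def active_scheduler(sysfs_text):
    """Return the scheduler marked with [brackets], or None if none is marked."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


# Illustrative sample of what the sysfs file can contain (assumed, not captured).
sample = "noop [deadline] cfq"
print(active_scheduler(sample))
```

On a real host or guest, the input would be the content of the scheduler file for the disk in question, read with `open(path).read()`.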
My questions:
1- Why does my VM hang or become extremely slow after about a week, and how can I fix it?
2- Why doesn't rgmanager start this VM when I reboot the affected PVE node, or when I applied the HA settings? (Please see my cluster.conf above, and keep in mind that fencing is in manual mode only, i.e. with human interaction, so the VM will never run on the other node until I apply the manual fence.) Could it be that I should not have two "pvevm" entries with different "domain" attributes in my cluster.conf? Or does PVE have a bug?
3- Is it normal for rgmanager to log the "VM 112 is running" message so repetitively? If it is not, how can I fix it?
4- Which is better for the hardware BIOS configuration: power saving controlled by the hardware or by the OS kernel?
Best regards
Cesar