Hi,
I love Proxmox, but we continue to be plagued by Proxmox 2.3 stability issues.
We have broken the problem down into a few different issues:
- IDE drives no longer appear to be reliable (with both local and Ceph storage). For machines that we cannot upgrade to VIO, we are migrating back to the unsupported PVE 1.9 cluster, where these machines run for months without issues (the best case on PVE 2.3 is days, and some VMs do not even last 24 hours). Depending on the guest OS, we get "lost interrupt" (Linux), "soft error", or "timeout".
- WARNING: unable to connect to VM 197 socket - timeout after 31 retries
  We get these in the daemon.log (mostly from pvestatd and pvedaemon, but also from qm).
- The latest OpenBSD (5.3, vio disk, vio network, ballooning) will hang under intense disk I/O (2GB rsync copy).
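In case it helps anyone compare notes on the socket timeout warnings, here is a rough Python sketch for tallying them per VM and per daemon. Only the message text comes from our logs as quoted above; the syslog-style `daemon[pid]:` prefix, the default path `/var/log/daemon.log`, and the name `tally_warnings` are assumptions for illustration and may need adjusting for your setup.

```python
import re
import sys
from collections import Counter

# Message text is taken from the warning quoted above. The optional
# "daemon[pid]:" prefix is an ASSUMPTION about the syslog line layout.
WARNING_RE = re.compile(
    r"(?:(?P<daemon>[\w-]+)\[\d+\]:\s+)?"
    r"WARNING: unable to connect to VM (?P<vmid>\d+) socket"
    r" - timeout after (?P<retries>\d+) retries"
)


def tally_warnings(lines):
    """Count socket-timeout warnings per VM ID and per reporting daemon."""
    per_vm = Counter()
    per_daemon = Counter()
    for line in lines:
        match = WARNING_RE.search(line)
        if match is None:
            continue  # not one of the warnings we care about
        per_vm[int(match.group("vmid"))] += 1
        # Lines without a recognizable prefix are grouped as "unknown".
        per_daemon[match.group("daemon") or "unknown"] += 1
    return per_vm, per_daemon


if __name__ == "__main__":
    # Default path is an assumption (the usual Debian location).
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/daemon.log"
    with open(path, errors="replace") as handle:
        vm_counts, daemon_counts = tally_warnings(handle)
    for vmid, count in vm_counts.most_common():
        print(f"VM {vmid}: {count} warnings")
    for daemon, count in daemon_counts.most_common():
        print(f"{daemon}: {count} warnings")
```

Run it with the log path as the only argument. Comparing the per-VM counts against which guests use IDE versus VIO disks might show whether the timeouts cluster on particular machines.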
I'd like to know where to look/what to try to understand/fix these issues -- all suggestions are appreciated!!
All of our nodes are:
pve-manager: 2.3-13 (pve-manager/2.3/7946f1f1)
running kernel: 2.6.32-19-pve
proxmox-ve-2.6.32: 2.3-95
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-19-pve: 2.6.32-95
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-4
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-36
qemu-server: 2.3-20
pve-firmware: 1.0-21
libpve-common-perl: 1.0-49
libpve-access-control: 1.0-26
libpve-storage-perl: 2.3-7
vncterm: 1.0-4
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.4-10
ksm-control-daemon: 1.1-1