Problems with pve-kernel-2.6.32

Discussion in 'Proxmox VE: Installation and configuration' started by Tallaril, Feb 12, 2012.

  1. Tallaril

    Tallaril New Member

    Joined:
    Feb 18, 2011
    Messages:
    23
    Likes Received:
    0
    Hi there,

    I have a strange problem with the above-mentioned kernel. I'm running a cluster with 3 machines. 2 of them work fine, but the 3rd one is giving me continuous errors in the syslog:

    Feb 12 16:22:29 bazinga kernel: [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 128
    Feb 12 16:22:29 bazinga kernel: [drm:drm_edid_block_valid] *ERROR* Raw EDID:
    Feb 12 16:22:29 bazinga kernel: <3>ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
    Feb 12 16:22:29 bazinga kernel: <3>ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
    Feb 12 16:22:29 bazinga kernel: <3>ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
    Feb 12 16:22:29 bazinga kernel: <3>ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
    Feb 12 16:22:29 bazinga kernel: <3>ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
    Feb 12 16:22:29 bazinga kernel: <3>ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
    Feb 12 16:22:29 bazinga kernel: <3>ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
    Feb 12 16:22:29 bazinga kernel: <3>ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
    Feb 12 16:22:29 bazinga kernel:
    Feb 12 16:22:29 bazinga kernel: [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 128
    Feb 12 16:22:29 bazinga kernel: [drm:drm_edid_block_valid] *ERROR* Raw EDID:
    Feb 12 16:22:29 bazinga kernel: <3>ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
    Feb 12 16:22:29 bazinga kernel: <3>ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
    Feb 12 16:22:29 bazinga kernel: <3>ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
    Feb 12 16:22:29 bazinga kernel: <3>ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
    Feb 12 16:22:29 bazinga kernel: <3>ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
    Feb 12 16:22:29 bazinga kernel: <3>ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
    Feb 12 16:22:29 bazinga kernel: <3>ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
    Feb 12 16:22:29 bazinga kernel: <3>ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................


    This error started the moment I installed this kernel. The Proxmox installation works, but the cluster configuration does not.

    It seems that the FUSE-mounted /etc/pve stops working after "pvecm add".

    Has anybody seen this error before?
     
    #1 Tallaril, Feb 12, 2012
    Last edited: Feb 12, 2012
  2. udo

    udo Well-Known Member
    Proxmox Subscriber

    Joined:
    Apr 22, 2009
    Messages:
    5,835
    Likes Received:
    158
    Hi,
    Which kernel did you install? Is it a PVE installation, or installed on top of squeeze?

    Please post the output of "pveversion -v".

    Udo
     
  3. Tallaril

    This is a squeeze system with a pve kernel.

    # pveversion -v
    pve-manager: 2.0-18 (pve-manager/2.0/16283a5a)
    running kernel: 2.6.32-6-pve
    proxmox-ve-2.6.32: 2.0-55
    pve-kernel-2.6.32-6-pve: 2.6.32-55
    lvm2: 2.02.88-2pve1
    clvm: 2.02.88-2pve1
    corosync-pve: 1.4.1-1
    openais-pve: 1.1.4-1
    libqb: 0.6.0-1
    redhat-cluster-pve: 3.1.8-3
    pve-cluster: 1.0-17
    qemu-server: 2.0-13
    pve-firmware: 1.0-14
    libpve-common-perl: 1.0-11
    libpve-access-control: 1.0-5
    libpve-storage-perl: 2.0-9
    vncterm: 1.0-2
    vzctl: 3.0.29-3pve8
    vzprocps: 2.0.11-2
    vzquota: 3.0.12-3
    pve-qemu-kvm: 1.0-1
    ksm-control-daemon: 1.1-1


    The error points to a problem with radeon or DVI, but this is a server in a datacenter: there is no monitor attached, only onboard graphics.
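
Note on the log: the reported remainder can be reproduced by hand. An EDID block is 128 bytes whose sum must be 0 modulo 256, and a dump of all 0xff is typically what a read returns when no display answers, which would fit a server with no monitor attached. A minimal Python sketch of that arithmetic (the function name `edid_remainder` is made up for illustration, it is not the kernel's code):

```python
# An EDID block is 128 bytes; a valid block's bytes sum to 0 modulo 256.
EDID_BLOCK_SIZE = 128


def edid_remainder(block: bytes) -> int:
    """Return the byte sum of one EDID block modulo 256 (0 means valid)."""
    if len(block) != EDID_BLOCK_SIZE:
        raise ValueError("an EDID block must be exactly 128 bytes")
    return sum(block) % 256


# The block dumped in the syslog: 8 rows of 16 bytes, every byte 0xff.
all_ff = bytes([0xFF]) * EDID_BLOCK_SIZE

# 128 * 255 = 32640, and 32640 % 256 == 128, matching "remainder is 128".
print(edid_remainder(all_ff))
```

So the message most likely says only that no usable EDID data was read, not that the kernel or the cluster stack is damaged.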




    # lspci -k
    00:00.0 Host bridge: Intel Corporation 3200/3210 Chipset DRAM Controller (rev 01)
    Subsystem: Super Micro Computer Inc Device d280
    Kernel driver in use: i3200_edac
    00:01.0 PCI bridge: Intel Corporation 3200/3210 Chipset Host-Primary PCI Express Bridge (rev 01)
    00:06.0 PCI bridge: Intel Corporation 3210 Chipset Host-Secondary PCI Express Bridge (rev 01)
    00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 02)
    Subsystem: Super Micro Computer Inc Device d280
    Kernel driver in use: uhci_hcd
    00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 02)
    Subsystem: Super Micro Computer Inc Device d280
    Kernel driver in use: uhci_hcd
    00:1a.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 02)
    Subsystem: Super Micro Computer Inc Device d280
    Kernel driver in use: uhci_hcd
    00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 02)
    Subsystem: Super Micro Computer Inc Device d280
    Kernel driver in use: ehci_hcd
    00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 02)
    00:1c.4 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 5 (rev 02)
    00:1c.5 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 6 (rev 02)
    00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 02)
    Subsystem: Super Micro Computer Inc Device d280
    Kernel driver in use: uhci_hcd
    00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 02)
    Subsystem: Super Micro Computer Inc Device d280
    Kernel driver in use: uhci_hcd
    00:1d.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 02)
    Subsystem: Super Micro Computer Inc Device d280
    Kernel driver in use: uhci_hcd
    00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 02)
    Subsystem: Super Micro Computer Inc Device d280
    Kernel driver in use: ehci_hcd
    00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
    00:1f.0 ISA bridge: Intel Corporation 82801IR (ICH9R) LPC Interface Controller (rev 02)
    Subsystem: Super Micro Computer Inc Device d280
    00:1f.2 IDE interface: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 4 port SATA IDE Controller (rev 02)
    Subsystem: Super Micro Computer Inc Device d280
    Kernel driver in use: ata_piix
    00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)
    Subsystem: Super Micro Computer Inc Device d280
    Kernel driver in use: i801_smbus
    00:1f.5 IDE interface: Intel Corporation 82801I (ICH9 Family) 2 port SATA IDE Controller (rev 02)
    Subsystem: Super Micro Computer Inc Device d280
    Kernel driver in use: ata_piix
    00:1f.6 Signal processing controller: Intel Corporation 82801I (ICH9 Family) Thermal Subsystem (rev 02)
    Subsystem: Super Micro Computer Inc Device 0000
    01:00.0 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge A (rev 09)
    01:00.1 PIC: Intel Corporation 6700/6702PXH I/OxAPIC Interrupt Controller A (rev 09)
    Subsystem: Super Micro Computer Inc Device d280
    01:00.2 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge B (rev 09)
    01:00.3 PIC: Intel Corporation 6700PXH I/OxAPIC Interrupt Controller B (rev 09)
    Subsystem: Super Micro Computer Inc Device d280
    03:01.0 RAID bus controller: 3ware Inc 9550SX SATA-II RAID PCI-X
    Subsystem: 3ware Inc 9550SX SATA-II RAID PCI-X
    Kernel driver in use: 3w-9xxx
    04:00.0 RAID bus controller: 3ware Inc 9650SE SATA-II RAID PCIe (rev 01)
    Subsystem: 3ware Inc 9650SE SATA-II RAID PCIe
    Kernel driver in use: 3w-9xxx
    0d:00.0 Ethernet controller: Intel Corporation 82573E Gigabit Ethernet Controller (Copper) (rev 03)
    Subsystem: Super Micro Computer Inc Device 108c
    Kernel driver in use: e1000e
    0f:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
    Subsystem: Super Micro Computer Inc Device 109a
    Kernel driver in use: e1000e
    11:04.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)
    Subsystem: Super Micro Computer Inc Device d280
    Kernel driver in use: radeon


    Anyway, the Proxmox installation itself is no problem, and as a single node it should work fine (even with this error). But as soon as I try to connect this server to the cluster, all data in /etc/pve disappears and the cluster configuration fails.

    The only thing I've found in the log is:


    vestatd[6259]: WARNING: ipcc_send_rec failed: Connection refused


    This server is connected to the other nodes via an internal address. (IP multicast worked fine with the other nodes.)


    This is the only machine with this trouble....
     
  4. udo

    Hmm,
    can you ssh to the other cluster nodes without a password?
    Does the output of "pvecm status" give any hints (compare it with the output from the running cluster)?

    Udo
     
  5. Tallaril

    This is the output on the master:

    # pvecm status
    Version: 6.2.0
    Config Version: 13
    Cluster Name: t4hosting
    Cluster Id: 50071
    Cluster Member: Yes
    Cluster Generation: 60
    Membership state: Cluster-Member
    Nodes: 2
    Expected votes: 2
    Total votes: 2
    Node votes: 1
    Quorum: 2
    Active subsystems: 5
    Flags:
    Ports Bound: 0
    Node name: bazinga-2
    Node ID: 1
    Multicast addresses: 239.192.195.91
    Node addresses: 109.234.106.16


    And this is the output on the non-working node:

    # pvecm status
    Version: 6.2.0
    Config Version: 12
    Cluster Name: t4hosting
    Cluster Id: 50071
    Cluster Member: Yes
    Cluster Generation: 12
    Membership state: Cluster-Member
    Nodes: 1
    Expected votes: 3
    Total votes: 1
    Node votes: 1
    Quorum: 2 Activity blocked
    Active subsystems: 2
    Flags:
    Ports Bound: 0
    Node name: bazinga
    Node ID: 3
    Multicast addresses: 239.192.195.91
    Node addresses: 109.234.106.10

    Yes, I can connect via ssh without a password.
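
For comparing two "pvecm status" outputs like the ones above, a small Python sketch can list only the fields that differ. The helper names `parse_status` and `diff_status` are hypothetical, and the input is an excerpt of the outputs posted in this thread:

```python
def parse_status(text: str) -> dict:
    """Turn 'Key: value' lines of pvecm status output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            fields[key.strip()] = value.strip()
    return fields


def diff_status(a: dict, b: dict) -> dict:
    """Return {key: (value_in_a, value_in_b)} for every differing field."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Excerpts of the two outputs posted above.
master = parse_status("""Config Version: 13
Nodes: 2
Expected votes: 2
Total votes: 2
Quorum: 2
""")
broken = parse_status("""Config Version: 12
Nodes: 1
Expected votes: 3
Total votes: 1
Quorum: 2 Activity blocked
""")

for key, (m, b) in diff_status(master, broken).items():
    print(f"{key}: master={m!r} node={b!r}")
```

The differing fields (config version, node count, votes, quorum state) show that the two machines do not see each other as members of the same running cluster.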
     
  6. udo

    Hi,
    hmm, without quorum it's a no-go...
    What happens if you change the expected votes to 1 on the non-working node?
    (pvecm expected 1)

    Udo
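
The quorum figures in the status outputs of this thread are consistent with a simple majority rule: quorum = expected votes / 2 + 1 (integer division). The Python sketch below assumes that rule; it is inferred from the posted numbers, and the function names are made up for illustration:

```python
def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for quorum under a simple-majority rule (assumed)."""
    return expected_votes // 2 + 1


def is_quorate(total_votes: int, expected_votes: int) -> bool:
    """True if the votes currently present reach the threshold."""
    return total_votes >= quorum_threshold(expected_votes)


# (expected votes, total votes) pairs taken from the outputs in this thread,
# plus the "pvecm expected 1" case suggested above.
cases = [(2, 2), (3, 1), (4, 2), (1, 1)]
for expected, total in cases:
    state = "ok" if is_quorate(total, expected) else "activity blocked"
    print(f"expected={expected} total={total} "
          f"quorum={quorum_threshold(expected)} -> {state}")
```

Under this rule a lone node with 3 expected votes can never reach the threshold of 2, which is why lowering the expected votes to 1 is the quick test.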
     
  7. Tallaril

    The main problem is that the folder /etc/pve on the non-working node looks like this:

    /etc/pve# ll
    total 1
    -r--r----- 1 root www-data 465 12. Feb 17:43 cluster.conf
    lr-xr-x--- 1 root www-data 0 1. Jan 1970 local -> nodes/bazinga
    lr-xr-x--- 1 root www-data 0 1. Jan 1970 openvz -> nodes/bazinga/openvz
    lr-xr-x--- 1 root www-data 0 1. Jan 1970 qemu-server -> nodes/bazinga/qemu-server


    And on the working cluster nodes it looks like this:

    /etc/pve# ll
    total 4
    -rw-r----- 1 root www-data 451 6. Feb 20:11 authkey.pub
    -rw-r----- 1 root www-data 415 12. Feb 18:11 cluster.conf
    -rw-r----- 1 root www-data 465 12. Feb 18:11 cluster.conf.old
    lrwxr-x--- 1 root www-data 0 1. Jan 1970 local -> nodes/bazinga-2
    drwxr-x--- 2 root www-data 0 6. Feb 20:11 nodes
    lrwxr-x--- 1 root www-data 0 1. Jan 1970 openvz -> nodes/bazinga-2/openvz
    drwx------ 2 root www-data 0 6. Feb 20:11 priv
    -rw-r----- 1 root www-data 1533 6. Feb 20:11 pve-root-ca.pem
    -rw-r----- 1 root www-data 1679 6. Feb 20:11 pve-www.key
    lrwxr-x--- 1 root www-data 0 1. Jan 1970 qemu-server -> nodes/bazinga-2/qemu-server
    -rw-r----- 1 root www-data 526 9. Feb 12:17 storage.cfg
    -rw-r----- 1 root www-data 89 8. Feb 21:49 user.cfg
    -rw-r----- 1 root www-data 1473 11. Feb 14:55 vzdump.cron


    The non-working node's folder looks OK until I run pvecm add; after that command it looks as shown above.
     
  8. udo

    Hi,
    that's normal without quorum! A cluster member without quorum has the /etc/pve filesystem mounted read-only. Once your new node gets quorum (and has joined successfully), the pve-fs will sync with the right data.

    Udo
     
  9. Tallaril

    All right, thanks, that's clear now.

    I've changed it to expected 1, but the error is unchanged. I then tried to restart cman:

    # /etc/init.d/cman restart
    Stopping cluster:
    Stopping dlm_controld... [ OK ]
    Stopping fenced... [ OK ]
    Stopping cman... [ OK ]
    Waiting for corosync to shutdown:[ OK ]
    Unloading kernel modules... [ OK ]
    Unmounting configfs... [ OK ]
    Starting cluster:
    Checking if cluster has been disabled at boot... [ OK ]
    Checking Network Manager... [ OK ]
    Global setup... [ OK ]
    Loading kernel modules... [ OK ]
    Mounting configfs... [ OK ]
    Starting cman... /usr/share/cluster/cluster.rng:992: element ref: Relax-NG parser error : Reference PVEVM has no matching definition
    /usr/share/cluster/cluster.rng:992: element ref: Relax-NG parser error : Internal found no define for ref PVEVM
    Relax-NG schema /usr/share/cluster/cluster.rng failed to compile
    [ OK ]
    Waiting for quorum... Timed-out waiting for cluster
    [FAILED]

    That is, of course, on the non-working node.
     
  10. Tallaril

    I think I know what the problem is.

    The master and the first node are within the same rack and connected to the same switch, but this node is within a different rack and connected to a different switch.

    All three nodes are connected via eth1 to their own internal switch.

    Is it possible that quorum tries to connect only through eth0 instead of the internal address?

    My /etc/hosts looks like this:

    127.0.0.1 localhost
    109.234.106.16 bazinga-2.t4hosting.de bazinga-2


    192.168.10.4 leonidas
    192.168.10.2 bazinga-2
    192.168.10.3 bazinga

    But pvecm status shows:
    # pvecm status
    Version: 6.2.0
    Config Version: 17
    Cluster Name: t4hosting
    Cluster Id: 50071
    Cluster Member: Yes
    Cluster Generation: 68
    Membership state: Cluster-Member
    Nodes: 2
    Expected votes: 4
    Total votes: 2
    Node votes: 1
    Quorum: 3 Activity blocked
    Active subsystems: 1
    Flags:
    Ports Bound: 0
    Node name: bazinga-2
    Node ID: 1
    Multicast addresses: 239.192.195.91
    Node addresses: 109.234.106.16
     
  11. Tallaril

    OK, I got it working. It was indeed the IP multicast.

    The problem was that the external IP had the same short name (bazinga) as the internal one. I removed the short name from the external entry, and since then it works.

    Thanks for your help udo :)
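
The cause found here, one short host name listed under two addresses, can be spotted mechanically. The following Python sketch is illustrative only: the function name `ambiguous_names` is made up, and the sample is the /etc/hosts posted earlier in the thread, reading its run-together first line as two entries:

```python
from collections import defaultdict


def ambiguous_names(hosts_text: str) -> dict:
    """Map each host name listed under more than one address to its addresses."""
    addresses = defaultdict(list)
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            if address not in addresses[name]:
                addresses[name].append(address)
    return {name: addrs for name, addrs in addresses.items() if len(addrs) > 1}


sample = """\
127.0.0.1 localhost
109.234.106.16 bazinga-2.t4hosting.de bazinga-2
192.168.10.4 leonidas
192.168.10.2 bazinga-2
192.168.10.3 bazinga
"""

# Prints {'bazinga-2': ['109.234.106.16', '192.168.10.2']}
print(ambiguous_names(sample))
```

Any name reported this way may resolve to the external address first, so the cluster traffic can end up on the wrong interface.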
     