RDS/Terminal Server random freeze, console not responding

Discussion in 'Proxmox VE: Installation and configuration' started by check-ict, Aug 20, 2016.

  1. check-ict

    check-ict Member

    Joined:
    Apr 19, 2011
    Messages:
    93
    Likes Received:
    1
    Hello,

    We have had an issue with our VMs for over 2 years now. We host multiple Terminal Servers within our Proxmox cluster. They are based on Server 2012 R2 with multiple user sessions and normal programs like Office.

    Sometimes these VMs randomly freeze. This happens only once every 2 or 3 months per VM, but we are running about 15 Terminal Servers now, and rebooting them has become a weekly routine.

    We changed all hardware 3 months ago, moving from hybrid ZFS storage to SSD ZFS storage. IO is never an issue: the SSDs are really fast and the disk IO is minimal. We also replaced all Proxmox nodes with new hardware and installed Proxmox 4.2 (the old setup was 3.4). We connect to the ZFS storage over NFS on a bond of 4x1Gbit (Intel quad port).

    We also run about 120 other VMs that don't have these issues. It only happens on Terminal Servers where users log in.

    This is what happens:
    The VM becomes unreachable, although you can still ping it and the Nagios checks still work (disk, CPU, uptime etc.). No one is able to log in to the server. When we open the console, the Windows welcome screen shows and asks you to send CTRL+ALT+DELETE. When we send this command, nothing happens. A reset or shutdown does not work either. The only way to fix it is to stop the VM and then start it.

    When we look back in the logs, nothing was wrong. There was no load, no special user action, no errors...

    We also tried E1000 and IDE instead of Virtio, but this doesn't solve the problem. We do notice that Terminal Servers with Virtio disks seem to hang more often (about once a month per VM, where IDE is about once in 3 months).

    These 15 Terminal Servers all have different software and different installation dates; some have the latest updates and some do not. They are for different customers and have different workloads. The only thing they have in common is that they are Terminal Servers that users log in to.

    We already disabled swap, DEP and AV. Disabling swap seems to have a small impact: it crashes less often.

    Everything we see points to storage, yet we have really fast storage. We use 2 live SSD storage servers (24x1TB SSD per unit), and VMs have problems on both of them. Our previous storage was 12x2TB disks + 2x SSD for cache. It had the same problems, and my thought was that those units might have been overloaded at peak moments by other VMs.

    We also noticed that this problem does not occur when we use local disks. So far it only happens on NFS shared storage. We have a Proxmox node with 8x1TB disks + 2x500 GB SSD on ZFS and that works fine; Terminal Servers don't crash on this stand-alone server.

    Current set-up:
    5x Proxmox 4.2 nodes with 2 ports in LACP bond
    2x SSD ZFS NFS nodes with 4 ports in ALB bond
    Managed 48p gigabit switch with LACP memberships for the Proxmox nodes

    Any suggestions on what we can do to prevent the Terminal Servers from freezing? Any idea why the Proxmox console does not work at the moment the VM freezes? And what can prevent a "reset" from working?

    The biggest question is: is this a Proxmox/KVM issue, an NFS issue, or a Windows issue? We need to know, because last time we thought it was the hardware or the Proxmox version, and after a large upgrade of all hardware and software the result is the same.
     
  2. sumsum

    sumsum Member
    Proxmox Subscriber

    Joined:
    Oct 26, 2009
    Messages:
    157
    Likes Received:
    2
    Have you ever run any NFS performance tests? What does your NFS configuration look like?

    We had similar problems on Proxmox versions prior to version 3. Some of our engineers nailed the issues down by tweaking the TCP stack and the NFS configuration.

    Your issues may be NFS related and not hardware related.
     
  3. check-ict

    check-ict Member

    We have no special NFS settings.

    We installed Debian 8 with nfs-kernel-server, installed ZFS on Linux and ran "zfs set sharenfs=on data/proxmox".

    Please let me know if I need to tweak something. We can try this on a third test server and check the results.
     
  4. LnxBil

    LnxBil Well-Known Member

    Joined:
    Feb 21, 2015
    Messages:
    3,804
    Likes Received:
    346
    Naturally, if you only have a problem in Windows, it's a Windows problem. What about the logs there? Since monitoring reports nothing wrong, as you said, it has to be something with the Terminal Server itself. Have you tried to debug the network stack to see if something is wrong there? QEMU, and therefore Proxmox VE, opens at least one tap interface for every VM (one per NIC), where you can tcpdump the traffic and analyze it with Wireshark. Maybe this helps.
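
    As a rough sketch of that suggestion: on a Proxmox VE node the tap device of a VM is normally named tap<VMID>i<N>, where N is the index of the NIC (net0, net1, ...). The helper below only builds the tcpdump command line. The function names, the VMID 101 and the output path are made-up values for illustration, and the capture itself has to be run as root on the node that hosts the VM.

```python
def tap_interface(vmid, nic_index=0):
    """Return the host-side tap device name for a VM NIC (tap<VMID>i<N>)."""
    return f"tap{vmid}i{nic_index}"


def tcpdump_command(vmid, nic_index=0, pcap_path=None):
    """Build a tcpdump argument list that writes the VM's traffic to a pcap file."""
    iface = tap_interface(vmid, nic_index)
    if pcap_path is None:
        # Hypothetical default location; pick a path with enough free space.
        pcap_path = f"/tmp/{iface}.pcap"
    # -n: no name resolution, -i: interface to capture on, -w: write raw packets
    return ["tcpdump", "-n", "-i", iface, "-w", pcap_path]


# Example: the command for the first NIC of a (hypothetical) VM 101
print(" ".join(tcpdump_command(101)))
```

    The resulting pcap file can be copied to a workstation and opened in Wireshark.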
     
  5. check-ict

    check-ict Member

    I will check with tcpdump on a hanging VM next time. Still weird that Proxmox console also locks up, only stop/start works.
     
  6. christophe

    christophe Member

    Joined:
    Mar 31, 2011
    Messages:
    143
    Likes Received:
    2
    Hi,

    Anything new on this problem?
    A few years ago we had similar random hangs with nothing in the event log.
    Then we moved to a network syslog and saw event ID 129 (viostor). In fact, the VM does NOT hang; it becomes unresponsive and seems frozen. It is just waiting for storage...
    See this thread : https://forum.proxmox.com/threads/some-windows-guests-hanging-every-night.20046.
    All affected VMs were RDS servers.
    Storage was on a SAN, not a NAS, but your description seems really close to what we observed...

    One very interesting point is that you didn't see the problem on local storage: I'll try that.

    Christophe.
     
  7. check-ict

    check-ict Member

    Nothing new yet, we just stop/start the VM's that freeze.

    We have multiple customers that have a local server. We install them stand-alone with Proxmox/ZFS and use local disk storage. These customers have no problems running a Terminal Server. We also have 1 server in our datacenter that runs 2 Terminal Servers without freezes.

    We created an extra Nagios check that tells us if a Terminal Server is frozen so we can reboot it asap. This reduces the impact a lot, so a customer will only notice a hang about once a year during business hours.
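
    The thread does not say how this check is implemented. One possible approach, sketched below purely as an assumption, is a Nagios-style probe that goes one step further than ping: it opens the RDP port, sends an X.224 Connection Request and expects a Connection Confirm back within a timeout. Whether a VM frozen in the way described here fails this probe has not been verified; the function names, port and timeout are illustrative.

```python
import socket

# RDP connection request: TPKT header + X.224 Connection Request + negotiation request
X224_CONNECTION_REQUEST = bytes.fromhex(
    "03000013"          # TPKT: version 3, total length 19 bytes
    "0ee00000000000"    # X.224 Connection Request (type 0xE0)
    "0100080003000000"  # RDP negotiation request offering TLS or CredSSP
)
X224_CONNECTION_CONFIRM = 0xD0

OK, CRITICAL = 0, 2  # Nagios plugin exit codes


def check_rdp(host, port=3389, timeout=10.0):
    """Return (exit_code, message) depending on whether the RDP listener answers."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            sock.sendall(X224_CONNECTION_REQUEST)
            reply = b""
            while len(reply) < 6:  # enough bytes to see the X.224 type code
                chunk = sock.recv(64)
                if not chunk:
                    break
                reply += chunk
    except OSError as exc:  # refused, unreachable or timed out
        return CRITICAL, f"RDP CRITICAL - no handshake with {host}:{port} ({exc})"
    if len(reply) >= 6 and reply[0] == 0x03 and reply[5] == X224_CONNECTION_CONFIRM:
        return OK, f"RDP OK - {host}:{port} answered the connection request"
    return CRITICAL, f"RDP CRITICAL - unexpected reply from {host}:{port}"


def main(argv):
    """Plugin entry point: print the status line and return the exit code."""
    code, message = check_rdp(argv[1])
    print(message)
    return code
```

    To use it as a plugin, end the script with sys.exit(main(sys.argv)) (after importing sys) and pass the Terminal Server's address as the first argument.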

    Not a perfect solution but it works for now. Starting to look at other virtualisation software meanwhile.
     
  8. mstsas

    mstsas New Member
    Proxmox Subscriber

    Joined:
    Nov 21, 2014
    Messages:
    5
    Likes Received:
    0
    Hi,

    I have a similar issue on a new installation of Proxmox 5.3.

    # pveversion -v
    proxmox-ve: 5.3-1 (running kernel: 4.15.18-9-pve)
    pve-manager: 5.3-8 (running version: 5.3-8/2929af8e)
    pve-kernel-4.15: 5.2-12
    pve-kernel-4.15.18-9-pve: 4.15.18-30
    corosync: 2.4.4-pve1
    criu: 2.11.1-1~bpo90
    glusterfs-client: 3.8.8-1
    ksm-control-daemon: 1.2-2
    libjs-extjs: 6.0.1-2
    libpve-access-control: 5.1-3
    libpve-apiclient-perl: 2.0-5
    libpve-common-perl: 5.0-43
    libpve-guest-common-perl: 2.0-19
    libpve-http-server-perl: 2.0-11
    libpve-storage-perl: 5.0-35
    libqb0: 1.0.3-1~bpo9
    lvm2: 2.02.168-pve6
    lxc-pve: 3.1.0-1
    lxcfs: 3.0.2-2
    novnc-pve: 1.0.0-2
    proxmox-widget-toolkit: 1.0-22
    pve-cluster: 5.0-33
    pve-container: 2.0-33
    pve-docs: 5.3-1
    pve-edk2-firmware: 1.20181023-1
    pve-firewall: 3.0-17
    pve-firmware: 2.0-6
    pve-ha-manager: 2.0-6
    pve-i18n: 1.0-9
    pve-libspice-server1: 0.14.1-1
    pve-qemu-kvm: 2.12.1-1
    pve-xtermjs: 1.0-5
    qemu-server: 5.0-44
    smartmontools: 6.5+svn4324-1
    spiceterm: 3.0-5
    vncterm: 1.5-3

    I have 3 identical VMs with Windows 2012 R2 and Terminal Server. Randomly, one of these servers becomes unresponsive, and the only solution is to stop and start it.

    I've checked the host log and the VM log and there are no errors or warnings. Any suggestion will be appreciated.
     
  9. mstsas

    mstsas New Member
    Proxmox Subscriber

    How do you check whether a Terminal Server is frozen?
     
  10. check-ict

    check-ict Member

    We solved our problem. We used the stable virtio drivers (not the latest) and we disabled RSC for IPv4 and IPv6 on the virtio adapters.
     
  11. mstsas

    mstsas New Member
    Proxmox Subscriber

    Ok check-ict,

    we use the stable virtio drivers too, but I don't know how to disable RSC. Can you give me instructions, please?
     
  12. check-ict

    check-ict Member

    On the guest Windows Server OS, navigate to Network Connections, open the Ethernet adapter properties and disable RSC (Receive Segment Coalescing) for both IPv4 and IPv6. Once you have manually disabled RSC on the network adapter, you must restart the machine for the changes to take effect.
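
    The same setting can also be changed from a script inside the guest. The sketch below assumes the Disable-NetAdapterRsc cmdlet from the NetAdapter PowerShell module (shipped with Server 2012 and later) and an adapter called "Ethernet", which is only a placeholder; check the real name with Get-NetAdapter first. The Python function names are made up for illustration, and the command has to run in an elevated session.

```python
import subprocess


def disable_rsc_command(adapter_name):
    """Build the PowerShell call that turns off RSC for IPv4 and IPv6."""
    # Double any single quote so the name stays one PowerShell string literal.
    quoted = adapter_name.replace("'", "''")
    script = f"Disable-NetAdapterRsc -Name '{quoted}' -IPv4 -IPv6"
    return ["powershell.exe", "-NoProfile", "-Command", script]


def disable_rsc(adapter_name):
    """Run the command inside the Windows guest; raises if PowerShell reports failure."""
    subprocess.run(disable_rsc_command(adapter_name), check=True)


# Show the command for the placeholder adapter name without executing it
print(" ".join(disable_rsc_command("Ethernet")))
```

    Afterwards, Get-NetAdapterRsc should report the IPv4 and IPv6 state of the adapter as disabled.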
     
  13. mstsas

    mstsas New Member
    Proxmox Subscriber

    I applied this change on Monday, but yesterday the issue came back! :(
    The guest Windows Server responds very slowly on both the VNC console and RDP logon (10-20 sec. timeout). Any suggestions on where to investigate are welcome, please!
     