New Kernel and bug fixes

Has the problem with BSODs on Windows 2003 x64 under high network bandwidth been fixed in this kernel?
Oh, so I am not the only one who has experienced this. Do you have links to a bug report or discussion? Does it definitely occur only on x64?
I just tried with the most recent Proxmox and an XP x64 guest with virtio-win-0.1-30, and it's still there. However, I don't get a BSOD, but rather errors when writing to a Windows share due to dropped network packets.
 
Oh, so I am not the only one who has experienced this. Do you have links to a bug report or discussion? Does it definitely occur only on x64?
I just tried with the most recent Proxmox and an XP x64 guest with virtio-win-0.1-30, and it's still there. However, I don't get a BSOD, but rather errors when writing to a Windows share due to dropped network packets.
Maybe you can post an issue here:
https://github.com/YanVugenfirer/kvm-guest-drivers-windows

The Red Hat developer is very responsive.
 
Dear all,

I am using the Proxmox 2.1 version provided by OVH (on Debian Squeeze 64-bit) and it works perfectly.

But when I run update & full-upgrade, without any other change, the machine becomes unreachable.

After the machine was restarted by OVH maintenance, I got the following from pveversion -v (see below).

It seems that GRUB is not updated with the PVE kernel by the upgrade process. The Proxmox web interface is reachable, but nothing works.

Thanks for your help and suggestions.

Yes, I had the same problem with grub, but I could not boot at all!

What I did was boot into recovery, chroot into my filesystem, and reinstall and update GRUB. I don't know if you have a recovery console with OVH?
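For anyone hitting the same thing, here is a minimal sketch of those recovery steps, assuming the root filesystem is on /dev/sda1 and GRUB goes onto /dev/sda (adjust device names to your layout):

Code:
# from the rescue/recovery system
mount /dev/sda1 /mnt                  # mount the installed root filesystem
mount --bind /dev /mnt/dev            # expose devices inside the chroot
mount --bind /proc /mnt/proc
mount --bind /sys /mnt/sys
chroot /mnt /bin/bash                 # enter the installed system
apt-get install --reinstall grub-pc   # reinstall the GRUB packages
grub-install /dev/sda                 # write the boot loader to the disk
update-grub                           # regenerate grub.cfg so it picks up the pve kernel
exit                                  # leave the chroot, unmount and reboot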

Everything runs well now, no issues since the upgrade, not with Proxmox at least. I had to reinstall a few modules though, but all was good after that.
 
Unimportant UI bug: I noticed a spelling mistake in the host console output when shutting down my home server earlier. When stopping containers it prints "Stoping" (missing a p).
Didn't think it warranted its own thread :p
 
I had a problem with my server. It constantly flip-flopped on the connection to the NUT (Network UPS Tools) server. Also, my SSH connection to my Proxmox VE server (10 meters away) was laggy and scrolled in jumps. I noticed that something in dmesg had crashed, so I rebooted, and it seems the symptoms went away. But not the problem, because dmesg still shows it. I have attached my pveversion output diffed against the one posted at the top of this thread, and I have also attached my dmesg, which contains the call trace. All my problems started after upgrading to Linux dkproxmoxve1.laerdal.global 2.6.32-14-pve #1 SMP Tue Aug 21 08:24:37 CEST 2012 x86_64 GNU/Linux.
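For reference, a minimal sketch of how such a diff can be produced; the reference file name is hypothetical and stands for the pveversion output posted at the top of the thread:

Code:
pveversion -v > pveversion.mine.txt          # capture the local package versions
diff -u pveversion.reference.txt pveversion.mine.txt > diff.pveversion.txt   # compare against the reference output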
 

Attachments

  • diff.pveversion.txt
  • dmesg.zip
The problem with the slow, laggy network is back; no change since my last post.
Hi,
can you try removing intel_iommu=on from your GRUB configuration?
I see a bad stack trace in your dmesg:
------------[ cut here ]------------
WARNING: at drivers/pci/intel-iommu.c:2775 intel_unmap_page+0x15f/0x180() (Not tainted)
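A minimal sketch of the change, assuming the option was added to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub (adjust if it lives in GRUB_CMDLINE_LINUX or elsewhere):

Code:
# /etc/default/grub: remove intel_iommu=on from the kernel command line, e.g.
GRUB_CMDLINE_LINUX_DEFAULT="quiet"

# then regenerate grub.cfg and reboot
update-grub
reboot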
 
You are very welcome to provide a kernel package; the machine is still in its burn-in phase, so I do not have to schedule downtime. I have changed GRUB_CMDLINE_LINUX_DEFAULT="quiet", run update-grub2, and rebooted.
 
I have two questions about the cman stopping bug: https://bugzilla.proxmox.com/show_bug.cgi?id=238

1) Does the fix also prevent cman from stopping during normal operation if quorum/connection is lost, or just when it boots?

2) And will a node now reconnect if it loses its connection? Before, cman was down, so it would never reconnect, which made this impossible to test.
 
1) Does the fix also prevent cman from stopping during normal operation if quorum/connection is lost, or just when it boots?

cman does not stop during normal operation if quorum/connection is lost (that is new to me).
 
A bit the other way around... Randomly I find cman stopped, and *if* quorum is lost *as a result* (because more than one node has dropped), I have to reboot some nodes to regain quorum before /etc/init.d/cman restart will make the node rejoin. And I'm curious how to fix it.

I don't know for sure, but I think some transmission fails for unknown reasons, quorum is then lost on that one node (and retained on the rest), and cman is found stopped afterwards. Quorum for the whole cluster is only lost if enough nodes drop out that the vote count falls below the expected votes.

Here's a short sample. If you'd like, I can start a separate thread and dump all my saved logs in there.

Code:
 root@bcvm2:~# clustat
Could not connect to CMAN: Connection refused
root@bcvm2:~# /etc/init.d/cman status
Found stale pid file
root@bcvm2:~# /etc/init.d/cman start
Starting cluster: 
   Checking if cluster has been disabled at boot... [  OK  ]
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... [  OK  ]
   Starting qdiskd... [  OK  ]
   Waiting for quorum... Timed-out waiting for cluster
[FAILED]
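(For context, a minimal sketch of the commands that can be used to inspect membership and quorum on a Proxmox VE 2.x / cman node. The last command is only a last resort, since forcing the expected vote count risks a split-brain.)

Code:
cman_tool status          # expected votes, total votes, quorum state
cman_tool nodes           # per-node membership view
pvecm status              # Proxmox wrapper showing the same information

# last resort on an inquorate node (use with care, risk of split-brain):
pvecm expected 1          # lower the expected vote count so the node becomes quorate again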
 
cman does not stop during normal operation if quorum/connection is lost (that is new to me).

The forum lost my last 2 posts... so forgive me if they show up later and this turns out to be a duplicate.

Here's what it looked like yesterday:

Code:
root@bcvm2:~# clustat
Could not connect to CMAN: Connection refused

root@bcvm2:~# /etc/init.d/cman status
Found stale pid file

The first node that dropped out had this in a log:

Code:
# gunzip -c /var/log/cluster/corosync.log.1.gz  | less
[...]
Aug 29 19:19:32 corosync [TOTEM ] FAILED TO RECEIVE
[...]

So I believe some random packet was lost, which caused the node's cluster communication to fail, and then rather than retrying, cman crashed or exited intentionally. Based on your post, I guess it wasn't intentional.

And then this is what happened when trying to restart cman on a node (to regain quorum on the node that was still connected):

Code:
root@bcvm2:~# /etc/init.d/cman start
Starting cluster:
   Checking if cluster has been disabled at boot... [  OK  ]
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... [  OK  ]
   Starting qdiskd... [  OK  ]
   Waiting for quorum... Timed-out waiting for cluster
[FAILED]

root@bcvm2:~# clustat
Cluster Status for bcproxmox1 @ Thu Aug 30 10:27:40 2012
Member Status: Inquorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 bcvm2                                                               1 Online, Local
 bcvm3                                                               2 Offline
 bcvm1                                                               3 Offline
 /dev/loop1                                                          0 Offline, Quorum Disk

(And I'm guessing you'll have something to say about using a loop device as a qdisk (backed by NFS), but it seems free of side effects, and I can't use iSCSI without adding a new server; besides, the first time I had this exact same problem, I had no qdisk or loop device at all.)
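(For anyone curious, this is roughly how such a loop-device qdisk can be set up. This is only a sketch: the NFS path and label are examples, and a qdisk backed by NFS is generally discouraged compared to shared block storage such as iSCSI or FC.)

Code:
# create a shared backing file on the NFS mount (all nodes see the same file)
dd if=/dev/zero of=/mnt/pve/nfs-share/qdisk.img bs=1M count=16

# attach it as a loop device (must be done on every node, e.g. at boot)
losetup /dev/loop1 /mnt/pve/nfs-share/qdisk.img

# initialise the quorum disk once, from a single node
mkqdisk -c /dev/loop1 -l bcqdisk

# verify that the label is visible
mkqdisk -L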

And I have lots more logs to share, if you'd like to deal with this in another thread.
 
The forum lost my last 2 posts... so forgive me if they show up later and this turns out to be a duplicate.

The forum does not lose posts. Posts from new members are moderated; when you post, you will see a short note saying so.

As soon as you are a valued member of the forum, your posts will be visible immediately without moderation.
 
