New Kernel and bug fixes

argonym

New Member
Aug 20, 2012
2
0
1
Was the problem with BSODs on Win 2003 x64 under high network bandwidth fixed in this kernel?
Oh, I am not the only one who experienced this. Do you have links to a bug report or discussions? Does this for sure occur only on x64?
I just tried with most recent proxmox and a XP x64 guest with virtio-win-0.1-30, and it's still there. However, I don't get a BSOD, but errors when writing to a windows share due to dropped network packets.
 

spirit

Famous Member
Apr 2, 2010
3,565
164
83
www.odiso.com
Oh, I am not the only one who experienced this. Do you have links to a bug report or discussions? Does this for sure occur only on x64?
I just tried with most recent proxmox and a XP x64 guest with virtio-win-0.1-30, and it's still there. However, I don't get a BSOD, but errors when writing to a windows share due to dropped network packets.
Maybe you can post an issue here:
https://github.com/YanVugenfirer/kvm-guest-drivers-windows

The Red Hat developer is really responsive.
 

fafdk

New Member
Aug 24, 2012
1
0
1
Denmark
Dear all,

I am using the Proxmox 2.1 version provided by OVH (on Debian squeeze 64-bit) and it works perfectly.

But when I do an update and full-upgrade, without any other change, the machine becomes unreachable.

After OVH maintenance restarted the machine, I got the following from pveversion -v (see below).

It seems that the grub file is not updated with the pve kernel by the process. The Proxmox web interface is reachable, but nothing works.

Thanks for your help and suggestions.
Yes, I had the same problem with grub, but I could not boot at all!

What I did was to boot into recovery, chroot into my filesystem and reinstall and update grub. Don't know if you have a recovery-console with OVH?
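For anyone who has not done this before, the recovery steps described above could be sketched roughly like this. It is a dry run that only prints the commands so they can be reviewed first. The function name and the device names in the usage line are assumptions, not taken from this thread, and your rescue system may need extra mounts (for example a separate /boot).

```shell
#!/bin/sh
# Dry-run sketch: print the commands for reinstalling GRUB from a rescue system.
# Nothing is executed here; review the output before running any of it by hand.
grub_repair_plan() {
    root_part="$1"    # root partition of the installed system (assumption, e.g. /dev/sda1)
    boot_disk="$2"    # disk that should receive the boot loader (assumption, e.g. /dev/sda)
    mnt="${3:-/mnt}"  # mount point inside the rescue system

    echo "mount $root_part $mnt"
    # Bind-mount the pseudo filesystems so grub can see devices inside the chroot.
    for fs in dev proc sys; do
        echo "mount --bind /$fs $mnt/$fs"
    done
    # Reinstall the boot loader, then regenerate the menu so the pve kernel is listed.
    echo "chroot $mnt grub-install $boot_disk"
    echo "chroot $mnt update-grub"
}
```

Usage would be something like `grub_repair_plan /dev/sda1 /dev/sda`, with the devices replaced by whatever your machine actually uses.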

All runs well now, no issues since the upgrade, not with Proxmox at least. Had to reinstall a few modules though, but all was good after that.
 

tin

Member
Aug 14, 2010
107
2
18
Northwest NSW, Australia
Unimportant UI bug: I noticed a spelling mistake in the host console output when shutting down my home server earlier. When stopping containers it says "Stoping" (missing a P).
Didn't think it warranted its own thread :p
 

JonB

Member
Nov 28, 2011
30
0
6
Copenhagen
I had a problem with my server. Its connection to the NUT (Network UPS Tools) server kept flip-flopping. Also, my SSH connection to my proxmoxve server (10 meters away) was laggy and scrolled in jumps. I noticed in dmesg that something had crashed, so I rebooted, and the symptoms seemed to go away. The underlying problem did not, though, because dmesg still shows an error. I have attached my pveversion output diffed against the one posted at the top of this thread, and my dmesg, which contains the call trace. All my problems started after upgrading to this kernel: Linux dkproxmoxve1.laerdal.global 2.6.32-14-pve #1 SMP Tue Aug 21 08:24:37 CEST 2012 x86_64 GNU/Linux
 


spirit

Famous Member
Apr 2, 2010
3,565
164
83
www.odiso.com
The problem with slow laggy network is back, no change since last posting
Hi,
can you try removing intel_iommu=on from your GRUB kernel command line?
I see a bad stack trace in your dmesg:
------------[ cut here ]------------
WARNING: at drivers/pci/intel-iommu.c:2775 intel_unmap_page+0x15f/0x180() (Not tainted)
 

spirit

Famous Member
Apr 2, 2010
3,565
164
83
www.odiso.com

JonB

Member
Nov 28, 2011
30
0
6
Copenhagen
You are very welcome to provide a kernel package; the machine is still in its burn-in phase, so I do not have to schedule downtime. I have changed the setting to GRUB_CMDLINE_LINUX_DEFAULT="quiet", run update-grub2 and rebooted.
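The edit described above can also be scripted. This is only a sketch: the function name is made up, the file path is passed in as an argument, and the usual location /etc/default/grub is an assumption about a standard Debian setup. It keeps a backup copy next to the file.

```shell
#!/bin/sh
# Sketch: drop intel_iommu=on from the GRUB_CMDLINE_LINUX_DEFAULT line of a grub defaults file.
strip_intel_iommu() {
    grub_file="$1"
    # Keep a backup copy so the change can be reverted.
    cp "$grub_file" "$grub_file.bak"
    # Only touch the GRUB_CMDLINE_LINUX_DEFAULT line; remove the option and any spaces before it.
    sed '/^GRUB_CMDLINE_LINUX_DEFAULT=/ s/ *intel_iommu=on//' "$grub_file.bak" > "$grub_file"
}
```

After something like `strip_intel_iommu /etc/default/grub`, the GRUB menu still has to be regenerated (update-grub2 in the post above) and the machine rebooted before the change takes effect.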
 

peetaur

New Member
Jun 29, 2012
11
0
1
Germany
I have 2 questions about the cman stopping bug. https://bugzilla.proxmox.com/show_bug.cgi?id=238

1) Does the fix also prevent cman from stopping during normal operation if quorum/connection is lost, or just when it boots?

2) And will a node now reconnect if it loses connection? Before, cman was down, so it would never reconnect, so this was impossible to test.
 

dietmar

Proxmox Staff Member
Staff member
Apr 28, 2005
16,521
324
103
Austria
www.proxmox.com
1) Does the fix also prevent cman from stopping during normal operation if quorum/connection is lost, or just when it boots?
cman does not stop during normal operation if quorum/connection is lost (that is new to me).
 

peetaur

New Member
Jun 29, 2012
11
0
1
Germany
It's a bit the other way around... Randomly I find cman stopped, and *if* quorum is lost *as a result* (because more than one node dropped out), I have to reboot some nodes to get quorum again before /etc/init.d/cman restart will make the node rejoin. I'm curious how to fix it.

I don't know for sure, but I think some transmission fails for unknown reasons, then quorum is lost on that one node (and retained on the rest), and afterwards cman is found stopped. Quorum on the whole cluster is only lost if enough nodes drop out that the remaining votes fall below what quorum requires.
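To make the vote arithmetic concrete, here is a minimal sketch. It assumes the usual majority rule (quorum = expected votes / 2 + 1, with integer division); the function names are made up for illustration and are not cman commands, and a quorum disk would add its own votes to the expected total.

```shell
#!/bin/sh
# Sketch of majority-quorum arithmetic (assumption: quorum = expected_votes / 2 + 1).
quorum_threshold() {
    # $1 = expected votes for the whole cluster
    echo $(( $1 / 2 + 1 ))
}

is_quorate() {
    # $1 = votes currently present, $2 = expected votes
    [ "$1" -ge "$(quorum_threshold "$2")" ]
}
```

With three expected votes, `quorum_threshold 3` gives 2, so `is_quorate 2 3` succeeds (one node lost) while `is_quorate 1 3` fails (two nodes lost).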

Here's a short sample. If you'd like, I can start a separate thread and dump all my saved logs in there.

Code:
root@bcvm2:~# clustat
Could not connect to CMAN: Connection refused
root@bcvm2:~# /etc/init.d/cman status
Found stale pid file
root@bcvm2:~# /etc/init.d/cman start
Starting cluster: 
   Checking if cluster has been disabled at boot... [  OK  ]
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... [  OK  ]
   Starting qdiskd... [  OK  ]
   Waiting for quorum... Timed-out waiting for cluster
[FAILED]
 

peetaur

New Member
Jun 29, 2012
11
0
1
Germany
cman does not stop during normal operation if quorum/connection is lost (that is new to me).
The forum lost my last post... so forgive me if it shows up later and this one is a duplicate.

Here's what it looked like yesterday:

Code:
root@bcvm2:~# clustat
Could not connect to CMAN: Connection refused

root@bcvm2:~# /etc/init.d/cman status
Found stale pid file
The first node that dropped out had this in a log:

Code:
# gunzip -c /var/log/cluster/corosync.log.1.gz  | less
[...]
Aug 29 19:19:32 corosync [TOTEM ] FAILED TO RECEIVE
[...]
So I believe that some random packet failed and caused the node cluster communication to fail, and then rather than retrying, cman crashed or ended intentionally. Based on your post, I guess it wasn't intentional.
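To look for the same message across several rotated logs at once, something like the following could be used. It is only a sketch: the function name is made up, and it assumes rotated logs ending in .gz are gzip-compressed, as with the corosync.log.1.gz file above.

```shell
#!/bin/sh
# Sketch: print TOTEM receive failures from corosync logs (plain or gzip-compressed).
totem_failures() {
    for log in "$@"; do
        case "$log" in
            *.gz) gunzip -c "$log" ;;   # rotated, compressed log
            *)    cat "$log" ;;         # plain-text log
        esac
    done | grep 'FAILED TO RECEIVE'
}
```

For example, `totem_failures /var/log/cluster/corosync.log*` would print every matching line with its timestamp, which makes it easier to compare when each node dropped out.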

Then, when trying to restart cman on a server (to regain quorum on the first server that was still connected), I got this:

Code:
root@bcvm2:~# /etc/init.d/cman start
Starting cluster: 
   Checking if cluster has been disabled at boot... [  OK  ]
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... [  OK  ]
   Starting qdiskd... [  OK  ]
   Waiting for quorum... Timed-out waiting for cluster
[FAILED]

root@bcvm2:~# clustat
Cluster Status for bcproxmox1 @ Thu Aug 30 10:27:40 2012
Member Status: Inquorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 bcvm2                                                               1 Online, Local
 bcvm3                                                               2 Offline
 bcvm1                                                               3 Offline
 /dev/loop1                                                          0 Offline, Quorum Disk
(And I'm guessing you'll have something to say about a loop device qdisk (for NFS), but it seems free of any side effects, and I can't use iSCSI without adding a new server; and the first time I had this exact same problem, I had no qdisk or loop device)
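As a side note, a small filter can pull just the offline members out of clustat output like the table above. This is a sketch that relies only on the column layout shown in this post; the function name is made up.

```shell
#!/bin/sh
# Sketch: list members that clustat reports as Offline (reads clustat output on stdin).
offline_members() {
    # The member name is the first column; match any row whose status contains "Offline".
    awk '$0 ~ /Offline/ { print $1 }'
}
```

It would be used as `clustat | offline_members`. Note that the quorum disk is listed too when it is offline.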
 

peetaur

New Member
Jun 29, 2012
11
0
1
Germany
cman does not stop during normal operation if quorum/connection is lost (that is new to me).
The forum lost my last 2 posts... so forgive me if they show up later and this one is a duplicate.

And I have lots more logs to share, if you'd like to deal with this in another thread.
 

tom

Proxmox Staff Member
Staff member
Aug 29, 2006
13,797
445
103
The forum lost my last 2 posts... so forgive me if it appears and then this is duplicate.
The forum does not lose posts. Posts from new members are moderated; when you post, you will see a short note.

As soon as you are a valued member of the forum, your posts will be visible immediately without moderation.
 
