Error kvm: cpu0 unhandled wrmsr & unhandled rdmsr

Hi all,

I have a problem restarting a guest on my Proxmox machine. It has run fine for ages, but since last week almost all guests get torn down.
I see the following messages on the console:

kvm: 4283: cpu0 kvm_set_msr_common: MSR_IA32_MCG_STATUS 0x0, nop
kvm: 4283: cpu0 kvm_set_msr_common: MSR_IA32_MCG_CTL 0xffffffffffffffff, nop
kvm: 4283: cpu0 unhandled wrmsr: 0x400 data ffffffffffffffff
kvm: 4283: cpu0 unhandled wrmsr: 0x198 data 0
kvm: 4283: cpu0 kvm_set_msr_common: MSR_IA32_MCG_STATUS 0x0, nop
kvm: 4283: cpu0 unhandled rdmsr: 0x401
kvm: 4283: cpu0 unhandled rdmsr: 0x401
kvm: 4283: cpu0 unhandled rdmsr: 0x401
kvm: 4283: cpu0 unhandled rdmsr: 0x401
kvm: 4283: cpu0 unhandled rdmsr: 0x401
kvm: 4283: cpu0 unhandled rdmsr: 0x401
printk: 38 messages suppressed.
kvm: 4283: cpu0 unhandled rdmsr: 0x401
vmtab108i0d0: no IPv6 routers present
vmbr0: port 4(vmtab108i0d0) entering disabled state
vmbr0: port 4(vmtab108i0d0) entering disabled state

My installed versions are:
pve-manager: 1.6-8 (pve-manager/1.6/5296)
running kernel: 2.6.24-9-pve
pve-kernel-2.6.24-9-pve: 2.6.24-18
qemu-server: 1.1-24
pve-firmware: 1.0-9
libpve-storage-perl: 1.0-16
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-8
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1


The guests are Linux and Windows.

Has anybody else seen this?

Thanks in advance.

Regards,

H2T
 
Hi Tom,

Thanks for the quick response!
I followed your advice and upgraded the system to 2.6.32, and I still had the problem! But I figured out part of the reason why the guests don't start: the root filesystem of the Proxmox host is full! Stupid me :-(

So I moved some guests to another partition and now they start fine.
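
For anyone hitting the same full-disk problem, a quick way to check from the host shell (just a sketch; /var/lib/vz is the usual location of the default local storage, adjust the path if yours differs):

Code:
# free space on the root filesystem and on the local VM storage
df -h / /var/lib/vz
# largest directories on the root filesystem (sizes in MB)
du -xm --max-depth=1 / | sort -n | tail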

But I still get these error messages:
kvm: 3910: cpu0 unhandled wrmsr: 0x198 data 0
kvm: 3910: cpu1 unhandled wrmsr: 0x198 data 0
vmtab107i0d0: no IPv6 routers present

The versions I have now are:
pve-manager: 1.6-8 (pve-manager/1.6/5296)
running kernel: 2.6.32-4-pve
proxmox-ve-2.6.32: 1.6-25
pve-kernel-2.6.32-4-pve: 2.6.32-25
qemu-server: 1.1-24
pve-firmware: 1.0-9
libpve-storage-perl: 1.0-16
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-8
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.13.0-2
ksm-control-daemon: 1.0-4

Any help is appreciated.

Thanks in advance.

Regards,

H2T
 
Looks good, so your guests work? I assume yes.

And you can ignore these warning messages.
 
Happened across some detail on that in case anyone's wondering what it is that we're ignoring:

unhandled wrmsr: 0x198 data 0
From the kvm mailing list last week:
It's harmless. KVM included MSR_IA32_PERF_STATUS (0x198) in the MSR save/restore list, but there is no write emulation for it, which generates the warning.

Before I knew it was only a warning, I was worried that it was one message per vCPU, but it's just that the kernel rate-limits these messages and doesn't log more than 10 at a time.
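
If anyone wants to confirm that it really is just rate limiting (the "printk: 38 messages suppressed" line earlier in the thread is the give-away), something like this shows it (a rough sketch; the default burst of 10 matches the behaviour described above):

Code:
# how many of the warnings actually reached the kernel log
dmesg | grep -c 'unhandled wrmsr'
# printk rate limiting: interval in seconds, and how many messages are allowed per interval
cat /proc/sys/kernel/printk_ratelimit
cat /proc/sys/kernel/printk_ratelimit_burst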
 
Actually, this is far from harmless. It basically makes Proxmox useless for a production environment, and here is why:

1) The soft lockup will hang the system for X seconds (where X is defined in /proc/sys/kernel/hung_task_timeout_secs). Setting this to 0 causes it to recover faster, but it still hangs regularly (about once every 5 minutes).

2) The soft lockup (no matter how brief) will disconnect SSH, cause Apache to quit responding and serving a web application, cause file transfers to fail immediately, and produce lots of other fun quirks I am still discovering.
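
For reference, that timeout can be inspected and changed at runtime like so (a sketch only; the value 10 is just an example, and this only changes when the hung-task warning fires, it does not address the lockup itself):

Code:
# current timeout in seconds before a blocked task is reported
cat /proc/sys/kernel/hung_task_timeout_secs
# change it for the running kernel (example value)
echo 10 > /proc/sys/kernel/hung_task_timeout_secs
# make the change persistent across reboots
echo 'kernel.hung_task_timeout_secs = 10' >> /etc/sysctl.conf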
 
Can you define a simple use case to reproduce this? You are saying that every five minutes you get connection breaks and other problems?

I have these warnings but no problems at all - so how can I reproduce this?
 
Ah, my apologies. Here is how I reproduce it:

1) Create a new KVM VM
- storage located on NFS
- image format: raw, disk type: VIRTIO, network card: virtio, everything else default
2) Install Ubuntu Server 10.10 x64 (all defaults)
3) Boot the VM; after around an hour or so of doing normal tasks (system updates, setting up apache/ftp/ssh/etc.), it will start to freeze up

I've also tried using 'disk type' = IDE and 'image format' = vmdk.
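
In case it helps with reproducing, this is roughly how the relevant settings can be double-checked from the host shell (a sketch; 101 is a made-up VMID, and /etc/qemu-server/<vmid>.conf is where the 1.x series keeps KVM guest configs as far as I can tell):

Code:
# VM configuration as Proxmox stores it (101 is only an example VMID)
cat /etc/qemu-server/101.conf
# the kvm command line actually in use for the running guests
ps -ef | grep '[k]vm'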
 
So you are just doing "normal" things and the system is freezing? That's not normal and not seen here. There are a lot of users running Ubuntu.

I just installed (again) the same here on an IMS, no issues so far (will run some benchmarks and stress tests).
 
Can you test with 10.04?
 
I can confirm the problem exists with 10.04 as well (using a TurnKey appliance; I've tried the 'core', 'lamp', and 'torrent server' appliances, and all experience the same soft lockup issue).

I also noticed that the lockups occur on the host (the Proxmox install) as well. When they occur on the host, all VMs quit responding for a while.
 
Hi,

I have several 10.04 VMs and no problems with LOCAL storage. Perhaps this little detail may be the problem:

That could very well be the issue. There aren't many ways to tweak the NFS client from the Proxmox web UI, so I'll SSH in and investigate.
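
For what it's worth, the NFS client options actually in effect can be checked (and experimented with) from the shell; a sketch, with purely illustrative option values and a made-up export name:

Code:
# show the options the NFS client is currently using
nfsstat -m
grep nfs /proc/mounts
# test-mount the same export by hand with explicit options (illustrative values)
mkdir -p /mnt/nfstest
mount -t nfs -o tcp,hard,intr,rsize=32768,wsize=32768 nfsserver:/export /mnt/nfstest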
 
I'm at a loss as to what is causing this. I see that other users are reporting similar issues.

Could anyone please provide some assistance in helping me pinpoint the cause of these CPU soft lockups?
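
If anyone wants to dig in, the traces the kernel prints when a soft lockup hits are the most useful thing to collect (a sketch; the exact log file names depend on the syslog configuration):

Code:
# soft-lockup traces and their timestamps on the host
grep -i 'soft lockup' /var/log/kern.log /var/log/syslog
# watch the kernel log live while reproducing the freeze
tail -f /var/log/kern.log
# basic load/IO picture while it happens
vmstat 5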
 
I see the same issue on a Dell PowerEdge R600 with 48GB of RAM and MegaRAID SAS. I have a mix of KVM guests and OpenVZ containers, and have selected virt* in KVM.

But I don't believe that is necessarily the issue here. It seems that roughly every 5 minutes a 20-40 second 'lockup' occurs. The host is also locked during this time, so I'm thinking this may be a top-down problem (caused by the host) rather than a bottom-up problem (caused by a VM)?

The only thing left in syslog that I am suspicious of:

Jan 13 14:25:01 proxmox /USR/SBIN/CRON[10888]: (root) CMD ([ -x /usr/lib/sysstat/sa1 ] && { [ -r "$DEFAULT" ] && . "$DEFAULT" ; [ "$ENABLED" = "true" ] && exec /usr/lib/sysstat/sa1 $SA1_OPTIONS 1 1 ; })
Jan 13 14:30:01 proxmox /USR/SBIN/CRON[11006]: (root) CMD (test -x /usr/lib/atsar/atsa1 && /usr/lib/atsar/atsa1)
Jan 13 14:35:01 proxmox /USR/SBIN/CRON[11098]: (root) CMD ([ -x /usr/lib/sysstat/sa1 ] && { [ -r "$DEFAULT" ] && . "$DEFAULT" ; [ "$ENABLED" = "true" ] && exec /usr/lib/sysstat/sa1 $SA1_OPTIONS 1 1 ; })
Jan 13 14:40:01 proxmox /USR/SBIN/CRON[11297]: (root) CMD (test -x /usr/lib/atsar/atsa1 && /usr/lib/atsar/atsa1)
Jan 13 14:45:01 proxmox /USR/SBIN/CRON[11718]: (root) CMD ([ -x /usr/lib/sysstat/sa1 ] && { [ -r "$DEFAULT" ] && . "$DEFAULT" ; [ "$ENABLED" = "true" ] && exec /usr/lib/sysstat/sa1 $SA1_OPTIONS 1 1 ; })
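
One way to test that suspicion is to switch the accounting jobs off for a while and see whether the 5-minute pattern disappears (a sketch; on Debian-based hosts the sa1 job is normally controlled via /etc/default/sysstat):

Code:
# find the cron entries that fire every 5 minutes
grep -r 'sa1\|atsa1' /etc/crontab /etc/cron.d 2>/dev/null
# disable sysstat collection (the cron job checks ENABLED in /etc/default/sysstat)
sed -i 's/^ENABLED=.*/ENABLED="false"/' /etc/default/sysstat
# comment out or remove the atsar entry as well, then watch whether the lockups stop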

--

proxmox:/var/log# pveversion -v
pve-manager: 1.4-9 (pve-manager/1.4/4390)
qemu-server: 1.1-6
pve-kernel: 2.6.24-16
pve-qemu-kvm: 0.11.0-2
pve-firmware: 1
vncterm: 0.9-2
vzctl: 3.0.23-1pve3
vzdump: 1.2-3
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1

--

proxmox:/var/log# free
             total       used       free     shared    buffers     cached
Mem:      49429012   27183492   22245520          0     150976   19999460
-/+ buffers/cache:    7033056   42395956
Swap:     49283064       2724   49280340

--

proxmox:/var/log# pveperf
CPU BOGOMIPS: 72352.92
REGEX/SECOND: 573266
HD SIZE: 94.49 GB (/dev/pve/root)
BUFFERED READS: 357.28 MB/sec
AVERAGE SEEK TIME: 4.89 ms
FSYNCS/SECOND: 2506.30
DNS EXT: 49.57 ms
DNS INT: 33.17 ms (internal.com)


I am on an old release -- 2.6.24-8-pve. I will upgrade and test again.

Jason
 
Hi,

I have R610 servers from Dell running Proxmox, installed in July 2009.

Since the last version (PVE 1.7 running 2.6.35) I get the following messages, but only when I start a VM:

kvm: 15829: cpu0 unhandled wrmsr: 0x198 data 0
kvm: 15829: cpu1 unhandled wrmsr: 0x198 data 0
kvm: 15829: cpu2 unhandled wrmsr: 0x198 data 0
kvm: 15829: cpu3 unhandled wrmsr: 0x198 data 0

Sometimes messages like:

CE: hpet increased min_delta_ns to 7500 nsec
hrtimer: interrupt took 46965 ns

Some of my VMs running 2008 R2 lose network connectivity. I must deactivate/reactivate the network connection or restart the guest.
I'm currently testing with e1000 in place of virtio.

Time drift is resolved for the moment with the trick in this thread:
http://forum.proxmox.com/threads/5112-time-drift-in-all-guests
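
Related to the hpet/hrtimer messages and the time drift, it is worth checking which clocksource the host and the guests are actually using (a sketch; standard sysfs paths, the values will differ per machine):

Code:
# clocksource currently in use (run on both the host and inside the guests)
cat /sys/devices/system/clocksource/clocksource0/current_clocksource
# clocksources the kernel considers usable
cat /sys/devices/system/clocksource/clocksource0/available_clocksource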

Host hardware:

2 x Intel Xeon E5520
4 x Broadcom LOM NetXtreme II BCM5709

Guest HDDs are on iSCSI.
 
We have had the same problem (losing network connectivity) since long before 1.7 with all Debian Lenny & Squeeze guests (default installations / custom Java applications). Usually an /etc/init.d/networking restart fixes the problem, but sooner or later it happens again and again, sometimes at such short intervals that it really becomes annoying. There are no messages in dmesg on either host or guest. The unhandled MSR warnings do not seem to be related, though.
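
Until the real cause is found, the restart can at least be automated inside the guest; a crude sketch (the gateway address is a placeholder and the cron interval is arbitrary):

Code:
#!/bin/sh
# /usr/local/sbin/net-watchdog.sh - restart networking when the gateway stops answering
# run from cron, e.g.:  */5 * * * * root /usr/local/sbin/net-watchdog.sh
GW=192.168.1.1   # placeholder: use your real gateway
if ! ping -c 3 -W 2 "$GW" >/dev/null 2>&1; then
    logger "net-watchdog: gateway unreachable, restarting networking"
    /etc/init.d/networking restart
fi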

This happens on a DL380G5 (1xQuad E5420, 12GB RAM, 146-SAS HP) set up like so:

pve-manager: 1.7-10 (pve-manager/1.7/5323)
running kernel: 2.6.35-1-pve
proxmox-ve-2.6.35: 1.7-9
pve-kernel-2.6.35-1-pve: 2.6.35-9
qemu-server: 1.1-28
pve-firmware: 1.0-10
libpve-storage-perl: 1.0-16
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-10
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.13.0-3
ksm-control-daemon: 1.0-4

As well as on a DL360G6 (2xQuad E5504, 24GB RAM, 8x500GB SAS-HP) set up like so:

pve-manager: 1.7-10 (pve-manager/1.7/5323)
running kernel: 2.6.35-1-pve
proxmox-ve-2.6.35: 1.7-9
pve-kernel-2.6.32-4-pve: 2.6.32-30
pve-kernel-2.6.35-1-pve: 2.6.35-9
qemu-server: 1.1-28
pve-firmware: 1.0-10
libpve-storage-perl: 1.0-16
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-10
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.13.0-3
ksm-control-daemon: 1.0-4

This issue has been reported before and wasn't solved, nor could the Proxmox staff reproduce it. On our side this problem has been around for a very long time, but for some strange reason it doesn't happen "as often" to the VMs we use in production, only on our testing and staging systems where we virtualize. Bad enough... :(
 
We upgraded to 1.8, and we have the same issue (before the upgrade we had no problems).

After this:
kvm: 3127: cpu0 unhandled wrmsr: 0x198 data 0
kvm: 3127: cpu1 unhandled wrmsr: 0x198 data 0
We got:
BUG: unable to handle kernel paging request at 0000000041cb7793
IP: [<ffffffffa02894a9>] kvm_set_irq+0x5e/0x109 [kvm]
PGD 755d96067 PUD 765897067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/kernel/mm/ksm/run
CPU 0
Modules linked in: nfs lockd fscache nfs_acl auth_rpcgss sunrpc xt_newmac xt_multiport sha1_generic drbd lru_cache vhost_net kvm_intel kvm crc32c ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 8021q garp bridge stp xt_tcpudp iptable_filter ip_tables x_tables mptctl mptbase snd_pcm snd_timer snd soundcore snd_page_alloc i7core_edac tpm_tis edac_core pcspkr tpm serio_raw tpm_bios ahci libahci cciss igb dca [last unloaded: scsi_wait_scan]
Pid: 51, comm: events/0 Not tainted 2.6.35-1-pve #1 /ProLiant DL160 G6
RIP: 0010:[<ffffffffa02894a9>] [<ffffffffa02894a9>] kvm_set_irq+0x5e/0x109 [kvm]
RSP: 0018:ffff88042e1d7d20 EFLAGS: 00010246
RAX: 0000000041cb535b RBX: ffff88041ddc8960 RCX: 0000000000000001
RDX: 0000000021438000 RSI: 0000000000000000 RDI: 0000000041cb535b
RBP: ffff88042e1d7e00 R08: ffff880001e156c0 R09: 0000000000000000
R10: 0000000000000000 R11: ffff88082c9b9b50 R12: 0000000021438000
R13: 0000000000000001 R14: 0000000000000000 R15: ffff88042e1d8000
FS: 0000000000000000(0000) GS:ffff880001e00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000041cb7793 CR3: 000000079871d000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process events/0 (pid: 51, threadinfo ffff88042e1d6000, task ffff88042e1d8000)
Stack:
ffff880001e156c0 ffff88042e1d8000 ffff88042e1d7d70 0000000041cb535b
<0> ffff880001e156c0 ffff88082b848000 ffff880001e156c0 0000000000000000
<0> ffff88082b848000 ffff880001e156c0 ffff88042e1d7e20 ffffffff814b12d9
Call Trace:
[<ffffffff814b12d9>] ? schedule+0x593/0x5f8
[<ffffffff814b2f0e>] ? common_interrupt+0xe/0x13
[<ffffffffa028a0af>] irqfd_inject+0x25/0x3a [kvm]
[<ffffffff81066b2f>] worker_thread+0x1a9/0x24d
[<ffffffff814b12d9>] ? schedule+0x593/0x5f8
[<ffffffffa028a08a>] ? irqfd_inject+0x0/0x3a [kvm]
[<ffffffff8106a86c>] ? autoremove_wake_function+0x0/0x3d
[<ffffffff81066986>] ? worker_thread+0x0/0x24d
[<ffffffff8106a390>] kthread+0x82/0x8a
[<ffffffff8100ab24>] kernel_thread_helper+0x4/0x10
[<ffffffff8106a30e>] ? kthread+0x0/0x8a
[<ffffffff8100ab20>] ? kernel_thread_helper+0x0/0x10
Code: 8b 1d dc 90 02 00 48 85 db 74 19 48 8b 7b 08 44 89 f1 44 89 ea 44 89 e6 ff 13 48 83 c3 10 48 83 3b 00 eb e5 48 8b 85 38 ff ff ff <48> 8b 90 38 24 00 00 44 3b a2 28 01 00 00 72 0b 31 db 41 83 cc
RIP [<ffffffffa02894a9>] kvm_set_irq+0x5e/0x109 [kvm]
RSP <ffff88042e1d7d20>
CR2: 0000000041cb7793
---[ end trace c7186ceef569aca0 ]---

Versions (same issue with the 2.6.35-10 kernel):
pve-manager: 1.8-15 (pve-manager/1.8/)
running kernel: 2.6.35-1-pve
proxmox-ve-2.6.35: 1.8-10
pve-kernel-2.6.35-1-pve: 2.6.35-9
qemu-server: 1.1-30
pve-firmware: 1.0-11
libpve-storage-perl: 1.0-17
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-11
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.14.0-3
ksm-control-daemon: 1.0-5

All guests become inaccessible.
 
Same issues here (host and guests locking up). I'm wondering if anyone has found a solution to this?

Code:
kvm: 2607: cpu0 unhandled wrmsr: 0x198 data 0
kvm: 2607: cpu1 unhandled wrmsr: 0x198 data 0
kvm: 2607: cpu0 unhandled wrmsr: 0x198 data 0
kvm: 2607: cpu1 unhandled wrmsr: 0x198 data 0
kvm: 20636: cpu0 unhandled wrmsr: 0x198 data 0
kvm: 20636: cpu1 unhandled wrmsr: 0x198 data 0

Code:
pve-manager: 1.8-13 (pve-manager/1.8/5696)
running kernel: 2.6.35-1-pve
proxmox-ve-2.6.32: 1.8-31
pve-kernel-2.6.32-4-pve: 2.6.32-31
pve-kernel-2.6.35-1-pve: 2.6.35-10
qemu-server: 1.1-30
pve-firmware: 1.0-10
libpve-storage-perl: 1.0-16
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-11
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.14.0-2
ksm-control-daemon: 1.0-5

Extremely disappointing because this is being used in production and will need to be moved to bare metal :(
 
We do not see these lockups here. Do you also see this with 2.6.32?
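
To check that before rebooting, it is enough to look at the running kernel and the pve kernels that are installed (a sketch; picking 2.6.32 for the next boot is then done in the boot loader menu):

Code:
# kernel currently running
uname -r
# pve kernels installed on the box
dpkg -l 'pve-kernel*' | grep '^ii'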
 
