KVM causes Kernel Oops in "find_get_page"

davidg

New Member
Nov 5, 2009
I received the following kernel general protection fault on Proxmox 1.4 (stock ISO install) on my quad-core Intel Q8400 while installing Vista inside KVM. Another KVM instance was running memtest86 at the time. [This was to generate CPU load, not an attempt to test memory. :) ] The oops hit after about 30 minutes of uptime.

Code:
general protection fault: 0000 [1] PREEMPT SMP
CPU: 3
Modules linked in: dm_snapshot kvm_intel kvm vzethdev vznetdev simfs vzrst vzcpt tun vzdquota vzmon vzdev xt_tcpudp xt_length ipt_ttl xt_tcpmss xt_TCPMSS iptable_mangle iptable_filter xt_multiport xt_limit ipt_tos ipt_REJECT ip_tables x_tables ipv6 ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi bridge snd_hda_intel snd_pcm snd_timer snd_page_alloc snd_hwdep snd r8169 soundcore thermal mii evdev floppy parport_pc parport pcspkr button processor intel_agp sg scsi_wait_scan virtio_blk virtio dm_mod usbhid hid usb_storage libusual sd_mod sr_mod ide_disk ide_generic ide_cd cdrom ide_core shpchp pci_hotplug uhci_hcd ehci_hcd usbcore iTCO_wdt iTCO_vendor_support ata_piix pata_acpi ata_generic libata scsi_mod i2c_i801 i2c_core isofs msdos fat
Pid: 4541, comm: kvm Not tainted 2.6.24-8-pve #1 ovz005
RIP: 0010:[<ffffffff8029b7bb>]  [<ffffffff8029b7bb>] find_get_page+0x3b/0x80
RSP: 0018:ffff810020c97c98  EFLAGS: 00010082
RAX: dfff81011bb76140 RBX: dfff81011bb76140 RCX: 0000000000000000
RDX: ffff810092756690 RSI: 00000000001415eb RDI: 0000000000000000
RBP: ffff81011287d9f0 R08: 0000000000000000 R09: ffffffff8029b3f0
R10: 0000000000ffffff R11: 0000000000000001 R12: 00000000001415eb
R13: 0000000000000020 R14: 00000000001415eb R15: ffff810058140100
FS:  0000000042e7b950(0063) GS:ffff81011b402f80(0000) knlGS:0000000000000000
CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 0000000087ec1000 CR3: 0000000058104000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kvm (pid: 4541, veid=0, threadinfo ffff810020c96000, task ffff8101190e0000)
Stack:  ffff81011bb76100 ffff81011287d9d8 00000000001415eb ffffffff8029c7d8
 ffff810020c97da0 ffffffff8029b3f0 ffff810020c97d58 ffff810020c97e48
 ffff810058140168 ffff81011287d8b0 000000000014160b 00000000001415ea
Call Trace:
 [<ffffffff8029c7d8>] do_generic_mapping_read+0x118/0x410
 [<ffffffff8029b3f0>] file_read_actor+0x0/0x190
 [<ffffffff8029e2c6>] generic_file_aio_read+0x116/0x1d0
 [<ffffffff802d1e53>] do_sync_read+0xe3/0x130
 [<ffffffff8025c230>] autoremove_wake_function+0x0/0x30
 [<ffffffff80251281>] group_send_sig_info+0x91/0x130
 [<ffffffff80233dca>] set_next_entity+0x3a/0x80
 [<ffffffff8026166a>] getnstimeofday+0x3a/0xb0
 [<ffffffff80260052>] ktime_get_ts+0x22/0x60
 [<ffffffff802d2ea8>] vfs_read+0xc8/0x180
 [<ffffffff802d310b>] sys_pread64+0x1ab/0x1c0
 [<ffffffff8020c69e>] system_call+0x7e/0x83


Code: 48 8b 00 48 89 da 25 00 40 02 00 48 3d 00 40 02 00 74 22 f0
RIP  [<ffffffff8029b7bb>] find_get_page+0x3b/0x80
 RSP <ffff810020c97c98>
---[ end trace 9f4d5c2ebc9402aa ]---
note: kvm[4541] exited with preempt_count 1

Any ideas what the problem may be, or what additional information I could provide to help track it down?
 
I reset the machine, brought the VMs up again, and after about 10 minutes got the following:

Code:
------------[ cut here ]------------
kernel BUG at /home/dietmar/svn-devel/pve-kernel-2.6.24/kvm-kmod-2.6.30.1/x86/mmu.c:640!
invalid opcode: 0000 [1] PREEMPT SMP
CPU: 3
Modules linked in: kvm_intel kvm vzethdev vznetdev simfs vzrst vzcpt tun vzdquota vzmon vzdev xt_tcpudp xt_length ipt_ttl xt_tcpmss xt_TCPMSS iptable_mangle iptable_filter xt_multiport xt_limit ipt_tos ipt_REJECT ip_tables x_tables ipv6 ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi bridge snd_hda_intel snd_pcm snd_timer snd_page_alloc snd_hwdep parport_pc snd parport thermal evdev soundcore floppy pcspkr intel_agp r8169 mii button processor sg scsi_wait_scan virtio_blk virtio dm_mod usbhid hid usb_storage libusual sd_mod sr_mod ide_disk ide_generic ide_cd cdrom ide_core shpchp pci_hotplug uhci_hcd ehci_hcd usbcore iTCO_wdt iTCO_vendor_support ata_piix pata_acpi ata_generic libata scsi_mod i2c_i801 i2c_core isofs msdos fat
Pid: 3705, comm: kvm Not tainted 2.6.24-8-pve #1 ovz005
RIP: 0010:[<ffffffff884d6a50>]  [<ffffffff884d6a50>] :kvm:rmap_remove+0x160/0x220
RSP: 0018:ffff810112183d18  EFLAGS: 00010246
RAX: 0000000000000000 RBX: 00000000000beec5 RCX: 0000000000000010
RDX: 0000000000000000 RSI: ffff8101124f8040 RDI: ffff8100938ee7f8
RBP: ffff8100970ef098 R08: ffffc200020d7ab0 R09: ffff8100938ee7f8
R10: ffff810000000000 R11: 00000000970ef000 R12: ffff81009706d4d0
R13: ffff8101124f8000 R14: ffff810119dc16a8 R15: ffffffff806ea690
FS:  0000000040896950(0000) GS:ffff81011b402f80(0000) knlGS:0000000000000000
CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 000000000032e328 CR3: 000000011750a000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kvm (pid: 3705, veid=0, threadinfo ffff810112182000, task ffff81011a52a8e0)
Stack:  0000000001e97080 ffff8100970ef098 0000000001e97098 ffff810119dc0000
 ffff810114c14000 ffffffff884db69d ffff8101124f8030 ffffffff804c8d01
 ffff810119dc0000 ffff810119dc0000 0000000000000001 ffffffff884d5faa
Call Trace:
 [<ffffffff884db69d>] :kvm:paging64_invlpg+0x13d/0x260
 [<ffffffff804c8d01>] __down_read+0xb1/0xb3
 [<ffffffff884d5faa>] :kvm:kvm_mmu_invlpg+0xa/0x20
 [<ffffffff884ff5f4>] :kvm_intel:handle_invlpg+0x14/0x30
 [<ffffffff884d2d5e>] :kvm:kvm_arch_vcpu_ioctl_run+0x37e/0xb60
 [<ffffffff884ca99b>] :kvm:kvm_vcpu_ioctl+0x2fb/0x5b0
 [<ffffffff802e18af>] do_ioctl+0x2f/0xb0
 [<ffffffff802e1bbb>] vfs_ioctl+0x28b/0x300
 [<ffffffff802d3519>] generic_file_llseek+0x69/0xd0
 [<ffffffff802e1c79>] sys_ioctl+0x49/0x80
 [<ffffffff8020c69e>] system_call+0x7e/0x83


Code: 0f 0b eb fe 83 f8 02 7f d8 48 89 fa b9 03 00 00 00 48 8b 72
RIP  [<ffffffff884d6a50>] :kvm:rmap_remove+0x160/0x220
 RSP <ffff810112183d18>
---[ end trace 404556113bbc468e ]---
note: kvm[3705] exited with preempt_count 1

I am going to run a memtest on the machine overnight to rule out bad RAM; will get back to you in the morning with results.
 
Okay, running memtest+ on the machine for 8 hours didn't show any problems, so this might be a legitimate bug.

I have now seen a third oops while running only memtest inside KVM, so it may not be Vista-related at all. (Unfortunately, I did not get a copy of that oops.)

Is anybody else able to reproduce this problem by running memtest inside KVM for an hour or so? I will perform a fresh install and see if it still comes up. Unfortunately, I lose access to the hardware on Monday (it needs to go into production for something else), so I will not be able to test after that.
 
I installed Proxmox VE 1.4 onto a different machine with identical hardware, and migrated the VMs across to the new hardware. After 24 hours I still haven't seen any problems on the new hardware, while on the old hardware, it didn't last more than about 30 minutes before oops'ing.

I am going to write this one off to bad hardware and/or a corrupt install.
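
One detail in the first oops is at least consistent with the hardware explanation: RAX and RBX hold dfff81011bb76140, which is not a valid (canonical) x86-64 address, and dereferencing such an address raises a general protection fault. Flipping a single bit (bit 61) turns it into ffff81011bb76140, a normal-looking kernel pointer just 0x40 above the first stack entry, ffff81011bb76100. The sketch below checks this. It assumes 48-bit virtual addressing, and the helper names are made up for illustration. A single flipped bit does not prove bad RAM, since a software bug can also corrupt a pointer, so treat it as a hint only.

```python
# Minimal sketch: is a 64-bit value a canonical x86-64 address, and if not,
# which single-bit flips would make it canonical?
# Assumption: 48-bit virtual addresses, i.e. bits 63..47 must all be equal.

VA_BITS = 48  # assumed address width, not taken from the oops itself


def is_canonical(addr: int, va_bits: int = VA_BITS) -> bool:
    """Return True if bits 63..(va_bits-1) are all 0 or all 1."""
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


def single_bit_fixes(addr: int) -> list[tuple[int, int]]:
    """List (bit, repaired_value) pairs where flipping one bit gives a canonical address."""
    fixes = []
    for bit in range(64):
        candidate = addr ^ (1 << bit)
        if is_canonical(candidate):
            fixes.append((bit, candidate))
    return fixes


if __name__ == "__main__":
    rax = 0xDFFF81011BB76140  # RAX/RBX from the first oops
    print("canonical:", is_canonical(rax))
    for bit, fixed in single_bit_fixes(rax):
        print(f"flip bit {bit}: {fixed:#018x}")
```

Running the same check against the registers of any further oops would show whether the pattern repeats.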

Sorry for wasting everyone's time.