Proxmox - AMD Ryzen 1700 - Nested

gepalprox

Aug 28, 2017
Hi all,

I have tried many times to build a VM for nested virtualization,
but it always crashes.

Has anybody succeeded in doing this with an AMD Ryzen?

My hardware:
ASRock B350M Pro (with the very latest BIOS, updated Saturday)
AMD Ryzen 1700
64 GB RAM

I added the "args: -cpu host,+svm" option to the VM config file:

cat 108.conf
args: -cpu host,+svm
bootdisk: scsi0
cores: 4
cpu: host
ide2: hitachi:iso/ubuntu-17.04-server-amd64.iso,media=cdrom
memory: 4096
name: test1704108
net0: e1000=9E:FE:B6:29:49:54,bridge=vmbr0
numa: 0
ostype: l26
scsi0: hitachi:108/vm-108-disk-2.qcow2,size=32G
scsihw: virtio-scsi-pci
smbios1: uuid=721c81cc-c992-4094-a1e9-c55991749538
sockets: 1
unused0: hitachi:108/vm-108-disk-1.qcow2
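
Before blaming the VM config, it may be worth confirming that the host itself exposes nested SVM. The following is only a minimal sketch, assuming a Linux host with the kvm_amd module loaded; the helper names (nested_enabled, has_cpu_flag, read_or_empty) are made up for illustration. It reads /sys/module/kvm_amd/parameters/nested (reported as "1" or "Y" depending on the kernel) and looks for the svm flag in /proc/cpuinfo:

```python
from pathlib import Path


def nested_enabled(param_text: str) -> bool:
    """kvm_amd reports its 'nested' parameter as '1' or 'Y' depending on kernel version."""
    return param_text.strip() in ("1", "Y", "y")


def has_cpu_flag(cpuinfo_text: str, flag: str) -> bool:
    """Return True if `flag` appears in the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            return flag in line.split(":", 1)[1].split()
    return False


def read_or_empty(path: str) -> str:
    """Read a file, or return an empty string if it does not exist (e.g. module not loaded)."""
    p = Path(path)
    return p.read_text() if p.exists() else ""


if __name__ == "__main__":
    nested = read_or_empty("/sys/module/kvm_amd/parameters/nested")
    cpuinfo = read_or_empty("/proc/cpuinfo")
    print("kvm_amd nested enabled:", nested_enabled(nested))
    print("svm flag present:", has_cpu_flag(cpuinfo, "svm"))
```

Run on the Proxmox host, both lines should print True before nested guests can work; run inside the guest, the second line shows whether svm was actually passed through.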

I used the Ubuntu 17.04 server distribution because it supports the Ryzen 7.

During installation it crashed. Syslog:

Aug 28 21:02:54 1700amd01 kernel: [ 1283.609344] INFO: rcu_sched detected stalls on CPUs/tasks:
Aug 28 21:02:54 1700amd01 kernel: [ 1283.609372] 7-...: (1 GPs behind) idle=db7/140000000000000/0 softirq=2328/2328 fqs=10373
Aug 28 21:02:54 1700amd01 kernel: [ 1283.609405] (detected by 2, t=60304 jiffies, g=7447, c=7446, q=43099)
Aug 28 21:02:54 1700amd01 kernel: [ 1283.609432] Task dump for CPU 7:
Aug 28 21:02:54 1700amd01 kernel: [ 1283.609433] kvm R running task 0 2349 1 0x00000008
Aug 28 21:02:54 1700amd01 kernel: [ 1283.609436] Call Trace:
Aug 28 21:02:54 1700amd01 kernel: [ 1283.609441] __schedule+0x23b/0x6f0
Aug 28 21:02:54 1700amd01 kernel: [ 1283.609443] ? __check_object_size+0x100/0x1d7
Aug 28 21:02:54 1700amd01 kernel: [ 1283.609455] ? kvm_write_guest_offset_cached+0x98/0xf0 [kvm]
Aug 28 21:02:54 1700amd01 kernel: [ 1283.609457] ? __delay+0xf/0x20
Aug 28 21:02:54 1700amd01 kernel: [ 1283.609471] ? wait_lapic_expire+0xf7/0x160 [kvm]
Aug 28 21:02:54 1700amd01 kernel: [ 1283.609483] ? kvm_arch_vcpu_ioctl_run+0x679/0x15d0 [kvm]
Aug 28 21:02:54 1700amd01 kernel: [ 1283.609486] ? svm_vcpu_load+0xdc/0x110 [kvm_amd]
Aug 28 21:02:54 1700amd01 kernel: [ 1283.609495] ? kvm_vcpu_ioctl+0x339/0x620 [kvm]
Aug 28 21:02:54 1700amd01 kernel: [ 1283.609497] ? do_vfs_ioctl+0xa3/0x610
Aug 28 21:02:54 1700amd01 kernel: [ 1283.609499] ? SyS_futex+0x85/0x180
Aug 28 21:02:54 1700amd01 kernel: [ 1283.609501] ? SyS_ioctl+0x79/0x90
Aug 28 21:02:54 1700amd01 kernel: [ 1283.609502] ? entry_SYSCALL_64_fastpath+0x1e/0xad
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397102] NMI watchdog: BUG: soft lockup - CPU#9 stuck for 22s! [sshd:3364]
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397134] Modules linked in: tcp_diag inet_diag ip_set ip6table_filter ip6_tables iptable_filter softdog nfnetlink_log nfnetlink dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c amdkfd amd_iommu_v2 radeon edac_mce_amd ttm edac_core drm_kms_helper drm kvm_amd i2c_algo_bit snd_hda_codec_realtek kvm fb_sys_fops snd_hda_codec_generic syscopyarea sysfillrect snd_hda_codec_hdmi sysimgblt irqbypass snd_hda_intel crct10dif_pclmul snd_hda_codec crc32_pclmul snd_hda_core ghash_clmulni_intel snd_hwdep ccp pcbc aesni_intel snd_pcm aes_x86_64 shpchp crypto_simd snd_timer glue_helper snd cryptd soundcore input_leds serio_raw pcspkr wmi mac_hid 8250_dw vhost_net vhost macvtap macvlan ib_iser rdma_cm sunrpc iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397147] x_tables autofs4 btrfs xor raid6_pq i2c_piix4 e1000e(O) ptp pps_core r8169 mii ahci libahci fjes gpio_amdpt gpio_generic i2c_designware_platform i2c_designware_core
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397152] CPU: 9 PID: 3364 Comm: sshd Tainted: G O L 4.10.15-1-pve #1
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397152] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./AB350M Pro4, BIOS P3.00 07/13/2017
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397153] task: ffff92b5b0048000 task.stack: ffffb2fccfb7c000
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397154] RIP: 0010:smp_call_function_many+0x1ec/0x250
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397154] RSP: 0018:ffffb2fccfb7fbd8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff10
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397155] RAX: 0000000000000003 RBX: 0000000000000010 RCX: 0000000000000006
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397156] RDX: ffff92b5be79da60 RSI: 0000000000000010 RDI: ffff92b5be021098
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397156] RBP: ffffb2fccfb7fc10 R08: ffffffffffffffc0 R09: 000000000000fdff
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397156] R10: ffffd419dfdf9140 R11: 0000000000000010 R12: ffffffff87e75d20
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397157] R13: 0000000000000000 R14: ffff92b5be85a380 R15: 000000000001a340
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397158] FS: 00007fd703f26d40(0000) GS:ffff92b5be840000(0000) knlGS:0000000000000000
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397158] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397159] CR2: 00007fd7020a6330 CR3: 00000007f0cf8000 CR4: 00000000003406e0
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397159] Call Trace:
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397160] ? leave_mm+0xc0/0xc0
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397161] on_each_cpu+0x2d/0x60
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397162] flush_tlb_kernel_range+0x4b/0x80
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397163] __purge_vmap_area_lazy+0x50/0xc0
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397164] vm_unmap_aliases+0x113/0x150
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397165] change_page_attr_set_clr+0xf4/0x550
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397167] ? bpf_convert_filter+0x66/0x950
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397168] set_memory_ro+0x2f/0x40
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397168] bpf_prog_select_runtime+0x2a/0xd0
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397169] bpf_prepare_filter+0x374/0x3e0
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397171] bpf_prog_create_from_user+0xbc/0x120
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397172] ? watchdog_nmi_disable+0x70/0x70
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397173] do_seccomp+0x124/0x5f0
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397174] prctl_set_seccomp+0x24/0x50
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397175] SyS_prctl+0x119/0x490
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397176] entry_SYSCALL_64_fastpath+0x1e/0xad
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397177] RIP: 0033:0x7fd7020a633a
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397177] RSP: 002b:00007fffae2d68c8 EFLAGS: 00000246 ORIG_RAX: 000000000000009d
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397178] RAX: ffffffffffffffda RBX: 000055c5eec6ed40 RCX: 00007fd7020a633a
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397179] RDX: 000055c5eea3ff50 RSI: 0000000000000002 RDI: 0000000000000016
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397179] RBP: 000055c5eec6f9a0 R08: 0000000000000000 R09: 0000000000000005
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397179] R10: 00007fd7020a633a R11: 0000000000000246 R12: 0000000000000000
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397180] R13: 0000000000000028 R14: 0000000000000000 R15: 00007fffae2d6d10
Aug 28 21:02:59 1700amd01 kernel: [ 1288.397180] Code: 63 d2 e8 78 05 35 00 3b 05 66 35 e8 00 89 c1 0f 8d 9a fe ff ff 48 98 49 8b 16 48 03 14 c5 e0 13 b4 88 8b 42 18 a8 01 74 09 f3 90 <8b> 42 18 a8 01 75 f7 eb bd 0f b6 4d d0 4c 89 ea 4c 89 e6 44 89
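
For anyone comparing runs, traces like the one above can be summarized automatically. This is just a rough sketch, assuming the two message formats seen in this log ("rcu_sched detected stalls" and "soft lockup - CPU#N stuck for Ns! [comm:pid]"); the function name scan_kernel_log is hypothetical and other kernels may word these messages differently:

```python
import re
import sys

# Patterns taken from the messages in the syslog excerpt above.
RCU_STALL = re.compile(r"rcu_sched detected stalls")
SOFT_LOCKUP = re.compile(
    r"soft lockup - CPU#(\d+) stuck for (\d+)s! \[([^:\]]+):(\d+)\]"
)


def scan_kernel_log(lines):
    """Return a list of (kind, cpu, seconds, comm) tuples for stall/lockup lines."""
    events = []
    for line in lines:
        if RCU_STALL.search(line):
            events.append(("rcu_stall", None, None, None))
        match = SOFT_LOCKUP.search(line)
        if match:
            cpu, seconds, comm, _pid = match.groups()
            events.append(("soft_lockup", int(cpu), int(seconds), comm))
    return events


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_log.py /var/log/syslog
    with open(sys.argv[1], errors="replace") as handle:
        for event in scan_kernel_log(handle):
            print(event)
```

Counting these events per boot makes it easier to tell whether a BIOS or package change actually reduced the lockups.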

The installation succeeds if I choose "kvm64" instead of "host" as the CPU type, but then nested virtualization is not possible.

Thanks in advance.
Regards
 
Some additional information:
Proxmox version: pve-manager/5.0-23/af4267bf (running kernel: 4.10.15-1-pve)

I used an HP 360T Ethernet card (dual port) instead of the onboard Realtek.
I tested with both the E1000 and VirtIO network drivers.
 
Thank you very much for your answer.

After the update, same result. So I ran the "segfault" test (github.com: ryzen-test),
and this is what I got:
Aug 29 20:56:16 test170452 kernel: [ 475.327733] do_trap: 15 callbacks suppressed
Aug 29 20:56:16 test170452 kernel: [ 475.327738] traps: bash[29871] trap invalid opcode ip:465c64 sp:7ffeb8b571d0 error:0
Aug 29 20:56:16 test170452 kernel: [ 475.327742] in bash[400000+100000]
Aug 29 20:57:08 test170452 kernel: [ 526.769015] bash[15848]: segfault at 7b ip 0000000000435d7e sp 00007ffedeac5b90 error 6 in bash[400000+100000]


Best Regards
 
This can be triggered by a number of things; it is only an indication that your CPU suffers from the bug. Try deactivating all the power-saving features of the motherboard; it helped in my case.
 

To my understanding, this bug should only occur when compiling on all cores, right?
 
Hi all,

Thanks to the people who tried to help me.

I found a solution to my issue, but not the root cause.

What I tried:
Removed my HP network card and used only the motherboard NIC
Reinstalled from the latest Proxmox release (August); previously I had used the July one, updated
Removed RAM
Played with BIOS options (enabling/disabling SMT...)
None of this gave a better result.

I installed Debian Stretch, choosing "Install" at the first screen.
Then, on the "Software selection" screen, by mistake (in Ubuntu this choice is disabled by default), I left "Debian desktop environment" enabled, in addition to the usual "SSH" and "system tools".

During the installation, I got a warning that the Realtek firmware could not be installed and that I must use a "non-free" module.

At the end of the installation, I installed the Proxmox packages and again got a message about the Realtek module (rtl).
I tried to install it, but a message advised me to install "pve-firmware", which I did.

And bingo, it works: I can create a VM with "CPU=host".


I also made one last attempt to get a clean setup.
Differences from the previous installation:
I disabled "Debian desktop environment".
I tried to install the Realtek firmware module (non-free) before installing the Proxmox packages, but I still got the error message (Realtek firmware).
I also installed "pve-firmware".
But the result was the same as with the Proxmox base installation: same issues.

So I repeated the working procedure, and this time it works fine (now my Proxmox server is a Debian base with lots of unneeded software).


My conclusion:
I'm not sure the Realtek message matters.
It seems that something is missing from the Proxmox base distribution that gets installed with the full "Debian desktop environment".

My graphics card is a simple Radeon 6450.
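
One way to narrow down what is "missing" would be to compare the package lists of the working and non-working installs. The sketch below is only a suggestion, assuming each list was saved with `dpkg --get-selections > file`; the file names and the function names (parse_selections, only_in_working) are hypothetical:

```python
import sys


def parse_selections(text: str) -> set:
    """Parse `dpkg --get-selections` output and keep packages marked 'install'."""
    packages = set()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == "install":
            packages.add(parts[0])
    return packages


def only_in_working(working_text: str, broken_text: str) -> list:
    """Packages installed on the working system but absent from the broken one."""
    return sorted(parse_selections(working_text) - parse_selections(broken_text))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python pkgdiff.py working-selections.txt broken-selections.txt
    with open(sys.argv[1]) as working, open(sys.argv[2]) as broken:
        for name in only_in_working(working.read(), broken.read()):
            print(name)
```

The resulting list will be long (the whole desktop environment), but firmware, microcode, and kernel-related packages in it would be the first candidates to install on a clean Proxmox base.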
 
