Problems running Docker in a VM (segfaults / GPFs)

mihalski

New Member
May 22, 2017
I've been running Docker in a VM for a couple of months and recently upgraded from Proxmox VE 4.4 to 5.0 (a fresh install, as my in-place upgrade went wrong).

I've created a new Docker setup inside a VM, and over the last two days, as I've been expanding the configuration of my home automation container running Home Assistant, I've come across CONSTANT general protection faults and segfaults. Is there some setting or switch I should be using on the VM when running Docker?

Here are some examples:
Code:
[  185.839153] traps: python3[4877] general protection ip:7fe0bbe0ca03 sp:7fe0bcefed60 error:0
[  185.839161]  in libsqlite3.so.0.8.6[7fe0bbdeb000+2c6000]
[35901.326471] traps: python3[5636] general protection ip:7f8bc45c3af1 sp:7f8bc10160b8 error:0
[35901.326479]  in libstdc++.so.6.0.22[7f8bc44de000+340000]
[35951.982147] python3[6343]: segfault at 0 ip 00007f7006e73af1 sp 00007f7003a6f0b8 error 6 in libstdc++.so.6.0.22[7f7006d8e000+340000]
It has become literally unusable :/

I'd really appreciate it if someone could shed some light on this.
 
This problem only exists inside the VM, which is why I'm here trying to find out whether there are extra options I should be using when running Docker inside a VM. I've found mentions on the web that nested virtualisation (though I don't know whether Docker qualifies) requires additional settings to be changed.
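For what it's worth, these are the checks and the change I've seen suggested; I honestly don't know whether they're the right knobs, and the VM ID 100 below is just a placeholder for my actual VM:

Code:
# On the Proxmox host: is nested virtualisation enabled for the KVM module?
cat /sys/module/kvm_intel/parameters/nested    # use kvm_amd on AMD hosts

# Show the VM's current configuration, including the emulated CPU type
qm config 100

# A change I've seen suggested: pass the host CPU through instead of the default kvm64
qm set 100 --cpu host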
 
How about these messages in the PVE system logs? Could these indicate something is amiss?

Code:
[  169.342702] kvm: SMP vm created on host with unstable TSC; guest TSC will not be reliable
[  272.283679] perf: interrupt took too long (2505 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[  417.722629] perf: interrupt took too long (3144 > 3131), lowering kernel.perf_event_max_sample_rate to 63500
[  747.749370] perf: interrupt took too long (3936 > 3930), lowering kernel.perf_event_max_sample_rate to 50750
[66542.044108] CE: hpet increased min_delta_ns to 11521 nsec
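That "unstable TSC" line made me wonder about the host clocksource, so I've also been checking this on the Proxmox host; I have no idea yet whether it's actually related:

Code:
# Which clocksource the host kernel is currently using (tsc, hpet, acpi_pm, ...)
cat /sys/devices/system/clocksource/clocksource0/current_clocksource

# Whether the CPU advertises an invariant TSC
grep -m1 -o -E 'constant_tsc|nonstop_tsc' /proc/cpuinfo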

Since many other people run identical Docker containers without hitting this problem, I'm not really sure what to look at.
 
So... Debian with Docker doesn't work, and Ubuntu with Docker doesn't work either, as can be seen below:

Code:
[  204.028007] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:1:35]
[  204.028007] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack br_netfilter bridge stp llc aufs bnep ppdev btusb btrtl btbcm btintel bluetooth input_leds joydev cdc_acm serio_raw shpchp i2c_piix4 qemu_fw_cfg parport_pc parport mac_hid ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid psmouse hid virtio_net virtio_scsi pata_acpi floppy
[  204.028007] CPU: 0 PID: 35 Comm: kworker/0:1 Not tainted 4.10.0-19-generic #21-Ubuntu
[  204.028007] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
[  204.028007] Workqueue: events netstamp_clear
[  204.028007] task: ffff885479046a40 task.stack: ffffb7c880750000
[  204.028007] RIP: 0010:smp_call_function_single+0xd1/0x130
[  204.028007] RSP: 0018:ffffb7c880753c88 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff10
[  204.028007] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000830
[  204.028007] RDX: 0000000000000001 RSI: 00000000000000fb RDI: 0000000000000830
[  204.028007] RBP: ffffb7c880753cd0 R08: fffffffffffffffe R09: 0000000000000003
[  204.028007] R10: 000000000000e93c R11: 000000000000d000 R12: ffffffffb8c34bf0
[  204.028007] R13: 0000000000000000 R14: 0000000000000001 R15: ffffffffb9b92bc0
[  204.028007] FS:  0000000000000000(0000) GS:ffff88547fc00000(0000) knlGS:0000000000000000
[  204.028007] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  204.028007] CR2: 00007ffe672ed000 CR3: 00000001389a3000 CR4: 00000000000006f0
[  204.028007] Call Trace:
[  204.028007]  ? arch_unregister_cpu+0x30/0x30
[  204.028007]  ? arch_unregister_cpu+0x30/0x30
[  204.028007]  smp_call_function_many+0x207/0x250
[  204.028007]  ? netif_receive_skb_internal+0x20/0xa0
[  204.028007]  ? arch_unregister_cpu+0x30/0x30
[  204.028007]  ? netif_receive_skb_internal+0x21/0xa0
[  204.028007]  on_each_cpu+0x2d/0x60
[  204.028007]  ? netif_receive_skb_internal+0x20/0xa0
[  204.028007]  text_poke_bp+0x6a/0xf0
[  204.028007]  ? netif_receive_skb_internal+0x20/0xa0
[  204.028007]  arch_jump_label_transform+0x9b/0x120
[  204.028007]  __jump_label_update+0x77/0x90
[  204.028007]  jump_label_update+0x88/0x90
[  204.028007]  static_key_slow_inc+0x95/0xa0
[  204.028007]  static_key_enable+0x1d/0x50
[  204.028007]  netstamp_clear+0x2d/0x40
[  204.028007]  process_one_work+0x1fc/0x4b0
[  204.028007]  worker_thread+0x4b/0x500
[  204.028007]  kthread+0x101/0x140
[  204.028007]  ? process_one_work+0x4b0/0x4b0
[  204.028007]  ? kthread_create_on_node+0x60/0x60
[  204.028007]  ret_from_fork+0x2c/0x40
[  204.028007] Code: 25 28 00 00 00 75 70 48 83 c4 38 5b 41 5c 5d c3 48 8d 75 c8 48 89 d1 89 df 4c 89 e2 e8 19 fe ff ff 8b 55 e0 83 e2 01 74 0a f3 90 <8b> 55 e0 83 e2 01 75 f6 eb c3 8b 05 9f 41 13 01 85 c0 75 85 80
[  232.028007] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:1:35]
[  232.028007] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack br_netfilter bridge stp llc aufs bnep ppdev btusb btrtl btbcm btintel bluetooth input_leds joydev cdc_acm serio_raw shpchp i2c_piix4 qemu_fw_cfg parport_pc parport mac_hid ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid psmouse hid virtio_net virtio_scsi pata_acpi floppy
[  232.028007] CPU: 0 PID: 35 Comm: kworker/0:1 Tainted: G             L  4.10.0-19-generic #21-Ubuntu
[  232.028007] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
[  232.028007] Workqueue: events netstamp_clear
[  232.028007] task: ffff885479046a40 task.stack: ffffb7c880750000
[  232.028007] RIP: 0010:smp_call_function_single+0xcf/0x130
[  232.028007] RSP: 0018:ffffb7c880753c88 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff10
[  232.028007] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000830
[  232.028007] RDX: 0000000000000001 RSI: 00000000000000fb RDI: 0000000000000830
[  232.028007] RBP: ffffb7c880753cd0 R08: fffffffffffffffe R09: 0000000000000003
[  232.028007] R10: 000000000000e93c R11: 000000000000d000 R12: ffffffffb8c34bf0
[  232.028007] R13: 0000000000000000 R14: 0000000000000001 R15: ffffffffb9b92bc0
[  232.028007] FS:  0000000000000000(0000) GS:ffff88547fc00000(0000) knlGS:0000000000000000
[  232.028007] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  232.028007] CR2: 00007ffe672ed000 CR3: 00000001389a3000 CR4: 00000000000006f0
[  232.028007] Call Trace:
[  232.028007]  ? arch_unregister_cpu+0x30/0x30
[  232.028007]  ? arch_unregister_cpu+0x30/0x30
[  232.028007]  smp_call_function_many+0x207/0x250
[  232.028007]  ? netif_receive_skb_internal+0x20/0xa0
[  232.028007]  ? arch_unregister_cpu+0x30/0x30
[  232.028007]  ? netif_receive_skb_internal+0x21/0xa0
[  232.028007]  on_each_cpu+0x2d/0x60
[  232.028007]  ? netif_receive_skb_internal+0x20/0xa0
[  232.028007]  text_poke_bp+0x6a/0xf0
[  232.028007]  ? netif_receive_skb_internal+0x20/0xa0
[  232.028007]  arch_jump_label_transform+0x9b/0x120
[  232.028007]  __jump_label_update+0x77/0x90
[  232.028007]  jump_label_update+0x88/0x90
[  232.028007]  static_key_slow_inc+0x95/0xa0
[  232.028007]  static_key_enable+0x1d/0x50
[  232.028007]  netstamp_clear+0x2d/0x40
[  232.028007]  process_one_work+0x1fc/0x4b0
[  232.028007]  worker_thread+0x4b/0x500
[  232.028007]  kthread+0x101/0x140
[  232.028007]  ? process_one_work+0x4b0/0x4b0
[  232.028007]  ? kthread_create_on_node+0x60/0x60
[  232.028007]  ret_from_fork+0x2c/0x40
[  232.028007] Code: 33 1c 25 28 00 00 00 75 70 48 83 c4 38 5b 41 5c 5d c3 48 8d 75 c8 48 89 d1 89 df 4c 89 e2 e8 19 fe ff ff 8b 55 e0 83 e2 01 74 0a <f3> 90 8b 55 e0 83 e2 01 75 f6 eb c3 8b 05 9f 41 13 01 85 c0 75
[  249.218276] usb 3-2: USB disconnect, device number 4
[  249.218276] INFO: rcu_sched self-detected stall on CPU
[  249.220008] INFO: rcu_sched detected stalls on CPUs/tasks:
[  249.220008]     1-...: (1 ticks this GP) idle=13b/140000000000001/0 softirq=16951/16951 fqs=0
[  249.220008]     (detected by 0, t=21031 jiffies, g=6576, c=6575, q=135)
[  249.220008] Task dump for CPU 1:
[  249.220008] kworker/1:4     R  running task        0   101      2 0x00000000
[  249.220008] Workqueue: usb_hub_wq hub_event
[  249.220008] Call Trace:
[  249.220008]  ? kernfs_put+0xe8/0x1a0
[  249.220008]  ? __kernfs_remove+0x11e/0x250
[  249.220008]  ? kernfs_name_hash+0x17/0x80
[  249.220008]  ? kernfs_find_ns+0x72/0xd0
[  249.220008]  ? kernfs_remove_by_name_ns+0x43/0xa0
[  249.220008]  ? remove_files.isra.1+0x35/0x70
[  249.220008]  ? sysfs_remove_group+0x44/0x90
[  249.220008]  ? dpm_sysfs_remove+0x57/0x60
[  249.220008]  ? device_del+0x129/0x340
[  249.220008]  ? usb_remove_ep_devs+0x1f/0x30
[  249.220008]  ? usb_disable_device+0x9e/0x270
[  249.220008]  ? usb_disconnect+0x8a/0x270
[  249.220008]  ? hub_port_connect+0x84/0x9d0
[  249.220008]  ? hub_event+0x958/0xb10
[  249.220008]  ? add_timer_on+0xd8/0x1b0
[  249.220008]  ? process_one_work+0x1fc/0x4b0
[  249.220008]  ? worker_thread+0x4b/0x500
[  249.220008]  ? kthread+0x101/0x140
[  249.220008]  ? process_one_work+0x4b0/0x4b0
[  249.220008]  ? kthread_create_on_node+0x60/0x60
[  249.220008]  ? ret_from_fork+0x2c/0x40
[  249.220008] rcu_sched kthread starved for 21031 jiffies! g6576 c6575 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1
[  249.220008] rcu_sched       S    0     7      2 0x00000000
[  249.220008] Call Trace:
[  249.220008]  __schedule+0x233/0x6f0
[  249.220008]  schedule+0x36/0x80
[  249.220008]  schedule_timeout+0x1df/0x3f0
[  249.220008]  ? finish_task_switch+0x76/0x210
[  249.220008]  ? del_timer_sync+0x50/0x50
[  249.220008]  rcu_gp_kthread+0x539/0x900
[  249.220008]  kthread+0x101/0x140
[  249.220008]  ? rcu_note_context_switch+0xf0/0xf0
[  249.220008]  ? kthread_create_on_node+0x60/0x60
[  249.220008]  ret_from_fork+0x2c/0x40
[  249.220648]     1-...: (1 ticks this GP) idle=13b/140000000000001/0 softirq=16951/16951 fqs=1
[  249.220648]      (t=21031 jiffies g=6576 c=6575 q=135)
[  249.220648] Task dump for CPU 1:
[  249.220648] kworker/1:4     R  running task        0   101      2 0x00000000
[  249.220648] Workqueue: usb_hub_wq hub_event
[  249.220648] Call Trace:
[  249.220648]  <IRQ>
[  249.220648]  sched_show_task+0xd3/0x140
[  249.220648]  dump_cpu_task+0x37/0x40
[  249.220648]  rcu_dump_cpu_stacks+0x99/0xbd
[  249.220648]  rcu_check_callbacks+0x703/0x850
[  249.220648]  ? acct_account_cputime+0x1c/0x20
[  249.220648]  ? account_system_time+0x7a/0x110
[  249.220648]  ? tick_sched_handle.isra.15+0x60/0x60
[  249.220648]  update_process_times+0x2f/0x60
[  249.220648]  tick_sched_handle.isra.15+0x25/0x60
[  249.220648]  tick_sched_timer+0x3d/0x70
[  249.220648]  __hrtimer_run_queues+0xf3/0x270
[  249.220648]  hrtimer_interrupt+0xa3/0x1e0
[  249.220648]  local_apic_timer_interrupt+0x38/0x60
[  249.220648]  smp_apic_timer_interrupt+0x38/0x50
[  249.220648]  apic_timer_interrupt+0x89/0x90
[  249.220648] RIP: 0010:ida_simple_remove+0x8/0x50
[  249.220648] RSP: 0018:ffffb7c88099bae8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff10
[  249.220648] RAX: 0000000000000002 RBX: ffff88535955a438 RCX: 0000000000000001
[  249.220648] RDX: 0000000000000000 RSI: 000000000000d1ba RDI: ffff885479c33790
[  249.220648] RBP: ffffb7c88099bae8 R08: 0000000000000000 R09: 000000018022001e
[  249.220648] R10: ffff88535955a5a0 R11: 0000000000000000 R12: ffff88535955a438
[  249.220648] R13: ffff88535955a9d8 R14: ffff885354c2f098 R15: ffff8853751b18d8
[  249.220648]  </IRQ>
[  249.220648]  kernfs_put+0xe8/0x1a0
[  249.220648]  __kernfs_remove+0x11e/0x250
[  249.220648]  ? kernfs_name_hash+0x17/0x80
[  249.220648]  ? kernfs_find_ns+0x72/0xd0
[  249.220648]  kernfs_remove_by_name_ns+0x43/0xa0
[  249.220648]  remove_files.isra.1+0x35/0x70
[  249.220648]  sysfs_remove_group+0x44/0x90
[  249.220648]  dpm_sysfs_remove+0x57/0x60
[  249.220648]  device_del+0x129/0x340
[  249.220648]  ? usb_remove_ep_devs+0x1f/0x30
[  249.220648]  usb_disable_device+0x9e/0x270
[  249.220648]  usb_disconnect+0x8a/0x270
[  249.220648]  hub_port_connect+0x84/0x9d0
[  249.220648]  hub_event+0x958/0xb10
[  249.220648]  ? add_timer_on+0xd8/0x1b0
[  249.220648]  process_one_work+0x1fc/0x4b0
[  249.220648]  worker_thread+0x4b/0x500
[  249.220648]  kthread+0x101/0x140
[  249.220648]  ? process_one_work+0x4b0/0x4b0
[  249.220648]  ? kthread_create_on_node+0x60/0x60
[  249.220648]  ret_from_fork+0x2c/0x40

And the attached screenshot shows what came up on the virtual console.

Is there a way to run Docker so that the containers don't fall over?
What is happening here? Is my CPU too old? Buggy?

I just don't know how to narrow this problem down.
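In case it helps anyone reading this, here's how I've been comparing the CPU the guest is presented with against the physical CPU, by running the same commands inside the VM and on the Proxmox host and diffing the output:

Code:
# CPU model as seen on this side (guest or host)
grep -m1 'model name' /proc/cpuinfo

# Feature flags, one per line, so the two sides are easy to diff
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | sort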
 

Attachments

  • Screen Shot 2017-07-10 10 31 12 PM.png
In fact, the Ubuntu VM is inaccessible now... it's running but seems to be totally unresponsive.
So... I don't know what to do now.
 
Were you able to solve the problem? I also see "CPU soft lockup" messages in a KVM VM that contains a Docker instance. Maybe a similar problem?
 
