Has anyone been able to get nvidia-smi, CUDA, etc. working in an LXC container?
I feel like I'm close. These entries are in the LXC config:
lxc.mount.entry = /dev/nvidia0 dev/nvidia0 none bind,optional,create=file,uid=65534,gid=65534
lxc.mount.entry = /dev/nvidiactl dev/nvidiactl none bind,optional,create=file,uid=65534,gid=65534
lxc.mount.entry = /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file,uid=65534,gid=65534
The device nodes are present in the container and the permissions look right:
root@plex1:~# ls -al /dev/nvidia*
crw-rw-rw- 1 nobody nogroup 241, 0 Oct 6 14:02 /dev/nvidia-uvm
crw-rw-rw- 1 nobody nogroup 195, 0 Oct 4 19:23 /dev/nvidia0
crw-rw-rw- 1 nobody nogroup 195, 255 Oct 4 19:23 /dev/nvidiactl
I followed these guides:
http://sqream.com/setting-cuda-linux-containers-2/
https://stackoverflow.com/questions/25185405/using-gpu-from-a-docker-container
and others.
But nvidia-smi, hellocuda, deviceQuery, etc. all error out.
Excerpt from strace nvidia-smi -a:
munmap(0x7fce2b7f6000, 4096) = 0
stat("/dev/nvidiactl", {st_mode=S_IFCHR|0666, st_rdev=makedev(195, 255), ...}) = 0
open("/dev/nvidiactl", O_RDWR) = -1 EPERM (Operation not permitted)
open("/dev/nvidiactl", O_RDONLY) = -1 EPERM (Operation not permitted)
fstat(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 3), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fce2b7f6000
write(1, "Failed to initialize NVML: Unkno"..., 41Failed to initialize NVML: Unknown Error
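To reproduce the failing open() without strace, something like the following could be run inside the container. This is only a minimal sketch, assuming Python 3 is available there; the probe helper is a hypothetical name of mine, not part of any NVIDIA tool. It stats each device node and then tries to open it read-write, reporting the errno name if either step fails. In the trace above, stat() succeeds with mode 0666 while open() returns EPERM, so the script should show whether the same thing happens for every node.

```python
import errno
import os
import stat


def probe(path):
    """Hypothetical helper: return (is_char_device, errno_name_or_None).

    errno_name_or_None is None when the path could be opened read-write,
    otherwise the symbolic errno name (e.g. "EPERM", "EACCES", "ENOENT").
    """
    try:
        st = os.stat(path)
    except OSError as e:
        # stat itself failed, so the node is missing or unreachable
        return (False, errno.errorcode.get(e.errno, str(e.errno)))

    is_chr = stat.S_ISCHR(st.st_mode)
    try:
        fd = os.open(path, os.O_RDWR)
    except OSError as e:
        # stat worked but open did not, which is the case seen in the strace
        return (is_chr, errno.errorcode.get(e.errno, str(e.errno)))

    os.close(fd)
    return (is_chr, None)


if __name__ == "__main__":
    # Device paths taken from the mount entries in the container config
    for dev in ("/dev/nvidiactl", "/dev/nvidia0", "/dev/nvidia-uvm"):
        print(dev, probe(dev))
```

If open() fails with EPERM rather than EACCES on a node whose mode is 0666, the file mode bits are probably not what is blocking access. That is my reading of the trace, not something I have confirmed.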
The driver even shows up in dmesg inside the LXC container:
root@plex1:~# dmesg | grep -i nvidia
[ 23.727998] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:02.0/0000:04:00.1/sound/card0/input8
[ 23.728107] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:02.0/0000:04:00.1/sound/card0/input9
[ 23.728365] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:02.0/0000:04:00.1/sound/card0/input10
[ 23.728857] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:02.0/0000:04:00.1/sound/card0/input11
[ 1402.275320] nvidia 0000:04:00.0: enabling device (0006 -> 0007)
[ 1402.275884] nvidia-nvlink: Nvlink Core is being initialized, major device number 242
[ 1402.276120] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 367.44 Wed Aug 17 22:24:07 PDT 2016
[ 1402.353444] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 241
[ 1402.361508] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 367.44 Wed Aug 17 21:54:40 PDT 2016
[ 1402.372654] [drm] [nvidia-drm] [GPU ID 0x00000400] Loading driver
[ 1402.377787] [drm] [nvidia-drm] [GPU ID 0x00000400] Unloading driver
[ 1402.417083] nvidia-modeset: Unloading
[ 1402.444136] nvidia-uvm: Unloaded the UVM driver in 8 mode
[ 1402.473375] nvidia-nvlink: Unregistered the Nvlink Core, major device number 242
[ 1508.371671] nvidia-nvlink: Nvlink Core is being initialized, major device number 242
[ 1508.371771] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 367.44 Wed Aug 17 22:24:07 PDT 2016
[ 1508.381148] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 367.44 Wed Aug 17 21:54:40 PDT 2016
[ 1508.391240] [drm] [nvidia-drm] [GPU ID 0x00000400] Loading driver
[178725.828295] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 241
This is also mentioned here: https://forum.proxmox.com/threads/kernel-sources-for-driver-module-compilation.27063/
Any suggestions?