Proxmox 7 > 8 or fresh install of 8 with NVIDIA Tesla K20c

daves_nt_here

Member
Dec 27, 2021
I did a fresh install of PVE 7.4 and installed NVIDIA-Linux-x86_64-460.106.00.run.
nvidia-smi shows the GPU working well. I can spin up an LXC container with Plex or Folding@home and get GPU processing working.

If I upgrade to 8, or do a fresh install of 8, nvidia-smi does not work. I have tried the 460 driver that NVIDIA recommends, 510, and three other versions I can't remember.
On a fresh install, the driver installer stops because it cannot create the kernel module. After upgrading from 7 to 8, nvidia-smi says the driver is not found.
Is PVE 8 not capable of running my Tesla K20, or is there a specific driver that I haven't tried yet?
In my hours of googling, I did see something (and I wish I had bookmarked it) saying that PVE 8 upgraded the kernel and that only the NVIDIA 8xx drivers will work. I wish I could find that article again to confirm.
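
For anyone trying to narrow down the same failure, here is a minimal read-only sketch of the checks that seem most relevant when the installer cannot build its kernel module. It assumes the usual layout where /lib/modules/<release>/build points at the installed headers and where the .run installer leaves its log at /var/log/nvidia-installer.log. The helper name headers_present is made up for this example.

```shell
#!/bin/sh
# Diagnostic sketch only: it reads state and changes nothing.

kernel="$(uname -r)"
echo "Running kernel: $kernel"

# headers_present is a hypothetical helper, not part of any package.
# It succeeds when a module build tree exists for the given kernel release.
headers_present() {
    [ -d "/lib/modules/$1/build" ]
}

if headers_present "$kernel"; then
    echo "Headers found for $kernel"
else
    echo "No headers for $kernel: the installer cannot build its kernel module"
fi

# nouveau should not be loaded when the NVIDIA module is installed.
if command -v lsmod >/dev/null 2>&1 && lsmod | grep -q '^nouveau'; then
    echo "nouveau is still loaded; check the blacklist file and the initramfs"
fi

# Show the tail of the installer log, if a previous run left one behind.
log=/var/log/nvidia-installer.log
if [ -f "$log" ]; then
    tail -n 20 "$log"
fi
```

The last lines of the installer log usually name the compile error, which is the most useful thing to include when asking for help.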

Below is my step-by-step setup process for PVE 7.4, with Folding@home working with GPU processing.

Code:
Dell R720 (H710 mini flashed into IT mode)


ZFS RAID1 on the two SSDs

nano /etc/apt/sources.list.d/pve-enterprise.list
comment out repo


nano /etc/apt/sources.list
add:
deb http://download.proxmox.com/debian/pve bullseye pve-no-subscription


echo "apt update && apt -y upgrade && apt -y dist-upgrade && apt -y autoremove && apt autoclean" > update && mv update /usr/local/bin/update && chmod +x /usr/local/bin/update
update
apt install -y build-essential
apt install -y pve-headers-$(uname -r)
apt install -y pve-headers
apt install -y software-properties-common
echo -e "blacklist nouveau\noptions nouveau modeset=0" > /etc/modprobe.d/blacklist-nouveau.conf
update-initramfs -u
reboot

echo -e '\n# load nvidia modules\nnvidia-drm\nnvidia-uvm' >> /etc/modules-load.d/modules.conf
update-initramfs -u -k all
wget https://us.download.nvidia.com/tesla/460.106.00/NVIDIA-Linux-x86_64-460.106.00.run
chmod +x ./NVIDIA-Linux-x86_64-460.106.00.run
./NVIDIA-Linux-x86_64-460.106.00.run


nano /etc/udev/rules.d/70-nvidia.rules
add:
# Create /dev/nvidia0, /dev/nvidia1 … and /dev/nvidiactl when nvidia module is loaded
KERNEL=="nvidia", RUN+="/bin/bash -c '/usr/bin/nvidia-smi -L && /bin/chmod 666 /dev/nvidia*'"
# Create the CUDA node when nvidia_uvm CUDA module is loaded
KERNEL=="nvidia_uvm", RUN+="/bin/bash -c '/usr/bin/nvidia-modprobe -c0 -u && /bin/chmod 0666 /dev/nvidia-uvm*'"

reboot
nvidia-smi

CREATE CONTAINER - Do not start

ls -l /dev/nvidia*

nano /etc/pve/lxc/100.conf
add (use the major device numbers shown by ls -l /dev/nvidia*):
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 510:* rwm
lxc.cgroup2.devices.allow: c 236:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file

START CONTAINER (run the following inside the container)

echo "apt update && apt -y upgrade && apt -y dist-upgrade && apt -y autoremove && apt autoclean" > update && mv update /usr/local/bin/update && chmod +x /usr/local/bin/update
update
reboot
wget https://us.download.nvidia.com/tesla/460.106.00/NVIDIA-Linux-x86_64-460.106.00.run
chmod +x ./NVIDIA-Linux-x86_64-460.106.00.run
./NVIDIA-Linux-x86_64-460.106.00.run --no-kernel-module
nvidia-smi
wget https://download.foldingathome.org/releases/public/release/fahclient/debian-stable-64bit/v7.6/fahclient_7.6.21_amd64.deb
dpkg -i --force-depends fahclient_7.6.21_amd64.deb

nano /etc/fahclient/config.xml

<config>
  <!-- Client Control -->
  <fold-anon v='true'/>


  <!-- HTTP Server -->
  <allow v='127.0.0.1 192.168.52.0/24'/>


  <!-- Network -->
  <proxy v=':8080'/>


  <!-- Remote Command Server -->
  <command-allow-no-pass v='127.0.0.1 192.168.52.0/24'/>


  <!-- Slot Control -->
  <power v='full'/>


  <!-- User Information -->
  <passkey v='xxxx'/>
  <team v='xxxx'/>
  <user v='Daves_nt_here'/>


  <!-- Web Server -->
  <web-allow v='127.0.0.1 192.168.52.0/24'/>


  <!-- Folding Slots -->
  <slot id='0' type='CPU'/>
  <slot id='1' type='GPU'>
  </slot>
</config>
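
The device-number step in the container config above (the lxc.cgroup2.devices.allow lines) can be sketched as a small host-side helper. This is only an illustration: the function name nvidia_allow_lines is hypothetical, and it assumes the usual ls -l layout for character devices, where the fifth column holds the major number followed by a comma. The majors for nvidia-uvm and friends can change between boots or driver versions, so it is worth re-running after a driver change.

```shell
#!/bin/sh
# Hypothetical helper: read `ls -l /dev/nvidia*` output on stdin and print
# one lxc.cgroup2.devices.allow line per distinct major device number.
nvidia_allow_lines() {
    # Keep character-device rows only, strip the trailing comma from the major.
    awk '$1 ~ /^c/ { sub(/,$/, "", $5); print $5 }' \
        | sort -un \
        | while read -r major; do
              printf 'lxc.cgroup2.devices.allow: c %s:* rwm\n' "$major"
          done
}

# On the Proxmox host: print lines to paste into /etc/pve/lxc/<id>.conf.
# Nothing is printed when no NVIDIA device nodes exist.
if ls /dev/nvidia* >/dev/null 2>&1; then
    ls -l /dev/nvidia* | nvidia_allow_lines
fi
```

The output is meant to be reviewed and pasted by hand rather than appended to the container config automatically.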
 
