Proxmox 7 > 8 or fresh install of 8 with NVIDIA Tesla K20c

daves_nt_here

Dec 27, 2021
I've tried a fresh install of 7.4 and installed NVIDIA-Linux-x86_64-460.106.00.run.
nvidia-smi shows the GPU working well. I can spin up an LXC with Plex or Folding at Home and get GPU processing working.

If I upgrade to 8, or do a fresh install of 8, and use the 460 driver that NVIDIA recommends, or 510, or another three versions I can't remember, nvidia-smi will not work.
On a fresh install, the driver installer stops because it cannot build the kernel module. After upgrading from 7 to 8, nvidia-smi says the driver is not found.
Is PVE 8 not capable of running my Tesla K20c, or is there a specific driver that I haven't tried yet?
In my hours of googling, I did see something somewhere (and I wish I had bookmarked it) saying that PVE 8 upgraded the kernel and that only the NVIDIA 8xx drivers will work. I wish I could find that article again to confirm.
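
One thing that may help narrow down the failed module build is checking whether headers for the running kernel are actually installed, and then reading the installer log. The following is only a rough sketch: `headers_present` is a made-up helper name, the second argument exists purely so the check can be pointed at another directory, and /var/log/nvidia-installer.log is assumed to be the log location the .run installer normally uses.

```shell
#!/bin/sh
# Hypothetical helper (not part of Proxmox or the NVIDIA installer).
# $1 = kernel release to check (defaults to the running kernel)
# $2 = modules root (defaults to /lib/modules; only here so the check can be tested)
headers_present() {
    release="${1:-$(uname -r)}"
    root="${2:-/lib/modules}"
    # The "build" entry normally resolves only when matching headers are installed.
    [ -e "$root/$release/build" ]
}

if headers_present; then
    echo "headers found for $(uname -r)"
else
    echo "headers missing for $(uname -r)"
fi

# Assumption: the .run installer wrote its log to the usual default location.
if [ -r /var/log/nvidia-installer.log ]; then
    tail -n 20 /var/log/nvidia-installer.log
fi
```

If the headers are reported missing, the installer has nothing to build against for that kernel. If they are present, the last lines of the log should show where the build actually stopped.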

Below is my setup process step-by-step for PVE 7.4 & Folding At Home working with GPU processing.

Code:
Dell R720 (H710 mini flashed into IT mode)


ZFS Raid1 on the two SSD's

nano /etc/apt/sources.list.d/pve-enterprise.list
comment out repo


nano /etc/apt/sources.list
add:
deb http://download.proxmox.com/debian/pve bullseye pve-no-subscription


echo "apt update && apt -y upgrade && apt -y dist-upgrade && apt -y autoremove && apt autoclean" > update && mv update /usr/local/bin/update && chmod +x /usr/local/bin/update
update
apt install -y build-essential
apt install -y pve-headers-$(uname -r)
apt install -y pve-headers
apt install -y software-properties-common
echo -e "blacklist nouveau\noptions nouveau modeset=0" > /etc/modprobe.d/blacklist-nouveau.conf
update-initramfs -u
reboot

echo -e '\n# load nvidia modules\nnvidia-drm\nnvidia-uvm' >> /etc/modules-load.d/modules.conf
update-initramfs -u -k all
wget https://us.download.nvidia.com/tesla/460.106.00/NVIDIA-Linux-x86_64-460.106.00.run
chmod +x ./NVIDIA-Linux-x86_64-460.106.00.run
./NVIDIA-Linux-x86_64-460.106.00.run


nano /etc/udev/rules.d/70-nvidia.rules
add:
# Create /dev/nvidia0, /dev/nvidia1 … and /dev/nvidiactl when nvidia module is loaded
KERNEL=="nvidia", RUN+="/bin/bash -c '/usr/bin/nvidia-smi -L && /bin/chmod 666 /dev/nvidia*'"
# Create the CUDA node when nvidia_uvm CUDA module is loaded
KERNEL=="nvidia_uvm", RUN+="/bin/bash -c '/usr/bin/nvidia-modprobe -c0 -u && /bin/chmod 0666 /dev/nvidia-uvm*'"

reboot
nvidia-smi

CREATE CONTAINER - Do not start

ls -l /dev/nvidia*

nano /etc/pve/lxc/100.conf
add: <<-- Use the major device numbers shown by (ls -l /dev/nvidia*)
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 510:* rwm
lxc.cgroup2.devices.allow: c 236:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file

START CONTAINER

echo "apt update && apt -y upgrade && apt -y dist-upgrade && apt -y autoremove && apt autoclean" > update && mv update /usr/local/bin/update && chmod +x /usr/local/bin/update
update
reboot
wget https://us.download.nvidia.com/tesla/460.106.00/NVIDIA-Linux-x86_64-460.106.00.run
chmod +x ./NVIDIA-Linux-x86_64-460.106.00.run
./NVIDIA-Linux-x86_64-460.106.00.run --no-kernel-module
nvidia-smi
wget https://download.foldingathome.org/releases/public/release/fahclient/debian-stable-64bit/v7.6/fahclient_7.6.21_amd64.deb
dpkg -i --force-depends fahclient_7.6.21_amd64.deb

nano /etc/fahclient/config.xml

<config>
  <!-- Client Control -->
  <fold-anon v='true'/>


  <!-- HTTP Server -->
  <allow v='127.0.0.1 192.168.52.0/24'/>


  <!-- Network -->
  <proxy v=':8080'/>


  <!-- Remote Command Server -->
  <command-allow-no-pass v='127.0.0.1 192.168.52.0/24'/>


  <!-- Slot Control -->
  <power v='full'/>


  <!-- User Information -->
  <passkey v='xxxx'/>
  <team v='xxxx'/>
  <user v='Daves_nt_here'/>


  <!-- Web Server -->
  <web-allow v='127.0.0.1 192.168.52.0/24'/>


  <!-- Folding Slots -->
  <slot id='0' type='CPU'/>
  <slot id='1' type='GPU'>
  </slot>
</config>
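
The major numbers in the lxc.cgroup2.devices.allow lines above have to match what the host shows in ls -l /dev/nvidia*, and they can differ between hosts. A minimal sketch of deriving those lines instead of typing them by hand is below. `nvidia_allow_lines` is a hypothetical helper name, and it assumes the usual GNU `ls -l` layout, where field 5 of a character-device line is the major number followed by a comma.

```shell
#!/bin/sh
# Hypothetical helper: turn `ls -l /dev/nvidia*` output into cgroup2 allow lines.
# A character-device line is assumed to look like:
#   crw-rw-rw- 1 root root 195, 0 Jan  1 00:00 /dev/nvidia0
nvidia_allow_lines() {
    # Keep only character devices, strip the comma from the major number,
    # then de-duplicate so each major is listed once.
    awk '/^c/ { sub(/,/, "", $5); print $5 }' | sort -n -u |
    while read -r major; do
        echo "lxc.cgroup2.devices.allow: c ${major}:* rwm"
    done
}

# On the host, with the driver loaded (prints nothing if no nodes exist):
ls -l /dev/nvidia* 2>/dev/null | nvidia_allow_lines
```

The output can be pasted into /etc/pve/lxc/100.conf in place of the hand-written allow lines. The lxc.mount.entry lines still need to be added separately.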