Steps to Use CUDA after Linux Kernel Update

05 Oct, 2023

Introduction

A Linux kernel update can easily leave your CUDA setup broken, usually because the NVIDIA kernel module was not rebuilt for the new kernel.
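Before purging anything, it may help to confirm the diagnosis: is there an NVIDIA module built by DKMS for the kernel you are running now? The sketch below assumes a system using DKMS and parses `dkms status` lines in either of the two shapes shown in the docstring (the format differs between dkms versions). `nvidia_module_kernels` is a hypothetical helper, not part of any tool.

```python
import platform
import shutil
import subprocess
from typing import Set


def nvidia_module_kernels(dkms_output: str) -> Set[str]:
    """Return kernel versions that have an *installed* nvidia DKMS module.

    Assumed line shapes (they vary across dkms versions):
      nvidia/535.113.01, 6.2.0-34-generic, x86_64: installed
      nvidia, 535.113.01, 6.2.0-34-generic, x86_64: installed
    """
    kernels = set()
    for line in dkms_output.splitlines():
        head, _, status = line.partition(":")
        if not status.strip().startswith("installed"):
            continue  # skip "added", "built", etc.
        fields = [f.strip() for f in head.split(",")]
        if len(fields) < 3 or fields[0].split("/")[0] != "nvidia":
            continue
        kernels.add(fields[-2])  # the kernel version is second to last
    return kernels


if __name__ == "__main__":
    if shutil.which("dkms") is None:
        print("dkms is not installed")
    else:
        out = subprocess.run(["dkms", "status"], capture_output=True,
                             text=True, check=False).stdout
        running = platform.release()  # same string as `uname -r`
        print(f"running kernel: {running}")
        print("nvidia module built for it:", running in nvidia_module_kernels(out))
```

If the running kernel is missing from that set, you are most likely in the situation this post describes.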

Steps

Here is what you should do to get it working again:

Purge everything related to NVIDIA and CUDA

sudo apt purge '*nvidia' 'nvidia*' 'cuda*' '*cuda'

After you have done this, check whether any other installed packages mention NVIDIA or CUDA:

dpkg -l | grep -i nvidia

And,

dpkg -l | grep -i cuda

Then purge everything these commands list, the same way as before.
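If you would rather not copy package names by hand, here is a minimal sketch that filters `dpkg -l` output and builds a single purge command. It assumes the usual `dpkg -l` layout (a two- or three-letter status such as `ii` or `rc`, then the package name, possibly with an `:amd64` suffix); `leftover_packages` and `purge_command` are hypothetical helper names.

```python
import re
import shlex
import shutil
import subprocess
from typing import Iterable, List

STATUS_RE = re.compile(r"[uihrp][a-zA-Z]R?")  # e.g. "ii", "rc", "iU"


def leftover_packages(dpkg_output: str,
                      keywords: Iterable[str] = ("nvidia", "cuda")) -> List[str]:
    """Return package names from `dpkg -l` output that mention any keyword."""
    found = set()
    for line in dpkg_output.splitlines():
        parts = line.split()
        # Header lines ("Desired=...", "||/ Name", "+++-===") fail this check.
        if len(parts) < 2 or not STATUS_RE.fullmatch(parts[0]):
            continue
        name = parts[1].split(":")[0]  # drop an architecture suffix like ":amd64"
        if any(k in name.lower() for k in keywords):
            found.add(name)
    return sorted(found)


def purge_command(packages: List[str]) -> List[str]:
    return ["sudo", "apt", "purge", *packages]


if __name__ == "__main__":
    if shutil.which("dpkg") is None:
        print("dpkg not found; this assumes a Debian/Ubuntu system")
    else:
        out = subprocess.run(["dpkg", "-l"], capture_output=True,
                             text=True, check=False).stdout
        leftovers = leftover_packages(out)
        print(shlex.join(purge_command(leftovers)) if leftovers else "nothing left to purge")
```

The script only prints the command, so you can review the list before running it yourself.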

Reboot your system.
Then, install the latest NVIDIA driver.

sudo apt install nvidia-driver-XXX

Replace XXX with the latest version number, or the one you want. For me, at the time of writing, it is 535.
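To find the newest driver version your package sources offer, a small sketch like the one below can scan `apt-cache search --names-only` output, assuming lines of the form `nvidia-driver-535 - NVIDIA driver metapackage`. It deliberately ignores the `-server` and `-open` variants. `latest_driver_version` is a hypothetical helper; on Ubuntu, `ubuntu-drivers devices` also shows which version is recommended for your GPU, which may be a better choice than simply the newest.

```python
import re
import shutil
import subprocess
from typing import Optional

DRIVER_RE = re.compile(r"^nvidia-driver-(\d+)$")  # plain metapackages only


def latest_driver_version(search_output: str) -> Optional[int]:
    """Return the highest N among `nvidia-driver-N` packages, or None."""
    versions = []
    for line in search_output.splitlines():
        name = line.split(" - ", 1)[0].strip()
        match = DRIVER_RE.match(name)
        if match:
            versions.append(int(match.group(1)))
    return max(versions, default=None)


if __name__ == "__main__":
    if shutil.which("apt-cache") is None:
        print("apt-cache not found; this assumes a Debian/Ubuntu system")
    else:
        out = subprocess.run(
            ["apt-cache", "search", "--names-only", "^nvidia-driver-[0-9]+$"],
            capture_output=True, text=True, check=False).stdout
        version = latest_driver_version(out)
        print(f"sudo apt install nvidia-driver-{version}" if version else "no driver found")
```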

A lot of folks skip this step (installing the driver on its own before the CUDA toolkit), and that can cause unnecessary pain later.

Once the NVIDIA driver is installed, reboot your system once again.
Now you can install the CUDA toolkit.

sudo apt install nvidia-cuda-toolkit
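The toolkit ships the `nvcc` compiler, so one way to confirm it landed is to read the release from `nvcc --version`, whose output contains a line like `Cuda compilation tools, release 11.5, V11.5.119`. This is a minimal sketch under that assumption; `cuda_release` is a hypothetical helper.

```python
import re
import shutil
import subprocess
from typing import Optional


def cuda_release(nvcc_output: str) -> Optional[str]:
    """Extract e.g. "11.5" from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


if __name__ == "__main__":
    if shutil.which("nvcc") is None:
        print("nvcc not on PATH; the toolkit may not be installed")
    else:
        out = subprocess.run(["nvcc", "--version"], capture_output=True,
                             text=True, check=False).stdout
        print("CUDA toolkit release:", cuda_release(out))
```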

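Verify your installation by running nvidia-smi.

It should print a table showing your GPU, the driver version, and the highest CUDA version that driver supports.

If nvidia-smi is not installed, it comes in the nvidia-utils-XXX package matching your driver version (for example, sudo apt install nvidia-utils-535).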
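If you want the same check in a script, here is a minimal sketch. It treats "nvidia-smi is on PATH and exits with status 0" as a working driver, and then asks for a compact summary via `nvidia-smi --query-gpu=name,driver_version --format=csv,noheader`, which prints one line per GPU such as `NVIDIA GeForce RTX 3090, 535.113.01`. Both helper names are hypothetical.

```python
import shutil
import subprocess
from typing import List, Tuple


def nvidia_smi_ok(cmd: str = "nvidia-smi") -> bool:
    """True if the command exists and exits successfully."""
    if shutil.which(cmd) is None:
        return False
    return subprocess.run([cmd], capture_output=True, check=False).returncode == 0


def parse_gpu_query(csv_output: str) -> List[Tuple[str, str]]:
    """Parse `name, driver_version` lines into (name, driver) pairs."""
    gpus = []
    for line in csv_output.strip().splitlines():
        if "," not in line:
            continue
        name, driver = (field.strip() for field in line.rsplit(",", 1))
        gpus.append((name, driver))
    return gpus


if __name__ == "__main__":
    if not nvidia_smi_ok():
        print("nvidia-smi is missing or failed; the driver is probably not loaded")
    else:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, check=False).stdout
        for name, driver in parse_gpu_query(out):
            print(f"{name}: driver {driver}")
```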

Open a Python shell or a Jupyter (ipykernel) session, import PyTorch (or whatever you use), and check whether CUDA is available:

>>> import torch
>>> torch.cuda.is_available()
True

It should return True if everything worked.
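A slightly more informative version of that check is sketched below. It reports whether PyTorch is importable at all, whether it sees CUDA, the CUDA version PyTorch was built against (`torch.version.cuda`), and the name of each visible GPU. `torch_cuda_status` is a hypothetical helper; the torch calls it uses are standard `torch.cuda` functions.

```python
from typing import Any, Dict


def torch_cuda_status() -> Dict[str, Any]:
    """Summarize what PyTorch can see; works even if torch is not installed."""
    try:
        import torch
    except ImportError:
        return {"installed": False, "cuda_available": False,
                "cuda_version": None, "devices": []}
    available = torch.cuda.is_available()
    devices = ([torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]
               if available else [])
    return {"installed": True, "cuda_available": available,
            "cuda_version": torch.version.cuda, "devices": devices}


if __name__ == "__main__":
    print(torch_cuda_status())
```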

Notes

Don't be discouraged if you keep getting errors. Sometimes you have to manually install the packages that apt reports as problematic. For example, I had to run

sudo apt install libnvidia-fbc1-535 libnvidia-cfg1-535 xserver-xorg-video-nvidia-535 libnvidia-encode-535 libnvidia-decode-535 libnvidia-extra-535 libnvidia-compute-535

and,

sudo apt install libnvcuvid1

before installing the drivers. Also make sure dkms is installed (sudo apt install dkms); it is what rebuilds the NVIDIA kernel module whenever the kernel is updated.
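If you are on a driver version other than 535, the same manual install can be generated by swapping the suffix. This sketch assumes the packages listed above follow the same `<name>-<version>` naming for your driver version, which you should confirm in your package sources first; `manual_install_command` and `VERSIONED_LIBS` are hypothetical names.

```python
import shlex
from typing import List

# Package names taken from the command above, without the "-535" suffix.
VERSIONED_LIBS = [
    "libnvidia-fbc1", "libnvidia-cfg1", "xserver-xorg-video-nvidia",
    "libnvidia-encode", "libnvidia-decode", "libnvidia-extra", "libnvidia-compute",
]


def manual_install_command(version: str) -> List[str]:
    """Build the `sudo apt install ...` argument list for a driver version."""
    return ["sudo", "apt", "install", *(f"{lib}-{version}" for lib in VERSIONED_LIBS)]


if __name__ == "__main__":
    print(shlex.join(manual_install_command("535")))
    print("sudo apt install libnvcuvid1 dkms")
```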

Running sudo apt install cuda still gives me errors, but everything works as expected: PyTorch and XGBoost can see my GPU and use it. NVIDIA's official guides did not help in my case.

So, I wrote my own. 😉
