Steps to Use CUDA after Linux Kernel Update
by Rito Ghosh
Introduction
A Linux kernel update often breaks CUDA: the NVIDIA kernel module was built for the old kernel, and if it does not get rebuilt for the new one, the driver stops loading.
Steps
Here is what you should do to get CUDA working again:
- Purge everything related to NVIDIA and CUDA
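sudo apt purge '*nvidia' 'nvidia*' 'cuda*' '*cuda'
Quoting the patterns keeps the shell from expanding them against file names in the current directory before apt sees them.
After you have done this, check whether any installed packages still mention NVIDIA or CUDA.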
dpkg -l | grep -i nvidia
And,
dpkg -l | grep -i cuda
Then, purge them all as shown before.
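If the leftover list is long, picking the names out by eye gets tedious. Here is a minimal Python sketch that pulls the package names out of the dpkg -l listing so you can review them in one go. The function name leftover_packages is my own, and the script only prints a purge command; it does not run it.

```python
import re
import shutil
import subprocess


def leftover_packages(dpkg_output, keywords=("nvidia", "cuda")):
    """Return package names from `dpkg -l` output whose row mentions a keyword."""
    names = []
    for line in dpkg_output.splitlines():
        fields = line.split()
        # Package rows start with a 2-3 letter status code such as "ii" or "rc";
        # this skips the header lines that dpkg prints first.
        if len(fields) < 2 or not re.fullmatch(r"[A-Za-z]{2,3}", fields[0]):
            continue
        # Match anywhere in the row, case-insensitively, like `grep -i` does.
        if any(word in line.lower() for word in keywords):
            names.append(fields[1])
    return names


if __name__ == "__main__":
    if shutil.which("dpkg") is None:
        print("dpkg not found; this sketch assumes a Debian/Ubuntu system")
    else:
        listing = subprocess.run(
            ["dpkg", "-l"], capture_output=True, text=True
        ).stdout
        remaining = leftover_packages(listing)
        if remaining:
            # Printed rather than executed, so you can review it first.
            print("sudo apt purge " + " ".join(remaining))
        else:
            print("Nothing left to purge.")
```

Because it matches anywhere in the row, just like the grep commands, it can also catch a package that only mentions NVIDIA in its description, so read the list before you purge.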
Reboot your system.
Then, install the latest NVIDIA driver.
sudo apt install nvidia-driver-XXX
Replace XXX with the current version number, or with whichever version you want. For me, at the time of writing, it is 535.
A lot of folks skip this step, and that might cause unnecessary pain in the future.
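If you are not sure which version numbers your repositories offer, here is a rough Python sketch that lists them. It assumes apt-cache search prints one "name - description" line per package; the function name driver_versions is hypothetical. The highest number is not necessarily the right one for your GPU, so treat the output as a list of candidates.

```python
import re
import shutil
import subprocess


def driver_versions(apt_cache_output):
    """Return the NNN of every plain nvidia-driver-NNN package, sorted ascending."""
    versions = set()
    for line in apt_cache_output.splitlines():
        # Only the plain metapackage counts: variants such as
        # nvidia-driver-535-open or nvidia-driver-535-server are skipped.
        match = re.match(r"nvidia-driver-(\d+)(?:\s|$)", line)
        if match:
            versions.add(int(match.group(1)))
    return sorted(versions)


if __name__ == "__main__":
    if shutil.which("apt-cache") is None:
        print("apt-cache not found; this sketch assumes a Debian/Ubuntu system")
    else:
        listing = subprocess.run(
            ["apt-cache", "search", "--names-only", "nvidia-driver"],
            capture_output=True,
            text=True,
        ).stdout
        print("Available driver versions:", driver_versions(listing))
```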
When the NVIDIA drivers are installed, reboot your system once again.
Now it is time to install the CUDA toolkit.
sudo apt install nvidia-cuda-toolkit
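- Verify your installation by running
nvidia-smi
You should see a table in your terminal listing your GPU, the driver version, and the CUDA version the driver supports.
If nvidia-smi is not installed, you can simply install it (on Ubuntu it usually ships in the nvidia-utils-XXX package that matches your driver).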
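If you want to run this check from a script instead of reading the table, here is a small sketch that uses nvidia-smi's CSV query mode. The helper names parse_gpu_rows and query_gpus are mine. It returns None when nvidia-smi is missing or exits with an error, which is exactly the broken state this guide is about.

```python
import shutil
import subprocess


def parse_gpu_rows(csv_text):
    """Turn 'name, driver_version' lines into (name, driver_version) tuples."""
    rows = []
    for line in csv_text.splitlines():
        if "," not in line:
            continue  # blank or unexpected line
        # Split on the last comma, in case a GPU name ever contains one.
        name, driver = (part.strip() for part in line.rsplit(",", 1))
        rows.append((name, driver))
    return rows


def query_gpus():
    """Return GPU rows reported by nvidia-smi, or None if it is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return parse_gpu_rows(result.stdout)


if __name__ == "__main__":
    gpus = query_gpus()
    if gpus is None:
        print("nvidia-smi is missing or cannot talk to the driver")
    else:
        for name, driver in gpus:
            print(f"{name}: driver {driver}")
```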
- Open a Python session (an IPython kernel, for example), import PyTorch (or whatever you use), and check whether CUDA is available.
>>> import torch
>>> torch.cuda.is_available()
# True
It should return True if everything worked well.
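When the answer is False, a bit more detail helps to tell a CPU-only PyTorch build apart from a driver problem. Here is a minimal sketch; cuda_report is a name I made up, and it only uses torch.cuda.is_available(), torch.cuda.device_count(), torch.cuda.get_device_name() and torch.version.cuda.

```python
def cuda_report():
    """Summarise what PyTorch can see; also works when torch is not installed."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False, "available": False, "devices": []}

    available = torch.cuda.is_available()
    devices = []
    if available:
        devices = [
            torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
        ]
    return {
        "torch_installed": True,
        "available": available,
        # CUDA version PyTorch was built with; None on CPU-only builds.
        "built_for_cuda": torch.version.cuda,
        "devices": devices,
    }


if __name__ == "__main__":
    print(cuda_report())
```

If built_for_cuda is None, the problem is the PyTorch build and not the driver. If it is set but available is False, look at the driver and at nvidia-smi first.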
Notes
Do not get discouraged if you keep getting errors. Sometimes you have to manually install the packages that the error messages flag as problematic. For example, I had to run:
sudo apt install libnvidia-fbc1-535 libnvidia-cfg1-535 xserver-xorg-video-nvidia-535 libnvidia-encode-535 libnvidia-decode-535 libnvidia-extra-535 libnvidia-compute-535
and,
sudo apt install libnvcuvid1
before installing the drivers. Also make sure that dkms is installed.
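To see whether dkms has actually built the NVIDIA module for the kernel you are running, you can read the output of dkms status. Here is a rough sketch under one big assumption: that each line looks like "module and versions: state", which is what I have seen, although the exact layout differs between dkms versions. The function names are hypothetical.

```python
import platform
import shutil
import subprocess


def nvidia_dkms_entries(dkms_status_output):
    """Return (info, state) pairs for `dkms status` lines that mention nvidia."""
    entries = []
    for line in dkms_status_output.splitlines():
        if "nvidia" not in line.lower() or ":" not in line:
            continue
        # Assumed layout: everything before the first colon describes the
        # module and kernel, everything after it is the state.
        info, state = line.split(":", 1)
        entries.append((info.strip(), state.strip()))
    return entries


def module_ready_for(entries, kernel_release):
    """True if some entry is installed for the given kernel release."""
    return any(
        kernel_release in info and state.startswith("installed")
        for info, state in entries
    )


if __name__ == "__main__":
    if shutil.which("dkms") is None:
        print("dkms is not installed")
    else:
        output = subprocess.run(
            ["dkms", "status"], capture_output=True, text=True
        ).stdout
        kernel = platform.release()  # same string as `uname -r`
        ready = module_ready_for(nvidia_dkms_entries(output), kernel)
        print(f"NVIDIA module installed for {kernel}: {ready}")
```

If this prints False right after a kernel update, the module was not rebuilt for the new kernel, which is the situation the steps above recover from.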
Running sudo apt install cuda still gives me errors, but everything works as expected, and PyTorch and XGBoost can see my GPU and use it. NVIDIA's official guides were not helpful.
So, I wrote my own. 😉