
Building an AMD Deep Learning Machine (part 2)

Software Stack

Complete amd computer build, the word RADEON is lit up in red on the side of the gpu and the ram is a vibrant rainbow

Operating System

I used Ubuntu 19.04, partly because I wanted to try out the April release of Ubuntu and partly because I knew that the newer kernels are more compatible with both Vega and the Ryzen CPU. The amdgpu driver is merged into the kernel after 4.19, which reduces installation headaches.

A note about Docker

If you are not a fan of Docker, for security or any other reason, I don't advise using Ubuntu 19.04. This release ships only Python 3.7 and, at the moment, there are a few issues with running ROCm 2.3 on Python 3.7. These don't seem to occur on Python 3.5 or 3.6, or with older versions of ROCm. However, the performance improvement in the newer version of ROCm is pretty substantial, so rather than falling back to an older ROCm I would pick a version of Ubuntu where you can downgrade Python.
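If you do go without Docker, it may help to check which Python the system provides before installing anything. The snippet below is only a sketch: python_ok_for_rocm is a hypothetical helper name, and the "ok" versions are simply the ones mentioned above (3.5 and 3.6), not an official compatibility list.

```shell
# Hypothetical helper: classify a Python version string using only the
# versions this post mentions as trouble-free with ROCm 2.3 (3.5 and 3.6).
python_ok_for_rocm() {
    case "$1" in
        3.5|3.5.*|3.6|3.6.*) echo "ok" ;;
        *) echo "check" ;;
    esac
}

# Ask the system interpreter for its version, if one is installed.
if command -v python3 >/dev/null 2>&1; then
    ver="$(python3 -c 'import platform; print(platform.python_version())')"
    echo "python3 ${ver}: $(python_ok_for_rocm "$ver")"
else
    echo "python3 not found"
fi
```

A result of "check" only means the version is outside the two named here, so it is worth testing before committing to a native install.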

Initial Software Stack

First, the ROCm Debian repository has to be added:

wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -
echo 'deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main' | sudo tee /etc/apt/sources.list.d/rocm.list

Then, the appropriate packages are installed.

sudo apt update
sudo apt install rocm-libs miopen-hip cxlactivitylogger
sudo apt install rocm-dev
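A quick sanity check after the install is to look for the rocminfo utility. This is a sketch under two assumptions that are not from the steps above: that rocminfo was pulled in by these packages, and that ROCm lives under the usual /opt/rocm prefix. find_rocminfo is a hypothetical helper name.

```shell
# Hypothetical helper: print the first rocminfo binary found, checking PATH
# first and then /opt/rocm/bin (assumed install prefix).
find_rocminfo() {
    if command -v rocminfo >/dev/null 2>&1; then
        command -v rocminfo
    elif [ -x /opt/rocm/bin/rocminfo ]; then
        echo /opt/rocm/bin/rocminfo
    else
        echo "rocminfo not found"
    fi
}

find_rocminfo
```

If a path is printed, running that binary should list the GPU as an agent once the driver and permissions are in place.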

Because we are using the amdgpu driver in the 5.0 kernel that ships with Ubuntu 19.04, we need to add the following udev rule, which assigns the /dev/kfd compute device to the video group.

echo 'SUBSYSTEM=="kfd", KERNEL=="kfd", TAG+="uaccess", GROUP="video"' | sudo tee /etc/udev/rules.d/70-kfd.rules
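The rule only helps if the device node exists and your user is actually in the video group. The following is a minimal sketch for checking both; check_kfd_access is a hypothetical helper name, and it only reports status without changing anything.

```shell
# Hypothetical helper: report whether /dev/kfd exists and whether the
# current user belongs to the "video" group named in the udev rule.
check_kfd_access() {
    if [ -e /dev/kfd ]; then
        echo "/dev/kfd: present"
    else
        echo "/dev/kfd: missing"
    fi
    # id -nG lists group names on one line; split them to match exactly.
    if id -nG | tr ' ' '\n' | grep -qx video; then
        echo "video group: member"
    else
        echo "video group: not a member"
    fi
}

check_kfd_access
```

If you are not a member, sudo usermod -a -G video "$USER" followed by logging out and back in should fix it. A new rule file is picked up after a reboot, or after sudo udevadm control --reload-rules && sudo udevadm trigger.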

Docker install

I added the following lines to my ~/.bashrc file to allow for quick launching of the container:

alias drun='sudo docker run -it --network=host \
--device=/dev/kfd \
--device=/dev/dri \
--group-add video \
--cap-add=SYS_PTRACE \
--security-opt \
seccomp=unconfined \
-v $HOME/dockerx:/dockerx'

To launch the container, simply run drun rocm/tensorflow. The first time you run this, Docker pulls the image from Docker Hub. After that, it uses the cached image.
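Aliases are only expanded in interactive shells, so a shell function is one alternative if you want to call drun from scripts too. The version below is a sketch, not a replacement for the alias: DRUN_DRY_RUN is a hypothetical variable introduced here so the command can be inspected without starting Docker, and the mkdir step is an addition meant to keep the shared directory owned by your user.

```shell
# Sketch of the alias as a function. DRUN_DRY_RUN is a hypothetical
# variable: when set to 1, the command is printed instead of executed.
drun() {
    # Rebuild the argument list: the fixed docker options first, then
    # whatever was passed in (for example the image name).
    set -- docker run -it --network=host \
        --device=/dev/kfd \
        --device=/dev/dri \
        --group-add video \
        --cap-add=SYS_PTRACE \
        --security-opt seccomp=unconfined \
        -v "$HOME/dockerx:/dockerx" "$@"
    if [ "${DRUN_DRY_RUN:-0}" = "1" ]; then
        echo "sudo $*"
    else
        # Create the shared directory up front so it is not created by
        # the root-run Docker daemon.
        mkdir -p "$HOME/dockerx"
        sudo "$@"
    fi
}

# Print the command that would be run for the TensorFlow image.
DRUN_DRY_RUN=1 drun rocm/tensorflow
```

Without DRUN_DRY_RUN set, drun rocm/tensorflow behaves like the alias and drops you into the container.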
