Upgrading AWS “Deep Learning AMI Ubuntu Version” to TensorFlow 1.1.0 with GPU support

Today’s post is a short description of how to upgrade TensorFlow on an AWS Deep Learning AMI instance so that it uses the NVIDIA GRID K520 GPU (available, for example, on g2.2xlarge instances).


For some reason, the AWS Deep Learning AMI ships an old version of TensorFlow, even though the latest image was created in April 2017. Unfortunately, a simple ‘pip install’ upgrade of the TensorFlow library is not enough to fix that, because the TensorFlow-GPU binary has to be upgraded to the corresponding version as well. That, in turn, requires CUDA 8.0 instead of the 7.5 version available by default, which makes things a bit more complicated.

To find out whether TensorFlow is using a GPU device for its computations, execute the following two lines in the Python console:

import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

If you see something similar to “Device mapping: no known devices.”, TensorFlow is not utilizing your GPU and is doing all of its work on the CPU (which, of course, is many times slower).
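If you would rather not scan the log by eye, the device-mapping lines can be checked programmatically. The sketch below uses a hypothetical helper, `gpu_devices_from_log` (not a TensorFlow API), and assumes the log follows the `/gpu:N -> device: N, name: ...` format that TensorFlow 1.x prints, as in the GRID K520 output at the end of this post. It returns the GPU names it finds, so an empty list means TensorFlow is running on the CPU only.

```python
import re
from typing import List

# Matches TensorFlow 1.x device-placement lines such as:
# /job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GRID K520, pci bus id: 0000:00:03.0
GPU_LINE = re.compile(r"/gpu:\d+ -> device: \d+, name: ([^,]+),")


def gpu_devices_from_log(log_text: str) -> List[str]:
    """Return the unique GPU names mentioned in a device-mapping log (hypothetical helper)."""
    names: List[str] = []
    for match in GPU_LINE.finditer(log_text):
        name = match.group(1).strip()
        if name not in names:  # the sample output in this post lists the mapping twice
            names.append(name)
    return names


if __name__ == "__main__":
    cpu_only = "Device mapping: no known devices."
    with_gpu = (
        "Device mapping:\n"
        "/job:localhost/replica:0/task:0/gpu:0 -> device: 0, "
        "name: GRID K520, pci bus id: 0000:00:03.0\n"
    )
    print(gpu_devices_from_log(cpu_only))  # []
    print(gpu_devices_from_log(with_gpu))  # ['GRID K520']
```

The log text itself can be captured from the console output of the `tf.Session` call above.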

Installing the NVIDIA stack (driver, CUDA, cuDNN)

Check your CUDA version:

ubuntu@$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17
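The version check can also be scripted, for example as a pre-flight step before installing anything. This is a minimal sketch assuming Python 3.7 or newer (for `subprocess.run(..., capture_output=True)`); `parse_nvcc_release` and `installed_cuda_release` are hypothetical helpers that only rely on the `release X.Y` fragment of the `nvcc -V` output shown above.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

REQUIRED_CUDA = (8, 0)  # the TensorFlow 1.1.0 GPU binary is built against CUDA 8.0


def parse_nvcc_release(nvcc_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the 'release X.Y' part of `nvcc -V` output."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release() -> Optional[Tuple[int, int]]:
    """Run `nvcc -V` if nvcc is on PATH; return None when it is missing."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "-V"], capture_output=True, text=True, check=False)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = installed_cuda_release()
    if release is None:
        print("nvcc not found on PATH")
    elif release == REQUIRED_CUDA:
        print("CUDA 8.0 found")
    else:
        print(f"Found CUDA {release[0]}.{release[1]}, but TensorFlow 1.1.0 needs 8.0")
```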

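If the release shown for your Cuda compilation tools is not 8.0, you will need to install CUDA 8.0 and a matching NVIDIA driver (the TensorFlow 1.1.0 GPU binary is built against CUDA 8.0, so a newer CUDA release will not work with it either):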

sudo apt update -y && sudo apt upgrade -y
sudo apt install build-essential linux-image-extra-`uname -r` -y

Download the driver: http://www.nvidia.com/Download/driverResults.aspx/114708/en-us (look for the NVIDIA-Linux-x86_64-375.66.run file)

Download the matching CUDA version: https://developer.nvidia.com/cuda-zone (look for the cuda_8.0.61_375.26_linux-run file)

Download the matching cuDNN version: https://developer.nvidia.com/cudnn (you will need to be logged in to an NVIDIA developer account; look for the cudnn-8.0-linux-x64-v5.1.tgz file)
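The three file names already encode the versions that have to agree with each other. As a minimal sketch (the `check_downloads` helper is hypothetical and depends on the exact file names used in this post), the snippet below parses them and fails loudly on a mismatch. It assumes that the driver version in the CUDA installer name (375.26) is the oldest driver that CUDA release supports, and that the TensorFlow 1.1.0 GPU wheel expects CUDA 8.0 with cuDNN 5.1.

```python
import re
from typing import Tuple


def _version(text: str) -> Tuple[int, ...]:
    """Turn '375.26' into (375, 26) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def _search(pattern: str, text: str):
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"unexpected file name: {text}")
    return match


def check_downloads(driver_file: str, cuda_file: str, cudnn_file: str) -> None:
    """Raise ValueError if the driver, CUDA and cuDNN downloads do not fit together.

    Hypothetical helper: relies on the file-name patterns used in this post.
    """
    driver = _version(_search(r"NVIDIA-Linux-x86_64-([\d.]+)\.run", driver_file).group(1))
    cuda_match = _search(r"cuda_([\d.]+)_([\d.]+)_linux", cuda_file)
    cuda, bundled_driver = _version(cuda_match.group(1)), _version(cuda_match.group(2))
    cudnn_match = _search(r"cudnn-([\d.]+)-linux-x64-v([\d.]+)\.tgz", cudnn_file)
    cudnn_for_cuda, cudnn = _version(cudnn_match.group(1)), _version(cudnn_match.group(2))

    # Assumption: the driver bundled with the CUDA installer is the oldest one it supports.
    if driver < bundled_driver:
        raise ValueError(f"driver {driver} is older than the bundled {bundled_driver}")
    # Each cuDNN archive is built for one CUDA major.minor release.
    if cudnn_for_cuda != cuda[:2]:
        raise ValueError(f"cuDNN is built for CUDA {cudnn_for_cuda}, toolkit is {cuda[:2]}")
    # The TensorFlow 1.1.0 GPU wheel expects CUDA 8.0 together with cuDNN 5.1.
    if cuda[:2] != (8, 0) or cudnn[:2] != (5, 1):
        raise ValueError("TensorFlow 1.1.0 needs CUDA 8.0 with cuDNN 5.1")


if __name__ == "__main__":
    check_downloads(
        "NVIDIA-Linux-x86_64-375.66.run",
        "cuda_8.0.61_375.26_linux-run",
        "cudnn-8.0-linux-x64-v5.1.tgz",
    )
    print("downloads are consistent")
```

Comparing versions as integer tuples avoids the classic string-comparison trap where "375.9" would sort after "375.66".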

Install the driver:

chmod +x NVIDIA-Linux-x86_64-375.66.run
sudo ./NVIDIA-Linux-x86_64-375.66.run

(choose the ‘yes’/‘ok’ options when asked, and you should be fine)
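To confirm which driver the kernel actually loaded (running `nvidia-smi` shows the same information), one option is to read `/proc/driver/nvidia/version`. The sketch below assumes that file contains a line like `NVRM version: NVIDIA UNIX x86_64 Kernel Module  375.66 ...`; `parse_driver_version` and `loaded_driver_version` are hypothetical helpers, and the latter returns None when no NVIDIA kernel module is loaded.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Present only while the NVIDIA kernel module is loaded.
PROC_VERSION = Path("/proc/driver/nvidia/version")


def parse_driver_version(proc_text: str) -> Optional[Tuple[int, int]]:
    """Extract the kernel-module version, e.g. (375, 66), from the /proc file text."""
    match = re.search(r"Kernel Module\s+(\d+)\.(\d+)", proc_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def loaded_driver_version() -> Optional[Tuple[int, int]]:
    """Return the loaded driver version, or None if no NVIDIA module is loaded."""
    if not PROC_VERSION.exists():
        return None
    return parse_driver_version(PROC_VERSION.read_text())


if __name__ == "__main__":
    print(f"loaded NVIDIA driver: {loaded_driver_version()}")
```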

Install CUDA:

chmod +x cuda_8.0.61_375.26_linux-run
./cuda_8.0.61_375.26_linux-run --extract=`pwd`/extracts
sudo ./extracts/cuda-linux64-rel-8.0.61-21551265.run
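Before moving on, it is worth checking that the toolkit landed where the next steps expect it. This minimal sketch assumes the default `/usr/local/cuda-8.0` prefix and the CUDA 8.0 shared-library names (`libcudart.so.8.0` and friends) that TensorFlow 1.x GPU builds load at runtime; the file list is illustrative rather than exhaustive, and `missing_cuda_files` is a hypothetical helper.

```python
from pathlib import Path
from typing import List

# Assumed CUDA 8.0 layout: the compiler plus shared libraries TensorFlow 1.x GPU builds load.
REQUIRED_FILES = [
    "bin/nvcc",
    "lib64/libcudart.so.8.0",
    "lib64/libcublas.so.8.0",
    "lib64/libcufft.so.8.0",
    "lib64/libcurand.so.8.0",
]


def missing_cuda_files(cuda_root: str = "/usr/local/cuda-8.0") -> List[str]:
    """Return the expected toolkit files that do not exist under cuda_root."""
    root = Path(cuda_root)
    return [rel for rel in REQUIRED_FILES if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_cuda_files()
    print("CUDA 8.0 toolkit looks complete" if not missing else f"missing: {missing}")
```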

After the installation, make sure that:

  • PATH includes /usr/local/cuda-8.0/bin
  • LD_LIBRARY_PATH includes /usr/local/cuda-8.0/lib64 (alternatively, add /usr/local/cuda-8.0/lib64 to /etc/ld.so.conf and run ldconfig as root)

To get the proper PATH and LD_LIBRARY_PATH settings, you can append them to your ~/.bashrc:

echo -e "export CUDA_HOME=/usr/local/cuda\nexport PATH=\$PATH:\$CUDA_HOME/bin\nexport LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:\$CUDA_HOME/lib64" >> ~/.bashrc

and then run ‘source ~/.bashrc’ to reload the settings.
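Because a misspelled variable name in ~/.bashrc fails silently, it can help to verify the environment of a fresh shell. The sketch below is a hypothetical helper, `env_problems`, that accepts either `/usr/local/cuda` or `/usr/local/cuda-8.0` (the installer's `cuda` symlink is assumed to point at the latter) and reports which of the two settings listed above is missing.

```python
import os
from typing import List, Mapping, Optional

# Either the versioned prefix or the /usr/local/cuda symlink is acceptable.
CUDA_ROOTS = ("/usr/local/cuda", "/usr/local/cuda-8.0")


def env_problems(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Report which CUDA directories are missing from PATH and LD_LIBRARY_PATH."""
    env = os.environ if env is None else env
    path_entries = env.get("PATH", "").split(os.pathsep)
    lib_entries = env.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    problems = []
    if not any(f"{root}/bin" in path_entries for root in CUDA_ROOTS):
        problems.append("PATH does not include a CUDA bin directory")
    if not any(f"{root}/lib64" in lib_entries for root in CUDA_ROOTS):
        problems.append("LD_LIBRARY_PATH does not include a CUDA lib64 directory")
    return problems


if __name__ == "__main__":
    for problem in env_problems():
        print(problem)
```

Passing an explicit mapping makes the check easy to run against a saved environment as well as the current one.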

Install cuDNN:

tar -xf cudnn-8.0-linux-x64-v5.1.tgz
cd cuda
sudo cp lib64/* /usr/local/cuda/lib64/
sudo cp include/cudnn.h /usr/local/cuda/include/
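To double-check that the copied header is the cuDNN 5.1 one, the version macros in cudnn.h can be read back. This sketch assumes cudnn.h defines `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` as plain integers (true for the 5.x headers; later cuDNN releases moved them to a separate header); `parse_cudnn_version` is a hypothetical helper.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

CUDNN_HEADER = Path("/usr/local/cuda/include/cudnn.h")


def parse_cudnn_version(header_text: str) -> Optional[Tuple[int, int, int]]:
    """Read CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL from cudnn.h text."""
    values = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"#define\s+{macro}\s+(\d+)", header_text)
        if match is None:
            return None
        values.append(int(match.group(1)))
    return values[0], values[1], values[2]


if __name__ == "__main__":
    if CUDNN_HEADER.exists():
        print(f"cuDNN header version: {parse_cudnn_version(CUDNN_HEADER.read_text())}")
    else:
        print(f"{CUDNN_HEADER} not found")
```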

Upgrading TensorFlow

Once the NVIDIA driver, CUDA, and cuDNN are properly installed, we can move on to upgrading our TensorFlow and TensorFlow GPU installations:

sudo pip install 'tensorflow==1.1.0' --force-reinstall
sudo pip install 'tensorflow-gpu==1.1.0' --force-reinstall
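After the two `pip install` commands, the installed distributions can be checked from Python. This sketch assumes Python 3.8 or newer for `importlib.metadata`; on older interpreters, `pip show tensorflow-gpu` reports the same version. `installed_versions` and `all_at` are hypothetical helpers.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(
    dists: Iterable[str] = ("tensorflow", "tensorflow-gpu"),
) -> Dict[str, Optional[str]]:
    """Map each pip distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for dist in dists:
        try:
            versions[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            versions[dist] = None
    return versions


def all_at(expected: str, versions: Dict[str, Optional[str]]) -> bool:
    """True if every listed distribution is installed at exactly the expected version."""
    return all(version == expected for version in versions.values())


if __name__ == "__main__":
    found = installed_versions()
    print(found, "OK" if all_at("1.1.0", found) else "version mismatch")
```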

Run the TensorFlow check again; you should see something similar to the output below:

Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GRID K520, pci bus id: 0000:00:03.0
2017-06-02 17:30:43.742925: I tensorflow/core/common_runtime/direct_session.cc:257] Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GRID K520, pci bus id: 0000:00:03.0


Upgrading the NVIDIA stack and TensorFlow itself becomes easy once you know how to match the versions of the driver, CUDA, cuDNN and TensorFlow. Using GPU acceleration for training or testing deep neural networks speeds things up considerably; for example, testing object detection (https://softwaremill.com/counting-objects-with-faster-rcnn/) went from almost 9s per image to around 1s per image.