
Tutorial: Deep learning with TensorFlow, Nvidia and Apache Mesos (DC/OS), Part 1


May 03, 2017

Suzanne Scala

D2iQ

DC/OS 1.9 introduced GPU-based scheduling. With it, organizations can share cluster resources between traditional and machine learning workloads, dynamically allocate GPU resources inside those clusters, and free them when they are no longer needed. By using popular machine learning tools (such as TensorFlow and nvidia-docker), data scientists can test locally on their laptops and deploy to production on DC/OS without changing their applications or models. Read on to learn more about this technology and how you can take advantage of it within Mesosphere DC/OS.

 

This is the first part of a multipart tutorial that demonstrates the value of GPUs and shows how running them on DC/OS makes your applications not just fast, but also efficient, reliable, and scalable.

 

Tutorial #1: Run a TensorFlow Docker image on your laptop and run a machine learning model with and without GPUs.

Tutorial #2: Run a TensorFlow Docker image on a DC/OS cluster with and without GPUs. See GPU isolation and Jupyter in action.

Tutorial #3: Deploy a dynamic, distributed TensorFlow on DC/OS from the Universe. See how TensorFlow on DC/OS dynamically consumes resources on the cluster and releases them when done. Run multiple TensorFlow instances on the same cluster with different resource requirements.

 

TensorFlow Tutorial #1

 

In part one of this tutorial, we'll use TensorFlow to launch a convolutional neural network example on your local machine, then use nvidia-docker to accelerate the TensorFlow job using GPUs. TensorFlow is a machine intelligence library whose architecture is designed to leverage GPUs for speed and efficiency.

 

Watch a video version of this tutorial here.

 

Prerequisite

 

Docker installed on your local machine.

 

Let's get going!

 

First, we'll get TensorFlow running without GPUs and time the result.

 

On your local machine, launch TensorFlow without GPUs using Docker.

 

docker run -it tensorflow/tensorflow bash

We need a machine learning model to test with TensorFlow. You can find some good examples in this GitHub repo. Inside the container, we'll install git using apt-get, then clone the examples repo.

 

apt-get update; apt-get install -y git
git clone https://github.com/aymericdamien/TensorFlow-Examples

Now that we have the convolutional network example, let's run it and time the execution.

 

cd TensorFlow-Examples/examples/3_NeuralNetworks
time python convolutional_network.py

 

That took my machine about 3 minutes.
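
The interactive steps above can also be collapsed into a single non-interactive run, which makes the timing easier to repeat. The following is only a sketch: `benchmark_script` is a hypothetical helper name, `set -e` is an addition so the run stops at the first failing step, and the image tag and repository layout are assumed to match what this tutorial describes (both may have changed since it was written).

```shell
# Hypothetical helper: print the commands to run inside the container.
# These are the same steps as in the tutorial, plus `set -e` (an addition)
# so that a failed step aborts the run instead of timing a broken setup.
benchmark_script() {
  cat <<'EOF'
set -e
apt-get update
apt-get install -y git
git clone https://github.com/aymericdamien/TensorFlow-Examples
cd TensorFlow-Examples/examples/3_NeuralNetworks
time python convolutional_network.py
EOF
}

# Usage (requires Docker, so it is left commented out here):
#   docker run --rm tensorflow/tensorflow bash -c "$(benchmark_script)"
```

Because `time` is a bash keyword, it works inside `bash -c` just as it does at the interactive prompt; the `real` line of its output is the wall-clock time to compare.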

 

TensorFlow example with GPUs

 

Now, let's see how GPUs compare.

 

Note: This part of the tutorial will not work on macOS because nvidia-docker does not support it.

 

Open a new terminal tab so that you can compare the results of the next steps, which use nvidia-docker, with the CPU-only run.

Make sure you have the latest CUDA drivers installed on your system. Find them at https://developer.nvidia.com/cuda-downloads. As of this writing, version 8.0 is the latest.

Download and install nvidia-docker: https://github.com/NVIDIA/nvidia-docker.

Verify your nvidia-docker installation.

 

nvidia-docker run --rm nvidia/cuda nvidia-smi
nvidia-docker run --rm nvidia/cuda:7.5 nvidia-smi
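
If the verification fails, one quick thing to rule out is that a required tool is simply not on your PATH. The sketch below is an illustration rather than part of the nvidia-docker tooling: `have_cmd` and `missing_tools` are hypothetical helper names, and the list of tools passed in the usage line is an assumption about what your setup needs.

```shell
# Hypothetical helper: succeed if the named command is found on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: print each named tool that is NOT on PATH, one per line.
# Prints nothing when every tool is present.
missing_tools() {
  for tool in "$@"; do
    have_cmd "$tool" || echo "$tool"
  done
}

# Usage on the host (tool list is an assumption; adjust to your setup):
#   missing_tools docker nvidia-docker nvidia-smi
```

An empty result only shows that the commands exist; it does not confirm that the driver and the container runtime work together, which is what the `nvidia-smi` runs above check.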

Launch TensorFlow with GPUs using nvidia-docker.

 

nvidia-docker run -it tensorflow/tensorflow:latest-gpu bash

Install git and download TensorFlow-Examples, as you did in the previous section.

 

apt-get update; apt-get install -y git
git clone https://github.com/aymericdamien/TensorFlow-Examples

Run the same convolutional network example as before.

 

cd TensorFlow-Examples/examples/3_NeuralNetworks
time python convolutional_network.py

 

That probably took about 30 seconds: roughly 6 times faster than the 3-minute CPU-only run. The benefits of using GPUs are clear.
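
Your own timings will differ, so it helps to compute the ratio from the `real` values that `time` reported for each run. A minimal sketch, assuming you have converted both timings to seconds; `speedup` is a hypothetical helper, and the numbers in the usage line are placeholders, not measurements from this tutorial.

```shell
# Hypothetical helper: print how many times faster the second timing is.
# Arguments: <cpu_seconds> <gpu_seconds>
speedup() {
  awk -v cpu="$1" -v gpu="$2" 'BEGIN { printf "%.1f\n", cpu / gpu }'
}

# Placeholder timings in seconds; substitute your own measurements.
speedup 200 25
```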

 

What's the effect of multiple GPUs?

 

While we're here, let's run the multi-GPU example to see the performance difference if we use more than one GPU for a task.

 

cd ../5_MultiGPU
time python multigpu_basics.py

 

Even faster!
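
How much the multi-GPU run helps depends on how many GPUs are actually visible where you run it. The sketch below counts them, assuming `nvidia-smi` is on the PATH (as in the verification step earlier); `gpu_count` is a hypothetical helper name, and it reports 0 when `nvidia-smi` is missing or lists no devices.

```shell
# Hypothetical helper: print the number of GPUs that nvidia-smi can see.
gpu_count() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo 0
    return 0
  fi
  # `nvidia-smi -L` lists one GPU per line, each starting with "GPU <index>:".
  # grep -c prints the count; `|| true` keeps a count of 0 from being an error.
  nvidia-smi -L 2>/dev/null | grep -c '^GPU ' || true
}

echo "Visible GPUs: $(gpu_count)"
```

If this reports fewer than two GPUs, the multi-GPU example has no second device to spread work across, so a comparison against the single-GPU run will not be meaningful.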
