

MLA 012 Docker for Machine Learning Workflows
Nov 9, 2020
31:41
Docker Benefits Over VMs
- Use Docker for machine learning to get GPU access and dynamic resource allocation.
- It overcomes virtual-machine limitations such as up-front resource allocation and lack of GPU access.
Challenges in ML Environment Setup
- Setting up ML environments manually is complex due to CUDA, cuDNN, and package version conflicts.
- Anaconda helps but doesn't fully solve system-level dependency issues like CUDA and cuDNN.
Use Anaconda for Python Environments
- Use Anaconda to manage multiple Python and package versions in isolated environments.
- This approach eases project switching and partially manages dependencies, but environments must still be replicated manually in the cloud.
Introduction
00:00 • 5min
Ubuntu Linux
04:30 • 4min
Anaconda: A Python Environment Manager for Multiple Python Versions
08:25 • 3min
Docker - The New Way of Virtual Machine Management
10:56 • 3min
How to Set Up a Guest Operating System on a Docker Host
13:47 • 2min
How to Run Docker on Windows on a MacBook Pro
16:02 • 3min
Deploy a Long-Lived Machine Learning Model to the Cloud
19:02 • 2min
Deploying a Docker Container for Machine Learning Models
21:18 • 4min
The Three Big Benefits of Using Docker Instead of Anaconda
24:55 • 4min
The Future Is Docker
29:05 • 2min
Docker enables efficient, consistent machine learning environment setup across local development and cloud deployment, avoiding many pitfalls of virtual machines and manual dependency management. It streamlines system reproduction, resource allocation, and GPU access, supporting portability and simplified collaboration for ML projects. Machine learning engineers benefit from using pre-built Docker images tailored for ML, allowing seamless project switching, host OS flexibility, and straightforward deployment to cloud platforms like AWS ECS and Batch, resulting in reproducible and maintainable workflows.
Links
- Notes and resources at ocdevel.com/mlg/mla-12
- Try a walking desk to stay healthy & sharp while you learn & code
- Traditional machine learning development often requires configuring operating systems, GPU drivers (CUDA, cuDNN), and specific package versions directly on the host machine.
- Manual setup can lead to version conflicts, resource allocation issues, and difficulty reproducing environments across different systems or between local and cloud deployments.
- Tools like Anaconda and pipenv help manage Python and package versions, but they often fall short in managing system-level dependencies such as CUDA and cuDNN.
- Virtual machines (VMs) like VirtualBox or VMware allow multiple operating systems to run on a host, but they pre-allocate resources (RAM, CPU) up front and have limited access to host GPUs, restricting usability for machine learning tasks.
- Docker uses containerization to package applications and dependencies, allowing containers to share host resources dynamically and to access the GPU directly, which is essential for ML workloads.
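To make the resource point concrete, here is a hedged example run command (the image name my-ml-image is a placeholder): limits like --cpus and --memory act as ceilings rather than up-front reservations, and --gpus (Docker 19.03+ with the Nvidia container toolkit) passes the host GPU through.

    # Cap the container at 4 CPUs / 8 GB RAM and expose all host GPUs;
    # unused capacity stays available to the host, unlike a VM's fixed slice.
    docker run --rm --cpus=4 --memory=8g --gpus all my-ml-image python train.py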
- Dockerfiles describe the entire guest operating system and software environment in code, enabling complete automation and repeatability of environment setup.
- Containers created from Dockerfiles use only the necessary resources at runtime and avoid interfering with the host OS, making it easy to switch projects, share setups, or scale deployments.
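A minimal sketch of such a Dockerfile, assuming Nvidia's CUDA 10.1/cuDNN 7 runtime image as the base; package versions are illustrative (TensorFlow 2.3 pairs with CUDA 10.1 and cuDNN 7), not the episode's exact setup:

    # Base image already contains CUDA 10.1 and cuDNN 7 on Ubuntu 18.04.
    FROM nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
    # Install Python and a pinned framework version in reproducible steps.
    RUN apt-get update && apt-get install -y python3 python3-pip
    RUN pip3 install tensorflow==2.3.1
    # Bake the project code into the image.
    COPY . /app
    WORKDIR /app
    CMD ["python3", "train.py"]

Running docker build -t my-ml-image . then reproduces the identical environment on any machine with Docker installed.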
- GPU support in Docker allows machine learning engineers to leverage their hardware regardless of host OS (with best results on Windows and Linux with Nvidia cards).
- On Windows, enabling GPU support requires switching to the Dev/Insider channel and installing specific Nvidia drivers alongside WSL2 and Nvidia-Docker.
- Macs are less suitable for GPU-accelerated ML due to their AMD graphics cards, although workarounds like PlaidML exist.
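Once the drivers, WSL2, and Nvidia-Docker pieces are in place, a quick sanity check is to run nvidia-smi inside a CUDA container; if the GPU table prints, passthrough is working:

    # Should print the host GPU table; errors usually mean the Nvidia
    # container toolkit or drivers are not yet set up correctly.
    docker run --rm --gpus all nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 nvidia-smi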
- Deploying machine learning models traditionally required manual replication of environments on cloud servers, such as EC2 instances, which is time-consuming and error-prone.
- With Docker, the same Dockerfile can be used locally and in the cloud (AWS ECS, Batch, Fargate, EKS, or SageMaker), ensuring the deployed environment matches local development exactly.
- AWS ECS is suited for long-lived container services, while AWS Batch can be used for one-off or periodic jobs, offering cost-effective use of spot instances for GPU workloads.
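A rough sketch of that flow, pushing the locally built image to AWS ECR so an ECS service or Batch job definition can reference it (the account ID, region, and repository name below are placeholders):

    # Build from the same Dockerfile used for local development.
    docker build -t my-ml-model .
    # Authenticate Docker against ECR, then tag and push the image.
    aws ecr get-login-password --region us-east-1 | \
      docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
    docker tag my-ml-model 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-ml-model:latest
    docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-ml-model:latest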
- Docker Hub provides pre-built images for ML environments, such as Nvidia's CUDA/cuDNN images and HuggingFace's transformers images, which custom Dockerfiles can inherit from.
- These images ensure compatibility between key ML libraries (PyTorch, TensorFlow, CUDA, cuDNN) and reduce setup friction.
- Custom kitchen-sink images, like those in the "ml-tools" repository, offer a turnkey solution for getting started with machine learning in Docker.
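Inheriting one of these images keeps a project Dockerfile to a few lines; a sketch assuming the huggingface/transformers-gpu image from Docker Hub as the base:

    # The base image ships transformers with a matching deep-learning
    # stack (framework, CUDA, cuDNN) in mutually compatible versions.
    FROM huggingface/transformers-gpu
    # Layer only project-specific dependencies on top.
    RUN pip install scikit-learn pandas
    COPY . /app
    WORKDIR /app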
- With Docker, each project can have a fully isolated environment, preventing dependency conflicts and simplifying switching between projects.
- Updates or configuration changes are tracked and versioned in the Dockerfile, maintaining a single source of truth for the entire environment.
- Modifying the Dockerfile to add dependencies or update versions ensures that local and cloud environments remain synchronized.
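For example, a new dependency lands as one tracked line in the Dockerfile, and a rebuild propagates it everywhere the image is used (the package and version here are arbitrary):

    # Dockerfile: pin the new dependency so the change is versioned.
    RUN pip3 install xgboost==1.2.1

    # Rebuild and push; local runs and cloud tasks pulling the image
    # get the exact same updated environment.
    docker build -t my-ml-image .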
- Windows is recommended for local development with Docker, offering a better desktop experience and driver support than Ubuntu for most users, particularly on laptops.
- GPU-accelerated ML is not practical on Macs due to hardware limitations, while Ubuntu is suitable for advanced users comfortable with system configuration and driver management.
- Docker
- Instructions: Windows Dev Channel & WSL2 with nvidia-docker support
- Nvidia's guide for CUDA on WSL2
- WSL2 & Docker odds-and-ends
- nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 Docker Image
- huggingface/transformers-gpu
- ml-tools kitchen-sink Dockerfiles
- Machine learning hardware guidance
- Front-end stack + cloud-hosting info
- ML cloud-hosting info