

MLA 012 Docker for Machine Learning Workflows
Nov 9, 2020
31:41
Docker Benefits Over VMs
- Use Docker for machine learning to get GPU access and dynamic resource allocation.
- It overcomes virtual-machine limitations such as up-front resource allocation and lack of GPU access.
Challenges in ML Environment Setup
- Setting up ML environments manually is complex due to CUDA, cuDNN, and package version conflicts.
- Anaconda helps but doesn't fully solve system-level dependency issues like CUDA and cuDNN.
Use Anaconda for Python Environments
- Use Anaconda to manage multiple Python and package versions in isolated environments.
- This approach eases project switching and partially manages dependencies, but environments must still be replicated manually in the cloud.
Introduction
00:00 • 5min
Ubuntu Linux
04:30 • 4min
Anaconda: A Python Environment Manager for Multiple Python Versions
08:25 • 3min
Docker - The New Way of Virtual Machine Management
10:56 • 3min
How to Set Up a Guest Operating System on a Docker Host
13:47 • 2min
How to Run Docker on Windows on a MacBook Pro
16:02 • 3min
Deploy a Long-Lived Machine Learning Model to the Cloud
19:02 • 2min
Deploying a Docker Container for Machine Learning Models
21:18 • 4min
The Three Big Benefits of Using Docker Instead of Anaconda
24:55 • 4min
The Future Is Docker
29:05 • 2min
Docker enables efficient, consistent machine learning environment setup across local development and cloud deployment, avoiding many pitfalls of virtual machines and manual dependency management. It streamlines system reproduction, resource allocation, and GPU access, supporting portability and simplified collaboration for ML projects. Machine learning engineers benefit from using pre-built Docker images tailored for ML, allowing seamless project switching, host OS flexibility, and straightforward deployment to cloud platforms like AWS ECS and Batch, resulting in reproducible and maintainable workflows.
Links
- Notes and resources at ocdevel.com/mlg/mla-12
- Try a walking desk to stay healthy & sharp while you learn & code
- Traditional machine learning development often requires configuring operating systems, GPU drivers (CUDA, cuDNN), and specific package versions directly on the host machine.
- Manual setup can lead to version conflicts, resource allocation issues, and difficulty reproducing environments across different systems or between local and cloud deployments.
- Tools like Anaconda and pipenv help manage Python and package versions, but they often fall short in managing system-level dependencies such as CUDA and cuDNN.
- Virtual machines (VMs) like VirtualBox or VMware allow multiple operating systems to run on a host, but they pre-allocate resources (RAM, CPU) up front and have limited access to host GPUs, restricting usability for machine learning tasks.
- Docker uses containerization to package applications and dependencies, allowing containers to share host resources dynamically and to access the GPU directly, which is essential for ML workloads.
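To make the resource point concrete, here is a hedged example run command (the image name my-ml-image is a placeholder): limits like --cpus and --memory act as ceilings rather than up-front reservations, and --gpus (Docker 19.03+ with the Nvidia container toolkit) passes the host GPU through.

    # Cap the container at 4 CPUs / 8 GB RAM and expose all host GPUs;
    # unused capacity stays available to the host, unlike a VM's fixed slice.
    docker run --rm --cpus=4 --memory=8g --gpus all my-ml-image python train.py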
- Dockerfiles describe the entire guest operating system and software environment in code, enabling complete automation and repeatability of environment setup.
- Containers created from Dockerfiles use only the necessary resources at runtime and avoid interfering with the host OS, making it easy to switch projects, share setups, or scale deployments.
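A minimal sketch of such a Dockerfile, assuming Nvidia's CUDA 10.1/cuDNN 7 runtime image as the base; package versions are illustrative (TensorFlow 2.3 pairs with CUDA 10.1 and cuDNN 7), not the episode's exact setup:

    # Base image already contains CUDA 10.1 and cuDNN 7 on Ubuntu 18.04.
    FROM nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
    # Install Python and a pinned framework version in reproducible steps.
    RUN apt-get update && apt-get install -y python3 python3-pip
    RUN pip3 install tensorflow==2.3.1
    # Bake the project code into the image.
    COPY . /app
    WORKDIR /app
    CMD ["python3", "train.py"]

Running docker build -t my-ml-image . then reproduces the identical environment on any machine with Docker installed.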
- GPU support in Docker allows machine learning engineers to leverage their hardware regardless of host OS (with best results on Windows and Linux with Nvidia cards).
- On Windows, enabling GPU support requires switching to the Dev/Insider channel and installing specific Nvidia drivers alongside WSL2 and Nvidia-Docker.
- Macs are less suitable for GPU-accelerated ML due to their AMD graphics cards, although workarounds like PlaidML exist.
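Once the drivers, WSL2, and Nvidia-Docker pieces are in place, a quick sanity check is to run nvidia-smi inside a CUDA container; if the GPU table prints, passthrough is working:

    # Should print the host GPU table; errors usually mean the Nvidia
    # container toolkit or drivers are not yet set up correctly.
    docker run --rm --gpus all nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 nvidia-smi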
- Deploying machine learning models traditionally required manual replication of environments on cloud servers, such as EC2 instances, which is time-consuming and error-prone.
- With Docker, the same Dockerfile can be used locally and in the cloud (AWS ECS, Batch, Fargate, EKS, or SageMaker), ensuring the deployed environment matches local development exactly.
- AWS ECS is suited for long-lived container services, while AWS Batch can be used for one-off or periodic jobs, offering cost-effective use of spot instances for GPU workloads.
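A rough sketch of that flow, pushing the locally built image to AWS ECR so an ECS service or Batch job definition can reference it (the account ID, region, and repository name below are placeholders):

    # Build from the same Dockerfile used for local development.
    docker build -t my-ml-model .
    # Authenticate Docker against ECR, then tag and push the image.
    aws ecr get-login-password --region us-east-1 | \
      docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
    docker tag my-ml-model 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-ml-model:latest
    docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-ml-model:latest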
- Docker Hub provides pre-built images for ML environments, such as Nvidia's CUDA/cuDNN images and HuggingFace's transformers images, which custom Dockerfiles can inherit from.
- These images ensure compatibility between key ML libraries (PyTorch, TensorFlow, CUDA, cuDNN) and reduce setup friction.
- Custom kitchen-sink images, like those in the "ml-tools" repository, offer a turnkey solution for getting started with machine learning in Docker.
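Inheriting one of these images keeps a project Dockerfile to a few lines; a sketch assuming the huggingface/transformers-gpu image from Docker Hub as the base:

    # The base image ships transformers with a matching deep-learning
    # stack (framework, CUDA, cuDNN) in mutually compatible versions.
    FROM huggingface/transformers-gpu
    # Layer only project-specific dependencies on top.
    RUN pip install scikit-learn pandas
    COPY . /app
    WORKDIR /app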
- With Docker, each project can have a fully isolated environment, preventing dependency conflicts and simplifying switching between projects.
- Updates or configuration changes are tracked and versioned in the Dockerfile, maintaining a single source of truth for the entire environment.
- Modifying the Dockerfile to add dependencies or update versions ensures that local and cloud environments remain synchronized.
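For example, a new dependency lands as one tracked line in the Dockerfile, and a rebuild propagates it everywhere the image is used (the package and version here are arbitrary):

    # Dockerfile: pin the new dependency so the change is versioned.
    RUN pip3 install xgboost==1.2.1

    # Rebuild and push; local runs and cloud tasks pulling the image
    # get the exact same updated environment.
    docker build -t my-ml-image .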
- Windows is recommended for local development with Docker, offering a better desktop experience and driver support than Ubuntu for most users, particularly on laptops.
- GPU-accelerated ML is not practical on Macs due to hardware limitations, while Ubuntu is suitable for advanced users comfortable with system configuration and driver management.
- Docker
- Instructions: Windows Dev Channel & WSL2 with nvidia-docker support
- Nvidia's guide for CUDA on WSL2
- WSL2 & Docker odds-and-ends
- nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 Docker Image
- huggingface/transformers-gpu
- ml-tools kitchen-sink Dockerfiles
- Machine learning hardware guidance
- Front-end stack + cloud-hosting info
- ML cloud-hosting info