
Docker Image Development Fundamentals

  • Docker images provide immutable, portable environments that ensure ML models behave identically across development, testing, and production.
  • The Dockerfile acts as a blueprint, utilizing layered architecture to cache dependencies and minimize build times for large ML frameworks.
  • Optimizing image size through multi-stage builds and base image selection is critical for reducing latency in cloud-based model serving.
  • Reproducibility in MLOps relies on pinning specific versions of libraries and system dependencies within the image manifest.
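In practice, pinning means listing exact versions in the dependency manifest that the Dockerfile installs. A minimal sketch (the versions shown are illustrative):

```text
# requirements.txt — pin exact versions for reproducible builds
scikit-learn==1.4.2
numpy==1.26.4
joblib==1.4.0
```

With every version pinned, rebuilding the image months later produces the same environment, rather than silently picking up newer releases.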

Why It Matters

01
Financial Fraud Detection

Banks like JPMorgan Chase use Dockerized ML models to process millions of transactions in real-time. By containerizing their fraud detection models, they ensure that the exact same feature engineering logic used during model training is applied during inference, preventing "training-serving skew."

02
Autonomous Vehicle Simulation

Companies like Waymo utilize Docker to manage the massive, heterogeneous software stacks required for autonomous driving. Each simulation environment is a Docker image, allowing researchers to run thousands of parallel simulations across a cluster, each with a guaranteed, identical software environment.

03
Healthcare Imaging Diagnostics

Startups in the medical imaging space use Docker to distribute diagnostic models to hospitals. Because medical data is sensitive and hospital IT environments vary, providing a self-contained Docker image ensures the model runs reliably on local hospital servers without requiring complex, manual software installations.

How it Works

The Philosophy of Immutable Infrastructure

In traditional software development, "it works on my machine" is a common failure mode. In MLOps, this problem is amplified by complex dependency chains involving CUDA, cuDNN, and specific versions of PyTorch or TensorFlow. Docker solves this by encapsulating the entire environment into an immutable image. Once an image is built, it is a static snapshot. If your model works inside the container on your laptop, it will behave identically in the cloud, provided the underlying architecture (e.g., x86_64 vs. ARM64) is consistent. This reliability is the bedrock of modern MLOps, allowing teams to treat infrastructure as code.


Layered Architecture and Caching

Docker images are constructed as a stack of layers. When you write a Dockerfile, every RUN, COPY, or ADD command creates a new layer. Docker caches these layers; if you change a line in your Dockerfile, Docker rebuilds only that layer and all subsequent ones. For ML practitioners, this is vital. Installing torch or scikit-learn takes minutes. By copying requirements.txt and installing dependencies before copying your source code, you ensure that changing a single line of your model logic does not trigger a massive re-download of your ML dependencies. This optimization strategy can reduce build times from ten minutes to ten seconds.
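The ordering described above can be sketched as follows (the base image tag and file paths are illustrative):

```dockerfile
# Cache-friendly ordering: dependencies first, source code last.
FROM python:3.11-slim

WORKDIR /app

# 1. Copy only the dependency manifest. This layer (and the pip install
#    below) is rebuilt only when requirements.txt itself changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# 2. Copy the source code last. Editing model logic invalidates only
#    this layer, not the expensive dependency install above.
COPY src/ ./src/

CMD ["python", "src/serve.py"]
```

If the two COPY instructions were reversed, every source-code edit would invalidate the pip install layer and force a full reinstall of the ML stack.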


Multi-Stage Builds for ML

ML models often require heavy build tools (like gcc or cmake) to compile custom C++ extensions for libraries like PyTorch or NumPy. However, these tools are unnecessary at runtime. Multi-stage builds allow you to use a "builder" image to install dependencies and compile code, and then copy only the necessary artifacts into a "slim" runtime image. This significantly reduces the attack surface of your container and lowers the storage footprint, which is critical when deploying models to edge devices or serverless environments like AWS Lambda or Google Cloud Run.
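A sketch of a two-stage build along these lines (image tags and paths are illustrative):

```dockerfile
# Stage 1: "builder" — full image with gcc and build tooling, used
# to compile any wheels that need a C/C++ toolchain.
FROM python:3.11 AS builder
WORKDIR /app
COPY requirements.txt .
# Install into an isolated prefix so the artifacts are easy to copy out.
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# Stage 2: slim runtime — no compilers, smaller attack surface.
FROM python:3.11-slim
WORKDIR /app
# Copy only the installed packages, not the build toolchain.
COPY --from=builder /install /usr/local
COPY serve.py .
CMD ["python", "serve.py"]
```

Only the final stage is shipped; the builder stage and its compilers are discarded, which is what keeps the runtime image small.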


Managing GPU Dependencies

One of the most complex aspects of ML Docker development is GPU support. Unlike CPU-only containers, GPU containers require the NVIDIA Container Toolkit. Your Dockerfile must be compatible with the host's NVIDIA driver version. Using official NVIDIA CUDA base images is the industry standard. These images are pre-configured with the necessary libraries to interface with the GPU hardware. Failing to align the CUDA version in your image with the driver version on your production server will result in runtime errors that are notoriously difficult to debug.
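As a sketch, a GPU-enabled image typically starts from an official NVIDIA CUDA base; the exact tag below is illustrative, and you should pick one whose CUDA version your host driver supports (check with `nvidia-smi`):

```dockerfile
# CUDA runtime base image, pre-configured with the GPU libraries.
# The tag must match a CUDA version supported by the host's driver.
FROM nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04

RUN apt-get update \
    && apt-get install -y --no-install-recommends python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

# Install a PyTorch build compiled against the same CUDA major version.
RUN pip3 install --no-cache-dir torch \
        --index-url https://download.pytorch.org/whl/cu121
```

At run time the host still needs the NVIDIA Container Toolkit installed, and the container must be started with GPU access, e.g. `docker run --gpus all my-gpu-image`.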

Common Pitfalls

  • "Docker images are just like Virtual Machines." While they look similar, Docker containers share the host kernel and are much lighter. Treating them like VMs leads to "fat images" that include unnecessary OS services, which increases security risks and deployment times.
  • "I should use the 'latest' tag for my base images." Using latest makes your builds non-deterministic, as the base image can change without notice. Always pin your base image to a specific version (e.g., python:3.9.12-slim) to ensure reproducibility.
  • "Installing everything in one RUN command is bad." Actually, it is good practice to combine apt-get update and apt-get install into a single RUN command. This prevents the creation of intermediate layers that contain cached package lists, keeping the image size smaller.
  • "I don't need to worry about user permissions." By default, Docker containers run as root. This is a significant security vulnerability; you should always create a non-privileged user within the Dockerfile and switch to it using the USER instruction.

Sample Code

Python
# Dockerfile for a scikit-learn inference service:
# FROM python:3.11-slim
# WORKDIR /app
# COPY requirements.txt .
# RUN pip install --no-cache-dir -r requirements.txt
# COPY train.py serve.py ./
# RUN python train.py          # bake model into image at build time
# CMD ["python", "serve.py"]

import joblib                          # correct serialiser for sklearn models
import numpy as np
from sklearn.linear_model import LinearRegression

def train_model():
    X = np.array([[1], [2], [3], [4], [5]])
    y = np.array([2, 4, 6, 8, 10])

    model = LinearRegression()
    model.fit(X, y)

    # joblib is the recommended way to persist sklearn objects
    # torch.save() only works with PyTorch tensors/modules, not sklearn
    joblib.dump(model, 'model.joblib')
    print("Model saved — coefficients:", model.coef_)

    loaded = joblib.load('model.joblib')
    print("Prediction for [[6]]:", loaded.predict([[6]]))

if __name__ == "__main__":
    train_model()
    # Model saved — coefficients: [2.]
    # Prediction for [[6]]: [12.]

Key Terms

Container
A lightweight, standalone, executable package of software that includes everything needed to run an application, including code, runtime, system tools, and libraries. Unlike virtual machines, containers share the host system's kernel, making them highly efficient for ML workloads.
Dockerfile
A text document that contains all the commands a user could call on the command line to assemble an image. It serves as the declarative configuration file that defines the environment state for your ML model.
Image Layer
A read-only template that represents a set of filesystem changes applied to the base image. Because Docker uses a union filesystem, each instruction in a Dockerfile creates a new layer, allowing for efficient caching and storage.
Base Image
The starting point for any Docker image, usually defined by the FROM instruction. For ML, this is often a Linux distribution pre-configured with CUDA drivers or specific Python versions to support deep learning frameworks.
Multi-stage Build
A technique used to create smaller, more secure images by separating the build environment from the runtime environment. This allows developers to include heavy compilers or build tools in an initial stage while keeping the final image lean and production-ready.
Registry
A storage and distribution system for named Docker images, such as Docker Hub or AWS Elastic Container Registry (ECR). It acts as a central repository where MLOps pipelines pull images to deploy models to production clusters.
Entrypoint
The instruction in a Dockerfile that configures the container to run as an executable. It ensures that when a container starts, it automatically executes the primary ML service or inference script without requiring manual intervention.
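For example, a sketch of how ENTRYPOINT and CMD combine (the script name is illustrative):

```dockerfile
# ENTRYPOINT fixes the executable; CMD supplies overridable defaults.
ENTRYPOINT ["python", "serve.py"]
CMD ["--port", "8080"]

# `docker run my-image`             runs: python serve.py --port 8080
# `docker run my-image --port 9000` overrides only the CMD arguments.
```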