Dev containers in Machine Learning
Docker has become extremely popular. It is lightweight, portable, and self-contained, which makes it a great fit for microservice architectures. Docker containers make it easier to build software in an isolated environment, and they can run across many environments with different operating systems and hardware platforms. We could go on about how great containers are for deployment, but let's take a step back and look at how they can be used for development.
Let’s imagine a situation where you need to develop a program against a particular operating system version, or you want to test your application in different environments. You can build a Docker image with the OS of interest as a base and run the container interactively instead of in detached mode (-d):
docker run -it <IMAGE>
or attach a shell to an already running container using:
docker exec -it [container-id] bash
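For illustration, assuming you simply want an interactive Ubuntu 20.04 environment (the image and container names below are only examples), the workflow could look like this:
docker run -it --name dev-sandbox ubuntu:20.04 bash
# in a second terminal, attach another shell to the same container
docker exec -it dev-sandbox bash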
Then develop and test your code in a terminal window. Ehhh…, that doesn’t sound good. Developing in a bare terminal is not the kind of development environment we expect in 2024. Is there a better way?
Thankfully…. Yes, there is 🙂
VSCode Dev container
The Dev Containers extension for VSCode from Microsoft comes in really handy here. It lets you use a Docker container as a full-featured development environment.
Among other benefits, it enables you to:
- Develop using a consistent, reproducible toolset on the same OS as your deployment environment.
- Easily switch between distinct development environments and update them safely without affecting your local system.
- Provide new team members or contributors with a simple, consistent development environment, making it easier for them to get started.
- Experiment with new technologies or create a copy of a codebase without interfering with your local setup.
All of this is available in the familiar VSCode editor, with the same features and user experience as if it were running locally, but now the workspace is isolated within a container.
Dev container definition
Create a Dockerfile.dev with your environment setup inside. An example Dockerfile is shown below.
FROM nvidia/cuda:11.1.1-cudnn8-runtime-ubuntu20.04
# Install python3.10
RUN : \
&& apt-get update \
&& apt-get install -y git \
&& apt-get install -y --no-install-recommends software-properties-common \
&& add-apt-repository -y ppa:deadsnakes \
&& apt-get install -y --no-install-recommends python3.10-venv \
&& apt-get install libpython3.10-dev -y \
&& apt-get clean \
&& :
# Add env to PATH
RUN python3.10 -m venv /venv
ENV PATH=/venv/bin:$PATH
# Ensures that Python output to stdout/stderr is not buffered: prevents missing information when terminating
ENV PYTHONUNBUFFERED=1
# Update pip
RUN /venv/bin/python3.10 -m pip install pip --upgrade
WORKDIR /usr/src/app
ENTRYPOINT ["tail", "-f", "/dev/null"]
Create a .devcontainer folder in your project's root directory and a dev container configuration file named devcontainer.json inside it. Alternatively, use an interactive configuration with ctrl + shift + p -> Dev Containers: Configure Container Features or ctrl + shift + p -> Dev Containers: New Dev Container.
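Assuming the Dockerfile.dev from above lives in the project root, the resulting layout looks roughly like this (everything apart from .devcontainer/devcontainer.json and Dockerfile.dev is illustrative):
project-root/
├── .devcontainer/
│   └── devcontainer.json
├── Dockerfile.dev
└── src/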
Then update the devcontainer.json file with a basic setup.
// For format details, see https://aka.ms/devcontainer.json. For config options, see the
// README at: https://github.com/devcontainers/templates/tree/main/src/docker-existing-dockerfile
{
"name": "Existing Dockerfile",
"build": {
// Sets the run context to one level up instead of the .devcontainer folder.
"context": "..",
// Update the 'dockerfile' property if you aren't using the standard 'Dockerfile' filename.
"dockerfile": "../Dockerfile.dev"
},
}
Dev container for Machine Learning
When developing a machine learning project, we often need GPU access. We can configure it from devcontainer.json as well. Below is an example configuration for a dev container with NVIDIA GPU support, Jupyter notebooks, and the Python extensions.
{
"name": "Existing Dockerfile",
"build": {
"context": "..",
"dockerfile": "../Dockerfile.dev"
},
"runArgs": ["--gpus","all"],
"remoteEnv": {
"PATH": "${containerEnv:PATH}:/usr/local/cuda/bin",
"LD_LIBRARY_PATH": "$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64",
"XLA_FLAGS": "--xla_gpu_cuda_data_dir=/usr/local/cuda"
},
// Configure tool-specific properties.
"customizations": {
"vscode": {
"extensions": [
"ms-python.python",
"ms-toolsai.jupyter",
"ms-toolsai.vscode-jupyter-cell-tags",
"ms-toolsai.jupyter-keymap",
"ms-toolsai.jupyter-renderers",
"ms-toolsai.vscode-jupyter-slideshow",
"ms-python.vscode-pylance"
]
}
}
}
Extensions specified under “customizations” are installed automatically and don’t need to be pre-installed or added manually. All installed extensions work the same way as if they were running locally on your computer, providing the same functionality and user experience.
Running a dev container
To run a dev container, select ctrl + shift + p -> Dev Containers: Reopen in Container and enjoy developing in the familiar VSCode environment.
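Once the container is running, a quick sanity check that the GPU is actually visible inside it is to open a VSCode terminal and query the driver (this assumes the NVIDIA Container Toolkit is installed on the host, which --gpus all requires):
nvidia-smi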
Summary
In this article, we learned how to set up and use the Dev Containers extension for VSCode for development purposes. The extension provides the same development experience as if you were developing locally on your machine, while keeping all the advantages of isolating the environment in a container. It also works well in remote environments. If you found this article interesting, I encourage you to read our other blog posts or check out our Machine Learning projects.
Reviewed by Adam Wawrzyński