Installing the NVIDIA Container Toolkit
Copyright Continuum Labs - 2023
The NVIDIA Container Toolkit is a set of tools and components that enables users to build and run GPU-accelerated containers. It includes a container runtime library and utilities that automatically configure containers to use NVIDIA GPUs, making it easier to deploy and run GPU-accelerated applications using Docker or other container runtimes.
The NVIDIA Container Toolkit should already be installed in your virtual machine. If it is not, this documentation will show you how to install it. It also provides a high-level overview of how the toolkit works and how it interacts with Docker.
The NVIDIA container stack is architected so that it can be targeted to support any container runtime in the ecosystem. The components of the stack include:
The NVIDIA Container Runtime (nvidia-container-runtime)
The NVIDIA Container Runtime Hook (nvidia-container-toolkit / nvidia-container-runtime-hook)
The NVIDIA Container Library and CLI (libnvidia-container1, nvidia-container-cli)
The dependencies are below:
The components of the NVIDIA container stack are packaged as the NVIDIA Container Toolkit.
How these components are used depends on the container runtime being used.
For docker or containerd, the NVIDIA Container Runtime (nvidia-container-runtime) is configured as an OCI-compliant runtime, with the flow through the various components shown in the following diagram:
The NVIDIA Container Toolkit CLI (nvidia-ctk) is a command-line utility that provides various tools for interacting with the NVIDIA Container Toolkit. It includes functionality for configuring runtimes such as Docker to work with the NVIDIA Container Toolkit, and it provides utilities for generating Container Device Interface (CDI) specifications.
Overall, the NVIDIA Container Toolkit simplifies the process of leveraging NVIDIA GPUs within containers, making it easier to deploy and run CUDA-based applications in containerised environments.
It provides a set of tools and components that seamlessly integrate with container runtimes like Docker, enabling transparent GPU acceleration for containerised workloads.
The NVIDIA Container Library (libnvidia-container) is a library that provides an API for automatically configuring GNU/Linux containers to use NVIDIA GPUs.
It is designed to be agnostic of the container runtime, meaning it can work with various container runtimes, not just Docker.
The NVIDIA Container CLI (nvidia-container-cli) is a command-line utility that serves as a wrapper around the library, allowing different runtimes to invoke it and inject NVIDIA GPU support into their containers.
The NVIDIA Container Runtime Hook (nvidia-container-runtime-hook) is an executable that implements the interface required by a runC prestart hook.
runC is a lightweight container runtime that is used as the default runtime by Docker.
The NVIDIA Container Runtime Hook is invoked by runC after a container is created but before it is started.
It has access to the config.json file associated with the container, which contains information about the container's configuration.
The hook uses the information from config.json to invoke the nvidia-container-cli with appropriate flags, specifying which GPU devices should be injected into the container.
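As an illustration, the hook typically determines GPU visibility from environment variables recorded in the container's config.json. The fragment below is a hypothetical excerpt of such a file; the field names follow the OCI runtime specification, and the device indices shown are example values:

```json
{
  "process": {
    "env": [
      "NVIDIA_VISIBLE_DEVICES=0,1",
      "NVIDIA_DRIVER_CAPABILITIES=compute,utility"
    ]
  }
}
```

Here NVIDIA_VISIBLE_DEVICES selects which GPUs are injected into the container, and NVIDIA_DRIVER_CAPABILITIES controls which driver libraries (compute, utility, graphics, and so on) are mounted.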
The NVIDIA Container Runtime is a key component of the NVIDIA Container Toolkit, included in the nvidia-container-toolkit-base package. Its primary purpose is to enable the use of NVIDIA GPUs within containers by integrating with the container runtime, specifically runC.

The NVIDIA Container Runtime is a thin wrapper around the native runC installed on the host system.

By acting as a wrapper around runC and injecting the necessary hooks and modifications, the NVIDIA Container Runtime enables containers to access and utilise NVIDIA GPUs. It abstracts away the complexities of GPU management and provides a transparent way to leverage GPU acceleration within containerised environments.
This set of commands is used to configure a Linux system's package manager to install software from NVIDIA's production repository, specifically for the NVIDIA container toolkit.
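On an Ubuntu or Debian system, the repository setup typically looks like the following (commands adapted from NVIDIA's installation guide; verify against the current official instructions before running them):

```shell
# Add NVIDIA's GPG key for package verification
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

# Add the production repository, signed with the key above
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
```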
Update the packages list from the repository:
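On apt-based distributions this is:

```shell
# Refresh the package index so the new NVIDIA repository is visible
sudo apt-get update
```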
Install the NVIDIA Container Toolkit packages:
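On apt-based distributions, the package name is nvidia-container-toolkit:

```shell
# Install the toolkit and its dependencies (runtime, hook, library, and CLI)
sudo apt-get install -y nvidia-container-toolkit
```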
Configure the container runtime by using the nvidia-ctk command:
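The documented invocation for Docker is shown below; the Docker daemon must be restarted afterwards for the change to take effect:

```shell
# Register the NVIDIA runtime in Docker's daemon configuration
sudo nvidia-ctk runtime configure --runtime=docker

# Restart Docker so it picks up the new runtime
sudo systemctl restart docker
```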
The nvidia-ctk command modifies the /etc/docker/daemon.json file on the host.
The file is updated so that Docker can use the NVIDIA Container Runtime.
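After configuration, daemon.json typically contains a runtimes entry along these lines (the exact binary path may differ by installation; this is an illustrative fragment, not guaranteed verbatim output):

```json
{
  "runtimes": {
    "nvidia": {
      "args": [],
      "path": "nvidia-container-runtime"
    }
  }
}
```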
The output may look like this:
INFO[0000] Config file does not exist; using empty config
This message indicates that a configuration file for the NVIDIA Container Toolkit did not exist prior to running the command. Therefore, the tool proceeds with an empty or default configuration to set up the necessary settings for Docker.
After you install and configure the toolkit and install an NVIDIA GPU Driver, you can verify your installation by running a sample workload.
Run a sample CUDA container
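A commonly used smoke test runs nvidia-smi inside a minimal container; the command below follows NVIDIA's documentation, and the ubuntu image is an example base image:

```shell
# Launch a throwaway container with all GPUs attached and print the driver status
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
```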
Your output should resemble the following: