Installing the NVIDIA Container Toolkit


Check the installation of the NVIDIA Container Toolkit

The NVIDIA Container Toolkit is a set of tools and components that enables users to build and run GPU-accelerated containers.

It includes a container runtime and utilities that automatically configure containers to use NVIDIA GPUs, making it easier to deploy and run GPU-accelerated applications with Docker or other container runtimes.

The NVIDIA Container Toolkit should already have been installed in your virtual machine.

If it has not been, this documentation will show you how to install it. It also gives a high-level overview of how the toolkit works and how it interacts with Docker.
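
You can check whether the toolkit is already present before reinstalling it. For example, the following commands report the installed versions if the toolkit's command-line tools are on your PATH:

nvidia-ctk --version
nvidia-container-cli --version
dpkg -l | grep nvidia-container-toolkit

If these commands are not found or the package is not listed, follow the installation steps below.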

Architecture Overview

The NVIDIA container stack is architected so that it can be targeted to support any container runtime in the ecosystem. The components of the stack include:

  • The NVIDIA Container Runtime (nvidia-container-runtime)

  • The NVIDIA Container Runtime Hook (nvidia-container-toolkit / nvidia-container-runtime-hook)

  • The NVIDIA Container Library and CLI (libnvidia-container1, nvidia-container-cli)

The dependencies are below:

├─ nvidia-container-toolkit (version)
│    ├─ libnvidia-container-tools (>= version)
│    └─ nvidia-container-toolkit-base (version)
│
├─ libnvidia-container-tools (version)
│    └─ libnvidia-container1 (>= version)
└─ libnvidia-container1 (version)

The components of the NVIDIA container stack are packaged as the NVIDIA Container Toolkit.

How these components are used depends on the container runtime being used.

For Docker or containerd, the NVIDIA Container Runtime (nvidia-container-runtime) is configured as an OCI-compliant runtime, and the flow through the various components is outlined below.
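
A simplified, text-only sketch of that flow for Docker, based on the component descriptions in this section:

docker CLI ─> dockerd ─> containerd ─> nvidia-container-runtime (thin wrapper around runC)
                                         └─ runC prestart hook: nvidia-container-runtime-hook
                                              └─ nvidia-container-cli (libnvidia-container)
                                                   └─ GPU devices and driver libraries injected into the container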

NVIDIA Container Toolkit CLI

This component is a command-line utility that provides various tools for interacting with the NVIDIA Container Toolkit.

It includes functionality for configuring runtimes like Docker to work with the NVIDIA Container Toolkit. It also provides utilities for generating Container Device Interface (CDI) specifications.
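
For example, a CDI specification can be generated with the nvidia-ctk utility (the output path below follows NVIDIA's documentation; adjust it for your system):

sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
nvidia-ctk cdi list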

Overall, the NVIDIA Container Toolkit simplifies the process of leveraging NVIDIA GPUs within containers, making it easier to deploy and run CUDA-based applications in containerised environments.

It provides a set of tools and components that seamlessly integrate with container runtimes like Docker, enabling transparent GPU acceleration for containerised workloads.

NVIDIA Container Library and CLI

The NVIDIA Container Library (libnvidia-container) is a library that provides an API for automatically configuring GNU/Linux containers to use NVIDIA GPUs.

It is designed to be agnostic of the container runtime, meaning it can work with various container runtimes, not just Docker.

The NVIDIA Container CLI (nvidia-container-cli) is a command-line utility that serves as a wrapper around the library, allowing different runtimes to invoke it and inject NVIDIA GPU support into their containers.
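
As a quick illustration, the CLI can also be run directly on the host to inspect what the library sees; these commands only query the driver and do not modify anything:

nvidia-container-cli info
nvidia-container-cli list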

NVIDIA Container Runtime Hook

This component is an executable that implements the interface required by a runC prestart hook.

  • runC is a lightweight container runtime that is used as the default runtime by Docker.

  • The NVIDIA Container Runtime Hook is invoked by runC after a container is created but before it is started.

  • It has access to the config.json file associated with the container, which contains information about the container's configuration.

  • The hook uses the information from config.json to invoke the nvidia-container-cli with appropriate flags, specifying which GPU devices should be injected into the container.
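
For illustration, a prestart hook entry injected into a container's OCI config.json typically looks something like the excerpt below; the exact path and arguments depend on how the toolkit is installed:

"hooks": {
  "prestart": [
    {
      "path": "/usr/bin/nvidia-container-runtime-hook",
      "args": ["nvidia-container-runtime-hook", "prestart"]
    }
  ]
}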

NVIDIA Container Runtime

The NVIDIA Container Runtime is a key component of the NVIDIA Container Toolkit, included in the nvidia-container-toolkit-base package. Its primary purpose is to enable the use of NVIDIA GPUs within containers by integrating with the container runtime, specifically runC.

The NVIDIA Container Runtime is a thin wrapper around the native runC installed on the host system.

By acting as a wrapper around runC and injecting the necessary hooks and modifications, the NVIDIA Container Runtime enables containers to access and utilise NVIDIA GPUs. It abstracts away the complexities of GPU management and provides a transparent way to leverage GPU acceleration within containerised environments.

Installing NVIDIA Container Toolkit with the APT Package Manager

This set of commands is used to configure a Linux system's package manager to install software from NVIDIA's production repository, specifically for the NVIDIA container toolkit.

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
Explanation of command

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey

This command fetches the GPG key from NVIDIA's server. The flags used with curl do the following: -f fails silently on server errors, -s runs curl in silent (quiet) mode, -S shows an error message if the request fails, and -L makes curl follow redirects.

|

This is a pipe that takes the output of the command on the left (the GPG key) and uses it as input for the command on the right.

sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

This command processes the GPG key. --dearmor converts the key from ASCII armored format to a binary format.

The key is then saved to /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg using -o (output file option).

&&

This is a logical operator that ensures the second set of commands runs only if the first set completes successfully.

curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list |

This command fetches the repository list file for the NVIDIA container toolkit.

sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' |

This command uses sed, a stream editor, to modify the repository list file fetched in the previous step.

It replaces the deb https:// part of the repository source with deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://.

This modification adds a signature verification step for the repository, enhancing security by ensuring that all packages installed from this repository are signed with the NVIDIA GPG key.

sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

This command writes the modified repository list to the file /etc/apt/sources.list.d/nvidia-container-toolkit.list.

tee is used with sudo to ensure that the file is written with administrative privileges, which is necessary for modifying system configuration files.

In summary, these commands set up a secure connection to NVIDIA's container toolkit repository, ensuring that any software installed from this source is authenticated and trusted, which is crucial for system security and integrity.

Update the packages list from the repository:

sudo apt-get update

Install the NVIDIA Container Toolkit packages:

sudo apt-get install -y nvidia-container-toolkit
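
To confirm that the package was installed and that it came from the NVIDIA repository, you can, for example, query APT:

apt-cache policy nvidia-container-toolkit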

Configuring Docker

Configure the container runtime by using the nvidia-ctk command:

sudo nvidia-ctk runtime configure --runtime=docker

The nvidia-ctk command modifies the /etc/docker/daemon.json file on the host.

The file is updated so that Docker can use the NVIDIA Container Runtime.

The output may look like this:

INFO[0000] Config file does not exist; using empty config 
INFO[0000] Wrote updated config to /etc/docker/daemon.json 
INFO[0000] It is recommended that docker daemon be restarted.

INFO[0000] Config file does not exist; using empty config

This message indicates that the Docker daemon configuration file (/etc/docker/daemon.json) did not exist prior to running the command. The tool therefore starts from an empty configuration and writes the settings Docker needs to use the NVIDIA Container Runtime.
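
After the command has run, /etc/docker/daemon.json should contain an entry registering the NVIDIA runtime. As an illustration, assuming no prior Docker configuration existed, it will look something like this:

cat /etc/docker/daemon.json

{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}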

Restart the Docker daemon

sudo systemctl restart docker
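
You can confirm that Docker has registered the NVIDIA runtime, for example by filtering the output of docker info:

docker info | grep -i runtime

The nvidia runtime should appear in the Runtimes list.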

Running a Sample Workload with Docker

After you install and configure the toolkit and install an NVIDIA GPU Driver, you can verify your installation by running a sample workload.

Run a sample CUDA container

sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

Your output should resemble the following:

Unable to find image 'ubuntu:latest' locally
latest: Pulling from library/ubuntu
bccd10f490ab: Pull complete
Digest: sha256:77906da86b60585ce12215807090eb327e7386c8fafb5402369e421f44eff17e
Status: Downloaded newer image for ubuntu:latest

Sun Mar 31 10:45:58 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.01   Driver Version: 529.19       CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA T500         On   | 00000000:01:00.0 Off |                  N/A |
| N/A   59C    P3    N/A /  N/A |      0MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
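
If you would rather verify with a CUDA base image than the plain ubuntu image, a similar check can be run. The image tag below is only an example; use a tag from the nvidia/cuda repository that matches your installed CUDA version:

sudo docker run --rm --runtime=nvidia --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi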