
Building from Source

This document provides instructions for building TensorRT-LLM from source code on Linux.

Fetching the Sources

1. Install git-lfs

git lfs install

This command configures your local Git environment to handle large files: instead of storing them directly in the repository, Git keeps lightweight references to them.

This setup needs to be run only once per user account, as it installs the LFS filters into your global Git configuration.

Git LFS replaces large files with small text pointers inside Git, while storing the actual file contents on a remote server such as GitHub's LFS storage.
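For illustration, a file tracked by LFS is stored in Git as a small pointer file like the one below (the oid hash and size are placeholder values):

version https://git-lfs.github.com/spec/v1
oid sha256:0beec7b5ea3f0fdbc95d0dd47f3c5bc275da8a33f0d6e254d6c3b0f0e9a2f1c4
size 132735821

When you run git lfs pull (step 5 below), these pointers are resolved and the real file contents are downloaded.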

2. Clone the TensorRT-LLM repository

To start working with TensorRT-LLM, you first need to clone the repository to your local machine.

This can be done by executing the following command in your terminal:

git clone https://github.com/NVIDIA/TensorRT-LLM.git

This command clones the entire repository from GitHub to your local directory, allowing you to work with the files, including large files that are handled efficiently through Git Large File Storage (LFS).

3. Move into the directory

cd TensorRT-LLM

4. Initialise and update the submodules

git submodule update --init --recursive

This command performs several actions:

  • --init initialises your local configuration file to include the submodules defined in the .gitmodules file of the repository.

  • update (the subcommand itself) fetches all the data from the submodule repositories and checks out the commit recorded in the superproject.

  • --recursive ensures that this command is run not only in the current module but also in any nested submodules, effectively updating all the submodules within the project.
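To confirm the submodules were initialised correctly, you can list their checked-out commits:

git submodule status

Each line of the output shows a commit hash followed by the submodule path; a leading - would indicate a submodule that has not yet been initialised.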

5. Pull the large files with Git LFS

After initialising and updating your repository's submodules, you'll need to handle large files managed with Git Large File Storage (LFS). This is where git lfs pull comes into play.

Running this command will download the large files associated with the current branch from the remote repository, based on the tracking configurations established by Git LFS.

git lfs pull

This step ensures all the necessary assets, which are too large to be efficiently managed by standard Git operations, are properly downloaded and available for use.

It's a necessary step before proceeding with operations that depend on these large files, such as building Docker images or executing large-scale data processing tasks.
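You can verify that the LFS-managed files were actually downloaded, rather than left as pointers, by listing them:

git lfs ls-files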

Building TensorRT-LLM in One Step

Once Git LFS is set up and the necessary files are pulled, you can proceed to build the TensorRT-LLM Docker image.

This can be done with a single command:

make -C docker release_build

This command builds a Docker image that contains everything you need to run TensorRT-LLM, simplifying the setup process and ensuring consistency across environments.

  • Optionally specify GPU architectures with CUDA_ARCHS, as shown below.
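For example, to restrict compilation to Ampere and Hopper GPUs (the architecture list here is illustrative; adjust it to your hardware):

make -C docker release_build CUDA_ARCHS="80-real;90-real"

Limiting the architecture list shortens compile time, since CUDA kernels are built only for the targets you name.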

Dockerfile Analysis

Base Image

ARG BASE_IMAGE=nvcr.io/nvidia/pytorch
ARG BASE_TAG=23.08-py3
FROM ${BASE_IMAGE}:${BASE_TAG} as base
  • Defines two arguments BASE_IMAGE and BASE_TAG, which are used to specify the base image. In this case, it's using NVIDIA's PyTorch image.

  • The FROM instruction initialises a new build stage and sets the base image for subsequent instructions. The as base names this stage as base.

Setting Up Bash Environment

ENV BASH_ENV="/tmp/bash_env"
SHELL ["/bin/bash", "-c"]
  • Sets an environment variable BASH_ENV pointing to a file. This file (/tmp/bash_env) will be sourced whenever a bash shell is started non-interactively.

  • Changes the default shell to bash.
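A minimal sketch of this mechanism outside the Dockerfile: bash sources the file named by BASH_ENV whenever it starts non-interactively, which includes every RUN instruction once SHELL is set to bash -c:

export BASH_ENV=/tmp/bash_env
echo 'echo "sourced on startup"' > /tmp/bash_env
bash -c 'true'    # prints "sourced on startup" before running the command

This lets the image apply environment settings to every subsequent build step.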

Development Stage

FROM base as devel
  • Starts a new stage named devel based on the base stage.

Copying and Running Scripts

The following set of instructions:

  • Copies various shell scripts from the host to the container.

  • Executes these scripts to install different software components.

  • Each script is removed after its execution.

These installations include:

  • Base dependencies (install_base.sh).

  • CMake (install_cmake.sh).

  • TensorRT (install_tensorrt.sh) - an NVIDIA library for high-performance deep learning inference.

  • Polygraphy (install_polygraphy.sh) - a toolkit for working with TensorRT.

  • mpi4py (install_mpi4py.sh) - MPI for Python.

  • PyTorch (install_pytorch.sh) - the torch installation can be controlled by the TORCH_INSTALL_TYPE argument.

Environment variables like RELEASE_URL_TRT, TARGETARCH, and TORCH_INSTALL_TYPE are used to control the behavior of the scripts.
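The pattern for each component looks roughly like the sketch below (paths are abbreviated and the exact Dockerfile lines may differ):

COPY docker/common/install_cmake.sh install_cmake.sh
RUN bash ./install_cmake.sh && rm install_cmake.sh

Deleting each script in the same RUN instruction that executes it keeps installation artifacts out of that image layer.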

Wheel Stage

FROM devel as wheel
  • Initiates a new stage named wheel based on the devel stage.

WORKDIR /src/tensorrt_llm
...
RUN python3 scripts/build_wheel.py ${BUILD_WHEEL_ARGS}
  • Sets the working directory.

  • Copies various directories and files from the host into the container.

  • Runs a Python script to build a wheel file of the application, controlled by BUILD_WHEEL_ARGS.
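For reference, a plausible invocation of the wheel build script when run outside Docker looks like this (the --clean and --trt_root flags are assumptions based on common usage; verify the available options with python3 scripts/build_wheel.py --help):

python3 scripts/build_wheel.py --clean --trt_root /usr/local/tensorrt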

Release Stage

FROM devel as release
  • Initiates the final stage named release, again based on the devel stage.

WORKDIR /app/tensorrt_llm
COPY --from=wheel /src/tensorrt_llm/build/tensorrt_llm*.whl .
...
RUN pip install tensorrt_llm*.whl && \
    rm tensorrt_llm*.whl
  • Sets another working directory.

  • Copies the wheel file built in the wheel stage and installs it using pip.

  • Removes the wheel file after installation.

COPY README.md ./
COPY examples examples
  • Copies README.md and the examples directory to the container.

The build process will take some time.

Fire up the Docker Container

Once built, start the Docker container with a single make target:

make -C docker release_run
  • To run as a local user instead of root, use LOCAL_USER=1.
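For example:

make -C docker release_run LOCAL_USER=1

Once inside the container, a quick sanity check is to import the installed wheel (assuming the package exposes a __version__ attribute):

python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"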

Analysis of the Docker Run Process

The make -C docker release_run target launches the container for NVIDIA's TensorRT-LLM via docker run.

Docker Run Command Breakdown

The docker run command is used to create and start a container. The command and its options are as follows:

--rm: Automatically remove the container when it exits.

-it: Keep STDIN open (-i) and allocate a pseudo-TTY (-t), i.e. run the container interactively, attached to the terminal.

--ipc=host: Use the host's IPC namespace, which allows the container to share memory with the host.

--ulimit memlock=-1 --ulimit stack=67108864: Set limits on system resources. Here, memlock=-1 removes the memory-lock limit, and stack=67108864 sets the maximum stack size to 67108864 bytes (64 MB).

--gpus=all: Allocate all available GPUs to the container. This is important for machine learning tasks that require GPU acceleration.

--volume /home/jack/TensorRT-LLM:/code/tensorrt_llm: Mount the host directory /home/jack/TensorRT-LLM to the container directory /code/tensorrt_llm.

--workdir /code/tensorrt_llm: Set the working directory inside the container to /code/tensorrt_llm.

--hostname laptop-release: Set the hostname of the container.

--name tensorrt_llm-release-jack: Assign a name to the container for easy reference.

--tmpfs /tmp:exec: Mount a temporary file system (tmpfs) at /tmp with execution permissions. This can be used for temporary storage that's faster than writing to disk.

tensorrt_llm/release:latest: The Docker image to use, where tensorrt_llm/release is the image name and latest is the tag.
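Putting these options together, the command the Makefile issues looks approximately like this (the user name, paths, and hostname reflect the example above):

docker run --rm -it --ipc=host \
  --ulimit memlock=-1 --ulimit stack=67108864 \
  --gpus=all \
  --volume /home/jack/TensorRT-LLM:/code/tensorrt_llm \
  --workdir /code/tensorrt_llm \
  --hostname laptop-release \
  --name tensorrt_llm-release-jack \
  --tmpfs /tmp:exec \
  tensorrt_llm/release:latest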

What is GNU? What does the MAKE command do?

make and GNU are fundamental concepts in software development, particularly in the context of building and compiling code.

What is make?

Function: make is a utility that automatically builds executable programs and libraries from source code by reading files called Makefiles which specify how to derive the target program.

Automation: It helps in automating the compilation process, reducing the complexity and potential errors in building software, especially large projects with multiple components and dependencies.

Efficiency: make determines which portions of a program need to be recompiled and issues commands to recompile them. This is efficient because only those parts of a program that have been modified are recompiled, saving time.

Platform: It is widely used in Unix and Unix-like systems but is available for many other operating systems.
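A minimal illustrative Makefile shows how this works: each rule names a target, its prerequisites, and the recipe to rebuild it, and make re-runs a recipe only when a prerequisite is newer than its target (file names here are hypothetical; recipe lines must begin with a tab):

app: main.o util.o
	cc -o app main.o util.o

main.o: main.c
	cc -c main.c

util.o: util.c
	cc -c util.c

Touching util.c and re-running make rebuilds only util.o and app, leaving main.o untouched.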

What is GNU?

GNU Project: GNU stands for "GNU's Not Unix!", a recursive acronym. The GNU Project was launched in 1983 by Richard Stallman to create a complete, free operating system.

Free Software: The GNU Project has developed a comprehensive collection of free software. When people refer to “GNU software”, they are usually referring to software released under the GNU General Public License (GPL), which is known for its commitment to free software principles.

GNU Tools: The project has produced a number of tools widely used in software development, including the GNU Compiler Collection (GCC), GNU Debugger (GDB), and GNU Make (a version of the make utility).

GNU/Linux: The combination of GNU tools and the Linux kernel resulted in the GNU/Linux operating system, commonly referred to as just “Linux”, which is used in systems around the world.

GNU Make

GNU Make: GNU Make is the GNU Project's version of the make utility. It is an enhanced version of the original make and is more feature-rich and portable.

Usage in TensorRT-LLM: The make commands in the TensorRT-LLM build process are executed by GNU Make, which reads the specified Makefile to automate the compilation and linking of the TensorRT-LLM software.

In summary, make is a tool for automating the build process in software development, and GNU is a project that provides a variety of free software tools, including GNU Make.
