Building TensorRT-LLM

Building TensorRT-LLM from source is recommended for users needing optimal performance, debugging capabilities, or compatibility with the GNU C++11 ABI (Application Binary Interface).

The GNU C++11 ABI (Application Binary Interface)

What is an ABI?

An Application Binary Interface (ABI) is a set of rules and conventions that define how different components of a program interact with each other at the binary level.

It specifies the low-level details of how functions are called, how parameters are passed, how data structures are laid out in memory, and how the program interacts with the operating system.

The ABI is crucial for ensuring compatibility between different parts of a program, such as the application code, libraries, and the operating system. It allows compiled object code to be linked together and executed correctly, even if the components were compiled separately or with different compilers.

Some key aspects of an ABI include:

Calling conventions: The rules for how functions are called, including how parameters are passed (e.g., through registers or the stack) and how return values are handled.

Data type representation: The size, alignment, and layout of data types in memory, such as integers, floating-point numbers, and structures.

Name mangling: The scheme used to encode function and variable names in the compiled binary to avoid naming conflicts between different modules (illustrated in the example below).

Object file format: The format used for storing compiled object code, such as ELF (Executable and Linkable Format) on Linux and PE (Portable Executable) on Windows.
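As an illustration of name mangling and object files, the hypothetical snippet below compiles a small C++ function with GCC on Linux and inspects the resulting symbol with nm and c++filt; the mangled name shown in the comment follows the Itanium C++ ABI encoding used on that platform.

    echo 'namespace demo { int add(int a, double b) { return a + (int)b; } }' > demo.cpp
    g++ -c demo.cpp -o demo.o
    nm demo.o              # lists a mangled symbol such as _ZN4demo3addEid
    nm demo.o | c++filt    # demangles it back to demo::add(int, double)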

ABIs are specific to a particular architecture, operating system, and programming language.

For example, the ABI for C++ on Linux x86-64 is different from the ABI for C++ on Windows x86-64 or the ABI for C on Linux x86-64.

When an ABI change occurs, such as when a new version of a compiler or library introduces incompatible changes, it can cause issues with existing compiled code.

To minimise disruption, compiler and library authors often provide dual ABI support, allowing users to choose between the old and new ABIs during a transition period.

Tools and guidelines are also provided to help manage the transition and ensure compatibility between different components.
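For the GNU C++ standard library, this dual ABI is controlled by the _GLIBCXX_USE_CXX11_ABI macro. As a minimal sketch (the file name is illustrative), code that has to link against a library built with the pre-C++11 ABI can be compiled like this:

    # Select the old (pre-C++11) libstdc++ ABI for std::string and std::list:
    g++ -D_GLIBCXX_USE_CXX11_ABI=0 -c my_extension.cpp -o my_extension.o
    # Omit the flag (or set it to 1) to build against the default C++11 ABI.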

Understanding ABIs is essential for developers working on low-level software components, libraries, or cross-language interoperability. It helps ensure the correct integration and execution of compiled code across different parts of a program.

There are two options for building TensorRT-LLM

Build TensorRT-LLM in One Step

This option uses a single Make command to create a Docker image with TensorRT-LLM built inside it.

You can optionally specify the CUDA architectures to target; restricting the build to only the GPU architectures you need helps reduce compilation time.

Once the image is built, you can run the Docker container using another Make command.
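As a sketch, assuming the Makefile targets shipped in the docker/ directory of the TensorRT-LLM repository (target names and the CUDA_ARCHS syntax may differ between releases):

    # Build a Docker image with TensorRT-LLM compiled inside it:
    make -C docker release_build
    # Optionally restrict the target CUDA architectures to shorten compilation,
    # for example Ampere (SM 80) and Hopper (SM 90) only:
    make -C docker release_build CUDA_ARCHS="80-real;90-real"
    # Run a container based on the freshly built image:
    make -C docker release_run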

Build Step-by-Step

This option is more flexible: you create a development container and then build TensorRT-LLM yourself inside it.

The process involves creating a Docker image for development, running the container, and then building TensorRT-LLM inside the container using a Python script (build_wheel.py).

The script supports various options, such as incremental builds, cleaning the build directory, and restricting the compilation to specific CUDA architectures.

The build_wheel.py script also compiles the library containing the C++ runtime of TensorRT-LLM.
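A sketch of the step-by-step flow under the same assumptions (the build_wheel.py options shown, such as --clean and --cuda_architectures, exist in the script but the exact set may vary by version):

    # Create the development image and start a development container:
    make -C docker build
    make -C docker run
    # Inside the container, build the TensorRT-LLM wheel and the C++ runtime library:
    python3 ./scripts/build_wheel.py --clean --cuda_architectures "80-real;90-real"
    # Re-running the script without --clean performs an incremental build.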
