Building TensorRT-LLM
Building TensorRT-LLM from source is recommended for users needing optimal performance, debugging capabilities, or compatibility with the GNU C++11 ABI (Application Binary Interface).
There are two options for building TensorRT-LLM:
Build TensorRT-LLM in One Step
This option uses a single Make command to create a Docker image with TensorRT-LLM built inside it.
You can optionally specify the CUDA architectures to target, which can help reduce compilation time by restricting the supported GPU architectures.
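As a sketch (the exact Make target and variable names may differ between releases, and the CUDA_ARCHS value shown is only an example), the one-step build typically looks like this:

```bash
# Build a Docker image with TensorRT-LLM compiled inside it (single step).
make -C docker release_build

# Optionally restrict compilation to specific GPU architectures to shorten
# build time, e.g. Ampere (SM80) and Hopper (SM90) only.
make -C docker release_build CUDA_ARCHS="80-real;90-real"
```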
Once the image is built, you can run the Docker container using another Make command.
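Once that image exists, a matching Make target starts a container from it (again, the target name is an assumption and may vary by version):

```bash
# Launch an interactive container based on the image built above.
make -C docker release_run
```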
Build Step-by-Step
This option is more flexible and lets you build TensorRT-LLM yourself inside a development container.
The process involves creating a Docker image for development, running the container, and then building TensorRT-LLM inside the container using a Python script (build_wheel.py).
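A typical sequence, assuming the Make targets and script location used in recent TensorRT-LLM releases, is sketched below:

```bash
# 1. Build the development image (build dependencies only, no TensorRT-LLM).
make -C docker build

# 2. Start the development container.
make -C docker run

# 3. Inside the container, compile TensorRT-LLM and produce the Python wheel.
python3 ./scripts/build_wheel.py

# 4. Install the resulting wheel (the output path may differ by configuration).
pip install ./build/tensorrt_llm*.whl
```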
The script supports various options, such as incremental builds, cleaning the build directory, and restricting the compilation to specific CUDA architectures.
The build_wheel.py script also compiles the library containing the C++ runtime of TensorRT-LLM.
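The flags below illustrate common build_wheel.py options; they are assumptions based on typical usage and may change between releases, so run the script with --help to see what your version supports:

```bash
# Remove previous build artifacts before rebuilding.
python3 ./scripts/build_wheel.py --clean

# Restrict compilation to specific CUDA architectures (example values).
python3 ./scripts/build_wheel.py --cuda_architectures "80-real;90-real"

# List the options supported by the installed version of the script.
python3 ./scripts/build_wheel.py --help
```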