tensorrt_llm/builder.py
The Builder class
The Builder class in TensorRT-LLM is responsible for building the TensorRT engine from a given network definition.
It provides methods to create and configure the network, set builder configurations, and build the optimised engine for inference.
The key components and functions of the Builder class are:
Initialization
The Builder class is initialized with a trt.Builder object from the TensorRT library. It has a strongly_typed attribute that determines whether the network should be created with strongly typed tensors. By default, it is set to False, but it can be enabled for TensorRT versions that support it.
Creating a Network
The create_network() method creates an empty Network object using the TensorRT builder. It sets the appropriate flags for the network creation, such as EXPLICIT_BATCH and STRONGLY_TYPED, based on the TensorRT version and the strongly_typed attribute.
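TensorRT takes network creation flags as a bitmask in which each flag's enum value is a bit position. The sketch below shows how such a mask could be assembled from the strongly_typed attribute. It is a stand-in that runs without TensorRT: CreationFlag and network_flags are hypothetical names, and the enum values are illustrative rather than taken from the library.

```python
from enum import IntEnum


class CreationFlag(IntEnum):
    """Hypothetical stand-in for TensorRT's network creation flag enum.

    The numeric values are illustrative bit positions, not library constants.
    """
    EXPLICIT_BATCH = 0
    STRONGLY_TYPED = 1


def network_flags(explicit_batch: bool, strongly_typed: bool) -> int:
    """Combine the requested creation flags into a single bitmask."""
    flags = 0
    if explicit_batch:
        flags |= 1 << int(CreationFlag.EXPLICIT_BATCH)
    if strongly_typed:
        flags |= 1 << int(CreationFlag.STRONGLY_TYPED)
    return flags


# Usage: a strongly typed network with an explicit batch dimension.
mask = network_flags(explicit_batch=True, strongly_typed=True)
```

The resulting integer is the kind of value that would be handed to the TensorRT builder when the network is created.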
Creating a Builder Configuration
The create_builder_config() method creates a BuilderConfig object with specified configurations. It takes parameters such as precision, timing cache, tensor parallel, use_refit, int8, strongly_typed, optimisation level, profiling verbosity, and other optional arguments.
The method sets the appropriate flags and configurations in the TensorRT builder config based on the provided parameters.
Adding Optimization Profiles
The _add_optimization_profile() method adds optimisation profiles to the builder configuration. It iterates over the input tensors of the network and sets the minimum, optimum, and maximum shapes for each input tensor in the optimisation profile.
It also handles the sharding of input tensors based on the auto-parallel configuration.
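To make the minimum, optimum, and maximum shapes concrete, here is a minimal sketch of a profile that stores one shape range per input tensor. It mirrors the idea of a set-shape call that takes a tensor name plus three shapes, but it is plain Python rather than the TensorRT API. ShapeRange, set_shape, and the input name "input_ids" are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Shape = Tuple[int, ...]


@dataclass(frozen=True)
class ShapeRange:
    """Minimum, optimum, and maximum shape for one input tensor."""
    min: Shape
    opt: Shape
    max: Shape


def set_shape(profile: Dict[str, ShapeRange], name: str,
              min_shape: Shape, opt_shape: Shape, max_shape: Shape) -> None:
    """Record the shape range for one input, rejecting inconsistent ranges."""
    if not (len(min_shape) == len(opt_shape) == len(max_shape)):
        raise ValueError(f"{name}: min/opt/max shapes must have the same rank")
    for axis, (lo, mid, hi) in enumerate(zip(min_shape, opt_shape, max_shape)):
        if not lo <= mid <= hi:
            raise ValueError(
                f"{name}: axis {axis} violates min <= opt <= max "
                f"({lo}, {mid}, {hi})")
    profile[name] = ShapeRange(min_shape, opt_shape, max_shape)


# Usage: one profile covering a hypothetical input named "input_ids".
profile: Dict[str, ShapeRange] = {}
set_shape(profile, "input_ids", (1, 1), (8, 128), (16, 1024))
```

Checking min <= opt <= max per axis up front is a design choice of this sketch; it surfaces a bad range before any engine build is attempted.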
Validating Named Dimensions
The _validate_named_dimensions() method validates that the named dimensions of different input tensors in each optimization profile have the same range. It ensures that the modeling in TensorRT-LLM is correct and provides user-friendly error messages if any discrepancies are found.
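The check described above can be sketched as follows. This is not the library's implementation: the data layout (a list of profiles, each mapping tensor name to named dimension ranges) and the function name validate_named_dimensions are assumptions chosen to keep the example self-contained.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int, int]  # (min, opt, max) for one named dimension
# One profile: input tensor name -> {dimension name -> range}
Profile = Dict[str, Dict[str, Range]]


def validate_named_dimensions(profiles: List[Profile]) -> None:
    """Raise ValueError if a named dimension has different ranges
    on different input tensors within the same profile."""
    for index, profile in enumerate(profiles):
        # Remember the first tensor and range seen for each dimension name.
        seen: Dict[str, Tuple[str, Range]] = {}
        for tensor_name, dims in profile.items():
            for dim_name, dim_range in dims.items():
                if dim_name not in seen:
                    seen[dim_name] = (tensor_name, dim_range)
                    continue
                first_tensor, first_range = seen[dim_name]
                if dim_range != first_range:
                    raise ValueError(
                        f"profile {index}: dimension '{dim_name}' is "
                        f"{first_range} on '{first_tensor}' but "
                        f"{dim_range} on '{tensor_name}'")


# Usage: both inputs agree on the range of "batch_size", so this passes.
validate_named_dimensions([{
    "input_ids": {"batch_size": (1, 4, 8)},
    "position_ids": {"batch_size": (1, 4, 8)},
}])
```

The error message names both tensors and both ranges, which is one way to make a mismatch easy to trace back to the model definition.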
Refitting an Engine
The refit_engine() method refits an existing TensorRT engine using the weights from a given network. It requires that the engine was built with the REFIT flag and that the network has the same structure as the engine. It uses the trt.Refitter class to set the named weights and refit the engine.
Building an Engine
The build_engine() method builds a TensorRT engine from a given network and builder configuration. It sets the plugin configuration, auto-parallel configuration, and optimization profiles in the builder configuration.
It renames the weights in the network based on the named parameters.
It then builds the serialized engine using the TensorRT builder and the configured network.
Saving Timing Cache and Configuration
The save_timing_cache() method serializes the timing cache of a given builder configuration to a file. The save_config() method saves the builder configuration to a JSON file.
The BuildConfig class is a dataclass that represents the configuration options for building the TensorRT engine.
It includes parameters such as maximum input length, maximum output length, maximum batch size, beam width, number of tokens, prompt embedding table size, gather context/generation logits, strongly typed, builder optimization level, profiling verbosity, debug output, draft length, use_refit, timing cache paths, LoRA configuration, auto-parallel configuration, weight sparsity, and plugin configuration.
Overall, the Builder class in TensorRT-LLM provides a high-level interface for building optimised TensorRT engines from network definitions, allowing customisation of various configurations and optimisations for efficient inference.
Explanation of the builder.py classes and methods
Imagine you're building a powerful race car (the TensorRT engine) from various components (the model, build configuration, etc.).
The builder.py file acts as the assembly manual, guiding you through the process of constructing and fine-tuning your race car.
The Builder class is like the master mechanic who oversees the entire building process.
It provides the tools and expertise needed to create the race car's blueprint (the TensorRT network) and configure its settings (the builder configuration).
The master mechanic can create a new blueprint (create_network()) and customize the configuration (create_builder_config()) based on your preferences, such as the desired precision, performance optimizations, and memory usage.
The BuilderConfig class represents the configuration settings for your race car.
It's like a checklist of options you can choose from to optimise your car's performance.
You can specify the precision (e.g., float32, float16), enable performance optimisations (e.g., refit, sparse weights), and set profiling verbosity to analyse your car's performance.
The master mechanic uses this configuration to fine-tune the building process.
The BuildConfig class is like a detailed specification sheet for your race car.
It contains all the important parameters that define your car's capabilities, such as the maximum input and output lengths, batch size, beam width, and token limits.
You can think of it as a custom order form where you specify your desired features, such as gathering context and generation logits, enabling debug output, and configuring LoRA (Low-Rank Adaptation) and auto-parallel settings.
The EngineConfig class is like a comprehensive blueprint that combines the race car's specification sheet (BuildConfig) with its underlying architecture (PretrainedConfig) and version information. It provides a complete overview of your race car's configuration.
The Engine class represents the final product: your fully assembled and optimized race car.
It encapsulates the TensorRT engine and its associated configuration. You can save your race car (save()) for future use or load a previously built one (from_dir()) to take it for a spin.
The build() function is like the main assembly line where all the components come together to create your race car.
It takes the model (the engine block) and the build configuration (the custom order form) and goes through a series of optimisation steps to ensure peak performance.
It fine-tunes the model, creates the TensorRT network, applies optimisations, and finally builds the engine using the builder configuration.
Throughout the building process, the script utilizes various utility functions and classes to streamline the workflow, such as optimize_model() for model optimization, auto_parallel() for parallel processing, and net_guard() for network management.
By following the steps outlined in the builder.py file and leveraging the different classes and functions, you can create a highly optimised and customized race car (TensorRT engine) tailored to your specific needs.
In summary, the builder.py file provides a structured and modular approach to building TensorRT engines, allowing you to fine-tune and optimize your model's performance.
By understanding the roles and relationships between the classes and functions, you can effectively navigate the building process and create powerful and efficient engines for your specific use case.
UML Analysis
The Builder class is responsible for creating and configuring the TensorRT builder and building the engine. It has the following methods:
__init__(): Initializes the Builder object and creates a trt.Builder instance.
trt_builder: Returns the trt.Builder instance.
create_network(): Creates and returns a Network object.
create_builder_config(...): Creates and returns a BuilderConfig object with the specified precision, timing cache, and other configurations.
_add_optimization_profile(...): Adds optimization profiles to the builder config based on the input tensors and auto-parallel configuration.
_validate_named_dimensions(...): Validates the named dimensions of different input tensors in each optimization profile.
refit_engine(...): Refits a TensorRT engine using weights from the network.
build_engine(...): Builds a TensorRT engine from the network and builder config.
save_timing_cache(...): Serializes the timing cache of a given builder config to a file.
save_config(...): Saves the builder config to a JSON file.
The BuilderConfig class represents the configuration for the TensorRT builder. It has the following properties and methods:
_trt_builder_config: The underlying trt.IBuilderConfig instance.
precision: The precision used for building the engine (e.g., 'float32', 'float16', 'bfloat16').
tensor_parallel: The number of GPUs used for tensor parallelism.
use_refit: Flag indicating whether to use engine refitting.
int8: Flag indicating whether to enable INT8 precision.
strongly_typed: Flag indicating whether to use strongly typed networks.
plugin_config: The PluginConfig object associated with the builder config.
auto_parallel_config: The auto-parallel configuration dictionary.
_init(...): Initializes the BuilderConfig object with the provided arguments.
trt_builder_config: Returns the underlying trt.IBuilderConfig instance.
to_dict(): Returns a dictionary representation of the builder config.
The BuildConfig class represents the configuration for building the engine.
It has various properties to control the build process, such as input and output lengths, batch size, beam width, token limits, prompt embedding table size, gather logits flags, precision, builder optimisation level, profiling verbosity, debug output, draft length, refitting, timing cache, LoRA configuration, auto-parallel configuration, weight sparsity, plugin configuration, and fused MLP usage.
It also provides methods to create BuildConfig instances from dictionaries or JSON files and to convert the configuration to a dictionary.
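The dictionary and JSON round trip can be illustrated with a small dataclass. BuildConfigSketch below is a hypothetical, heavily reduced stand-in: its field names and default values are assumptions chosen for the example and should not be read as the real BuildConfig definition.

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import Any, Dict


@dataclass
class BuildConfigSketch:
    """Illustrative subset of build options; names and defaults are assumed."""
    max_input_len: int = 256
    max_output_len: int = 256
    max_batch_size: int = 8
    max_beam_width: int = 1
    strongly_typed: bool = False

    @classmethod
    def from_dict(cls, config: Dict[str, Any]) -> "BuildConfigSketch":
        """Build a config from a dict, rejecting keys the class does not know."""
        known = {f.name for f in fields(cls)}
        unknown = set(config) - known
        if unknown:
            raise ValueError(f"unknown build options: {sorted(unknown)}")
        return cls(**config)

    @classmethod
    def from_json_file(cls, path: str) -> "BuildConfigSketch":
        """Load a config from a JSON file containing a flat object."""
        with open(path, "r") as f:
            return cls.from_dict(json.load(f))

    def to_dict(self) -> Dict[str, Any]:
        """Return a plain dict suitable for json.dump."""
        return asdict(self)


# Usage: override one option and round-trip through a dictionary.
config = BuildConfigSketch.from_dict({"max_batch_size": 16})
restored = BuildConfigSketch.from_dict(config.to_dict())
```

Rejecting unknown keys is a choice made in this sketch so that a misspelled option fails loudly instead of being silently ignored.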
The EngineConfig class represents the configuration for an engine.
It combines the pretrained model configuration (PretrainedConfig), build configuration (BuildConfig), and version information. It provides methods to create EngineConfig instances from JSON files and to convert the configuration to a dictionary.
The Engine class represents a TensorRT engine. It has the following properties and methods:
config: The EngineConfig object associated with the engine.
engine: The serialized TensorRT engine (trt.IHostMemory).
__init__(...): Initializes the Engine object with the provided EngineConfig and serialized engine.
save(...): Saves the engine and its configuration to the specified directory.
from_dir(...): Creates an Engine instance from the specified directory and rank.
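The save and load cycle can be sketched with ordinary files. The layout used here, one config.json per directory plus one rank{N}.engine file per rank, is an assumption made for illustration, and save_engine and load_engine are hypothetical helpers, not the Engine methods themselves. The engine is represented as raw bytes in place of a serialized TensorRT engine.

```python
import json
from pathlib import Path
from typing import Any, Dict, Tuple


def save_engine(engine_dir: str, config: Dict[str, Any],
                engine_bytes: bytes, rank: int = 0) -> None:
    """Write the serialized engine for one rank; rank 0 also writes the config."""
    directory = Path(engine_dir)
    directory.mkdir(parents=True, exist_ok=True)
    if rank == 0:
        # Assumed file name; the config is shared by all ranks.
        (directory / "config.json").write_text(json.dumps(config, indent=4))
    (directory / f"rank{rank}.engine").write_bytes(engine_bytes)


def load_engine(engine_dir: str, rank: int = 0) -> Tuple[Dict[str, Any], bytes]:
    """Read back the shared config and the serialized engine for one rank."""
    directory = Path(engine_dir)
    config = json.loads((directory / "config.json").read_text())
    engine_bytes = (directory / f"rank{rank}.engine").read_bytes()
    return config, engine_bytes
```

Keeping one engine file per rank lets each process in a multi-GPU run load only its own shard while sharing a single configuration file.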
The get_engine_version(...) function retrieves the version of the engine from the configuration file in the specified directory.
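A version lookup of this kind might look like the sketch below. It assumes the configuration file is named config.json and stores the version under a top-level "version" key; both are assumptions, and read_engine_version is a hypothetical name used to avoid implying this is the library function.

```python
import json
from pathlib import Path
from typing import Optional


def read_engine_version(engine_dir: str) -> Optional[str]:
    """Return the version recorded in the engine directory's config file,
    or None if the config has no version entry."""
    config_path = Path(engine_dir) / "config.json"  # assumed file name
    with open(config_path, "r") as f:
        config = json.load(f)
    return config.get("version")  # assumed top-level key
```

Returning None for a missing key lets a caller distinguish engines whose config predates version tracking from engines with a known version.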
The build(...) function is the main entry point for building an engine.
It takes a pretrained model (PretrainedModel) and a build configuration (BuildConfig) as input and returns an Engine object.
The function performs various optimizations on the model based on the build configuration, creates a TensorRT network, and builds the engine using the Builder class.
This breakdown provides an overview of the classes and functions in the builder.py file and their responsibilities.
The classes are designed to encapsulate different aspects of the engine building process, such as builder configuration, build configuration, engine configuration, and the engine itself. The build(...) function orchestrates the entire process of optimizing the model, creating the network, and building the engine using the provided configurations.