Tasks
Conversion APIs
- Study the `TopModelMixin` class and its `from_hugging_face()` method to understand how the conversion interface is defined.
- Investigate the implementation of the `from_hugging_face()` method in the `LLaMAForCausalLM` class to see how weights are converted from Hugging Face checkpoints to the format TensorRT-LLM expects.
- Explore other conversion methods, such as `from_meta_ckpt()` in the `LLaMAForCausalLM` class, to learn how different checkpoint formats are handled.
- Look into the `convert_checkpoint.py` script to see how the conversion process is simplified by the conversion APIs.
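As a concrete illustration of the conversion flow, here is a minimal Python sketch. It assumes the TensorRT-LLM Python package layout (`tensorrt_llm.models.LLaMAForCausalLM` with `from_hugging_face()` and `save_checkpoint()`); the exact signatures and the checkpoint paths are illustrative and may differ between releases.

```python
# Minimal conversion sketch; signatures are approximate and may vary
# across TensorRT-LLM versions.
from tensorrt_llm.models import LLaMAForCausalLM

# from_hugging_face() loads a Hugging Face checkpoint and maps its
# weights onto the parameter layout TensorRT-LLM expects.
llama = LLaMAForCausalLM.from_hugging_face(
    "./llama-2-7b-hf",  # assumed local HF checkpoint directory
    dtype="float16",
)

# Serialize the converted model as a TensorRT-LLM checkpoint that the
# build step can later consume.
llama.save_checkpoint("./tllm_ckpt")
```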
Quantization APIs
- Study the `PretrainedModel` class and its `quantize()` method to understand the default implementation for AMMO-supported quantization.
- Investigate the `LLaMAForCausalLM` class and its overridden `quantize()` method to see how model-specific quantization is handled.
- Explore the `QuantConfig` class to learn about the different quantization configurations available.
- Look into the usage of the `quantize()` API in an MPI program to understand how quantization is performed in a distributed setting.
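The quantization path can be sketched the same way. The snippet below assumes `QuantConfig` and the `QuantAlgo` enum from the TensorRT-LLM quantization module; the checkpoint paths and the exact `quantize()` argument names are illustrative rather than authoritative.

```python
# Quantization sketch; QuantConfig/QuantAlgo follow the public
# TensorRT-LLM API, but argument names may differ by version.
from tensorrt_llm.models import LLaMAForCausalLM
from tensorrt_llm.models.modeling_utils import QuantConfig
from tensorrt_llm.quantization import QuantAlgo

# QuantConfig carries the settings quantize() needs, e.g. the
# weight/activation scheme and (optionally) KV-cache precision.
quant_config = QuantConfig(quant_algo=QuantAlgo.W4A16_AWQ)

# Calibrate, quantize, and write a quantized TensorRT-LLM checkpoint.
LLaMAForCausalLM.quantize(
    "./llama-2-7b-hf",  # source HF checkpoint (assumed path)
    "./tllm_ckpt_awq",  # output checkpoint directory
    dtype="float16",
    quant_config=quant_config,
)
```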
Build APIs
- Study the `tensorrt_llm.build` API to understand how TensorRT-LLM models are built into TensorRT-LLM engines.
- Investigate the `BuildConfig` class to learn about the different build configurations available.
- Explore the `from_checkpoint()` method in the `PretrainedModel` class to see how checkpoints are deserialized into model objects.
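Putting the pieces together, a build sketch might look like the following. It assumes `tensorrt_llm.build`, `BuildConfig`, and an `Engine` object with a `save()` method, as exposed by the Python API; the option names shown are illustrative.

```python
# Build sketch; option names are illustrative and may vary by version.
from tensorrt_llm import BuildConfig, build
from tensorrt_llm.models import LLaMAForCausalLM

# from_checkpoint() deserializes a converted (or quantized)
# TensorRT-LLM checkpoint back into a model object.
model = LLaMAForCausalLM.from_checkpoint("./tllm_ckpt")

# BuildConfig collects engine-level options such as maximum sequence
# lengths and batch size.
build_config = BuildConfig(max_input_len=1024, max_batch_size=8)

# build() compiles the model into a TensorRT engine, which can then
# be serialized to disk for deployment.
engine = build(model, build_config)
engine.save("./engine_dir")
```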
CLI Tools
- Investigate the model-specific `convert_checkpoint.py` scripts in the `examples/<model xxx>/` folders to understand how to convert checkpoints from the command line.
- Explore the `examples/quantization/quantize.py` script to learn how to perform quantization with the CLI tool.
- Study the `trtllm-build` CLI tool to understand how to build TensorRT-LLM engines from checkpoints on the command line.
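For reference, an end-to-end pass through these tools might look roughly like the shell session below. The flags shown are taken from the example scripts and `trtllm-build`, but they may change between releases, and all paths are placeholders.

```bash
# 1. Convert a Hugging Face checkpoint with the model-specific script.
python examples/llama/convert_checkpoint.py \
    --model_dir ./llama-2-7b-hf \
    --output_dir ./tllm_ckpt \
    --dtype float16

# 2. Or quantize instead of a plain conversion.
python examples/quantization/quantize.py \
    --model_dir ./llama-2-7b-hf \
    --output_dir ./tllm_ckpt_awq \
    --qformat int4_awq

# 3. Build a TensorRT engine from the converted checkpoint.
trtllm-build --checkpoint_dir ./tllm_ckpt --output_dir ./engine_dir
```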
To dive deeper into each module and command, you can:
- Read the source code of the relevant classes and methods to understand their implementation details.
- Explore the documentation and comments within the code for insight into the purpose and usage of each module and API.
- Experiment with the CLI tools and scripts by running them with different arguments and configurations to see how they behave.
- Consult the TensorRT-LLM documentation and tutorials for more detailed explanations and examples of each module and command.
- Engage with the TensorRT-LLM community, such as forums or chat channels, to ask questions and learn from experienced users and developers.
By investigating these modules and commands, you can build a comprehensive understanding of the TensorRT-LLM build workflow and use the conversion, quantization, and build APIs effectively to optimize and deploy models with TensorRT-LLM.