Tasks
Conversion APIs
- Study the `TopModelMixin` class and its `from_hugging_face()` method to understand how the conversion interface is defined.
- Investigate the implementation of the `from_hugging_face()` method in the `LLaMAForCausalLM` class to see how weights are converted from Hugging Face checkpoints into the format TensorRT-LLM expects (see the sketch after this list).
- Explore other conversion methods, such as `from_meta_ckpt()` in the `LLaMAForCausalLM` class, to learn how different checkpoint formats are handled.
- Look into the `convert_checkpoint.py` script to see how the conversion process is simplified by the conversion APIs.
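The snippet below is a minimal sketch of that flow, assuming the `tensorrt_llm` package is installed and a Llama checkpoint is available locally; the exact keyword arguments of `from_hugging_face()` can vary between releases.

```python
# Minimal conversion sketch; the paths and keyword arguments are
# illustrative and may differ between TensorRT-LLM releases.
from tensorrt_llm.models import LLaMAForCausalLM

# Load and convert a Hugging Face checkpoint into a TensorRT-LLM
# model object in a single call.
llama = LLaMAForCausalLM.from_hugging_face("./llama-2-7b-hf", dtype="float16")

# Serialize the converted weights and config in the TensorRT-LLM
# checkpoint format for a later build step.
llama.save_checkpoint("./tllm_checkpoint")
```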
Quantization APIs
- Study the `PretrainedModel` class and its `quantize()` method to understand the default implementation for AMMO-supported quantization.
- Investigate the `LLaMAForCausalLM` class and its overridden `quantize()` method to see how model-specific quantization is handled.
- Explore the `QuantConfig` class to learn about the available quantization configurations.
- Look into the usage of the `quantize()` API in an MPI program to understand how quantization is performed in a distributed setting (a usage sketch follows this list).
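As a rough illustration of the API shape, the sketch below requests FP8 quantization through `QuantConfig`; the import paths and the `quantize()` signature are assumptions based on recent releases and may differ in yours.

```python
# Quantization sketch; import paths and the quantize() signature are
# assumptions and may vary between TensorRT-LLM versions.
from tensorrt_llm.models import LLaMAForCausalLM
from tensorrt_llm.models.modeling_utils import QuantConfig
from tensorrt_llm.quantization import QuantAlgo

# Describe the target quantization scheme.
quant_config = QuantConfig(quant_algo=QuantAlgo.FP8)

# quantize() reads the source checkpoint, runs calibration, and writes
# a quantized TensorRT-LLM checkpoint to output_dir.
LLaMAForCausalLM.quantize(
    "./llama-2-7b-hf",
    output_dir="./tllm_checkpoint_fp8",
    quant_config=quant_config,
)
```

For multi-GPU models, the same call is typically launched under MPI with one rank per GPU, which is the distributed usage the list item above refers to.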
Build APIs
- Study the `tensorrt_llm.build` API to understand how TensorRT-LLM models are built into TensorRT-LLM engines (see the sketch after this list).
- Investigate the `BuildConfig` class to learn about the available build configurations.
- Explore the `from_checkpoint()` method in the `PretrainedModel` class to see how checkpoints are deserialized into model objects.
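Below is a minimal sketch of the build step, assuming a checkpoint was already saved to `./tllm_checkpoint`; the `BuildConfig` option names shown are illustrative and only a subset of what is available.

```python
# Build sketch: deserialize a checkpoint and compile it into an engine.
import tensorrt_llm
from tensorrt_llm import BuildConfig
from tensorrt_llm.models import PretrainedModel

# from_checkpoint() rebuilds the model object from the serialized
# checkpoint directory (single-rank case shown here).
model = PretrainedModel.from_checkpoint("./tllm_checkpoint")

# BuildConfig collects engine-level options; unset fields use defaults.
build_config = BuildConfig(max_batch_size=8, max_input_len=1024)

# tensorrt_llm.build() compiles the model into a TensorRT-LLM engine,
# which can then be saved for deployment.
engine = tensorrt_llm.build(model, build_config)
engine.save("./engine_dir")
```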
CLI Tools
- Investigate the model-specific `convert_checkpoint.py` scripts in the `examples/<model xxx>/` folders to understand how to convert checkpoints from the command line.
- Explore the `examples/quantization/quantize.py` script to learn how to perform quantization with the CLI tool.
- Study the `trtllm-build` CLI tool to understand how to build TensorRT-LLM engines from checkpoints on the command line (a sample invocation follows this list).
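As a concrete example, the commands below walk a Llama checkpoint through conversion and engine build; the paths and flags mirror the repository's examples and are not guaranteed to match every release.

```bash
# Sample CLI invocations for a Llama model; flag names follow the
# examples in the TensorRT-LLM repository and may change over time.

# 1. Convert the Hugging Face checkpoint to TensorRT-LLM format.
python examples/llama/convert_checkpoint.py \
    --model_dir ./llama-2-7b-hf \
    --output_dir ./tllm_checkpoint \
    --dtype float16

# 2. Build a TensorRT-LLM engine from the converted checkpoint.
trtllm-build \
    --checkpoint_dir ./tllm_checkpoint \
    --output_dir ./engine_dir
```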
To dive deeper into each module and command, you can:
- Read the source code of the relevant classes and methods to understand their implementation details.
- Explore the documentation and comments within the code to gain insight into the purpose and usage of each module and API.
- Experiment with the CLI tools and scripts by running them with different arguments and configurations to see how they behave.
- Consult the TensorRT-LLM documentation and tutorials for more detailed explanations and examples of each module and command.
- Engage with the TensorRT-LLM community, such as forums or chat channels, to ask questions and learn from experienced users and developers.
By investigating these modules and commands, you can build a comprehensive understanding of the TensorRT-LLM build workflow and use the conversion, quantization, and build APIs effectively to optimize and deploy models with TensorRT-LLM.