Tasks

Conversion APIs

  • Study the TopModelMixin class and its from_hugging_face() method to understand how the conversion interface is defined.

  • Investigate the implementation of the from_hugging_face() method in the LLaMAForCausalLM class to see how weights are converted from Hugging Face checkpoints into the format TensorRT-LLM expects.

  • Explore other conversion methods like from_meta_ckpt() in the LLaMAForCausalLM class to learn how different checkpoint formats are handled.

  • Look into the convert_checkpoint.py script to see how the conversion process is simplified using the conversion APIs; a minimal usage sketch follows this list.
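
The sketch below ties the conversion bullets together: it loads a Hugging Face LLaMA checkpoint through the high-level API and serializes it as a TensorRT-LLM checkpoint. It is a minimal sketch; the directory paths are placeholders and keyword argument names may differ between TensorRT-LLM releases.

```python
# Minimal conversion sketch (single GPU, FP16). Paths are placeholders and
# keyword names may vary between TensorRT-LLM releases.
from tensorrt_llm.models import LLaMAForCausalLM

# Convert a Hugging Face checkpoint into TensorRT-LLM's in-memory model format.
llama = LLaMAForCausalLM.from_hugging_face("./Llama-2-7b-hf", dtype="float16")

# Write the converted weights and config out as a TensorRT-LLM checkpoint.
llama.save_checkpoint("./tllm_checkpoint_llama2_7b")
```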

Quantization APIs

  • Study the PretrainedModel class and its quantize() method to understand the default implementation for AMMO-supported quantization (AMMO, NVIDIA's AlgorithMic Model Optimization toolkit, has since been renamed TensorRT Model Optimizer).

  • Investigate the LLaMAForCausalLM class and its overridden quantize() method to see how model-specific quantization is handled.

  • Explore the QuantConfig class to learn about the different quantization configurations available.

  • Look into the usage of the quantize() API in an MPI program to understand how quantization is performed in a distributed setting; a single-process sketch follows this list.
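
As a single-process illustration of the quantization bullets, the sketch below quantizes a LLaMA checkpoint with a QuantConfig. The import path for QuantConfig and the exact quantize() keyword names are assumptions and should be checked against the installed release.

```python
# Minimal quantization sketch; import locations and keyword names are
# assumptions that may differ between TensorRT-LLM releases.
from tensorrt_llm.models import LLaMAForCausalLM
from tensorrt_llm.models.modeling_utils import QuantConfig  # location varies by release
from tensorrt_llm.quantization import QuantAlgo

# Choose a quantization scheme; W4A16 AWQ is one of the AMMO-backed options.
quant_config = QuantConfig(quant_algo=QuantAlgo.W4A16_AWQ)

# Calibrate, quantize, and write a quantized TensorRT-LLM checkpoint in one call.
LLaMAForCausalLM.quantize("./Llama-2-7b-hf",
                          "./tllm_checkpoint_llama2_7b_awq",
                          quant_config=quant_config)
```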

Build APIs

  • Study the tensorrt_llm.build API to understand how TensorRT-LLM models are built into TensorRT-LLM engines.

  • Investigate the BuildConfig class to learn about the different build configurations available.

  • Explore the from_checkpoint() method in the PretrainedModel class to see how checkpoints are deserialized into model objects; the sketch after this list ties these build steps together.
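
The sketch below chains the three build bullets: deserialize a checkpoint with from_checkpoint(), describe the engine limits with BuildConfig, and compile with tensorrt_llm.build(). BuildConfig field names (for example, the maximum output/sequence length) differ between releases, so treat the values here as placeholders.

```python
# Minimal build sketch; BuildConfig fields and their names vary by release.
from tensorrt_llm import BuildConfig, build
from tensorrt_llm.models import LLaMAForCausalLM

# Deserialize a previously converted (or quantized) checkpoint into a model object.
model = LLaMAForCausalLM.from_checkpoint("./tllm_checkpoint_llama2_7b")

# Declare the runtime limits the engine must support.
build_config = BuildConfig(max_batch_size=8, max_input_len=1024)

# Compile to a TensorRT-LLM engine and persist it for the runtime to load.
engine = build(model, build_config)
engine.save("./engine_llama2_7b")
```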

CLI Tools

  • Investigate the model-specific convert_checkpoint.py scripts in the examples/<model xxx>/ folders to understand how to convert checkpoints using the command line.

  • Explore the examples/quantization/quantize.py script to learn how to perform quantization using the CLI tool.

  • Study the trtllm-build CLI tool to understand how to build TensorRT-LLM engines from checkpoints using the command line; a scripted example follows this list.
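
For a scripted view of the CLI workflow, the hypothetical driver below chains the model-specific convert_checkpoint.py script and trtllm-build with subprocess. The flag names follow the LLaMA example but should be verified against --help for the installed release.

```python
# Hypothetical CLI driver; flag names follow the LLaMA example scripts and
# should be checked against `--help` for your TensorRT-LLM version.
import subprocess

# Step 1: convert the Hugging Face checkpoint with the model-specific script.
subprocess.run([
    "python", "examples/llama/convert_checkpoint.py",
    "--model_dir", "./Llama-2-7b-hf",
    "--output_dir", "./tllm_checkpoint_llama2_7b",
    "--dtype", "float16",
], check=True)

# Step 2: compile the converted checkpoint into a TensorRT-LLM engine.
subprocess.run([
    "trtllm-build",
    "--checkpoint_dir", "./tllm_checkpoint_llama2_7b",
    "--output_dir", "./engine_llama2_7b",
    "--gemm_plugin", "float16",
], check=True)
```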

To dive deeper into each module and command, you can:

  1. Read the source code of the relevant classes and methods to understand their implementation details.

  2. Explore the documentation and comments within the code to gain insights into the purpose and usage of each module and API.

  3. Experiment with the CLI tools and scripts by running them with different arguments and configurations to see how they behave.

  4. Consult the TensorRT-LLM documentation and tutorials for more detailed explanations and examples of each module and command.

  5. Engage with the TensorRT-LLM community through forums or chat channels to ask questions and learn from experienced users and developers.

By investigating these modules and commands, you can build a comprehensive understanding of the TensorRT-LLM build workflow and use the conversion, quantization, and build APIs effectively to optimize and deploy models with TensorRT-LLM.
