convert_checkpoint examples
Convert the LLaMA 7B model to TensorRT-LLM checkpoint format using a single GPU:
python3 convert_checkpoint.py --model_dir llama-2-7b-chat-hf \
--output_dir ./llama-2-7b-chat-hf-output \
--dtype float16

Build the TensorRT engine(s) for the LLaMA 7B model using a single GPU:
trtllm-build --checkpoint_dir ./llama-2-7b-chat-hf-output \
--output_dir ./tmp/llama/7B-chat/trt_engines/fp16/1-gpu \
--gpt_attention_plugin float16 \
--gemm_plugin float16

More advanced techniques
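Once the engine is built, it can be exercised with the run.py script that ships alongside the TensorRT-LLM examples. The script path and flag names below are assumptions based on the standard examples layout; adjust them to your checkout and release:

```shell
# Run inference with the freshly built FP16 engine on a single GPU.
# Paths and flags are assumptions from the standard TensorRT-LLM
# examples layout; verify against your installed release.
python3 ../run.py --engine_dir ./tmp/llama/7B-chat/trt_engines/fp16/1-gpu \
    --tokenizer_dir llama-2-7b-chat-hf \
    --max_output_len 64 \
    --input_text "What is TensorRT-LLM?"
```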
Build LLaMA 70B with INT8 Quantization and 8-way Tensor Parallelism
Build LLaMA 70B with FP8 Precision and 4-way Tensor Parallelism
Build LLaMA 70B with 16-way Tensor Parallelism for Maximum GPU Utilization
Using SmoothQuant for Enhanced Model Precision
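As a sketch of how the variants above change the basic flow, the commands below combine INT8 weight-only quantization with 8-way tensor parallelism for the 70B model. The flag names follow the conventions of the LLaMA convert_checkpoint.py example and may differ between TensorRT-LLM releases, so treat this as an assumption to verify against your installed version:

```shell
# Convert LLaMA 70B with INT8 weight-only quantization, sharding the
# checkpoint for 8-way tensor parallelism (flag names assumed; verify
# against your TensorRT-LLM release).
python3 convert_checkpoint.py --model_dir llama-2-70b-chat-hf \
    --output_dir ./llama-2-70b-int8-tp8 \
    --dtype float16 \
    --use_weight_only \
    --weight_only_precision int8 \
    --tp_size 8

# Build one engine per rank; --workers parallelizes the per-rank builds.
trtllm-build --checkpoint_dir ./llama-2-70b-int8-tp8 \
    --output_dir ./tmp/llama/70B-chat/trt_engines/int8_tp8/8-gpu \
    --gemm_plugin float16 \
    --workers 8
```

At runtime the eight engine shards are launched together (for example via mpirun), one rank per GPU.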