Examples of running the convert_checkpoint.py script

This page will help you assemble the arguments that are passed to the convert_checkpoint.py script.

  • Three examples below demonstrate different use cases for running the convert_checkpoint.py script.

  • Each example shows the specific command to run convert_checkpoint.py with the corresponding arguments.

  • The examples cover scenarios such as converting a checkpoint using a single GPU with FP16 or BF16 precision, and converting a checkpoint with specific model hyperparameters.

1. Convert a checkpoint using a single GPU and FP16:
python convert_checkpoint.py --model_dir ./path/to/model/directory \
                              --output_dir ./tllm_checkpoint_1gpu_fp16 \
                              --dtype float16

2. Convert a checkpoint using a single GPU and BF16:
python convert_checkpoint.py --model_dir ./path/to/model/directory \
                              --output_dir ./tllm_checkpoint_1gpu_bf16 \
                              --dtype bfloat16

3. Convert a checkpoint with specific model hyperparameters:
python convert_checkpoint.py --model_dir ./path/to/model/directory \
                              --output_dir ./path/to/output/directory \
                              --dtype float16 \
                              --n_layer 32 \
                              --n_head 32 \
                              --n_embd 4096

README: Converting Checkpoints with TensorRT-LLM

This guide explains how to use the run_convert_checkpoint.py script to run convert_checkpoint.py with its configuration kept transparently in a YAML file.

First, download the scripts from GitHub:

git clone https://github.com/Continuum-Labs-HQ/tensorrt-continuum.git

For now, you will need to move the files into the correct directory manually.

Prerequisites

  • Python 3.x

  • TensorRT-LLM

  • PyYAML (install with pip install pyyaml)

Configuration

  1. Populate the config.yaml file in the same directory as the run_convert_checkpoint.py script. This file will contain the configurations for the convert_checkpoint.py script.

  2. Open the config.yaml file and specify the desired configurations. The file should have the following structure:

model:
  model_dir: ./path/to/model
  output_dir: ./path/to/output
  dtype: float16

checkpoint:
  tp_size: 1
  pp_size: 1
  vocab_size: 32000
  n_positions: 2048
  n_layer: 32
  n_head: 32
  n_embd: 4096
  inter_size: 11008
  # Additional checkpoint arguments
  # ...

  3. Adjust the values in the config.yaml file according to your requirements. You can refer to the comments in the file for suggestions and available choices.

Running the Script

  1. Open a terminal and navigate to the directory containing the run_convert_checkpoint.py script and the config.yaml file.

  2. Run the following command:

python3 run_convert_checkpoint.py

  3. The script reads the configurations from the config.yaml file and constructs the corresponding command-line arguments for the convert_checkpoint.py script.

  4. The convert_checkpoint.py script is then executed with the specified configurations.
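
The --config flag (parsed by run_convert_checkpoint.py, as described under How It Works below) lets you point the runner at an alternative configuration file. For example, with a hypothetical llama2_7b.yaml in the same directory:

python3 run_convert_checkpoint.py --config llama2_7b.yaml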

How It Works

The run_convert_checkpoint.py script does the following (a minimal sketch of the flow follows the list):

  1. It uses the argparse module to parse the command-line argument --config, which specifies the path to the YAML configuration file (default is config.yaml).

  2. It loads the configurations from the specified YAML file using the yaml.safe_load() function.

  3. It extracts the relevant configuration values from the loaded YAML data, such as model_dir, output_dir, and dtype.

  4. It constructs the command-line arguments for the convert_checkpoint.py script based on the extracted configuration values.

  5. It iterates over the checkpoint arguments in the YAML file and adds them to the command-line arguments list. Boolean values are handled separately, adding only the argument flag if the value is True.

  6. Finally, it uses the subprocess.run() function to execute the convert_checkpoint.py script with the constructed command-line arguments.
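
A minimal sketch of this flow, assuming the config.yaml layout shown above; the exact script in the repository may differ in detail:

import argparse
import subprocess

import yaml

# Parse the --config flag (defaults to config.yaml in the current directory)
parser = argparse.ArgumentParser(description="Run convert_checkpoint.py from a YAML config")
parser.add_argument("--config", default="config.yaml", help="Path to the YAML configuration file")
args = parser.parse_args()

# Load the configurations from the YAML file
with open(args.config) as f:
    config = yaml.safe_load(f)

# Extract the core model values and start building the command line
model = config["model"]
cmd = [
    "python3", "convert_checkpoint.py",
    "--model_dir", str(model["model_dir"]),
    "--output_dir", str(model["output_dir"]),
    "--dtype", str(model["dtype"]),
]

# Append the checkpoint arguments; booleans become bare flags when True
for key, value in config.get("checkpoint", {}).items():
    if isinstance(value, bool):
        if value:
            cmd.append(f"--{key}")
    else:
        cmd.extend([f"--{key}", str(value)])

# Execute convert_checkpoint.py with the constructed arguments
subprocess.run(cmd, check=True)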

By using the run_convert_checkpoint.py script and the config.yaml file, you can easily configure and run the convert_checkpoint.py script without explicitly specifying all the command-line arguments.

This approach provides a more transparent and manageable way to set the configurations for the checkpoint conversion process.
