Run the buildconfig file

To build a script that parses the arguments in buildconfig.yaml and passes them to trtllm-build, you can create a new Python script that reads the YAML configuration file, extracts the relevant settings, and constructs the appropriate command-line arguments for the trtllm-build command.

Here's an example of how you can create such a script:

import argparse
import subprocess
import yaml

def parse_buildconfig(config_file):
    with open(config_file, 'r') as f:
        config = yaml.safe_load(f)

    args = []

    # Model Configuration
    if 'model' in config:
        model_config = config['model']
        if 'model_dir' in model_config:
            args.extend(['--model_dir', model_config['model_dir']])
        if 'output_dir' in model_config:
            args.extend(['--output_dir', model_config['output_dir']])
        if 'dtype' in model_config:
            args.extend(['--dtype', model_config['dtype']])

    # Checkpoint Configuration
    if 'checkpoint' in config:
        checkpoint_config = config['checkpoint']
        if 'checkpoint_dir' in checkpoint_config:
            args.extend(['--checkpoint_dir', checkpoint_config['checkpoint_dir']])
        if 'tp_size' in checkpoint_config:
            args.extend(['--tp_size', str(checkpoint_config['tp_size'])])
        if 'pp_size' in checkpoint_config:
            args.extend(['--pp_size', str(checkpoint_config['pp_size'])])
        # Add more checkpoint configuration options as needed

    # Build Configuration
    if 'build' in config:
        build_config = config['build']
        if 'max_input_len' in build_config:
            args.extend(['--max_input_len', str(build_config['max_input_len'])])
        if 'max_output_len' in build_config:
            args.extend(['--max_output_len', str(build_config['max_output_len'])])
        if 'max_batch_size' in build_config:
            args.extend(['--max_batch_size', str(build_config['max_batch_size'])])
        if 'max_beam_width' in build_config:
            args.extend(['--max_beam_width', str(build_config['max_beam_width'])])
        # Add more build configuration options as needed

    return args

def main():
    parser = argparse.ArgumentParser(description='Parse buildconfig.yaml and run trtllm-build')
    parser.add_argument('--config', type=str, required=True, help='Path to the buildconfig.yaml file')
    args = parser.parse_args()

    buildconfig_args = parse_buildconfig(args.config)
    command = ['trtllm-build'] + buildconfig_args

    subprocess.run(command, check=True)

if __name__ == '__main__':
    main()

In this script:

  1. We define a function called parse_buildconfig that takes the path to the buildconfig.yaml file as input. It reads the YAML file, extracts the relevant settings from the model, checkpoint, and build sections, and constructs a list of command-line arguments from those settings (a sample buildconfig.yaml is shown after this list).

  2. We define the main function that uses argparse to parse the command-line arguments. It expects a --config argument that specifies the path to the buildconfig.yaml file.

  3. Inside the main function, we call the parse_buildconfig function to parse the YAML file and obtain the command-line arguments.

  4. We construct the trtllm-build command by concatenating the base command with the parsed command-line arguments.

  5. Finally, we use subprocess.run to execute the trtllm-build command with the provided arguments.
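
For reference, here is a minimal buildconfig.yaml that matches the keys handled by parse_buildconfig above. The keys mirror the parser; the values (paths and sizes) are placeholders you would replace with your own settings.

# buildconfig.yaml - sample values only
model:
  model_dir: ./llama/7B/hf_model
  output_dir: ./llama/7B/trt_engines/fp16/1-gpu
  dtype: float16

checkpoint:
  checkpoint_dir: ./tllm_checkpoint_1gpu_fp16
  tp_size: 1
  pp_size: 1

build:
  max_input_len: 2048
  max_output_len: 512
  max_batch_size: 8
  max_beam_width: 1

Any additional keys in the file are ignored until matching handling is added to parse_buildconfig.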

To use this script, save it to a file (e.g., run_trtllm_build.py) and run it from the command line, providing the path to the buildconfig.yaml file:

python3 run_trtllm_build.py --config buildconfig.yaml

For reference, a direct trtllm-build invocation of the kind this script constructs looks like this:

trtllm-build --checkpoint_dir ./tllm_checkpoint_1gpu_fp16 \
            --output_dir ./llama/7B/trt_engines/fp16/1-gpu \
            --gemm_plugin float16

This script will parse the buildconfig.yaml file, extract the relevant settings, and pass them as command-line arguments to the trtllm-build command.
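
If buildconfig.yaml contains settings that parse_buildconfig does not yet handle, for example the --gemm_plugin option shown in the direct command above, they can be added with the same pattern. Here is a minimal sketch, assuming a hypothetical plugins section in the YAML:

    # Plugin Configuration (assumes a hypothetical 'plugins' section in buildconfig.yaml)
    if 'plugins' in config:
        plugin_config = config['plugins']
        if 'gemm_plugin' in plugin_config:
            args.extend(['--gemm_plugin', plugin_config['gemm_plugin']])

Placed inside parse_buildconfig before the return statement, this keeps buildconfig.yaml as the single place where build settings are declared.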

Note: Make sure PyYAML is installed (pip install pyyaml) before running the script; argparse and subprocess are part of the Python standard library.
