Docker Makefile


TensorRT-LLM/docker/Makefile

This Makefile manages the build, push, run, and development workflows for the Docker images and containers used by NVIDIA TensorRT-LLM (TensorRT for Large Language Models).

A Makefile is a special file used by the make utility to automate the building, testing, and deployment of software projects.

It defines a set of rules and dependencies that specify how to compile and link source code files, generate documentation, run tests, and perform other project-related tasks.

Makefiles are most common in C/C++ projects, but they work with any programming language or build process.
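
To make the rest of this page concrete, here is a minimal, generic example (illustrative only, not taken from the TensorRT-LLM repository). Each rule names a target, the prerequisites it depends on, and the recipe that builds it:

```makefile
# Build the final binary from the object file.
app: main.o
	cc -o app main.o

# Compile the source file; runs only when main.c is newer than main.o.
main.o: main.c
	cc -c main.c

# Run a quick check; depends on the binary being built first.
test: app
	./app --self-test
```

Running make test rebuilds only the pieces whose prerequisites have changed, then executes the recipe.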

Variable Definitions

The Makefile starts by defining several variables using the ?= operator, which assigns a default value only if the variable is not already set (for example, in the environment or on the make command line).

BASE_IMAGE and BASE_TAG specify the default base image and tag for the Docker build, extracted from the Dockerfile.multi file.

IMAGE_NAME and IMAGE_TAG define the name and tag of the new Docker image to be built.

USER_ID, USER_NAME, GROUP_ID, and GROUP_NAME store the local user information.

  • LOCAL_USER is a flag to indicate whether to add the current user to the Docker image and run the container with that user.

  • Other variables like STAGE, IMAGE_WITH_TAG, PUSH_TO_STAGING, DOCKER_BUILD_OPTS, DOCKER_BUILD_ARGS, DOCKER_PROGRESS, CUDA_ARCHS, BUILD_WHEEL_ARGS, TORCH_INSTALL_TYPE, CUDA_VERSION, CUDNN_VERSION, NCCL_VERSION, CUBLAS_VERSION, TRT_VERSION, DEVEL_IMAGE, GIT_COMMIT, and TRT_LLM_VERSION are defined to customise the Docker build process.
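
The sketch below shows how such a variable block typically looks. The names match the descriptions above, but the default values are illustrative placeholders rather than the repository's actual settings:

```makefile
# Base image and tag: defaults read out of Dockerfile.multi (illustrative).
BASE_IMAGE ?= $(shell grep 'ARG BASE_IMAGE=' Dockerfile.multi | cut -d= -f2)
BASE_TAG   ?= $(shell grep 'ARG BASE_TAG=' Dockerfile.multi | cut -d= -f2)

# Name and tag of the image to build.
IMAGE_NAME ?= tensorrt_llm
IMAGE_TAG  ?= latest

# Local user information, baked into the image when LOCAL_USER=1.
USER_ID    ?= $(shell id -u)
USER_NAME  ?= $(shell id -un)
GROUP_ID   ?= $(shell id -g)
GROUP_NAME ?= $(shell id -gn)
LOCAL_USER ?= 0
```

Because ?= only sets unset variables, any of these can be overridden from the command line, e.g. make IMAGE_TAG=dev release_build.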

Target Definitions

  • The Makefile defines several targets, each representing a specific task or stage of the build process.

  • The %_build target builds the Docker image. It invokes docker build with the build arguments and options assembled from the variables above (a sketch follows this list).

  • The %_user target adds the local user to the Docker image using a separate Dockerfile.user.

  • The %_push target pushes the built Docker image to a registry. It checks the PUSH_TO_STAGING flag to determine whether to push to a staging registry or the main registry.

  • The %_pull target pulls the specified Docker image from a registry.

  • The %_run target runs the Docker container with the specified options and environment variables.
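
A hedged sketch of how these pattern rules fit together; the flags and paths are illustrative, not the repository's exact invocation:

```makefile
# One pattern rule serves every stage: devel_build, release_build, etc.
# $* is the stem matched by %, which selects the Dockerfile stage to build.
%_build:
	docker build $(DOCKER_BUILD_OPTS) $(DOCKER_BUILD_ARGS) \
		--progress=$(DOCKER_PROGRESS) \
		--build-arg BASE_IMAGE=$(BASE_IMAGE) \
		--build-arg BASE_TAG=$(BASE_TAG) \
		--target $* \
		--file Dockerfile.multi \
		--tag $(IMAGE_WITH_TAG) ..

# Pushing depends on the image having been built first.
%_push: %_build
	docker push $(IMAGE_WITH_TAG)

# Run a container from the image with GPU access and the source mounted in.
%_run:
	docker run --rm -it --gpus all \
		--volume $(CURDIR)/..:/code/tensorrt_llm \
		$(IMAGE_WITH_TAG)
```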

Phony Targets

  • The Makefile defines several phony targets like build, push, and run, which are used to trigger the corresponding actions.

  • These targets are marked with the .PHONY directive so they always run, even if a file with the same name exists in the directory (see the sketch below).
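
A sketch of how the phony aliases typically forward to the stage-specific pattern targets; the exact mapping shown here is an assumption, not the repository's actual wiring:

```makefile
# Declare the short names as phony: they are actions, not files on disk.
.PHONY: build push run

# Each alias forwards to a concrete stage target handled by the % rules.
build: release_build
push: release_push
run: release_run
```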

Specific Build Targets

  • The Makefile defines specific build targets for different stages and environments, such as devel_%, wheel_%, release_%, jenkins_%, jenkins-aarch64_%, centos7_%, ubuntu22_%, old-cuda_%, and trtllm_%.

  • These targets set specific variables and options for each stage or environment, allowing for customised builds (see the sketch below).
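
These overrides rely on make's pattern-specific variable syntax, sketched below with placeholder values (the actual images and versions live in the repository's Makefile):

```makefile
# Variables set this way apply only to targets matching the pattern,
# e.g. ubuntu22_build or ubuntu22_run. The values here are placeholders.
ubuntu22_%: BASE_IMAGE = nvidia/cuda
ubuntu22_%: BASE_TAG = 12.3.2-devel-ubuntu22.04

old-cuda_%: CUDA_VERSION = 12.1

# ubuntu22_build then resolves through the generic %_build rule
# with these overrides in effect.
```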

Helper Functions

  • The Makefile includes helper functions defined using the define directive.

  • The add_local_user function adds the local user to the Docker image (sketched after this list).

  • The rewrite_tag function rewrites the image tag to include a staging suffix.
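
A hedged sketch of what the add_local_user helper might look like. It builds a thin extra layer via Dockerfile.user that bakes the host user's UID/GID into the image, so files created inside the container are owned by the host user; the exact build arguments are assumptions:

```makefile
# Usage inside a recipe: $(call add_local_user,$(IMAGE_WITH_TAG))
define add_local_user
	docker build \
		--build-arg BASE_IMAGE_WITH_TAG=$(1) \
		--build-arg USER_ID=$(USER_ID) \
		--build-arg USER_NAME=$(USER_NAME) \
		--build-arg GROUP_ID=$(GROUP_ID) \
		--build-arg GROUP_NAME=$(GROUP_NAME) \
		--file Dockerfile.user \
		--tag $(1)-$(USER_NAME) ..
endef
```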

In summary, this Makefile automates the build, push, and run workflows for the TensorRT-LLM Docker image.

Its variables and pattern targets provide the flexibility to produce different build configurations for different environments, while make's dependency handling keeps builds consistent and reproducible.

By defining targets and dependencies, it lets users trigger specific actions with a single command, such as building the image, pushing it to a registry, or running a container with the desired options.

Overall, it serves as a centralised, automated build system for the TensorRT-LLM project, making it easier for developers to manage and deploy the project with Docker containers.
