TensorRT-LLM Libraries

This section covers the library's core Python files and its build tooling:

  • tensorrt_llm folders
  • tensorrt_llm/builder.py
  • tensorrt_llm/network.py
  • tensorrt_llm/module.py
  • top_model_mixin.py
  • trt-llm build command
  • trtllm-build CLI configurations
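As orientation for the sub-pages above, the sketch below shows how these files fit together in the engine-build flow: a Builder (builder.py) creates a Network (network.py), a model written as a Module (module.py) is traced into that network, and the builder compiles the result into a serialized TensorRT engine. This is a minimal, illustrative example, not a verbatim TensorRT-LLM script: the ToyModel class, tensor names, and shapes are invented here, and exact signatures of calls such as create_builder_config and mark_output vary between TensorRT-LLM releases.

```python
# Minimal sketch of the TensorRT-LLM build flow (illustrative only;
# see tensorrt_llm/builder.py and tensorrt_llm/network.py for the
# authoritative definitions in your installed release).
import tensorrt as trt
from tensorrt_llm.builder import Builder
from tensorrt_llm.network import net_guard
from tensorrt_llm.module import Module
from tensorrt_llm.functional import Tensor, relu

# A toy model: Module (tensorrt_llm/module.py) is the base class whose
# forward() is traced into the network graph rather than executed eagerly.
class ToyModel(Module):
    def forward(self, x):
        out = relu(x)
        out.mark_output('out', trt.float32)  # register a network output
        return out

builder = Builder()  # wraps trt.Builder (tensorrt_llm/builder.py)
builder_config = builder.create_builder_config(name='toy',
                                               precision='float32')

network = builder.create_network()  # Network wrapper (tensorrt_llm/network.py)
with net_guard(network):
    # Inside net_guard, Tensor(...) declares a network input and
    # functional ops record themselves into `network`.
    x = Tensor(name='x', dtype=trt.float32, shape=[1, 4])
    ToyModel().forward(x)

# Compile the populated network into a serialized TensorRT engine.
engine = builder.build_engine(network, builder_config)
```

The trtllm-build command covered at the end of this section wraps this same Builder/Network flow behind a CLI, so day-to-day engine builds rarely call these classes directly.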