LLama Model Directory

  • llama/model.py
  • llama/utils.py
  • llama/weight.py
  • llama/convert.py
  • PreTrainedModel class
  • LlamaForCausalLM class
  • PretrainedConfig class
Copyright Continuum Labs - 2023