PreTrainedModel class

From the Transformers Library

The PreTrainedModel class is a base class provided by the Transformers library that serves as a foundation for all pretrained models.

It provides common functionality and methods for loading, saving, and modifying pretrained models.

Configuration

  • The PreTrainedModel class takes a config parameter, which is an instance of PretrainedConfig or one of its subclasses.

  • The config object stores the model's architecture settings, such as the number of layers, hidden size, and number of attention heads.
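
As a rough sketch of this relationship, a model can be instantiated directly from a PretrainedConfig subclass. The layer sizes below are arbitrary illustrative values, not those of any released checkpoint.

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Illustrative, deliberately small configuration values (not a real Llama size)
config = LlamaConfig(
    vocab_size=32000,
    hidden_size=512,
    intermediate_size=1376,
    num_hidden_layers=4,
    num_attention_heads=8,
)

# Building the model from a config alone gives randomly initialised weights;
# pretrained weights are loaded separately with from_pretrained
model = LlamaForCausalLM(config)
print(model.config.num_hidden_layers)  # -> 4
```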

Class Attributes

  • config_class: A subclass of PretrainedConfig that is used as the configuration class for the specific model architecture.

  • load_tf_weights: A callable that loads weights from a TensorFlow checkpoint into a PyTorch model.

  • base_model_prefix: A string indicating the attribute associated with the base model in derived classes that add modules on top of the base model.

  • is_parallelizable: A boolean flag indicating whether the model supports parallelization.

  • main_input_name: The name of the main input to the model (e.g., input_ids for NLP models).
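
To make these attributes concrete, the sketch below shows how a custom model family might declare them when subclassing PreTrainedModel. The MyConfig and MyPreTrainedModel names are hypothetical, loosely modelled on how the model families shipped with the library set these attributes.

```python
from transformers import PreTrainedModel, PretrainedConfig

class MyConfig(PretrainedConfig):
    # Hypothetical architecture identifier
    model_type = "my_model"

class MyPreTrainedModel(PreTrainedModel):
    # Configuration class paired with this architecture
    config_class = MyConfig
    # Attribute name under which the bare backbone lives in derived classes
    base_model_prefix = "model"
    # Name of the primary input expected by forward()
    main_input_name = "input_ids"
```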

Methods

  • from_pretrained: A class method that instantiates a pretrained model from a configuration and pretrained weights.

    • It allows loading models from a local directory, a remote repository, or a TensorFlow/Flax checkpoint.

    • It supports various options such as specifying the configuration, state dictionary, cache directory, etc.

  • save_pretrained: A method to save the model's configuration and state dictionary to a specified directory.

  • push_to_hub: A method to upload the model to the Hugging Face Model Hub repository.

  • from_tf and from_flax: Boolean arguments to from_pretrained (not standalone methods) that load the model weights from a TensorFlow or Flax checkpoint instead of a PyTorch one.
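
A minimal sketch of the loading and saving round trip, using the auto classes that dispatch to the appropriate PreTrainedModel subclass. The repository id is only an example (and, for Llama 2, gated behind a licence agreement on the Hub); the push_to_hub target name is hypothetical.

```python
from transformers import AutoModelForCausalLM

# Load pretrained weights from the Hugging Face Hub (or a local directory)
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Save the configuration and state dictionary to a local directory
model.save_pretrained("./llama-2-7b-local")

# Reload later from that directory
model = AutoModelForCausalLM.from_pretrained("./llama-2-7b-local")

# Optionally upload to a Hub repository (requires an authenticated token)
# model.push_to_hub("my-org/my-llama-variant")
```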

Parallelization and Distributed Training

  • The is_parallelizable attribute indicates whether the model can be parallelized across multiple devices.

  • The from_pretrained method supports loading models in a distributed manner using the device_map argument, which allows specifying the device placement for each submodule of the model.
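
A sketch of distributed loading with device_map. Passing "auto" lets the Accelerate library decide the placement, so the accelerate package must be installed; the repository id is again only an example.

```python
import torch
from transformers import AutoModelForCausalLM

# device_map="auto" shards the model's submodules across the available GPUs
# (with CPU/disk offload as a fallback); this path requires accelerate
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    device_map="auto",
    torch_dtype=torch.float16,
)

# Inspect which device each submodule was placed on
print(model.hf_device_map)
```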

Model Modifications

  • The PreTrainedModel class provides methods to modify the model's architecture, such as resize_token_embeddings to resize the input token embeddings and prune_heads to prune the attention heads.
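
A short sketch of both modification methods. GPT-2 is used here because its architecture implements head pruning; not every model family does.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Add new tokens to the tokenizer, then grow the embedding matrix to match
tokenizer.add_tokens(["<custom_token>"])
model.resize_token_embeddings(len(tokenizer))

# Prune attention heads, given as {layer_index: [head_indices]}
model.prune_heads({0: [0, 1]})
```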

Quantization and Optimization

  • The from_pretrained method supports quantization and optimization configurations through the quantization_config argument, allowing for quantized model loading using libraries like bitsandbytes.
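
A sketch of quantized loading through quantization_config. This path assumes a CUDA GPU and the bitsandbytes and accelerate packages; the repository id is an example.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization handled by bitsandbytes at load time
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=quantization_config,
    device_map="auto",
)
```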

Saving and Loading

  • The save_pretrained method allows saving the model's configuration and state dictionary to a specified directory.

  • The from_pretrained method supports loading models from a saved directory, a model repository on the Hugging Face Hub, or a TensorFlow/Flax checkpoint.

The PreTrainedModel class provides a unified interface for working with pretrained models in the Transformers library.

It abstracts away the complexities of loading, saving, and modifying models, making it easier to use and extend pretrained models for various tasks.

Developers can subclass the PreTrainedModel class to create their own custom models while leveraging the common functionalities provided by the base class.

This promotes code reuse, maintainability, and consistency across different model architectures.

Overall, the PreTrainedModel class is a fundamental building block in the Transformers library, enabling seamless integration and utilisation of pretrained models in a wide range of natural language processing and computer vision tasks.
