PretrainedConfig class

The PretrainedConfig class is the base class for all configuration classes in the Transformers library. It provides a unified interface for handling configuration parameters common to all models, as well as methods for loading, saving, and updating configurations.

Let's analyse the class in detail:

Initialisation

  • The PretrainedConfig class is initialised with arbitrary keyword arguments (**kwargs); recognised keys are consumed by the constructor, and any remaining keys are stored as attributes on the instance.

  • It defines several common parameters such as output_hidden_states, output_attentions, return_dict, is_encoder_decoder, is_decoder, etc., which are used by various models.
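
As a minimal sketch of this behaviour (the extra my_custom_flag keyword is purely illustrative), keyword arguments passed to the constructor end up as attributes on the instance:

```python
from transformers import PretrainedConfig

# Recognised keys (output_hidden_states, return_dict, ...) are consumed by
# __init__; unrecognised keys are simply set as attributes on the instance.
config = PretrainedConfig(
    output_hidden_states=True,
    output_attentions=False,
    return_dict=True,
    my_custom_flag=42,  # illustrative extra kwarg, kept as an attribute
)

print(config.output_hidden_states)  # True
print(config.my_custom_flag)        # 42
```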

Class Attributes

  • model_type: An identifier for the model type, serialised into the JSON file and used to recreate the correct object in AutoConfig.

  • is_composition: A boolean indicating whether the config class is composed of multiple sub-configs.

  • keys_to_ignore_at_inference: A list of keys to ignore when looking at dictionary outputs of the model during inference.

  • attribute_map: A dictionary that maps model-specific attribute names to standardised attribute names, so that a configuration can be queried under either name.
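
A sketch of how these class attributes are typically set on a custom configuration; MyModelConfig and its fields are hypothetical, but the pattern mirrors the library's built-in configs:

```python
from transformers import PretrainedConfig

class MyModelConfig(PretrainedConfig):
    model_type = "my_model"                            # serialised into config.json
    keys_to_ignore_at_inference = ["past_key_values"]  # dropped from inference outputs
    # Standardised name (key) is resolved to the model-specific attribute (value).
    attribute_map = {"hidden_size": "d_model"}

    def __init__(self, d_model=768, num_hidden_layers=12, **kwargs):
        self.d_model = d_model
        self.num_hidden_layers = num_hidden_layers
        super().__init__(**kwargs)

config = MyModelConfig()
print(config.hidden_size)  # 768 -- routed through attribute_map to d_model
```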

Common Attributes

  • The class defines common attributes such as vocab_size, hidden_size, num_attention_heads, num_hidden_layers, which are present in all subclasses.
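
For example, on a concrete subclass such as LlamaConfig these attributes are directly available (the defaults shown in the comments are the library's at the time of writing):

```python
from transformers import LlamaConfig

config = LlamaConfig()
print(config.vocab_size)           # 32000
print(config.hidden_size)          # 4096
print(config.num_attention_heads)  # 32
print(config.num_hidden_layers)    # 32
```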

Methods

  • from_pretrained: A class method that instantiates a PretrainedConfig (or a derived class) from a pretrained model configuration (a usage sketch follows this list).

    • It takes the pretrained_model_name_or_path as input, which can be a model identifier, a path to a directory containing the configuration file, or a URL to a saved configuration JSON file.

    • It supports additional parameters such as cache_dir, force_download, revision, etc., to control the behavior of downloading and caching the configuration files.

  • save_pretrained: A method to save the configuration object to a directory so that it can be reloaded using the from_pretrained method.

    • It takes the save_directory as input and saves the configuration JSON file in that directory.

    • It also supports pushing the configuration to the Hugging Face Model Hub using the push_to_hub parameter.

  • to_dict: A method that serialises the configuration instance to a Python dictionary.

  • to_json_string: A method that serialises the configuration instance to a JSON string.

  • to_json_file: A method that saves the configuration instance to a JSON file.

  • update: A method that updates the attributes of the configuration instance with attributes from a dictionary.

  • update_from_string: A method that updates the attributes of the configuration instance from a string of comma-separated key=value pairs (e.g. "output_attentions=true,num_attention_heads=32").
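
A round-trip sketch of these methods. The model identifier and local paths are illustrative, and a gated model such as Llama 2 additionally requires Hugging Face Hub authentication:

```python
from transformers import AutoConfig

# Load by model identifier, local directory, or path to a config JSON file.
config = AutoConfig.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    cache_dir="./hf_cache",  # optional download/caching controls
)

# Update attributes from a dictionary, then persist to a directory.
config.update({"output_hidden_states": True})
config.save_pretrained("./llama2-7b-config")

# The saved directory can be reloaded later with from_pretrained.
reloaded = AutoConfig.from_pretrained("./llama2-7b-config")

# update_from_string parses comma-separated key=value pairs.
reloaded.update_from_string("output_attentions=true,num_attention_heads=32")
```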

Auto Class Registration

  • The register_for_auto_class method allows registering the configuration class with a given auto class (e.g., AutoConfig).

  • This is useful for custom configurations to be automatically discoverable by the AutoConfig class.
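
A sketch of both sides of this mechanism, reusing the hypothetical MyModelConfig from above:

```python
from transformers import AutoConfig

# Record the auto-class mapping on the custom config so it is saved with it
# (relevant when sharing custom code via save_pretrained / push_to_hub).
MyModelConfig.register_for_auto_class("AutoConfig")

# Register the model_type -> config class mapping in the current process,
# so AutoConfig can resolve "my_model" to MyModelConfig.
AutoConfig.register("my_model", MyModelConfig)

config = AutoConfig.for_model("my_model")  # instantiates MyModelConfig
```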

Serialisation and Deserialisation

  • The to_dict, to_json_string, and to_json_file methods serialise the configuration instance to a dictionary, a JSON string, and a JSON file, respectively.

  • The from_dict and from_json_file methods allow instantiating a PretrainedConfig from a dictionary or a JSON file, respectively.
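
A serialisation round trip, again using the hypothetical MyModelConfig:

```python
config = MyModelConfig(d_model=1024)

config_dict = config.to_dict()               # plain Python dict
json_str = config.to_json_string()           # JSON string
config.to_json_file("my_model_config.json")  # write to disk

# Reconstruct equivalent instances from either representation.
restored_from_dict = MyModelConfig.from_dict(config_dict)
restored_from_file = MyModelConfig.from_json_file("my_model_config.json")
```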

The PretrainedConfig class serves as a foundation for all configuration classes in the Transformers library.

It provides a standardised way to handle configuration parameters, load and save configurations, and interact with pretrained models.

Subclasses of PretrainedConfig can extend or override the base class methods and attributes to define model-specific configurations. This allows for a consistent and unified approach to working with configurations across different models in the library.

The class also supports integration with the Hugging Face Model Hub, enabling easy sharing and loading of pretrained configurations from the hub.

Overall, the PretrainedConfig class is a crucial component of the Transformers library, providing a standardised and efficient way to manage and organise model configurations.
